## THEx Model

This section shows how to configure and run a model in the THEx infrastructure. There are three models: the binary classifiers (BinaryModel); the One-Vs-All classifier (OvAModel), which aggregates the binary results; and the KDE multiclass classifier (MultiModel), which creates a unique KDE for each class and normalizes over those likelihoods.
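
To make the "one KDE per class, then normalize" idea concrete, here is a minimal one-dimensional sketch in plain Python. It is not the THEx implementation: the function names, the fixed `bandwidth`, and the toy data are all assumptions made for illustration.

```python
import math

def gaussian_kde_density(samples, x, bandwidth):
    """Evaluate a 1-D Gaussian KDE built from `samples` at point x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

def class_probabilities(train_by_class, x, bandwidth=0.5):
    """Build one KDE per class and normalize the likelihoods at x to sum to 1."""
    likelihoods = {
        label: gaussian_kde_density(samples, x, bandwidth)
        for label, samples in train_by_class.items()
    }
    total = sum(likelihoods.values())
    return {label: value / total for label, value in likelihoods.items()}

# Hypothetical toy data: one feature value per training sample, grouped by class.
train_by_class = {
    "Ia": [0.1, 0.2, 0.3, 0.25],
    "II": [1.0, 1.1, 0.9, 1.2],
}
probs = class_probabilities(train_by_class, x=0.2)
print(probs)
```

The test point lies inside the "Ia" samples, so that class receives the larger normalized probability. A real model would work with several features at once and would need to handle points where every class likelihood underflows to zero.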

The models accept the following parameters:
- __cols__ [default=None]: List of column/feature names to use; the default is all numeric columns
- __col_matches__ [default=None]: An alternative to passing in column names. A list of strings may be passed in, and any column whose name contains one of these strings will be used. If both cols and col_matches are set, only col_matches is used
- __num_runs__ [default=None]: The number of trials to run and average results over. For each trial, 80% of data will be randomly selected for training, and 20% for testing. 
- __folds__ [default=None] : The number of folds to run over, in k-fold cross-validation. If both num_runs and folds are passed in, num_runs will be used.
- __transform_features__ [default=True]: Derives colors from adjacent magnitudes, using dictionary ORDERED_MAGS in thex_data/data_consts.py
- __min_class_size__ [default=9]: Each class must contain at least this number of samples for it to be used. 
- __max_class_size__ [default=None]: Classes with more than this number of samples will be randomly sampled down to this number
- __pca__ [default=None]: Number of components to reduce down to using PCA. By default, PCA is not applied
- __class_labels__ [default=None]: List of classes to limit analysis to. List of all classes is in thex_data.data_consts, ORDERED_CLASSES
- __data__ [default=None]: Optional parameter for testing particular sets of data. By default, data is read from the file at DATA_PATH (set in thex_data.data_consts), but this parameter may be used to pass in specific datasets. It must be a list of the training and testing Pandas DataFrames: [train_df, test_df]
- __nb__ [default=False]: Whether to apply Naive Bayes. If True, a unique KDE is created for each dimension. If False, a multivariate KDE is used.
- __priors__ [default=False]: Whether to use frequency-based priors, calculated for each class as its proportion of the dataset.
- __data_file__ [default=DATA_PATH]: .FITS file to use for data; defaults to DATA_PATH, which is set in thex_data.data_consts
- __linear_calib__ [default=False]: Calibrates probabilities by fitting a line to the empirical probabilities of the training data.
- __lsst_test__ [default=False]: Groups Ib, Ic, Ib/c, and their subclasses into a single class, Ibc. If no class labels are given, uses Ia, II, 91bg, TDE, and Ibc as labels.
- __Zmodel__ [default=False]: Use only redshift as a feature.
- __balanced_purity__ [default=False]: Report performance as balanced purity instead of standard purity. Balanced purity is the purity that would result if all classes were the same size.
- __case_code__ [default=False]: List of case codes; only samples with one of these case codes are used. Use ["A1", "F1", "B1", "G1"].
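
The difference between standard and balanced purity is easiest to see with numbers. The sketch below takes one plausible reading of "purity if all class sizes were equal": each true class is rescaled to the same size before purity is computed. The function names and the confusion counts are hypothetical, and the THEx code may compute this differently.

```python
def purity(confusion, target):
    """Standard purity: fraction of samples predicted as `target` that truly are."""
    predicted = sum(row.get(target, 0) for row in confusion.values())
    return confusion[target].get(target, 0) / predicted

def balanced_purity(confusion, target):
    """Purity after rescaling every true class to the same size."""
    # Rate at which each true class is assigned to `target`.
    rates = {
        true_label: row.get(target, 0) / sum(row.values())
        for true_label, row in confusion.items()
    }
    return rates[target] / sum(rates.values())

# Hypothetical confusion counts: confusion[true_label][predicted_label].
confusion = {
    "Ia": {"Ia": 90, "II": 10},  # 100 true Ia samples
    "II": {"Ia": 5, "II": 5},    # 10 true II samples
}

print(purity(confusion, "II"))           # 5 / (10 + 5)
print(balanced_purity(confusion, "II"))  # 0.5 / (0.1 + 0.5)
```

In this toy case the rare class II has low standard purity because the much larger Ia class contributes most of the false positives. Rescaling the classes to equal size removes that effect.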

In [None]:
%matplotlib inline  
from models.binary_model.binary_model import BinaryModel
from models.ind_model.ind_model import OvAModel
from models.multi_model.multi_model import MultiModel
mags = ["g_mag",  "r_mag", "i_mag", "z_mag", "y_mag",
        "W1_mag", "W2_mag",
        "J_mag", "K_mag", "H_mag"]

In [None]:
model1 = MultiModel(
    cols=mags,
    folds=6,
    min_class_size=3,
    max_class_size=4800,
    transform_features=True,
    case_code=["A1", "F1", "B1", "G1"],
    # balanced_purity=True,
    priors=True,
    lsst_test=True,
)

# model1.run_model()
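
The `folds` setting in the call above requests 6-fold cross-validation. As a rough illustration of what that means, the sketch below partitions sample indices into k folds using only the standard library. It does not show how THEx splits the data: the helper name `k_fold_indices`, the shuffling, and the seed are assumptions, and the source does not say whether THEx stratifies by class.

```python
import random

def k_fold_indices(num_samples, folds, seed=0):
    """Yield (train_indices, test_indices) for each of `folds` splits."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    for k in range(folds):
        test = indices[k::folds]  # every `folds`-th shuffled index, starting at k
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test

splits = list(k_fold_indices(num_samples=20, folds=6))
for train, test in splits:
    print(len(train), len(test))
```

Each sample appears in exactly one test fold, so every sample is used for testing once and for training in the remaining folds.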