## THEx Models
Welcome to THEx models. The models can be run from any Python interpreter, including this Jupyter Notebook! This notebook walks you through how to run them. First, let's import the models currently implemented.

In [None]:
from models.hmc_model.hmc_model import HMCModel
from models.nb_model.nb_model import NaiveBayesModel

Next, let's run one of the models. We just need 2 things to run the models:
1. A properly set-up repo (please see the README for detailed directions).
2. An idea of which features we'd like to filter on.

If you are unfamiliar with THEx, there is just one thing to keep in mind: our dataset is massively disparate. This means no single row of data has values for every feature. In fact, our dataset looks like this picture:

![title](figures/thexdataset.png)

No single row goes all the way across, so we need to select some columns to filter on. Our models allow 2 ways of filtering on columns:

- **cols** : Specific column names, provided as a list of strings. For example: `["NED_SDSS_u", "NED_SDSS_g", "NED_SDSS_r"]`

- **col_match** : Substrings to match column names on, provided as a list of strings. For example: `["AllWISE", "GALEX"]` will filter on all columns containing either of those strings, which turns out to be: AllWISE_W1mag, AllWISE_W2mag, AllWISE_W3mag, AllWISE_W4mag, AllWISE_Jmag, AllWISE_Hmag, AllWISE_Kmag, NED_GALEX_FUV, NED_GALEX_NUV
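To make the two filtering options concrete, here is a minimal sketch of how column selection could work. It is not the THEx implementation: `select_columns` and `rows_with_all_values` are hypothetical helpers, the magnitudes are made-up toy values, and it assumes that matching is a case-sensitive substring test and that rows missing a value in any selected column are dropped.

```python
# Hypothetical helpers for illustration; not part of the THEx codebase.

def select_columns(all_columns, cols=None, col_match=None):
    """Return the column names picked by `cols` and/or `col_match`."""
    selected = list(cols or [])
    for name in all_columns:
        # Assumption: a case-sensitive substring match on the column name.
        if col_match and any(m in name for m in col_match) and name not in selected:
            selected.append(name)
    return selected


def rows_with_all_values(rows, columns):
    """Keep only the rows that have a value (not None) in every given column."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]


# Toy data: a few column names from above and made-up magnitudes.
columns = ["NED_SDSS_u", "NED_SDSS_g", "AllWISE_W1mag", "AllWISE_W2mag", "NED_GALEX_FUV"]
rows = [
    {"AllWISE_W1mag": 14.2, "AllWISE_W2mag": 14.0, "NED_GALEX_FUV": 19.1},
    {"NED_SDSS_u": 18.3, "NED_SDSS_g": 17.9},        # no AllWISE or GALEX values
    {"AllWISE_W1mag": 13.7, "AllWISE_W2mag": 13.5},  # missing the GALEX value
]

picked = select_columns(columns, col_match=["AllWISE", "GALEX"])
usable = rows_with_all_values(rows, picked)
print(picked)
print(len(usable), "of", len(rows), "rows have values in every selected column")
```

Under these assumptions, the more columns we select, the fewer rows survive, which is why the choice of columns matters so much on a dataset like ours.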



In [None]:
HMCModel(col_match = ["AllWISE", "GALEX"])

And you're done! That's all you need to run our models. There are also several optional parameters to be aware of:
- **test_on_train** (default = False) : Boolean flag that, if True, tests on the training data. This helps evaluate how well the model captures patterns in the training data.
- **folds** (default = 3) : Number of folds to use in k-fold cross-validation.
- **incl_redshift** (default = False) : Boolean flag that, if True, uses redshift as a feature.
- **top_classes** (default = 10) : Maximum number of classes to include, selected by number of samples (for example, the 10 most populous classes).
- **subsample** (default = 200) : Number of samples to randomly subsample over-represented classes down to.
- **derive_diffs** (default = False) : Boolean flag that, if True, uses differences between adjacent columns as features (for example, g - r).
- **one_all** (default = None) : List of classes to run the model on; all other classes are placed into an 'Other' category (for example, pass in ["Ia", "II"] to have Ia, II, and Other as the only classes).
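To give a feel for what the class and feature options above mean, here is a rough sketch of each one on toy data. All four functions are hypothetical stand-ins written for this notebook, not THEx code, and the order in which THEx applies these steps is not shown here.

```python
import random
from collections import Counter

# Hypothetical helpers for illustration; not taken from the THEx code.

def keep_top_classes(labels, top_classes=10):
    """Return the set of the `top_classes` most frequent labels."""
    return {label for label, _ in Counter(labels).most_common(top_classes)}


def relabel_one_all(labels, one_all):
    """Keep labels listed in `one_all`; map everything else to 'Other'."""
    return [label if label in one_all else "Other" for label in labels]


def subsample_indices(labels, subsample=200, seed=0):
    """Pick at most `subsample` random row indices per class."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    kept = []
    for indices in by_class.values():
        if len(indices) > subsample:
            indices = rng.sample(indices, subsample)
        kept.extend(indices)
    return sorted(kept)


def adjacent_diffs(row, columns):
    """Differences between adjacent columns, for example g - r."""
    return {f"{a} - {b}": row[a] - row[b] for a, b in zip(columns, columns[1:])}


# Toy labels: 5 of class Ia, 3 of class II, 1 of class Ib.
labels = ["Ia"] * 5 + ["II"] * 3 + ["Ib"] * 1
print(keep_top_classes(labels, top_classes=2))
print(relabel_one_all(labels, ["Ia", "II"]))
print(subsample_indices(labels, subsample=2))
print(adjacent_diffs({"u": 18.5, "g": 17.5, "r": 17.0}, ["u", "g", "r"]))
```

Note that the subsampling is seeded here only so the toy output is repeatable; whether THEx fixes a seed is not something this sketch assumes.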


Let's change some of these flags around in Naive Bayes to get an idea of their impact:

In [None]:
NaiveBayesModel(col_match = ["AllWISE", "GALEX", "PS1"],
                test_on_train = True,
                incl_redshift = True)