# Advice and Good Practice

Predictive modeling is an umbrella term, covering machine learning, artificial intelligence, and related fields, for work that focuses on **predictions**. While the primary goal of predictive modeling is to generate accurate predictions, a secondary concern is **interpretability**.

Note the importance of **expert judgments**:
   - "In the end, predictive modeling is not a substitute for intuition, but a complement."
   - "Traditional experts make better decisions when they are provided the results of statistical predictions."

Things to watch out for when preparing **data**:
- **Leakage**: Information about labels sneaks into features
- **Sample bias**: Test inputs and deployment inputs have different distributions
- **Nonstationarity**: The thing you are modeling changes over time
    - **Covariate Shift**: the input distribution changes over time
    - **Concept Shift**: the correct output for a given input changes over time
    - **Domain Shift**: the training data and the inference data come from different distributions
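
One way to screen for covariate shift is to compare each feature's distribution in the training data against recent deployment data. The sketch below uses a per-feature two-sample Kolmogorov-Smirnov test from SciPy; this is only one possible check, not something the references prescribe. The function name `covariate_shift_report`, the threshold `alpha`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp


def covariate_shift_report(X_train, X_deploy, alpha=0.01):
    """Run a two-sample KS test on every feature column.

    Returns a list of (column_index, ks_statistic, p_value, flagged) tuples,
    where `flagged` means the p-value fell below `alpha`.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_deploy = np.asarray(X_deploy, dtype=float)
    report = []
    for j in range(X_train.shape[1]):
        result = ks_2samp(X_train[:, j], X_deploy[:, j])
        report.append((j, result.statistic, result.pvalue, result.pvalue < alpha))
    return report


# Synthetic data (hypothetical): feature 1 drifts by one standard deviation.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 2))
X_deploy = rng.normal(size=(2000, 2))
X_deploy[:, 1] += 1.0

report = covariate_shift_report(X_train, X_deploy)
for j, stat, p, flagged in report:
    print(f"feature {j}: KS={stat:.3f}, p={p:.3g}, flagged={flagged}")
```

A test like this only looks at one feature at a time, so it can miss shifts in the joint distribution, and it says nothing about concept shift, which requires labels from the deployment period.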

Look at [data preprocessing](data_preprocessing.ipynb) for more detailed suggestions.

It is important to **understand the predictors** (see also the next point):
 - Predictor sets may contain **numerically redundant information**.
 - Predictor sets may contain **missing values**.
 - Predictors may be **sparse**.
 - Predictors may not be relevant to the response; **feature selection** is the process of determining the minimal set of relevant predictors needed.
 - Be mindful of the number of samples relative to the number of predictors. Regularization may come in handy when predictors are many and samples are few.
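
The checks above can be sketched with pandas. The helper `summarize_predictors`, its `corr_threshold` parameter, and the toy columns are hypothetical names chosen for this example; the 0.95 cutoff is an arbitrary assumption, not a recommended value.

```python
import numpy as np
import pandas as pd


def summarize_predictors(df, corr_threshold=0.95):
    """Return per-column missing/zero fractions and highly correlated pairs."""
    summary = pd.DataFrame({
        "missing_frac": df.isna().mean(),  # share of missing values
        "zero_frac": (df == 0).mean(),     # share of exact zeros (sparsity)
    })
    corr = df.corr().abs()
    cols = list(corr.columns)
    redundant = [
        (cols[i], cols[j], float(corr.iloc[i, j]))
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
        if corr.iloc[i, j] > corr_threshold
    ]
    return summary, redundant


# Toy data (hypothetical) with one redundant, one sparse, one gappy column.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df = pd.DataFrame({
    "x1": x1,
    "x1_copy": 2.0 * x1 + 0.01 * rng.normal(size=200),
    "mostly_zero": np.where(rng.random(200) < 0.9, 0.0, 1.0),
    "with_gaps": rng.normal(size=200),
})
df.loc[:19, "with_gaps"] = np.nan  # first 20 of 200 rows set to missing

summary, redundant = summarize_predictors(df)
print(summary)
print("samples vs. predictors:", df.shape)
print("highly correlated pairs:", redundant)
```

Pairwise correlation only catches linear redundancy between two columns; a predictor that is a combination of several others would need a different check.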

It is generally good practice to **visualize the data**: see `scatter_matrix` in `pandas.plotting`; `DataFrame.plot` also offers a scatter plot via `kind="scatter"` (equivalently, `DataFrame.plot.scatter`).
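
A minimal sketch of both plotting routes, using `pandas.plotting.scatter_matrix` and `DataFrame.plot.scatter` on synthetic data. The column names and the data-generating process are made up for illustration, and the `Agg` backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic data (hypothetical): x2 is correlated with x1, y depends on x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
df = pd.DataFrame({
    "x1": x1,
    "x2": 0.8 * x1 + 0.6 * rng.normal(size=300),
    "y": 2.0 * x1 + rng.normal(size=300),
})

# A single predictor against the response.
ax = df.plot.scatter(x="x1", y="y", alpha=0.5)

# All pairwise relationships at once, with histograms on the diagonal.
axes = scatter_matrix(df, alpha=0.5, figsize=(6, 6), diagonal="hist")
```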

**Before ML**
  - If the predictors are low-dimensional, a scatter plot probably suffices.
  - If there are many predictors, plots that show the relationships between pairs of predictors are needed.
  
**After ML**
  - Check both the in-sample and the out-of-sample model fit against the ground truth; again, a scatter plot works. Better yet, identify the cases where the model does a poor job.
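
As a rough sketch of the after-ML check, the snippet below tabulates predictions against ground truth for the training and held-out sets and pulls out the worst held-out cases. It assumes scikit-learn; the helper `fit_table`, the synthetic quadratic data, and the deliberately misspecified linear model are all illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (hypothetical): the truth is quadratic, the model is linear.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.3, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)


def fit_table(model, X, y):
    """Collect ground truth, prediction, and absolute error per sample."""
    pred = model.predict(X)
    return pd.DataFrame({
        "x": X[:, 0],
        "y_true": y,
        "y_pred": pred,
        "abs_err": np.abs(y - pred),
    })


in_sample = fit_table(model, X_train, y_train)
out_of_sample = fit_table(model, X_test, y_test)

# The held-out cases where the model does the poorest job.
worst = out_of_sample.nlargest(5, "abs_err")
print(worst)
```

Calling `out_of_sample.plot.scatter(x="y_true", y="y_pred")` on the resulting table gives the ground-truth-versus-fitted scatter plot; inspecting the `x` values in `worst` then hints at where in input space the model breaks down.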

For any given problem, if possible, **evaluate several models**.
 - Before building a fancy ML model, invest some time in building a linear (baseline) model.
     - Building a regularized linear model may help with understanding which predictors matter.
     - If the fancier model cannot beat the linear one, perhaps something is wrong with the fancy model (e.g. its hyperparameters), or you simply don't have enough data to beat the simpler model.
     - Always prefer the simpler model if performance is comparable, since it is usually cheaper to train and easier to deploy.
     - Comparing the fancy model to the baseline can sometimes shed light on which feature or model component is key to the fancy model's success. This is called ablative analysis in [Andrew Ng's slides](http://cs229.stanford.edu/materials/ML-advice.pdf).
 - Often it is also helpful to get an upper bound on achievable performance, e.g. the error rate of the Bayes classifier in classification.
     - This gives an idea of the best performance you can hope for.
     - For a given hypothesis space (a class of models), try to overfit as much as possible with no regularization, possibly on a random subsample of the training set. If the model cannot overfit even then, you probably need to think about a different model.
 - Model assessment for supervised machine learning problems is usually done with [**cross-validation**](cross_validation.ipynb), which gives an estimate of prediction error.
 - Traditional statistical learning, which emphasizes interpretability, often uses [**information criteria and a variety of other metrics**](evaluation_metrics_and_information_criterions.ipynb), more often than not calculated by resampling.
 - Visualization can be useful in gauging model performance (see the point above about the scatter plot of ground truth against fitted values).
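
The comparison workflow above might look like the following sketch, assuming scikit-learn. The choice of `RidgeCV` as the regularized linear baseline, a random forest as the "fancy" model, a decision tree for the overfitting check, and the synthetic linear data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (hypothetical): the true relationship is linear.
X, y = make_regression(
    n_samples=300, n_features=10, n_informative=5, noise=10.0, random_state=0
)

models = {
    "ridge (baseline)": RidgeCV(alphas=np.logspace(-3, 3, 13)),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Estimate out-of-sample performance of each model with 5-fold CV.
cv_r2 = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    cv_r2[name] = scores.mean()
    print(f"{name}: mean R^2 = {scores.mean():.3f} (std {scores.std():.3f})")

# Overfitting sanity check: an unconstrained tree fitted to a small random
# subsample should reproduce that subsample almost perfectly.
rng = np.random.default_rng(0)
idx = rng.choice(len(X), size=50, replace=False)
tree = DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx])
train_r2 = tree.score(X[idx], y[idx])
print(f"unconstrained tree, training R^2 on subsample = {train_r2:.3f}")
```

Because the synthetic data are linear by construction, the baseline is expected to win here, which is exactly the situation where the simpler model should be preferred. On real data the ordering can go either way, and the gap between the two scores is what guides further investigation.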

Avoid **premature statistical optimization**
- Very often, it is not clear what parts of a system are easy or difficult to build, and which parts you need to spend lots of time focusing on.
- The only way to find out what needs work is to implement something quickly, and find out what parts break.

(The same principle applies in software engineering.)

## Reference

- Applied Predictive Modeling, Chapters 1-2
- MLEDU: Lectures 1, 12, 14
- Hands-On Machine Learning, Chapter 2
- Andrew Ng's slides: [Advice for Applying Machine Learning](http://cs229.stanford.edu/materials/ML-advice.pdf)