For any given problem, **evaluate several models** where possible.
- Before building a fancy ML model, invest some time in building a **baseline model**.
     - For instance, a regularized linear model can help you understand which predictors matter.
     - If the fancier model cannot beat the linear one, perhaps something is wrong with the fancy model (e.g. poorly tuned hyperparameters), or you simply don't have enough data to beat the simpler model.
     - Always prefer the simpler model if performance is comparable, since it is usually cheaper to train and easier to deploy.
     - Comparing the fancy model to the baseline can shed light on which feature or model property is key to the fancy model's success. This is called **ablative analysis** in [Andrew Ng's slides](http://cs229.stanford.edu/materials/ML-advice.pdf).
     
- Model assessment for supervised machine learning problems is usually done with [**cross-validation**](cross-validation-and-backtesting.ipynb), which gives an estimate of prediction error.
- Traditional statistical learning, which emphasizes interpretability, often uses [**information criteria and all kinds of metrics**](evaluation-metrics-and-information-criterions.ipynb), more often than not calculated by resampling.
- Visualization can be useful in gauging model performance (see the point above about the scatterplot of ground-truth against fitted values).
- It is advisable to take deployment constraints into account when picking models.
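
As a rough illustration of the baseline-versus-candidate comparison described above, the sketch below scores two models on the same cross-validation folds. Everything in it is an assumption made for illustration: the synthetic data, the closed-form ridge regression standing in for the regularized linear baseline, the k-nearest-neighbour regressor standing in for the "fancy" model, and the helper names `kfold_indices`, `fit_predict_ridge`, `fit_predict_knn` and `cv_rmse`.

```python
import numpy as np

def kfold_indices(n_samples, n_folds, rng):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for i in range(n_folds):
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, folds[i]

def fit_predict_ridge(X_train, y_train, X_test, alpha=1.0):
    """Baseline: ridge regression in closed form, intercept handled by centering."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc, yc = X_train - x_mean, y_train - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(Xc.shape[1]), Xc.T @ yc)
    return (X_test - x_mean) @ coef + y_mean

def fit_predict_knn(X_train, y_train, X_test, k=5):
    """Candidate: k-nearest-neighbour regression (mean target of the k closest points)."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def cv_rmse(fit_predict, X, y, n_folds=5, seed=0):
    """Mean RMSE over the folds; a fixed seed gives every model the same folds."""
    rng = np.random.default_rng(seed)
    errors = []
    for train_idx, test_idx in kfold_indices(len(y), n_folds, rng):
        pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        errors.append(np.sqrt(np.mean((pred - y[test_idx]) ** 2)))
    return float(np.mean(errors))

# Synthetic data with a truly linear signal (an assumption for this sketch).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
true_coef = np.array([1.5, -2.0, 0.0, 0.5, 0.0])
y = X @ true_coef + rng.normal(scale=0.1, size=200)

ridge_rmse = cv_rmse(fit_predict_ridge, X, y)
knn_rmse = cv_rmse(fit_predict_knn, X, y)
print(f"baseline ridge CV RMSE: {ridge_rmse:.3f}")
print(f"candidate k-NN CV RMSE: {knn_rmse:.3f}")
```

Because the signal here is linear by construction, the baseline should come out ahead, which is the situation where the simpler model is the one to keep. On real data the ranking can go either way; the point is only that both models are scored on identical folds.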
 

Get an **upper bound on achievable performance**, e.g. the error rate of the Bayes classifier in classification, human-level performance, the state of the art, or the old system.
  - This gives an idea of the best performance you can expect, which is useful for scoping.
  - For a given hypothesis space, or class of models, one can **try to overfit as much as possible without any regularization, possibly on a randomized sub-sample or even a single example from the training set**. If overfitting cannot be achieved, you probably need to think about a different model.
  - When comparing to human-level performance, **make sure the human has the same data as the machine learning model**. For instance, the comparison is not 'a human can detect the defect when looking at the phone'; rather, it is 'a human can detect the defect by looking at the same pictures inspected by the machine learning algorithm'.
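
The overfitting check above can be sketched as follows. This is a minimal example under stated assumptions: the target `sin(3x)`, the six-point random sub-sample, and the polynomial model classes are all hypothetical choices, as is the helper `train_mse_unregularized`. It fits each model class by plain least squares with no penalty and reports the error on the very points it was trained on.

```python
import numpy as np

def train_mse_unregularized(x, y, degree):
    """Fit a polynomial of the given degree by plain least squares (no penalty)
    and return the mean squared error on the same points used for fitting."""
    features = np.vander(x, degree + 1)  # columns: x^degree, ..., x^1, x^0
    coef, *_ = np.linalg.lstsq(features, y, rcond=None)
    return float(np.mean((features @ coef - y) ** 2))

# Hypothetical full training set: a smooth nonlinear target.
x_full = np.linspace(-1.0, 1.0, 200)
y_full = np.sin(3.0 * x_full)

# Randomized sub-sample of six training points.
rng = np.random.default_rng(0)
subset = rng.choice(len(x_full), size=6, replace=False)
x_small, y_small = x_full[subset], y_full[subset]

linear_mse = train_mse_unregularized(x_small, y_small, degree=1)
poly_mse = train_mse_unregularized(x_small, y_small, degree=5)
print(f"training MSE, linear model:     {linear_mse:.2e}")
print(f"training MSE, degree-5 polynomial: {poly_mse:.2e}")
```

A degree-5 polynomial has six coefficients, so it can pass through six distinct points and its training error should be close to zero. The straight line generally cannot, which is the kind of signal that suggests the model class lacks the capacity for the problem.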
 

Avoid **premature statistical optimization**.
- The same principle applies in software engineering.
- Very often, it is not clear what parts of a system are easy or difficult to build, and which parts you need to spend lots of time focusing on.
- The only way to find out **what needs work is to implement something quickly and see which parts break**; this echoes the best practice of collecting just enough data to get started quickly.