# Model Evaluation

We've discussed two models: Linear Regression and Logistic Regression. Both have a long history in statistics, and when talking about them we mostly emphasized their *explanatory* use: interpreting the *importance* of coefficients and trying to determine whether their values were *credible* based on the evidence (data). We touched on the *predictive* function of those models when we looked at linear regression and discussed the $\sigma$ and $R^2$ of the regressions, but the point of view was mostly the statistician's.

In this chapter, we're going to focus more broadly on model evaluation and improvement. This encompasses two general and important questions in model development: how good is my model, and can I make it better? Note that when we talked about the *explanatory* function of linear regression, we never asked ourselves, "Could getting more data make the model better?" Now we're going to start to address these kinds of questions.

We'll start with model evaluation from the machine learning point of view, which focuses on cross-validation. Cross-validation is a form of *backtesting* that gives you a general idea of how your model will perform on new data. It treats prediction as the main goal of a model (although prediction need not be the only goal).
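To make the idea concrete before the dedicated section, here is a minimal sketch of $k$-fold cross-validation, one common variant. It is only an illustration under stated assumptions: the data are synthetic, the model is a single-feature least-squares line, and the helper names (`k_fold_indices`, `fit_line`, `cross_validate`) are hypothetical, not part of any library.

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x for a single feature."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def cross_validate(xs, ys, k=5, seed=0):
    """Return the held-out mean squared error for each of the k folds."""
    scores = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        # Fit only on the training indices, then score on the held-out fold.
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in fold]
        scores.append(statistics.fmean(errors))
    return scores

# Synthetic data, made up for illustration: y = 2 + 3x + Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

fold_mse = cross_validate(xs, ys, k=5)
print("MSE per fold:", [round(m, 3) for m in fold_mse])
print("mean MSE:", round(statistics.fmean(fold_mse), 3))
```

Each observation is held out exactly once, so the average over the folds estimates how the model would do on data it has not seen. The spread across folds gives a rough sense of how much that estimate varies.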

When we're done with cross-validation, we may have a model with an estimated error rate of, say, 9%. We can then ask ourselves: can we make it better? The next section will focus on this question. If we have a model with a certain performance, what should we focus on? The answer is not always "get more data".

Finally, we'll start to move into a discussion of live testing or A/B testing and the problems that arise with that process both in terms of experimental design and engineering infrastructure.

Obviously, some of these topics bleed into each other, and the process is very iterative. For example, in linear regression we talked about adding key variables, evaluating the model, adding interaction terms, evaluating the model, and so on. Cross-validation can certainly fit into this process, as it is closely related to bootstrapping: both are resampling methods.

* [Two Cultures](cultures.ipynb)
* [Loss and Evaluation Metrics](metrics.ipynb)
* [Cross Validation](cross.ipynb)
* [Bias/Variance Tradeoff](bias.ipynb)
* [Model Improvement](improvement.ipynb)
* [A/B Testing](testing.ipynb)
* [Conclusion](conclusion.ipynb)