<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#A-Machine-Learning-Perspective-on-Regression" data-toc-modified-id="A-Machine-Learning-Perspective-on-Regression-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>A Machine-Learning Perspective on Regression</a></span></li><li><span><a href="#Parameter-Optimization" data-toc-modified-id="Parameter-Optimization-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Parameter Optimization</a></span><ul class="toc-item"><li><span><a href="#Assessing-Performance" data-toc-modified-id="Assessing-Performance-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Assessing Performance</a></span></li></ul></li><li><span><a href="#Complexity-Optimization-and-Cross-Validation" data-toc-modified-id="Complexity-Optimization-and-Cross-Validation-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Complexity Optimization and Cross-Validation</a></span><ul class="toc-item"><li><span><a href="#Model-Testing-and-Validation" data-toc-modified-id="Model-Testing-and-Validation-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Model Testing and Validation</a></span></li></ul></li><li><span><a href="#Feature-Selection" data-toc-modified-id="Feature-Selection-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Feature Selection</a></span></li></ul></div>

## A Machine-Learning Perspective on Regression

The goal of regression is to find a function

$\vec{y} = f(\vec{x}) + \vec{\epsilon}$

where $f$ is the model, $\vec{x}$ is the model input, $\vec{y}$ is the model output, and $\vec{\epsilon}$ is the error between the model and the data. The model inputs, $\vec{x}$, are often called the **features** of a data point. In the previous example we created features using transformations of $x$ like polynomials and Gaussian functions. Sometimes features are given directly in the dataset (e.g. multiple inputs correspond to a single output). Other times, the model input may be data that does not have obvious vector-based features (e.g. images, audio, molecules, etc.). In this case, we can think of the features as "fingerprints" of some more complex raw input data.
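As a rough illustration of building features from transformations of $x$, the sketch below constructs polynomial and Gaussian feature matrices with NumPy. The function names, the Gaussian centers, and the shared width are assumptions chosen for this example, not the exact setup from the previous example.

```python
import numpy as np

def polynomial_features(x, order):
    """Return a matrix whose columns are x**0, x**1, ..., x**order."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, order + 1, increasing=True)

def gaussian_features(x, centers, width):
    """Return a matrix whose column j is a Gaussian centered at centers[j].

    Assumes a single shared width for all Gaussians (an illustrative choice).
    """
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))

# Five one-dimensional inputs turned into two different feature matrices
x = np.linspace(0, 1, 5)
X_poly = polynomial_features(x, order=2)                          # shape (5, 3)
X_gauss = gaussian_features(x, centers=[0.25, 0.75], width=0.1)   # shape (5, 2)
```

Each row of `X_poly` or `X_gauss` is the feature vector for one data point, so either matrix can play the role of $X_{ij}$ in a linear model.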

Of course, representing the model as $f$ is a gross oversimplification. The function must have some form, and it usually requires **parameters**. Previously we considered general linear regression models of the form:

$y_i = \sum_j \beta_j X_{ij} + \epsilon_i$

where the **parameters** are given by $\vec{\beta}$. We also considered non-linear regression with Gaussian functions, which required more parameters: $\vec{\beta}$, $\vec{\mu}$, and $\vec{\sigma}$. We saw that in order to optimize these parameters we had to put them into a single vector. We can treat this as a parameter vector, $\vec{W} = [\vec{\beta}, \vec{\mu}, \vec{\sigma}]$, and re-write the model more generally as:

$\vec{y} = f(\vec{x}, \vec{W}) + \vec{\epsilon}$
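A minimal sketch of this idea: the weights, centers, and widths of a sum of Gaussians are concatenated into one flat vector `W`, which the model unpacks internally. The ordering inside `W`, the helper names `unpack` and `gaussian_model`, and the specific Gaussian form are assumptions made for illustration.

```python
import numpy as np

def unpack(W, n_gauss):
    """Split a flat parameter vector into (beta, mu, sigma).

    Assumes W is ordered as [beta..., mu..., sigma...], each of length n_gauss.
    """
    W = np.asarray(W, dtype=float)
    beta = W[:n_gauss]
    mu = W[n_gauss:2 * n_gauss]
    sigma = W[2 * n_gauss:3 * n_gauss]
    return beta, mu, sigma

def gaussian_model(x, W, n_gauss):
    """Evaluate f(x, W) = sum_j beta_j * exp(-(x - mu_j)**2 / (2 * sigma_j**2))."""
    beta, mu, sigma = unpack(W, n_gauss)
    x = np.asarray(x, dtype=float)
    G = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * sigma[None, :] ** 2))
    return G @ beta

# Two Gaussians: weights (1, 2), centers (0, 1), widths (0.5, 0.5)
W = np.concatenate([[1.0, 2.0], [0.0, 1.0], [0.5, 0.5]])
y_hat = gaussian_model(np.array([0.0, 1.0]), W, n_gauss=2)
```

Keeping everything in one flat vector is convenient because general-purpose optimizers typically expect a single array of parameters to vary.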

We also had to decide how many parameters to include. In the case of polynomial regression this corresponded to the order of the highest polynomial, while for Gaussian regression it corresponded to the number of Gaussian functions. The number of parameters to include is a **hyperparameter**. Hyperparameters control the complexity of the final model, and the parameters depend on the hyperparameters, so we can think of the parameters as a function of the hyperparameters, $\vec{W}(\vec{\eta})$. Putting this together, the model has:

* **parameters**, $\vec{W}$, that define its behavior (e.g. slope, intercept, mean, etc.)
* **hyperparameters**, $\vec{\eta}$, that define the structure of the model (e.g. the number of polynomial terms)

and can be written as:

$\vec{y} = f(\vec{x}, \vec{W}(\vec{\eta})) + \vec{\epsilon}$
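To make the dependence $\vec{W}(\vec{\eta})$ concrete, the sketch below treats the polynomial order as the hyperparameter and fits the parameters by ordinary least squares for several orders. The synthetic data, the noise level, and the helper name `fit_parameters` are assumptions for illustration only.

```python
import numpy as np

def fit_parameters(x, y, order):
    """Return least-squares polynomial coefficients W for a given order.

    The order is the hyperparameter; W has order + 1 entries.
    """
    X = np.vander(x, order + 1, increasing=True)
    W, *_ = np.linalg.lstsq(X, y, rcond=None)
    return W

# Illustrative synthetic data: a quadratic plus a small amount of noise
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + 0.01 * rng.normal(size=x.size)

# Each choice of hyperparameter yields a different parameter vector
W_by_order = {order: fit_parameters(x, y, order) for order in (1, 2, 5)}
for order, W in W_by_order.items():
    print(order, W.size)
```

Changing the order changes both the length of `W` and its fitted values, which is the sense in which the parameters are a function of the hyperparameters.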

Machine learning differs from regular regression in that it seeks to optimize both $\vec{W}$ (parameter optimization) and $\vec{\eta}$ (complexity optimization) in order to **obtain a model that generalizes to new input data**. Machine learning also sometimes involves selecting $\vec{x}$ (feature selection) or generating $\vec{x}$ from non-vectorized data such as text or images (feature generation).

## Parameter Optimization

### Assessing Performance

## Complexity Optimization and Cross-Validation

### Model Testing and Validation

## Feature Selection