# Lecture 19 LIVE

## Topics for Today
- Definition of Hyperparameters
- Understanding Code for Plotting Synthetic Data
- Taylor Series

## Hyperparameters Recap

A **hyperparameter** is a parameter that specifies one member of a class of machine learning models. It is set before training, and the training (fitting) step of the pipeline does not change it.

An example of a hyperparameter is the degree of the polynomial used in polynomial regression.
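The distinction can be sketched with plain NumPy. This is a minimal illustration, not the lecture's pipeline: the toy data and variable names are made up here, and `np.polyfit` stands in for the scikit-learn model used later. The degree is chosen by us before fitting, while the coefficients are what the fit produces.

```python
import numpy as np

# Toy data, invented for this illustration only
rng = np.random.RandomState(0)
x = rng.rand(30)
y = 1.0 + 2.0 * x + 0.1 * rng.randn(30)

n_coeffs = {}
for degree in (1, 3):
    # degree is the hyperparameter: we pick it, the fit never changes it
    coeffs = np.polyfit(x, y, deg=degree)
    # coeffs are the ordinary parameters: the fit computes them from the data
    n_coeffs[degree] = len(coeffs)

print(n_coeffs)
```

Each choice of degree gives a different model with `degree + 1` fitted coefficients.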

### Conceptual Question

**Q1: Why don't we let the computer automatically optimize the degree of the polynomial when trying to find a model that best fits the data?**

*Type your answer here.*

### Coding Question

Let's try to understand this code from lecture:

```python
import numpy as np

def make_data(N, err=1.0, rseed=1):
    # randomly sample the data
    rng = np.random.RandomState(rseed)
    X = rng.rand(N, 1) ** 2
    y = 10 - 1. / (X.ravel() + 0.1)
    if err > 0:
        y += err * rng.randn(N)
    return X, y
```
**Q2: What does the code** `rng.rand(N,1)**2` **do?**

**Q3: What does the code `X.ravel()` do? Why does it make sense to define `X` as it is?**

*Type your answer here.*
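A quick way to explore Q2 and Q3 is to print the array shapes at each step. The sketch below uses a small `N = 5` and the names `U` and `flat`, which are assumptions made for illustration and do not appear in the lecture code.

```python
import numpy as np

rng = np.random.RandomState(1)
U = rng.rand(5, 1)   # 5 uniform samples in [0, 1), stored as a column: shape (5, 1)
X = U ** 2           # element-wise square; values stay in [0, 1), shape (5, 1)
flat = X.ravel()     # flattened to one dimension: shape (5,)

print(U.shape, X.shape, flat.shape)
```

The two-dimensional `X` matches the `(n_samples, n_features)` layout that scikit-learn estimators expect, while `y` is built from the one-dimensional version.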

### Math/LaTeX Question

**Q4, Part A: In the cell below, write the equation that describes `y` above when `err=0`.**

Here is some $\LaTeX$ code you can use as a template:

```latex
$$ y = 5 + \frac{4x^2}{x - 4}$$
```
which renders to $$ y = 5 + \frac{4x^2}{x - 4}$$ in a Markdown cell.

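$$y = 10 - \frac{1}{x + 0.1}$$

Here $x$ is an entry of `X`. The code already defines `X` as the square of a uniform sample $u$, so equivalently $y = 10 - \frac{1}{u^2 + 0.1}$.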

**Q4, Part B: Does this function have a finite Taylor series representation?**

*Type your answer here.*
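As a hint for Part B, the fractional term can be expanded as a geometric series. Here $t$ stands for whatever is added to $0.1$ in the denominator; this notation is an assumption made for the hint, not something fixed by the lecture.

$$\frac{1}{t + 0.1} = \frac{10}{1 + 10t} = 10\sum_{n=0}^{\infty} (-10t)^n, \qquad |t| < 0.1$$

$$10 - \frac{1}{t + 0.1} = 100t - 1000t^2 + 10^4 t^3 - \cdots$$

None of the coefficients is zero, and the expansion about $t = 0$ converges only for $|t| < 0.1$.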

### Coding Exploration

**Q5, Part A: Modify the `make_data` function to generate data with x values that range from 0 to 5.**

**Q5, Part B: Using your new `make_data` function, generate 50 samples and plot polynomial regressions with degrees 4, 5, 6, and 7.**


```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# **kwargs collects any extra keyword arguments (as *args does for positional ones)
# and forwards them to LinearRegression
def PolynomialRegression(degree=2, **kwargs):
    return make_pipeline(PolynomialFeatures(degree),
                         LinearRegression(**kwargs))

## Creating Data

import numpy as np

def make_data(N, err=1.0, rseed=1):
    # randomly sample the data
    rng = np.random.RandomState(rseed)
    X = rng.rand(N, 1) ** 2
    y = 10 - 1. / (X.ravel() + 0.1)
    if err > 0:
        y += err * rng.randn(N)
    return X, y

X, y = make_data(40)

## Plotting Data and the Fits

import matplotlib.pyplot as plt
import seaborn; seaborn.set()  # plot formatting

X_test = np.linspace(-0.1, 1.1, 500)[:, None]

plt.scatter(X.ravel(), y, color='black')
axis = plt.axis()
for degree in [1, 3, 5]:
    y_test = PolynomialRegression(degree).fit(X, y).predict(X_test)
    plt.plot(X_test.ravel(), y_test, label='degree={0}'.format(degree))
plt.xlim(-0.1, 1.0)
plt.ylim(-2, 12)
plt.legend(loc='best');
```
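For Q5, Part A, one possible modification is sketched below. The function name `make_data_scaled` and the `x_max` parameter are hypothetical additions, not part of the lecture code. The sketch keeps the squared uniform sampling and simply stretches it so that `X` lies in `[0, x_max)`; other choices, such as sampling uniformly on that interval, are equally reasonable.

```python
import numpy as np

def make_data_scaled(N, err=1.0, rseed=1, x_max=5.0):
    # x_max is a hypothetical parameter added for this sketch
    rng = np.random.RandomState(rseed)
    X = x_max * rng.rand(N, 1) ** 2    # squared uniform samples stretched to [0, x_max)
    y = 10 - 1. / (X.ravel() + 0.1)    # same noiseless curve as the lecture code
    if err > 0:
        y += err * rng.randn(N)        # add Gaussian noise with standard deviation err
    return X, y

X5, y5 = make_data_scaled(50)
print(X5.min(), X5.max())
```

With data on this wider range, the plotting grid and axis limits in the cell above would also need to be widened, for example `np.linspace(-0.1, 5.1, 500)[:, None]` for `X_test`.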

## Model Validation

**Q6: Run the following code with your new data where x ranges from 0 to 5.**

**What is the best choice of polynomial degree?**

```python
from sklearn.model_selection import validation_curve
degree = np.arange(0, 21)
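# param_name and param_range must be passed by keyword in recent scikit-learn versions
train_score, val_score = validation_curve(PolynomialRegression(), X, y,
                                          param_name='polynomialfeatures__degree',
                                          param_range=degree, cv=7)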

plt.plot(degree, np.median(train_score, 1), color='blue', label='training score')
plt.plot(degree, np.median(val_score, 1), color='red', label='validation score')
plt.legend(loc='best')
plt.ylim(0, 1)
plt.xlabel('degree')
plt.ylabel('score');
```
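Besides reading the answer off the plot, the degree with the highest median validation score can be picked programmatically. The helper `best_degree` below is a hypothetical name, and the small score table is made up purely to show the mechanics; it is not output from the lecture's data.

```python
import numpy as np

def best_degree(degrees, val_scores):
    """Return the degree whose median validation score across folds is highest."""
    median_scores = np.median(val_scores, axis=1)   # one median per degree (row)
    return int(degrees[np.argmax(median_scores)])

# Made-up scores for illustration only: rows are degrees 0..3, columns are 3 CV folds
toy_degrees = np.arange(4)
toy_scores = np.array([
    [0.10, 0.05, 0.12],
    [0.60, 0.55, 0.58],
    [0.85, 0.80, 0.90],
    [0.70, 0.20, 0.75],
])

chosen = best_degree(toy_degrees, toy_scores)
print(chosen)
```

In the notebook, the same helper could be called as `best_degree(degree, val_score)` after the validation-curve cell has run. The maximum is worth checking against the plot, since neighboring degrees often score almost identically.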