## Generalised linear models

In this notebook, we're going to examine generalised linear models, specifically linear regression on polynomial features of the input, using a toy dataset. We'll look at examples of overfitting and underfitting, and at how **regularisation** can combat overfitting.
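
The model and a regularised least-squares objective can be written as below. The symbols are notation chosen for this sketch rather than taken from the notebook: $M$ is the polynomial degree, $\mathbf{w}$ the coefficients, $\lambda$ the regularisation strength and $\Omega$ the penalty. The key point is that $f$ is nonlinear in $x$ but linear in $\mathbf{w}$. Ordinary linear regression can therefore fit it once each $x$ has been expanded into the features $1, x, x^2, \dots, x^M$.

$$
f(x; \mathbf{w}) = \sum_{j=0}^{M} w_j x^j = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M
$$

$$
E(\mathbf{w}) = \sum_{n=1}^{N} \big( y_n - f(x_n; \mathbf{w}) \big)^2 + \lambda \, \Omega(\mathbf{w}),
\qquad
\Omega_{\text{ridge}}(\mathbf{w}) = \sum_{j=1}^{M} w_j^2,
\qquad
\Omega_{\text{lasso}}(\mathbf{w}) = \sum_{j=1}^{M} |w_j|
$$

Setting $\lambda = 0$ gives ordinary least squares, and a larger $\lambda$ pulls the coefficients towards zero. The intercept $w_0$ is usually left out of the penalty. In sklearn, `Ridge` minimises exactly this objective with $\lambda$ = `alpha`. `Lasso` and `ElasticNet` (which mixes both penalties) instead divide the squared-error term by $2N$, so their `alpha` values sit on a different scale.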

```python
import numpy as np
import matplotlib.pyplot as plt
```

## Generate toy data
We draw 30 inputs uniformly from [0, 1), evaluate a 1D cosine function at them and add Gaussian noise with standard deviation 0.1.

What we'd like is for our regression model to learn the generating function from these 30 samples, without overfitting to the noise.

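```python
def generating_function(x):
    """The true (noise-free) function we want the model to recover."""
    return np.cos(1.5 * np.pi * x)

num_samples = 30

# Inputs drawn uniformly from [0, 1) and sorted so line plots run left to right
X = np.sort(np.random.rand(num_samples))
# Targets: generating function plus Gaussian noise with standard deviation 0.1
y = generating_function(X) + np.random.randn(num_samples) * 0.1
```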

**Exercise: Print a sample of X and y so you can see what the inputs and corresponding outputs look like.**

**Exercise: Plot a graph of the generating function with X as the domain. Plot the (X, y) samples on the same graph.**
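
Below is one possible approach to the two exercises above. To keep the block self-contained, it regenerates the toy data with a fixed seed (`np.random.seed(0)`), which is an assumption that makes runs reproducible and is not part of the original notebook. Because `X` is sorted, a simple line plot over it traces the generating function from left to right.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # assumed fixed seed, for reproducibility only

def generating_function(x):
    return np.cos(1.5 * np.pi * x)

num_samples = 30
X = np.sort(np.random.rand(num_samples))
y = generating_function(X) + np.random.randn(num_samples) * 0.1

# Exercise 1: inspect the first few (input, output) pairs side by side
print(np.column_stack((X, y))[:5])

# Exercise 2: the noise-free function over the sample inputs, with the noisy samples on top
fig, ax = plt.subplots()
ax.plot(X, generating_function(X), label="Generating function")
ax.scatter(X, y, label="Samples")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend(loc="best")
plt.show()
```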

## Building linear models

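**Exercise: Fit a linear regression model using a polynomial of degree 2. Hint: Check out the sklearn documentation on `PolynomialFeatures` and `LinearRegression`. Chaining them with `make_pipeline` lets `model.predict` take raw inputs, which is what the plotting code below expects.**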
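
One possible solution is sketched below, assuming a `make_pipeline` of `PolynomialFeatures` and `LinearRegression`. The block regenerates the data with an assumed seed so that it runs on its own. `PolynomialFeatures(degree=2)` turns each input $x$ into the row $[1, x, x^2]$, and `LinearRegression` then fits one weight per column. Wrapping both steps in a pipeline means the fitted `model` accepts raw `x` values directly.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

np.random.seed(0)  # assumed fixed seed, for reproducibility only

def generating_function(x):
    return np.cos(1.5 * np.pi * x)

num_samples = 30
X = np.sort(np.random.rand(num_samples))
y = generating_function(X) + np.random.randn(num_samples) * 0.1

degree = 2
# Expand x -> [1, x, x^2], then fit an ordinary least-squares model on those features
model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
# sklearn expects inputs of shape (n_samples, n_features), hence the new axis
model.fit(X[:, np.newaxis], y)
print("R^2 on the training samples:", model.score(X[:, np.newaxis], y))
```

The resulting `model` can be passed straight to the plotting code that follows.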

**Exercise: Using the model you've built, run the code below to plot its predictions alongside the true generating function over an artificial test set of 100 evenly spaced points in [0, 1].**

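```python
import numpy as np
import matplotlib.pyplot as plt

# 100 evenly spaced test inputs covering the same range as the training data
X_test = np.linspace(0, 1, 100)
# sklearn expects a 2D array of shape (n_samples, n_features)
plt.plot(X_test, model.predict(X_test[:, np.newaxis]), label="Model")
plt.plot(X_test, generating_function(X_test), label="Generating function")
plt.scatter(X, y, label="Samples")
plt.xlim((0, 1))
plt.ylim((-2, 2))
plt.legend(loc="best")
plt.show()
```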

**Exercise: Print the coefficients from your model. What do these coefficients mean?**

**Exercise: Run the above code again but now with the degree set to 4, and then 15. What do you notice about the graph and coefficients?**
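
To compare the degrees numerically as well as visually, the sketch below fits degrees 2, 4 and 15 to the same data. It records the training error, the error against the noise-free function on the test grid, and the largest coefficient magnitude. The fixed seed and the use of `mean_squared_error` as the comparison metric are assumptions of this sketch. You will typically see training error fall as the degree rises, while the degree-15 coefficients become very large, a common symptom of overfitting.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

np.random.seed(0)  # assumed fixed seed, for reproducibility only

def generating_function(x):
    return np.cos(1.5 * np.pi * x)

num_samples = 30
X = np.sort(np.random.rand(num_samples))
y = generating_function(X) + np.random.randn(num_samples) * 0.1
X_test = np.linspace(0, 1, 100)

train_mse, test_mse, max_coef = {}, {}, {}
for degree in (2, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X[:, np.newaxis], y)
    # Error on the noisy samples the model was trained on
    train_mse[degree] = mean_squared_error(y, model.predict(X[:, np.newaxis]))
    # Error against the noise-free generating function on the evenly spaced test grid
    test_mse[degree] = mean_squared_error(
        generating_function(X_test), model.predict(X_test[:, np.newaxis])
    )
    # Largest coefficient magnitude of the final (regression) step
    max_coef[degree] = np.abs(model[-1].coef_).max()
    print(f"degree {degree:>2}: train MSE = {train_mse[degree]:.4f}, "
          f"test MSE = {test_mse[degree]:.4f}, max |coef| = {max_coef[degree]:.3g}")
```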

**Exercise: Let's see if we can fix the problem with the degree-15 model without decreasing the degree itself. Build `Lasso`, `Ridge` and `ElasticNet` models with the degree still set to 15. Adjust their regularisation parameters (`alpha`, plus `l1_ratio` for `ElasticNet`) until at least one of them produces a graph that closely follows the generating function.**
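
A starting point for this exercise is sketched below. The `alpha` and `l1_ratio` values are guesses to tune from, not recommended settings. The features are left unscaled, which is why the L1-based models are given a large `max_iter`, and they may still emit convergence warnings. An unregularised degree-15 model is included as a baseline. Each model is scored against the noise-free generating function on the test grid.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.metrics import mean_squared_error

np.random.seed(0)  # assumed fixed seed, for reproducibility only

def generating_function(x):
    return np.cos(1.5 * np.pi * x)

num_samples = 30
X = np.sort(np.random.rand(num_samples))
y = generating_function(X) + np.random.randn(num_samples) * 0.1
X_test = np.linspace(0, 1, 100)

degree = 15
# Assumed starting values: tune alpha (and l1_ratio) while watching the plot
regressors = {
    "LinearRegression": LinearRegression(),  # unregularised baseline
    "Ridge": Ridge(alpha=1e-3),
    "Lasso": Lasso(alpha=1e-3, max_iter=100_000),
    "ElasticNet": ElasticNet(alpha=1e-3, l1_ratio=0.5, max_iter=100_000),
}

models, test_mse = {}, {}
for name, regressor in regressors.items():
    model = make_pipeline(PolynomialFeatures(degree=degree), regressor)
    model.fit(X[:, np.newaxis], y)
    models[name] = model
    # Error against the noise-free generating function on the test grid
    test_mse[name] = mean_squared_error(
        generating_function(X_test), model.predict(X_test[:, np.newaxis])
    )
    coefs = model[-1].coef_
    print(f"{name:>16}: test MSE = {test_mse[name]:.4f}, "
          f"max |coef| = {np.abs(coefs).max():.3g}")
```

To see a model's curve, set for example `model = models["Ridge"]` and rerun the plotting code above.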

**Exercise: Imagine that you've built the perfect model and now you want to hand it over to an engineer so that they can use it to make predictions. Is it possible to save this model so that it can be reused later without needing sklearn?**
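
One possible answer: `pickle` or `joblib` can store the whole fitted pipeline, but loading it again requires sklearn, ideally the same version. A polynomial model, however, is fully described by its intercept and one weight per power of $x$. Those numbers can be exported as plain JSON and evaluated without any library. The sketch below assumes a single input feature and the `PolynomialFeatures` (with its default `include_bias=True`) plus regressor pipeline used above, so that `coef_[k]` multiplies `x**k`. The JSON layout and the `predict` helper are hypothetical choices for illustration.

```python
import json
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

np.random.seed(0)  # assumed fixed seed, for reproducibility only

def generating_function(x):
    return np.cos(1.5 * np.pi * x)

num_samples = 30
X = np.sort(np.random.rand(num_samples))
y = generating_function(X) + np.random.randn(num_samples) * 0.1
X_test = np.linspace(0, 1, 100)

model = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=1e-3))
model.fit(X[:, np.newaxis], y)

# Export: the intercept plus coefficients ordered by power of x (x^0, x^1, ..., x^15)
exported = json.dumps({
    "intercept": float(model[-1].intercept_),
    "coefficients": model[-1].coef_.tolist(),
})

# Engineer side (hypothetical helper): plain Python only, no sklearn or numpy
def predict(params_json, x):
    params = json.loads(params_json)
    # Horner's method evaluates sum(c_k * x**k) starting from the highest power
    result = 0.0
    for c in reversed(params["coefficients"]):
        result = result * x + c
    return params["intercept"] + result
```

The engineer only needs the JSON string and `predict`. For example, `predict(exported, 0.5)` should match `model.predict([[0.5]])[0]` up to floating-point rounding.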