# Lab 03 - Model Fitting

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn import datasets
%matplotlib inline
sns.set_style("darkgrid")

import sys
sys.path.append('../')
from lib.processing_functions import convert_to_pandas

## Exercise goals:

- get comfortable with the model fitting flow
- train several common machine learning algorithms 

---
## Exercise 1: Regression

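We will fit a regression estimator to the Boston housing dataset and have a look at the results. (Note: `datasets.load_boston` was removed in scikit-learn 1.2, so this exercise requires an older scikit-learn release.)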

In [None]:
# load the Boston dataset
X, y = convert_to_pandas(datasets.load_boston())
X.head()

### 1.1 Fit simple linear model

Fit a simple `LinearRegression` estimator to the data, with the hyperparameter `normalize` set to `True` (this parameter was removed in scikit-learn 1.2):

```python
# TODO: Replace <FILL IN> with appropriate code
# 1) import estimator class
<FILL IN> 

# 2) initialize estimator model with specific parameters
reg = <FILL IN>

# 3) fit model on the data
reg.<FILL IN>

# 4) predict output for input data
y_pred = reg.<FILL IN>

print(reg)
```
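Under the hood, `LinearRegression` performs ordinary least squares. As a rough illustration of what the fit and predict steps compute, here is a minimal NumPy sketch on synthetic data. The data and the helpers `fit_ols` and `predict_ols` are hypothetical and not part of scikit-learn, whose solver details differ.

```python
import numpy as np

# Hypothetical synthetic data: 100 samples, 3 features, known coefficients.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ true_coef + 3.0 + rng.normal(scale=0.1, size=100)

def fit_ols(X, y):
    """Ordinary least squares with an intercept (hypothetical helper)."""
    # Prepend a column of ones so the first weight acts as the intercept.
    X1 = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return w[0], w[1:]  # intercept, coefficients

def predict_ols(intercept, coef, X):
    """Linear prediction: intercept plus weighted sum of features."""
    return intercept + X @ coef

demo_intercept, demo_coef = fit_ols(X_demo, y_demo)
y_demo_pred = predict_ols(demo_intercept, demo_coef, X_demo)
print(demo_intercept, demo_coef)
```

The fitted values should land close to the intercept 3.0 and coefficients `[2.0, -1.0, 0.5]` used to generate the data.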

In [None]:
%load ../answers/03_01_fit_linear.py

### 1.2 Prediction result

Use the `truth_vs_prediction` function below to plot `y` versus `y_pred`:

In [None]:
def truth_vs_prediction(y, y_pred):
    """Plot truth versus predictions."""
    fig, ax = plt.subplots()
    fig.set_size_inches(8.0, 8.0)
    y_max = y.max()
    ax.plot(y, y_pred, 'o', markersize=3)
    ax.plot([0, y_max],[0, y_max],':k')
    ax.set_xlabel('truth')
    ax.set_ylabel('prediction')
    ax.set_xlim(0, y_max)
    ax.set_ylim(0, y_max)
    ax.set_aspect('equal')
    ax.set_title("truth versus prediction")

# plot truth versus prediction
truth_vs_prediction(y, y_pred)

**Question**: Does the truth versus prediction plot show systematic errors in the predictions? If so, what systematic errors do you see?

Apply the estimator's `.score()` method to the `X` and `y` data: 

```python
# TODO: Replace <FILL IN> with appropriate code
# compute estimator's default score for X and y data
score = reg.<FILL IN>
```

In [None]:
%load ../answers/03_02_score_linear.py

In [None]:
print('prediction score: {}'.format(score))

**Question**: Which scoring measure is used by the `score` method of this estimator (hint: `help(reg.score)`)?
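For reference when checking your answer: scikit-learn regressors report the coefficient of determination R² from `score`. The sketch below computes that quantity by hand with NumPy; the helper name `r2_score_np` is hypothetical (the library's own function is `sklearn.metrics.r2_score`).

```python
import numpy as np

def r2_score_np(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# Toy values (hypothetical): a perfect prediction scores 1.0,
# always predicting the mean scores 0.0.
y_toy = np.array([1.0, 2.0, 3.0, 4.0])
print(r2_score_np(y_toy, y_toy))                     # 1.0
print(r2_score_np(y_toy, np.full(4, y_toy.mean())))  # 0.0
```

R² can also become negative when predictions are worse than always predicting the mean.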

### 1.3 Estimated model parameters

Extract and visualize the estimated model coefficients:

```python
# TODO: Replace <FILL IN> with appropriate code
# extract estimated coefficients
coef = reg.<FILL IN>
```

In [None]:
%load ../answers/03_03_coefficient.py

In [None]:
# plot the estimated coefficients
feature_coef = pd.DataFrame(list(zip(X.columns, coef)), columns=['feature', 'coefficient'])
feature_coef.set_index('feature').plot(kind='bar', figsize=(10.0, 5.5))

**Question**: Why might the estimated NOX coefficient be so large in magnitude? Is it due to its awesome predictive power, or could something else be at play here? (Hint: look at the descriptive statistics we computed in Lab 1.)
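One thing to keep in mind: a coefficient's magnitude depends on the units of its feature. If a feature only takes small numeric values, its coefficient has to be large to move the prediction noticeably. The NumPy sketch below (synthetic data; the helper `slope` is hypothetical) fits the same relationship with the feature expressed in two different units.

```python
import numpy as np

# Hypothetical data: one feature with a narrow range of small values.
rng = np.random.default_rng(1)
x_small = rng.uniform(0.4, 0.9, size=200)
y_demo = 10.0 * x_small + rng.normal(scale=0.05, size=200)

def slope(x, y):
    """Least-squares slope of y on x, fitted together with an intercept."""
    X1 = np.column_stack([np.ones_like(x), x])
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return w[1]

slope_small = slope(x_small, y_demo)          # close to 10
slope_large = slope(100.0 * x_small, y_demo)  # close to 0.1
print(slope_small, slope_large, slope_small / slope_large)
```

Multiplying the feature by 100 divides its least-squares coefficient by exactly 100, even though the fit itself is unchanged.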

Didn't we also fit an intercept? Print its value below:

```python
# TODO: Replace <FILL IN> with appropriate code
# extract the intercept
intercept = reg.<FILL IN> 
```

In [None]:
%load ../answers/03_04_intercept.py

In [None]:
print("estimated intercept: {}".format(intercept))

**Question**: How should we initialize the `LinearRegression` estimator if we do not want to fit the intercept? 
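To see what the intercept contributes, the NumPy sketch below (synthetic data, hypothetical variable names) fits the same data with and without a column of ones. It also checks a property of least squares with an intercept: the fitted intercept equals the mean target minus the mean features times the coefficients.

```python
import numpy as np

# Hypothetical data whose features are far from zero (mean around 5).
rng = np.random.default_rng(2)
X_demo = rng.normal(loc=5.0, size=(50, 2))
y_demo = X_demo @ np.array([1.5, -2.0]) + 4.0 + rng.normal(scale=0.1, size=50)

# With an intercept: add a column of ones to the design matrix.
X1 = np.column_stack([np.ones(len(X_demo)), X_demo])
w_with, *_ = np.linalg.lstsq(X1, y_demo, rcond=None)
b0, w = w_with[0], w_with[1:]

# The OLS intercept satisfies b0 = mean(y) - mean(X) @ w.
print(np.isclose(b0, y_demo.mean() - X_demo.mean(axis=0) @ w))  # True

# Without an intercept: the fitted plane is forced through the origin,
# so the coefficients absorb the missing offset and are biased.
w_without, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)
print(w, w_without)
```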

---
## Exercise 2: Classification 

Let's do some classification on the digits dataset.

In [None]:
# load the digits dataset
X, y = convert_to_pandas(datasets.load_digits())

### 2.1 Fit linear SVM model

Fit the classifier `svm.LinearSVC` with `penalty='l2'` to the data:

```python
# TODO: Replace <FILL IN> with appropriate code
# 1) import estimator class
<FILL IN> 

# 2) initialize estimator model with specific parameters
clf = <FILL IN> 

# 3) fit model on the data
clf.<FILL IN> 

# 4) predict output for input data
y_pred = clf.<FILL IN> 

print(clf)
```
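scikit-learn documents `LinearSVC`'s defaults as `penalty='l2'` with `loss='squared_hinge'`. For a binary problem with labels -1/+1, that combination corresponds to minimizing 0.5·‖w‖² + C·Σ max(0, 1 − yᵢ(w·xᵢ + b))². The sketch below only evaluates this objective by hand; it is a simplification (liblinear treats the intercept differently) and not the library's solver. The helper name is hypothetical.

```python
import numpy as np

def l2_squared_hinge_objective(w, b, X, y, C=1.0):
    """Primal objective of a binary linear SVM with an L2 penalty and
    squared hinge loss; labels y must be -1 or +1.
    Simplification: the intercept b is not penalized here."""
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)  # zero for points beyond the margin
    return 0.5 * np.dot(w, w) + C * np.sum(hinge ** 2)

# Toy, hypothetical data: two separable points.
X_toy = np.array([[2.0, 0.0], [-2.0, 0.0]])
y_toy = np.array([1.0, -1.0])
print(l2_squared_hinge_objective(np.array([1.0, 0.0]), 0.0, X_toy, y_toy))   # 0.5
print(l2_squared_hinge_objective(np.array([0.25, 0.0]), 0.0, X_toy, y_toy))  # 0.53125
print(l2_squared_hinge_objective(np.array([0.5, 0.0]), 0.0, X_toy, y_toy))   # 0.125
```

The L2 penalty favors the smallest weights that still place both points exactly on the margin (here `w = [0.5, 0]`); shrinking further starts incurring hinge loss.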

In [None]:
%load ../answers/03_05_fit_svm.py

### 2.2 Model score

Apply the estimator's `score` method to the `X` and `y` data: 

```python
# TODO: Replace <FILL IN> with appropriate code
# compute score for X and y data
score = clf.<FILL IN> 
```

In [None]:
%load ../answers/03_06_score_svm.py

In [None]:
print('prediction score: {}'.format(score))

**Question**: Which scoring metric is used by the `score` method of this estimator? Why do you think our score is this good?
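For reference: scikit-learn classifiers report mean accuracy from `score`, that is, the fraction of samples whose predicted label equals the true label. A minimal NumPy sketch on hypothetical toy labels:

```python
import numpy as np

# Hypothetical true and predicted labels for five samples.
y_true_toy = np.array([0, 1, 2, 2, 1])
y_pred_toy = np.array([0, 1, 2, 1, 1])

# Mean accuracy: average of the element-wise "prediction is correct" flags.
accuracy = np.mean(y_true_toy == y_pred_toy)
print(accuracy)  # 0.8
```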

### 2.3 Estimated model parameters

Let's have a look at the fitted model coefficients:

```python
# TODO: Replace <FILL IN> with appropriate code
# extract estimated coefficients
coef = clf.<FILL IN> 
```

In [None]:
%load ../answers/03_07_coefficient.py

In [None]:
coef.shape

Notice that the estimated `coef_` matrix has shape (10, 64): one row of 64 feature coefficients per model, so our classifier fitted ten models to solve this classification problem.

The digits dataset has ten classes, so this is a [multiclass classification](https://en.wikipedia.org/wiki/Multiclass_classification) problem. The estimator handles it by converting it into ten binary classification problems (one per digit) using a so-called one-versus-rest strategy, and then fitting one model for each binary problem. We will discuss this approach in more detail later in this course.
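The one-versus-rest idea can be sketched in NumPy: each class gets its own binary target (+1 for that class, -1 for the rest), and at prediction time each binary model scores the sample and the highest-scoring class wins. The helpers and the toy coefficients below are hypothetical; the coefficient matrix has one row per class, laid out like `clf.coef_`.

```python
import numpy as np

def one_vs_rest_targets(y, classes):
    """Binary targets (+1 for the class, -1 for the rest), one column per class."""
    return np.where(y[:, None] == classes[None, :], 1, -1)

def one_vs_rest_predict(X, coef, intercept, classes):
    """Pick, per sample, the class whose binary model gives the highest score."""
    scores = X @ coef.T + intercept  # shape (n_samples, n_classes)
    return classes[np.argmax(scores, axis=1)]

classes_toy = np.arange(3)
y_toy = np.array([0, 2, 1])
print(one_vs_rest_targets(y_toy, classes_toy))

# Hypothetical fitted models: shape (n_classes, n_features) and (n_classes,).
coef_toy = np.eye(3)
intercept_toy = np.zeros(3)
X_toy = np.array([[5.0, 1.0, 0.0], [0.0, 1.0, 3.0]])
print(one_vs_rest_predict(X_toy, coef_toy, intercept_toy, classes_toy))  # [0 2]
```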

Run the cell below to show the mean absolute estimated coefficients across the fitted models. Notice that the resulting plot maps each feature's mean absolute coefficient to its pixel position in the 8x8 digit image:

In [None]:
mean_abs_coefs = np.abs(clf.coef_).mean(axis=0)
plt.matshow(mean_abs_coefs.reshape(8, 8))
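The plot works because the 64 digit features are the pixels of an 8x8 image flattened row by row (as in scikit-learn's `load_digits`). A small NumPy sketch of that index mapping: with the default C order, `reshape(8, 8)` sends feature index `i` to row `i // 8`, column `i % 8`.

```python
import numpy as np

# Feature indices 0..63, laid out as the 8x8 grid used in the plot.
feature_index = np.arange(64)
grid = feature_index.reshape(8, 8)

i = 19
print(grid[i // 8, i % 8] == i)  # True

# First column of the grid: indices of the left-border pixels (0, 8, ..., 56).
print(grid[:, 0])
```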

**Question**: The figure above shows that the estimated coefficients for the features at the left and right borders of the image are almost all zero or small. How does this relate to the results from Lab 1?

---
## Exercise 3: Dimensionality reduction

Let's apply the unsupervised method PCA (principal component analysis) to the digits dataset to reduce its dimensionality.

In [None]:
# load the digits dataset
X, y = convert_to_pandas(datasets.load_digits())

### 3.1 Fit PCA model

Fit a `PCA` estimator with parameter `n_components` set to 2 to the data:

```python
# TODO: Replace <FILL IN> with appropriate code
# 1) import estimator class
<FILL IN>

# 2) initialize estimator model with specific parameters
dec = <FILL IN>

# 3) fit model on the data
dec.<FILL IN>

# 4) transform the input data
X_trans = dec.<FILL IN>

print(dec)
```
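To make the transform step concrete, here is a minimal NumPy sketch of PCA via the singular value decomposition: center the data, take the top right-singular vectors as principal directions, and project onto them. The helper `pca_fit_transform` and the random stand-in data are hypothetical. scikit-learn's `PCA` may use a different solver and sign convention, so individual output columns can differ in sign from this sketch.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project X onto its top principal components via SVD (a sketch)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # PCA works on centered data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # rows: principal directions
    X_proj = Xc @ components.T                   # shape (n_samples, n_components)
    explained_variance = S[:n_components] ** 2 / (len(X) - 1)
    return X_proj, components, explained_variance

# Hypothetical stand-in for the digits features: 100 samples x 64 features.
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(100, 64))
X_proj, comps, ev = pca_fit_transform(X_demo, n_components=2)
print(X_proj.shape)  # (100, 2)
```

The variance of each projected column equals the corresponding explained variance, and the components are orthonormal.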

In [None]:
%load ../answers/03_08_fit_pca.py

### 3.2 Transformation result

Print the shape of the transformed feature data:

In [None]:
new_shape = X_trans.shape
print("shape after transform: {}".format(new_shape))

**Question**: What would be the shape of `X_trans` if we initialize the PCA with `n_components=3`?

Run the cell below to visualize this transformed feature data: 

In [None]:
fig, ax = plt.subplots(figsize=(11.5, 8.0))
# color each point by its digit label; vmin/vmax center the ten colors on labels 0..9
sc = ax.scatter(X_trans[:, 0], X_trans[:, 1], c=y, edgecolor='none',
                alpha=0.5, cmap=plt.get_cmap('nipy_spectral', 10),
                vmin=-0.5, vmax=9.5)
cbar = fig.colorbar(sc, label='digit label', ticks=range(10))
fig.suptitle('transformed digit data', fontsize=16)

**Question**: Which digits seem to be the easiest to keep apart, and which ones are harder to separate?

In [None]:
%load ../answers/03_questions.py