# Machine Learning with Python

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

## 2.2 Regression

In this notebook we will discuss some examples of supervised learning algorithms applied to regression. 


### Example data

We will use the *autoMpg* dataset as an example:


In [None]:
from sklearn.datasets import fetch_openml
mpg = fetch_openml(name='autoMpg', version=1, parser='auto')

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(mpg.data, mpg.target, random_state=0)


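Remember that this dataset needs some preprocessing: the categorical `origin` column must be encoded, missing `horsepower` values must be imputed, and the features should be scaled.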

Because we need to apply the same transformation to both the training and testing data, it will be convenient to wrap these steps using [`ColumnTransformer`](https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html#columntransformer) and [`Pipeline`](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline).

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
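# IterativeImputer is experimental and must be enabled explicitly before it is imported
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer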
from sklearn.preprocessing import StandardScaler

# Defines preprocessing transformations for specified columns
ct = ColumnTransformer([('encode', OneHotEncoder(), ['origin']),
                        ('impute', IterativeImputer(), ['horsepower'])],
                       remainder='passthrough') 

# Defines individual steps in a workflow
pipe = Pipeline([('preprocessing', ct),
                 ('scaling', StandardScaler())])



In [None]:
# Now we can fit the whole pipeline in one step (using training data only)...
pipe.fit(X_train)

# ... and use it to transform the features (both the training and the testing sets)
X_train_ = pipe.transform(X_train)
X_test_ = pipe.transform(X_test)
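
The reason for fitting on the training data only is that the transformation learns its statistics (means, standard deviations, imputation models) during `fit`, and then reuses them in `transform`. A minimal sketch with a single `StandardScaler` on made-up numbers (the names `train_demo` and `test_demo` are hypothetical, not part of the notebook) illustrates this:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy feature, split by hand into a "training" and a "testing" part
train_demo = np.array([[1.0], [2.0], [3.0]])
test_demo = np.array([[10.0]])

# The mean and standard deviation are learned from the training data only
scaler_demo = StandardScaler().fit(train_demo)
print("learned mean:", scaler_demo.mean_, "learned std:", scaler_demo.scale_)

# The test value is shifted and scaled with the *training* statistics
test_scaled = scaler_demo.transform(test_demo)
print("scaled test value:", test_scaled)
```

The test value is not rescaled to have zero mean on its own; it is expressed relative to the training distribution, which is what the model will have seen.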

Now we are ready to do some regression.

### Linear Model

Linear regression fits the model by Ordinary Least Squares (OLS): it chooses the coefficients that minimise the sum of squared residuals on the training data.
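
As a rough illustration of what the least-squares fit does, the sketch below solves the same problem directly with `np.linalg.lstsq` on synthetic data and compares the result with `LinearRegression`. The data and the names `X_demo`, `y_demo`, `lm_demo` are assumptions made for this example only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical): y = 3*x0 - 2*x1 + 5 + small noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 5.0 + rng.normal(scale=0.1, size=200)

# Append a column of ones so that the intercept is estimated as an extra coefficient
A = np.column_stack([X_demo, np.ones(len(X_demo))])

# Least-squares solution: the w that minimises ||A w - y||^2
w, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

# scikit-learn should recover the same coefficients and intercept
lm_demo = LinearRegression().fit(X_demo, y_demo)
print("lstsq:   coef =", w[:2], "intercept =", w[2])
print("sklearn: coef =", lm_demo.coef_, "intercept =", lm_demo.intercept_)
```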


In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Use only one feature
feature_number = 3
X_train_1 = X_train_[:, np.newaxis, feature_number]
X_test_1 = X_test_[:, np.newaxis, feature_number]

lm_1 = LinearRegression()
lm_1.fit(X_train_1, y_train)

In [None]:
y_pred = lm_1.predict(X_train_1)

print("Coefficients: \n", lm_1.coef_)

plt.scatter(X_train_1, y_train, c='m' )
plt.plot(X_train_1, y_pred, c="k")
plt.title("linear model + training data")
plt.xlabel(pipe.get_feature_names_out()[feature_number])
plt.ylabel("mpg")
plt.show()

We evaluate on the test data:

In [None]:
y_pred = lm_1.predict(X_test_1)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.2f" % r2_score(y_test, y_pred))

plt.scatter(X_test_1, y_test, c='c')
plt.plot(X_test_1, y_pred, c='k')
plt.title("linear model + test data")
plt.xlabel(pipe.get_feature_names_out()[feature_number])
plt.ylabel("mpg")
plt.show()
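
The two metrics used above can also be computed by hand, which may help when interpreting them. The sketch below uses a few made-up values (`y_true_demo` and `y_pred_demo` are hypothetical) and checks the manual results against scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Small made-up example (values are hypothetical)
y_true_demo = np.array([20.0, 25.0, 30.0, 35.0])
y_pred_demo = np.array([22.0, 24.0, 29.0, 37.0])

# MSE: the mean of the squared residuals
mse_manual = np.mean((y_true_demo - y_pred_demo) ** 2)

# r2: 1 - (residual sum of squares) / (total sum of squares around the mean)
ss_res = np.sum((y_true_demo - y_pred_demo) ** 2)
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print("MSE = %.2f, r2 = %.3f" % (mse_manual, r2_manual))  # prints MSE = 2.50, r2 = 0.920
print("sklearn agrees:",
      np.isclose(mse_manual, mean_squared_error(y_true_demo, y_pred_demo)),
      np.isclose(r2_manual, r2_score(y_true_demo, y_pred_demo)))
```

An r2 of 0 corresponds to always predicting the mean of the targets, and negative values mean the model does worse than that baseline.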

Including all the features usually improves the model performance:

In [None]:
lm = LinearRegression()
lm.fit(X_train_, y_train)

In [None]:
print("Coefficients: \n", lm.coef_)

In [None]:
y_pred = lm.predict(X_test_)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.3f" % r2_score(y_test, y_pred))


### Regularisation

OLS places no restrictions on the values of the coefficients. 

However, to avoid overfitting we may prefer to find a model that still fits the training data well, but has coefficients that are (mostly) small. This idea is called *regularisation*.


In [*ridge regression*](https://scikit-learn.org/stable/modules/linear_model.html#ridge-regression-and-classification), we add an L2 penalty (the sum of the squared coefficients, weighted by `alpha`) to the squared error.

In [None]:
from sklearn.linear_model import Ridge
ridge = Ridge(alpha=0.2)
ridge.fit(X_train_,y_train)

In [None]:
print("Coefficients: \n", ridge.coef_)

In [None]:
y_pred = ridge.predict(X_test_)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.3f" % r2_score(y_test, y_pred))
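
To get a feel for what the L2 penalty does, the following sketch fits `Ridge` with increasing `alpha` on synthetic data and prints the size of the coefficient vector each time. The data and the chosen `alpha` values are assumptions for illustration; they are not tuned for the autoMpg example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (hypothetical): only the first three features matter
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 5))
y_demo = X_demo @ np.array([4.0, -3.0, 2.0, 0.0, 0.0]) + rng.normal(scale=1.0, size=50)

# Unpenalised fit for reference
ols_norm = np.linalg.norm(LinearRegression().fit(X_demo, y_demo).coef_)
print(f"OLS           ||coef|| = {ols_norm:.3f}")

# A larger alpha means a stronger penalty and therefore smaller coefficients
ridge_norms = []
for alpha in [0.1, 1.0, 10.0, 100.0]:
    coef = Ridge(alpha=alpha).fit(X_demo, y_demo).coef_
    ridge_norms.append(np.linalg.norm(coef))
    print(f"alpha={alpha:<7} ||coef|| = {ridge_norms[-1]:.3f}")
```

The coefficients shrink towards zero as `alpha` grows, but they generally do not become exactly zero.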



In [*LASSO regression*](https://scikit-learn.org/stable/modules/linear_model.html#lasso), we use an L1 penalty (the sum of the absolute values of the coefficients) instead. This tends to set some coefficients to exactly 0, so the corresponding features drop out of the model. LASSO can therefore be used as an automated form of feature selection.

In [None]:
from sklearn.linear_model import Lasso
lasso = Lasso(alpha=0.5)
lasso.fit(X_train_,y_train)

In [None]:
print("Coefficients: \n", lasso.coef_)

In [None]:
y_pred = lasso.predict(X_test_)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.3f" % r2_score(y_test, y_pred))
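
The feature-selection effect of the L1 penalty can be sketched on synthetic data in which only a few features carry signal. Everything in this example (the data, the names `X_demo`, `y_demo`, `lasso_demo`, and the value of `alpha`) is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (hypothetical): 10 features, only the first three are informative
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
true_coef = np.array([5.0, -4.0, 3.0] + [0.0] * 7)
y_demo = X_demo @ true_coef + rng.normal(scale=1.0, size=200)

lasso_demo = Lasso(alpha=0.5).fit(X_demo, y_demo)

# Features with a non-zero coefficient are the ones LASSO has "selected"
selected = np.flatnonzero(lasso_demo.coef_)
n_zero = int(np.sum(lasso_demo.coef_ == 0))
print("Coefficients:", np.round(lasso_demo.coef_, 2))
print("Selected feature indices:", selected)
print("Number of coefficients set exactly to zero:", n_zero)
```

Note that the surviving coefficients are also shrunk relative to their true values, so the choice of `alpha` trades off sparsity against bias.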


### Nonlinear regression

Note that many supervised learning algorithms can be used for both classification and regression with only minor adaptations:

### k-Nearest Neighbours

In [None]:
from sklearn.neighbors import KNeighborsRegressor
knn = KNeighborsRegressor()
knn.fit(X_train_,y_train)

In [None]:
y_pred = knn.predict(X_test_)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.3f" % r2_score(y_test, y_pred))
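
For k-nearest neighbours the adaptation is small: instead of taking a majority vote among the neighbours' classes, the regressor averages the neighbours' target values. The sketch below reproduces one prediction by hand on synthetic data, assuming the default settings (5 neighbours, Euclidean distance, uniform weights); the names are hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data (hypothetical)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = X_demo[:, 0] ** 2 + X_demo[:, 1]
x_query = np.array([[0.5, -0.5]])

knn_demo = KNeighborsRegressor(n_neighbors=5).fit(X_demo, y_demo)

# By hand: find the 5 closest training points and average their targets
dists = np.linalg.norm(X_demo - x_query, axis=1)
nearest = np.argsort(dists)[:5]
manual_pred = y_demo[nearest].mean()

print("manual: ", manual_pred)
print("sklearn:", knn_demo.predict(x_query)[0])
```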


### Decision tree

In [None]:
from sklearn.tree import DecisionTreeRegressor
tree = DecisionTreeRegressor(random_state=0)
tree.fit(X_train_,y_train)

In [None]:
y_pred = tree.predict(X_test_)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.3f" % r2_score(y_test, y_pred))


In [None]:
from sklearn.tree import plot_tree
plt.figure(figsize=(25, 20))
# Show only the top levels of the tree to keep the plot readable
plot_tree(tree,
          impurity=False,
          filled=True,
          max_depth=2,
          feature_names=pipe.get_feature_names_out().tolist())
plt.show()

### Random forest

In [None]:
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators=100,random_state=0)
rf.fit(X_train_,y_train)

In [None]:
y_pred = rf.predict(X_test_)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.3f" % r2_score(y_test, y_pred))
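
A random forest regressor combines its trees by averaging their predictions (rather than voting, as in classification). The following sketch checks this on synthetic data by averaging the individual trees stored in `estimators_`; the data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (hypothetical): a noisy sine curve
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.2, size=200)

rf_demo = RandomForestRegressor(n_estimators=20, random_state=0).fit(X_demo, y_demo)

# Predict with every individual tree, then average across trees
X_new = np.array([[-1.0], [0.0], [2.0]])
per_tree = np.array([t.predict(X_new) for t in rf_demo.estimators_])
manual_avg = per_tree.mean(axis=0)

print("average of trees:", manual_avg)
print("forest predict:  ", rf_demo.predict(X_new))
```

The spread of `per_tree` along the first axis gives a rough idea of how much the individual trees disagree at each query point.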



### Neural network

In [None]:
from sklearn.neural_network import MLPRegressor
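# One hidden layer of 100 units; note the trailing comma, since (100) is just the integer 100
# random_state is fixed for reproducibility, as for the tree-based models above
nn = MLPRegressor(hidden_layer_sizes=(100,), max_iter=10000, random_state=0)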
nn.fit(X_train_,y_train)

In [None]:
y_pred = nn.predict(X_test_)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.3f" % r2_score(y_test, y_pred))



### Support Vector Machine

In [None]:
from sklearn.svm import SVR
svr = SVR(kernel='rbf')
svr.fit(X_train_,y_train)

In [None]:
y_pred = svr.predict(X_test_)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.3f" % r2_score(y_test, y_pred))



## Exercise

Train a regressor of your choice on the `wine-quality-white` dataset.

In [None]:
from sklearn.datasets import fetch_openml
w = fetch_openml(name='wine-quality-white', version=1, parser='auto')

Evaluate your model on the test data.

Does your model do better than Ordinary Least Squares?