# Regression

The family of algorithms whose targets are continuous values rather than labels or categories.

**Index**
* [Linear Models](#Linear-Models)
  * [Ordinary least squares](#Ordinary-least-squares)
  * [Ridge regression](#Ridge-regression)

In [None]:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

## Linear Models

As seen in [classification](./classification.ipynb/#Logistic-Regression), linear models for regression assume that any target in the data can be approximated by a linear combination of its features.
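
As a minimal sketch of what "linear combination of its features" means, the snippet below computes predictions by hand. The weights, intercept and samples are hypothetical values chosen only for illustration; they do not come from any dataset used in this notebook.

```python
import numpy as np

# Hypothetical values, chosen only for illustration
w = np.array([2.0, -1.0, 0.5])    # one weight per feature
b = 3.0                           # intercept (bias)
X = np.array([[1.0, 2.0, 4.0],
              [0.0, 1.0, 2.0]])   # two samples, three features

# Each prediction is the weighted sum of a sample's features plus the intercept
y_hat = X @ w + b
print(y_hat)
```

Fitting a linear model amounts to choosing `w` and `b` so that `y_hat` is as close as possible to the observed targets.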

### Ordinary least squares
Ordinary least squares seeks to minimize the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation.
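
The minimization can be sketched directly with NumPy, assuming synthetic data generated here only for illustration (`true_w`, the intercept of 4.0 and the noise level are made up). `np.linalg.lstsq` minimizes the same residual sum of squares, so its solution should agree with `LinearRegression`.

```python
import numpy as np
from sklearn import linear_model

# Synthetic data (assumption: generated only for this illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([1.5, -2.0, 0.7])
y = X @ true_w + 4.0 + rng.normal(scale=0.1, size=50)

# Append a column of ones so the intercept is estimated with the weights
X1 = np.column_stack([X, np.ones(len(X))])

# lstsq returns the beta that minimizes ||X1 @ beta - y||^2
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Same problem solved by scikit-learn
regr = linear_model.LinearRegression().fit(X, y)

# Residual sum of squares at the least-squares solution
rss = np.sum((y - X1 @ beta) ** 2)

print('lstsq weights:  ', beta[:-1], 'intercept:', beta[-1])
print('sklearn weights:', regr.coef_, 'intercept:', regr.intercept_)
print('RSS:', rss)
```

Any other choice of weights gives a residual sum of squares at least as large as `rss`.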

**!important**
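OLS relies on the features being independent of each other, that is, no feature is (close to) a linear combination of the others. When features are strongly correlated, the coefficient estimates become very sensitive to noise in the targets.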
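
A rough way to see the problem, assuming a synthetic two-feature dataset built only for this illustration: when the second feature is almost a copy of the first, the condition number of the feature matrix explodes, and the fitted coefficients can change a lot between two slightly different noise draws.

```python
import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)                   # unrelated to x1
x2_coll = x1 + rng.normal(scale=1e-3, size=n)   # almost a copy of x1

X_indep = np.column_stack([x1, x2_indep])
X_coll = np.column_stack([x1, x2_coll])

# A large condition number signals near linear dependence between columns
cond_indep = np.linalg.cond(X_indep)
cond_coll = np.linalg.cond(X_coll)
print('Condition number, independent features:', cond_indep)
print('Condition number, collinear features:  ', cond_coll)

# Fit the same underlying target with two different small noise draws
for seed in (1, 2):
    noise = np.random.default_rng(seed).normal(scale=0.1, size=n)
    y = 3.0 * x1 + noise
    coef = linear_model.LinearRegression().fit(X_coll, y).coef_
    print('Noise seed {}: coefficients {}'.format(seed, coef))
```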

In [None]:
# Load the diabetes dataset
d0 = datasets.load_diabetes()
# print(d0.DESCR)  # Print the description of the dataset
X, y = d0.data, d0.target

# Use only one feature: column index 2 (BMI)
X = X[:, np.newaxis, 2]

# Train/test split
X_train, X_test = X[:-20], X[-20:]
y_train, y_test = y[:-20], y[-20:]

# Train the model
regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)

# Make some predictions
y_hat = regr.predict(X_test)

# Print some outcomes
print('Coefficient:                  {}'.format(regr.coef_))
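# Format specifiers do the rounding, so this works whether the metrics
# return NumPy scalars or plain Python floats
print('Mean squared error:           {:.0f}'.format(
    mean_squared_error(y_test, y_hat)))
print('Coefficient of determination: {:.2f}'.format(
    r2_score(y_test, y_hat)))  # 1.0 is best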

# Plot the results
sns.set()
cmap = sns.cubehelix_palette(rot=-.2, as_cmap=True)

_, ax = plt.subplots(figsize=(10, 6))
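sns.scatterplot(x=X_test.ravel(), y=y_test, hue=y_test, palette=cmap, ax=ax)
ax.plot(X_test, y_hat)  # Fitted regression line
ax.set_xlabel('BMI (normalized)')
ax.set_ylabel('Disease progression')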

plt.show()

### Ridge regression

Ridge regression adds an L2 penalty on the size of the coefficients, controlled by the regularization parameter `alpha`, which mitigates the collinearity issues described above.
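
As a sketch of how the penalty enters the solution, assume a model without intercept and synthetic data generated only for this illustration. The penalized least-squares problem then has the closed form `w = (XᵀX + alpha·I)⁻¹ Xᵀy`, which can be compared with `linear_model.Ridge(fit_intercept=False)`.

```python
import numpy as np
from sklearn import linear_model

# Synthetic data (assumption: generated only for this illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=30)

alpha = 1.0

# Closed-form ridge solution without intercept:
# w = (X^T X + alpha * I)^-1 X^T y
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

ridge = linear_model.Ridge(alpha=alpha, fit_intercept=False).fit(X, y)
ols = linear_model.LinearRegression(fit_intercept=False).fit(X, y)

print('Closed form matches Ridge:', np.allclose(w_closed, ridge.coef_))
print('OLS coefficient norm:  ', np.linalg.norm(ols.coef_))
print('Ridge coefficient norm:', np.linalg.norm(ridge.coef_))
```

Adding `alpha * I` keeps the matrix invertible even when columns of `X` are linearly dependent, and it shrinks the coefficients towards zero compared with plain OLS.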

[Example from scikit-learn](https://scikit-learn.org/stable/auto_examples/linear_model/plot_ridge_coeffs.html#sphx-glr-auto-examples-linear-model-plot-ridge-coeffs-py)

[[Index]](#Regression)

In [None]:
# Synthetic regression problem; w holds the true coefficients
X, y, w = datasets.make_regression(
    n_samples=10, n_features=10, coef=True, random_state=1, bias=3.5, noise=1)

coefs, errors = list(), list()

alphas = np.logspace(-6, 6, 200)
clf = linear_model.Ridge()

# Fit the model once per alpha, recording the coefficients and their error
# with respect to the true coefficients w
for a in alphas:
    clf.set_params(alpha=a)
    clf.fit(X, y)
    coefs.append(clf.coef_)
    errors.append(mean_squared_error(clf.coef_, w))
    
_, ax = plt.subplots(1, 2, figsize=(15, 6))

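ax[0].plot(alphas, coefs)
ax[0].set_xlabel('alpha')
ax[0].set_ylabel('Coefficients')
ax[1].plot(alphas, errors)
ax[1].set_xlabel('alpha')
ax[1].set_ylabel('MSE between fitted and true coefficients')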

ax[0].set_xscale('log')
ax[1].set_xscale('log')

plt.show()