Linear Regression Example
=========================================================
This example uses only one feature of the `diabetes` dataset (BMI, column
index 2) so that the regression can be shown in a two-dimensional plot. The
straight line in the plot shows how linear regression fits the line that
minimizes the residual sum of squares between the observed responses in the
dataset and the responses predicted by the linear approximation.
The coefficients, the mean squared error and the coefficient
of determination are also calculated.
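
To make "minimizing the residual sum of squares" concrete, here is a minimal pure-Python sketch of a one-feature least-squares fit. The helper names (`fit_line`, `residual_sum_of_squares`) and the toy data are hypothetical illustrations, not part of scikit-learn or the diabetes dataset, and `LinearRegression` may compute its solution differently internally.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution for a single feature: slope = S_xy / S_xx
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residual_sum_of_squares(xs, ys, slope, intercept):
    """Sum of squared differences between observed and predicted responses."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, not taken from the diabetes dataset
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [2.2, 4.1, 6.2, 7.9, 10.1]

slope, intercept = fit_line(toy_x, toy_y)
best_rss = residual_sum_of_squares(toy_x, toy_y, slope, intercept)
# Any other line, such as one with a slightly larger slope, has a larger RSS
nudged_rss = residual_sum_of_squares(toy_x, toy_y, slope + 0.1, intercept)

print('slope:', slope, 'intercept:', intercept)
print('RSS at fit:', best_rss, 'RSS after nudging the slope:', nudged_rss)
```

Nudging the fitted slope and recomputing the RSS is a quick way to check that the fitted line sits at the minimum.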

Website: https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py

Github: https://github.com/scikit-learn/scikit-learn/blob/master/examples/linear_model/plot_ols.py

Dataset: https://scikit-learn.org/stable/datasets/index.html#diabetes-dataset

In [None]:
# Code source: Jaques Grobler
# License: BSD 3 clause

In [None]:
# display plots in Notebook cell
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

In [None]:
# Load the diabetes dataset
diabetes = datasets.load_diabetes()
print('type:', type(diabetes))  # a bare expression mid-cell would not be displayed
#print('data:', diabetes.data)
print('DESCR:', diabetes.DESCR)
print('feature_names:', diabetes.feature_names)

In [None]:
# Use only one feature
diabetes_X = diabetes.data[:, np.newaxis, 2]  # BMI
#len(diabetes_X)
type(diabetes_X)

In [None]:
# Target: the blood pressure (BP) feature, column index 3
diabetes_y = diabetes.data[:, np.newaxis, 3]  # BP

In [None]:
# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

In [None]:
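# Split the targets into training/testing sets
# NOTE: this uses diabetes.target (disease progression), not diabetes_y (BP).
# The train_test_split cell below overwrites these variables.
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]
#diabetes_y_train[0:10]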

In [None]:
from sklearn.model_selection import train_test_split

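# Split into training and test data (replaces the manual split above;
# with no test_size given, 25% of the samples are held out for testing)
diabetes_X_train, diabetes_X_test, diabetes_y_train, diabetes_y_test = train_test_split(
    diabetes_X, diabetes_y, random_state=42)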
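
As a rough illustration of what a seeded, shuffled split does, here is a pure-Python sketch. The function `shuffle_split` is a hypothetical helper, not scikit-learn's implementation: it will not reproduce the same row assignment as `train_test_split(..., random_state=42)`. Rounding the test size up is an assumption about how a fractional count is resolved.

```python
import math
import random


def shuffle_split(rows, test_fraction=0.25, seed=42):
    """Shuffle row indices with a fixed seed, then hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed gives the same shuffle
    n_test = math.ceil(len(rows) * test_fraction)  # assumption: round test size up
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


# Hypothetical stand-in for the 442 samples of the diabetes dataset
rows = list(range(442))
train_rows, test_rows = shuffle_split(rows)
print('train size:', len(train_rows), 'test size:', len(test_rows))
```

Fixing the seed makes the split repeatable, so the reported metrics do not change from one run to the next.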

In [None]:
# Create linear regression object
regr = linear_model.LinearRegression()

In [None]:
# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)

In [None]:
# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)

In [None]:
# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean squared error
print("Mean squared error: %.2f"
      % mean_squared_error(diabetes_y_test, diabetes_y_pred))
# The coefficient of determination (R^2): 1 is perfect prediction
print('Coefficient of determination: %.2f' % r2_score(diabetes_y_test, diabetes_y_pred))
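
For reference, both reported metrics can be computed by hand. The sketch below uses the standard definitions (MSE is the mean of the squared errors, R^2 is one minus the ratio of the residual to the total sum of squares). The function names and the toy values are hypothetical and only mirror what `mean_squared_error` and `r2_score` report for a single output.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values, not taken from the diabetes dataset
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print('Mean squared error: %.3f' % mse(y_true, y_pred))
print('Coefficient of determination: %.3f' % r2(y_true, y_pred))
```

A model that always predicts the mean of the observed values scores an R^2 of 0, and a model that does worse than that scores below 0.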


In [None]:
# Plot outputs
plt.scatter(diabetes_X_test, diabetes_y_test,  color='black')
plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3)

plt.xticks(())
plt.yticks(())

plt.show()

In [None]:
# plot the dataset with dots of size 10
plt.scatter(diabetes_X, diabetes_y, s=10)

plt.title("Diabetes sample")
plt.xlabel("feature diabetes_X (BMI)")
plt.ylabel("target diabetes_y (BP)")
plt.xticks()
plt.autoscale(tight=True)
# draw a light gray, solid grid
plt.grid(True, linestyle='-', color='0.75')

# overlay the fitted regression line on the test inputs
plt.plot(diabetes_X_test, diabetes_y_pred, color='red', linewidth=3)

plt.show()