
# Linear Regression Example

This example uses only one feature of the `diabetes` dataset (column 2, the body mass index) so that this regression technique can be illustrated with a two-dimensional plot. The straight line in the plot shows how linear regression chooses the line that minimizes the residual sum of squares between the observed responses in the dataset and the responses predicted by the linear approximation.

The coefficients, the mean squared error and the R² score (coefficient of determination) are also calculated.
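To make "minimizes the residual sum of squares" concrete, the sketch below fits a line to synthetic one-feature data. The data is made up and is not the diabetes dataset. It uses the closed-form least-squares estimates (slope = covariance of x and y divided by variance of x, intercept = mean of y minus slope times mean of x) and checks that nudging either parameter increases the RSS, which is $\sum_i (y_i - \hat{y}_i)^2$.

```python
import numpy as np

# Synthetic data (hypothetical, not the diabetes dataset): y = 3x + 2 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=50)

def rss(slope, intercept):
    """Residual sum of squares for the line y_hat = slope * x + intercept."""
    residuals = y - (slope * x + intercept)
    return float(np.sum(residuals ** 2))

# Closed-form least-squares estimates for a single predictor
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

best = rss(slope, intercept)
# Moving either parameter away from the least-squares solution increases the RSS
for d_slope, d_intercept in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert rss(slope + d_slope, intercept + d_intercept) > best

print(f"slope={slope:.3f}, intercept={intercept:.3f}, RSS={best:.3f}")
```

The same estimates come out of `np.polyfit(x, y, 1)`, which returns the slope first and the intercept second.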



In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
%matplotlib inline

## Load the data

In [None]:
# Load the diabetes dataset
diabetes = datasets.load_diabetes()

## See what type of object we loaded

In [None]:
type(diabetes)

## See what attributes the object has

In [None]:
dir(diabetes)
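
scikit-learn's dataset loaders return a `Bunch`, which is a dictionary subclass that also exposes its keys as attributes. That is why `dir(diabetes)` lists names such as `data` and `target`, and why `diabetes.data` and `diabetes["data"]` are interchangeable. The class below is a hypothetical, minimal imitation meant to show the idea. It is not scikit-learn's actual implementation.

```python
class Bunch(dict):
    """Hypothetical minimal imitation: a dict whose keys are also attributes."""

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails; fall back to the dict keys
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        # Attribute assignment stores a dict entry
        self[key] = value

    def __dir__(self):
        # Makes dir() report the stored keys
        return list(self.keys())

toy = Bunch(data=[[0.1, 0.2], [0.3, 0.4]], target=[151.0, 75.0])
print(toy.data is toy["data"])  # True: attribute and key access return the same object
print(dir(toy))                 # ['data', 'target']
```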

## See what type the data object is

In [None]:
type(diabetes.data)

## See what its dimensions are

In [None]:
diabetes.data.shape

## See what dimensions the target has

In [None]:
diabetes.target.shape
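
The feature matrix is two-dimensional, with one row per patient and one column per feature. The target is a one-dimensional vector holding one value per patient. The stand-in arrays below use the shapes that `load_diabetes` reports (442 samples, 10 features), filled with zeros instead of real data. They show the `ndim`/`shape` distinction and the requirement that the first dimensions agree.

```python
import numpy as np

# Stand-in arrays with the diabetes shapes (zeros, not the real values)
X = np.zeros((442, 10))   # feature matrix: one row per patient, one column per feature
y = np.zeros(442)         # target vector: one value per patient

print(X.ndim, X.shape)  # 2 (442, 10)
print(y.ndim, y.shape)  # 1 (442,)

# Row i of X describes the same patient as entry i of y
assert X.shape[0] == y.shape[0]
```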

## List the first few rows of the data

In [None]:
diabetes.data[:5, :]

## List the first few rows of the target

In [None]:
diabetes.target[:10]
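
NumPy slices are zero-based and half-open, so `a[:5]` returns indices 0 to 4, while a slice such as `a[1:6]` starts at index 1 and silently skips the first row. The toy array below, which is not dataset values, shows the difference.

```python
import numpy as np

a = np.arange(10) * 10          # [0, 10, 20, ..., 90]
print(a[:5])    # [ 0 10 20 30 40]  first five elements, indices 0-4
print(a[1:6])   # [10 20 30 40 50]  starts at index 1, so the first element is skipped
print(a[-3:])   # [70 80 90]        negative indices count from the end

m = np.arange(12).reshape(4, 3)
print(m[:2, :])  # first two rows, all columns
```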

## Pick out column 2 (the third column, since numbering starts at zero)

In [None]:
# Use only one feature; np.newaxis keeps the result 2-D with shape (n_samples, 1)
diabetes_X = diabetes.data[:, np.newaxis, 2]
print(diabetes_X.shape)
diabetes_X[:10]
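
scikit-learn estimators expect `X` to be two-dimensional, shaped `(n_samples, n_features)`, even when there is only one feature. A plain integer index such as `data[:, 2]` drops an axis. `np.newaxis` puts it back. The sketch uses a small hypothetical matrix and shows that a length-1 slice or `reshape(-1, 1)` gives the same column vector.

```python
import numpy as np

data = np.arange(12.0).reshape(4, 3)   # hypothetical 4-sample, 3-feature matrix

col_1d = data[:, 2]                    # integer index drops the axis: shape (4,)
col_newaxis = data[:, np.newaxis, 2]   # np.newaxis re-inserts it: shape (4, 1)
col_slice = data[:, 2:3]               # a length-1 slice also keeps two dimensions
col_reshape = data[:, 2].reshape(-1, 1)

print(col_1d.shape)       # (4,)
print(col_newaxis.shape)  # (4, 1)
assert np.array_equal(col_newaxis, col_slice)
assert np.array_equal(col_newaxis, col_reshape)
```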

## Split the data into training and test sets

In [None]:
# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
print(diabetes_X_train.shape)
print(diabetes_X_test.shape)
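
The split above is positional: the last 20 rows become the test set and everything before them is the training set. The sketch below uses a stand-in column of row numbers with the same sample count (442) to check that the two pieces do not overlap and together reproduce the original array. Because the split ignores randomness, the test set is only representative if the rows are not ordered in some systematic way.

```python
import numpy as np

X = np.arange(442).reshape(-1, 1)   # stand-in column: each value is its row number
X_train, X_test = X[:-20], X[-20:]  # all but the last 20 rows / the last 20 rows

print(X_train.shape, X_test.shape)  # (422, 1) (20, 1)
# Together the two pieces reproduce the original array, with no overlap
assert np.array_equal(np.concatenate([X_train, X_test]), X)
```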

## Split the target the same way

In [None]:
# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]
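
Features and targets have to be split with the same row selection so that each `X` row stays paired with its `y` value. As an alternative to the fixed last-20 split, which this notebook does not use, the sketch below draws one random permutation and applies it to both arrays. The helper `shuffled_split` is hypothetical. scikit-learn's `sklearn.model_selection.train_test_split` provides a shuffled split of this kind.

```python
import numpy as np

def shuffled_split(X, y, n_test, seed=0):
    """Hypothetical helper: random train/test split that keeps rows of X paired with y."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))       # one shared permutation for X and y
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data where y is a known function of X, so alignment is easy to check
X = np.arange(30, dtype=float).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0

X_tr, X_te, y_tr, y_te = shuffled_split(X, y, n_test=5)
print(X_tr.shape, X_te.shape)  # (25, 1) (5, 1)
assert np.allclose(y_te, 2.0 * X_te[:, 0] + 1.0)  # pairs survived the shuffle
```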

## Create a regression object

In [None]:
# Create linear regression object
regr = linear_model.LinearRegression()
type(regr)

## List its attributes

In [None]:
dir(regr)

## Train the model

In [None]:
# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)
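
`LinearRegression` fits ordinary least squares. With its default `fit_intercept=True` it also estimates an intercept, and after `fit` the results are stored in `coef_` and `intercept_`. The sketch below solves the same least-squares problem directly with `np.linalg.lstsq` on hypothetical one-feature data, appending a column of ones for the intercept. It illustrates the math only and is not necessarily how scikit-learn computes the fit internally.

```python
import numpy as np

# Hypothetical one-feature data standing in for diabetes_X_train / diabetes_y_train
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 1))
y = 4.0 * X[:, 0] - 1.0 + rng.normal(scale=0.3, size=100)

# Append a column of ones so the intercept is estimated alongside the slope
A = np.hstack([X, np.ones((X.shape[0], 1))])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
coef, intercept = params[:-1], params[-1]

print("coef:", coef, "intercept:", intercept)
```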

## List the coefficients

In [None]:
# The coefficients
print('Coefficients: \n', regr.coef_)

## Predict y for a specific x

In [None]:
regr.predict([[0.5]])  # predict expects a 2-D input: one row per sample, one column per feature
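
For a fitted linear model, a prediction is the linear rule $\hat{y} = x \cdot w + b$, with `coef_` as $w$ and `intercept_` as $b$. The sketch applies that rule by hand. The values `coef = 900.0` and `intercept = 150.0` are hypothetical placeholders, not the fitted diabetes values. The diabetes features are mean-centered and scaled, so an input of 0.5 probably lies outside the observed range (compare `diabetes_X.min()` and `diabetes_X.max()`), which makes the notebook's prediction an extrapolation.

```python
import numpy as np

# Hypothetical fitted values; in the notebook these would be regr.coef_ and regr.intercept_
coef = np.array([900.0])
intercept = 150.0

x_new = np.array([[0.5]])           # one sample, one feature: shape (1, 1)
y_hat = x_new @ coef + intercept    # the linear rule a fitted linear model applies
print(y_hat)  # [600.]
```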

## Compute the mean squared error and the R² score

In [None]:
# The mean squared error
print("Mean squared error: %.2f"
      % np.mean((regr.predict(diabetes_X_test) - diabetes_y_test) ** 2))

# R² score (coefficient of determination): 1 is perfect prediction
print('R^2 score: %.2f' % regr.score(diabetes_X_test, diabetes_y_test))
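
For regressors, `score` returns $R^2 = 1 - SS_{res}/SS_{tot}$. Here $SS_{res}$ is the residual sum of squares and $SS_{tot}$ is the sum of squared deviations of the true values from their mean. The mean squared error is simply $SS_{res}$ divided by the number of samples. The hand-written versions below use small made-up arrays to make both formulas explicit. scikit-learn ships equivalents as `sklearn.metrics.mean_squared_error` and `sklearn.metrics.r2_score`.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared residuals."""
    return float(np.mean((y_true - y_pred) ** 2))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.0, 10.0])
print(mean_squared_error(y_true, y_pred))  # 0.625
print(r2_score(y_true, y_pred))            # 0.875
```

A model that always predicts the mean of `y_true` scores 0, and a worse model can score below 0, so R² is not a variance and is not bounded below by zero.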

## Plot the fitted line and test data

In [None]:

# Plot outputs
plt.scatter(diabetes_X_test, diabetes_y_test, color='black')
plt.plot(diabetes_X_test, regr.predict(diabetes_X_test), color='blue',
         linewidth=3)

plt.xticks(())
plt.yticks(())

plt.show()

## Exercise: Repeat the process, using the fourth column (index 3) of the data as the predictor

In [None]:

# 