# Linear Regression

In this example, we fit a linear regression to the first feature (age) of the diabetes dataset, following the approach in https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html. We split the data into training and test sets so that performance is measured on data the model has not seen.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

In [None]:
diabetes = datasets.load_diabetes()  # load dataset
diabetes_X = diabetes.data[:, 0]  # get first feature
X_train, X_test, y_train, y_test = train_test_split(diabetes_X,
                                                    diabetes.target,
                                                    test_size=0.20,
                                                    random_state=42,
                                                    shuffle=True)

In [None]:
X_train = X_train.reshape(-1, 1)
X_test = X_test.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)
y_train = y_train.reshape(-1, 1)

In [None]:
print("Shape of the training features is {}".format(X_train.shape))
print("Shape of the training labels is {}".format(y_train.shape))
print("Shape of the test labels is {}".format(y_test.shape))
print("Shape of the test features is {}".format(X_test.shape))

As you can see, train_test_split makes it easy to partition the dataset.
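
To make the partitioning step concrete, here is a minimal NumPy sketch of a shuffled 80/20 split. It is an illustration only, not scikit-learn's implementation; the helper name `shuffled_split` and the toy arrays are hypothetical, and the shuffled order will differ from what `train_test_split` produces for the same seed.

```python
import numpy as np


def shuffled_split(X, y, test_size=0.2, seed=42):
    """Hypothetical helper: shuffle the row indices, then cut off a test portion."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))           # random order of the row indices
    n_test = int(np.ceil(test_size * len(X)))   # number of rows held out for testing
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Toy data: 10 samples with one feature each (not the diabetes data).
X_toy = np.arange(10, dtype=float)
y_toy = 2.0 * X_toy + 1.0

Xtr, Xte, ytr, yte = shuffled_split(X_toy, y_toy)
print(Xtr.shape, Xte.shape, ytr.shape, yte.shape)
```

Features and labels are indexed with the same shuffled indices, so each sample stays paired with its target.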

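### Important note: train_test_split preserves the shape of its inputs, so selecting a single feature with diabetes.data[:, 0] yields 1D arrays. LinearRegression expects a 2D feature matrix of shape (n_samples, n_features), which is why the arrays are reshaped with reshape(-1, 1).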
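
The shape issue can be seen on a small array. The sketch below uses a hypothetical toy matrix (not the diabetes data) to show that indexing a column with an integer drops the feature axis, while indexing with a list or calling `reshape(-1, 1)` keeps a 2D column.

```python
import numpy as np

# Hypothetical toy feature matrix: 5 samples, 3 features.
data = np.arange(15, dtype=float).reshape(5, 3)

first_1d = data[:, 0]                 # integer index drops the column axis: shape (5,)
first_2d = data[:, [0]]               # list index keeps the column axis: shape (5, 1)
reshaped = first_1d.reshape(-1, 1)    # turn the 1D array back into a column: shape (5, 1)

print(first_1d.shape, first_2d.shape, reshaped.shape)
```

Selecting the feature with a list index up front is one way to avoid the later reshape of the feature arrays.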

In [None]:
# Create linear regression object
regr = linear_model.LinearRegression()

In [None]:
# Train model
regr.fit(X_train, y_train)
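
With a single feature and an intercept, ordinary least squares has a simple closed form: the slope is the covariance of x and y divided by the variance of x. The sketch below illustrates that formula on toy data; the helper `fit_simple_ols` is hypothetical and is not how scikit-learn computes the fit internally.

```python
import numpy as np


def fit_simple_ols(x, y):
    """Hypothetical helper: closed-form least squares for one feature plus intercept."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = x.mean(), y.mean()
    # Slope = sum of co-deviations / sum of squared deviations of x.
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The fitted line passes through the point of means.
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Toy data lying exactly on y = 2x + 1.
x_toy = np.array([0.0, 1.0, 2.0, 3.0])
y_toy = np.array([1.0, 3.0, 5.0, 7.0])

slope, intercept = fit_simple_ols(x_toy, y_toy)
print(slope, intercept)
```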

In [None]:
# Make predictions using test data
y_pred = regr.predict(X_test)

In [None]:
# Coefficients
print("Coefficients:", regr.coef_)

# Mean Squared Error
print("Mean Squared Error: %.2f" % mean_squared_error(y_test, y_pred))

# Coefficient of determination (R^2): 1 is a perfect fit
print("R^2 score: %.2f" % r2_score(y_test, y_pred))
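
For reference, both metrics can be written out by hand. The mean squared error is the average of the squared residuals, and the value returned by r2_score is the coefficient of determination, 1 - SS_res / SS_tot. The sketch below uses hypothetical helper names and toy values, not the diabetes predictions.

```python
import numpy as np


def mse_by_hand(y_true, y_hat):
    """Average of squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return np.mean((y_true - y_hat) ** 2)


def r2_by_hand(y_true, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y_true - y_hat) ** 2)            # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only.
y_true_toy = np.array([3.0, -0.5, 2.0, 7.0])
y_hat_toy = np.array([2.5, 0.0, 2.0, 8.0])

print("MSE: %.3f" % mse_by_hand(y_true_toy, y_hat_toy))
print("R^2: %.3f" % r2_by_hand(y_true_toy, y_hat_toy))
```

An R^2 of 1 means the predictions match the targets exactly, while 0 means the model does no better than always predicting the mean of the targets.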

In [None]:
plt.scatter(X_test, y_test, color="black")
plt.plot(X_test, y_pred, color="blue", linewidth=3)
plt.xlabel("Age (scaled)")  # feature 0 of the diabetes dataset is age
plt.ylabel("Disease progression")
plt.show()