### Module 3 Learning #2: Machine Learning using Linear Regression
In this activity, we will again fit a linear model to a set of data, but this time we will treat it as a machine learning method. The goal is to show that prediction using machine learning is very similar to statistical prediction using linear regression.<P>
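As a quick illustration of that similarity, here is a minimal sketch that fits the same Weight-to-Waist line two ways: with NumPy's classical least-squares polynomial fit (`np.polyfit` with degree 1) and with scikit-learn's `LinearRegression`. Both minimize the sum of squared errors on the same rows, so the slopes and intercepts should agree up to floating-point rounding. It uses all 20 rows and skips the train/test split, purely for comparison.

```python
import numpy as np
import pandas as pd
from sklearn import datasets, linear_model

# Load the Linnerud physiological measurements (Weight, Waist, Pulse)
data = datasets.load_linnerud()
df = pd.DataFrame(data=data.target, columns=data.target_names)
x = df['Weight'].to_numpy()
y = df['Waist'].to_numpy()

# Statistical approach: ordinary least squares via a degree-1 polynomial fit
# (np.polyfit returns the highest power first: [slope, intercept])
slope_stat, intercept_stat = np.polyfit(x, y, deg=1)

# Machine learning approach: scikit-learn's LinearRegression estimator
model = linear_model.LinearRegression()
model.fit(df[['Weight']], y)  # double brackets keep X as a 2D DataFrame
slope_ml = model.coef_[0]
intercept_ml = model.intercept_

print(f"polyfit:          y = {slope_stat:.4f} * x + {intercept_stat:.4f}")
print(f"LinearRegression: y = {slope_ml:.4f} * x + {intercept_ml:.4f}")
```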
    
We will use the following machine learning workflow to fit a linear model:<P>
1. Obtain and isolate the data
2. Split the data into training & test datasets
3. Format the data for the algorithm
4. Create the model and train it
5. Use the test set to measure the model's performance
6. Use the model to make new predictions


In [None]:
from sklearn import datasets, linear_model # Both the data & the linear model
from sklearn.metrics import mean_squared_error, r2_score # Some metrics
import pandas as pd # Used to store data in a DataFrame
import numpy as np # Used to create and reshape arrays
import matplotlib.pyplot as plt # Visualization module
%matplotlib inline

### 1. Obtain and isolate the data

In [None]:
# Same process as before
data = datasets.load_linnerud()
df = pd.DataFrame(data=data.target, columns=data.target_names)
print('Size of data (Rows,Cols):',df.shape)
df.head(4) # Show just the first 4

In [None]:
# Isolate two columns
X = df['Weight'] # We'll call this our independent variable. It is a pandas Series datatype.
y = df['Waist'] # This will be our dependent variable
# Plot the data using the matplotlib library
plt.scatter(X, y, c="blue")
plt.title("Weight vs. Waist")
plt.xlabel("Weight (lbs)")
plt.ylabel("Waist (inches)")
plt.show()

### 2. Split the data into training and test sets
In machine learning, we split our data into training and test sets. After we train the model on the training set, we use the test set to evaluate how well it performs on data it has not seen. This is a very common practice.<P>

In this example, we have only 20 rows of data, which is far too few for a reliable model, but it is enough to illustrate the idea.<P>

Let's put the first 15 rows (75%) in the training set and hold out the last 5 rows (25%) to evaluate how well the model performs.
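Taking the last five rows works here, but it assumes the rows are in no meaningful order. A common alternative (not the approach used in this activity) is scikit-learn's `train_test_split`, which shuffles the rows before splitting. This is a minimal sketch; the `random_state=42` value is an arbitrary choice that only makes the shuffle repeatable.

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split

data = datasets.load_linnerud()
df = pd.DataFrame(data=data.target, columns=data.target_names)
X = df[['Weight']]  # 2D DataFrame with one feature column
y = df['Waist']     # 1D Series target

# Hold out 25% of the rows (5 of 20) for testing, shuffling first
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print('Training rows:', len(X_train), ' Test rows:', len(X_test))
```

Because the rows are shuffled, the test set is no longer simply the last five rows, so results will differ from the slicing approach used below.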

In [None]:
# Split the data into training/testing sets
X_train = X.iloc[:-5] # All but the last 5 rows (15 rows of data)
X_test = X.iloc[-5:]  # Only the last 5 rows (5 rows of data)

# Split the targets into training/testing sets
y_train = y.iloc[:-5] # All but the last 5 rows
y_test = y.iloc[-5:]  # Only the last 5 rows

### 3. Format the data for the algorithm

Because we are using only one feature (Weight), this step needs a little extra data preparation.

The sklearn LinearRegression model expects the X data to be two-dimensional, with shape (n_samples, n_features): for example, a pandas DataFrame or a 2D NumPy array. Currently, the X data are pandas Series, which are one-dimensional.<P>

We will convert the X data into single-column DataFrames.
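To make the difference in shape concrete, this short sketch compares the 1D Series with three ways of getting the 2D, one-column form that scikit-learn expects: `to_frame()` (used in this activity), selecting the column with double brackets, and NumPy's `reshape(-1, 1)` on the underlying values, where `-1` tells NumPy to infer the number of rows.

```python
import pandas as pd
from sklearn import datasets

data = datasets.load_linnerud()
df = pd.DataFrame(data=data.target, columns=data.target_names)

series = df['Weight']                       # 1D Series: shape (20,)
frame_a = series.to_frame()                 # 2D DataFrame: shape (20, 1)
frame_b = df[['Weight']]                    # 2D DataFrame via double brackets
array_c = series.to_numpy().reshape(-1, 1)  # 2D NumPy array: shape (20, 1)

for name, obj in [('Series', series), ('to_frame()', frame_a),
                  ("df[['Weight']]", frame_b), ('reshape(-1, 1)', array_c)]:
    print(f'{name:>15}: {type(obj).__name__}, shape {obj.shape}')
```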

In [None]:
type(X_train)

In [None]:
# Prepare the data for sklearn's LinearRegression
# to_frame() turns each 1D pandas Series into a single-column DataFrame (a 2D structure)
# (An equivalent NumPy approach is numpy.reshape(): https://numpy.org/doc/stable/reference/generated/numpy.reshape.html)
X_train = X_train.to_frame()
X_test = X_test.to_frame()
type(X_train)

### 4. Create the model and train it

In [None]:
# Create linear regression object
model = linear_model.LinearRegression()
#
# Train the model using the training sets
model.fit(X_train, y_train)
#
# Pull out the slope and the intercept
x_coef = model.coef_.item(0)
intercept = model.intercept_.item(0)
print('The linear model has the equation:')
print(f'y = {x_coef} * x + {intercept}')
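
For a single feature, the least-squares line has a closed form: the slope is the sum of (x - mean(x)) * (y - mean(y)) divided by the sum of (x - mean(x))^2, and the intercept is mean(y) - slope * mean(x). The minimal sketch below recomputes both from the same 15 training rows and compares them with the model's `coef_` and `intercept_`. It is only a check of what `fit()` produces here, not a replacement for it.

```python
import numpy as np
import pandas as pd
from sklearn import datasets, linear_model

data = datasets.load_linnerud()
df = pd.DataFrame(data=data.target, columns=data.target_names)

# Same split as the activity: all but the last 5 rows for training
x_train = df['Weight'].iloc[:-5].to_numpy()
y_train = df['Waist'].iloc[:-5].to_numpy()

# Closed-form simple linear regression
x_mean, y_mean = x_train.mean(), y_train.mean()
numerator = np.sum((x_train - x_mean) * (y_train - y_mean))
denominator = np.sum((x_train - x_mean) ** 2)
slope = numerator / denominator
intercept = y_mean - slope * x_mean

# scikit-learn fit on the same rows (reshape to 2D for fit())
model = linear_model.LinearRegression()
model.fit(x_train.reshape(-1, 1), y_train)

print(f'closed form:      slope={slope:.4f}, intercept={intercept:.4f}')
print(f'LinearRegression: slope={model.coef_[0]:.4f}, intercept={model.intercept_:.4f}')
```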

### 5. Use the test set to measure the model's performance

In [None]:
# Make predictions of waist size using weights from the test dataset
y_pred = model.predict(X_test)
# Compare a few waist predictions to the actual waist measurements from the test data
for row in range(3):
    actual = y_test.iloc[row] # Actual waist for the same test row as the prediction
    print('Prediction:', y_pred.item(row), 'Actual:', actual, 'Error:', abs(y_pred.item(row) - actual))
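
To line up every prediction with its actual test value at once, the predictions can be placed in a DataFrame next to `y_test`. This sketch rebuilds the activity's split and model so it runs on its own; the column names `Actual`, `Predicted`, and `AbsError` are illustrative choices, not anything scikit-learn requires.

```python
import pandas as pd
from sklearn import datasets, linear_model

data = datasets.load_linnerud()
df = pd.DataFrame(data=data.target, columns=data.target_names)

# Same split as the activity: last 5 rows held out for testing
X_train, X_test = df[['Weight']].iloc[:-5], df[['Weight']].iloc[-5:]
y_train, y_test = df['Waist'].iloc[:-5], df['Waist'].iloc[-5:]

model = linear_model.LinearRegression().fit(X_train, y_train)

# One row per test example: weight, actual waist, predicted waist, absolute error
comparison = pd.DataFrame({
    'Weight': X_test['Weight'],
    'Actual': y_test,
    'Predicted': model.predict(X_test),
})
comparison['AbsError'] = (comparison['Predicted'] - comparison['Actual']).abs()
print(comparison)
```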

In [None]:
# Now, use the waist size prediction and the true waist size to see how well our model does
print("Coefficient of determination: %.2f" % r2_score(y_test, y_pred))
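
The coefficient of determination compares the model's squared errors with those of a baseline that always predicts the mean of the test targets: R^2 = 1 - SS_res / SS_tot. A value of 1 is a perfect fit, and the value can be negative when the model does worse than that baseline, which is easier to hit on a test set this small. As a sketch, the code below recomputes R^2 by hand and also reports the mean squared error with `mean_squared_error`, which the activity imports but does not otherwise use.

```python
import numpy as np
import pandas as pd
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

data = datasets.load_linnerud()
df = pd.DataFrame(data=data.target, columns=data.target_names)
X_train, X_test = df[['Weight']].iloc[:-5], df[['Weight']].iloc[-5:]
y_train, y_test = df['Waist'].iloc[:-5], df['Waist'].iloc[-5:]

model = linear_model.LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# R^2 by hand: 1 - (residual sum of squares / total sum of squares)
y_true = y_test.to_numpy()
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

mse = mean_squared_error(y_test, y_pred)  # average squared error, in inches^2
rmse = np.sqrt(mse)                       # square root brings it back to inches

print(f'R^2 (manual):   {r2_manual:.4f}')
print(f'R^2 (r2_score): {r2_score(y_test, y_pred):.4f}')
print(f'MSE: {mse:.4f}  RMSE: {rmse:.4f} inches')
```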

### 6. Use the model to make new predictions

In [None]:
# Use the model to predict my waist size
myWeight = pd.DataFrame({'Weight': [210]}) # 1 row, 1 column, named like the training feature
myPredictedWaist = model.predict(myWeight)
myTrueWaist = 36
print("My predicted waist is {:.2f} inches while my actual waist is {} inches, for an error of {:.2f} inches."
      .format(myPredictedWaist.item(0), myTrueWaist, myPredictedWaist.item(0) - myTrueWaist))
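
Under the hood, `predict()` evaluates the fitted line. The minimal sketch below predicts waist sizes for several new weights at once (150, 180, and 210 lbs are arbitrary example values) and checks that each prediction equals `coef_ * weight + intercept_`. Passing a DataFrame with a `Weight` column matches the feature name the model was trained on, which avoids the feature-name warning scikit-learn gives when a plain array is passed to a model fitted on a DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn import datasets, linear_model

data = datasets.load_linnerud()
df = pd.DataFrame(data=data.target, columns=data.target_names)
model = linear_model.LinearRegression().fit(df[['Weight']].iloc[:-5], df['Waist'].iloc[:-5])

# Hypothetical new people to predict for (weights in lbs)
new_weights = pd.DataFrame({'Weight': [150, 180, 210]})
predicted = model.predict(new_weights)

# The same numbers from the line equation y = coef * x + intercept
by_hand = model.coef_[0] * new_weights['Weight'].to_numpy() + model.intercept_

for w, p in zip(new_weights['Weight'], predicted):
    print(f'Weight {w} lbs -> predicted waist {p:.2f} inches')
```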

### Plot the test data and the fitted linear model

In [None]:
# Plot just the test data
plt.scatter(X_test, y_test, c="blue")
# Create evenly spaced weights (130 to 245 lbs in steps of 5) to span the x-axis
x_nums = pd.DataFrame({'Weight': np.arange(130, 250, 5)}) # Single-column DataFrame, like the training data
# For each of those weights, predict the waist size using the linear model
y_nums = model.predict(x_nums)
plt.plot(x_nums['Weight'], y_nums, c='red') # Fitted regression line
plt.title("Weight vs. Waist")
plt.xlabel("Weight (lbs)")
plt.ylabel("Waist (inches)")
plt.show()