### Module 3 Learning: Linear Regression as an example of Machine Learning
In this activity, you will become familiar with the concept of a mathematical model used to predict a value. This concept carries forward to the more sophisticated machine learning models we will use later to predict increasingly complex data.<P>
    
We might call this a "statistical" method. We will contrast it with the "machine learning" method in the next activity. <P>
    
References:<P>
- https://www.youtube.com/watch?v=b0L47BeklTE
- https://data36.com/linear-regression-in-python-numpy-polyfit/

In [None]:
from sklearn import datasets # We'll get a data set from this module
import pandas as pd # Used to store data in a DataFrame
import numpy as np # We will use NumPy's polyfit() to fit the linear model
import matplotlib.pyplot as plt # Visualization module
%matplotlib inline

### Get some data to work with

In [None]:
# Import an example dataset (the Linnerud dataset, not the diabetes dataset)
# From: https://scikit-learn.org/stable/datasets/toy_dataset.html
data = datasets.load_linnerud()
# Convert the target measurements (Weight, Waist, Pulse) to a pandas DataFrame
df = pd.DataFrame(data=data.target, columns=data.target_names)
print('Size of data (Rows,Cols):',df.shape)
df.head(4) # Show just the first 4

In [None]:
# Isolate two columns to which we will fit a model
X = df['Weight'] # We'll call this our independent variable. It is a pandas Series datatype.
y = df['Waist'] # This will be our dependent variable
# Print the first 3 rows of each Series
print(X.head(3))
print(y.head(3))

In [None]:
# Plot the data using the matplotlib library
plt.scatter(X, y, c ="blue")
plt.title("Weight vs. Waist")
plt.xlabel("Weight (lbs)")
plt.ylabel("Waist (inches)")
plt.show()

In [None]:
# Fit a model to the data
model = np.polyfit(X, y, 1) # polyfit() performs a least-squares fit; deg = 1 gives a straight line.
print('The linear model has the equation:')
print('y = ', model[0], '* x + ', model[1]) # model[0] is the slope (x coefficient), model[1] is the intercept (constant)
#print(model[0]) # x coefficient (slope)
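
For a straight line, the least-squares fit that `np.polyfit` performs has a simple closed form. The sketch below computes the slope and intercept by hand and compares them with `np.polyfit`. The five (weight, waist) pairs are hypothetical values made up for illustration; they are not taken from the Linnerud dataset.

```python
import numpy as np

# Hypothetical (weight, waist) pairs, made up for illustration only
x = np.array([150.0, 170.0, 190.0, 210.0, 230.0])
y = np.array([31.0, 33.5, 35.0, 37.5, 39.0])

# Closed-form least-squares estimates for a straight line:
#   slope     = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2)
#   intercept = mean(y) - slope * mean(x)
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# np.polyfit returns coefficients from highest degree to lowest,
# so for deg = 1 the result is [slope, intercept]
fit_slope, fit_intercept = np.polyfit(x, y, 1)

print("closed form:", slope, intercept)
print("np.polyfit: ", fit_slope, fit_intercept)
```

The two printed lines should agree up to floating-point rounding, which confirms the coefficient order: the first entry is the slope and the second is the intercept.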

In [None]:
# Use the model to predict my waist size
predict = np.poly1d(model) # poly1d() wraps the coefficients in a callable that predicts waist size from a given weight.
myWeight = 210
myWaist = predict(myWeight)
print("My predicted waist is {0:.2f} while my actual waist is 36.0 for an error of: {1:.2f} inches.".format(myWaist, myWaist - 36))

In [None]:
# Let's plot the predicted line
#
# Create an array of integers to plot on the x-axis
X_pred = np.arange(130, 250) # Integers from 130 up to (but not including) 250
# For each of those integers, predict the y-axis value using the linear model
y_pred = predict(X_pred) # Predicted waist for each weight
#
# Plot both the original data points and the predicted line
plt.scatter(X,y, c = 'blue') # Original data
plt.plot(X_pred,y_pred, c = 'red') # Fitted regression line
plt.title("Weight vs. Waist")
plt.xlabel("Weight (lbs)")
plt.ylabel("Waist (inches)")
plt.show()

In [None]:
# This isn't a perfect model, right?
# But, how good is it? Let's use a common metric called R^2
from sklearn.metrics import r2_score # import the coefficient of determination (R^2) from the sklearn library
# Calculate this metric using the original y and the predicted y
r2_score(y,predict(X))
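
To see what this metric measures, R^2 can also be computed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is a minimal illustration, not a replacement for `r2_score`. The helper name `r_squared` and the five data points are hypothetical and were made up for this example.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error left over after the fit
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of always predicting the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical (weight, waist) pairs, made up for illustration only
x = np.array([150.0, 170.0, 190.0, 210.0, 230.0])
y = np.array([31.0, 33.5, 35.0, 37.5, 39.0])

# Fit a straight line and score its predictions on the same data
line = np.poly1d(np.polyfit(x, y, 1))
r2 = r_squared(y, line(x))
print("R^2:", r2)
```

A value of 1 means the line passes through every point, while a value of 0 means the line does no better than always predicting the mean of y.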