# Optional Lab: Linear Regression using Scikit-Learn

[scikit-learn](https://scikit-learn.org/stable/index.html) is an open-source, commercially usable machine learning toolkit. It contains implementations of many of the algorithms that you will work with in this course.



## Tools
You will use functions from scikit-learn as well as Matplotlib and NumPy.

In [8]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
from lab_utils_multi import load_house_data
from lab_utils_common import dlc
np.set_printoptions(precision=3)
plt.style.use('./deeplearning.mplstyle')

### Load the data set

In [9]:
X_train, y_train = load_house_data()
X_features = ['size(sqft)','bedrooms','floors','age']

### Scale/normalize the training data

In [10]:
scaler = StandardScaler()
X_norm = scaler.fit_transform(X_train)
print(f"Peak to Peak range by column in Raw        X:{np.ptp(X_train,axis=0)}")   
print(f"Peak to Peak range by column in Normalized X:{np.ptp(X_norm,axis=0)}")

Peak to Peak range by column in Raw        X:[2.406e+03 4.000e+00 1.000e+00 9.500e+01]
Peak to Peak range by column in Normalized X:[5.845 6.135 2.056 3.685]
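With its default settings, `StandardScaler` applies z-score normalization: it subtracts each column's mean and divides by that column's standard deviation. This is why the raw ranges above, which span from 1 to about 2400, shrink to single digits. The sketch below reproduces the idea with plain NumPy on a small made-up array; `X_toy` and its values are hypothetical and are not the house data.

```python
import numpy as np

# Hypothetical toy data: 4 examples, 2 features on very different scales.
X_toy = np.array([[1000.0, 2.0],
                  [1500.0, 3.0],
                  [2000.0, 3.0],
                  [2500.0, 4.0]])

mu = X_toy.mean(axis=0)      # per-column mean
sigma = X_toy.std(axis=0)    # per-column population standard deviation (ddof=0)
X_toy_norm = (X_toy - mu) / sigma

print(f"Peak to Peak range by column in Raw        X:{np.ptp(X_toy, axis=0)}")
print(f"Peak to Peak range by column in Normalized X:{np.ptp(X_toy_norm, axis=0)}")
```

After this transformation every column has mean 0 and standard deviation 1, so no single feature dominates the gradient updates.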


### Create and fit the regression model

In [11]:
sgdr = SGDRegressor(max_iter=1000)
sgdr.fit(X_norm, y_train)
print(sgdr)
print(f"number of iterations completed: {sgdr.n_iter_}, number of weight updates: {sgdr.t_}")

SGDRegressor()
number of iterations completed: 111, number of weight updates: 10990.0
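The fit stopped after 111 passes over the data, well short of `max_iter=1000`, which means the stopping criterion was reached first. The two printed counters are related: the scikit-learn documentation describes `t_` as `n_iter_ * n_samples + 1`. Assuming that relationship, the numbers above imply the size of the training set. This is an inference from the printed output, not something stated in the lab.

```python
n_iter = 111   # epochs reported above (sgdr.n_iter_)
t = 10990      # weight updates reported above (sgdr.t_)

# Assumes the documented relationship t_ = n_iter_ * n_samples + 1.
n_samples = (t - 1) // n_iter
print(f"implied number of training examples: {n_samples}")
```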


### View parameters
Note that the parameters apply to the *normalized* input data. The fitted parameters are very close to those found in the previous lab with this data.

In [12]:
b_norm = sgdr.intercept_
w_norm = sgdr.coef_
print(f"Intercept (bias) in Normalized space: {b_norm}")
print(f"Weights in Normalized space: {w_norm}")

Intercept (bias) in Normalized space: [363.149]
Weights in Normalized space: [109.912 -20.938 -32.345 -38.078]
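Because these weights multiply normalized features, they cannot be applied to raw inputs directly. Substituting $x_{norm} = (x - \mu)/\sigma$ into $w_{norm} \cdot x_{norm} + b_{norm}$ gives equivalent raw-space parameters $w_{raw} = w_{norm}/\sigma$ and $b_{raw} = b_{norm} - \sum_j w_{norm,j}\,\mu_j/\sigma_j$. The sketch below checks this on made-up values; every name ending in `_toy`, and the numbers themselves, are hypothetical.

```python
import numpy as np

# Hypothetical toy values, not the lab's data or fitted parameters.
X_toy = np.array([[1000.0, 2.0],
                  [1500.0, 3.0],
                  [2000.0, 3.0],
                  [2500.0, 4.0]])
mu, sigma = X_toy.mean(axis=0), X_toy.std(axis=0)
X_toy_norm = (X_toy - mu) / sigma

w_norm_toy = np.array([110.0, -20.0])   # weights in normalized space
b_norm_toy = 360.0                      # bias in normalized space

# Fold the normalization into the parameters.
w_raw = w_norm_toy / sigma
b_raw = b_norm_toy - np.sum(w_norm_toy * mu / sigma)

pred_norm = X_toy_norm @ w_norm_toy + b_norm_toy   # normalized inputs, normalized parameters
pred_raw = X_toy @ w_raw + b_raw                   # raw inputs, raw-space parameters
print(np.allclose(pred_norm, pred_raw))
```

In the lab itself, the corresponding $\mu$ and $\sigma$ are stored on the fitted scaler as `scaler.mean_` and `scaler.scale_`.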


### Make predictions
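Predict the price of a new example with the `predict` routine. Because the model was fit on normalized data, the input must first be transformed with the same fitted `scaler`. The same prediction can also be computed directly from $w$ and $b$.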

In [13]:
# predict the price of a 1650 sqft, 3 bedroom, 2 floor, 18 year old house
X_test = np.array([1650,3,2,18]).reshape(1,-1)
X_test_norm = scaler.transform(X_test)
y_pred = sgdr.predict(X_test_norm)
print("Predicted price of a 1650 sqft, 3 bedroom, "+
      f"2 floor, 18 year old house: ${y_pred[0]*1000:.2f}")

Predicted price of a 1650 sqft, 3 bedroom, 2 floor, 18 year old house: $405046.17
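For a linear model, `predict` amounts to the dot product of the normalized features with the weights, plus the bias. The sketch below does that computation by hand using the parameters printed earlier. The feature row `x_norm_example` is a hypothetical, already-normalized input chosen for illustration; it is not the scaled version of the 1650 sqft house, so the result will differ from the price above.

```python
import numpy as np

# Parameters copied from the printed output above (normalized space).
w_printed = np.array([109.912, -20.938, -32.345, -38.078])
b_printed = 363.149

# Hypothetical, already-normalized feature row (NOT the scaled 1650 sqft house).
x_norm_example = np.array([[0.5, 0.0, 1.0, -0.5]])

# Manual prediction: f(x) = w . x + b, in thousands of dollars.
y_manual = np.dot(x_norm_example, w_printed) + b_printed
print(f"Manual prediction: ${y_manual[0]*1000:.2f}")
```

In the notebook, replacing `x_norm_example` with `X_test_norm` and the copied parameters with `w_norm` and `b_norm` should match `sgdr.predict(X_test_norm)`.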
