# Module 15 - Gradient Descent
**Author** - Jacob Buysse

* This notebook explores the math behind Gradient Descent, applied to Linear Regression.
* We will train a model to predict house prices from a couple of size features of the house (average number of rooms and bedrooms).
* We will compare several learning rates.
* We will compare the results of our implementation with scikit-learn's `LinearRegression`.

In this notebook we will be using the following libraries:

In [121]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression

Let us configure our graphs for font size, high DPI, and automatic layout.

In [2]:
matplotlib.rc('axes', labelsize=16)
matplotlib.rc('figure', dpi=150, autolayout=True)

## Dataset

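We use the California Housing dataset bundled with scikit-learn. We add a constant `bias` column of ones so the intercept can be learned as an ordinary weight, and store the target (median house value, in hundreds of thousands of dollars) as `price`.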

In [71]:
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing(as_frame=True)
df = pd.DataFrame(data=housing.data, columns=housing.feature_names)
df['bias'] = 1
df['price'] = housing.target
df

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | bias | price |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.023810 | 322.0 | 2.555556 | 37.88 | -122.23 | 1 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.971880 | 2401.0 | 2.109842 | 37.86 | -122.22 | 1 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.802260 | 37.85 | -122.24 | 1 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 1 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 1 | 3.422 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 20635 | 1.5603 | 25.0 | 5.045455 | 1.133333 | 845.0 | 2.560606 | 39.48 | -121.09 | 1 | 0.781 |
| 20636 | 2.5568 | 18.0 | 6.114035 | 1.315789 | 356.0 | 3.122807 | 39.49 | -121.21 | 1 | 0.771 |
| 20637 | 1.7000 | 17.0 | 5.205543 | 1.120092 | 1007.0 | 2.325635 | 39.43 | -121.22 | 1 | 0.923 |
| 20638 | 1.8672 | 18.0 | 5.329513 | 1.171920 | 741.0 | 2.123209 | 39.43 | -121.32 | 1 | 0.847 |


In [98]:
X = df[['bias', 'AveRooms', 'AveBedrms']].values
y = df.price.values
print(f"X is {X.shape}, y is {y.shape}")

X is (20640, 3), y is (20640,)


In [99]:
m = y.shape[0]
n = X.shape[1]
print(f"{m:,} samples, {n} features")

20,640 samples, 3 features


In [116]:
def GetCost(X, y, w):
    # Using MSE (Mean Squared Error) with the 1/2 factor for clean derivatives
    predictions = np.matmul(X, w)
    errors = predictions - y
    # np.sum is vectorized; Python's built-in sum would loop element by element
    return np.sum(np.square(errors)) / (2 * len(y))
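As a worked equation, `GetCost` computes the cost below, where $m$ is the number of samples, $x^{(i)}$ is the $i$-th row of $X$, and $w$ is the weight vector. Differentiating with respect to $w$ gives the gradient that `GradientDescent` uses; the factor of $\tfrac{1}{2}$ cancels the 2 produced by the chain rule.

$$
J(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( x^{(i)} w - y^{(i)} \right)^2 = \frac{1}{2m} \lVert Xw - y \rVert^2
$$

$$
\nabla_w J(w) = \frac{1}{m} X^\top (Xw - y), \qquad w \leftarrow w - \alpha \, \nabla_w J(w)
$$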

In [115]:
def GradientDescent(X, y, w, alpha, iterations):
    history = []
    for index in range(iterations):
        # Our predictions are just the product of our samples X times our weight vector
        predictions = np.matmul(X, w)
        # Our errors are the difference between our predictions and our truth values y
        errors = predictions - y
        # The gradient of the cost with respect to the weights is
        # (1/m) * X^T (Xw - y): the transpose of X (n x m) times the errors
        # (m x 1) gives an n x 1 vector, which we then scale down by the
        # number of samples m.
        gradient = (1 / len(y)) * np.matmul(X.T, errors)
        # Change the weights by the learning rate alpha times the gradient
        w = w - alpha * gradient
        # Determine the new cost
        cost = GetCost(X, y, w)
        # Keep track of the cost history so we can plot the progress of the algorithm
        history.append(cost)
    # Return the final weights and the cost history
    return w, history
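A common way to confirm that a hand-derived gradient is correct is to compare it against central finite differences. The sketch below does this on a small synthetic problem (hypothetical random data, not the housing set); the helper names `cost`, `analytic_gradient`, and `numeric_gradient` are illustrative and mirror `GetCost` and the gradient line in `GradientDescent`.

```python
import numpy as np

def cost(X, y, w):
    # Same MSE-with-1/2 cost as GetCost, written with np.mean
    errors = X @ w - y
    return np.mean(errors ** 2) / 2

def analytic_gradient(X, y, w):
    # (1/m) * X^T (Xw - y), the same expression GradientDescent uses each step
    return X.T @ (X @ w - y) / len(y)

def numeric_gradient(X, y, w, eps=1e-6):
    # Central finite differences: nudge one weight at a time in both directions
    grad = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (cost(X, y, w + step) - cost(X, y, w - step)) / (2 * eps)
    return grad

# Hypothetical synthetic problem: a bias column plus two random features
rng = np.random.default_rng(0)
X_demo = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y_demo = rng.normal(size=50)
w_demo = rng.normal(size=3)

max_diff = np.max(np.abs(analytic_gradient(X_demo, y_demo, w_demo)
                         - numeric_gradient(X_demo, y_demo, w_demo)))
print(f"max |analytic - numeric| = {max_diff:.2e}")
```

A difference many orders of magnitude smaller than the gradient itself suggests the derivation is right; a sign error or a missing factor of $m$ would show up immediately.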

In [117]:
# Execute our algorithm starting with random weights, a learning rate of 0.01, and iterating 1000 times
w, history = GradientDescent(X, y, np.random.rand(n), 0.01, 1000)
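To compare learning rates, one option is to run the same update from a fixed starting point with several values of `alpha` and look at the final cost. The sketch below assumes a hypothetical synthetic dataset generated from known weights, and a zero starting vector instead of `np.random.rand` so the runs are directly comparable; `gradient_descent` is an illustrative re-implementation of the notebook's `GradientDescent`.

```python
import numpy as np

def gradient_descent(X, y, w, alpha, iterations):
    # Same update rule as GradientDescent, using the @ operator
    history = []
    m = len(y)
    for _ in range(iterations):
        errors = X @ w - y
        w = w - alpha * (X.T @ errors) / m
        history.append(np.mean((X @ w - y) ** 2) / 2)
    return w, history

# Hypothetical synthetic data: y = 2 + 0.5*x1 - 1*x2 + small noise
rng = np.random.default_rng(42)
X_demo = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y_demo = X_demo @ np.array([2.0, 0.5, -1.0]) + rng.normal(scale=0.1, size=200)

results = {}
for alpha in [0.001, 0.01, 0.1]:
    w0 = np.zeros(X_demo.shape[1])  # fixed start so the runs are comparable
    w_fit, hist = gradient_descent(X_demo, y_demo, w0, alpha, 500)
    results[alpha] = hist[-1]
    print(f"alpha={alpha:<6} final cost={hist[-1]:.5f}")
```

On well-scaled data like this, a larger (but still stable) `alpha` reaches a lower cost in the same number of iterations; an `alpha` that is too large makes the cost grow instead of shrink.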

In [119]:
print(f"Final weights: {w}")

Final weights: [ 1.37068562  0.13861639 -0.10325765]


In [113]:
history[:10]

[np.float64(1.008199472555868),
 np.float64(0.8794192217735854),
 np.float64(0.8293712967487973),
 np.float64(0.8097959847385631),
 np.float64(0.8020150640957192),
 np.float64(0.7987993994609405),
 np.float64(0.7973510579610879),
 np.float64(0.7965871392595496),
 np.float64(0.7960885214454891),
 np.float64(0.79569298840585)]
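The cost history above is still decreasing after ten iterations, so a fixed iteration count may stop too early or run longer than needed. A minimal sketch of an alternative, assuming a hypothetical tolerance `tol`: stop once the cost improves by less than `tol` between iterations. Unlike `GradientDescent`, this variant also records the cost at the starting weights. The data is hypothetical and synthetic.

```python
import numpy as np

def gradient_descent_tol(X, y, w, alpha, max_iterations=10_000, tol=1e-9):
    """Hypothetical variant of GradientDescent that stops once the cost
    improves by less than `tol` between consecutive iterations."""
    m = len(y)
    history = [np.mean((X @ w - y) ** 2) / 2]  # cost at the starting weights
    for _ in range(max_iterations):
        w = w - alpha * (X.T @ (X @ w - y)) / m
        history.append(np.mean((X @ w - y) ** 2) / 2)
        # Stop when the last step barely changed the cost
        if history[-2] - history[-1] < tol:
            break
    return w, history

# Hypothetical synthetic data: y = 1 + 3*x1 - 2*x2 + small noise
rng = np.random.default_rng(1)
X_demo = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y_demo = X_demo @ np.array([1.0, 3.0, -2.0]) + rng.normal(scale=0.1, size=100)
w_fit, hist = gradient_descent_tol(X_demo, y_demo, np.zeros(3), alpha=0.1)
print(f"stopped after {len(hist) - 1} iterations, final cost {hist[-1]:.5f}")
```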

In [120]:
# Compare one predicted price with the actual value to see whether we are in the ballpark
print(f"Predicted {np.matmul(X[0], w)}, actual {y[0]}")

Predicted 2.2330839179266744, actual 4.526
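A single row says little about the overall fit. A minimal sketch of two whole-dataset metrics follows; the helper names `rmse` and `r_squared` are hypothetical, and with the notebook's variables they would be called as `rmse(X, y, w)`. For the scikit-learn model, `model.score(X, y)` reports R² with the same definition.

```python
import numpy as np

def rmse(X, y, w):
    # Root mean squared error of the linear predictions X @ w
    return np.sqrt(np.mean((X @ w - y) ** 2))

def r_squared(X, y, w):
    # Fraction of the variance of y explained by the predictions
    residual = np.sum((y - X @ w) ** 2)
    total = np.sum((y - np.mean(y)) ** 2)
    return 1 - residual / total

# Tiny hypothetical example: a perfect fit gives RMSE 0 and R^2 1
X_demo = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y_demo = np.array([1.0, 3.0, 5.0])
w_demo = np.array([1.0, 2.0])
print(rmse(X_demo, y_demo, w_demo), r_squared(X_demo, y_demo, w_demo))  # prints: 0.0 1.0
```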


In [125]:
model = LinearRegression(fit_intercept=False, n_jobs=-1)
model.fit(X, y)
print(f"Linear Regression parameters: {model.coef_}")

Linear Regression parameters: [ 2.01051376  0.31729619 -1.51782097]
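These weights differ noticeably from the gradient descent result. Because the cost is convex with a single minimizer, the difference means 1,000 steps at `alpha = 0.01` had not yet converged; a plausible (unverified) contributor is that `AveRooms` and `AveBedrms` are strongly correlated, which slows gradient descent. The sketch below, on hypothetical synthetic data, shows that when gradient descent is run long enough it agrees with a direct least-squares solve (`np.linalg.lstsq`) and with `LinearRegression`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: bias column plus two random features
rng = np.random.default_rng(7)
X_demo = np.column_stack([np.ones(300), rng.normal(size=(300, 2))])
y_demo = X_demo @ np.array([0.5, 1.5, -0.7]) + rng.normal(scale=0.2, size=300)

# Direct least-squares solve of min ||X w - y||^2, no iteration
w_exact, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

# scikit-learn, with the intercept handled by the explicit column of ones
model_demo = LinearRegression(fit_intercept=False).fit(X_demo, y_demo)

# Plain gradient descent, run long enough to converge on this easy problem
w_gd = np.zeros(3)
for _ in range(5000):
    w_gd -= 0.1 * X_demo.T @ (X_demo @ w_gd - y_demo) / len(y_demo)

print("lstsq:  ", w_exact)
print("sklearn:", model_demo.coef_)
print("GD:     ", w_gd)
```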


In [128]:
pred = model.predict(X)
print(f"Predicted {pred[0]}, actual {y[0]}")

Predicted 2.6725910960546946, actual 4.526
