Hands-on Assignment: Implementing Linear Regression with Gradient Descent — [CommonLounge](https://www.commonlounge.com/discussion/766ff7abcdce44d28de929e15e8ff4fa)

In [0]:
## Import libraries
 
import numpy as np
from sklearn import datasets, linear_model, metrics

# Dataset

In [0]:
## Load the diabetes dataset
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data # matrix of dimensions 442x10
 
# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
 
# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]
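The slicing above keeps everything except the last 20 rows for training and holds out the last 20 rows for testing. With 442 patients, that gives 422 training rows and 20 test rows. The minimal sketch below checks this on a stand-in array of the same shape. `X_demo` and `y_demo` are hypothetical names with arbitrary values, not the real diabetes data.

```python
import numpy as np

# Stand-in for diabetes.data / diabetes.target: same shape, arbitrary values.
X_demo = np.arange(442 * 10, dtype=float).reshape(442, 10)
y_demo = np.arange(442, dtype=float)

# All rows except the last 20 go to training ...
X_train_demo, y_train_demo = X_demo[:-20], y_demo[:-20]
# ... and the last 20 rows are held out for testing.
X_test_demo, y_test_demo = X_demo[-20:], y_demo[-20:]

print(X_train_demo.shape, X_test_demo.shape)  # (422, 10) (20, 10)

# The two parts are disjoint and together cover every row exactly once.
assert np.array_equal(np.vstack([X_train_demo, X_test_demo]), X_demo)
```

This split is deterministic: the same 20 patients are always held out, and no shuffling takes place.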

# Implementing Linear Regression with Gradient Descent

In [44]:
diabetes.keys()

dict_keys(['data', 'target', 'DESCR', 'feature_names', 'data_filename', 'target_filename'])

In [45]:
print(diabetes.DESCR)

.. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

  :Number of Instances: 442

  :Number of Attributes: First 10 columns are numeric predictive values

  :Target: Column 11 is a quantitative measure of disease progression one year after baseline

  :Attribute Information:
      - Age
      - Sex
      - Body mass index
      - Average blood pressure
      - S1
      - S2
      - S3
      - S4
      - S5
      - S6

Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times `n_samples` (i.e. the sum of squares of each column totals 1).

Source URL:
https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html

For more information see:
Bra
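The note above says that each feature column was mean-centered and rescaled so that its sum of squares is 1. The sketch below shows the arithmetic on synthetic raw values, which stand in for the real patient measurements. It shows that a sum of squares of exactly 1 requires dividing by the (population) standard deviation times the *square root* of `n_samples`. Dividing by the standard deviation times `n_samples` itself would leave a sum of squares of `1 / n_samples`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "raw" features (assumption: NOT the real diabetes measurements).
raw = rng.normal(loc=50.0, scale=10.0, size=(442, 10))

n_samples = raw.shape[0]
centered = raw - raw.mean(axis=0)

# np.std uses ddof=0 (population standard deviation) by default.
scaled = centered / (raw.std(axis=0) * np.sqrt(n_samples))
# Alternative divisor: standard deviation times n_samples (no square root).
scaled_by_n = centered / (raw.std(axis=0) * n_samples)

print(np.allclose(scaled.mean(axis=0), 0.0))                          # True
print(np.allclose((scaled ** 2).sum(axis=0), 1.0))                    # True
print(np.allclose((scaled_by_n ** 2).sum(axis=0), 1.0 / n_samples))   # True
```

Because every column has unit norm, individual feature values are small (roughly on the order of 1/sqrt(442) ≈ 0.05). The fitted coefficients shown further down are correspondingly large.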

In [92]:
# train
X = diabetes_X_train
y = diabetes_y_train

# train: init
W = np.random.uniform(low=-0.1, high=0.1, size=X.shape[1])  # one weight per feature
b = 0.0
 
learning_rate = 0.1
epochs = 100000
N = len(X)
# train: gradient descent
for i in range(epochs):
    # calculate predictions
    y_predict = X.dot(W) + b
 
    # calculate error and cost (mean squared error)
    error = y - y_predict
    mean_squared_error = np.mean(np.power(error,2))
 
    # calculate gradients of J = MSE / 2 (the 1/2 cancels the 2 from differentiating the square)
    w_gradient = -(1.0/N) * error.dot(X)
    b_gradient = -(1.0/N) * np.sum(error)
 
    # update parameters
    W = W - learning_rate * w_gradient
    b = b - learning_rate * b_gradient
 
    # diagnostic output
    if i % 5000 == 0: print("Epoch %d: %f" % (i, mean_squared_error))

Epoch 0: 29468.870364
Epoch 5000: 3048.219578
Epoch 10000: 2941.418070
Epoch 15000: 2927.458826
Epoch 20000: 2924.753127
Epoch 25000: 2923.795458
Epoch 30000: 2923.195599
Epoch 35000: 2922.694243
Epoch 40000: 2922.231023
Epoch 45000: 2921.789131
Epoch 50000: 2921.362955
Epoch 55000: 2920.950110
Epoch 60000: 2920.549251
Epoch 65000: 2920.159430
Epoch 70000: 2919.779892
Epoch 75000: 2919.410006
Epoch 80000: 2919.049225
Epoch 85000: 2918.697073
Epoch 90000: 2918.353135
Epoch 95000: 2918.017040
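The loop uses `w_gradient = -(1/N) * error.dot(X)`. This is the gradient of half the mean squared error, not of the MSE itself. With $e = y - XW - b$:

$$J(W, b) = \frac{1}{2N}\sum_{i=1}^{N}\left(y_i - x_i^\top W - b\right)^2,\qquad \frac{\partial J}{\partial W} = -\frac{1}{N}X^\top e,\qquad \frac{\partial J}{\partial b} = -\frac{1}{N}\sum_{i=1}^{N} e_i$$

Since $\eta\,\nabla\!\left(\tfrac{1}{2}\mathrm{MSE}\right) = \tfrac{\eta}{2}\,\nabla\,\mathrm{MSE}$, the loop is equivalent to descending on the MSE with half the stated learning rate. A common way to confirm analytic gradients is a central finite-difference check. The sketch below runs one on a small random problem. The `_gc` names are hypothetical and chosen so the check does not overwrite the notebook's `X`, `W`, `b` or `N`.

```python
import numpy as np

rng = np.random.default_rng(42)
n_gc, d_gc = 50, 4
X_gc = rng.normal(size=(n_gc, d_gc))
y_gc = rng.normal(size=n_gc)
W_gc = rng.normal(size=d_gc)
b_gc = 0.3

def half_mse(W, b):
    """J(W, b) = (1 / 2N) * sum of squared errors = MSE / 2."""
    err = y_gc - (X_gc.dot(W) + b)
    return 0.5 * np.mean(err ** 2)

# Analytic gradients, written the same way as in the training loop.
error_gc = y_gc - (X_gc.dot(W_gc) + b_gc)
w_gradient_gc = -(1.0 / n_gc) * error_gc.dot(X_gc)
b_gradient_gc = -(1.0 / n_gc) * np.sum(error_gc)

# Central finite differences, one coordinate of W at a time.
eps = 1e-6
basis = np.eye(d_gc)
w_numeric = np.array([
    (half_mse(W_gc + eps * basis[j], b_gc) - half_mse(W_gc - eps * basis[j], b_gc)) / (2 * eps)
    for j in range(d_gc)
])
b_numeric = (half_mse(W_gc, b_gc + eps) - half_mse(W_gc, b_gc - eps)) / (2 * eps)

print(np.allclose(w_gradient_gc, w_numeric, atol=1e-6))  # True
print(np.isclose(b_gradient_gc, b_numeric, atol=1e-6))   # True
```

Swapping `half_mse` for the plain MSE would make the numeric gradients exactly twice the analytic ones. This is a quick way to spot which cost a given gradient formula belongs to.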


# Evaluate on the test set (mean squared error)

In [93]:
# test
X = diabetes_X_test
y = diabetes_y_test
 
# calculate predictions + calculate error and cost (same code as above)
y_predict = X.dot(W) + b
error = y - y_predict
mean_squared_error = np.mean(np.power(error,2))
print('Coefficients: \n', W)
print("Mean squared error: %.2f" % mean_squared_error)
print("="*120)

Coefficients: 
 [   3.66171929 -234.66436592  519.39535678  325.58171667 -176.10723917
  -16.44047148 -180.05906974  108.06179546  502.78362705   78.96999074]
Mean squared error: 1994.64
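The training MSE above is still falling slowly at epoch 95000, so the loop may not have fully reached the least-squares optimum. A useful sanity check is to compare gradient descent against the closed-form ordinary least squares solution. Here that solution comes from `np.linalg.lstsq`, with a column of ones appended for the intercept. `sklearn.linear_model.LinearRegression`, already imported at the top, fits the same model and could serve as the reference instead. The sketch below uses synthetic data, with names, sizes, and the epoch count chosen for illustration, rather than the diabetes split. It reuses the same update rule and learning rate as the training loop.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X_syn = rng.normal(size=(n, d))
true_W = np.array([2.0, -1.0, 0.5])
y_syn = X_syn.dot(true_W) + 4.0 + 0.1 * rng.normal(size=n)  # true intercept 4.0

def mse(y_true, y_pred):
    """Mean squared error, as printed in the notebook cells above."""
    return np.mean((y_true - y_pred) ** 2)

# Closed form: append a column of ones so the intercept is the last coefficient.
A = np.hstack([X_syn, np.ones((n, 1))])
coef, *_ = np.linalg.lstsq(A, y_syn, rcond=None)
W_ls, b_ls = coef[:-1], coef[-1]

# Same gradient-descent update as the training loop (gradient of MSE / 2).
W_gd, b_gd = np.zeros(d), 0.0
learning_rate = 0.1
for _ in range(2000):
    error = y_syn - (X_syn.dot(W_gd) + b_gd)
    W_gd -= learning_rate * (-(1.0 / n) * error.dot(X_syn))
    b_gd -= learning_rate * (-(1.0 / n) * np.sum(error))

print(np.allclose(W_gd, W_ls, atol=1e-6), np.isclose(b_gd, b_ls, atol=1e-6))
print(mse(y_syn, X_syn.dot(W_gd) + b_gd), mse(y_syn, A.dot(coef)))
```

On well-conditioned synthetic features like these, gradient descent matches the closed form within a few thousand steps. Unit-norm columns such as the diabetes features make each step much smaller relative to the optimum, which is consistent with the slow tail in the training log.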
