**Exercise 4: Gradient Descent for Linear Regression**

*CPSC 381/581: Machine Learning*

*Yale University*

*Instructor: Alex Wong*


**Prerequisites**:

1. Enable Google Colaboratory as an app on your Google Drive account

2. Create a new Google Colab notebook. This will also create a "Colab Notebooks" directory under "MyDrive", i.e.
```
/content/drive/MyDrive/Colab Notebooks
```

3. Create the following directory structure in your Google Drive
```
/content/drive/MyDrive/Colab Notebooks/CPSC 381-581: Machine Learning/Exercises
```

4. Move 04_exercise_gradient_descent.ipynb into
```
/content/drive/MyDrive/Colab Notebooks/CPSC 381-581: Machine Learning/Exercises
```
so that its absolute path is
```
/content/drive/MyDrive/Colab Notebooks/CPSC 381-581: Machine Learning/Exercises/04_exercise_gradient_descent.ipynb
```

In this exercise, we will fit a linear function to regression targets using gradient descent on the mean squared and half mean squared losses. We will test both on the Diabetes and California housing price datasets.
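
For reference, the two losses and their gradients can be written out as below. This is a sketch that assumes the shapes used in the code of this notebook (a d x 1 weight vector w, feature vectors x^n stored as the columns of a d x N matrix, and scalar targets y^n); the symbol names are chosen here for illustration and may differ from the lecture notation.

```latex
\ell_{\text{MSE}}(w) = \frac{1}{N} \sum_{n=1}^{N} \left( w^\top x^{n} - y^{n} \right)^2,
\qquad
\nabla_w \ell_{\text{MSE}}(w) = \frac{2}{N} \sum_{n=1}^{N} \left( w^\top x^{n} - y^{n} \right) x^{n}

\ell_{\text{half}}(w) = \frac{1}{2N} \sum_{n=1}^{N} \left( w^\top x^{n} - y^{n} \right)^2,
\qquad
\nabla_w \ell_{\text{half}}(w) = \frac{1}{N} \sum_{n=1}^{N} \left( w^\top x^{n} - y^{n} \right) x^{n}

w^{(t+1)} = w^{(t)} - \alpha \, \nabla_w \ell\left( w^{(t)} \right)
```

Since the half mean squared gradient is exactly half of the mean squared gradient, training on the half mean squared loss with learning rate 2α follows the same weight trajectory as training on the mean squared loss with learning rate α, given the same initial weights.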


**Submission**:

1. Implement all TODOs in the code blocks below.

2. Report your training, validation, and testing scores.

```
Report training, validation, and testing scores here.

For full credit, the mean squared error scores of your models optimized with the mean_squared and half_mean_squared losses on the Diabetes dataset should be no more than 15% worse than the mean squared error scores achieved by scikit-learn's linear regression model across the training, validation, and testing splits. Your mean squared error scores on the California housing price dataset should be no more than 20% worse.
```

3. List any collaborators.

```
Collaborators: Doe, Jane (Please write names in <Last Name, First Name> format)

Collaboration details: Discussed ... implementation details with Jane Doe.
```
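
The full-credit criterion in item 2 can be checked numerically. The sketch below assumes that "no more than 15% worse" means the relative increase in mean squared error over the scikit-learn score; that reading, and the helper names `relative_mse_gap` and `within_tolerance`, are illustrative assumptions rather than part of the assignment.

```python
def relative_mse_gap(mse_ours, mse_reference):
    """Fractional amount by which mse_ours exceeds mse_reference (hypothetical helper)."""
    return (mse_ours - mse_reference) / mse_reference


def within_tolerance(mse_ours, mse_reference, tolerance):
    """True if mse_ours is at most `tolerance` (e.g. 0.15) worse than mse_reference."""
    return relative_mse_gap(mse_ours, mse_reference) <= tolerance


# Training-set scores on Diabetes taken from the printed results in this notebook:
# ours (mean_squared loss) vs. scikit-learn
diabetes_train_ok = within_tolerance(25924.6045, 25924.5928, tolerance=0.15)
print(diabetes_train_ok)
```

The same check would be repeated for the validation and testing splits, with a tolerance of 0.20 for the California housing price dataset.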

Import packages

In [9]:
import numpy as np
import sklearn.datasets as skdata
import sklearn.metrics as skmetrics
from sklearn.linear_model import LinearRegression as LinearRegressionSciKit
import warnings

warnings.filterwarnings(action='ignore')
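# Seed the random number generator by calling np.random.seed
# (assigning to it would overwrite the function and leave the generator unseeded)
np.random.seed(1)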

Implementation of our Gradient Descent optimizer for mean squared and half mean squared loss

In [10]:
class GradientDescentOptimizer(object):

    def __init__(self):
        pass

    def _compute_gradients(self, w, x, y, loss_func):
        '''
        Returns the gradient of mean squared or half mean squared loss

        Arg(s):
            w : numpy[float32]
                d x 1 weight vector
            x : numpy[float32]
                d x N feature vector
            y : numpy[float32]
                1 x N groundtruth vector
            loss_func : str
                loss type either 'mean_squared', or 'half_mean_squared'
        Returns:
            numpy[float32] : d x 1 gradients
        '''

        # TODO: Implement the _compute_gradients function
        if loss_func == 'mean_squared':
            gradients = (np.matmul(w.T, x) - y) * x

            return 2.0 * np.mean(gradients, axis=1, keepdims=True)
        elif loss_func == 'half_mean_squared':
            gradients = (np.matmul(w.T, x) - y) * x

            return np.mean(gradients, axis=1, keepdims=True)
        else:
            raise ValueError('Unsupported loss function: {}'.format(loss_func))


    def update(self, w, x, y, alpha, loss_func):
        '''
        Updates the weight vector based on mean squared or half mean squared loss

        Arg(s):
            w : numpy[float32]
                d x 1 weight vector
            x : numpy[float32]
                d x N feature vector
            y : numpy[float32]
                1 x N groundtruth vector
            alpha : float
                learning rate
            loss_func : str
                loss type either 'mean_squared', or 'half_mean_squared'
        Returns:
            numpy[float32] : d x 1 weights
        '''

        # TODO: Implement the optimizer update function

        return w - alpha * self._compute_gradients(w, x, y, loss_func)
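
One way to gain confidence in an analytic gradient like the one above is to compare it against a central finite-difference estimate. The sketch below is standalone: `mse_loss`, `mse_gradient`, and `numerical_gradient` are hypothetical helpers that mirror the 'mean_squared' branch, and the random data is purely illustrative.

```python
import numpy as np


def mse_loss(w, x, y):
    # w: d x 1, x: d x N, y: 1 x N
    return np.mean((np.matmul(w.T, x) - y) ** 2)


def mse_gradient(w, x, y):
    # Analytic gradient, same expression as the 'mean_squared' branch
    return 2.0 * np.mean((np.matmul(w.T, x) - y) * x, axis=1, keepdims=True)


def numerical_gradient(loss_fn, w, x, y, eps=1e-6):
    # Central differences: perturb one weight at a time by +/- eps
    grad = np.zeros_like(w)
    for i in range(w.shape[0]):
        step = np.zeros_like(w)
        step[i, 0] = eps
        grad[i, 0] = (loss_fn(w + step, x, y) - loss_fn(w - step, x, y)) / (2.0 * eps)
    return grad


# Small synthetic problem (illustrative values only)
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 50))
y = rng.normal(size=(1, 50))
w = rng.normal(size=(3, 1))

max_abs_diff = np.max(np.abs(mse_gradient(w, x, y) - numerical_gradient(mse_loss, w, x, y)))
print(max_abs_diff)
```

If the analytic gradient is correct, the printed difference should be very small, limited only by floating-point rounding, because the loss is quadratic in the weights.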


Implementation of Linear Regression with Gradient Descent optimizer

In [11]:
class LinearRegressionGradientDescent(object):

    def __init__(self):
        # Define private variables
        self.__weights = None
        self.__optimizer = GradientDescentOptimizer()

    def fit(self, x, y, T, alpha, loss_func='mean_squared'):
        '''
        Fits the model to x and y by updating the weight vector
        using gradient descent

        Arg(s):
            x : numpy[float32]
                d x N feature vector
            y : numpy[float32]
                1 x N groundtruth vector
            T : int
                number of iterations to train
            alpha : float
                learning rate
            loss_func : str
                loss function to use
        '''

        # TODO: Implement the fit function
        self.__weights = np.zeros([x.shape[0], 1])

        for t in range(1, T + 1):

            # TODO: Compute loss function
            loss = self._compute_loss(
                x=x,
                y=y,
                loss_func=loss_func)

            if (t % 10000) == 0:
                print('Step={}  Loss={:.4f}'.format(t, loss))

            # TODO: Update weights
            self.__weights = self.__optimizer.update(
                w=self.__weights,
                x=x,
                y=y,
                alpha=alpha,
                loss_func=loss_func)

    def predict(self, x):
        '''
        Predicts the label for each feature vector x

        Arg(s):
            x : numpy[float32]
                d x N feature vector
        Returns:
            numpy[float32] : 1 x N vector
        '''

        # TODO: Implement the predict function

        return np.matmul(self.__weights.T, x)

    def _compute_loss(self, x, y, loss_func):
        '''
        Returns the mean squared or half mean squared loss

        Arg(s):
            x : numpy[float32]
                d x N feature vector
            y : numpy[float32]
                1 x N groundtruth vector
            loss_func : str
                loss type either 'mean_squared', or 'half_mean_squared'
        Returns:
            float : loss
        '''

        # TODO: Implement the _compute_loss function
        predictions = self.predict(x)

        if loss_func == 'mean_squared':
            # TODO: Implement mean squared loss
            loss = np.mean((predictions - y) ** 2)
        elif loss_func == 'half_mean_squared':
            # TODO: Implement half mean squared loss
            loss = 0.50 * np.mean((predictions - y) ** 2)
        else:
            raise ValueError('Unsupported loss function: {}'.format(loss_func))

        return loss
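
As a sanity check on the fit loop, the same update rule can be run on a small synthetic problem and compared against the least-squares solution from `np.linalg.lstsq`. This is a standalone sketch: the data, `w_true`, the learning rate, and the iteration count are illustrative assumptions, not values from the exercise.

```python
import numpy as np

# Synthetic data in the same layout as above: x is d x N, y is 1 x N
rng = np.random.default_rng(0)
d, N = 3, 200
x = rng.normal(size=(d, N))
w_true = np.array([[1.5], [-2.0], [0.5]])
y = np.matmul(w_true.T, x) + 0.1 * rng.normal(size=(1, N))

# Gradient descent on the mean squared loss, starting from zero weights
w = np.zeros([d, 1])
alpha = 0.1
for t in range(500):
    grad = 2.0 * np.mean((np.matmul(w.T, x) - y) * x, axis=1, keepdims=True)
    w = w - alpha * grad

# Least-squares reference; lstsq expects samples as rows, so pass N x d and N x 1
w_lstsq, *_ = np.linalg.lstsq(x.T, y.T, rcond=None)

max_weight_diff = np.max(np.abs(w - w_lstsq))
print(max_weight_diff)
```

On a well-conditioned problem like this one, the two weight vectors should agree closely. On the real datasets, feature scales differ widely, which is one reason the learning rate and number of steps need to be chosen per dataset.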

Implementing training and validation loop for linear regression

In [12]:
# Load the Diabetes and California housing prices datasets
datasets = [
    skdata.load_diabetes(),
    skdata.fetch_california_housing()
]
dataset_names = [
    'Diabetes',
    'California housing prices'
]

# Loss functions to minimize
dataset_loss_funcs = [
    ['mean_squared', 'half_mean_squared'],
    ['mean_squared', 'half_mean_squared']
]

# TODO: Select learning rates (alpha) for mean squared and half mean squared loss
dataset_alphas = [
    [1, 1],
    [1e-7, 2.5e-7]
]

# TODO: Select number of steps (T) to train for mean squared and half mean squared loss
dataset_Ts = [
    [100000, 100000],
    [2000000, 100000]
]

for dataset_options in zip(datasets, dataset_names, dataset_loss_funcs, dataset_alphas, dataset_Ts):

    dataset, dataset_name, loss_funcs, alphas, Ts = dataset_options

    '''
    Create the training, validation and testing splits
    '''
    x = dataset.data
    y = dataset.target

    # Shuffle the dataset based on sample indices
    shuffled_indices = np.random.permutation(x.shape[0])

    # Choose the first 80% as training set, next 10% as validation and the rest as testing
    train_split_idx = int(0.80 * x.shape[0])
    val_split_idx = int(0.90 * x.shape[0])

    train_indices = shuffled_indices[0:train_split_idx]
    val_indices = shuffled_indices[train_split_idx:val_split_idx]
    test_indices = shuffled_indices[val_split_idx:]

    # Select the examples from x and y to construct our training, validation, testing sets
    x_train, y_train = x[train_indices, :], y[train_indices]
    x_val, y_val = x[val_indices, :], y[val_indices]
    x_test, y_test = x[test_indices, :], y[test_indices]

    '''
    Trains and tests Linear Regression model from scikit-learn
    '''
    # TODO: Initialize scikit-learn linear regression model without bias
    model_scikit = LinearRegressionSciKit(fit_intercept=False)

    # TODO: Trains scikit-learn linear regression model
    
    model_scikit.fit(x_train, y_train)


    print('***** Results of scikit-learn linear regression model on {} dataset *****'.format(
        dataset_name))

    # TODO: Test model on training set
    predictions_train = model_scikit.predict(x_train)

    score_mse_train = skmetrics.mean_squared_error(y_train, predictions_train)
    print('Training set mean squared error: {:.4f}'.format(score_mse_train))

    score_r2_train = skmetrics.r2_score(y_train, predictions_train)
    print('Training set r-squared scores: {:.4f}'.format(score_r2_train))

    # TODO: Test model on validation set
    predictions_val = model_scikit.predict(x_val)

    score_mse_val = skmetrics.mean_squared_error(y_val, predictions_val)
    print('Validation set mean squared error: {:.4f}'.format(score_mse_val))

    score_r2_val = skmetrics.r2_score(y_val, predictions_val)
    print('Validation set r-squared scores: {:.4f}'.format(score_r2_val))

    # TODO: Test model on testing set
    predictions_test = model_scikit.predict(x_test)

    score_mse_test = skmetrics.mean_squared_error(y_test, predictions_test)
    print('Testing set mean squared error: {:.4f}'.format(score_mse_test))

    score_r2_test = skmetrics.r2_score(y_test, predictions_test)
    print('Testing set r-squared scores: {:.4f}'.format(score_r2_test))

    '''
    Trains and tests our linear regression model using different loss functions
    '''

    # Take the transpose of the dataset to match the dimensions discussed in lecture
    # i.e., (N x d) to (d x N)
    x_train = np.transpose(x_train, axes=(1, 0))
    x_val = np.transpose(x_val, axes=(1, 0))
    x_test = np.transpose(x_test, axes=(1, 0))
    y_train = np.expand_dims(y_train, axis=0)
    y_val = np.expand_dims(y_val, axis=0)
    y_test = np.expand_dims(y_test, axis=0)

    for loss_func, alpha, T in zip(loss_funcs, alphas, Ts):

        # TODO: Initialize our linear regression model
        model_ours = LinearRegressionGradientDescent()

        print('***** Results of our linear regression model trained with {} loss, alpha={} and T={} on {} dataset *****'.format(
            loss_func, alpha, T, dataset_name))

        # TODO: Train model on training set
        model_ours.fit(x_train, y_train, T, alpha, loss_func)
        # TODO: Make predictions on training set
        predictions_train = model_ours.predict(x_train)

        # TODO: Test model on training set using mean squared error and r-squared
        # Report mean squared error regardless of the training loss, so that scores
        # for half_mean_squared are comparable to scikit-learn's
        score_mse_train = skmetrics.mean_squared_error(np.squeeze(y_train), np.squeeze(predictions_train))
        print('Training set mean squared error: {:.4f}'.format(score_mse_train))
        score_r2_train = skmetrics.r2_score(np.squeeze(y_train), np.squeeze(predictions_train))
        print('Training set r-squared scores: {:.4f}'.format(score_r2_train))

        # TODO: Test model on validation set using mean squared error and r-squared
        predictions_val = model_ours.predict(x_val)

        score_mse_val = skmetrics.mean_squared_error(np.squeeze(y_val), np.squeeze(predictions_val))
        print('Validation set mean squared error: {:.4f}'.format(score_mse_val))
        score_r2_val = skmetrics.r2_score(np.squeeze(y_val), np.squeeze(predictions_val))

        print('Validation set r-squared scores: {:.4f}'.format(score_r2_val))

        # TODO: Test model on testing set using mean squared error and r-squared
        predictions_test = model_ours.predict(x_test)

        score_mse_test = skmetrics.mean_squared_error(np.squeeze(y_test), np.squeeze(predictions_test))
        print('Testing set mean squared error: {:.4f}'.format(score_mse_test))
        score_r2_test = skmetrics.r2_score(np.squeeze(y_test), np.squeeze(predictions_test))
        print('Testing set r-squared scores: {:.4f}'.format(score_r2_test))


***** Results of scikit-learn linear regression model on Diabetes dataset *****
Training set mean squared error: 25924.5928
Training set r-squared scores: -3.2354
Validation set mean squared error: 28170.6028
Validation set r-squared scores: -4.0233
Testing set mean squared error: 27257.1553
Testing set r-squared scores: -4.9347
***** Results of our linear regression model trained with mean_squared loss, alpha=1 and T=100000 on Diabetes dataset *****
Step=10000  Loss=25940.3928
Step=20000  Loss=25931.6810
Step=30000  Loss=25927.7733
Step=40000  Loss=25926.0199
Step=50000  Loss=25925.2332
Step=60000  Loss=25924.8801
Step=70000  Loss=25924.7217
Step=80000  Loss=25924.6507
Step=90000  Loss=25924.6188
Step=100000  Loss=25924.6045
Training set mean squared error: 25924.6045
Training set r-squared scores: -3.2354
Validation set mean squared error: 28162.2910
Validation set r-squared scores: -4.0219
Testing set mean squared error: 27261.0160
Testing set r-squared scores: -4.9355
***** Results