# Learning rate

The learning rate is one of the most important hyperparameters when training a machine learning model. In this notebook, we examine its influence on a model trained with Stochastic Gradient Descent (SGD).
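To make the role of the learning rate concrete before turning to scikit-learn, here is a minimal sketch of plain (non-stochastic) gradient descent on a toy one-dimensional function. The function `gradient_descent`, the objective `(w - 3)^2`, and the chosen learning rates are illustrative assumptions and are not part of the notebook's `utils` module or of `SGDRegressor`.

```python
def gradient_descent(learning_rate: float, n_steps: int, w0: float = 0.0) -> float:
    """Minimize f(w) = (w - 3)^2 with plain gradient descent and return the final w."""
    w = w0
    for _ in range(n_steps):
        gradient = 2.0 * (w - 3.0)        # derivative of (w - 3)^2 with respect to w
        w = w - learning_rate * gradient  # update rule: step against the gradient
    return w


# Try a few (assumed) learning rates and compare the final w with the minimizer w = 3
for lr in (0.001, 0.1, 1.1):
    print(f"learning_rate={lr}: w after 20 steps = {gradient_descent(lr, 20):.4f}")
```

Running the loop with other values of `learning_rate` and `n_steps` is a quick way to build intuition for the experiments that follow.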

In [4]:
# make sure the required packages are installed
%pip install numpy pandas scikit-learn matplotlib seaborn --quiet

# import the required modules
import pandas as pd
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

from typing import List, Dict

import utils

random_state = 42

Note: you may need to restart the kernel to use updated packages.


## Data preparation

We use the Statistics Online Computational Resource (SOCR) dataset of human heights (inches) and weights (pounds). We load the dataset and scale the X values.

In [5]:
(X_train, y_train), (X_test, y_test) = utils.load_dataset_from_csv('data/height_weight.csv',
                      ['Height'], 'Weight', 0.2, random_state=random_state)
# Scale X_train and X_test using StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
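The `utils` module is not shown in this notebook, so the exact behavior of `utils.load_dataset_from_csv` is unknown here. The following is only a sketch of what such a helper might do, assuming it reads the CSV with pandas and splits it with scikit-learn's `train_test_split`. The name `load_dataset_from_csv_sketch` and its parameter names are hypothetical.

```python
from typing import List, Optional, Tuple

import pandas as pd
from sklearn.model_selection import train_test_split


def load_dataset_from_csv_sketch(path: str, feature_columns: List[str], target_column: str,
                                 test_size: float, random_state: Optional[int] = None
                                 ) -> Tuple[Tuple[pd.DataFrame, pd.Series], Tuple[pd.DataFrame, pd.Series]]:
    """Hypothetical stand-in for utils.load_dataset_from_csv (an assumption, not the real helper)."""
    data = pd.read_csv(path)
    X = data[feature_columns]  # DataFrame with the independent variables
    y = data[target_column]    # Series with the dependent variable
    # Hold out a fraction `test_size` of the rows for testing
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    return (X_train, y_train), (X_test, y_test)
```

Under these assumptions, calling it with `'data/height_weight.csv'`, `['Height']`, `'Weight'`, and `0.2` would return the same nested tuple structure that the cell above unpacks.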

## Different learning rates

We train the model with different learning rates and numbers of epochs, and evaluate each trained model on the test set.

In [6]:
def compute_mses_for_learning_rates(X_train: pd.DataFrame, y_train: pd.Series, X_test: pd.DataFrame, y_test: pd.Series,
                                    learning_rates: List[float], epochs: List[int]) -> Dict[int, Dict[float, float]]:
    """
    Compute the Mean Squared Error (MSE) for different learning rates and epochs using the SGDRegressor model
    :param X_train: The input data (independent variables) for training 
    :param y_train: The output data (dependent variable) for training 
    :param X_test: The input data (independent variables) for testing
    :param y_test: The output data (dependent variable) for testing
    :param learning_rates: The different learning rates to be used
    :param epochs: The numbers of epochs to be used
    :return: A dictionary containing the MSE values for each number of epochs (first key) and learning rate (second key)
    """
    import warnings
    warnings.filterwarnings('ignore')  # ignore the convergence warnings raised for small max_iter values
    # Initialize a dictionary to store the MSE values
    mse_values_per_n_epochs = dict()
    # For each number of epochs
    for epoch in epochs:
        mse_values_per_n_epochs[epoch] = dict()
        # For each learning rate
        for learning_rate in learning_rates:
            # Create and train the SGDRegressor model; eta0 is the initial learning rate,
            # which the default 'invscaling' schedule decreases during training
            model = SGDRegressor(eta0=learning_rate, max_iter=epoch, random_state=random_state)
            model.fit(X_train, y_train)
            # Predict the values for the test set
            y_pred = model.predict(X_test)
            # Compute the Mean Squared Error (MSE) for the test set
            mse = mean_squared_error(y_test, y_pred)
            mse_values_per_n_epochs[epoch][learning_rate] = mse
    return mse_values_per_n_epochs


# Create and train SGDRegressor models for different learning rates and epochs
learning_rates = [0.0001, 0.001, 0.01, 0.1, 1.0, 10]
epochs = [1, 10, 100, 1000]
mse_values = compute_mses_for_learning_rates(X_train, y_train, X_test, y_test, learning_rates, epochs)
# Show the MSE values for each learning rate and epoch
table = pd.DataFrame(mse_values)  # row indexes are learning rates, column indexes are epochs, cell values are MSEs
print(f"Minimum MSE value: {table.min().min():.6f}.")  # The first min() gets the minimum value of each column,
                                                       # the second min() gets the minimum value of the resulting Series.
print("Mean Squared Error (MSE) for different learning rates and epochs:")
table

Minimum MSE value: 102.477695.
Mean Squared Error (MSE) for different learning rates and epochs:


| Learning rate \ Epochs | 1 | 10 | 100 | 1000 |
|---|---|---|---|---|
| 0.0001 | 10440.323654 | 1400.98215 | 102.492753 | 102.49195 |
| 0.001 | 283.518016 | 102.482946 | 102.488639 | 102.488639 |
| 0.01 | 102.533908 | 102.477695 | 102.477695 | 102.477695 |
| 0.1 | 104.119767 | 102.665155 | 102.916523 | 102.916523 |
| 1.0 | 106.680738 | 102.803322 | 103.969483 | 103.969483 |
| 10.0 | 465.419403 | 129.847802 | 111.682534 | 111.682534 |
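As an optional follow-up, the learning rate with the lowest test MSE can be read from the results programmatically instead of by eye. The sketch below rebuilds the table from the rounded values printed above, so that it runs on its own. The variable names `mse_table`, `best_lr_per_epochs`, `best_lr`, and `best_epochs` are illustrative; in the notebook itself, the same calls could be applied directly to `table`.

```python
import pandas as pd

# MSE values copied from the output above (rows: learning rates, columns: epochs)
mse_table = pd.DataFrame(
    {
        1: [10440.323654, 283.518016, 102.533908, 104.119767, 106.680738, 465.419403],
        10: [1400.98215, 102.482946, 102.477695, 102.665155, 102.803322, 129.847802],
        100: [102.492753, 102.488639, 102.477695, 102.916523, 103.969483, 111.682534],
        1000: [102.49195, 102.488639, 102.477695, 102.916523, 103.969483, 111.682534],
    },
    index=[0.0001, 0.001, 0.01, 0.1, 1.0, 10.0],
)

# For each number of epochs (column), the learning rate (row label) with the lowest MSE
best_lr_per_epochs = mse_table.idxmin()
print(best_lr_per_epochs)

# Overall best cell: stack() yields a Series indexed by (learning rate, epochs);
# when several cells tie, idxmin() returns the first one in row-major order
best_lr, best_epochs = mse_table.stack().idxmin()
print(f"Lowest MSE at learning rate {best_lr} with {best_epochs} epochs.")
```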


## ✨ Questions ✨

The following questions are key to understanding how the learning rate affects training with gradient descent, both for this model and for neural networks in general.
1. What happens when the learning rate is too small?
2. What happens when the learning rate is too large?
3. In general, what happens when the number of epochs is too small? 

### Answers 

*Write your answers here.*
