# Gradient Descent and Linear Algebra Review

In this notebook, we review:
* Referencing sklearn documentation
* `learning rate` - a standard hyperparameter for gradient descent algorithms
    * How learning rate impacts a model's fit.
* Introductory elements of linear algebra.
    * Vector shape
    * Dot products
    * Numpy arrays

In the next cell, we import the necessary packages for the notebook and load a dataset containing information about diabetes patients.

**Data Understanding**

The documentation for this dataset provides the following summary:

> *Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442 diabetes patients, as well as the response of interest, a quantitative measure of disease progression one year after baseline.*

In [1]:
# Sklearn's gradient descent linear regression model
from sklearn.linear_model import SGDRegressor

# Pandas and numpy
import pandas as pd
import numpy as np

# Train test split
from sklearn.model_selection import train_test_split

# Load Data
from sklearn.datasets import load_diabetes
data = load_diabetes()
df = pd.DataFrame(data['data'], columns=data['feature_names'])
df['target'] = data['target']

# Jupyter configuration
%config Completer.use_jedi = False

df.head(3)

| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.038076 | 0.05068 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019908 | -0.017646 | 151.0 |
| 1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.06833 | -0.092204 | 75.0 |
| 2 | 0.085299 | 0.05068 | 0.044451 | -0.005671 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002864 | -0.02593 | 141.0 |


In [2]:
df['age'].describe()

count    4.420000e+02
mean    -3.639623e-16
std      4.761905e-02
min     -1.072256e-01
25%     -3.729927e-02
50%      5.383060e-03
75%      3.807591e-02
max      1.107267e-01
Name: age, dtype: float64

# Gradient Descent 

## 1. Set up a train test split

In the cell below, please create a train test split for this dataset, setting `target` as the response variable and all other columns as independent variables. You may want to do this by creating `X` and `y` variables to then pass into the `train_test_split` function.
* Set the random state to `2021`
* Use the default for the size of the test set (`test_size=.25`)

In [3]:
# Your code here
X = df.drop(columns = 'target')
y = df['target']

In [4]:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=2021)
train_df = pd.concat([X_train, y_train], axis = 1, ignore_index=False)
test_df = pd.concat([X_test,y_test], axis = 1, ignore_index=False)

In [5]:
train_df.head(3)

| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 338 | -0.063635 | -0.044642 | -0.033151 | -0.033214 | 0.001183 | 0.024051 | -0.024993 | -0.002592 | -0.022512 | -0.059067 | 214.0 |
| 90 | 0.012648 | -0.044642 | -0.025607 | -0.040099 | -0.030464 | -0.045155 | 0.078093 | -0.076395 | -0.072128 | 0.011349 | 98.0 |
| 271 | 0.038076 | 0.05068 | 0.008883 | 0.04253 | -0.042848 | -0.021042 | -0.039719 | -0.002592 | -0.018118 | 0.007207 | 127.0 |


## 2. Initialize an SGDRegressor model

Now, initialize an `SGDRegressor` model.
* Set the random state to `2021`

In [6]:
# Your code here
sgdr = SGDRegressor(random_state = 2021)

## 3. Fit the model

In the cell below, fit the model to the training data.

In [7]:
# Your code here
# Note: sgdr.fit(X_train) on its own raises a TypeError, because fit() also requires the target y
sgdr.fit(X_train, y_train)



SGDRegressor(random_state=2021)

At this point in the program, you may have become accustomed to ignoring pink warning messages, mostly because `pandas` returns many unhelpful ones.

It is important to state that, generally, you should not default to ignoring warning messages. In this case the above pink warning message is quite informative!

The above warning message tells us that our model failed to converge. This means that our model did not find the minimum of the cost curve, and finding that minimum is usually the goal! The warning offers the suggestion:

> *Consider increasing max_iter to improve the fit.*


`max_iter` is an adjustable hyperparameter for the `SGDRegressor` model.
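To make "failed to converge" concrete, here is a minimal sketch of plain gradient descent on a one-parameter quadratic cost curve. It is not what `SGDRegressor` runs internally: the cost function, the step size, the tolerance, and the `gradient_descent` helper are all illustrative assumptions.

```python
def gradient_descent(max_iter, learning_rate=0.01, tol=1e-6):
    """Minimize cost(w) = (w - 3)**2, starting from w = 0.

    Returns the final w, the number of steps taken, and whether the
    update size fell below `tol` before `max_iter` was reached.
    """
    w = 0.0
    for step in range(1, max_iter + 1):
        gradient = 2 * (w - 3)             # derivative of (w - 3)**2
        update = learning_rate * gradient
        w -= update
        if abs(update) < tol:              # steps became tiny: converged
            return w, step, True
    return w, max_iter, False              # ran out of iterations first


# Same problem, two different iteration caps
w_short, steps_short, converged_short = gradient_descent(max_iter=100)
w_long, steps_long, converged_long = gradient_descent(max_iter=10000)

print('max_iter=100:  ', w_short, converged_short)
print('max_iter=10000:', w_long, steps_long, converged_long)
```

With the small cap, the loop stops while `w` is still moving toward 3, which is the toy equivalent of the warning above. With the larger cap, the updates shrink below the tolerance and the loop exits early.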

## 4. Explore `max_iter`

Let's zoom in on this parameter for a second.

In [8]:
# Run this cell unchanged
from src.questions import *
question_4.display()

VBox(children=(Output(layout=Layout(bottom='5px', width='auto')), RadioButtons(layout=Layout(flex_flow='column…

## 5. Update the max_iter

In the cell below, initialize a new `SGDRegressor` model with `max_iter` set to `10000`.
* Set the random state to `2021`

In [9]:
# Your code here
sgdr = SGDRegressor(random_state = 2021, max_iter = 10000)

The model converged! This tells us that the model just needed to run for longer to reach the minimum of the cost curve.


But how do you find the necessary number of iterations? 

In the cell below, we have written some code that shows you how to find the required number of iterations programmatically. This code is mostly being provided in case you ever need it, so don't stress if it feels intimidating!

In truth, there is a different hyperparameter we tend to use to help our models converge.

In [10]:
# Run this cell unchanged
import warnings

# Loop over a range of numbers between 1000 - 10,000
for i in range(1000, 10000, 500):
    # Catch the ConvergenceWarning
    with warnings.catch_warnings():
        # If a warning is produced, throw an error instead
        warnings.filterwarnings('error')
        # Place the model fit inside a try except block to catch the error
        try:
            model = SGDRegressor(max_iter=i, random_state=2021)
            model.fit(X_train, y_train)
            # If the model fits without a ConvergenceWarning stop the for loop
            break
        except Warning:
            # If the model returns a ConvergenceWarning, move to the next iteration.
            continue
            
# Print the number of iterations that allowed the model to converge
print('Max iterations needed for convergence:', i)

Max iterations needed for convergence: 6500
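The loop above only tests multiples of 500, so the printed value is an upper bound rather than an exact count. A fitted `SGDRegressor` also exposes an `n_iter_` attribute that records how many epochs it actually ran. The sketch below assumes scikit-learn is installed and rebuilds the same split from scratch so it can run on its own.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split

# Rebuild the same train/test split used earlier in the notebook
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2021)

max_iter = 10000
model = SGDRegressor(max_iter=max_iter, random_state=2021)
model.fit(X_train, y_train)

# n_iter_ is the number of epochs run before the stopping criterion was met
print('Epochs run:', model.n_iter_)
print('Stopped before the cap:', model.n_iter_ < max_iter)
```

If `n_iter_` equals `max_iter`, the fit used up its whole budget, which is the situation the convergence warning describes.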


### Let's zoom in on the *learning rate*!

## 6. What is the default setting for alpha (learning rate) for the `SGDRegressor`? 

In [11]:
# Run this cell unchanged
question_6.display()

VBox(children=(Output(layout=Layout(bottom='5px', width='auto')), RadioButtons(layout=Layout(flex_flow='column…

*For the question above, look at the `alpha` parameter in the `SGDRegressor` class documentation.*

## 7. Update the alpha to .01 and set the max_iter to 1500

In [12]:
# Your code here
sgdr = SGDRegressor(random_state = 2021, max_iter = 1500, alpha = .01, learning_rate='optimal')
sgdr.fit(X_train, y_train)

SGDRegressor(alpha=0.01, learning_rate='optimal', max_iter=1500,
             random_state=2021)

### Lindsey's helpful comment in the slack
Hi all! Weird thing about questions 6-9: alpha is an odd term being used for two different things in this algorithm, if you look at sklearn's documentation for the SGDRegressor! <br>
For Question 7 - please also use the parameter learning_rate='optimal'
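To see why `learning_rate='optimal'` matters here: sklearn's documentation describes `alpha` as the constant that multiplies the regularization term, and says it is also used to compute the step size when `learning_rate='optimal'`, through the schedule `eta = 1.0 / (alpha * (t + t0))`. The sketch below only evaluates that documented formula. The `optimal_eta` helper is hypothetical, and `t0=1.0` is a placeholder, since sklearn picks `t0` internally with a heuristic.

```python
def optimal_eta(alpha, t, t0=1.0):
    """Step size at update number t under the documented 'optimal' schedule.

    t0 is a placeholder value; sklearn chooses its own t0 internally.
    """
    return 1.0 / (alpha * (t + t0))


# Compare the default alpha (0.0001) with the value used in Question 7 (0.01)
for alpha in (0.0001, 0.01):
    etas = [optimal_eta(alpha, t) for t in (1, 10, 100)]
    print('alpha =', alpha, '-> step sizes at t = 1, 10, 100:', etas)
```

Under this schedule, a larger `alpha` gives smaller steps at every update, and the steps shrink as `t` grows. With the default `learning_rate='invscaling'`, the step size comes from `eta0` and `power_t` instead, so changing `alpha` alone would only change the penalty.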

## 8. Did the model converge? - True or False

In [13]:
# Run this cell unchanged
question_8.display()

VBox(children=(Output(layout=Layout(bottom='5px', width='auto')), RadioButtons(layout=Layout(flex_flow='column…

*We think it converged because fitting did not produce a pink `ConvergenceWarning`.*

## 9. Select the answer that best describes how alpha impacts a model's fit

In [14]:
# Run this cell unchanged
question_9.display()

VBox(children=(Output(layout=Layout(bottom='5px', width='auto')), RadioButtons(layout=Layout(flex_flow='column…

### Lindsey's helpful comment in the slack
For Question 9 - discuss alpha as it's being used by learning rate, don't worry about the first sentence in the documentation where it discusses being used as a penalty term

# Linear Algebra 

## 10. When finding the dot product for two vectors, the length of the vectors must be the same.

In [15]:
# Run this cell unchanged
question_10.display()

VBox(children=(Output(layout=Layout(bottom='5px', width='auto')), RadioButtons(layout=Layout(flex_flow='column…
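A quick way to check the statement in Question 10 is to try it. The sketch below uses small made-up vectors (`a`, `b`, and `c` are not from the notebook) and `np.dot`, which raises a `ValueError` when two 1-D arrays have different lengths.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Multiply matching elements, then add the products: 1*4 + 2*5 + 3*6
same_length = np.dot(a, b)
print(same_length)

c = np.array([7, 8])
try:
    np.dot(a, c)  # lengths 3 and 2 do not line up
    raised = False
except ValueError as error:
    raised = True
    print('ValueError:', error)
```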

## 11. Please select the solution for the dot product of the following vectors.

$\text{vector}_1 = \begin{bmatrix} 10 & 13 \end{bmatrix}$

$\text{vector}_2 = \begin{bmatrix} -4 & 82 \end{bmatrix}$


In [16]:
# Run this cell unchanged
question_11.display()

VBox(children=(Output(layout=Layout(bottom='5px', width='auto')), RadioButtons(layout=Layout(flex_flow='column…
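Written out term by term, the dot product multiplies matching entries and adds the results. For the two vectors in Question 11:

$\text{vector}_1 \cdot \text{vector}_2 = (10)(-4) + (13)(82) = -40 + 1066 = 1026$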

## 12. How do you turn a list into a numpy array?

In [17]:
# Run this cell unchanged

question_12.display()

VBox(children=(Output(layout=Layout(bottom='5px', width='auto')), RadioButtons(layout=Layout(flex_flow='column…
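For reference, a minimal sketch of the conversion asked about in Question 12, using `np.array` on made-up lists (`values` and `nested` are illustrative names, not from the notebook):

```python
import numpy as np

# A flat list becomes a 1-D array
values = [1, 2, 3]
array = np.array(values)
print(type(array), array.shape)

# A list of equal-length lists becomes a 2-D array (rows x columns)
nested = [[1, 2, 3], [4, 5, 6]]
matrix = np.array(nested)
print(matrix.shape)
```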

## 13. Please find the dot product of the following vectors

In [18]:
vector_1 = [
               [ 0.80559827,  0.29916789,  0.39630405,  0.92797795, -0.13808099],
               [ 1.7249222 ,  1.59418491,  1.95963002,  0.64988373, -0.08225951],
               [-0.50472891,  0.74287965,  1.8927091 ,  0.33783705,  0.94361808],
               [ 0.99034854, -1.0526394 , -0.33825968, -0.40148036,  1.81821604],
               [-0.7298026 , -0.88302624,  0.49319177, -0.02758864,  0.33430167],
               [ 0.85938167, -0.71149948, -1.8434118 ,  0.89097775,  0.53842254]
]


vector_2 = [
              [ 0.13288805],
              [-2.50839814],
              [-0.90620828],
              [ 0.09841538],
              [ 1.86783262],
              [ 1.98903307]
]

#### *My first attempt below is wrong*

In [19]:
# Your code here
vector_1 = np.array(vector_1).reshape(5,6)
vector_2 = np.array(vector_2)
vector_1.dot(vector_2)

array([[ 2.26183062],
       [-4.76584672],
       [-3.99252429],
       [-3.17338692],
       [ 1.0872375 ]])

In [20]:
print(vector_1)

[[ 0.80559827  0.29916789  0.39630405  0.92797795 -0.13808099  1.7249222 ]
 [ 1.59418491  1.95963002  0.64988373 -0.08225951 -0.50472891  0.74287965]
 [ 1.8927091   0.33783705  0.94361808  0.99034854 -1.0526394  -0.33825968]
 [-0.40148036  1.81821604 -0.7298026  -0.88302624  0.49319177 -0.02758864]
 [ 0.33430167  0.85938167 -0.71149948 -1.8434118   0.89097775  0.53842254]]


#### *`reshape()` reads the values in row-major order and refills them into the new shape, so the original rows get scrambled rather than swapped with the columns. I should have used `.T` to transpose the matrix.*
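The difference is easier to see on a tiny matrix. This is a small illustration with made-up values (`m` is not from the notebook), comparing `reshape` with `.T` on the same 2 x 3 array:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

# reshape keeps the reading order 1, 2, 3, 4, 5, 6 and cuts it into 3 rows of 2
reshaped = m.reshape(3, 2)
print(reshaped)

# .T swaps rows and columns, so each original column becomes a row
transposed = m.T
print(transposed)
```

Both results have shape `(3, 2)`, which is why the incorrect version runs without any error: the shapes line up for the dot product, but the numbers are in the wrong places.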

In [21]:
#SOLUTION
vector_1 = [
               [ 0.80559827,  0.29916789,  0.39630405,  0.92797795, -0.13808099],
               [ 1.7249222 ,  1.59418491,  1.95963002,  0.64988373, -0.08225951],
               [-0.50472891,  0.74287965,  1.8927091 ,  0.33783705,  0.94361808],
               [ 0.99034854, -1.0526394 , -0.33825968, -0.40148036,  1.81821604],
               [-0.7298026 , -0.88302624,  0.49319177, -0.02758864,  0.33430167],
               [ 0.85938167, -0.71149948, -1.8434118 ,  0.89097775,  0.53842254]
]


vector_2 = [
              [ 0.13288805],
              [-2.50839814],
              [-0.90620828],
              [ 0.09841538],
              [ 1.86783262],
              [ 1.98903307]
]

In [22]:
vector_1 = np.array(vector_1)
vector_2 = np.array(vector_2)

np.dot(vector_1.T, vector_2)

array([[-3.31869275],
       [-7.80043543],
       [-9.35675419],
       [-0.13185929],
       [ 1.207176  ]])
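The transpose is needed because of the shape rule for matrix products: an `(m, n)` array can only be multiplied by an `(n, k)` array, and the result has shape `(m, k)`. The sketch below checks this with placeholder arrays of ones (`left` and `right` are illustrative names) that have the same shapes as `vector_1` and `vector_2`.

```python
import numpy as np

left = np.ones((6, 5))   # same shape as vector_1
right = np.ones((6, 1))  # same shape as vector_2

# (6, 5) @ (6, 1): the inner dimensions 5 and 6 differ, so this fails
try:
    left @ right
    inner_mismatch = False
except ValueError:
    inner_mismatch = True
    print('(6, 5) @ (6, 1) is not defined')

# (5, 6) @ (6, 1): the inner dimensions match, giving a (5, 1) result
product = left.T @ right
print(left.T.shape, right.shape, '->', product.shape)
```

For 2-D arrays, `left.T @ right` and `np.dot(left.T, right)` compute the same product.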