Exercises
========================

In [1]:
import pandas as pd
data = pd.DataFrame([[0,0,1],[0,1,1.5],[1,0,1.75],[1,1,2.25]],columns=['X1','X2','t'])
data

   X1  X2     t
0   0   0  1.00
1   0   1  1.50
2   1   0  1.75
3   1   1  2.25


In [2]:
X = data.iloc[:,:2]
X

   X1  X2
0   0   0
1   0   1
2   1   0
3   1   1


Because this is a learning exercise, the answers are known in advance, and gradient descent should recover them. In other words, your final theta should match the w and b given below.

In [3]:
import numpy as np
w = np.array([0.75,0.5])
b = 1
y = np.dot(X,w)+b
y

array([1.  , 1.5 , 1.75, 2.25])

Our y functions are:

$y_1 = 0*w_1+0*w_2+b$

$y_2 = 0*w_1+1*w_2+b$

$y_3 = 1*w_1+0*w_2+b$

$y_4 = 1*w_1+1*w_2+b$

Our Loss functions are:

$L_1 = (b - 1)^2$

$L_2 = (1*w_2+b - 1.5)^2$

$L_3 = (1*w_1+b - 1.75)^2$

$L_4 = (1*w_1+1*w_2+b - 2.25)^2$

Our derivatives are:

$\frac{\partial L_1}{\partial w_1} = 0$

$\frac{\partial L_2}{\partial w_1} = 0$

$\frac{\partial L_3}{\partial w_1} = 2(w_1+b-1.75)$

$\frac{\partial L_4}{\partial w_1} = 2(w_1+w_2+b-2.25)$

$\frac{\partial L_1}{\partial w_2} = 0$

$\frac{\partial L_2}{\partial w_2} = 2(w_2+b-1.5)$

$\frac{\partial L_3}{\partial w_2} = 0$

$\frac{\partial L_4}{\partial w_2} = 2(w_1+w_2+b-2.25)$

$\frac{\partial L_1}{\partial b} = 2(b-1)$

$\frac{\partial L_2}{\partial b} = 2(w_2+b-1.5)$

$\frac{\partial L_3}{\partial b} = 2(w_1+b-1.75)$

$\frac{\partial L_4}{\partial b} = 2(w_1+w_2+b-2.25)$
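All twelve derivatives follow one pattern: if $r_i$ is the residual of row $i$ (prediction minus target), then the derivative of $L_i$ with respect to a parameter is $2 r_i$ times that parameter's coefficient in row $i$ ($X1$, $X2$, or 1 for $b$). The sketch below checks that pattern against two of the hand-derived entries. The names `rows`, `residual`, and `sample_gradient` are illustrative and not part of the assignment.

```python
# Rows of the data table above: (X1, X2, t).
rows = [(0, 0, 1.0), (0, 1, 1.5), (1, 0, 1.75), (1, 1, 2.25)]

def residual(row, w1, w2, b):
    """Prediction minus target for one row."""
    x1, x2, t = row
    return x1 * w1 + x2 * w2 + b - t

def sample_gradient(row, w1, w2, b):
    """(dL_i/dw1, dL_i/dw2, dL_i/db) for the squared-error loss of one row."""
    x1, x2, _ = row
    r = residual(row, w1, w2, b)
    return (2 * r * x1, 2 * r * x2, 2 * r)

# Compare with two hand-derived entries at an arbitrary point.
w1, w2, b = 0.5, -0.2, 2.5
g3 = sample_gradient(rows[2], w1, w2, b)
g4 = sample_gradient(rows[3], w1, w2, b)
print(g3[0], 2 * (w1 + b - 1.75))        # dL_3/dw1, both ways
print(g4[1], 2 * (w1 + w2 + b - 2.25))   # dL_4/dw2, both ways
```

Each printed pair should agree, and every entry should vanish at the known answer $w_1=0.75$, $w_2=0.5$, $b=1$.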

## Gradient descent

Our first take on gradient descent assumes you can find the gradient of a function symbolically, so you can use the derivatives supplied above.

Gradient descent says:

$w_1 = w_1 - \alpha \frac{1}{4} \left(\sum_{i=1}^4\frac{\partial L_i}{\partial w_1}\right)$

$w_2 = w_2 - \alpha \frac{1}{4} \left(\sum_{i=1}^4\frac{\partial L_i}{\partial w_2}\right)$

$b = b - \alpha \frac{1}{4} \left(\sum_{i=1}^4\frac{\partial L_i}{\partial b}\right)$

We will set $\alpha=0.1$.
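To make the update rule concrete, here is a minimal sketch of one way to iterate it. It is not the course's `gradient_descent` module: the function name `gradient_descent_sketch` and the fixed step count `n_steps` are assumptions made for illustration, and a real implementation might stop on a tolerance instead.

```python
def gradient_descent_sketch(gradient_lists, alpha, theta0, n_steps=2000):
    """Batch gradient descent over lists of per-sample derivative functions.

    gradient_lists[j][i] is dL_i/dtheta_j, called as f(w1, w2, b).
    Hypothetical helper, not the course's gradient_descent module.
    """
    theta = list(theta0)
    for _ in range(n_steps):
        # Average each parameter's derivative over the samples, evaluating
        # all of them at the current theta before any parameter is updated.
        step = [sum(g(*theta) for g in grads) / len(grads)
                for grads in gradient_lists]
        theta = [t - alpha * s for t, s in zip(theta, step)]
    return theta

# Per-sample derivatives copied from the list above.
d_w1 = [lambda w1, w2, b: 0,
        lambda w1, w2, b: 0,
        lambda w1, w2, b: 2 * (w1 + b - 1.75),
        lambda w1, w2, b: 2 * (w1 + w2 + b - 2.25)]
d_w2 = [lambda w1, w2, b: 0,
        lambda w1, w2, b: 2 * (w2 + b - 1.5),
        lambda w1, w2, b: 0,
        lambda w1, w2, b: 2 * (w1 + w2 + b - 2.25)]
d_b = [lambda w1, w2, b: 2 * (b - 1),
       lambda w1, w2, b: 2 * (w2 + b - 1.5),
       lambda w1, w2, b: 2 * (w1 + b - 1.75),
       lambda w1, w2, b: 2 * (w1 + w2 + b - 2.25)]

theta = gradient_descent_sketch([d_w1, d_w2, d_b], 0.1, [0.5, -0.2, 2.5])
print(theta)  # should be close to [0.75, 0.5, 1.0]
```

Computing the whole step before assigning `theta` matters: updating $w_1$ first and then using the new $w_1$ inside the derivative for $w_2$ would be a different algorithm.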

We will now define the derivatives programmatically:

In [4]:
import random
gradients_w1 = [lambda w1,w2,b: 0, lambda w1,w2,b: 0, lambda w1,w2,b: 2*(w1+b-1.75), lambda w1,w2,b: 2*(w1+w2+b-2.25)]
print('This should be all zeros:',[gradients_w1[i](0.75,0.5,1) for i in range(4)])
print('This should be non-zero:',[gradients_w1[i](0.75-random.random(),0.5-random.random(),1-random.random()) for i in range(4)])
print('This should be non-zero:',[gradients_w1[i](0.75+random.random(),0.5+random.random(),1+random.random()) for i in range(4)])

This should be all zeros: [0, 0, 0.0, 0.0]
This should be non-zero: [0, 0, -1.8807649980801988, -3.4333866550552794]
This should be non-zero: [0, 0, 0.6887773670465123, 3.1686107842609683]


In [5]:
import random
gradients_w2 = [lambda w1,w2,b: 0, lambda w1,w2,b: 2*(w2+b-1.5), lambda w1,w2,b: 0, lambda w1,w2,b: 2*(w1+w2+b-2.25)]
print('This should be all zeros:',[gradients_w2[i](0.75,0.5,1) for i in range(4)])
print('This should be non-zero:',[gradients_w2[i](0.75-random.random(),0.5-random.random(),1-random.random()) for i in range(4)])
print('This should be non-zero:',[gradients_w2[i](0.75+random.random(),0.5+random.random(),1+random.random()) for i in range(4)])

This should be all zeros: [0, 0.0, 0, 0.0]
This should be non-zero: [0, -2.947693247297475, 0, -3.341903704289769]
This should be non-zero: [0, 0.5533862963790517, 0, 2.909228712911399]


In [6]:
import random
gradients_b = [lambda w1,w2,b: 2*(b-1), lambda w1,w2,b: 2*(w2+b-1.5), lambda w1,w2,b: 2*(w1+b-1.75), lambda w1,w2,b: 2*(w1+w2+b-2.25)]
print('This should be all zeros:',[gradients_b[i](0.75,0.5,1) for i in range(4)])
print('This should be non-zero:',[gradients_b[i](0.75-random.random(),0.5-random.random(),1-random.random()) for i in range(4)])
print('This should be non-zero:',[gradients_b[i](0.75+random.random(),0.5+random.random(),1+random.random()) for i in range(4)])

This should be all zeros: [0, 0.0, 0.0, 0.0]
This should be non-zero: [-1.5138719000545129, -1.6266335216261116, -0.11626299567275833, -1.824109859920978]
This should be non-zero: [0.4678778186971382, 1.6580484167126421, 1.3700430059368784, 3.0618002911413207]


In [7]:
import gradient_descent
thetas = gradient_descent.minimize_gradient_descent([gradients_w1,gradients_w2,gradients_b],0.1,[0.5,-0.2,2.5])
pd.Series(thetas)
# Note: pd.Series is used only so the output is nicely formatted

0    [0.7499998546511271, 0.4999998546478536, 1.000...
dtype: object

In [8]:
print('This should be all zeros, but is it...:',[gradients_w1[i](0.75,0.5,1) for i in range(4)])
print('This should be all zeros, but is it...:',[gradients_w2[i](0.75,0.5,1) for i in range(4)])
print('This should be all zeros, but is it...:',[gradients_b[i](0.75,0.5,1) for i in range(4)])

This should be all zeros, but is it...: [0, 0, 0.0, 0.0]
This should be all zeros, but is it...: [0, 0.0, 0, 0.0]
This should be all zeros, but is it...: [0, 0.0, 0.0, 0.0]


In [9]:
import numpy as np
w_predicted = np.array(thetas[-1][:2])
b_predicted = thetas[-1][-1]
y = np.dot(X,w_predicted)+b_predicted
y

array([1.00000017, 1.50000003, 1.75000003, 2.24999988])

Now what if you can't or don't want to find the derivatives symbolically?
You can always estimate the gradient numerically using the difference quotient:

$[L(\theta+h)-L(\theta)]/h$,

where h is a small scalar step added to one component of $\theta$ at a time. Let's give it a shot with our functions. As a reminder, they are:

$L_1 = (b - 1)^2$

$L_2 = (1*w_2+b - 1.5)^2$

$L_3 = (1*w_1+b - 1.75)^2$

$L_4 = (1*w_1+1*w_2+b - 2.25)^2$

In [10]:
F1_func = lambda w1,w2,b: (b-1)**2
F1_func(w[0],w[1],b)

0

In [11]:
F2_func = lambda w1,w2,b: (w2+b-1.5)**2
F2_func(w[0],w[1],b)

0.0

In [12]:
F3_func = lambda w1,w2,b: (w1+b-1.75)**2
F3_func(w[0],w[1],b)

0.0

In [13]:
F4_func = lambda w1,w2,b: (w1+w2+b-2.25)**2
F4_func(w[0],w[1],b)

0.0

In [14]:
R_func = lambda w1,w2,b: 1/4*(F1_func(w1,w2,b)+F2_func(w1,w2,b)+F3_func(w1,w2,b)+F4_func(w1,w2,b))
R_func(w[0],w[1],b)

0.0
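The difference quotient can be turned into a gradient estimate by nudging one parameter at a time. The sketch below does this for the mean loss and compares the result with the averaged symbolic derivatives at the starting point. `estimate_gradient` and `mean_loss` are illustrative names, and this is only one possible reading of what `minimize_gradient_descent_analytically` does internally.

```python
def estimate_gradient(f, theta, h):
    """Forward-difference estimate of each partial derivative of f at theta."""
    base = f(*theta)
    grad = []
    for j in range(len(theta)):
        shifted = list(theta)
        shifted[j] += h              # nudge only coordinate j
        grad.append((f(*shifted) - base) / h)
    return grad

# Mean of the four squared-error losses, written out so the block runs alone.
def mean_loss(w1, w2, b):
    return ((b - 1) ** 2 + (w2 + b - 1.5) ** 2
            + (w1 + b - 1.75) ** 2 + (w1 + w2 + b - 2.25) ** 2) / 4

start = [0.5, -0.2, 2.5]
numeric = estimate_gradient(mean_loss, start, 0.01)

# Exact gradient at the same point, from averaging the symbolic derivatives.
w1, w2, b = start
exact = [
    (2 * (w1 + b - 1.75) + 2 * (w1 + w2 + b - 2.25)) / 4,
    (2 * (w2 + b - 1.5) + 2 * (w1 + w2 + b - 2.25)) / 4,
    (2 * (b - 1) + 2 * (w2 + b - 1.5)
     + 2 * (w1 + b - 1.75) + 2 * (w1 + w2 + b - 2.25)) / 4,
]
print(numeric)
print(exact)
```

The two lists should agree to within roughly the size of h. A smaller h reduces that gap until floating-point rounding starts to dominate.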

In [15]:
thetas = gradient_descent.minimize_gradient_descent_analytically(R_func,0.1,[0.5,-0.2,2.5],0.01)
pd.Series(thetas)

0    [0.7499999045018414, 0.4999999044985678, 0.995...
dtype: object

In [16]:
R_func(*thetas[-1]) # Shouldn't be too bad :)

2.4999822240924255e-05
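The leftover loss of about 2.5e-05 and the value of b near 0.995 are what a forward difference with h = 0.01 would be expected to produce: the forward quotient adds a bias of h/2 times the second derivative, which for this loss moves the fixed point of b by h/2 = 0.005 and leaves w untouched. The sketch below, which assumes the quotient is applied one coordinate at a time, compares forward and central differences. All function names here are hypothetical.

```python
def mean_loss(w1, w2, b):
    return ((b - 1) ** 2 + (w2 + b - 1.5) ** 2
            + (w1 + b - 1.75) ** 2 + (w1 + w2 + b - 2.25) ** 2) / 4

def forward_diff(f, theta, j, h):
    """[f(theta + h*e_j) - f(theta)] / h"""
    up = list(theta)
    up[j] += h
    return (f(*up) - f(*theta)) / h

def central_diff(f, theta, j, h):
    """[f(theta + h*e_j) - f(theta - h*e_j)] / (2h)"""
    up, down = list(theta), list(theta)
    up[j] += h
    down[j] -= h
    return (f(*up) - f(*down)) / (2 * h)

def descend(f, diff, alpha, theta0, h, n_steps=2000):
    """Gradient descent using a numerical derivative for every coordinate."""
    theta = list(theta0)
    for _ in range(n_steps):
        grad = [diff(f, theta, j, h) for j in range(len(theta))]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

fwd = descend(mean_loss, forward_diff, 0.1, [0.5, -0.2, 2.5], 0.01)
ctr = descend(mean_loss, central_diff, 0.1, [0.5, -0.2, 2.5], 0.01)
print(fwd, mean_loss(*fwd))   # b settles near 0.995
print(ctr, mean_loss(*ctr))   # b settles near 1.0
```

For a quadratic loss the central difference has no such bias, at the cost of one extra function evaluation per coordinate.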

In [20]:
# Good job!
# Woohoo!

# Test your code

In [24]:
%%bash

pytest test_Assignment4.py

platform darwin -- Python 3.10.7, pytest-8.3.3, pluggy-1.5.0
rootdir: /Users/michael.murray.iv/Desktop/Fall 2024/487/stochastic_gradient_descent
plugins: anyio-4.2.0
collected 2 items

test_Assignment4.py ..                                                   [100%]



# Colab End Section - Submit your code

In [25]:
%%bash 

git add .
#git commit -m update
#git push
#./command_line_sync.sh # if you want to sync