# Learning Objectives

* Show how gradient descent can be implemented in Python
* Introduce the relationship between equations/mathematical objectives (theory) and their implementation (practice)
* Perform basic classifications in Python

## Goal: Regression Objective

Let's implement gradient descent on the same PM2.5 dataset from our previous lecture on regression.

<table><tr>
<td> <img src="Datasets/gradient_linear.jpg" alt="Drawing" style="width: 300px;"/> </td>
</tr></table>


In [1]:
import pandas as pd
df = pd.read_csv("Datasets/Beijing_PM25_air_data.csv", sep=',', header=0)
df.head()

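| | No | year | month | day | hour | pm2.5 | DEWP | TEMP | PRES | cbwd | Iws | Is | Ir |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2010 | 1 | 1 | 0 | NaN | -21 | -11.0 | 1021.0 | NW | 1.79 | 0 | 0 |
| 1 | 2 | 2010 | 1 | 1 | 1 | NaN | -21 | -12.0 | 1020.0 | NW | 4.92 | 0 | 0 |
| 2 | 3 | 2010 | 1 | 1 | 2 | NaN | -21 | -11.0 | 1019.0 | NW | 6.71 | 0 | 0 |
| 3 | 4 | 2010 | 1 | 1 | 3 | NaN | -21 | -14.0 | 1019.0 | NW | 9.84 | 0 | 0 |
| 4 | 5 | 2010 | 1 | 1 | 4 | NaN | -20 | -12.0 | 1018.0 | NW | 12.97 | 0 | 0 |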


In [2]:
dataset = df.dropna(subset=['pm2.5'])

In [3]:
dataset = dataset.to_numpy()

### Code: Extracting features from the data

In [4]:
def feature(data):
    feat = [1, float(data[7])]  # Temperature
    return feat

In [5]:
X = [feature(d) for d in dataset]
y = [float(d[5]) for d in dataset]

In [6]:
X[0]

[1, -4.0]

### Code: Initialization

Initialize the parameters (and define some utility functions)

* Initializing ```theta_0``` (the offset parameter) to the mean of ```y``` helps the model converge faster
* Generally speaking, initializing gradient descent algorithms with a "good guess" can help them converge more quickly

In [8]:
theta = [0.0] * len(X[0])

theta[0] = sum(y) / len(y)
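To see why the mean is a reasonable starting point, the sketch below compares the offset gradient at two initializations. It is only an illustration: the toy values and the names `X_toy`, `y_toy`, and `offset_gradient` are made up here and are not part of the PM2.5 data or the lecture code.

```python
# Toy data (hypothetical values, not taken from the PM2.5 dataset)
X_toy = [[1, -2.0], [1, 0.0], [1, 1.0], [1, 5.0]]
y_toy = [1.0, 2.0, 3.0, 6.0]

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def offset_gradient(X, y, theta):
    """Partial derivative of the MSE with respect to theta[0]."""
    N = len(X)
    return sum(2 * X[i][0] * (inner(X[i], theta) - y[i]) for i in range(N)) / N

theta_zero = [0.0, 0.0]                        # All-zero initialization
theta_mean = [sum(y_toy) / len(y_toy), 0.0]    # Offset initialized to the mean of y

grad_zero = offset_gradient(X_toy, y_toy, theta_zero)
grad_mean = offset_gradient(X_toy, y_toy, theta_mean)
print(grad_zero, grad_mean)  # prints: -6.0 0.0
```

With the other parameters at zero, the offset gradient vanishes when `theta[0]` equals the mean of `y`, so the first updates can concentrate on the remaining parameters.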

In [9]:
def inner(x, y):
    return sum([a*b for (a, b) in zip(x,y)])

In [10]:
def norm(x):
    # Note: this is the squared Euclidean norm (no square root is taken)
    return sum([a*a for a in x])

### Code: Derivative

Compute partial derivatives for each dimension:

In [11]:
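def derivative(X, y, theta):
    # Returns the gradient of the MSE with respect to theta, and the MSE itself
    dtheta = [0.0]*len(theta)
    K = len(theta)
    N = len(X)
    MSE = 0
    for i in range(N):
        error = inner(X[i], theta) - y[i]      # Residual for sample i
        for k in range(K):
            dtheta[k] += 2*X[i][k]*error/N     # Contribution to d(MSE)/d(theta_k)
        MSE += error*error/N
    return dtheta, MSE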
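For reference, the quantities accumulated in `derivative` correspond to the expressions below, writing $x_i$ for the row `X[i]` and $N$ for the number of samples (this notation is introduced here for illustration):

$$
\mathrm{MSE}(\theta) = \frac{1}{N}\sum_{i=1}^{N} \left(x_i \cdot \theta - y_i\right)^2,
\qquad
\frac{\partial\,\mathrm{MSE}}{\partial \theta_k} = \frac{2}{N}\sum_{i=1}^{N} x_{ik}\,\left(x_i \cdot \theta - y_i\right)
$$

The factor $2 x_{ik}$ comes from applying the chain rule to the squared residual, which is the `2*X[i][k]*error/N` term in the code.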

In [18]:
learningRate = 0.003
K = len(X[0])

while True:
    dtheta, MSE = derivative(X, y, theta)
    m = norm(dtheta)
    print("norm(dtheta) = " + str(m) + " MSE = " + str(MSE))
    for k in range(K):
        theta[k] -= learningRate * dtheta[k]    # Step against the gradient (downhill)
    if m < 0.01: break                          # Stopping condition: squared gradient norm is small

norm(dtheta) = 11.95593028441804 MSE = 8409.7226293839
norm(dtheta) = 11.885738815819712 MSE = 8409.686814314158
norm(dtheta) = 11.815959430776651 MSE = 8409.651209509286
norm(dtheta) = 11.746589710009204 MSE = 8409.615813734945
norm(dtheta) = 11.677627248440475 MSE = 8409.580625763723
norm(dtheta) = 11.60906965511098 MSE = 8409.545644375981
norm(dtheta) = 11.540914553101446 MSE = 8409.510868358528
norm(dtheta) = 11.473159579445952 MSE = 8409.476296505865
norm(dtheta) = 11.405802385050256 MSE = 8409.441927619411
norm(dtheta) = 11.338840634613472 MSE = 8409.407760507564
norm(dtheta) = 11.272272006542597 MSE = 8409.37379398577
norm(dtheta) = 11.206094192874932 MSE = 8409.340026876007
norm(dtheta) = 11.140304899199426 MSE = 8409.306458007897
norm(dtheta) = 11.074901844572562 MSE = 8409.273086217754
norm(dtheta) = 11.009882761443075 MSE = 8409.239910348288
norm(dtheta) = 10.945245395572206 MSE = 8409.206929249429
norm(dtheta) = 10.880987505955083 MSE = 8409.174141777681
...

norm(dtheta) = 5.031253948611018 MSE = 8406.189325821946
norm(dtheta) = 5.001716213350045 MSE = 8406.17425424601
norm(dtheta) = 4.972351889690081 MSE = 8406.159271153098
norm(dtheta) = 4.94315995955823 MSE = 8406.144376023623
norm(dtheta) = 4.914139410857545 MSE = 8406.129568341235
norm(dtheta) = 4.885289237433972 MSE = 8406.1148475923
norm(dtheta) = 4.856608439040147 MSE = 8406.100213266884
norm(dtheta) = 4.828096021299437 MSE = 8406.085664857423
norm(dtheta) = 4.799750995675498 MSE = 8406.071201859511
norm(dtheta) = 4.771572379434578 MSE = 8406.056823771438
norm(dtheta) = 4.743559195611311 MSE = 8406.042530095074
norm(dtheta) = 4.715710472976841 MSE = 8406.028320334884
norm(dtheta) = 4.688025246005451 MSE = 8406.014193997971
norm(dtheta) = 4.660502554837477 MSE = 8406.000150594498
norm(dtheta) = 4.633141445249991 MSE = 8405.98618963805
norm(dtheta) = 4.605940968622407 MSE = 8405.972310643958
norm(dtheta) = 4.57890018190183 MSE = 8405.95851313144
norm(dtheta) = 4.552018147573245 MSE = ...

norm(dtheta) = 2.1549672645707463 MSE = 8404.721705749284
norm(dtheta) = 2.1423157758545837 MSE = 8404.715250350042
norm(dtheta) = 2.129738562126594 MSE = 8404.708832849654
norm(dtheta) = 2.1172351873282067 MSE = 8404.702453025326
norm(dtheta) = 2.1048052179631664 MSE = 8404.696110655896
norm(dtheta) = 2.0924482230787933 MSE = 8404.689805521492
norm(dtheta) = 2.0801637742531804 MSE = 8404.683537403838
norm(dtheta) = 2.06795144557888 MSE = 8404.677306085367
norm(dtheta) = 2.0558108136493973 MSE = 8404.671111349737
norm(dtheta) = 2.0437414575442028 MSE = 8404.664952982754
norm(dtheta) = 2.031742958814054 MSE = 8404.658830770588
norm(dtheta) = 2.0198149014648865 MSE = 8404.65274450094
norm(dtheta) = 2.007956871946625 MSE = 8404.646693962904
norm(dtheta) = 1.9961684591362114 MSE = 8404.640678946305
norm(dtheta) = 1.9844492543249315 MSE = 8404.63469924348
norm(dtheta) = 1.9727988512019083 MSE = 8404.62875464645
norm(dtheta) = 1.9612168458437123 MSE = 8404.622844949057
...

norm(dtheta) = 0.9175884211957104 MSE = 8404.090335434988
norm(dtheta) = 0.9122013975747072 MSE = 8404.08758671613
norm(dtheta) = 0.9068460003593087 MSE = 8404.084854134257
norm(dtheta) = 0.9015220438753678 MSE = 8404.08213759516
norm(dtheta) = 0.8962293435395239 MSE = 8404.079437004393
norm(dtheta) = 0.8909677158512894 MSE = 8404.076752268344
norm(dtheta) = 0.8857369783876454 MSE = 8404.074083294037
norm(dtheta) = 0.8805369497972864 MSE = 8404.071429988788
norm(dtheta) = 0.8753674497924552 MSE = 8404.068792260812
norm(dtheta) = 0.8702282991447484 MSE = 8404.066170018574
norm(dtheta) = 0.8651193196777127 MSE = 8404.063563170943
norm(dtheta) = 0.8600403342608224 MSE = 8404.060971627916
norm(dtheta) = 0.8549911668034753 MSE = 8404.058395299373
norm(dtheta) = 0.8499716422487744 MSE = 8404.055834095994
norm(dtheta) = 0.8449815865677759 MSE = 8404.053287929259
norm(dtheta) = 0.8400208267532701 MSE = 8404.050756710529
norm(dtheta) = 0.8350891908136163 MSE = 8404.048240352004
...

norm(dtheta) = 0.39767358664349645 MSE = 8403.825049832194
norm(dtheta) = 0.3953389048240111 MSE = 8403.823858564865
norm(dtheta) = 0.39301792957041154 MSE = 8403.822674291483
norm(dtheta) = 0.39071058041342704 MSE = 8403.821496970855
norm(dtheta) = 0.38841677735641345 MSE = 8403.820326562067
norm(dtheta) = 0.38613644087213356 MSE = 8403.819163024327
norm(dtheta) = 0.3838694919004383 MSE = 8403.818006317717
norm(dtheta) = 0.3816158518452253 MSE = 8403.816856401972
norm(dtheta) = 0.3793754425717473 MSE = 8403.815713237083
norm(dtheta) = 0.37714818640431563 MSE = 8403.814576783854
norm(dtheta) = 0.3749340061228013 MSE = 8403.813447002382
norm(dtheta) = 0.3727328249605943 MSE = 8403.812323853597
norm(dtheta) = 0.37054456660221663 MSE = 8403.81120729866
norm(dtheta) = 0.36836915517943286 MSE = 8403.810097298981
norm(dtheta) = 0.36620651526974607 MSE = 8403.808993815921
norm(dtheta) = 0.3640565718940627 MSE = 8403.807896811199
norm(dtheta) = 0.3619192505123794 MSE = 8403.806806246934
...

norm(dtheta) = 0.17033001513222595 MSE = 8403.709048186485
norm(dtheta) = 0.16933003323004633 MSE = 8403.708537947401
norm(dtheta) = 0.16833592207124892 MSE = 8403.708030704118
norm(dtheta) = 0.16734764718943893 MSE = 8403.707526438735
norm(dtheta) = 0.1663651743209131 MSE = 8403.707025133694
norm(dtheta) = 0.16538846940276267 MSE = 8403.706526771524
norm(dtheta) = 0.16441749857236712 MSE = 8403.706031335529
norm(dtheta) = 0.16345222816569854 MSE = 8403.705538807917
norm(dtheta) = 0.16249262471636367 MSE = 8403.70504917215
norm(dtheta) = 0.1615386549546285 MSE = 8403.704562410869
norm(dtheta) = 0.16059028580588502 MSE = 8403.704078507151
norm(dtheta) = 0.15964748438966078 MSE = 8403.703597444472
norm(dtheta) = 0.15871021801893662 MSE = 8403.70311920597
norm(dtheta) = 0.1577784541980419 MSE = 8403.702643775234
norm(dtheta) = 0.15685216062232316 MSE = 8403.702171135683
norm(dtheta) = 0.1559313051769706 MSE = 8403.701701270847
norm(dtheta) = 0.1550158559354118 MSE = 8403.70123416443
...

norm(dtheta) = 0.07425525570421751 MSE = 8403.660026214284
norm(dtheta) = 0.07381931426550235 MSE = 8403.659803776127
norm(dtheta) = 0.07338593217344538 MSE = 8403.659582643604
norm(dtheta) = 0.07295509440246595 MSE = 8403.659362809305
norm(dtheta) = 0.07252678601528505 MSE = 8403.659144265866
norm(dtheta) = 0.07210099216217891 MSE = 8403.658927005286
norm(dtheta) = 0.07167769808075507 MSE = 8403.658711020458
norm(dtheta) = 0.07125688909520651 MSE = 8403.658496303324
norm(dtheta) = 0.07083855061593049 MSE = 8403.658282846805
norm(dtheta) = 0.07042266813899294 MSE = 8403.658070643578
norm(dtheta) = 0.07000922724551802 MSE = 8403.657859685889
norm(dtheta) = 0.06959821360131185 MSE = 8403.657649966988
norm(dtheta) = 0.06918961295641166 MSE = 8403.657441479361
norm(dtheta) = 0.06878341114445898 MSE = 8403.657234215625
norm(dtheta) = 0.06837959408228393 MSE = 8403.657028168605
norm(dtheta) = 0.06797814776928542 MSE = 8403.656823331483
norm(dtheta) = 0.06757905828725444 MSE = 8403.6566196966

norm(dtheta) = 0.032562699566725056 MSE = 8403.63875266322
norm(dtheta) = 0.03237152885483425 MSE = 8403.638655118675
norm(dtheta) = 0.03218148047743767 MSE = 8403.638558146804
norm(dtheta) = 0.031992547845441414 MSE = 8403.638461744074
norm(dtheta) = 0.03180472440853714 MSE = 8403.638365907544
norm(dtheta) = 0.031618003654787556 MSE = 8403.638270633724
norm(dtheta) = 0.03143237911045238 MSE = 8403.638175919104
norm(dtheta) = 0.03124784433998177 MSE = 8403.638081760679
norm(dtheta) = 0.031064392945360467 MSE = 8403.63798815481
norm(dtheta) = 0.03088201856626797 MSE = 8403.637895098798
norm(dtheta) = 0.030700714879741912 MSE = 8403.63780258884
norm(dtheta) = 0.030520475599897727 MSE = 8403.637710622033
norm(dtheta) = 0.03034129447772696 MSE = 8403.637619195222
norm(dtheta) = 0.030163165301002957 MSE = 8403.637528305033
norm(dtheta) = 0.029986081893876762 MSE = 8403.637437948717
norm(dtheta) = 0.02981003811682726 MSE = 8403.637348122378
norm(dtheta) = 0.029635027866284695 MSE = 8403.6372

norm(dtheta) = 0.014279519921088312 MSE = 8403.62942370378
norm(dtheta) = 0.014195687007199035 MSE = 8403.629380928307
norm(dtheta) = 0.01411234626372843 MSE = 8403.62933840366
norm(dtheta) = 0.014029494801233286 MSE = 8403.629296129042
norm(dtheta) = 0.013947129747203291 MSE = 8403.629254102287
norm(dtheta) = 0.013865248246013063 MSE = 8403.62921232246
norm(dtheta) = 0.013783847458829832 MSE = 8403.629170787826
norm(dtheta) = 0.013702924563385906 MSE = 8403.629129497016
norm(dtheta) = 0.013622476754125135 MSE = 8403.629088448784
norm(dtheta) = 0.01354250124183887 MSE = 8403.629047641416
norm(dtheta) = 0.013462995253760228 MSE = 8403.629007073558
norm(dtheta) = 0.013383956033406851 MSE = 8403.628966744092
norm(dtheta) = 0.013305380840394745 MSE = 8403.628926651045
norm(dtheta) = 0.01322726695051992 MSE = 8403.62888679359
norm(dtheta) = 0.013149611655547472 MSE = 8403.62884717003
norm(dtheta) = 0.013072412263144933 MSE = 8403.628807779294
norm(dtheta) = 0.012995666096777408 MSE = 8403.6

In [19]:
theta

[107.00031935967209, -0.680304871547903]

This is (**almost**) identical to the result we got when using the regression library in the previous module.

Although this is a crude (and fairly slow) implementation, this type of approach can be extended to handle quite general and complex objectives. However, it raises several difficult issues:

* How to initialize?
* How to set parameters like the learning rate and convergence criteria?
* Manually computing derivatives is time consuming - and difficult to debug
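One common way to make hand-written derivatives easier to debug is to compare them against a finite-difference approximation. The sketch below does this on made-up toy data; `mse`, `analytic_gradient`, `numerical_gradient`, and the step size `eps` are names and choices introduced here for illustration, not part of the lecture code.

```python
def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def mse(X, y, theta):
    N = len(X)
    return sum((inner(X[i], theta) - y[i]) ** 2 for i in range(N)) / N

def analytic_gradient(X, y, theta):
    """Hand-derived gradient of the MSE (same formula as the loop-based version)."""
    N = len(X)
    grad = [0.0] * len(theta)
    for i in range(N):
        error = inner(X[i], theta) - y[i]
        for k in range(len(theta)):
            grad[k] += 2 * X[i][k] * error / N
    return grad

def numerical_gradient(X, y, theta, eps=1e-5):
    """Central finite-difference approximation of the same gradient."""
    grad = []
    for k in range(len(theta)):
        up = list(theta)
        up[k] += eps
        down = list(theta)
        down[k] -= eps
        grad.append((mse(X, y, up) - mse(X, y, down)) / (2 * eps))
    return grad

# Toy data and an arbitrary test point (hypothetical values)
X_toy = [[1, -2.0], [1, 0.0], [1, 1.0], [1, 5.0]]
y_toy = [1.0, 2.0, 3.0, 6.0]
theta_toy = [0.5, -1.0]

g_analytic = analytic_gradient(X_toy, y_toy, theta_toy)
g_numeric = numerical_gradient(X_toy, y_toy, theta_toy)
max_diff = max(abs(a - b) for a, b in zip(g_analytic, g_numeric))
print("largest difference:", max_diff)
```

If the two gradients disagree by more than a small tolerance, the hand-written derivative (or the objective) most likely contains a mistake.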



## Gradient Descent in TensorFlow

* Introduce the **tensorflow** library

Though often associated with deep learning, TensorFlow is really just a library that simplifies gradient descent and optimization problems like those we saw above.

### Code: Gradient Descent in TensorFlow

Reading the data is much the same as before (except that we first import the tensorflow library)

In [7]:
import tensorflow as tf  # The code below uses the TensorFlow 1.x graph API (tf.Session, tf.train.AdamOptimizer)

Next we extract features from the data

In [8]:
def feature(data):
    feat = [1, float(data[7]), float(data[8]), float(data[10])]  # Temperature, pressure, and wind speed
    return feat

In [9]:
X = [feature(d) for d in dataset]
y = [float(d[5]) for d in dataset]

**ATTENTION**

We convert ```y``` to a native tensorflow vector. In particular, we convert it to a **column** vector. We have to be careful to get our matrix dimensions right, or we may (accidentally) apply the wrong matrix operations.

In [10]:
y = tf.constant(y, shape=[len(y), 1])

K = len(X[0])

Next we write down the objective - note that we use native tensorflow operations to do so

In [11]:
def MSE(X, y, theta):
    return tf.reduce_mean((tf.matmul(X, theta) - y)**2)
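To see why the column shape of `y` matters, it may help to write out the dimensions involved in `MSE`, with $N$ samples and $K$ features (a sketch of the shape bookkeeping, assuming the usual broadcasting rules):

$$
X \in \mathbb{R}^{N \times K},\quad \theta \in \mathbb{R}^{K \times 1}
\;\Rightarrow\; X\theta \in \mathbb{R}^{N \times 1},
\qquad
y \in \mathbb{R}^{N \times 1}
\;\Rightarrow\; X\theta - y \in \mathbb{R}^{N \times 1}
$$

If `y` were instead left with shape `[N]`, broadcasting would expand the difference to an $N \times N$ matrix, and `tf.reduce_mean` would average over all of its entries without raising an error.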

Next we set up the variables we want to optimize - note that we explicitly indicate that these are **variables** to be optimized (rather than constants)

In [12]:
theta = tf.Variable(tf.constant([0.0]*K, shape=[K, 1]))  # Initialized to zero



In [13]:
optimizer = tf.train.AdamOptimizer(0.01)  #  learning rate  = 0.01
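For intuition about what an Adam-style optimizer does with each gradient, here is a minimal pure-Python sketch of the standard Adam update rule. It is an illustration only: `adam_minimize`, its default hyperparameters, and the toy objective are assumptions made here, and TensorFlow's own implementation may differ in its details.

```python
import math

def adam_minimize(grad_fn, theta, learning_rate=0.01, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, steps=1000):
    """Run `steps` Adam updates on `theta`, given a function returning the gradient."""
    theta = list(theta)
    m = [0.0] * len(theta)   # Running average of gradients (momentum)
    v = [0.0] * len(theta)   # Running average of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        for k in range(len(theta)):
            m[k] = beta1 * m[k] + (1 - beta1) * g[k]
            v[k] = beta2 * v[k] + (1 - beta2) * g[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)   # Bias correction for early steps
            v_hat = v[k] / (1 - beta2 ** t)
            theta[k] -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return theta

# Toy objective f(theta) = (theta_0 - 3)^2, whose gradient is 2 * (theta_0 - 3)
result = adam_minimize(lambda th: [2 * (th[0] - 3.0)], [0.0],
                       learning_rate=0.1, steps=2000)
print(result)
```

Unlike the plain update used earlier, each step is rescaled per parameter and carries momentum, which is one reason the objective is not guaranteed to decrease on every iteration.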

Specify the objective we want to optimize - note that no computation is performed (yet)

In [14]:
objective = MSE(X, y, theta) 

We want to **minimize** the objective

In [15]:
train = optimizer.minimize(objective)

In [16]:
init = tf.global_variables_initializer()

In [17]:
sess = tf.Session()
sess.run(init)

Run 1000 iterations of gradient descent:

In [18]:
for iteration in range(1000):
    cvalues = sess.run([train, objective])
    print("objective = " + str(cvalues[1]))

objective = 18197.637
objective = 16256.282
objective = 14544.926
objective = 13065.934
objective = 11818.946
objective = 10800.018
objective = 10000.627
objective = 9406.96
objective = 8999.845
objective = 8755.393
objective = 8645.968
objective = 8641.214
objective = 8709.442
objective = 8819.535
objective = 8943.11
objective = 9056.542
objective = 9142.375
objective = 9189.907
objective = 9194.928
objective = 9158.809
objective = 9087.201
objective = 8988.594
objective = 8872.923
objective = 8750.35
objective = 8630.255
objective = 8520.496
objective = 8426.892
objective = 8352.982
objective = 8299.988
objective = 8267.004
objective = 8251.36
objective = 8249.131
objective = 8255.698
objective = 8266.351
objective = 8276.801
objective = 8283.605
objective = 8284.421
objective = 8278.105
objective = 8264.656
objective = 8245.03
objective = 8220.874
objective = 8194.209
objective = 8167.1294
objective = 8141.532
objective = 8118.9033
objective = 8100.2
objective = 8085.7993
...

objective = 7836.533
objective = 7836.533
objective = 7836.532
objective = 7836.532
objective = 7836.532
objective = 7836.5317
objective = 7836.5317
objective = 7836.531
objective = 7836.531
objective = 7836.531
objective = 7836.531
objective = 7836.53
objective = 7836.5293
objective = 7836.5293
objective = 7836.5293
objective = 7836.5293
objective = 7836.5283
objective = 7836.5293
objective = 7836.5283
objective = 7836.5283
objective = 7836.5283
objective = 7836.528
objective = 7836.528
objective = 7836.528
objective = 7836.528
objective = 7836.527
objective = 7836.527
objective = 7836.5264
objective = 7836.527
objective = 7836.5264
objective = 7836.5264
objective = 7836.5264
objective = 7836.5264
objective = 7836.5264
objective = 7836.5264
objective = 7836.5264
objective = 7836.5254
objective = 7836.5244
objective = 7836.5254
objective = 7836.5244
objective = 7836.5254
objective = 7836.5244
objective = 7836.5244
objective = 7836.5244
objective = 7836.5244
objective = 7836.5244
...