<H2>Manually Compute Gradients</H2>
<p>Welcome!</p>
<p>This code is written with more detail and explanation than usual to help new Python/ML programmers get past some hurdles.</p>

<p>The code snippets combine examples from O'Reilly's "Hands-On Machine Learning with Scikit-Learn & TensorFlow" with my own.</p>

In [3]:
# Imports
import tensorflow as tf
import numpy as np
from sklearn.datasets import fetch_california_housing

In [38]:
housing = fetch_california_housing() # sklearn makes using data sets easy!
m, n = housing.data.shape # Unlike C++, you can return multiple values from functions

print("m value = " , m)
print("n value = ", n)


# Print housing.data.shape before augmenting it with a bias
print("Shape before bias:", housing.data.shape)

# Adding bias: concatenating a vector of "1"s with length m
# to the housing.data data. It adds a bias "feature"
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

# Just to show that a feature was in fact added to housing.data
print("Shape after bias added: ", housing_data_plus_bias.shape)

# Print out the second row of the dataset (index 1; index 0 is the first row)
print(housing_data_plus_bias[1])

m value =  20640
n value =  8
Shape before bias: (20640, 8)
Shape after bias added:  (20640, 9)
[  1.00000000e+00   8.30140000e+00   2.10000000e+01   6.23813708e+00
   9.71880492e-01   2.40100000e+03   2.10984183e+00   3.78600000e+01
  -1.22220000e+02]
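
To make the bias "feature" a little more concrete, here is a minimal sketch on a tiny made-up array (the numbers and the `theta` values are assumptions for illustration, not part of the housing data). It shows that `np.c_` glues the column of 1s onto the left of the features, and that the first weight then behaves like an intercept.

```python
import numpy as np

# Tiny made-up dataset: 3 samples, 2 features (values are illustrative only)
features = np.array([[10.0, 0.5],
                     [20.0, 1.5],
                     [30.0, 2.5]])
m = features.shape[0]

# np.c_ stacks its arguments side by side (column-wise)
features_plus_bias = np.c_[np.ones((m, 1)), features]

# With a leading column of 1s, theta[0] acts as the intercept:
# prediction = theta[0]*1 + theta[1]*x1 + theta[2]*x2
theta = np.array([[4.0], [0.1], [2.0]])  # hypothetical weights
predictions = features_plus_bias @ theta

print(features_plus_bias.shape)  # one extra column compared with features
print(predictions.ravel())
```

Because the first column is always 1, the matrix product adds `theta[0]` to every prediction without any special-case code for the intercept.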


In [5]:
n_epochs = 1000 # Number of times to traverse entire dataset
learning_rate = 0.01 # Scales each weight update: a computed change of 10 is applied as 10 * 0.01 = 0.1
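
To see what the learning rate does, here is a small sketch on a made-up one-variable problem (the function `f(w) = (w - 3)^2`, the starting point, and the helper names are all assumptions for illustration). Starting at `w = 8`, the gradient is 10, so with a learning rate of 0.01 the first step moves `w` by only 0.1.

```python
# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# The function and starting point are made up for illustration.
def gradient(w):
    return 2.0 * (w - 3.0)

def descend(w, learning_rate, n_steps):
    """Apply n_steps of plain gradient descent starting from w."""
    for _ in range(n_steps):
        w = w - learning_rate * gradient(w)  # scaled-down update
    return w

# Small steps: slowly but steadily approaches the minimum at w = 3
w_small = descend(w=8.0, learning_rate=0.01, n_steps=1000)

# Steps that are too large: each update overshoots and w moves further away
w_large = descend(w=8.0, learning_rate=1.1, n_steps=20)

print(w_small)
print(w_large)
```

The small learning rate ends up very close to 3, while the large one overshoots the minimum on every step and diverges.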

Remember, when dealing with neural networks (or anything trained with gradient descent), it's almost always best to scale the data. StandardScaler removes the mean and scales the data to unit variance. For each feature (each column), it takes the average of the values and subtracts it from every value, which makes the new average 0. It then divides by the column's standard deviation, which makes the spread of every feature comparable.

Why do this? Think about data centered around the origin versus data that sits, say, far out in the top right corner of your coordinate system. Now draw a line through the data to separate two classes as best you can. If your data is very far away, every small wiggle in that line is magnified by the distance and has a large effect on how the data is classified. If your data is centered about the origin, the same wiggle changes much less, so you have more "wiggle" room. This is probably not the greatest analogy without pictures, sorry.
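
Here is a minimal sketch of that subtract-the-mean, divide-by-the-std recipe done by hand in NumPy, on a made-up array (the values are assumptions for illustration). It uses the population standard deviation (`ddof=0`), which is what StandardScaler uses as well.

```python
import numpy as np

# Made-up feature matrix: two columns on very different scales
data = np.array([[1000.0, 0.1],
                 [2000.0, 0.2],
                 [3000.0, 0.3],
                 [4000.0, 0.4]])

column_means = data.mean(axis=0)  # one mean per feature (column)
column_stds = data.std(axis=0)    # population std (ddof=0) per feature

# Broadcasting applies the per-column mean and std to every row
scaled = (data - column_means) / column_stds

print(scaled.mean(axis=0))  # each column is now centered on 0
print(scaled.std(axis=0))   # each column now has unit standard deviation
```

After scaling, the two columns are on the same footing even though one started out thousands of times larger than the other.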

In [35]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

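# Pass the array itself; its .data attribute is only the raw memory buffer.
# Caveat: the bias column was added before scaling, so StandardScaler subtracts its
# mean (1) and turns it into all 0s. theta[0] therefore cannot act as an intercept here.
# Scaling housing.data first and adding the bias column afterwards avoids this.
scaled_housing_data = scaler.fit_transform(housing_data_plus_bias)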

# Make sure the matrix scaled_housing_data is all of type tf.float32 and store it as X (inputs)
X = tf.constant(scaled_housing_data, dtype=tf.float32, name="X")

print(housing.target.shape)
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
print(y.shape)

# Create random variables of size (n+1, 1) with min -1.0 and max 1.0. Seed 42 to make random numbers repeatable
# remember n = 8 so this is a 9 row, 1 column matrix
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name = "predictions")

error = y_pred - y

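# The average of the squared errors, i.e. the mean squared error (MSE)
mse = tf.reduce_mean(tf.square(error), name="mse") # cost function or loss function

# Gradient of the MSE with respect to theta, computed by hand: (2/m) * X^T (X theta - y)
gradients = 2/m * tf.matmul(tf.transpose(X), error)

# Step theta against the gradient to reduce the loss: theta = theta - learning_rate * gradients
training_op = tf.assign(theta, theta - learning_rate * gradients)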

init = tf.global_variables_initializer() # initialize all variables

with tf.Session() as sess:
    sess.run(init) # Run the initializer so theta gets its random starting values
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval()) # Print MSE every 100 iterations
        sess.run(training_op) # Apply the training_op each iteration
        
    best_theta = theta.eval()

print(best_theta.shape)
print(best_theta)


(20640,)
(20640, 1)
Epoch 0 MSE = 5.67843
Epoch 100 MSE = 4.88732
Epoch 200 MSE = 4.8513
Epoch 300 MSE = 4.83743
Epoch 400 MSE = 4.82801
Epoch 500 MSE = 4.82122
Epoch 600 MSE = 4.81631
Epoch 700 MSE = 4.81276
Epoch 800 MSE = 4.81017
Epoch 900 MSE = 4.8083
(9, 1)
[[  9.04542923e-01]
 [  7.74078131e-01]
 [  1.31192401e-01]
 [ -1.17845133e-01]
 [  1.64778173e-01]
 [  7.44095247e-04]
 [ -3.91945131e-02]
 [ -8.61356437e-01]
 [ -8.23479652e-01]]
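
The same manual gradient computation can be sketched in plain NumPy, which makes it easy to sanity-check the formula `(2/m) * X^T (X theta - y)` against a numerical estimate. This is only a sketch under stated assumptions: the data is synthetic (not the housing set), the helper names `mse` and `mse_gradient` are made up for this example, and the features are scaled before the bias column is added so the column of 1s is not zeroed out by the scaling.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for the housing data (all values are made up):
# 200 samples, 3 features on different scales, known "true" weights
m, n = 200, 3
raw_features = rng.normal(loc=[50.0, 5.0, 0.5], scale=[10.0, 2.0, 0.1], size=(m, n))
true_theta = np.array([[2.0], [0.5], [-1.0], [3.0]])  # intercept first

# Scale the features first, THEN add the bias column, so the 1s survive scaling
scaled = (raw_features - raw_features.mean(axis=0)) / raw_features.std(axis=0)
X = np.c_[np.ones((m, 1)), scaled]
y = X @ true_theta + rng.normal(scale=0.1, size=(m, 1))

def mse(theta):
    """Mean squared error of the linear model X @ theta."""
    return np.mean((X @ theta - y) ** 2)

def mse_gradient(theta):
    """Hand-derived gradient of the MSE: (2/m) * X^T (X theta - y)."""
    return 2 / m * X.T @ (X @ theta - y)

theta = rng.uniform(-1.0, 1.0, size=(n + 1, 1))

# Sanity check: compare the hand-derived gradient with central finite differences
eps = 1e-6
numeric_gradient = np.zeros_like(theta)
for i in range(theta.shape[0]):
    step = np.zeros_like(theta)
    step[i] = eps
    numeric_gradient[i] = (mse(theta + step) - mse(theta - step)) / (2 * eps)
gradient_matches = np.allclose(mse_gradient(theta), numeric_gradient, atol=1e-5)

# Batch gradient descent with the same hyperparameters as the cell above
n_epochs = 1000
learning_rate = 0.01
for epoch in range(n_epochs):
    theta = theta - learning_rate * mse_gradient(theta)

final_mse = mse(theta)
print(gradient_matches)
print(theta.ravel())
print(final_mse)
```

With an intact bias column, the first weight is free to learn the intercept, so on this synthetic data the learned weights land close to `true_theta`.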
