## Batch Gradient Descent
The first way to obtain the gradient is to derive it explicitly and code the
resulting formula by hand. For linear regression this works fine.
For a neural network, however, this approach can be very inefficient, and the
explicit expression for the gradient can be much more complicated.
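
For reference, the explicit gradient for linear regression with the MSE loss can be written as below. Here $X$ is the $m \times (n+1)$ data matrix including the bias column, $Y$ is the column vector of targets, and $\theta$ is the weight vector, matching the names used in the code cells of this section:

$$
\mathrm{MSE}(\theta) = \frac{1}{m} \lVert X\theta - Y \rVert^2,
\qquad
\nabla_\theta \mathrm{MSE}(\theta) = \frac{2}{m} X^\top (X\theta - Y)
$$

Each gradient descent step then updates $\theta \leftarrow \theta - \eta \, \nabla_\theta \mathrm{MSE}(\theta)$, where $\eta$ is the learning rate.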

In [1]:
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.random_uniform, tf.train)
from sklearn import datasets
import numpy as np
from sklearn.preprocessing import StandardScaler

In [2]:
houses = datasets.fetch_california_housing()

iteration = 1000
learningRate = 0.01
m, n = houses.data.shape
scaler = StandardScaler()
scaled_data = scaler.fit_transform(houses.data)
# Prepend a column of ones so that theta[0] acts as the bias term
scaled_data_with_bias = np.c_[np.ones((m, 1)), scaled_data]

In [3]:
X = tf.constant(scaled_data_with_bias, dtype=tf.float32, name="X")
Y = tf.constant(houses.target.reshape(-1, 1), dtype=tf.float32, name="Y")
theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0), name="theta")

y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - Y
mse = tf.reduce_mean(tf.square(error), name="mse") # mse - Mean Squared Error
init_node = tf.global_variables_initializer()

In [4]:
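# Explicit gradient of the MSE with respect to theta: (2/m) * X^T (X theta - Y)
gradient = (2.0 / m) * tf.matmul(tf.transpose(X), error)
# One gradient descent step: theta <- theta - learningRate * gradient
training_node = tf.assign(theta, theta - learningRate * gradient)
with tf.Session() as sess:
    sess.run(init_node)  # theta is drawn at random here
    for step in range(iteration):
        if step % 100 == 0:
            print("Iteration : ", step, "MSE : ", mse.eval())
        sess.run(training_node)
    best_theta = theta.eval()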

Iteration :  0 MSE :  10.06712
Iteration :  100 MSE :  0.7189078
Iteration :  200 MSE :  0.5707189
Iteration :  300 MSE :  0.5565982
Iteration :  400 MSE :  0.5479179
Iteration :  500 MSE :  0.5416264
Iteration :  600 MSE :  0.5370417
Iteration :  700 MSE :  0.53369474
Iteration :  800 MSE :  0.5312474
Iteration :  900 MSE :  0.5294548
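
To make the update rule easier to follow outside a TensorFlow graph, here is a minimal NumPy sketch of the same batch gradient descent loop. It is only an illustration: the function name `batch_gradient_descent`, the synthetic data, and the values of `true_theta` are assumptions made for this example and are not part of the notebook above.

```python
import numpy as np

def batch_gradient_descent(X, y, learning_rate=0.01, n_iterations=1000, seed=0):
    """Fit linear-regression weights by full-batch gradient descent on the MSE."""
    m, n = X.shape
    rng = np.random.default_rng(seed)
    # Same initialisation range as the notebook: uniform in [-1, 1)
    theta = rng.uniform(-1.0, 1.0, size=(n, 1))
    for _ in range(n_iterations):
        error = X @ theta - y                  # residuals on the whole batch
        gradient = (2.0 / m) * X.T @ error     # explicit MSE gradient
        theta = theta - learning_rate * gradient
    return theta

# Synthetic data (an assumption for illustration, not the housing dataset)
rng = np.random.default_rng(42)
m = 200
features = rng.normal(size=(m, 2))
X_demo = np.c_[np.ones((m, 1)), features]      # bias column + 2 features
true_theta = np.array([[4.0], [3.0], [-2.0]])  # hypothetical ground truth
y_demo = X_demo @ true_theta + 0.1 * rng.normal(size=(m, 1))

theta_hat = batch_gradient_descent(X_demo, y_demo, learning_rate=0.1, n_iterations=500)
mse_demo = float(np.mean((X_demo @ theta_hat - y_demo) ** 2))
print("theta:", theta_hat.ravel(), "MSE:", mse_demo)
```

Every iteration uses all `m` rows to compute the gradient, which is what makes this the "batch" variant.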


TensorFlow can also compute the gradient of the loss function for us, which is much easier than deriving it by hand:

In [5]:
# tf.gradients returns one gradient tensor per variable in the list
gradient2 = tf.gradients(mse, [theta])[0]
training_node = tf.assign(theta, theta - learningRate * gradient2)
with tf.Session() as sess_op1:
    sess_op1.run(init_node)  # theta is drawn at random again
    for step in range(iteration):
        if step % 100 == 0:
            print("Iteration : ", step, "MSE : ", mse.eval())
        sess_op1.run(training_node)
    best_theta = theta.eval()

Iteration :  0 MSE :  7.13495
Iteration :  100 MSE :  0.818513
Iteration :  200 MSE :  0.6778849
Iteration :  300 MSE :  0.63955176
Iteration :  400 MSE :  0.6121939
Iteration :  500 MSE :  0.591694
Iteration :  600 MSE :  0.5762431
Iteration :  700 MSE :  0.56454366
Iteration :  800 MSE :  0.5556427
Iteration :  900 MSE :  0.54883736
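
The MSE values printed by this run differ from the previous one because `theta` is re-initialized at random in each session, not because the two gradients differ. `tf.gradients` obtains the gradient by automatic differentiation, and for this loss it should agree with the explicit formula. One way to convince yourself that a hand-derived gradient is correct is a finite-difference check, sketched below in NumPy. This is a verification technique, not how TensorFlow computes gradients, and the helper names and random test data are assumptions made for this example.

```python
import numpy as np

def mse(theta, X, y):
    """Mean squared error of the linear model X @ theta."""
    return float(np.mean((X @ theta - y) ** 2))

def analytic_gradient(theta, X, y):
    """Explicit MSE gradient: (2/m) * X^T (X theta - y)."""
    m = X.shape[0]
    return (2.0 / m) * X.T @ (X @ theta - y)

def numerical_gradient(theta, X, y, eps=1e-6):
    """Central finite differences, one coordinate of theta at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[j, 0] = eps
        grad[j, 0] = (mse(theta + step, X, y) - mse(theta - step, X, y)) / (2 * eps)
    return grad

# Small random problem (an assumption for illustration)
rng = np.random.default_rng(0)
X_check = np.c_[np.ones((50, 1)), rng.normal(size=(50, 3))]
y_check = rng.normal(size=(50, 1))
theta_check = rng.uniform(-1.0, 1.0, size=(4, 1))

g_analytic = analytic_gradient(theta_check, X_check, y_check)
g_numeric = numerical_gradient(theta_check, X_check, y_check)
max_diff = float(np.max(np.abs(g_analytic - g_numeric)))
print("max abs difference:", max_diff)
```

A very small difference indicates that the explicit formula and the numerical estimate agree. The finite-difference loop needs two loss evaluations per parameter, which is why it is only practical as a check and not as a training method.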


In addition, TensorFlow ships with ready-made optimizers that compute the gradient
and apply the update step for us, which keeps the training code simpler, so we can use:

In [6]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learningRate)
# minimize() builds both the gradient computation and the update of theta,
# so no manual tf.assign node is needed here
training_node_optimizer = optimizer.minimize(mse)

with tf.Session() as sess_op2:
    sess_op2.run(init_node)  # theta is drawn at random again
    for step in range(iteration):
        if step % 100 == 0:
            print("Iteration : ", step, "MSE : ", mse.eval())
        sess_op2.run(training_node_optimizer)
    best_theta = theta.eval()

Iteration :  0 MSE :  3.9333186
Iteration :  100 MSE :  0.9359435
Iteration :  200 MSE :  0.79566395
Iteration :  300 MSE :  0.7240079
Iteration :  400 MSE :  0.67254627
Iteration :  500 MSE :  0.63477564
Iteration :  600 MSE :  0.6069445
Iteration :  700 MSE :  0.5863774
Iteration :  800 MSE :  0.57113194
Iteration :  900 MSE :  0.5597934
