#### Implementing Backpropagation in TensorFlow
TensorFlow keeps track of our operations in its computational graph. It can then automatically update the model's variables through a procedure called backpropagation.
> We normally use backpropagation to minimize our loss function with respect to the model's variables (parameters).

This minimization is done by declaring an optimization function. Once we have declared it, TensorFlow goes through the graph and automatically computes the backpropagation terms for all of our computations. These terms are the gradients of the loss with respect to each variable. TensorFlow then updates the variables in the graph accordingly.
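The following minimal sketch shows what backpropagation computes for a loss of the form $(xW - y)^2$, which is the loss used in the regression example below. It is written in plain Python rather than TensorFlow, and the helpers `forward_backward` and `numerical_grad` are hypothetical. The forward pass stores each intermediate value. The backward pass then multiplies local derivatives along the chain rule, from the loss back to $W$. A finite-difference estimate is included only as a check.

```python
def forward_backward(x, w, y):
    """Loss (x*w - y)^2 and its gradient dLoss/dw via a manual backward pass."""
    # Forward pass: evaluate each node and keep its value for the backward pass.
    y_pred = x * w            # multiply node
    diff = y_pred - y         # subtract node
    loss = diff ** 2          # square node

    # Backward pass: start from dLoss/dLoss = 1 and apply the chain rule,
    # walking the nodes in reverse order back to the variable w.
    d_diff = 2.0 * diff       # d(diff^2)/d(diff)
    d_y_pred = d_diff * 1.0   # d(y_pred - y)/d(y_pred) = 1
    d_w = d_y_pred * x        # d(x * w)/dw = x
    return loss, d_w


def numerical_grad(x, w, y, eps=1e-6):
    """Central finite difference, used only to check the backward pass."""
    def f(v):
        return (x * v - y) ** 2
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)


loss, grad = forward_backward(x=1.0, w=2.0, y=10.0)
print(loss, grad)                      # prints: 64.0 -16.0
print(numerical_grad(1.0, 2.0, 10.0))  # approximately -16.0
```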

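```python
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

# Start a session (TensorFlow 1.x API; under TensorFlow 2.x, use
# `import tensorflow.compat.v1 as tf` followed by `tf.disable_v2_behavior()`)
sess = tf.Session()
```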

#### A simple regression algorithm
In this example we sample 100 random numbers from a normal distribution with mean 1 and standard deviation 0.1. We then apply a simple linear operation to them, i.e. multiply them by a variable $W$. Finally, we apply a loss function between the output and the target. The loss is the L2 loss, i.e. the squared difference, and for this example the target is the constant 10.
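For a single sample, the quantities involved can be written out as a short derivation. Here $\eta$ denotes the learning rate; the symbol is our own notation.

$$
\hat{y} = xW, \qquad L(W) = (\hat{y} - y)^2 = (xW - y)^2
$$

$$
\frac{\partial L}{\partial W} = \frac{\partial L}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial W} = 2\,(xW - y)\,x,
\qquad
W \leftarrow W - \eta\,\frac{\partial L}{\partial W}
$$

The gradient vanishes at $W = y/x$. With $y = 10$ and $x \approx 1$, the updates therefore push $W$ toward roughly 10.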

```python
# Create the data, placeholders and variables
x_val = np.random.normal(1, 0.1, 100)   # input values
y_val = np.repeat(10., 100)             # target values

# Placeholders: one sample is fed at a time
x_data = tf.placeholder(shape=[1], dtype=tf.float32)
y_target = tf.placeholder(shape=[1], dtype=tf.float32)

# Variable W, randomly initialized
w = tf.Variable(tf.random_normal(shape=[1]))

# Add the linear function to the computational graph
y_pred = tf.multiply(x_data, w)

# Add the loss function (L2 loss: squared difference)
loss = tf.square(y_pred - y_target)

# Initialize the variables
init = tf.global_variables_initializer()
sess.run(init)
```


> Most optimization algorithms need to know how far to step in each iteration. This distance is controlled by the learning rate. If our learning rate is too big, our algorithm might overshoot the minimum. If it is too small, our algorithm might take too long to converge. This is related to the vanishing and exploding gradient problems because the actual step is the learning rate multiplied by the gradient. Vanishing gradients make the steps too small, and exploding gradients make them too large.
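The trade-off can be seen on a toy problem. The sketch below runs plain gradient descent with several learning rates. It is plain Python, and both the helper `gradient_descent` and the quadratic $f(w) = (w - 10)^2$ are hypothetical, chosen for illustration. For this quadratic, each step multiplies the distance to the minimum by $(1 - 2\eta)$. A tiny $\eta$ therefore barely moves. An $\eta$ between 0.5 and 1 overshoots on every step but still converges. An $\eta$ above 1 diverges.

```python
def gradient_descent(learning_rate, w0=0.0, target=10.0, steps=50):
    """Minimize f(w) = (w - target)^2 with plain gradient descent.

    Returns the list of iterates so the trajectory can be inspected.
    """
    w = w0
    history = [w]
    for _ in range(steps):
        grad = 2.0 * (w - target)      # f'(w)
        w = w - learning_rate * grad   # step against the gradient
        history.append(w)
    return history


# Too small, reasonable, overshooting-but-converging, and diverging.
for lr in (0.001, 0.1, 0.9, 1.1):
    final = gradient_descent(lr)[-1]
    print(f"lr={lr}: w after 50 steps = {final:.4f}")
```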

```python
# Declare an optimizer: here I use gradient descent
my_opt = tf.train.GradientDescentOptimizer(learning_rate=0.02)

# Create the training step: minimize() adds the gradient and update ops to the graph
train_step = my_opt.minimize(loss)
```


> Now we train the model by looping through the algorithm a number of times, printing the updated result every 20th iteration. To train, we use stochastic gradient descent: at each step we pick one input value and its target at random and feed them to the graph.

> TensorFlow automatically computes the loss and updates the variable W to minimize it.

```python
n_iterations = 100

for i in range(n_iterations):
    # Pick one random sample (stochastic gradient descent)
    rand_index = np.random.choice(100)
    rand_x = [x_val[rand_index]]
    rand_y = [y_val[rand_index]]
    # Run the graph: one gradient-descent update of w
    sess.run(train_step, feed_dict={x_data: rand_x, y_target: rand_y})
    # Print the result on every 20th iteration
    if (i + 1) % 20 == 0:
        print('Step #' + str(i+1) + ' W = ' + str(sess.run(w)))
        print('Loss = ' + str(sess.run(loss, feed_dict={x_data: rand_x, y_target: rand_y})))
```

```
Step #20 W = [5.460243]
Loss = [18.638432]
Step #40 W = [8.000424]
Loss = [6.931777]
Step #60 W = [9.075418]
Loss = [0.8593326]
Step #80 W = [9.735153]
Loss = [0.7545254]
Step #100 W = [9.869209]
Loss = [0.02213772]
```
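For comparison, here is a dependency-free sketch of what that loop does for this model. Plain Python's `random` module stands in for NumPy and `tf.random_normal`, and `sgd_regression` is a hypothetical helper. Each step draws one sample and computes $\partial L/\partial W = 2(xW - y)x$ by the chain rule. It then moves $W$ against that gradient, which is the kind of update the optimizer's `minimize` step performs. Exact numbers will differ from the run above because the random draws differ.

```python
import random


def sgd_regression(n_iterations=100, learning_rate=0.02, seed=0):
    """Per-sample gradient descent for y_pred = x * w with squared-error loss."""
    rng = random.Random(seed)
    x_val = [rng.gauss(1.0, 0.1) for _ in range(100)]   # inputs ~ N(1, 0.1)
    y_val = [10.0] * 100                                # constant target 10
    w = rng.gauss(0.0, 1.0)                             # random start, like tf.random_normal
    for i in range(n_iterations):
        k = rng.randrange(100)                          # one random sample per step
        x, y = x_val[k], y_val[k]
        grad = 2.0 * (x * w - y) * x                    # dL/dw by the chain rule
        w -= learning_rate * grad                       # gradient-descent update
        if (i + 1) % 20 == 0:
            loss = (x * w - y) ** 2
            print(f"Step #{i + 1} W = {w:.4f} Loss = {loss:.6f}")
    return w


w_final = sgd_regression()
```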


#### Another example: a binary classification problem
In this example, we create data for two target classes, labeled 0 and 1, drawn from two different normal distributions. Class 0 has 50 samples from $N(-1,1)$ and class 1 has 50 samples from $N(3,1)$. We also create placeholders for the data and the target, and a variable W.

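```python
# Reset the graph so the nodes from the previous example are discarded
from tensorflow.python.framework import ops
sess.close()                 # close the previous session before resetting
ops.reset_default_graph()    # clear the default graph
sess = tf.Session()
```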

```python
# Create the data: class 0 ~ N(-1, 1), class 1 ~ N(3, 1)
x_vals = np.concatenate((np.random.normal(-1, 1, 50), np.random.normal(3, 1, 50)))
y_vals = np.concatenate((np.repeat(0., 50), np.repeat(1., 50)))

# Create placeholders
x_data = tf.placeholder(shape=[1], dtype=tf.float32)
y_target = tf.placeholder(shape=[1], dtype=tf.float32)

# Variable: a scalar added to x, initialized to 0
w = tf.Variable(0.)
```

> Here w is initialized to 0. To show how the algorithm converges from a far-off value, you could instead initialize it around a mean of 10, e.g. ```tf.Variable(tf.random_normal(mean=10, shape=[1]))```.

Next we create the model output. This is a binary classification problem, so the predictions should be probabilities between 0 and 1. The sigmoid function therefore has to be applied to the linear function $x + w$.
> We do not apply the sigmoid ourselves: the TensorFlow function ```tf.nn.sigmoid_cross_entropy_with_logits``` applies it internally to the logits it receives.
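The TensorFlow documentation for `tf.nn.sigmoid_cross_entropy_with_logits` defines the loss for a logit $\ell$ and label $y$ as $-[y\log\sigma(\ell) + (1-y)\log(1-\sigma(\ell))]$. For numerical stability, it computes this in the rearranged form $\max(\ell, 0) - \ell y + \log(1 + e^{-|\ell|})$. The plain-Python sketch below compares the two forms; the helper names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def naive_sigmoid_xent(logit, label):
    """Direct formula: -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(logit)."""
    p = sigmoid(logit)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def stable_sigmoid_xent(logit, label):
    """Rearranged form: exp() only ever sees -|logit|, so it cannot overflow."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


for logit, label in [(-3.0, 1.0), (0.0, 1.0), (4.0, 0.0)]:
    print(logit, label, naive_sigmoid_xent(logit, label), stable_sigmoid_xent(logit, label))
```

For moderate logits both forms agree. For a confident wrong prediction such as logit 50 with label 0, $\sigma(50)$ rounds to exactly 1.0 in double precision. The direct formula then tries to take $\log 0$ and Python raises `ValueError`, while the stable form returns 50.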

```python
# Create the model output (the logits): x + w.
# The sigmoid is applied later, inside the loss function.
y_pred = tf.add(x_data, w)

# Initialize the variables
init = tf.global_variables_initializer()
sess.run(init)
```

```python
# Loss function: sigmoid cross-entropy loss
x_entropy_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=y_target, logits=y_pred)

# Declare the optimizers: here I use gradient descent with two learning rates
my_opt = tf.train.GradientDescentOptimizer(learning_rate=0.05)
my_opt2 = tf.train.GradientDescentOptimizer(learning_rate=0.001)

# Create a training step for each optimizer
train_step = my_opt.minimize(x_entropy_loss)
train_step2 = my_opt2.minimize(x_entropy_loss)
```

> Finally, we train for 1400 iterations, each time feeding a single randomly selected data point and updating the variable w accordingly. Every 200 iterations, we print out the value of w and the loss on that data point:

```python
for i in range(1400):
    # Pick one random data point
    rand_index = np.random.choice(100)
    rand_x = [x_vals[rand_index]]
    rand_y = [y_vals[rand_index]]
    # Run the graph: one update with learning rate 0.05
    sess.run(train_step, feed_dict={x_data: rand_x, y_target: rand_y})
    # Print the result on every 200th iteration
    if (i + 1) % 200 == 0:
        print('Step #' + str(i+1) + ' W = ' + str(sess.run(w)))
        print('Loss = ' + str(sess.run(x_entropy_loss, feed_dict={x_data: rand_x, y_target: rand_y})))
```

```
Step #200 W = -0.8044722
Loss = [0.11743699]
Step #400 W = -0.94036
Loss = [0.08700228]
Step #600 W = -0.786594
Loss = [0.17700219]
Step #800 W = -0.90714014
Loss = [0.15840782]
Step #1000 W = -0.88285613
Loss = [0.41357884]
Step #1200 W = -1.0001022
Loss = [0.12860726]
Step #1400 W = -0.870744
Loss = [0.14574195]
```


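Next we continue training with the second optimizer (```train_step2```, learning rate 0.001). The variables are not re-initialized, so training resumes from the current value of w. With the much smaller learning rate, w barely moves:

```python
for i in range(1400):
    # Pick one random data point
    rand_index = np.random.choice(100)
    rand_x = [x_vals[rand_index]]
    rand_y = [y_vals[rand_index]]
    # Run the graph: one update with learning rate 0.001
    sess.run(train_step2, feed_dict={x_data: rand_x, y_target: rand_y})
    # Print the result on every 200th iteration
    if (i + 1) % 200 == 0:
        print('Step #' + str(i+1) + ' W = ' + str(sess.run(w)))
        print('Loss = ' + str(sess.run(x_entropy_loss, feed_dict={x_data: rand_x, y_target: rand_y})))
```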

```
Step #200 W = -0.8709925
Loss = [0.1203137]
Step #400 W = -0.8737013
Loss = [0.28539962]
Step #600 W = -0.8747969
Loss = [0.23416257]
Step #800 W = -0.8769944
Loss = [0.10669471]
Step #1000 W = -0.8776215
Loss = [0.08085661]
Step #1200 W = -0.8792195
Loss = [0.17069075]
Step #1400 W = -0.8769954
Loss = [0.8258968]
```
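For the sigmoid cross-entropy, the derivative with respect to the logit is $\sigma(\ell) - y$. Because the logit here is $\ell = x + w$, we have $\partial \ell/\partial w = 1$, so that same value is the gradient with respect to $w$. The sketch below repeats the classification training with this hand-derived gradient. It is plain Python, `sgd_binary` is a hypothetical helper, and it draws its own random data, so the numbers will not match the runs above. The two classes are mirror images around $x = 1$, so the population optimum is $w = -1$, where the logit $x - 1$ is zero midway between the class means. This is consistent with the values between about $-0.8$ and $-1.0$ printed above. The `learning_rate` and `w0` parameters let you mimic the second, slower run that continues from the first result.

```python
import math
import random


def sigmoid(z):
    """Logistic function, split on the sign of z to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_binary(n_iterations=1400, learning_rate=0.05, w0=0.0, seed=0):
    """Train w in p = sigmoid(x + w) with one random sample per step."""
    rng = random.Random(seed)
    # Class 0 ~ N(-1, 1) and class 1 ~ N(3, 1), 50 samples each.
    x_vals = [rng.gauss(-1.0, 1.0) for _ in range(50)]
    x_vals += [rng.gauss(3.0, 1.0) for _ in range(50)]
    y_vals = [0.0] * 50 + [1.0] * 50
    w = w0
    for i in range(n_iterations):
        k = rng.randrange(100)            # pick one data point at random
        logit = x_vals[k] + w             # same model as y_pred = x_data + w
        # dLoss/dlogit = sigmoid(logit) - label, and dlogit/dw = 1.
        grad = sigmoid(logit) - y_vals[k]
        w -= learning_rate * grad         # gradient-descent update
        if (i + 1) % 200 == 0:
            print(f"Step #{i + 1} W = {w:.4f}")
    return w


w_fast = sgd_binary(learning_rate=0.05)
w_slow = sgd_binary(learning_rate=0.001, w0=w_fast)  # continue with a small step size
```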
