Problem 1. 
===
Create a computational graph for the following expression:
$$f = (x \cdot y + z + 1/x) \cdot w$$
Calculate the forward values of all the nodes and of the function f, starting with $x = -1, y = 2, z = 4, w = 5$. Then determine the backward values and, finally, the derivatives of f with respect to x, y, z and w. Please present your results as a simple graph. You can draw your graph by any means you find convenient, including by hand. Please place forward values above the lines representing propagation of values and backpropagation values (derivatives) below the lines.
(25%)

Solution:
---
* Black numbers are calculations for f starting with the given x, y, z, w values
* Red numbers are back-propagated derivative values
* NOTE: because there are two calculation paths from x, the two back-propagated derivative values arriving at x must be added together
![hw3_p1.png](hw3_p1.png)
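
The numbers on the graph can be reproduced with a few lines of plain Python. This is a minimal sketch, assuming one particular decomposition into nodes; the intermediate names `a`, `b`, `c`, `s` are introduced here for illustration and may not match the node layout in the figure.

```python
# Manual forward and backward pass for f = (x*y + z + 1/x) * w.
x, y, z, w = -1.0, 2.0, 4.0, 5.0

# Forward pass: one variable per graph node (node names are assumptions).
a = x * y        # product node
b = 1.0 / x      # reciprocal node
c = a + z        # first sum node
s = c + b        # second sum node
f = s * w        # output node

# Backward pass: start from df/df = 1 and apply the chain rule node by node.
df = 1.0
ds = w * df      # d(s*w)/ds = w
dw = s * df      # d(s*w)/dw = s
dc = ds          # a sum node passes the gradient through unchanged
db = ds
da = dc
dz = dc
dx_product = y * da                    # path through x*y
dy = x * da
dx_reciprocal = (-1.0 / x ** 2) * db   # path through 1/x
dx = dx_product + dx_reciprocal        # the two paths from x are summed

print("f =", f)
print("df/dx =", dx, " df/dy =", dy, " df/dz =", dz, " df/dw =", dw)
```

The last two lines of the backward pass show the summation over the two paths from x mentioned in the note above.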

Problem 2. 
===
Create a computational graph for the following expression:
$$f =\frac{x+ \sigma(y)}{\sigma(z)+(x+y)^2}$$
where $$\sigma(q) = \frac{1}{1+ e^{-q}}$$
Calculate the forward computational values of all nodes and the derivatives of the function f with respect to x, y and z. Please present your results as a simple graph. Please place forward values above the lines representing propagation of values and backpropagation values (derivatives) below the lines. Perform your calculations manually.
As values for (x, y, z) use (2, -4, 3).
(25%)

Solution:
---
* The sigmoid function is represented as a single node
* Black numbers are calculations for f starting with the given x, y, z values
* Red numbers are back-propagated derivative values, given to a precision of 4 digits
![hw3_p2.png](hw3_p2.png)
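
For reference, the local derivatives used in the backward pass can be written in closed form. This is a sketch of the derivation; the symbols $n$ and $d$ for the numerator and denominator are names introduced here, not part of the assignment.

$$n = x + \sigma(y), \qquad d = \sigma(z) + (x+y)^2, \qquad f = \frac{n}{d}$$

$$\sigma'(q) = \sigma(q)\,\bigl(1 - \sigma(q)\bigr)$$

$$\frac{\partial f}{\partial x} = \frac{1}{d} - \frac{2\,n\,(x+y)}{d^2}, \qquad \frac{\partial f}{\partial y} = \frac{\sigma'(y)}{d} - \frac{2\,n\,(x+y)}{d^2}, \qquad \frac{\partial f}{\partial z} = -\frac{n\,\sigma'(z)}{d^2}$$

The first term in the x and y derivatives comes from the numerator branch, and the second from the $(x+y)^2$ branch of the denominator; z reaches f only through $\sigma(z)$.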

Problem 3. 
===
Please perform the calculations in Problem 2 using TensorFlow. As you move forward, please calculate and store (cache) the values of the various derivatives you will need for the backward calculations.
(25%)

In [1]:
import tensorflow as tf

def sigma(x):
    return tf.div(tf.constant(1.0),
                  tf.add(tf.constant(1.0), tf.exp(tf.negative(x))))

def sigmaprime(x):
    return tf.multiply(sigma(x), tf.subtract(tf.constant(1.0), sigma(x)))

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
z = tf.placeholder(tf.float32)

x_sigY = x + sigma(y)

x_y_2 = (x + y)**2

sigZ_x_y_2 = sigma(z) + x_y_2

f = tf.div(x_sigY, sigZ_x_y_2)


# Back propagation

# Upper branch of x
d_x_1 = tf.div(tf.constant(1, dtype=tf.float32), sigZ_x_y_2)

# Upper branch of y
d_y_1 = tf.multiply(sigmaprime(y), d_x_1)

d_y_z = tf.div(tf.negative(x_sigY), sigZ_x_y_2**2)
d_x_y_2 = 2*(x+y)

d_y_2 = d_x_y_2 * d_y_z
d_x_2 = d_y_2

d_x = d_x_1 + d_x_2
d_y = d_y_1 + d_y_2
d_z = tf.multiply(sigmaprime(z), d_y_z)


with tf.Session() as sess:
    result = sess.run([f, d_x, d_y, d_z], feed_dict={x: 2, y: -4, z: 3})
    print("Forward Calculated  f  =  {}".format(result[0]))
    print("Back propagation   dx  =  {}".format(result[1]))
    print("Back propagation   dy  =  {}".format(result[2]))
    print("Back propagation   dz  =  {}".format(result[3]))


Forward Calculated  f  =  0.40746209025382996
Back propagation   dx  =  0.5310063362121582
Back propagation   dy  =  0.33265751600265503
Back propagation   dz  =  -0.003716809442266822
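
As an independent sanity check on the values printed above, the same derivatives can be approximated with central finite differences in plain Python, without TensorFlow. This is a minimal sketch; the helper `central_difference` and the step size `h` are choices made here for illustration, not part of the assignment.

```python
import math

def sigma(q):
    """Logistic sigmoid, as defined in Problem 2."""
    return 1.0 / (1.0 + math.exp(-q))

def f(x, y, z):
    """The function from Problem 2."""
    return (x + sigma(y)) / (sigma(z) + (x + y) ** 2)

def central_difference(func, point, index, h=1e-6):
    """Approximate the partial derivative of func along one coordinate."""
    upper = list(point)
    lower = list(point)
    upper[index] += h
    lower[index] -= h
    return (func(*upper) - func(*lower)) / (2.0 * h)

point = (2.0, -4.0, 3.0)
numeric = [central_difference(f, point, i) for i in range(3)]

print("f  =", f(*point))
for name, value in zip(("dx", "dy", "dz"), numeric):
    print(name, "=", value)
```

The finite-difference estimates should agree with the back-propagated values to several decimal places; small differences are expected because the TensorFlow graph runs in 32-bit floats.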


Problem 4. 
===
Please examine the attached Python code for classification of the MNIST handwritten digits dataset. Please try to find “optimal” values of two hyperparameters: learning_rate and batch_size. Optimal is a vague term. You would like to achieve the best accuracy in the shortest possible time. Please do not sweat it out.
(25%)

In [1]:
"""
Simple logistic regression model to solve OCR task 
with MNIST in TensorFlow
MNIST dataset: yann.lecun.com/exdb/mnist/

"""

import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
import time
# Define parameters for the model
learning_rate = 0.1
batch_size = 150
n_epochs = 30

# Step 1: Read in data
# using TF Learn's built in function to load MNIST data to the folder mnist
mnist = input_data.read_data_sets('./mnist', one_hot=True) 

# Step 2: create placeholders for features and labels
# each image in the MNIST data is of shape 28*28 = 784
# therefore, each image is represented with a 1x784 tensor
# there are 10 classes for each image, corresponding to digits 0 - 9. 
# each label is a one-hot vector.
X = tf.placeholder(tf.float32, [batch_size, 784], name='X_placeholder') 
Y = tf.placeholder(tf.float32, [batch_size, 10], name='Y_placeholder')

# Step 3: create weights and bias
# w is initialized to random variables with mean of 0, stddev of 0.01
# b is initialized to 0
# shape of w depends on the dimension of X and Y so that Y = tf.matmul(X, w)
# shape of b depends on Y
w = tf.Variable(tf.random_normal(shape=[784, 10], stddev=0.01), name='weights')
b = tf.Variable(tf.zeros([1, 10]), name="bias")

# Step 4: build model
# the model that returns the logits.
# these logits will later be passed through a softmax layer
logits = tf.matmul(X, w) + b 

# Step 5: define loss function
# use cross entropy of softmax of logits as the loss function
entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits,labels=Y, name='loss')
loss = tf.reduce_mean(entropy) # computes the mean over all the examples in the batch

# Step 6: define training op
# using gradient descent with the learning rate defined above to minimize loss
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

with tf.Session() as sess:
	# to visualize using TensorBoard
	writer = tf.summary.FileWriter('./logistic_reg', sess.graph)

	start_time = time.time()
	sess.run(tf.global_variables_initializer())	
	n_batches = int(mnist.train.num_examples/batch_size)
	for i in range(n_epochs): # train the model n_epochs times
		total_loss = 0

		for _ in range(n_batches):
			X_batch, Y_batch = mnist.train.next_batch(batch_size)
			_, loss_batch = sess.run([optimizer, loss], feed_dict={X: X_batch, Y:Y_batch}) 
			total_loss += loss_batch
		print ('Average loss epoch {0}: {1}'.format(i, total_loss/n_batches))

	print ('Total time: {0} seconds'.format(time.time() - start_time))

	print('Optimization Finished!') # should be around 0.35 after 25 epochs

	# test the model
	n_batches = int(mnist.test.num_examples/batch_size)
	total_correct_preds = 0
	for i in range(n_batches):
		X_batch, Y_batch = mnist.test.next_batch(batch_size)
		# evaluation only: the optimizer is not run here, so the model is not trained on test data
		loss_batch, logits_batch = sess.run([loss, logits], feed_dict={X: X_batch, Y:Y_batch}) 
		preds = tf.nn.softmax(logits_batch)
		correct_preds = tf.equal(tf.argmax(preds, 1), tf.argmax(Y_batch, 1))
		accuracy = tf.reduce_sum(tf.cast(correct_preds, tf.float32))
		total_correct_preds += sess.run(accuracy)	
	
	print ('Accuracy {0}'.format(total_correct_preds/mnist.test.num_examples))

	writer.close()


Extracting ./mnist/train-images-idx3-ubyte.gz
Extracting ./mnist/train-labels-idx1-ubyte.gz
Extracting ./mnist/t10k-images-idx3-ubyte.gz
Extracting ./mnist/t10k-labels-idx1-ubyte.gz
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

Average loss epoch 0: 0.6200318447227686
Average loss epoch 1: 0.39352775723393496
Average loss epoch 2: 0.35750633966727335
Average loss epoch 3: 0.3388547466996589
Average loss epoch 4: 0.32706829347734245
Average loss epoch 5: 0.31817121600192755
Average loss epoch 6: 0.31099098273476616
Average loss epoch 7: 0.30674878495638486
Average loss epoch 8: 0.30279367152458986
Average loss epoch 9: 0.2980282822104751
Average loss epoch 10: 0.29580685146193686
Average loss epoch 11: 0.29321130267420753
Average loss epoch 12: 0.28969128307749015
Average loss epoch 13: 0.2879804641904075
Average loss epoch 14: 0.2854545149402540

Answer:
---
Learning Rate: 0.1

Batch Size : 150

Total Time: 12.00 seconds

Accuracy : 0.915
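
One way to search for such values more systematically is a small grid search that records accuracy and wall-clock time for every combination. The sketch below is not the procedure used to obtain the answer above. `train_and_evaluate` is a hypothetical callable that would wrap the training script and return its test accuracy, and `placeholder_model` is an arbitrary stand-in so that the sketch runs on its own.

```python
import itertools
import time

def grid_search(train_and_evaluate, learning_rates, batch_sizes):
    """Try every (learning_rate, batch_size) pair; record accuracy and time.

    train_and_evaluate is a hypothetical callable, not part of the assignment
    code. It is assumed to train a fresh model and return its test accuracy.
    """
    results = []
    for lr, bs in itertools.product(learning_rates, batch_sizes):
        start = time.time()
        accuracy = train_and_evaluate(lr, bs)
        elapsed = time.time() - start
        results.append({"learning_rate": lr, "batch_size": bs,
                        "accuracy": accuracy, "seconds": elapsed})
    # Highest accuracy first; ties are broken in favour of the shorter run.
    results.sort(key=lambda r: (-r["accuracy"], r["seconds"]))
    return results

def placeholder_model(learning_rate, batch_size):
    # Stand-in only: it does NOT train on MNIST and its score is arbitrary.
    # Replace it with a function that runs the real training code.
    return 1.0 / (1.0 + abs(learning_rate - 0.05) + abs(batch_size - 100) / 100.0)

ranking = grid_search(placeholder_model, [0.01, 0.05, 0.5], [50, 100, 200])
best = ranking[0]
print("best learning_rate:", best["learning_rate"])
print("best batch_size:", best["batch_size"])
```

Sorting by accuracy first and time second is one possible reading of "best accuracy in the shortest possible time"; a different trade-off would only require changing the sort key.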