# Basic Models in TensorFlow

In [1]:
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Linear Regression using Dummy Data
We're going to build a simple linear regression model that learns a function between 1d input and 1d output. 

This means that for a scalar input $X$ and a scalar output $Y$, we learn the weights of the function $f(X)$:

$$f(X) = w * X + b$$

where we learn $w$ and $b$.

We're going to use a dummy dataset for simplicity. We load the CSV in the next cell.

In [23]:
df = pd.read_csv("linreg.csv", header = None)

Create a placeholder for the feature X and another for the output y. Both X and y should be float scalars represented by tf.float32. A placeholder sets the type and, optionally, the shape of the input. If no shape is given, the placeholder accepts a tensor of any shape, so scalars can be fed without specifying one.

In [24]:
#TODO: Fill in the dots
X = tf.placeholder(...)
y = tf.placeholder(...)

Create a variable for the weight, $w$, and another for the bias, $b$. Both can initially be set to a small number such as 0.1. These will be the values we update in training.

In [25]:
#TODO: Fill in the dots
w = tf.Variable(...)
b = tf.Variable(...)

We compute the predicted value of y using $\hat{y} = w * X + b$.


In [26]:
y_pred = tf.add(tf.multiply(w, X), b) 
# We could also have written w * X + b, but using tf methods can make the operation clearer.

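Compute the loss. We're using Mean Squared Error for our loss function. Because each training step sees a single example, the loss for one $(X, y)$ pair is just the squared error: $$\mathcal{L}_{MSE} = (y - \hat{y})^2$$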

In [27]:
#TODO: Fill in the dots
loss = tf.square(...)
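
The gradients that gradient descent needs follow from the chain rule. As a worked derivation using the same symbols, with $\hat{y} = w * X + b$ and the squared error $(y - \hat{y})^2$ for one example:

$$\frac{\partial \mathcal{L}_{MSE}}{\partial w} = -2 (y - \hat{y}) X, \qquad \frac{\partial \mathcal{L}_{MSE}}{\partial b} = -2 (y - \hat{y})$$

TensorFlow computes these automatically, so they are shown here only to make the optimizer's update step concrete.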

## Optimizer
The optimizer powers our algorithm's learning abilities. The optimizer performs gradient descent on the loss function, and adjusts all of the variables in order to decrease the loss.

We will use the simplest optimizer, GradientDescentOptimizer, with an appropriate learning rate. Set it to minimize the loss.

In [28]:
#TODO: Fill in the dots
optimizer = tf.train.GradientDescentOptimizer(learning_rate= ... ).minimize(...)
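
To make "performs gradient descent on the loss function" concrete, here is a minimal plain-Python sketch of a single update for the squared-error loss. It is not TensorFlow code: `sgd_step` is a hypothetical helper, and the input values and the learning rate of 0.01 are arbitrary choices for illustration.

```python
def sgd_step(w, b, x, y, learning_rate):
    """Apply one gradient descent update for the loss (y - (w * x + b)) ** 2."""
    y_pred = w * x + b
    error = y - y_pred
    loss = error * error
    # Gradients of the squared error with respect to w and b.
    grad_w = -2.0 * error * x
    grad_b = -2.0 * error
    # Move each parameter a small step against its gradient.
    new_w = w - learning_rate * grad_w
    new_b = b - learning_rate * grad_b
    return new_w, new_b, loss


# Illustrative values only: start from w = b = 0.1 and one (x, y) pair.
w, b, loss = sgd_step(w=0.1, b=0.1, x=2.0, y=1.0, learning_rate=0.01)
print(w, b, loss)
```

Both parameters move in the direction that reduces the error on this example. The optimizer in the cell above does the same kind of update for every variable that the loss depends on.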

## Training
* For each epoch, run the optimizer on each (X, y) pair from the dataset (each pair acts as a batch of one) and sum the loss over all data points.
* Print the loss after each epoch (one full iteration through the dataset).
* Get the final values of the weight and the bias.

In [None]:
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
for i in range(5):
    total_loss = 0
    for row in df.values:
        x_batch = row[0]
        y_batch = row[1]
        #TODO: Fill in the dots
        _, l = sess.run([optimizer, loss], feed_dict={X: ..., y: ...})
        total_loss += l
    # Print the loss after each epoch (one full pass through the dataset)
    print("Epoch {0}: {1}".format(i, total_loss))
# Get the final values of the weights and the bias 
weight, bias = sess.run([w, b]) 

**NOTE:** In the last code block, if your loss gradually increased, stayed the same, or returned NaN, then go back to the optimizer and lower the learning rate.
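
The effect of the learning rate can be seen in a small plain-Python sketch. It assumes synthetic, noise-free data from $y = 2x + 1$ rather than the contents of linreg.csv, and `train` is a hypothetical helper that applies the same per-example update as plain gradient descent on the squared error. The two learning rates are arbitrary values picked to contrast the two behaviours.

```python
import math


def train(learning_rate, epochs=5):
    """Run per-example gradient descent on y = 2x + 1 and return the loss per epoch."""
    xs = [0.2 * i for i in range(50)]   # inputs from 0.0 to 9.8
    ys = [2.0 * x + 1.0 for x in xs]    # noise-free targets (assumed data)
    w, b = 0.1, 0.1
    epoch_losses = []
    for _ in range(epochs):
        total_loss = 0.0
        for x, y in zip(xs, ys):
            error = y - (w * x + b)
            total_loss += error * error
            # Step against the gradient of the squared error.
            w += learning_rate * 2.0 * error * x
            b += learning_rate * 2.0 * error
        epoch_losses.append(total_loss)
    return epoch_losses


small = train(learning_rate=0.001)
large = train(learning_rate=0.1)
# The run has diverged if the loss grew or overflowed to inf/NaN.
diverged = (not math.isfinite(large[-1])) or large[-1] > large[0]
print("lr=0.001:", small)
print("lr=0.1:  ", large)
print("large learning rate diverged:", diverged)
```

With the smaller rate the summed loss shrinks from epoch to epoch. With the larger rate each update overshoots and the error is amplified instead of reduced, which is the symptom the note describes.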

Use the code below to plot the data and the regression line.

In [None]:
plt.scatter(df.values[:,0], df.values[:,1])
plt.plot(df.values[:,0], weight*df.values[:,0]+bias, 'r-')  # draw the fitted line in red
plt.show()

## Logistic Regression Using MNIST

TensorFlow provides a convenient interface for MNIST data, which makes it easy to test your code on a commonly used dataset. The code below shows how to read the MNIST images and store the labels as one-hot vectors.

In [31]:
from tensorflow.examples.tutorials.mnist import input_data
MNIST = input_data.read_data_sets("../data/mnist", one_hot = True)

Extracting ../data/mnist/train-images-idx3-ubyte.gz
Extracting ../data/mnist/train-labels-idx1-ubyte.gz
Extracting ../data/mnist/t10k-images-idx3-ubyte.gz
Extracting ../data/mnist/t10k-labels-idx1-ubyte.gz


Create placeholders for X and Y. 
* Note that each MNIST image is 28x28. The data will already be flattened into a 784-dimensional vector when we feed it into the model.
* Each label is 10-dimensional, with one vector element for every possible digit.
* Make sure the shapes of the placeholders are defined so that a variable number of images and labels can be fed in each batch. *Index 0 of the shape is the batch dimension, so put None there instead of a fixed size.*

In [32]:
#TODO: Fill in the dots
X = tf.placeholder(...)
Y = tf.placeholder(...)
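
To illustrate the shapes described above, here is a small NumPy sketch (not TensorFlow) of flattening 28x28 images and one-hot encoding digit labels for several batch sizes. `flatten_images` and `one_hot` are hypothetical helpers, and the all-zero images are dummy data.

```python
import numpy as np


def flatten_images(images):
    """Reshape (N, 28, 28) images into (N, 784) row vectors."""
    return images.reshape(images.shape[0], -1)


def one_hot(labels, num_classes=10):
    """Turn integer labels of shape (N,) into one-hot rows of shape (N, num_classes)."""
    return np.eye(num_classes, dtype=np.float32)[labels]


# The first dimension changes with the batch size; the second stays fixed.
for batch_size in (1, 32, 128):
    images = np.zeros((batch_size, 28, 28), dtype=np.float32)  # dummy images
    labels = np.arange(batch_size) % 10                        # dummy digit labels
    print(flatten_images(images).shape, one_hot(labels).shape)
```

Only the leading dimension varies between batches, which is why that dimension is the one left unspecified in the placeholder shapes.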

Create a weights variable and a biases variable of the appropriate shapes.
* Initialize the weights variable from a truncated normal distribution using tf.truncated_normal(...). This is better than setting the weights to zero because random values break the symmetry between units during backpropagation. [Here's a more in-depth discussion](https://datascience.stackexchange.com/a/10930).
* The bias variable should also be set to a small value, such as 0.1. Do this by using tf.constant(...) and passing in the value and the appropriate shape.
* When you multiply the feature matrix X by the weights variable, the result should have a shape that is compatible with the bias tensor so that the two can be added.

In [33]:
#TODO: Fill in the dots
W = tf.Variable(tf.truncated_normal(...))
b = tf.Variable(tf.constant(...))
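
As a rough NumPy illustration of the initialization and shape requirements above: a truncated normal discards samples that fall more than two standard deviations from the mean and draws them again. The helper `truncated_normal` below is a hypothetical stand-in written for this sketch, and the standard deviation of 0.1 and the batch of 5 random inputs are arbitrary assumptions.

```python
import numpy as np


def truncated_normal(shape, stddev, rng):
    """Sample from N(0, stddev^2), redrawing values beyond two standard deviations."""
    values = rng.normal(0.0, stddev, size=shape)
    out_of_range = np.abs(values) > 2.0 * stddev
    while out_of_range.any():
        values[out_of_range] = rng.normal(0.0, stddev, size=int(out_of_range.sum()))
        out_of_range = np.abs(values) > 2.0 * stddev
    return values


rng = np.random.default_rng(0)
W = truncated_normal((784, 10), stddev=0.1, rng=rng)  # one column per digit
b = np.full((10,), 0.1)                               # small constant bias
X = rng.random((5, 784))                              # a dummy batch of 5 inputs
logits = X @ W + b                                    # bias broadcasts over the batch
print(W.shape, b.shape, logits.shape)
```

The product has one row per example and one column per digit, so a bias with one entry per digit can be added to every row.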

Declare the logits as $X \cdot W + b$. Logits are the unnormalized scores that softmax maps to probabilities. This is exactly the same expression as in multi-dimensional linear regression.
* Make sure to use tf.matmul() when multiplying matrices. Using \* will multiply element-wise.

In [34]:
#TODO: Fill in the dots
logits = ...
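
The difference between a matrix product and an element-wise product can be seen in a short NumPy sketch, where `@` plays the role of tf.matmul and `*` multiplies element-wise with broadcasting. The small shapes are arbitrary and chosen only so that the two operations behave differently.

```python
import numpy as np

X = np.arange(6.0).reshape(2, 3)   # 2 examples, 3 features
W = np.ones((3, 4))                # 3 features, 4 outputs

product = X @ W                    # matrix product, shape (2, 4)
print(product.shape)

elementwise_failed = False
try:
    X * W                          # element-wise product needs broadcastable shapes
except ValueError as err:
    elementwise_failed = True
    print("element-wise multiply failed:", err)
```

Here the element-wise product fails outright because the shapes do not broadcast. When the shapes happen to be compatible it silently computes something other than the intended matrix product, which is the harder mistake to spot.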

Compute the cross-entropy using tf.nn.softmax_cross_entropy_with_logits(...). This applies the softmax function to the logits before calculating the cross-entropy, so pass in the raw logits. Compute the loss as the mean of the cross-entropy over the batch.

In [35]:
#TODO: Fill in the dots
entropy = tf.nn.softmax_cross_entropy_with_logits(logits = ..., labels = ...)
loss = tf.reduce_mean(...)
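
For intuition about what this op computes, here is a NumPy sketch of softmax followed by cross-entropy against one-hot labels. It is an illustration under the usual textbook definition, not TensorFlow's implementation, and `softmax_cross_entropy` is a hypothetical helper.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between one-hot labels and softmax(logits)."""
    # Subtract the row-wise max so the exponentials cannot overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)


logits = np.zeros((2, 10))       # identical scores, so softmax is uniform
labels = np.eye(10)[[3, 7]]      # one-hot labels for digits 3 and 7
entropy = softmax_cross_entropy(logits, labels)
loss = entropy.mean()
print(entropy, loss)
```

With uniform predictions over 10 classes, each example's cross-entropy is $\ln 10 \approx 2.30$, which is a useful reference point for the loss of an untrained model.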

Declare the optimizer as the GradientDescentOptimizer with an appropriate learning rate. *(Hint: Try 0.01).* Set it to minimize the loss.
* Note: When running the optimizer, if the loss is NaN or increases with each epoch, try decreasing the learning rate.

In [36]:
#TODO: Fill in the dots
optimizer = ...

Compute the accuracy by:
* using tf.equal on the predicted label and the true label
* casting that to a float and computing the mean over all examples

In [37]:
#TODO: Fill in the dots
Y_pred = tf.nn.softmax(logits)
y_pred_cls = tf.argmax(Y_pred, 1)
y_cls = ...
accuracy = tf.reduce_mean(tf.cast(tf.equal(..., ...), tf.float32))
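
The accuracy computation described above can be sketched in NumPy on a tiny made-up example with 3 classes. The logits and labels below are invented for illustration only.

```python
import numpy as np

logits = np.array([[2.0, 1.0, 0.0],
                   [0.0, 1.0, 2.0],
                   [0.0, 3.0, 1.0]])
labels = np.eye(3)[[0, 2, 2]]              # true classes 0, 2, 2 as one-hot rows

# Softmax preserves the ordering of the scores, so argmax over logits
# gives the same class as argmax over the softmax probabilities.
predicted_classes = logits.argmax(axis=1)
true_classes = labels.argmax(axis=1)
correct = (predicted_classes == true_classes).astype(np.float32)
accuracy = correct.mean()
print(predicted_classes, true_classes, accuracy)
```

Two of the three predictions match the true class, so the mean of the 0/1 comparison is 2/3.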

Start an Interactive Session and initialize all the global variables.
* For each epoch, run the optimizer on each batch of X, y pairs and sum the loss over all batches.
* Print the loss after each epoch.

In [None]:
#TODO: Fill in the dots

sess = tf.InteractiveSession()
batch_size = 128
# initialize all the global variables
sess.run(...)
n_batches = MNIST.train.num_examples // batch_size
for i in range(25):
    total_loss = 0
    for batch in range(n_batches):
        # iterate through the batches 
        X_batch, y_batch = MNIST.train.next_batch(batch_size)
        o, l = sess.run(...)
        total_loss += l
    print("Epoch {0}; Loss: {1}".format(i, total_loss))
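
The batch arithmetic in the loop above can be checked with a short NumPy sketch. It assumes a training set of 55,000 examples, and `iterate_batches` is a hypothetical helper that shows one simple way to draw shuffled mini-batches. It is not a description of how `next_batch` works internally.

```python
import numpy as np

num_examples = 55000      # assumed training-set size
batch_size = 128
n_batches = num_examples // batch_size              # full batches per epoch
leftover = num_examples - n_batches * batch_size    # examples not covered by full batches
print(n_batches, leftover)


def iterate_batches(num_examples, batch_size, rng):
    """Yield index arrays for one epoch of shuffled, full-size mini-batches."""
    order = rng.permutation(num_examples)
    n_full = num_examples // batch_size
    for k in range(n_full):
        yield order[k * batch_size:(k + 1) * batch_size]


rng = np.random.default_rng(0)
batches = list(iterate_batches(num_examples, batch_size, rng))
print(len(batches), batches[0].shape)
```

Under that assumption an epoch consists of 429 full batches, with 88 examples left over when the dataset size is not a multiple of the batch size.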

Evaluate the accuracy by feeding in all the test examples.

In [None]:
#TODO: Fill in the dots

print("Computing accuracy ...")
X_batch, y_batch = MNIST.test.next_batch(MNIST.test.num_examples)
final_accuracy = sess.run(...)

print("Accuracy {0}".format(final_accuracy))

## Basic Feed-Forward Network on MNIST
Here we build a simple fully connected network for MNIST. The network has 784 input neurons (one per pixel of a 28x28 MNIST image), 2 hidden layers with 256 neurons each, and 10 output neurons (1 for each digit).

Create placeholders for X and Y.

In [20]:
#TODO: Fill in the dots
X = tf.placeholder(...)
Y = tf.placeholder(...)

Declare each layer in the network and the final logits by:
* Creating variables for weights and biases of the appropriate sizes
* Applying ReLU on $X \cdot W + b$

This formulation is nearly the same as the logistic regression setup, except that the hidden layers apply [ReLU](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) to their outputs, while softmax is applied only to the final logits.

Network Configurations:
* First layer has 784 input features and 256 output features
* Second layer has 256 input features and 256 output features
* Third layer has 256 input features and 10 output features



In [40]:
#TODO: Fill in the dots

W1 = tf.Variable(tf.truncated_normal(...))
b1 = tf.Variable(tf.constant(...))
layer1 = tf.nn.relu(tf.matmul(X, W1) + b1)

W2 = tf.Variable(...)
b2 = tf.Variable(tf.constant(...))
layer2 = tf.nn.relu(...)

W_out = tf.Variable(...)
b_out = tf.Variable(tf.constant(...))
logits = tf.matmul(...) + ...
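
To check how the shapes flow through the three layers, here is a NumPy sketch of the forward pass. It is an illustration rather than the TensorFlow solution: `relu` and `init_layer` are hypothetical helpers, the weights come from a plain (not truncated) normal with an arbitrary standard deviation of 0.1, and the batch of 4 random inputs is dummy data.

```python
import numpy as np


def relu(x):
    """Element-wise max(x, 0)."""
    return np.maximum(x, 0.0)


def init_layer(n_in, n_out, rng):
    """Small random weights and a constant 0.1 bias (illustrative values)."""
    W = rng.normal(0.0, 0.1, size=(n_in, n_out))
    b = np.full((n_out,), 0.1)
    return W, b


rng = np.random.default_rng(0)
W1, b1 = init_layer(784, 256, rng)
W2, b2 = init_layer(256, 256, rng)
W_out, b_out = init_layer(256, 10, rng)

X = rng.random((4, 784))               # a dummy batch of 4 flattened images
layer1 = relu(X @ W1 + b1)             # shape (4, 256)
layer2 = relu(layer1 @ W2 + b2)        # shape (4, 256)
logits = layer2 @ W_out + b_out        # shape (4, 10), no ReLU on the output
print(layer1.shape, layer2.shape, logits.shape)
```

Each layer's output width matches the next layer's input width, and the final layer is left linear so that its output can be passed to the softmax cross-entropy as logits.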

Compute the cross-entropy using tf.nn.softmax_cross_entropy_with_logits. This applies the softmax function to the logits before calculating the cross-entropy. Compute the loss as the mean of the cross-entropy over the batch.

In [41]:
#TODO: Fill in the dots

entropy = tf.nn.softmax_cross_entropy_with_logits(logits = ..., labels=...)
loss = tf.reduce_mean(...)

Declare the optimizer as the GradientDescentOptimizer with an appropriate learning rate. Set it to minimize the loss.

In [42]:
#TODO: Fill in the dots

optimizer = ...

Compute the accuracy by:
* using tf.equal on the predicted label and the true label
* casting that to a float and computing the mean over all examples

In [46]:
#TODO: Fill in the dots

Y_pred = tf.nn.softmax(logits)
y_pred_cls = ...
y_cls = ...
accuracy = tf.reduce_mean(tf.cast(tf.equal(...), tf.float32))

Start an Interactive Session and initialize all the global variables.
* For each epoch, run the optimizer on each batch of X, y pairs and sum the loss over all batches.
* Print the loss after each epoch.

We set the batch size to 128 and the number of epochs to 25. Feel free to play around with these values. Additionally, every 5 epochs we calculate the validation accuracy.

In [None]:
#TODO: Fill in the dots

batch_size = 128
epochs = 25
sess = tf.InteractiveSession()
sess.run(...)

n_batches = (int) ...
for i in range(epochs):
    total_loss = 0
    for batch in range(n_batches):
        X_batch, y_batch = ...
        o, l = sess.run(...)
        total_loss += l
    print("Epoch {0}: {1}".format(i, total_loss))
    if i % 5 == 0 and i!= 0:
        X_val, y_val = MNIST.validation.next_batch(MNIST.validation.num_examples)
        val_accuracy = sess.run(...)
        print("\tVal Accuracy {0}".format(val_accuracy))

After training and validation are complete, report your test accuracy.

In [None]:
#TODO: Fill in the dots

print("Computing accuracy ...")
X_batch, y_batch = MNIST.test.next_batch(MNIST.test.num_examples)
final_accuracy = sess.run(...)

print("Test Accuracy {0}".format(final_accuracy))