In [None]:
import pandas as pd
import numpy as np
import tensorflow as tf

# Intro to TensorFlow

### Goals
- Gain a basic understanding of the what/how/why of TensorFlow
- Implement a simple multi-layer perceptron

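## TensorFlow Basics

TensorFlow (like other 'deep learning' libraries) is very good at gradient descent: you describe how a loss is computed, and it works out the gradients with respect to the parameters automatically.

TensorFlow 1.x programs build a computation graph from three types of objects:
- Placeholders, which hold the real data we feed in when the graph is run.
- Variables. These are the model parameters; they can be updated using gradient descent.
- Constants, which hold fixed values that never change.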

Use these objects to construct a loss function. Then use gradient descent to find the best parameters, given the data.
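To make "good at gradient descent" concrete, here is a minimal sketch of automatic differentiation on a toy loss $(w - 3)^2$. It assumes the TF1 graph API is reached through `tensorflow.compat.v1` (so it also runs under TensorFlow 2.x); the toy loss and learning rate are illustrative choices, not part of the original notebook.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # TF1-style graph mode, matching the rest of this notebook
tf.reset_default_graph()

# A toy loss with its minimum at w = 3
w = tf.Variable(0.0)
loss = tf.square(w - 3.0)

# TensorFlow derives d(loss)/dw = 2 * (w - 3) from the graph automatically
grad = tf.gradients(loss, [w])[0]

# One plain gradient descent step with learning rate 0.1
step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    g0 = sess.run(grad)   # gradient at w = 0 is 2 * (0 - 3) = -6
    sess.run(step)        # w <- 0 - 0.1 * (-6) = 0.6
    w1 = sess.run(w)
print(g0, w1)
```

We never wrote the derivative by hand; the optimizer uses the same machinery to update every variable the loss depends on.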

### Constants
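A minimal sketch of constants, again assuming the TF1 API via `tensorflow.compat.v1`. Building `c` only adds a node to the graph; the multiplication happens when a session runs it.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()
tf.reset_default_graph()

# Constants hold fixed values baked into the graph
a = tf.constant(2.0)
b = tf.constant([[1.0, 2.0], [3.0, 4.0]])

# Operations on constants create new graph nodes; nothing is computed yet
c = a * b

with tf.Session() as sess:
    c_val = sess.run(c)  # evaluating the node returns a NumPy array
print(c_val)
```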

### Placeholders
Placeholders are the objects that are filled with real data at runtime, through the `feed_dict` argument of `Session.run()`.
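A minimal placeholder sketch (TF1 API via `tensorflow.compat.v1`, an assumption of this example). Declaring the shape as `[None]` lets the same graph accept batches of any length.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()
tf.reset_default_graph()

# A placeholder for a batch of 1-D inputs; None lets the batch size vary
x = tf.placeholder(tf.float32, shape=[None])
doubled = 2.0 * x

with tf.Session() as sess:
    # feed_dict maps each placeholder to the data it should hold for this run
    out_small = sess.run(doubled, feed_dict={x: [1.0, 2.0]})
    out_large = sess.run(doubled, feed_dict={x: np.arange(5, dtype=np.float32)})
print(out_small, out_large)
```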

### Variables

Think about the linear equation
$$
y = 3 x - 3
$$

Variables must be initialized before they can be used.
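A minimal sketch of $y = 3x - 3$ with the slope and intercept stored as variables (TF1 API via `tensorflow.compat.v1`, assumed). The initializer op must run before anything reads `W` or `b`.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()
tf.reset_default_graph()

# Model parameters as Variables, set to the slope and intercept of y = 3x - 3
W = tf.Variable(3.0, name="W")
b = tf.Variable(-3.0, name="b")

# Inputs arrive through a placeholder
x = tf.placeholder(tf.float32, shape=[None])
y_model = W * x + b

with tf.Session() as sess:
    # Without this line, running y_model fails with an uninitialized-variable error
    sess.run(tf.global_variables_initializer())
    y_vals = sess.run(y_model, feed_dict={x: [0.0, 1.0, 2.0]})
print(y_vals)
```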

We can also define some observed y values and measure how well the model fits them.
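A minimal sketch tying the pieces together, assuming the TF1 API via `tensorflow.compat.v1`: noise-free y values generated from $y = 3x - 3$, a mean squared error loss, and plain gradient descent starting from deliberately wrong parameters. The data, learning rate, and step count are illustrative choices.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()
tf.reset_default_graph()

# Observed data generated from y = 3x - 3 (no noise, for illustration)
x_data = np.array([0.0, 1.0, 2.0, 3.0, 4.0], dtype=np.float32)
y_data = 3.0 * x_data - 3.0

x = tf.placeholder(tf.float32, shape=[None])
y_obs = tf.placeholder(tf.float32, shape=[None])

# Start from deliberately wrong parameters
W = tf.Variable(0.0)
b = tf.Variable(0.0)
y_model = W * x + b

# Mean squared error between the model and the observed y values
mse = tf.reduce_mean(tf.square(y_model - y_obs))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(mse)

feed = {x: x_data, y_obs: y_data}
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    mse_before = sess.run(mse, feed)  # mean of y_data**2 = 135 / 5 = 27 at W = b = 0
    for _ in range(2000):
        sess.run(train_step, feed)
    mse_after, W_fit, b_fit = sess.run([mse, W, b], feed)
print(mse_before, mse_after, W_fit, b_fit)
```

Gradient descent should drive `W` and `b` back towards 3 and -3, and the loss towards zero.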

# Linear Regression

## Crime Data

In [None]:
from sklearn.model_selection import train_test_split

# Load some crime data
headers = pd.read_csv('comm_names.txt').squeeze('columns')  # one-column file -> Series (read_csv's squeeze= argument was removed in pandas 2.0)
headers = headers.apply(lambda s: s.split()[1])
crime = (pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/communities/communities.data', 
                    header=None, na_values=['?'], names=headers)
         .iloc[:, 5:]
         .dropna()
         )

# Set target and predictors
target = 'ViolentCrimesPerPop'
predictors = [c for c in crime.columns if c != target]

# Train/test split
X = crime[predictors]
y = crime[[target]]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

### Define the model

Initialize

View loss

Parameters
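The four steps above (define the model, initialize, view the loss, inspect the parameters) can be sketched end to end as follows. Because the crime data must be downloaded, this sketch uses a hypothetical synthetic stand-in with 5 predictors and a noise-free linear target; it assumes the TF1 API via `tensorflow.compat.v1`, and the learning rate and step count are illustrative.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()
tf.reset_default_graph()

# Hypothetical synthetic stand-in for the crime data: 200 rows, 5 predictors
rng = np.random.RandomState(0)
X_data = rng.rand(200, 5).astype(np.float32)
true_w = np.array([[0.5], [-0.2], [0.1], [0.0], [0.3]], dtype=np.float32)
y_data = X_data @ true_w + 0.1

dim_input = X_data.shape[1]

# Define the model: one weight per predictor plus an intercept
x = tf.placeholder(tf.float32, [None, dim_input])
y_ = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.zeros([dim_input, 1]))
b = tf.Variable(tf.zeros([1]))
y_hat = tf.matmul(x, W) + b

mse = tf.reduce_mean(tf.square(y_hat - y_))
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(mse)

feed = {x: X_data, y_: y_data}
with tf.Session() as sess:
    # Initialize
    sess.run(tf.global_variables_initializer())
    # View loss before and after training
    loss_start = sess.run(mse, feed)
    for _ in range(5000):
        sess.run(train_step, feed)
    loss_end = sess.run(mse, feed)
    # Parameters: fetch the fitted values as NumPy arrays
    W_fit, b_fit = sess.run([W, b])
print(loss_start, loss_end)
print(W_fit.ravel(), b_fit)
```

On the real data, `X_train`/`y_train` can be fed directly, since `feed_dict` accepts pandas objects that convert to arrays.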

### Exercise

1: Run 10000 gradient descent steps of the model above. Every 500 iterations, note the train error and the test error.

2: Compare your results above to LinearRegression in scikit-learn.

3: In Week 5, we found that the best ridge regularization parameter for this data was alpha=11.8. Try adding the same amount of regularization to the TensorFlow model above, then compare with ridge regression (`Ridge`) in scikit-learn.
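One subtlety in exercise 3: scikit-learn's `Ridge` minimizes $\|y - Xw\|_2^2 + \alpha \|w\|_2^2$, a *sum* of squared errors, and leaves the intercept unpenalized. If the TensorFlow loss is a *mean* squared error over $n$ training rows, the matching penalty is therefore $(\alpha / n)\,\|W\|_2^2$. The sketch below checks this on hypothetical synthetic data (standing in for the crime split), records train/test MSE every 500 iterations as in exercise 1, and compares the result with the closed-form ridge solution computed in NumPy. It assumes the TF1 API via `tensorflow.compat.v1`; the data, learning rate, and step count are illustrative.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()
tf.reset_default_graph()

# Hypothetical synthetic data standing in for the crime train/test sets
rng = np.random.RandomState(1)
X_all = rng.randn(100, 3).astype(np.float32)
y_all = (X_all @ np.array([[1.5], [-2.0], [0.5]], dtype=np.float32)
         + 0.3 + 0.5 * rng.randn(100, 1).astype(np.float32))
X_train, X_test = X_all[:75], X_all[75:]
y_train, y_test = y_all[:75], y_all[75:]
n, d = X_train.shape
alpha = 11.8  # the ridge alpha quoted in the exercise

x = tf.placeholder(tf.float32, [None, d])
y_ = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.zeros([d, 1]))
b = tf.Variable(tf.zeros([1]))
mse = tf.reduce_mean(tf.square(tf.matmul(x, W) + b - y_))

# Ridge objective ||y - Xw - b||^2 + alpha * ||w||^2, divided by n to match the MSE scale.
# The intercept b is not penalized, as in scikit-learn's Ridge.
loss = mse + (alpha / n) * tf.reduce_sum(tf.square(W))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

history = []  # (iteration, train MSE, test MSE)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(10000):
        if i % 500 == 0:
            train_mse = sess.run(mse, {x: X_train, y_: y_train})
            test_mse = sess.run(mse, {x: X_test, y_: y_test})
            history.append((i, train_mse, test_mse))
        sess.run(train_step, {x: X_train, y_: y_train})
    W_tf, b_tf = sess.run([W, b])

# Closed-form ridge with an unpenalized intercept: center X and y, solve, then recover b
x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
Xc = (X_train - x_mean).astype(np.float64)
yc = (y_train - y_mean).astype(np.float64)
W_closed = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(d), Xc.T @ yc)
b_closed = y_mean - x_mean @ W_closed
print(W_tf.ravel(), W_closed.ravel())
print(b_tf, b_closed)
```

Multiplying the TensorFlow loss by $n$ recovers the scikit-learn objective exactly, so both should land on the same coefficients up to float32 precision.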

# Multi-layer Perceptron (MLP)

![](mlp.png)

### Exercise

Build a multi-layer perceptron to predict crime rates.

Start with two hidden units. You need one matrix that transforms the inputs to the hidden layer and a second matrix that transforms the hidden layer to the output.

Don't forget to add a bias at each step and to apply a nonlinear transformation to the hidden layer (e.g. `tf.nn.sigmoid()`).
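For reference, the network described above, with one hidden layer of $h$ = `dim_hidden` units, a sigmoid nonlinearity, and a linear output, computes

$$
H = \sigma\left(X W_1 + b_1\right), \qquad \hat{y} = H W_2 + b_2
$$

where $X$ is the $n \times d$ input matrix, $W_1$ is $d \times h$, $b_1$ has length $h$ and is added to every row, $W_2$ is $h \times 1$, $b_2$ is a scalar, and $\sigma(z) = 1/(1 + e^{-z})$ is applied elementwise. The loss is then the mean squared error $\frac{1}{n}\sum_i (\hat{y}_i - y_i)^2$.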

In [None]:
dim_hidden = 2

# input

# output

# Input to hidden


# Hidden to output


# Model


# Loss


# Optimizer


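One possible way to fill in the skeleton above is sketched below. It is not the only solution: it assumes the TF1 API via `tensorflow.compat.v1`, uses a hypothetical synthetic nonlinear target in place of the crime data, zero-initializes the biases, and picks Adam with learning rate 0.01 as an illustrative optimizer.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()
tf.reset_default_graph()
tf.set_random_seed(0)

# Hypothetical synthetic stand-in for X_train / y_train
rng = np.random.RandomState(0)
X_data = rng.rand(300, 4).astype(np.float32)
y_data = np.sin(3 * X_data[:, :1]) + X_data[:, 1:2] ** 2  # a nonlinear target

dim_input = X_data.shape[1]
dim_hidden = 2
dim_output = 1

# input and target
x = tf.placeholder(tf.float32, [None, dim_input])
y_ = tf.placeholder(tf.float32, [None, dim_output])

# Input to hidden
W1 = tf.Variable(tf.random_normal([dim_input, dim_hidden]))
b1 = tf.Variable(tf.zeros([dim_hidden]))

# Hidden to output
W2 = tf.Variable(tf.random_normal([dim_hidden, dim_output]))
b2 = tf.Variable(tf.zeros([dim_output]))

# Model: sigmoid hidden layer, linear output
H = tf.nn.sigmoid(tf.matmul(x, W1) + b1)
y_hat = tf.matmul(H, W2) + b2

# Loss
mse = tf.reduce_mean(tf.square(y_hat - y_))

# Optimizer
train_step = tf.train.AdamOptimizer(0.01).minimize(mse)

feed = {x: X_data, y_: y_data}
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    mse_start = sess.run(mse, feed)
    for _ in range(2000):
        sess.run(train_step, feed)
    mse_end = sess.run(mse, feed)
print(mse_start, mse_end)
```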
Once you have something working, it is time to tune your network to find the right number of hidden units and the right amount of regularization.

1. Use your code block from above that performs gradient descent steps and records intermediate results.
2. You might want to force the optimizer to be stochastic. That is, feed it 100 random training examples at each step instead of the whole training dataset.
3. Start with two hidden units and try to get the regularization right. Then slowly increase the number of hidden units and continue tuning the regularization.
4. If the training error is high, you have too much bias. If the training and testing errors are very different, you have too much variance. If the training or testing errors are jumping all over the place, your step size is too high.
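The tuning steps above can be sketched as a small grid search. The helper `fit_mlp` below is hypothetical: it rebuilds a one-hidden-layer MLP for each setting, takes stochastic Adam steps on random minibatches of 100 training rows, adds an L2 penalty on the weight matrices (scaled by `lam`, mirroring the penalty in the bonus cell below), and records train and test MSE every 500 steps. It assumes the TF1 API via `tensorflow.compat.v1` and uses synthetic data in place of the crime split.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()


def fit_mlp(X_train, y_train, X_test, y_test, dim_hidden, lam,
            steps=3000, batch_size=100, lr=0.01, seed=0):
    """Train a one-hidden-layer MLP with minibatch Adam and an L2 penalty.

    Returns a list of (step, train MSE, test MSE) recorded every 500 steps.
    """
    tf.reset_default_graph()  # start each configuration from a fresh graph
    tf.set_random_seed(seed)
    rng = np.random.RandomState(seed)
    dim_input = X_train.shape[1]

    x = tf.placeholder(tf.float32, [None, dim_input])
    y_ = tf.placeholder(tf.float32, [None, 1])
    W1 = tf.Variable(tf.random_normal([dim_input, dim_hidden]))
    b1 = tf.Variable(tf.zeros([dim_hidden]))
    W2 = tf.Variable(tf.random_normal([dim_hidden, 1]))
    b2 = tf.Variable(tf.zeros([1]))

    H = tf.nn.sigmoid(tf.matmul(x, W1) + b1)
    y_hat = tf.matmul(H, W2) + b2
    mse = tf.reduce_mean(tf.square(y_hat - y_))
    # L2 penalty on the weight matrices only (biases are left unpenalized)
    reg = lam * (tf.reduce_mean(tf.square(W1)) + tf.reduce_mean(tf.square(W2)))
    train_step = tf.train.AdamOptimizer(lr).minimize(mse + reg)

    history = []
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(steps):
            # Stochastic step: a random minibatch instead of the full training set
            idx = rng.choice(X_train.shape[0], batch_size, replace=False)
            sess.run(train_step, {x: X_train[idx], y_: y_train[idx]})
            if (i + 1) % 500 == 0:
                history.append((i + 1,
                                sess.run(mse, {x: X_train, y_: y_train}),
                                sess.run(mse, {x: X_test, y_: y_test})))
    return history


# Hypothetical synthetic data standing in for the crime train/test split
rng = np.random.RandomState(42)
X_all = rng.rand(400, 4).astype(np.float32)
y_all = (np.sin(3 * X_all[:, :1]) + 0.1 * rng.randn(400, 1)).astype(np.float32)
X_tr, X_te, y_tr, y_te = X_all[:300], X_all[300:], y_all[:300], y_all[300:]

# Small grid: a few hidden-layer sizes, each with a few regularization strengths
results = {}
for dim_hidden in [2, 4]:
    for lam in [0.0, 0.1]:
        history = fit_mlp(X_tr, y_tr, X_te, y_te, dim_hidden, lam)
        results[(dim_hidden, lam)] = history[-1]
        print(dim_hidden, lam, history[-1])
```

Reading the recorded history against point 4 (high train MSE, a large train/test gap, or erratic values) suggests whether to change the hidden size, the regularization, or the step size next.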

In [None]:
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

# Bonus: Add _another_ hidden layer.

Can you decrease the MSE on the test set even further?

In [None]:
dim_input = X_train.shape[1]  # number of predictors
dim_output = 1                # a single regression target
dim_h1 = 8
dim_h2 = 8

# input
x = tf.placeholder(tf.float32, [None, dim_input])

# target
y_ = tf.placeholder(tf.float32, [None, 1])

# Input to hidden 1
W1 = tf.Variable(tf.random_normal([dim_input, dim_h1]))
b1 = tf.Variable(tf.random_normal([dim_h1]))

# Hidden 1 to hidden 2
W2 = tf.Variable(tf.random_normal([dim_h1, dim_h2]))
b2 = tf.Variable(tf.random_normal([dim_h2]))

# Hidden 2 to output
W3 = tf.Variable(tf.random_normal([dim_h2, dim_output]))
b3 = tf.Variable(tf.random_normal([dim_output]))

# Model
H1 = tf.nn.tanh(tf.matmul(x, W1) + b1)
H2 = tf.nn.tanh(tf.matmul(H1, W2) + b2)
y_hat = tf.matmul(H2, W3) + b3  # named y_hat so the target DataFrame `y` is not overwritten

# Loss: MSE plus an L2 penalty on the weight matrices
mse = tf.reduce_mean(tf.square(y_hat - y_))
lam = .4
reg = (tf.reduce_mean(lam * tf.square(W1)) +
       tf.reduce_mean(lam * tf.square(W2)) +
       tf.reduce_mean(lam * tf.square(W3)))
loss = mse + reg

# Optimizer
train_step = tf.train.AdamOptimizer(0.0005).minimize(loss)

In [None]:
sess = tf.InteractiveSession()

In [None]:
tf.global_variables_initializer().run()

In [None]:
for i in range(10000):
    # Sample a random minibatch of 150 training rows (with replacement)
    idx = np.random.choice(X_train.shape[0], 150, replace=True)
    X_batch = X_train.iloc[idx, :]
    y_batch = y_train.iloc[idx, :]
    if i % 1000 == 0:
        # Train MSE is measured on the current minibatch, test MSE on the full test set
        train_mse = sess.run(mse, {x: X_batch, y_: y_batch})
        test_mse = sess.run(mse, {x: X_test, y_: y_test})
        print('Iteration: {:04} \t Train Loss: {:.3} \t Test Loss: {:.3}'.format(i, train_mse, test_mse))
    # One optimizer step on the minibatch
    sess.run(train_step, {x: X_batch, y_: y_batch})