In [None]:
import pandas as pd
import numpy as np
import tensorflow as tf

# Intro to TensorFlow

https://www.tensorflow.org/get_started/get_started

### Goals
- Gain a basic understanding of the what/how/why of TensorFlow
- Implement a simple multi-layer perceptron

## TensorFlow Basics

TensorFlow (like other 'deep learning' libraries) is really good at gradient descent: you describe a computation, and it works out the gradients for you.

There are three types of objects:
- Placeholders. These stand in for the real data, which is supplied at runtime.
- Variables. These are the model parameters; they can be updated using gradient descent.
- Constants. These are fixed values that never change.

Use these objects to construct a loss function. Then use gradient descent to find the best parameters, given the data.

### Constants

Initialize a session

In [None]:
sess = tf.InteractiveSession()

### Placeholders
Placeholders are the objects that will be filled with real data at runtime.

### Variables

Think about the linear equation
$$
y = 3 x - 3
$$

Variables need to be initialized

Run the graph

Or we could define some y values and see how well it fits the model
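As a rough illustration of "seeing how well it fits", the sketch below evaluates the line $y = 3x - 3$ on a few points and sums the squared differences from some observed values. It uses plain NumPy rather than TensorFlow, and the arrays `x_data` and `y_data` are made-up values chosen only for this example.

```python
import numpy as np

# Hypothetical data: five x values and slightly noisy y values.
x_data = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_data = np.array([-3.1, 0.2, 2.9, 6.1, 8.8])

# Parameters of the model y = 3x - 3 (these play the role of the variables).
w, b = 3.0, -3.0

# Model predictions, then the squared error at each point.
y_pred = w * x_data + b
squared_error = (y_pred - y_data) ** 2
total_error = squared_error.sum()

print(y_pred)
print(total_error)
```

The smaller `total_error` is, the better the current values of `w` and `b` describe the data.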

So, as far as TensorFlow is concerned, error = F(data, variables, constants).

TensorFlow knows how to use gradient descent to find the _values_ of the variables that minimize the total error, given some data.
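To make the idea concrete, here is a minimal NumPy sketch of that search, with the gradients written out by hand (TensorFlow would derive them automatically). The data are generated from $y = 3x - 3$ without noise, and the starting values, `learning_rate` and step count are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data generated from the line y = 3x - 3.
x_data = rng.uniform(-1.0, 1.0, size=100)
y_data = 3.0 * x_data - 3.0

# Start from arbitrary parameter values.
w, b = 0.0, 0.0
learning_rate = 0.1

for step in range(2000):
    residual = w * x_data + b - y_data
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2.0 * np.mean(residual * x_data)
    grad_b = 2.0 * np.mean(residual)
    # Move each parameter a small step against its gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)
```

After enough steps, `w` and `b` should end up very close to 3 and -3, the values used to generate the data.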

# Linear Regression

## Crime Data

In [None]:
from sklearn.model_selection import train_test_split

# Load some crime data
# Read the single-column file as a Series
headers = pd.read_csv('comm_names.txt').squeeze('columns')
headers = headers.apply(lambda s: s.split()[1])
crime = (pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/communities/communities.data', 
                    header=None, na_values=['?'], names=headers)
         .iloc[:, 5:]
         .dropna()
         )

# Set target and predictors
target = 'ViolentCrimesPerPop'
predictors = [c for c in crime.columns if c != target]

# Train/test split
X = crime[predictors]
y = crime[[target]]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

### Define the model

In [None]:
# Parameters


# Input

# Output

# Variables


# Model

# Loss

# Optimizer



Initialize

In [None]:
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

View loss

Execute a training step

View error on training data

### Exercise

1: Run 10000 gradient descent steps of the model above. Every 500 iterations, note the train error and the test error.
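One possible shape for such a loop is sketched below in plain NumPy, on synthetic data, so it is not a solution to the TensorFlow exercise itself. All names here (`X_tr`, `X_te`, `mse`, `history`) and the data sizes are assumptions made for the illustration; in the notebook, the update line would presumably be replaced by running your training op in the session.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic regression data standing in for the crime data.
n_train, n_test, n_features = 200, 100, 5
true_w = rng.normal(size=(n_features, 1))
X_tr = rng.normal(size=(n_train, n_features))
X_te = rng.normal(size=(n_test, n_features))
y_tr = X_tr @ true_w + 0.1 * rng.normal(size=(n_train, 1))
y_te = X_te @ true_w + 0.1 * rng.normal(size=(n_test, 1))

def mse(X, y, W, b):
    """Mean squared error of the linear model X @ W + b."""
    return float(np.mean((X @ W + b - y) ** 2))

W = np.zeros((n_features, 1))
b = 0.0
learning_rate = 0.01
history = []  # (step, train error, test error)

for step in range(10000):
    # One full-batch gradient descent step on the training MSE.
    residual = X_tr @ W + b - y_tr
    W -= learning_rate * 2.0 * X_tr.T @ residual / n_train
    b -= learning_rate * 2.0 * float(residual.mean())
    # Record both errors every 500 steps.
    if (step + 1) % 500 == 0:
        history.append((step + 1, mse(X_tr, y_tr, W, b), mse(X_te, y_te, W, b)))

for step, train_err, test_err in history:
    print(step, round(train_err, 4), round(test_err, 4))
```

Keeping the results in a list like `history` makes it easy to plot the train and test curves side by side afterwards.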

2: Compare your results above to LinearRegression in scikit-learn.

In [None]:
from sklearn.linear_model import LinearRegression



3: In Week 5, we found that linear regression tended to overfit this data (due to the high number of features), and we used regularization to reduce the variance. Try fitting a ridge regression model to this data (just like in Week 5), then extract the regularization parameter from the fitted model (the alpha_ attribute of a RidgeCV object) and see if you can add the same amount of regularization to the TF model above.
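One detail to watch when transferring the penalty: scikit-learn's ridge objective is the *sum* of squared errors plus `alpha` times the squared L2 norm of the coefficients. If the loss you minimize is a *mean* squared error, the matching penalty weight is therefore `alpha / n`, where `n` is the number of training rows. The NumPy sketch below checks this on made-up data; it omits the intercept for simplicity (scikit-learn fits an unpenalized intercept by default), and `alpha = 10.0` is an arbitrary stand-in for whatever value RidgeCV selects.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data; alpha stands in for the value chosen by RidgeCV.
n, d = 150, 4
X_demo = rng.normal(size=(n, d))
y_demo = X_demo @ rng.normal(size=(d, 1)) + 0.5 * rng.normal(size=(n, 1))
alpha = 10.0

# Closed-form minimizer of ||y - Xw||^2 + alpha * ||w||^2 (no intercept).
w_closed = np.linalg.solve(X_demo.T @ X_demo + alpha * np.eye(d),
                           X_demo.T @ y_demo)

# Gradient descent on mean squared error + (alpha / n) * ||w||^2.
w_gd = np.zeros((d, 1))
learning_rate = 0.05
for step in range(5000):
    residual = X_demo @ w_gd - y_demo
    grad = 2.0 * X_demo.T @ residual / n + 2.0 * (alpha / n) * w_gd
    w_gd -= learning_rate * grad

# The two solutions should agree up to numerical precision.
max_diff = float(np.max(np.abs(w_gd - w_closed)))
print(max_diff)
```

Dividing the whole ridge objective by `n` does not move its minimum, which is why the two formulations agree.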

In [None]:
from sklearn.linear_model import RidgeCV


# Multi-layer Perceptron (MLP)

![](mlp.png)

### Exercise

Build a multi-layer perceptron to predict crime rates.

Start with two hidden units. You should be able to define one matrix that transforms the inputs to the hidden layer, and a second matrix that transforms the hidden layer to the output.

Don't forget to add a bias at each step and to apply a nonlinear transformation to the hidden layer (e.g. tf.nn.sigmoid()).
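If the shapes are hard to picture, the following NumPy sketch runs the forward pass of such a network on a handful of made-up rows. It only illustrates the matrix dimensions; the sizes, the names `W1`, `b1`, `W2`, `b2` and the small random initial weights are assumptions for this example, not part of the exercise.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 examples, 3 input features, 2 hidden units.
n_examples, dim_in, dim_hid = 4, 3, 2
X_demo = rng.normal(size=(n_examples, dim_in))

# Input-to-hidden weights and bias.
W1 = rng.normal(scale=0.1, size=(dim_in, dim_hid))
b1 = np.zeros(dim_hid)

# Hidden-to-output weights and bias.
W2 = rng.normal(scale=0.1, size=(dim_hid, 1))
b2 = np.zeros(1)

# Forward pass: the nonlinearity is applied to the hidden layer only.
hidden = sigmoid(X_demo @ W1 + b1)   # shape (n_examples, dim_hid)
output = hidden @ W2 + b2            # shape (n_examples, 1)

print(hidden.shape, output.shape)
```

The output layer is left linear here because the target is a continuous value.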

In [None]:
dim_input = X_train.shape[1]  # number of predictor columns
dim_hidden = 2

# input
x = tf.placeholder(tf.float32, [None, dim_input])

# target
y_ = tf.placeholder(tf.float32, [None, 1])

# Input to hidden


# Hidden to output


# Model


# Loss


# Optimizer
train_step = tf.train.AdamOptimizer(0.001).minimize(loss)

In [None]:
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

Once you have something working, it is time to tune your network to find the right number of hidden units and amount of regularization.

1. Use your code block from above that performs gradient descent steps and records intermediate results.
2. You might want to force the optimizer to be stochastic. That is, feed it 100 random training examples at each step instead of the whole training dataset.
3. Start with two hidden units and try to get the regularization right. Then slowly increase the number of hidden units and continue tuning the regularization.
4. If the training error is high, you have too much bias. If the training and testing errors are very different, you have too much variance. If the training or testing errors are jumping all over the place, your step size is too high.
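For the stochastic variant in step 2, one way to draw a random batch is sketched below with NumPy. The helper `sample_batch` and the arrays `X_all` and `y_all` are hypothetical names used only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training arrays standing in for the real training data.
X_all = rng.normal(size=(1000, 5))
y_all = rng.normal(size=(1000, 1))
batch_size = 100

def sample_batch(X, y, batch_size, rng):
    """Return batch_size rows of X and y, drawn without replacement."""
    idx = rng.choice(len(X), size=batch_size, replace=False)
    return X[idx], y[idx]

# Draw a fresh batch before each training step.
X_batch, y_batch = sample_batch(X_all, y_all, batch_size, rng)
print(X_batch.shape, y_batch.shape)
```

Since X_train and y_train in this notebook are pandas objects, you would likely select rows with .iloc[idx] (or convert to arrays with .values first), then feed the batch to the placeholders in place of the full training set.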

# Bonus: Add _another_ hidden layer.

Can you decrease the MSE on the test set even further?

In [None]:
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()