First, we need to import some of the libraries we will need, as usual. 

In [0]:
import matplotlib.pyplot as plt
import os, sys
import numpy as np
from scipy.misc import bytescale, imresize

Now, we tell Python to try to import the progressbar module; if the import raises an error, we install the module first and then import it. 

In [0]:
# Try whatever it says in the indented line below
try:
  import progressbar  # import a tool called progressbar
  
# if you get this specific error (ImportError), do the things in the indented lines below  
except ImportError:
  !pip3 install -q progressbar2   # install progressbar first
  import progressbar  # then import it

Next, we install the tools needed to link the notebook to our Google Drive and run the code that mounts the drive. Don't worry about this block; it will be the same thing every time, so you just need to copy and paste it.

In [0]:
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse

from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass

!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
!mkdir -p drive
!google-drive-ocamlfuse drive

Now we need to navigate to our DeepLearningFall2018 folder in our drive. 

In [0]:
print(os.getcwd()) # find out where we are in the drive
print(os.listdir())  # find out what is in the current folder in our drive

In [0]:
os.chdir('drive')  # enter the folder called drive

In [0]:
os.chdir('DeepLearningFall2018 (f9fa8bfb)')

For this first exercise on perceptrons, we will need to open a file called **linear_data.csv** by using numpy's *genfromtxt* method. You can open up the file in Google Sheets and see what is actually in this file. It contains 1000 rows (data samples) and 3 attributes for each sample (columns). The first and second attributes are the sample's x- and y-coordinates, and the third attribute is either a 1 or a 0, which determines what class that sample is in.

In [0]:
# read in the data file called linear_data.csv
data = np.genfromtxt('linear_data.csv', delimiter=',')
print(data.shape)  # print the shape of the data

First, let's separate the labels from the data. Since the labels are in the last column of the variable **data**, we can say we want every row --- indicated by the colon --- and just the last column of **data** --- indicated by the -1. Then, we reassign the variable **data** to every row and the first two columns. 

In [0]:
labels = data[:, -1]  # take the labels from the data...labels were last column
data = data[:, :-1]  # now remove the last column from the data...bc that was the labels
print(data.shape)  # print data shape
print(labels.shape) # print labels shape

Let's plot the data to see what it looks like. First, we plot all of the samples in class 1 as red dots, and then we do the same for all of the samples in class 0, but with blue dots. The two classes look like they can be separated with a straight line, so a single perceptron will be just fine. 

In [0]:
# make a scatter plot to visualize the data
plt.scatter(data[labels == 1, 0], data[labels == 1, 1], c='r')
plt.scatter(data[labels == 0, 0], data[labels == 0, 1], c='b')
plt.show()

This is the general structure of a perceptron. It takes some inputs, weights each one differently, sums them, and produces an output that determines the category these inputs belong to.

In [0]:
# read in the file perceptron_image.png as an image...
# make sure it's in the folder python is looking in
perceptron_img = plt.imread('perceptron_image.png')


plt.imshow(perceptron_img)  # create a figure and show the image 
plt.grid(False)  # get rid of the grid lines because this is an image
plt.show()  # show the figure

Now, we need to add a column of ones to the end of the data. This is done because the solution to separate these two classes is a straight line, and this column essentially acts as the y-intercept. The input will always be 1 for this last column but how much it is weighted will change, which will change the intercept. 

In [0]:
# create biases for the network
biases = np.ones([data.shape[0], 1])

# add the column of ones to the dataset at the end so each sample also has a 
# bias of value 1 as third input
data = np.concatenate((data, biases), 1)

# print the shape of data to make sure it worked correctly
print(data.shape)
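
To see why the column of ones acts as an intercept, here is a minimal sketch on made-up numbers. The arrays `toy_points`, `toy_w`, and `toy_b` are hypothetical and are not part of the dataset; the sketch only checks that appending a ones column and folding the bias into the weight vector gives the same weighted sum as adding the bias separately.

```python
import numpy as np

# Toy inputs: three samples with two features each (made-up values).
toy_points = np.array([[0.2, 0.7],
                       [0.5, 0.1],
                       [0.9, 0.4]])

# Hypothetical weights for the two features and a separate bias term.
toy_w = np.array([1.5, -2.0])
toy_b = 0.25

# Explicit form: weighted sum of the features plus the bias.
explicit = toy_points @ toy_w + toy_b

# Bias-column form: append a column of ones and treat the bias as a third weight.
ones_col = np.ones((toy_points.shape[0], 1))
augmented = np.concatenate((toy_points, ones_col), axis=1)
w_augmented = np.append(toy_w, toy_b)
folded = augmented @ w_augmented

print(np.allclose(explicit, folded))  # both forms give the same sums
```

Because the third input is always 1, the third weight is added to every sum unchanged, which is exactly what an intercept does.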

We now need to make a 3-dimensional vector which contains the weights for each of the three aspects of the input (x-coordinate, y-coordinate, and y-intercept). These are the values that will be adjusted during training as the perceptron figures out how to weight each of the inputs so it can predict the sample's class correctly. 

In [0]:
# weight matrix = [input dim + 1, number of perceptrons]
# initialize random weights. 3 weights: one for each of the two data inputs
# and one for the bias
weights = np.random.randn(3, )
print(weights)

We set the learning rate to 0.05. This number represents how much the weights will change depending on the current samples the perceptron is adjusting to. We don't want this too low, because it will take too long to learn. We also don't want it too high because it will adjust too much to the sample it is currently learning on which could cause it to overfit to that specific sample and cause the learning to be erratic. 

In [0]:
lr = 0.05  # how much to change the weight values each time

Below, we initialize an empty list to which we will append each training iteration's error for later visualization. 

In [0]:
se = []  # create an empty list called se

Below, we initialize the progressbar object which is a cool and useful way to monitor iterative algorithms / for loops. 

In [0]:
bar = progressbar.ProgressBar()

This is the main training loop. We are going to have the perceptron look at one sample at a time and predict what class it thinks it is in based on its current weights. It will take each input and multiply by its respective weight, then sum all three products up. We can do this easily with the *np.dot* function. Next, it will compare its output with the label and add that value to the stored errors to be plotted later. To update the weights, we first multiply the output error by each input value, which is a crude way of finding out how much each specific input contributed to the output. Finally, we adjust the weights a little bit based on this error and the value of the learning rate.

In [0]:
# have i count up to the number of samples in the data and each time, do what 
# the indented lines below say
for i in bar(range(data.shape[0])):
  x = data[i, :]  # assign x to the sample represented by i
  y = labels[i]  # assign y to the label represented by i
  
  # perform the dot product between inputs and their respective weights
  out = np.dot(x, weights)
  
  # apply the activation function to the output...in this case round to the
  # nearest integer, so a sum near 1 becomes 1 and a sum near 0 becomes 0
  # (other integers are possible if the weighted sum is far from both).
  out = np.round(out)
  
  # get the error between the actual output and the desired output (label)
  error = y - out
  
  # add the error for that sample to the list called se so we can plot how the 
  # error changed during training later
  se.append(error)
  
  # update the weights by multiplying the error by each input value and scaling
  # by the learning rate
  weights += lr * (error * x)

Here, we only had the perceptron look at each sample once, but it would easily be possible to modify the training loop to have it go through the dataset multiple times. Below, we can send the entire dataset through and see how many of the training samples it can correctly predict the class of.

In [0]:
# send all of the data through and get the output for every sample
all_out = np.round(np.matmul(data, weights))

# calculate the accuracy over the training samples by seeing if the output 
# for each sample matches the label for that sample. If it does, it will return
# 1, if not it will return 0 for that sample. Then take the mean to get mean accuracy. 
acc = np.mean(all_out == labels)
print(acc)
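
The training loop above passes through the data once. One way to go through the dataset several times is sketched below, assuming the same update rule as the notebook. The function name `train_perceptron`, the shuffling of sample order each pass, and the synthetic data are assumptions made for illustration, not part of the original exercise.

```python
import numpy as np

def train_perceptron(samples, targets, step_size=0.05, n_epochs=5, seed=0):
    """Repeat the single-sample update rule for several passes over the data."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(samples.shape[1])  # random starting weights
    errors = []
    for epoch in range(n_epochs):
        # visit the samples in a new random order on every pass (an assumption)
        for i in rng.permutation(samples.shape[0]):
            out = np.round(np.dot(samples[i], w))  # rounded weighted sum
            error = targets[i] - out
            errors.append(error)
            w += step_size * error * samples[i]
    return w, errors

# Synthetic, linearly separable stand-in for linear_data.csv (hypothetical).
rng = np.random.default_rng(1)
xy = rng.random((200, 2))
toy_labels = (xy[:, 0] + xy[:, 1] > 1.0).astype(float)
toy_data = np.concatenate((xy, np.ones((200, 1))), axis=1)  # add bias column

toy_w, toy_errors = train_perceptron(toy_data, toy_labels)
toy_acc = np.mean(np.round(toy_data @ toy_w) == toy_labels)
print(toy_w, toy_acc)
```

More passes give the weights more chances to settle, but the accuracy you get still depends on the data, the step size, and the starting weights.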

We can also see what the new values of the weights are and how each input attribute was weighted. 

In [0]:
print(weights)
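
The learned weights can also be read as a line in the x-y plane. With the rounding activation used here, the boundary between outputs 0 and 1 is where the weighted sum equals 0.5, so solving $w_x x + w_y y + w_b = 0.5$ for $y$ gives $y = (0.5 - w_b - w_x x) / w_y$. The sketch below uses made-up weights (`example_w` is hypothetical, not the trained result) and assumes $w_y$ is not zero.

```python
import numpy as np

# Hypothetical weights in the order [w_x, w_y, w_bias]; not the trained values.
example_w = np.array([0.8, 0.6, -0.2])

def boundary_y(x, w, threshold=0.5):
    """y-coordinate where the weighted sum equals the rounding threshold."""
    return (threshold - w[2] - w[0] * x) / w[1]

xs = np.linspace(0.0, 1.0, 5)
ys = boundary_y(xs, example_w)

# Every (x, y) pair on this line produces a weighted sum of exactly 0.5.
sums = example_w[0] * xs + example_w[1] * ys + example_w[2]
print(np.allclose(sums, 0.5))
```

Passing your own trained `weights` to `boundary_y` and drawing the result over the scatter plot is one way to check where the perceptron draws its line.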

Now, we want to separate the training data into two classes based on the perceptron's output to visualize how they were divided.

In [0]:
# get all of the data samples where the perceptron predicted 0
pred_neg = data[all_out == 0, :]

# get all of the data samples where the perceptron predicted 1
pred_pos = data[all_out == 1, :]

We can also pick out which of the predictions it got wrong to see where these points lie. 

In [0]:
# calculate the difference between each label and its respective output
diff = labels - all_out

# take the samples out of data where the difference is not 0 (the ones it got wrong.)
wrong_pred = data[diff != 0, :]

Below, we plot the training error over the 1000 iterations, the prediction of each point and where it lies, and the ones it got wrong, each in a different subplot. 

In [0]:
# create a figure
fig = plt.figure(figsize=(12, 6))

# create three subplots within the figure in a row
subplot1 = fig.add_subplot(131)
subplot2 = fig.add_subplot(132)
subplot3 = fig.add_subplot(133)

# set the xlabels and ylabels for subplot 1
subplot1.set_xlabel('training iteration')
subplot1.set_ylabel('Absolute Error')

# tell subplot 1 what it's supposed to show
subplot1.plot(np.absolute(se))

# set just a title for subplot 2
subplot2.set_title('Classification based on learned weights')

# tell subplot 2 to make a scatter plot where the ones predicted to be
# 1 by the perceptron are colored red and their x and y locations
subplot2.scatter(pred_pos[:, 0], pred_pos[:, 1], c='r')

# tell subplot 2 to also make the negative predictions blue and plot them
# according to their x and y values
subplot2.scatter(pred_neg[:, 0], pred_neg[:, 1], c='b')

# make a title for the third subplot
subplot3.set_title('Incorrectly Classified Points')

# make a scatter plot using the ones it got wrong
subplot3.scatter(wrong_pred[:, 0], wrong_pred[:, 1], c='g')

# use this when there are a lot of subplots/words...sometimes doesn't show right otherwise
plt.tight_layout()
plt.show() # show the plot

Let's create some new data and send that through to see how the perceptron does on data it didn't use for training. 

In [0]:
# create 1000 more random samples
test_data = np.random.rand(1000, 2)

# add a column of ones (the bias input) to these samples as the third column
test_data = np.concatenate((test_data, np.ones([test_data.shape[0], 1])), 1)

# get the test predictions
test_out = np.round(np.matmul(test_data, weights))

In [0]:
# once again, separate the data into positively predicted samples
test_pos = test_data[test_out == 1, :]

# and negatively predicted ones
test_neg = test_data[test_out == 0, :]

In [0]:
# make a plot of the predicted outputs of this test dataset
# to see how well it did 
plt.scatter(test_pos[:, 0], test_pos[:, 1], c='r')
plt.scatter(test_neg[:, 0], test_neg[:, 1], c='b')
plt.show()

## Try it on your own: 
On your own, rerun the previous steps and retrain a perceptron without a bias weight to see how the perceptron's solution and accuracy change without the bias.

# Multilayer Perceptrons

Now, we will use a multilayer perceptron to classify images of handwritten digits according to the digit they show. First, make sure TFLearn, a library that makes TensorFlow code more succinct and easier to understand, is imported. 

In [0]:
try:
  import tflearn
except ImportError:
  !pip3 install tflearn
  import tflearn

We also need to import TensorFlow itself, as well as a couple of other modules.

In [0]:
import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector
import datetime

TFLearn has this dataset of handwritten digits (called MNIST) built in, so it is possible to load the data as follows. It is separated into 55000 images for training and 10000 images to test on, which each have 784 pixels (28 x 28). 

In [0]:
import tflearn.datasets.mnist as mnist
X, Y, testX, testY = mnist.load_data(one_hot=True)
print(X.shape)
print(Y.shape)
print(testX.shape)
print(testY.shape)
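
The `one_hot=True` argument means each label is stored as a 10-element vector with a single 1 at the position of the digit. A minimal NumPy sketch of that encoding, using three made-up labels (`digit_labels` is hypothetical), looks like this:

```python
import numpy as np

# Three made-up digit labels.
digit_labels = np.array([3, 0, 9])

# Row k of the 10 x 10 identity matrix is the one-hot vector for digit k.
one_hot = np.eye(10)[digit_labels]
print(one_hot)

# argmax along each row recovers the original digit.
recovered = np.argmax(one_hot, axis=1)
print(recovered)
```

The same `np.argmax(..., 1)` trick is what turns the one-hot rows of `Y` back into plain digits later in the notebook.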

Each image has a white/gray number on a black background. For visualization, it is easier if the background is white and the number is black, so we subtract each pixel value from 1 to accomplish this. 

In [0]:
X, testX = 1. - X, 1. - testX

Below, we make a simple function that takes in any number of these images, turns them back into pictures, and shows them all at once. A function like this, which is increasingly built into libraries, is very useful for exploring and visualizing data.

In [0]:
def montage(x, return_grid=False):
  num = int(np.sqrt(x.shape[0]))
  m = int(np.ceil(np.sqrt(x.shape[1])))
  n = m
  grid = np.zeros([num*m, num*n])
  
  for i in range(num):
    for j in range(num):
      grid[i*m:i*m+m, j*n:j*n+n] = bytescale(x[i*num+j, ...].reshape([28, 28]))
      
  if return_grid:
    return grid
      
  fig = plt.figure(figsize=(15, 15))
  a1 = fig.add_subplot(111)
  a1.imshow(grid, cmap='gray')
  a1.grid(False)
  plt.show()
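
Note that `scipy.misc.bytescale` was removed from newer SciPy releases, so the import at the top of the notebook may fail on an up-to-date installation. If that happens, a rough stand-in can be sketched in NumPy as below. The name `bytescale_sketch` is hypothetical, and it only approximates the original function's default behavior of stretching the values to the 0-255 range.

```python
import numpy as np

def bytescale_sketch(arr, low=0, high=255):
    """Approximate stand-in: linearly stretch values to [low, high] as uint8."""
    arr = np.asarray(arr, dtype=float)
    cmin, cmax = arr.min(), arr.max()
    if cmax == cmin:
        # a constant image has no range to stretch, so map everything to low
        return np.full(arr.shape, low, dtype=np.uint8)
    scaled = (arr - cmin) / (cmax - cmin) * (high - low) + low
    return np.round(scaled).astype(np.uint8)

tiny_image = np.array([[0.0, 0.5],
                       [1.0, 0.25]])
scaled_image = bytescale_sketch(tiny_image)
print(scaled_image)
```

Swapping this in for `bytescale` inside `montage` should keep the montage working, although pixel values may differ slightly from the original function's rounding.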

The two functions below set up and start TensorBoard so we can monitor training. 

In [0]:
def install_tensorboard_dep():
  if 'ngrok-stable-linux-amd64.zip' not in os.listdir(os.getcwd()):
    !wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
    !unzip ngrok-stable-linux-amd64.zip
    os.system('n')

In [0]:
def start_tensorboard():
  LOG_DIR = '/tmp/tflearn_logs'
  get_ipython().system_raw('tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'.format(LOG_DIR))
  get_ipython().system_raw('./ngrok http 6006 &')
  ! curl -s http://localhost:4040/api/tunnels | python3 -c \
  "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

Below, we describe a function that will allow us to visualize the image space later in TensorBoard. 

In [0]:
def viz_mnist_embedding(tensor, images, labels):
  
  tb_dir = '/tmp/tflearn_logs'
  sess = tf.Session()
  sess.run(tensor.initializer)
  summary_writer = tf.summary.FileWriter(tb_dir)
  config = projector.ProjectorConfig()
  embedding = config.embeddings.add()
  embedding.tensor_name = tensor.name
  embedding.metadata_path = os.path.join(tb_dir, 'metadata.tsv')
  embedding.sprite.image_path = os.path.join(tb_dir, 'mnistdigits.png') 
  embedding.sprite.single_image_dim.extend([28,28])
  projector.visualize_embeddings(summary_writer, config)
  saver = tf.train.Saver([tensor])
  saver.save(sess, os.path.join(tb_dir, 'mnist_fc.ckpt'), 1)
  
  image_grid = montage(images, True)
  plt.imsave(os.path.join(tb_dir, 'mnistdigits.png'), image_grid, cmap='gray')
  
  # the with block closes the file automatically when it ends
  with open(os.path.join(tb_dir, 'metadata.tsv'),'w') as f:
    f.write("Index\tLabel\n")
    for index,label in enumerate(labels):
      f.write("%d\t%d\n" % (index,label))

Let's look at the first 1000 examples from the training set to make sure they're loaded in properly using our *montage* function. 

In [0]:
montage(X[:1000, ...])

We can also plot the distribution of the values in these 1000 training examples. 

In [0]:
plt.hist(X[:1000, :].flatten(), bins=100)
plt.show()

. . . and the mean and standard deviation. 

In [0]:
print(np.mean(X))
print(np.std(X))

Below, we normalize the data by subtracting from each pixel's value the mean value of that pixel across all of the training images, and then dividing by the standard deviation of that pixel across all of the training images. The same training-set mean and standard deviation are applied to the test images. This is a very common technique in data processing, and it can help increase the speed of learning and the generalization of the network. 

In [0]:
X_mean, X_std = np.mean(X, 0), np.std(X, 0)
X = (X - X_mean) / (X_std + 1e-6)
testX = (testX - X_mean) / (X_std + 1e-6)
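
The small constant added to the standard deviation matters because some pixels never change across images, which makes their standard deviation zero. The sketch below repeats the same normalization on a tiny made-up array (`toy_train` and `toy_test` are hypothetical) whose middle column is constant.

```python
import numpy as np

# Three made-up training samples; the middle feature never changes.
toy_train = np.array([[0.0, 10.0, 5.0],
                      [2.0, 10.0, 7.0],
                      [4.0, 10.0, 9.0]])
toy_test = np.array([[2.0, 10.0, 6.0]])

# Statistics come from the training samples only.
toy_mean = toy_train.mean(axis=0)
toy_std = toy_train.std(axis=0)
eps = 1e-6  # keeps the division finite where the standard deviation is 0

train_norm = (toy_train - toy_mean) / (toy_std + eps)
test_norm = (toy_test - toy_mean) / (toy_std + eps)

print(train_norm.mean(axis=0))  # each column is centered on 0
print(train_norm[:, 1])         # the constant column becomes all zeros
print(test_norm)
```

Without `eps`, the constant column would produce a division of zero by zero and fill the normalized data with NaN values.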

Let's plot the normalized images now. 

In [0]:
montage(X[:1000, ...])

And we can look at the distribution of all of the values in the first 1000 images like before. 

In [0]:
plt.hist(X[:1000, :].flatten(), bins=100)
plt.show()

Notice how the mean is now very close to zero and the standard deviation is close to one. Keep in mind that this rescaling only shifts and scales each pixel's values; it does not by itself make the data follow a normal distribution. 

In [0]:
print(np.mean(X))
print(np.std(X))

The line below mainly serves to reset the TensorFlow graph in case you want to run the same model more than once without resetting your notebook's runtime. 

In [0]:
tf.reset_default_graph()

The first thing we need to create for the network is the input layer. All we really need to say is what shape it should expect. For this network, we tell it to expect any number of examples, each of which should be a 784-dimensional vector. 

In [0]:
input_layer = tflearn.input_data(shape=[None, 784])
emb = tf.Variable(X[:5000,:], name='input_images')

Now that we have the input layer, we can create our network. The one we create here has three hidden (intermediate) layers. The first hidden layer contains 500 nodes, or perceptrons, each of which weights every pixel in the input image vector by a certain amount and produces an output between -1 and +1, since we are using the *tanh* activation function. The second hidden layer has 500 nodes that weight the responses of the 500 nodes in the first layer and likewise produce outputs between -1 and +1. The third hidden layer also has 500 nodes, which look at the second layer's responses. After each of these layers, we use something called *dropout*, which is designed to improve the generalization of a network by randomly dropping some of the layer's responses during training (the value 0.7 tells TFLearn to keep about 70% of them). The output layer has 10 nodes, one for each of the 10 digits (0 - 9). The goal is for the output node corresponding to the digit in the input image to output a value close to 1 and for the other nine nodes to output values close to 0. 

In [0]:
hidden1 = tflearn.fully_connected(input_layer, 
                                  500, 
                                  activation='tanh', 
                                  regularizer='L2', 
                                  weights_init='xavier',
                                  name='fc1')
hidden1 = tflearn.dropout(hidden1, 0.7)
hidden2 = tflearn.fully_connected(hidden1, 
                                  500, 
                                  activation='tanh', 
                                  regularizer='L2',
                                  weights_init='xavier')
hidden2 = tflearn.dropout(hidden2, 0.7)
hidden3 = tflearn.fully_connected(hidden2, 
                                  500, 
                                  activation='tanh', 
                                  regularizer='L2',
                                  weights_init='xavier')
hidden3 = tflearn.dropout(hidden3, 0.7)
output = tflearn.fully_connected(hidden3, 10, activation='softmax')
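
To make the pieces of this network more concrete, here is a rough NumPy sketch of what one *tanh* layer, dropout, and the softmax output compute. It is an illustration under assumptions, not TFLearn's implementation: the helper names (`dense_tanh`, `dropout_sketch`, `softmax`) are hypothetical, random numbers stand in for images and trained weights, only one hidden layer is shown, and the dropout uses the common "inverted" form that rescales the kept responses by 1 / keep_prob.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_tanh(x, W, b):
    """Fully connected layer: weighted sum of inputs, squashed to (-1, 1)."""
    return np.tanh(x @ W + b)

def dropout_sketch(activations, keep_prob, rng):
    """Zero out responses at random and rescale the ones that are kept."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

def softmax(z):
    """Turn each row of scores into positive values that sum to 1."""
    z = z - z.max(axis=1, keepdims=True)  # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Random stand-ins for 4 flattened images and for the layer weights.
fake_images = rng.standard_normal((4, 784))
W1, b1 = rng.standard_normal((784, 500)) * 0.05, np.zeros(500)
W_out, b_out = rng.standard_normal((500, 10)) * 0.05, np.zeros(10)

h1 = dense_tanh(fake_images, W1, b1)
h1_dropped = dropout_sketch(h1, 0.7, rng)
probs = softmax(h1_dropped @ W_out + b_out)

print(probs.shape)        # one row of 10 values per image
print(probs.sum(axis=1))  # each row sums to 1
```

Dropout is only applied during training; at prediction time all of the responses are used.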

Below, we choose the optimizer that will search for a set of weights mapping the input images to their respective outputs. Here we use stochastic gradient descent (SGD), and we specify that the learning rate should be 0.1. 

In [0]:
sgd = tflearn.SGD(learning_rate=0.1)
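
To get a feel for what the learning rate does, here is a minimal sketch of plain gradient descent on a made-up one-dimensional problem, minimizing $f(w) = (w - 3)^2$. The names `toy_weight` and `step_size` are hypothetical. The real optimizer works on all of the network's weights at once and estimates the gradient from a batch of images, but the update has the same form.

```python
# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
toy_weight = 0.0
step_size = 0.1  # plays the role of the learning rate

for step in range(50):
    gradient = 2.0 * (toy_weight - 3.0)
    # move a small step against the gradient
    toy_weight -= step_size * gradient

print(toy_weight)  # ends up very close to the minimum at 3
```

A much smaller step size would need many more steps to get this close, while a step size above 1.0 would overshoot further on every step and never settle for this particular function.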

In the line below, we describe all of the optimization parameters for the network. We are saying that the output of the network is called output, and that it is what should be compared to the actual labels for each input image. The comparison uses the *categorical crossentropy* loss function, $H(p, q) = -\sum_i p_i \log(q_i)$, where $q_i$ is the response of the $i$-th of the 10 output nodes and $p_i$ is what that response should be. 

In [0]:
network = tflearn.regression(output, optimizer=sgd, 
                             loss='categorical_crossentropy')
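
As a worked example of the categorical crossentropy, the sketch below evaluates it in NumPy for one made-up label and two made-up predictions, using 3 classes instead of 10 to keep it short. The function name and the clipping constant `eps` are assumptions for illustration and are not TFLearn's implementation.

```python
import numpy as np

def categorical_crossentropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i), computed for each row."""
    q = np.clip(q, eps, 1.0)  # avoid taking log(0)
    return -np.sum(p * np.log(q), axis=1)

# The true class is the third one (one-hot label).
p = np.array([[0.0, 0.0, 1.0]])

q_confident = np.array([[0.05, 0.05, 0.90]])  # most weight on the right class
q_wrong = np.array([[0.60, 0.30, 0.10]])      # most weight on a wrong class

loss_confident = categorical_crossentropy(p, q_confident)
loss_wrong = categorical_crossentropy(p, q_wrong)
print(loss_confident, loss_wrong)
```

Because the label is one-hot, only the predicted probability of the correct class contributes to the sum, so the loss is simply the negative log of that probability and shrinks toward 0 as it approaches 1.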

Below, we call our embedding visualization function.

In [0]:
tensorboard_name = 'mnist_fc_tflearn'
viz_mnist_embedding(emb, X[:5000,:], np.argmax(Y[:5000,:], 1))

Next, we run the functions to start TensorBoard, and a link with the address to TensorBoard will appear (you may have to run this two or three times for it to work). 

In [0]:
install_tensorboard_dep()
start_tensorboard()

Now that we have told TFLearn how it should optimize the weights, we need to describe the training parameters, build the network, and execute the training.  

In [0]:
model = tflearn.DNN(network, tensorboard_verbose=0)

model.fit(X, Y, n_epoch=10,
          validation_set=(testX, testY),
          batch_size=100,
          snapshot_step=200,
          show_metric=True,
          run_id=tensorboard_name)

If our model reached a good loss and / or accuracy, we might want to save the network for later training or for use in an application. The line below shows how to do this. You can rename the saved model to whatever you want by changing the string passed to *save* (shown in red in the notebook). 

In [0]:
model.save('fully_connected_mnist')

Below, we show how to read in the model, but not for further training. Here, we will explore the first layer of weights in the network to see what each of the 500 nodes in the first hidden layer was looking for in the input images. 

In [0]:
from tensorflow.python import pywrap_tensorflow
reader = pywrap_tensorflow.NewCheckpointReader('fully_connected_mnist')
vars = reader.get_variable_to_shape_map()

We can print all of the pieces of the model, including the metrics, objective function, etc.

In [0]:
print(vars)

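Because we passed name='fc1' when creating the first hidden layer, its weights should be listed as **fc1/W** in the printout above, so we can load that layer in. If the printout shows a different name for the 784 x 500 weight matrix, use that name instead. 

In [0]:
hidden1_w = reader.get_tensor('fc1/W')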

We can print the shape of it now that we have loaded it as a numpy array. It should be 784 x 500 because it is the weight layer between the input layer, which has 784 nodes, and the first hidden layer, which has 500 nodes.

In [0]:
print(hidden1_w.shape)

We want to see what each of the 500 nodes in the first hidden layer is looking for in the input layer. To use our montage function to visualize this, we need to transpose the matrix so that each node's 784 weights form one row of the matrix.

In [0]:
hidden1_w = hidden1_w.transpose()

In [0]:
print(hidden1_w.shape)
montage(hidden1_w)

## Try it Yourself:
Below, try to extract and view the weights from all of the subsequent layers. 