# Artificial Neural Networks

# Setup

First, let's make sure this notebook works well in both Python 2 and 3, import a few common modules, ensure Matplotlib plots figures inline, and prepare a function to save the figures:

In [None]:
# To support both python 2 and python 3
from __future__ import division, print_function, unicode_literals

# Common imports
import numpy as np
import os

# to make this notebook's output stable across runs
# (tensorflow is imported as tf further down, before reset_graph() is first called)
def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

# To plot pretty figures
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12

# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "ann"

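def save_fig(fig_id, tight_layout=True):
    path = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID, fig_id + ".png")
    # Create the target directory if needed, so that plt.savefig() does not fail
    if not os.path.isdir(os.path.dirname(path)):
        os.makedirs(os.path.dirname(path))
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format='png', dpi=300)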

# Perceptrons

- The Perceptron is one of the simplest ANN architectures, invented in 1957 by Frank
Rosenblatt. 

- It is based on a slightly different artificial neuron called a linear threshold unit (LTU), as covered in the lecture.
- Its inputs and output are now numbers (instead of binary on/off values), and each input connection is associated with a weight.

- The LTU computes a weighted sum of its inputs, then applies a step function to that sum and outputs the result:
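The computation can be written as $h_{\mathbf{w}}(\mathbf{x}) = \text{step}(\mathbf{w}^T \mathbf{x} + b)$. Below is a minimal NumPy sketch of a single LTU, assuming a Heaviside step function (the sign function is another common choice). The weights, bias, and inputs are made-up values chosen only for illustration.

```python
import numpy as np

def heaviside(z):
    """Step function: 1 where z >= 0, else 0."""
    return (z >= 0).astype(int)

def ltu(x, w, b=0.0):
    """Linear threshold unit: step(w . x + b), applied to each row of x."""
    return heaviside(np.dot(x, w) + b)

# Hypothetical weights, bias, and inputs (illustration only)
w = np.array([0.5, -1.0, 2.0])
b = -0.5
x = np.array([[1.0, 2.0, 0.5],   # 0.5 - 2.0 + 1.0 - 0.5 = -1.0 -> 0
              [2.0, 0.0, 1.0]])  # 1.0 + 0.0 + 2.0 - 0.5 =  2.5 -> 1

print(ltu(x, w, b))
```

Each row of `x` is one instance, so the whole batch is handled in a single matrix-vector product.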



In [None]:
# Scikit-Learn provides a Perceptron class that implements a single LTU network. 
# It can be used pretty much as you would expect, for example on the iris dataset:

In [None]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]  # petal length, petal width
y = (iris.target == 0).astype(int)  # np.int was removed in recent NumPy versions; the built-in int works everywhere

per_clf = Perceptron(random_state=42)
per_clf.fit(X, y)

y_pred = per_clf.predict([[2, 0.5]])


# you may ignore "Warning"

In [None]:
iris.feature_names

In [None]:
iris.DESCR

In [None]:
iris.data

In [None]:
X

In [None]:
y

In [None]:
# y_pred = per_clf.predict([[2, 0.5]])
# Perceptrons do not output a class probability; 
# rather, they just make predictions based on a hard threshold. 

y_pred

- You may have recognized that the Perceptron learning algorithm strongly resembles
Stochastic Gradient Descent. In fact, Scikit-Learn’s Perceptron class is equivalent to
using an SGDClassifier with the following hyperparameters: loss="perceptron",
learning_rate="constant", eta0=1 (the learning rate), and penalty=None (no regularization).
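As a rough check of this equivalence, the sketch below trains both classifiers on the same iris features and compares their predictions. It assumes both estimators share the same `random_state` and otherwise use their default settings; it is an illustration, not a proof of equivalence.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron, SGDClassifier

iris = load_iris()
X = iris.data[:, [2, 3]]            # petal length, petal width
y = (iris.target == 0).astype(int)  # 1 for Iris-Setosa, 0 otherwise

per_clf = Perceptron(random_state=42).fit(X, y)

# Same hyperparameters as listed above, passed explicitly to SGDClassifier
sgd_clf = SGDClassifier(loss="perceptron", learning_rate="constant",
                        eta0=1.0, penalty=None, random_state=42).fit(X, y)

same = np.array_equal(per_clf.predict(X), sgd_clf.predict(X))
print("Same predictions on the training set:", same)
```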



- Note that contrary to Logistic Regression classifiers, Perceptrons do not output a class
probability; rather, they just make predictions based on a hard threshold. This is one
of the good reasons to prefer Logistic Regression over Perceptrons.

In [None]:
# Let us have a plot for iris classification using Perceptron

a = -per_clf.coef_[0][0] / per_clf.coef_[0][1]
b = -per_clf.intercept_ / per_clf.coef_[0][1]

axes = [0, 5, 0, 2]

x0, x1 = np.meshgrid(
        np.linspace(axes[0], axes[1], 500).reshape(-1, 1),
        np.linspace(axes[2], axes[3], 200).reshape(-1, 1),
    )
X_new = np.c_[x0.ravel(), x1.ravel()]
y_predict = per_clf.predict(X_new)
zz = y_predict.reshape(x0.shape)

plt.figure(figsize=(10, 4))
plt.plot(X[y==0, 0], X[y==0, 1], "bs", label="Not Iris-Setosa")
plt.plot(X[y==1, 0], X[y==1, 1], "yo", label="Iris-Setosa")

plt.plot([axes[0], axes[1]], [a * axes[0] + b, a * axes[1] + b], "k-", linewidth=3)
from matplotlib.colors import ListedColormap
custom_cmap = ListedColormap(['#9898ff', '#fafab0'])

plt.contourf(x0, x1, zz, cmap=custom_cmap)  # contourf() has no use for a linewidth argument
plt.xlabel("Petal length", fontsize=14)
plt.ylabel("Petal width", fontsize=14)
plt.legend(loc="lower right", fontsize=14)
plt.axis(axes)

save_fig("perceptron_iris_plot")
plt.show()

# Activation functions

In my lecture, I introduced four activation functions:
- The step function
- The logistic function
- The hyperbolic tangent function
- The ReLU function


Now, let us plot them so that you can understand how they differ.



In [None]:
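# Note: despite its name, this computes the logistic (sigmoid) function,
# which is the inverse of the logit function.
def logit(z):
    return 1 / (1 + np.exp(-z))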

def relu(z):
    return np.maximum(0, z)

def derivative(f, z, eps=0.000001):
    return (f(z + eps) - f(z - eps))/(2 * eps)
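As a sanity check on the central-difference `derivative` helper, the sketch below compares it with the closed-form derivatives of the logistic function, $\sigma(z)(1 - \sigma(z))$, and of tanh, $1 - \tanh^2(z)$. The function is named `logistic` here for clarity, and the helpers are redefined so the snippet runs on its own.

```python
import numpy as np

def logistic(z):
    return 1 / (1 + np.exp(-z))

def derivative(f, z, eps=1e-6):
    # Central difference: (f(z + eps) - f(z - eps)) / (2 * eps)
    return (f(z + eps) - f(z - eps)) / (2 * eps)

z = np.linspace(-5, 5, 200)

# Largest gap between the numerical and the analytical derivative
err_logistic = np.max(np.abs(derivative(logistic, z)
                             - logistic(z) * (1 - logistic(z))))
err_tanh = np.max(np.abs(derivative(np.tanh, z) - (1 - np.tanh(z) ** 2)))

print("Max error (logistic):", err_logistic)
print("Max error (tanh):    ", err_tanh)
```

Both errors should be tiny, which is why the numerical approximation is good enough for plotting. It breaks down only at points where the function is not differentiable, such as the step function at 0.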

In [None]:
z = np.linspace(-5, 5, 200)

plt.figure(figsize=(11,4))

plt.subplot(121)
plt.plot(z, np.sign(z), "r-", linewidth=2, label="Step")
plt.plot(z, logit(z), "g--", linewidth=2, label="Logit")
plt.plot(z, np.tanh(z), "b-", linewidth=2, label="Tanh")
plt.plot(z, relu(z), "m-.", linewidth=2, label="ReLU")
plt.grid(True)
plt.legend(loc="center right", fontsize=14)
plt.title("Activation functions", fontsize=14)
plt.axis([-5, 5, -1.2, 1.2])

plt.subplot(122)
plt.plot(z, derivative(np.sign, z), "r-", linewidth=2, label="Step")
plt.plot(0, 0, "ro", markersize=5)
plt.plot(0, 0, "rx", markersize=10)
plt.plot(z, derivative(logit, z), "g--", linewidth=2, label="Logit")
plt.plot(z, derivative(np.tanh, z), "b-", linewidth=2, label="Tanh")
plt.plot(z, derivative(relu, z), "m-.", linewidth=2, label="ReLU")
plt.grid(True)
#plt.legend(loc="center right", fontsize=14)
plt.title("Derivatives", fontsize=14)
plt.axis([-5, 5, -0.2, 1.2])

save_fig("activation_functions_plot")
plt.show()

# FNN for MNIST

**1. Training Multi-Layer Perceptrons (MLP) with TensorFlow’s High-Level API**

- The simplest way to train an MLP with TensorFlow is to use the high-level API
TF.Learn, which is quite similar to Scikit-Learn’s API. 


- The DNNClassifier class makes it trivial to train a deep neural network with any number of hidden layers, and a softmax output layer to output estimated class probabilities. 


- For example, the following code trains a DNN for classification with two hidden layers (one with 300 neurons, and the other with 100 neurons) and a softmax output layer with 10 neurons:

## Using tf.learn

In [None]:
# We will be using the MNIST dataset, which is a set of 70,000 small
# images of digits handwritten by high school students and employees of the US Census Bureau. 
# Each image is labeled with the digit it represents.

from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("/tmp/data/")

X_train = mnist.train.images
X_test = mnist.test.images
y_train = mnist.train.labels.astype("int")
y_test = mnist.test.labels.astype("int")

In [None]:
# We use TensorFlow for this classification task

import tensorflow as tf

# RunConfig is optional, you may ignore it or use it.
# config = tf.contrib.learn.RunConfig(tf_random_seed=42) 

feature_cols = tf.contrib.learn.infer_real_valued_columns_from_input(X_train)

# Select a training model, "DNNClassifier", from TensorFlow, specify its parameters,
# and create the model.

# With RunConfig:
# dnn_clf = tf.contrib.learn.DNNClassifier(hidden_units=[300,100], n_classes=10,
#                                          feature_columns=feature_cols, config=config)
# Without RunConfig:

dnn_clf = tf.contrib.learn.DNNClassifier(hidden_units=[300,100], n_classes=10,
                                         feature_columns=feature_cols)
# if TensorFlow version >= 1.1

dnn_clf = tf.contrib.learn.SKCompat(dnn_clf) 
dnn_clf.fit(X_train, y_train, batch_size=50, steps=40000)

# Training takes a long time; please be patient and wait until it has finished.


In [None]:
from sklearn.metrics import accuracy_score

y_pred = dnn_clf.predict(X_test)
accuracy_score(y_test, y_pred['classes'])

# it also takes some time!

In [None]:
from sklearn.metrics import log_loss

y_pred_proba = y_pred['probabilities']
log_loss(y_test, y_pred_proba)

**2. Training Multi-Layer Perceptrons (MLP) with TensorFlow’s Low-Level API**

If you want more control over the architecture of the network, you may prefer to use
TensorFlow’s lower-level Python API. This means building the neural network yourself,
rather than relying on a ready-made high-level model such as DNNClassifier.

In this section, we will build the same model as before using this API, and we will implement Mini-batch Gradient Descent to train it on the MNIST dataset. 

The first step is the construction phase, building the TensorFlow graph. The second step is the execution
phase, where you actually run the graph to train the model.


## Using plain TensorFlow

In [None]:
# Construction Phase:
# First we need to import the tensorflow library. Then we must specify the
# number of inputs and outputs of a neural network, 
# and set the number of hidden neurons in each layer:

In [None]:
import tensorflow as tf

n_inputs = 28*28  # MNIST
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10

In [None]:
# Next, you can use placeholder nodes to represent the training data and targets.
# The shape of X is only partially defined.
# We know that it will be a 2D tensor (i.e., a matrix),
# with instances along the first dimension and features along the second dimension,
# and we know that the number of features is going to be
# 28 x 28 (one feature per pixel), but we don’t know yet how many instances each
# training batch will contain. So the shape of X is (None, n_inputs).
# Similarly, we know that y will be a 1D tensor with one entry per instance,
# but again we don’t know the size of the training batch at this point, so the shape is (None).

In [None]:
reset_graph()

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")

Now let’s create the actual neural network:

- The placeholder X will act as the input layer; during the execution phase, it will be replaced with one training batch at a time (note that all the instances in a training batch will be processed simultaneously by the neural network). 

Now we need to create the two hidden layers and the output layer. 

- The two hidden layers are almost identical: they differ only by the inputs they are connected to and by the number of neurons they contain. 
- The output layer is also very similar, but it uses a softmax activation function instead of a ReLU activation function. 

- So let’s create a neuron_layer() function that we will use to create one layer at a time. It will need parameters to specify the inputs, the number of neurons, the activation function, and the name of the layer:

In [None]:
def neuron_layer(X, n_neurons, name, activation=None):
    
    # First we create a name scope using the name of the layer: it will contain all the
    # computation nodes for this neuron layer. This is optional, but the graph will look
    # much nicer in TensorBoard if its nodes are well organized.
    
    with tf.name_scope(name):
        # Get the number of inputs by looking up the input matrix’s shape and taking
        # the size of the second dimension (the first dimension is for instances).
        n_inputs = int(X.get_shape()[1])
        
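        # The next three lines create a W variable that will hold the weights matrix,
        # initialized randomly from a truncated normal distribution with a standard
        # deviation of 2 / sqrt(n_inputs).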
        
        stddev = 2 / np.sqrt(n_inputs)
        init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev)
        W = tf.Variable(init, name="kernel")
        
        # Create a b variable for biases, initialized to 0 (no symmetry issue in
        # this case), with one bias parameter per neuron.
        b = tf.Variable(tf.zeros([n_neurons]), name="bias")
        # Create a subgraph to compute Z = X · W + b. This vectorized implementation
        # will efficiently compute the weighted sums of the inputs plus the bias term
        # for each and every neuron in the layer, for all the instances in the batch in just
        # one shot.
        Z = tf.matmul(X, W) + b
        
        # If an activation function is provided (e.g., tf.nn.relu), return activation(Z)
        # (for ReLU, max(0, Z)); otherwise just return Z.
        
        if activation is not None:
            return activation(Z)
        else:
            return Z
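To see what `Z = tf.matmul(X, W) + b` computes for a whole batch, here is a NumPy-only sketch of the same forward pass through a 784-300-100-10 network. It is an illustration rather than a replacement for the TensorFlow code: `dense_forward` and `init_layer` are hypothetical helper names, the input batch is random data, and a plain normal distribution stands in for the truncated normal.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def dense_forward(X, W, b, activation=None):
    """Compute Z = X . W + b for a whole batch, then apply the activation (if any)."""
    Z = np.dot(X, W) + b
    return Z if activation is None else activation(Z)

def init_layer(n_inputs, n_neurons, rng):
    # Plain normal instead of truncated normal: a simplification for this sketch
    W = rng.randn(n_inputs, n_neurons) * (2 / np.sqrt(n_inputs))
    b = np.zeros(n_neurons)  # one bias per neuron, initialized to 0
    return W, b

rng = np.random.RandomState(42)
W1, b1 = init_layer(28 * 28, 300, rng)
W2, b2 = init_layer(300, 100, rng)
W3, b3 = init_layer(100, 10, rng)

X_batch = rng.rand(5, 28 * 28)  # made-up batch of 5 "images"
hidden1 = dense_forward(X_batch, W1, b1, activation=relu)
hidden2 = dense_forward(hidden1, W2, b2, activation=relu)
logits = dense_forward(hidden2, W3, b3)  # no activation on the output layer

print(hidden1.shape, hidden2.shape, logits.shape)
```

The batch dimension (5 here) is carried through every layer, while the second dimension becomes the number of neurons in each layer: 300, then 100, then 10.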

Okay, so now you have a nice function to create a neuron layer. Let’s use it to create
the deep neural network! The first hidden layer takes X as its input. The second takes
the output of the first hidden layer as its input. And finally, the output layer takes the
output of the second hidden layer as its input.

In [None]:
# Create a deep neural network, named "dnn", with two hidden layers and an output layer

with tf.name_scope("dnn"):
    hidden1 = neuron_layer(X, n_hidden1, name="hidden1",
                           activation=tf.nn.relu)
    hidden2 = neuron_layer(hidden1, n_hidden2, name="hidden2",
                           activation=tf.nn.relu)
    logits = neuron_layer(hidden2, n_outputs, name="outputs")
    
    # note that logits is the output of the neural network 
    # before going through the softmax activation function:



For optimization reasons, we will handle the softmax computation later.
As you might expect, TensorFlow comes with many handy functions to create
standard neural network layers, so there’s often no need to define your own
neuron_layer() function like we just did. 

For example, TensorFlow’s fully_connected() function creates a fully connected layer, where all the inputs are connected to all the neurons in the layer. It takes care of creating the weights and biases variables, with the proper initialization strategy, and it uses the ReLU activation function by default (we can change this using the activation_fn argument). 

As covered in the chapter "Training Models", it also supports regularization and normalization parameters.

Let’s tweak the preceding code to use the fully_connected() function instead of our neuron_layer() function. Simply import the function and replace the dnn construction
section with the following code:



In [None]:
# Define the loss. TensorFlow applies the softmax function (introduced in the lecture slides)
# internally, then computes the cross entropy
# (a new cost function, not discussed yet in the lecture; simply accept it here).

with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                              logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

In [None]:
# Now that we have the neural network model ready to go, we need to define the cost
# function that we will use to train it. Just as we did for Softmax Regression,
# we will use cross entropy (a new cost function, not discussed yet; simply accept it here).


# The cross entropy will penalize models that estimate a low probability for the target class. 
# TensorFlow provides several functions to compute cross entropy. 
# Here, we will use sparse_softmax_cross_entropy_with_logits() : 
# it computes the cross entropy based on the “logits” (i.e., the output of the network 
# before going through the softmax activation function), 
# and it expects labels in the form of integers ranging from 0 to the number
# of classes minus 1 (in our case, from 0 to 9). This will give us a 1D tensor containing
# the cross entropy for each instance. We can then use TensorFlow’s reduce_mean() function, 
# as above, to compute the mean cross entropy over all instances.
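To make the description above concrete, here is a NumPy sketch of the same computation: a softmax over the logits, followed by the negative log-probability of the target class, averaged over the batch. It mirrors what the text describes and is not TensorFlow's actual implementation; the function name, logits, and labels are made up for illustration.

```python
import numpy as np

def sparse_softmax_cross_entropy(labels, logits):
    """Per-instance cross entropy from integer labels and raw logits."""
    # Subtract the row-wise max for numerical stability (does not change the softmax)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the target class for each instance
    return -log_probs[np.arange(len(labels)), labels]

# Hypothetical logits for 2 instances and 3 classes
logits = np.array([[2.0, 1.0, 0.1],    # fairly confident in class 0
                   [0.5, 0.5, 0.5]])   # no preference at all
labels = np.array([0, 2])              # integer class labels, as y in the notebook

xentropy = sparse_softmax_cross_entropy(labels, logits)  # 1D, one value per instance
loss = xentropy.mean()
print(xentropy, loss)
```

The second instance assigns equal probability (1/3) to every class, so its cross entropy is log(3). The first instance gives its target class a higher probability and is penalized less.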

In [None]:
# Now, we have the neural network model, we have the cost function, and now we need to
# define a GradientDescentOptimizer that will tweak the model parameters to minimize 
# the cost function. Nothing new; it’s just like before.


learning_rate = 0.01

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

The last important step in the construction phase is to specify how to evaluate the
model. We will simply use accuracy as our performance measure.

- First, for each instance, determine if the neural network’s prediction is correct by checking whether or not the highest logit corresponds to the target class. 

- For this you can use the in_top_k() function. This returns a 1D tensor full of boolean values, so we need to cast these booleans to floats and then compute the average. This will give us the network’s overall accuracy.

In [None]:
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    
    # take some time for computing
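For k = 1, this evaluation amounts to checking whether the index of the largest logit equals the target label. The NumPy sketch below shows the idea on made-up logits and labels. Ties between logits may be handled differently by `in_top_k()`, so treat this as an approximation of the idea rather than an exact equivalent.

```python
import numpy as np

# Hypothetical logits for 4 instances and 3 classes
logits = np.array([[0.1, 2.0, -1.0],
                   [1.5, 0.2, 0.3],
                   [0.0, 0.1, 3.0],
                   [2.2, 2.5, 0.4]])
y = np.array([1, 0, 2, 0])  # target classes

# True where the highest logit corresponds to the target class
correct = np.argmax(logits, axis=1) == y
# Cast booleans to floats, then average to get the accuracy
accuracy = correct.astype(np.float32).mean()
print(correct, accuracy)
```

Three of the four predictions match their targets here, so the accuracy is 0.75.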

In [None]:
# as usual, we need to create a node to initialize all variables, and we will also create
# a Saver to save our trained model parameters to disk:

init = tf.global_variables_initializer()
saver = tf.train.Saver()

This concludes the construction phase. 
This was fewer than 40 lines of code, but it was pretty intense: we created placeholders for the inputs and the targets, we created a function to build a neuron layer, we used it to create the DNN, we defined the cost function, we created an optimizer, and finally we defined the performance measure. 


Now on to the execution phase.

In [None]:
n_epochs = 40
batch_size = 50

In [None]:
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        # Note: the training accuracy is measured on the last mini-batch only
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={X: mnist.test.images,
                                            y: mnist.test.labels})
        print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)

    save_path = saver.save(sess, "./my_model_final.ckpt")
    
    # it takes some time for 40 outputs
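The inner loop above draws one mini-batch at a time from the training set. A minimal NumPy sketch of how such batching can be done is shown below. `iterate_minibatches` is a hypothetical helper and the data is made up; this does not claim to reproduce how `mnist.train.next_batch()` works internally.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled mini-batches that together cover the dataset exactly once."""
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

rng = np.random.RandomState(42)
X_fake = np.arange(20).reshape(10, 2)  # 10 made-up instances with 2 features each
y_fake = np.arange(10)

seen = []
batch_sizes = []
for X_batch, y_batch in iterate_minibatches(X_fake, y_fake, batch_size=4, rng=rng):
    batch_sizes.append(len(X_batch))
    seen.extend(y_batch.tolist())

print(batch_sizes)   # the last batch is smaller when the sizes do not divide evenly
print(sorted(seen))  # every instance appears exactly once per epoch
```

Calling the generator once per epoch gives a fresh shuffle each time, which is the usual setup for Mini-batch Gradient Descent.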

In [None]:
with tf.Session() as sess:
    saver.restore(sess, "./my_model_final.ckpt") # or better, use save_path
    X_new_scaled = mnist.test.images[:20]
    Z = logits.eval(feed_dict={X: X_new_scaled})
    y_pred = np.argmax(Z, axis=1)

In [None]:
print("Predicted classes:", y_pred)
print("Actual classes:   ", mnist.test.labels[:20])

Great job!