# INTRODUCTION TO ARTIFICIAL NEURAL NETWORKS

In [11]:
import numpy as np
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score

In [4]:
iris = load_iris()
X = iris.data[:,(2,3)] #petal length, petal width
y = (iris.target == 0).astype(int) #Iris Setosa (np.int was removed in NumPy 1.24)

per_clf = Perceptron(random_state = 42)
per_clf.fit(X,y)

y_pred = per_clf.predict([[2,0.5]])

[1]




The perceptron learning algorithm strongly resembles stochastic gradient descent. In fact, Scikit-Learn's Perceptron class is equivalent to using SGDClassifier with the following hyperparameters: loss = "perceptron", learning_rate = "constant", eta0 = 1, penalty = None (no regularization).
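
The equivalence described above can be checked with a small sketch. The variable names (`X_iris`, `y_iris`, `sgd_clf`) are chosen here for illustration, and the comparison assumes both estimators are given the same `random_state` and otherwise keep their default settings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron, SGDClassifier

iris = load_iris()
X_iris = iris.data[:, (2, 3)]            # petal length, petal width
y_iris = (iris.target == 0).astype(int)  # 1 for Iris Setosa, 0 otherwise

per_clf = Perceptron(random_state=42)
per_clf.fit(X_iris, y_iris)

# Same hyperparameters as listed in the text, spelled out on SGDClassifier
sgd_clf = SGDClassifier(loss="perceptron", learning_rate="constant",
                        eta0=1.0, penalty=None, random_state=42)
sgd_clf.fit(X_iris, y_iris)

# Compare the predictions of the two models on the training set
same_predictions = np.array_equal(per_clf.predict(X_iris),
                                  sgd_clf.predict(X_iris))
print("Same predictions:", same_predictions)
```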


## Training an MLP with TensorFlow's High-Level API

The simplest way to train an MLP with TensorFlow is to use the high-level API TF.Learn. The DNNClassifier class makes it trivial to train a deep neural network with any number of hidden layers and a softmax output layer that outputs estimated class probabilities.

The following code trains a DNN for classification with two hidden layers (one with 300 neurons and the other with 100) and a softmax output layer with 10 neurons.

In [12]:
#Have to insert MNIST dataset here
"""
feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(X_train)
dnn_clf = tf.contrib.learn.DNNClassifier(hidden_units = [300,100], 
                                         n_classes = 10, feature_columns = feature_columns)
dnn_clf.fit(x=X_train, y=y_train, batch_size = 50, steps = 40000)

"""


In [7]:
#?tf.contrib.learn.infer_real_valued_columns_from_input

#Creates `FeatureColumn` objects for inputs defined by input `x`.
#This interprets all inputs as dense, fixed-length float values.

In [10]:
"""
from sklearn.metrics import accuracy_score
y_pred = list(dnn_clf.predict(X_test))
accuracy_score(y_test, y_pred)
"""

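
The two cells above are disabled because no MNIST data was loaded, and `tf.contrib` was removed in TensorFlow 2. As a rough stand-in, the sketch below trains a network with the same hidden layer sizes (300 and 100) using Scikit-Learn's `MLPClassifier`, which is a different library and a different optimizer than `DNNClassifier`. It uses Scikit-Learn's small 8x8 digits dataset instead of MNIST, so the accuracy is not comparable to the MNIST results later in this notebook. All variable names are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Small 8x8 digits dataset (NOT MNIST), with pixel values scaled to [0, 1]
digits = load_digits()
X_digits = digits.data / 16.0
X_tr, X_te, y_tr, y_te = train_test_split(
    X_digits, digits.target, test_size=0.25, random_state=42)

# Two hidden layers (300 and 100 neurons) with ReLU; the softmax output
# layer is added automatically for multiclass classification
mlp_clf = MLPClassifier(hidden_layer_sizes=(300, 100), activation="relu",
                        batch_size=50, max_iter=100, random_state=42)
mlp_clf.fit(X_tr, y_tr)

mlp_accuracy = accuracy_score(y_te, mlp_clf.predict(X_te))
print("Test accuracy:", mlp_accuracy)
```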

## Training a DNN Using Plain TensorFlow

In this section we will build the same model as before using TensorFlow's lower-level Python API, and we will implement Mini-batch Gradient Descent to train it on the MNIST dataset. The first step is the construction phase, building the TensorFlow graph. The second step is the execution phase, where you actually run the graph to train the model.

### Construction Phase

First we specify the number of inputs and outputs and set the number of hidden neurons in each layer.

In [13]:
n_inputs = 28*28 #MNIST
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10

The shape of X is only partially defined. We know that it will be a 2D tensor with instances along the first dimension and features along the second dimension, and we know the number of features is going to be 28 x 28 (one feature per pixel), but we don't know how many instances each training batch will contain. So the shape of X is (None, n_inputs). Similarly, we know that y will be a 1D tensor with one entry per instance, but again we don't know the size of the training batch at this point, so the shape is (None).

In [14]:
X = tf.placeholder(tf.float32, shape = (None, n_inputs), name = "X")
y = tf.placeholder(tf.int64, shape = (None), name = "y")

The placeholder X will act as the input layer; during the execution phase, it will be replaced with one training batch at a time (note that all instances in a training batch will be processed simultaneously by the neural network).

Now you can create the two hidden layers and the output layer. The two hidden layers are almost identical: they differ only by the inputs they are connected to and by the number of neurons they contain. 

The output layer is also similar but uses a softmax activation function instead of a ReLU activation function. 

We will begin by creating a neuron_layer() function, which we will use to create the two hidden layers and the output layer. It needs parameters to specify the inputs, the number of neurons, the activation function, and the name of the layer.

In [19]:
def neuron_layer(X, n_neurons, name, activation = None):
    with tf.name_scope(name):
        #second dimension of X is number of inputs
        n_inputs = int(X.get_shape()[1])
        stddev = 2/np.sqrt(n_inputs)
        #initializing the random values which will be our weight inputs 
        #using truncated_normal  command
        init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev)
        W = tf.Variable(init, name = "weights")
        b = tf.Variable(tf.zeros([n_neurons]), name = "biases")
        z = tf.matmul(X, W) + b
        if activation == "relu":
            return tf.nn.relu(z)
        else:
            return z
        

**Going through the above code line by line:**

1. First we create a name scope using the name of the layer. This code is optional but it organizes the graph better when visualizing it.

2. Next, we get the number of inputs by looking up the second dimension of X's shape (the first dimension is for instances).

3. The next three lines create the weight matrix W, a 2D tensor containing all the connection weights between each input and each neuron. Hence, its shape is (n_inputs, n_neurons). It is initialized randomly from a truncated normal distribution with a standard deviation of 2/sqrt(n_inputs).

4. b is the bias vector, initialized to 0, with one bias parameter per neuron.

5. Then we create a subgraph to compute $z = X \cdot W + b$. This vectorized implementation efficiently computes the weighted sums of the inputs plus the bias term for every neuron in the layer, for all instances in the batch, in just one shot.

6. Finally, if the activation parameter is set to "relu", the function returns max(0, z); otherwise it returns z unchanged.
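
To make the steps above concrete outside of a TensorFlow graph, here is a minimal NumPy sketch of the same forward computation. The function name `neuron_layer_np` is hypothetical, and for brevity it draws the weights from a plain (not truncated) normal distribution, which is an assumption that differs from the TensorFlow code.

```python
import numpy as np

def neuron_layer_np(X_batch, n_neurons, activation=None, rng=None):
    """NumPy sketch of one fully connected layer (forward pass only)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_inputs = X_batch.shape[1]          # second dimension = number of inputs
    stddev = 2 / np.sqrt(n_inputs)
    # Plain normal initialization here (the TensorFlow code truncates it)
    W = rng.normal(0.0, stddev, size=(n_inputs, n_neurons))
    b = np.zeros(n_neurons)              # one bias per neuron
    z = X_batch @ W + b                  # weighted sums for the whole batch
    if activation == "relu":
        return np.maximum(0, z)
    return z

# A fake batch of 5 flattened 28x28 images
batch = np.random.default_rng(42).random((5, 28 * 28))
h1 = neuron_layer_np(batch, 300, activation="relu")
out = neuron_layer_np(h1, 10)
print(h1.shape, out.shape)
```

Each row of the result corresponds to one instance in the batch and each column to one neuron, which is why a single matrix product handles the whole batch.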

In [18]:
#?tf.name_scope()
#This context manager validates that the given `values` are from the 
#same graph, makes that graph the default graph, and pushes a
#name scope in that graph.

#?tf.truncated_normal
#Outputs random values from a truncated normal distribution.

#The generated values follow a normal distribution with specified mean and
#standard deviation, except that values whose magnitude is more than 2 
#standard deviations from the mean are dropped and re-picked
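
The "dropped and re-picked" behaviour described in the help text can be illustrated with a small NumPy sketch. The helper `truncated_normal_np` is hypothetical and only mimics the documented behaviour; it is not how TensorFlow implements the op internally.

```python
import numpy as np

def truncated_normal_np(shape, mean=0.0, stddev=1.0, rng=None):
    """Sample a normal distribution, re-drawing values beyond 2 stddev."""
    rng = np.random.default_rng(0) if rng is None else rng
    values = rng.normal(mean, stddev, size=shape)
    out_of_range = np.abs(values - mean) > 2 * stddev
    # Keep re-picking the rejected values until all are within range
    while out_of_range.any():
        values[out_of_range] = rng.normal(mean, stddev,
                                          size=int(out_of_range.sum()))
        out_of_range = np.abs(values - mean) > 2 * stddev
    return values

# Same shape and standard deviation as the first hidden layer's weights
sample_stddev = 2 / np.sqrt(28 * 28)
sample = truncated_normal_np((28 * 28, 300), stddev=sample_stddev)
print(np.abs(sample).max() <= 2 * sample_stddev)
```

Truncating the distribution avoids a few very large initial weights, which could otherwise slow down training.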

Now we begin to construct the layers using the function described above:

In [20]:
with tf.name_scope("dnn"):
    hidden1 = neuron_layer(X, n_hidden1, "hidden1", activation = "relu")
    hidden2 = neuron_layer(hidden1, n_hidden2, "hidden2", activation = "relu")
    logits = neuron_layer(hidden2, n_outputs, "outputs")

In the above example, we tediously wrote our own layer function. However, TensorFlow's fully_connected() function creates a fully connected layer in which every input is connected to every neuron in the layer. It takes care of creating the weight and bias variables with a proper initialization strategy, and it uses the ReLU activation function by default.

In [21]:
from tensorflow.contrib.layers import fully_connected

In [22]:
with tf.name_scope("dnn"):
    hidden1 = fully_connected(X, n_hidden1, scope = "hidden1")
    hidden2 = fully_connected(hidden1, n_hidden2, scope = "hidden2")
    logits = fully_connected(hidden2, n_outputs, scope = "outputs", 
                             activation_fn = None)

We will now use the cross entropy cost function for our softmax outputs. Cross entropy penalizes models that estimate a low probability for the target class. The loss function used here, sparse_softmax_cross_entropy_with_logits(), expects labels as integers ranging from 0 to the number of classes minus 1 (0 to 9 for MNIST), and it takes the logits, i.e. the network outputs before the softmax activation. We then use TensorFlow's reduce_mean() function to compute the mean cross entropy over all instances.

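Cross entropy between the one-hot target distribution $y^{'}$ and the predicted class probabilities $y$: $$H_{y^{'}}(y) := - \sum_{i} y^{'}_{i} \log(y_{i})$$

Since $y^{'}_{i}$ is 1 for the target class and 0 for all other classes, this reduces to $-\log(y_{k})$, where $k$ is the target class.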

In [23]:
with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels = y, logits = logits)
    loss = tf.reduce_mean(xentropy, name = "loss")
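
As a rough illustration of what this loss computes, the NumPy sketch below applies a numerically stable log-softmax to the logits and picks out the log-probability of each target class. The function name and the demo arrays are hypothetical; the sketch mirrors the math, not TensorFlow's internal implementation.

```python
import numpy as np

def sparse_softmax_cross_entropy_np(logits, labels):
    """Per-instance cross entropy from raw logits and integer labels."""
    # Subtract the row maximum for numerical stability before exponentiating
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Select the log-probability of the target class for each instance
    return -log_probs[np.arange(len(labels)), labels]

# With all-zero logits, every class gets probability 1/10,
# so the mean loss equals log(10)
logits_demo = np.zeros((4, 10))
labels_demo = np.array([3, 0, 9, 5])
loss_demo = sparse_softmax_cross_entropy_np(logits_demo, labels_demo).mean()
print(loss_demo, np.log(10))
```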
    

So far we have:
- neural network model
- cost function

Now, we need to define the gradient descent optimizer.

In [24]:
learning_rate = 0.01

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)
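
For intuition about what `minimize()` does at each step, here is a minimal NumPy sketch of plain gradient descent. It uses a toy linear regression problem with a mean squared error loss, which is an assumption made to keep the gradient easy to write by hand; it is not the network or the loss used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(42)
X_toy = rng.normal(size=(100, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y_toy = X_toy @ true_theta               # noiseless targets

eta = 0.01                               # same learning rate as above
theta = np.zeros(3)
for step in range(2000):
    # Gradient of the mean squared error with respect to theta
    gradients = 2 / len(X_toy) * X_toy.T @ (X_toy @ theta - y_toy)
    # Gradient descent update: move against the gradient
    theta = theta - eta * gradients

print(theta)
```

TensorFlow computes the gradients automatically from the graph, so the only part written by hand in the notebook is the choice of optimizer and learning rate.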

The last important step in the construction phase is to specify how to evaluate the model. We will use accuracy as our performance measure. For this you can use the in_top_k() function, which returns a 1D tensor of boolean values, one per instance, indicating whether the class with the highest logit matches the label. We then cast these booleans to floats and compute their mean to get the accuracy.

In [25]:
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    #convert boolean to float
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
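
With k = 1, this evaluation amounts to comparing the index of the largest logit with the label. A small NumPy sketch with made-up logits and labels shows the idea (tie handling may differ from `in_top_k()`, which this sketch does not try to reproduce):

```python
import numpy as np

# Made-up logits for 3 instances and 3 classes
logits_demo = np.array([[2.0, 0.5, -1.0],
                        [0.1, 0.2, 3.0],
                        [1.5, 1.0, 0.0]])
labels_demo = np.array([0, 2, 1])

# True where the highest-scoring class matches the label
correct_demo = np.argmax(logits_demo, axis=1) == labels_demo
# Cast booleans to floats and average to get the accuracy
accuracy_demo = correct_demo.astype(np.float32).mean()
print(correct_demo, accuracy_demo)
```

Here the first two instances are classified correctly and the third is not, so the accuracy is 2/3.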

We now create a node to initialize all variables, and we will also create a Saver to save our trained model parameters:

In [27]:
init = tf.global_variables_initializer()
saver = tf.train.Saver()

### Execution Phase

First, load MNIST:

In [29]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/")

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


Now we define the number of epochs and the size of the mini-batches:

In [30]:
n_epochs = 400
batch_size = 50

And now we train the model:

In [34]:
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict = {X: X_batch, y: y_batch})
        #train accuracy is measured on the last mini-batch only, so it is noisy
        acc_train = accuracy.eval(feed_dict = {X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict = {X: mnist.test.images,
                                              y: mnist.test.labels})

        print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)
    #save_path = saver.save(sess, "./my_model_final.ckpt")

0 Train accuracy: 0.9 Test accuracy: 0.9039
1 Train accuracy: 0.94 Test accuracy: 0.9202
2 Train accuracy: 0.9 Test accuracy: 0.9305
3 Train accuracy: 0.98 Test accuracy: 0.9382
4 Train accuracy: 0.88 Test accuracy: 0.9422
5 Train accuracy: 0.94 Test accuracy: 0.9491


KeyboardInterrupt: 
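
The inner loop above relies on `mnist.train.next_batch()` to hand out mini-batches. For data held in plain NumPy arrays, a hypothetical helper such as `iterate_minibatches` below could play a similar role. It is a sketch under the assumption that one pass over a shuffled copy of the indices counts as one epoch, and it may differ from how `next_batch()` shuffles and handles the final batch.

```python
import numpy as np

def iterate_minibatches(X_data, y_data, batch_size, rng):
    """Yield shuffled mini-batches covering the dataset exactly once."""
    indices = rng.permutation(len(X_data))
    for start in range(0, len(X_data), batch_size):
        batch_idx = indices[start:start + batch_size]
        yield X_data[batch_idx], y_data[batch_idx]

rng_batches = np.random.default_rng(42)
X_small = np.arange(20).reshape(10, 2)   # 10 toy instances, 2 features
y_small = np.arange(10)

batches = list(iterate_minibatches(X_small, y_small, batch_size=4,
                                   rng=rng_batches))
print([len(y_b) for _, y_b in batches])
```

With 10 instances and a batch size of 4, the last batch is smaller than the others; the training loop in this notebook avoids that case by using an integer division for the number of iterations.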

## Using the Neural Network

After training we can use the NN to make predictions. To do that, you can reuse the construction phase but change the execution like this:

In [32]:
#with tf.Session() as sess:
#    saver.restore(sess, "./my_model_final.ckpt")
#    X_new_scaled = [...] #some new images (scaled from 0 to 1)
#    Z = logits.eval(feed_dict = {X: X_new_scaled})
#    y_pred = np.argmax(Z, axis = 1)
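
The last line picks the class with the highest logit. If estimated class probabilities are also wanted, a softmax can be applied to the logits. The NumPy sketch below uses made-up logits in place of `Z`, and the helper name `softmax_np` is hypothetical.

```python
import numpy as np

def softmax_np(logits):
    """Row-wise softmax, shifted by the row maximum for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up logits for 2 instances and 3 classes
Z_demo = np.array([[1.0, 3.0, 0.5],
                   [2.0, 0.0, -1.0]])
probs_demo = softmax_np(Z_demo)
y_pred_demo = np.argmax(Z_demo, axis=1)

print(probs_demo)
print(y_pred_demo)
```

Because softmax preserves the ordering of the logits, taking the argmax of the logits or of the probabilities gives the same predicted class.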