# Artificial_Neural_networks_intro
  
    
### Logical Computations with Neurons
Warren McCulloch and Walter Pitts proposed a very simple model of the neuron with binary inputs and one binary output. Networks of these neurons can have identical properties to logic gates, so complex models can be built from them just like hardware.

### The Perceptron
The perceptron was invented by Frank Rosenblatt. It is a single-layer network of linear threshold units (LTUs). The inputs and outputs are numbers rather than the binary on/off values of the neurons above. Each LTU computes a weighted sum of its inputs, applies a step function to that sum, and outputs the result.



    h_w(x) = step(z) = step(w^T x)

##### Perceptron Learning Rule:

    w_i,j(next_step) = w_i,j + n(y_j - yhat_j)x_i

* w_i,j is the connection weight between the i^th input neuron and the j^th output neuron.
* x_i is the i^th input value of the current training instance.
* yhat_j is the output of the j^th output neuron for the current training instance.
* y_j is the target output of the j^th output neuron for the current training instance.
* n is the learning rate.
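The learning rule above can be sketched in plain NumPy for a single output neuron. This is a minimal illustration, not scikit-learn's implementation: the helper names `step` and `train_perceptron`, the logical AND toy data, and the choice to fold the bias into the weight vector as an extra input fixed at 1 are all assumptions made for the example.

```python
import numpy as np

def step(z):
    """Heaviside step function: 1 where z >= 0, else 0."""
    return (z >= 0).astype(int)

def train_perceptron(X, y, n=0.1, n_epochs=10):
    """Train one LTU with the perceptron learning rule (hypothetical helper)."""
    X_b = np.c_[np.ones(len(X)), X]       # prepend x_0 = 1 so w[0] acts as the bias
    w = np.zeros(X_b.shape[1])
    for _ in range(n_epochs):
        for x_i, y_i in zip(X_b, y):
            y_hat = step(w @ x_i)          # prediction for the current instance
            w += n * (y_i - y_hat) * x_i   # w <- w + n (y - yhat) x
    return w

# Toy, linearly separable data: logical AND
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])

w = train_perceptron(X_and, y_and, n=1.0, n_epochs=10)
preds = step(np.c_[np.ones(len(X_and)), X_and] @ w)
print("weights:", w)
print("predictions:", preds)
```

The weights only change when a prediction is wrong, which is why training stops making updates once every instance is classified correctly.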

In [36]:
import numpy as np 
import os
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2,3)] #petal length, petal width
y = (iris.target == 0).astype(int) # Iris Setosa (np.int was removed in recent NumPy versions)

per_clf = Perceptron(random_state=42)
per_clf.fit(X, y)

y_pred = per_clf.predict([[2, 0.5]])

##### Perceptrons do not output class probabilities, only hard class predictions #####

An MLP (multi-layer perceptron) consists of an input layer, one or more layers of LTUs called hidden layers, and one final layer of LTUs called the output layer.

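MLPs are trained with backpropagation, which can be described as gradient descent using reverse-mode autodiff. For each training instance, the algorithm first makes a prediction (forward pass) and measures the error between the network's output and the target. It then goes through each layer in reverse, computing how much each neuron in the previous hidden layer contributed to that error. This reverse pass efficiently measures the error gradient across all the connection weights in the network by propagating the error gradient backward through the network.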

For this algorithm to work, the step function in the LTUs has to be replaced with the logistic function, 1 / (1 + exp(-z)). The step function consists only of flat segments, so it gives gradient descent no gradient to work with. The logistic function is differentiable everywhere with a nonzero derivative, and its output ranges from 0 to 1.
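As a quick check of the properties that matter for gradient descent, here is a small NumPy sketch of the logistic function and its derivative. The function names are illustrative only; the derivative formula s(z)(1 - s(z)) is the standard closed form.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_derivative(z):
    """Derivative of the logistic function: s(z) * (1 - s(z))."""
    s = logistic(z)
    return s * (1.0 - s)

z = np.array([-10.0, 0.0, 10.0])
print("outputs:    ", logistic(z))             # all strictly between 0 and 1
print("derivatives:", logistic_derivative(z))  # small at the extremes, largest at z = 0
```

The derivative is never exactly zero in theory, but it becomes very small for large positive or negative inputs, which is the saturation effect discussed in the activation function notes.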

### Training a DNN using TensorFlow

In [15]:


import tensorflow as tf # this notebook uses the TensorFlow 1.x graph API

n_inputs = 28*28 # MNIST
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X") #To feed batches to during training
y = tf.placeholder(tf.int64, shape=(None), name="y")


def neuron_layer(X, n_neurons, name, activation=None):
    with tf.name_scope(name): # name scope using the name of the layer
        n_inputs = int(X.get_shape()[1]) # number of inputs (features)
        stddev = 2 / np.sqrt(n_inputs + n_neurons) # standard deviation used for weight initialization
        init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev) # random values from a truncated normal distribution
        W = tf.Variable(init, name="kernel") # weights
        b = tf.Variable(tf.zeros([n_neurons]), name="bias") # one bias per neuron
        Z = tf.matmul(X, W) + b # weighted sum of the inputs plus the bias
        if activation is not None:
            return activation(Z) # apply the activation to the weighted sum, not to the raw input
        else:
            return Z

In [3]:
#Creating layers
with tf.name_scope("dnn"):
    hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1", activation=tf.nn.relu)
    hidden2 = tf.layers.dense(hidden1, n_hidden2, name="hidden2", activation=tf.nn.relu)
    logits = tf.layers.dense(hidden2, n_outputs, name="outputs")
    
# cost function
# sparse_softmax_cross_entropy_with_logits is equivalent to applying the softmax
# activation function and then computing the cross entropy.
with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

# Training using gradient descent
learning_rate = 0.01
with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss) # minimizing the loss function

# using accuracy as a performance measure
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

# the initializer and saver do not belong to the "eval" scope
init = tf.global_variables_initializer()
saver = tf.train.Saver()

#### Execution phase

In [4]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/")

n_epochs = 40
batch_size = 50

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X:X_batch, y:y_batch})
        acc_train = accuracy.eval(feed_dict={X:X_batch, y:y_batch})
        acc_val = accuracy.eval(feed_dict={X:mnist.validation.images, y: mnist.validation.labels})
        
        print(epoch, "Train accuracy:", acc_train, "Val accuracy;", acc_val)
        
    save_path = saver.save(sess, "models/tensorflow/my_model_final.cpkt:")

0 Train accuracy: 0.88 Val accuracy; 0.9042

## Restoring and using the model

In [10]:
with tf.Session() as sess: 
    saver.restore(sess, "models/tensorflow/my_model_final.cpkt:")
    X_new_scaled = X_batch[:15]
    Z = logits.eval(feed_dict={X: X_new_scaled})
    y_pred = np.argmax(Z, axis=1)
    
print("predicted classes:", y_pred)
print("actual classes: ", y_batch[:15])

INFO:tensorflow:Restoring parameters from models/tensorflow/my_model_final.cpkt:
predicted classes: [9 5 1 2 2 1 9 0 9 7 5 8 8 1 5]
actual classes:  [9 5 1 2 2 1 9 0 9 7 5 8 8 1 5]


## Number of Hidden Layers

It is possible to model even the most complex functions with just one hidden layer, provided it has enough neurons. However, deep networks have a much higher parameter efficiency than shallow ones: they can model complex functions using far fewer neurons. Lower hidden layers model the basic, low-level characteristics of the dataset, and higher layers combine them into finer, higher-level detail. The lower layers can therefore be reused for different tasks, which reduces training times.

DNNs also tend to converge faster thanks to this hierarchy, since the lower layers model the more basic characteristics and the higher layers the finer ones.

## Number of Neurons per Hidden Layer
As a rule of thumb, you will get better accuracy by increasing the number of layers than by increasing the number of neurons per layer. It is no longer very common to funnel the layers (fewer and fewer neurons in each successive layer). Using the same number of neurons in every hidden layer is now the usual choice, which also reduces the number of hyperparameters to tune: one value for all layers instead of one per layer.

A simple approach is the "stretch pants" method: pick a model with more layers and neurons than you actually need, then use early stopping to prevent overfitting.
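The early stopping part of this approach can be sketched independently of any framework. The function below is a hypothetical helper, and the validation losses are made-up numbers used only to exercise the logic. It assumes a "patience" policy: stop once a given number of consecutive epochs bring no improvement, and keep the best epoch seen so far.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, best_loss), stopping after `patience` consecutive
    epochs without improvement in validation loss (hypothetical helper)."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_progress = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_progress = 0      # progress was made: reset the counter
        else:
            epochs_without_progress += 1
            if epochs_without_progress >= patience:
                break                         # stop training here
    return best_epoch, best_loss

# Made-up validation losses: improvement stalls after epoch 3
simulated_losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.41, 0.43, 0.39]
best_epoch, best_loss = early_stopping_epoch(simulated_losses, patience=3)
print("best epoch:", best_epoch, "best loss:", best_loss)
```

With this policy the loop never sees the last value in the list, which shows the trade-off: a small patience saves training time but can stop before a later improvement.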

## Activation Functions
In most cases the ReLU activation function is a good choice for the hidden layers. It is faster to compute than other activation functions, and gradient descent does not get stuck on plateaus as much, because ReLU does not saturate for large input values the way the sigmoid (logistic) function does.

A good choice for classification tasks where the classes are mutually exclusive is to use a softmax activation function for the output layer. When they are not mutually exclusive (or when there are just two classes), it is best to use the logistic function. For regression tasks, no activation function at all is necessary for the output layer.
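The difference between the two output activations is easiest to see on a single vector of logits. The NumPy sketch below uses made-up logit values; the function names are illustrative, and the max-subtraction in `softmax` is a common numerical stability trick rather than something taken from these notes.

```python
import numpy as np

def softmax(logits):
    """Softmax over the last axis; subtracting the max avoids overflow in exp."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def logistic(z):
    """Logistic function applied independently to each logit."""
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, 1.0, 0.1])     # made-up scores for three classes

probs_exclusive = softmax(logits)      # classes compete: probabilities sum to 1
probs_independent = logistic(logits)   # each class scored on its own

print("softmax: ", probs_exclusive, "sum =", probs_exclusive.sum())
print("logistic:", probs_independent, "sum =", probs_independent.sum())
```

Softmax forces the outputs to share a total probability of 1, which suits mutually exclusive classes, while the logistic outputs are independent and can all be high at once.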

## Question Answers 
A classical perceptron will converge only if the dataset is linearly separable, and it won't be able to estimate class probabilities. If you change the perceptron's activation function to the logistic function and train it with gradient descent, it will converge to a good solution even if the dataset is not linearly separable. It effectively becomes a logistic regression classifier.

The logistic activation function can always be used with gradient descent optimization, as its derivative is nonzero everywhere.

Backpropagation is a technique used to train ANNs. It first computes the gradients of the cost function with regard to every model parameter (all the weights and biases), and then performs a gradient descent step using these gradients. To compute the gradients, backpropagation uses reverse-mode autodiff. Reverse-mode autodiff performs a forward pass through a computation graph, computing every node's value for the current training batch, and then it performs a reverse pass, computing all the gradients at once.
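To make the forward pass, reverse pass, and gradient descent step concrete, here is a hand-written sketch for a tiny network with one logistic hidden layer and a linear output trained on mean squared error. Everything in it is an assumption chosen for illustration (random data, layer sizes, learning rate); TensorFlow derives the same kind of gradients automatically. One gradient is compared against a numerical estimate as a sanity check.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(5, 3))            # 5 made-up instances, 3 features
y = rng.normal(size=(5, 1))            # made-up regression targets
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # hidden layer: 4 neurons
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer: 1 neuron

def forward(W1, b1, W2, b2):
    """Forward pass: compute every intermediate value and the MSE loss."""
    z1 = X @ W1 + b1
    a1 = 1.0 / (1.0 + np.exp(-z1))     # logistic activation
    y_hat = a1 @ W2 + b2               # linear output
    loss = np.mean((y_hat - y) ** 2)
    return a1, y_hat, loss

# Forward pass
a1, y_hat, loss = forward(W1, b1, W2, b2)

# Reverse pass: apply the chain rule from the loss back to each parameter
d_y_hat = 2.0 * (y_hat - y) / y.size   # dLoss/dy_hat
dW2 = a1.T @ d_y_hat
db2 = d_y_hat.sum(axis=0)
d_a1 = d_y_hat @ W2.T
d_z1 = d_a1 * a1 * (1.0 - a1)          # logistic derivative: a1 * (1 - a1)
dW1 = X.T @ d_z1
db1 = d_z1.sum(axis=0)

# Sanity check: numerical gradient for one weight (central difference)
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (forward(W1_plus, b1, W2, b2)[2] - forward(W1_minus, b1, W2, b2)[2]) / (2 * eps)
print("analytic:", dW1[0, 0], "numeric:", numeric)

# Gradient descent step using the gradients from the reverse pass
learning_rate = 0.01
W1_new, b1_new = W1 - learning_rate * dW1, b1 - learning_rate * db1
W2_new, b2_new = W2 - learning_rate * dW2, b2 - learning_rate * db2
print("loss before:", loss, "loss after:", forward(W1_new, b1_new, W2_new, b2_new)[2])
```

Note that a single reverse pass yields the gradients for all four parameter arrays, whereas the numerical check needs two extra forward passes per individual weight.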

In [55]:
tf.reset_default_graph()


def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch


n_inputs = 28*28 # MNIST
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X") #To feed batches to during training
y = tf.placeholder(tf.int64, shape=(None), name="y")

# Creating layers
with tf.name_scope("dnn"):
    hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1", activation=tf.nn.relu)
    hidden2 = tf.layers.dense(hidden1, n_hidden2, name="hidden2", activation=tf.nn.relu)
    logits = tf.layers.dense(hidden2, n_outputs, name="outputs")

# cost function
# sparse_softmax_cross_entropy_with_logits is equivalent to applying the softmax
# activation function and then computing the cross entropy.
with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
    loss_summary = tf.summary.scalar('log_loss', loss)

# Training using gradient descent
learning_rate = 0.01
with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss) #minimizing loss function
    

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    accuracy_summary = tf.summary.scalar('accuracy', accuracy)

In [56]:
init = tf.global_variables_initializer()
saver = tf.train.Saver()

In [57]:
from datetime import datetime

def log_dir(prefix=""):
    now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
    root_logdir = "models/tensorflow/tf_logs"
    if prefix:
        prefix += "-"
    name = prefix + "run-" + now
    return "{}/{}/".format(root_logdir, name)

In [58]:
logdir = log_dir("mnist_dnn")

In [59]:
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

In [63]:

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]


m, n = X_train.shape

In [64]:
n_epochs = 10001
batch_size = 50
n_batches = int(np.ceil(m / batch_size))

checkpoint_path = "models/tensorflow/my_deep_mnist_model.ckpt"
checkpoint_epoch_path = checkpoint_path + ".epoch"
final_model_path = "./my_deep_mnist_model"

best_loss = np.inf # np.infty was removed in NumPy 2.0
epochs_without_progress = 0
max_epochs_without_progress = 50


In [65]:
with tf.Session() as sess:
    if os.path.isfile(checkpoint_epoch_path):
        # if the checkpoint file exists, restore the model and load the epoch number
        with open(checkpoint_epoch_path, "rb") as f:
            start_epoch = int(f.read())
        print("Training was interrupted. Continuing at epoch", start_epoch)
        saver.restore(sess, checkpoint_path)
    else:
        start_epoch = 0
        sess.run(init)
        
        
    for epoch in range(start_epoch, n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        accuracy_val, loss_val, accuracy_summary_str, loss_summary_str = sess.run([accuracy, loss, accuracy_summary, loss_summary], feed_dict={X: X_valid, y: y_valid})
        file_writer.add_summary(accuracy_summary_str, epoch)
        file_writer.add_summary(loss_summary_str, epoch)
        if epoch % 5 == 0:
            print("Epoch:", epoch,
                  "\tValidation accuracy: {:.3f}%".format(accuracy_val * 100),
                  "\tLoss: {:.5f}".format(loss_val))
            saver.save(sess, checkpoint_path)
            with open(checkpoint_epoch_path, "wb") as f:
                f.write(b"%d" % (epoch + 1))
            if loss_val < best_loss:
                saver.save(sess, final_model_path)
                best_loss = loss_val
            else:
                epochs_without_progress += 5
                if epochs_without_progress > max_epochs_without_progress:
                    print("Early stopping")
                    break

Epoch: 0 	Validation accuracy: 90.520% 	Loss: 0.35458
Epoch: 5 	Validation accuracy: 95.020% 	Loss: 0.17876
Epoch: 10 	Validation accuracy: 96.380% 	Loss: 0.12997
Epoch: 15 	Validation accuracy: 96.980% 	Loss: 0.10587
Epoch: 20 	Validation accuracy: 97.460% 	Loss: 0.09175
Epoch: 25 	Validation accuracy: 97.560% 	Loss: 0.08408
Epoch: 30 	Validation accuracy: 97.780% 	Loss: 0.07621
Epoch: 35 	Validation accuracy: 97.900% 	Loss: 0.07321
Epoch: 40 	Validation accuracy: 98.020% 	Loss: 0.06968
Epoch: 45 	Validation accuracy: 98.020% 	Loss: 0.06941
Epoch: 50 	Validation accuracy: 98.080% 	Loss: 0.06770
Epoch: 55 	Validation accuracy: 98.220% 	Loss: 0.06633
Epoch: 60 	Validation accuracy: 98.060% 	Loss: 0.06714
Epoch: 65 	Validation accuracy: 98.140% 	Loss: 0.06695
Epoch: 70 	Validation accuracy: 98.180% 	Loss: 0.06636
Epoch: 75 	Validation accuracy: 98.220% 	Loss: 0.06700
Epoch: 80 	Validation accuracy: 98.260% 	Loss: 0.06733
Epoch: 85 	Validation accuracy: 98.220% 	Loss: 0.06790
Epoch: 90 	V

In [74]:
with tf.Session() as sess:
    saver.restore(sess, final_model_path) # or better, use save_path
    X_new_scaled = X_test[:20]
    Z = logits.eval(feed_dict={X: X_new_scaled})
    y_pred = np.argmax(Z, axis=1)
    
print("Predicted values: ", y_pred)
print("Actual Values: ", y_test[:20])

INFO:tensorflow:Restoring parameters from ./my_deep_mnist_model
Predicted values:  [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4]
Actual Values:  [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4]
