# Chapter 10: Introduction to Artificial Neural Networks

## Basic ANN (Perceptron)

A perceptron is one of the simplest ANN architectures. 
* Each input is represented by a neuron.
* Each output also has a neuron.
* Every input neuron is connected to every output neuron.

Then, an edge between input neuron $i$ and output neuron $j$ has weight $w_{i,j}$.

Thus, the output/hypothesis $j$ can be calculated as: 

$h_w^j(x) = step(\sum\limits_{i=1,...,n}w_{i,j}x_i) = step(w_{*,j}^T x)$

Where step maps a real number to the 0, 1 and/or -1. For instance, step could be either:

$heavyside(z) = \begin{cases} 0 & \text{ if } z < 0 \\ 1 & otherwise \end{cases}$
$sgn(z) = \begin{cases} -1 & \text{ if } z < 0 \\ \ 0 & \text{ if } z = 0 \\ \ 1 & otherwise \end{cases}$

The learning rule is:

$w_{i,j}' = w_{i,j} + r (\hat y_j - y_j)x_i$ ($r$ the learning rate).

#### Note:
* Cannot learn complex patterns.
* With these activation function, only suitable for classification.
* If the data is linearly separable, then it converges according to Rosenblastt's theorem.

In [6]:
import numpy as np
from sklearn.datasets import load_iris 
from sklearn.linear_model import Perceptron

# Load dataset
iris = load_iris()
X = iris.data[:, (2, 3)] # petal length, petal width 
y = (iris.target == 0).astype(np.int) # Iris Setosa?

# Train classifier
per_clf = Perceptron(random_state=42, max_iter=1000, tol=1e-3)
per_clf.fit(X, y)

# Example
print("Target", y[10:100])
print("Prediction", per_clf.predict(X[10:100]))

Target [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Prediction [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]


## Basic DNN (Multi-layer perceptron)

* Concept: stacking multiple perceptrons.
* Also called a DNN (at least one hidden layer).
* Training: for each training instance the backpropagation algorithm first makes a prediction (forward pass), measures the error, then goes through each layer in reverse to measure the error contribution from each connection (reverse pass), and finally slightly tweaks the connection weights to reduce the error (Gradient Descent step).

### Activation function

Instead of the step function, and in order to implement Gradient Descent to update the weights of the network, the following activation functions were proposed:

* Logistic: $\sigma(z) = \frac{1}{1+e^{-z}}$
* Hyperbolic tangent: $tanh(z)=2\sigma(2z)-1$
* Rectified Linear Unit: $ReLU(z)=max(0,z)$

In [16]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Load dataset
digits = load_digits()
X = digits.data
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(
    X, 
    y, 
    test_size=0.15, 
    random_state=42)

# Train classifier
import tensorflow as tf
feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(X_train)
dnn_clf = tf.contrib.learn.DNNClassifier(
    hidden_units=[300, 100], 
    n_classes=10,
    feature_columns=feature_columns)
dnn_clf.fit(
    x=X_train, 
    y=y_train, 
    batch_size=50, 
    steps=2000)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_summary_steps': 100, '_save_checkpoints_steps': None, '_log_step_count_steps': 100, '_evaluation_master': '', '_eval_distribute': None, '_model_dir': '/var/folders/_5/tlg_1lsd5r76sx6dp5b399n00000gn/T/tmput1hh_3f', '_task_id': 0, '_keep_checkpoint_max': 5, '_num_ps_replicas': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x12e4df208>, '_master': '', '_tf_random_seed': None, '_is_chief': True, '_train_distribute': None, '_num_worker_replicas': 0, '_device_fn': None, '_session_config': None, '_task_type': None, '_save_checkpoints_secs': 600, '_keep_checkpoint_every_n_hours': 10000, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_protocol': None, '_environment': 'local'}
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving c

DNNClassifier(params={'dropout': None, 'embedding_lr_multipliers': None, 'hidden_units': [300, 100], 'gradient_clip_norm': None, 'activation_fn': <function relu at 0x1267b3ea0>, 'feature_columns': (_RealValuedColumn(column_name='', dimension=64, default_value=None, dtype=tf.float64, normalizer=None),), 'input_layer_min_slice_size': None, 'head': <tensorflow.contrib.learn.python.learn.estimators.head._MultiClassHead object at 0x12e4df390>, 'optimizer': None})

In [17]:
# Example
print("Target", y_test[10:100])
print("Prediction", list(dnn_clf.predict(X_test[10:100])))

Target [1 9 4 0 4 2 3 7 8 8 4 3 9 7 5 6 3 5 6 3 4 9 1 4 4 6 9 4 7 6 6 9 1 3 6 1 3
 0 6 5 5 1 9 5 6 0 9 0 0 1 0 4 5 2 4 5 7 0 7 5 9 5 5 4 7 0 4 5 5 9 9 0 2 3
 8 0 6 4 4 9 1 2 8 3 5 2 9 0 4 4]
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/_5/tlg_1lsd5r76sx6dp5b399n00000gn/T/tmput1hh_3f/model.ckpt-2000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Prediction [1, 9, 4, 0, 4, 2, 3, 7, 8, 8, 4, 3, 9, 7, 5, 6, 3, 5, 6, 3, 4, 9, 1, 4, 4, 6, 9, 4, 7, 6, 6, 9, 1, 3, 6, 1, 3, 0, 6, 5, 5, 1, 9, 5, 6, 0, 9, 0, 0, 1, 0, 4, 5, 2, 4, 5, 7, 0, 7, 5, 9, 5, 5, 4, 7, 0, 4, 5, 5, 9, 9, 0, 2, 3, 8, 0, 6, 4, 4, 9, 1, 2, 8, 3, 5, 2, 9, 0, 4, 4]


In [18]:
# Evaluate accuracy
dnn_clf.evaluate(X_test, y_test)

Instructions for updating:
Estimator is decoupled from Scikit Learn interface by moving into
separate class SKCompat. Arguments x, y and batch_size are only
available in the SKCompat class, Estimator will only accept input_fn.
Example conversion:
  est = Estimator(...) -> est = SKCompat(Estimator(...))
Instructions for updating:
Estimator is decoupled from Scikit Learn interface by moving into
separate class SKCompat. Arguments x, y and batch_size are only
available in the SKCompat class, Estimator will only accept input_fn.
Example conversion:
  est = Estimator(...) -> est = SKCompat(Estimator(...))
INFO:tensorflow:Starting evaluation at 2019-05-21T14:29:49Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/_5/tlg_1lsd5r76sx6dp5b399n00000gn/T/tmput1hh_3f/model.ckpt-2000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-05-21-14:29:49
INFO:tensorflow:Saving dict for gl

{'accuracy': 0.9777778, 'global_step': 2000, 'loss': 0.08264495}

## Custom DNN (using TF lower-level API)

In [38]:
import tensorflow as tf
from tensorflow.contrib.layers import fully_connected

n_inputs = 64
n_hidden1 = 300 
n_hidden2 = 100 
n_outputs = 10

# Create a nn layer: Input: (X: placeholder, n_neurons: int, name: string, activation: string)
def neuron_layer(X, n_neurons, name, activation=None): 
    with tf.name_scope(name):
        n_inputs = int(X.get_shape()[1])
        stddev = 2 / np.sqrt(n_inputs)
        init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev) 
        W = tf.Variable(init, name="weights")
        b = tf.Variable(tf.zeros([n_neurons]), name="biases")
        z = tf.matmul(X, W) + b
        if activation=="relu":
            return tf.nn.relu(z) 
        else:
            return z

# Create nn using the nn layer creation function 'neuron_layer' defined above
def create_nn_1(X):
    with tf.name_scope("dnn"):
        hidden1 = neuron_layer(X, n_hidden1, "hidden1", activation="relu") 
        hidden2 = neuron_layer(hidden1, n_hidden2, "hidden2", activation="relu") 
        logits = neuron_layer(hidden2, n_outputs, "outputs")
        return logits

# Create nn using TF's 'fully_connected' function
def create_nn_2(X):
    with tf.name_scope("dnn"):
        hidden1 = fully_connected(X, n_hidden1, scope="hidden1") 
        hidden2 = fully_connected(hidden1, n_hidden2, scope="hidden2") 
        logits = fully_connected(hidden2, n_outputs, scope="outputs", activation_fn=None)
        return logits
    
# Calculate loss as mean of cross entropy
def calc_loss(y,logits):
    with tf.name_scope("loss"):
        xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
        loss = tf.reduce_mean(xentropy, name="loss")
        return loss

# Create optimizer
def create_optimizer(X,y,learning_rate,loss,logits):
    with tf.name_scope("train"):
        optimizer = tf.train.GradientDescentOptimizer(learning_rate) 
        training_op = optimizer.minimize(loss)
    with tf.name_scope("eval"):
        correct = tf.nn.in_top_k(logits, y, 1)
        accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    return training_op, accuracy

In [40]:
# Run DNN
from sklearn.model_selection import ShuffleSplit
tf.reset_default_graph()
n_inputs = 64
n_hidden1 = 300 
n_hidden2 = 100 
n_outputs = 10

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")
logits = create_nn_2(X)
loss = calc_loss(y,logits)
training_op, accuracy = create_optimizer(X,y,0.1,loss,logits)

init = tf.global_variables_initializer()
saver = tf.train.Saver()
n_epochs = 200
batch_size = 50
n_batches = (X_train.shape[0] // batch_size)

with tf.Session() as sess: 
    init.run()
    for epoch in range(n_epochs):
        batches = ShuffleSplit(n_splits=n_batches, test_size=0.0, random_state=42)
        for train_index, _ in batches.split(X_train):
            X_batch, y_batch = X_train[train_index], y_train[train_index]
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        if epoch%50 == 0:
            acc_train = accuracy.eval(feed_dict={X: X_train, y: y_train})
            acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test}) 
            print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)
    save_path = saver.save(sess, "./my_model_final.ckpt")

0 Train accuracy: 0.7924034 Test accuracy: 0.8185185
50 Train accuracy: 1.0 Test accuracy: 0.9777778
100 Train accuracy: 1.0 Test accuracy: 0.9814815
150 Train accuracy: 1.0 Test accuracy: 0.9814815


In [41]:
# Restore model
with tf.Session() as sess:
    saver.restore(sess, "./my_model_final.ckpt")
    Z = logits.eval(feed_dict={X: X_test})
    y_pred = np.argmax(Z, axis=1)

INFO:tensorflow:Restoring parameters from ./my_model_final.ckpt


In [43]:
print("Target", y_test[:10])
print("Prediction", y_pred[:10])

Target [6 9 3 7 2 1 5 2 5 2]
Prediction [6 9 3 7 2 1 5 2 5 2]
