# Introduction to Neural Networks

Consider the following sequence of handwritten digits:

![digits](http://neuralnetworksanddeeplearning.com/images/digits.png)

Most people effortlessly recognize those digits as 504192. That ease is deceptive. In each hemisphere of our brain, humans have a primary visual cortex, also known as V1, containing 140 million neurons, with tens of billions of connections between them. We carry in our heads a supercomputer, tuned by evolution over hundreds of millions of years, and superbly adapted to understand the visual world. Recognizing handwritten digits isn't easy. Rather, we humans are stupendously, astoundingly good at making sense of what our eyes show us. But nearly all that work is done unconsciously. And so we don't usually appreciate how tough a problem our visual systems solve.

## Perceptrons

A perceptron takes several binary inputs, $x_1, x_2, \ldots$, and produces a single binary output:

![perceptron](http://neuralnetworksanddeeplearning.com/images/tikz0.png)

The neuron's output, $0$ or $1$, is determined by whether the weighted sum $\sum_j w_j x_j$ is less than or greater than some threshold value. Here the weights $w_1, w_2, \ldots$ are real numbers expressing the importance of the respective inputs to the output. Just like the weights, the threshold is a real number which is a parameter of the neuron.

In algebraic terms:
\begin{eqnarray}
  \mbox{output} & = & \left\{ \begin{array}{ll}
      0 & \mbox{if } \sum_j w_j x_j \leq \mbox{ threshold} \\
      1 & \mbox{if } \sum_j w_j x_j > \mbox{ threshold}
      \end{array} \right.
\end{eqnarray}
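
The rule above can be sketched in a few lines of Python. This is only an illustration and not part of the original text: the function name `perceptron_output` and the example weights and threshold are assumptions chosen for the demonstration.

```python
def perceptron_output(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0


# Hypothetical example: two binary inputs, both weighted 1.0, threshold 1.5.
# Only the input (1, 1) gives a weighted sum (2.0) above the threshold,
# so this particular perceptron behaves like a logical AND.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output(x, weights=(1.0, 1.0), threshold=1.5))
```

Note that a weighted sum exactly equal to the threshold produces 0, matching the $\leq$ case in the equation.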

## Sigmoid neurons

Suppose we have a network of perceptrons that we'd like to use to learn to solve some problem. For example, the inputs to the network might be the raw pixel data from a scanned, handwritten image of a digit. And we'd like the network to learn weights and biases so that the output from the network correctly classifies the digit. (A perceptron's bias is just the negative of its threshold, $b \equiv -\mbox{threshold}$, so the condition $\sum_j w_j x_j > \mbox{threshold}$ becomes $\sum_j w_j x_j + b > 0$.) To see how learning might work, suppose we make a small change in some weight (or bias) in the network. What we'd like is for this small change to cause only a small corresponding change in the output from the network. For example, suppose the network was mistakenly classifying an image as an "8" when it should be a "9". If small changes in the weights and biases produced only small changes in the output, we could figure out how to adjust them so the network gets a little closer to classifying the image as a "9".

The problem is that this isn't what happens when our network contains perceptrons. In fact, a small change in the weights or bias of any single perceptron in the network can sometimes cause the output of that perceptron to completely flip, say from $0$ to $1$.
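
A small numerical sketch makes the flip concrete. The helper name, weights, and threshold below are illustrative assumptions, picked so that the weighted sum sits right at the threshold.

```python
def perceptron_output(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0


inputs = (1, 1)
threshold = 1.0

# Weighted sum is exactly 1.0, which is not greater than the threshold.
before = perceptron_output(inputs, (0.5, 0.5), threshold)

# Nudging one weight by 0.001 pushes the sum to 1.001 and flips the output.
after = perceptron_output(inputs, (0.5, 0.501), threshold)

print(before, after)  # 0 1
```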

We can overcome this problem by introducing a new type of artificial neuron called a sigmoid neuron. Sigmoid neurons are similar to perceptrons, but modified so that small changes in their weights and bias cause only a small change in their output.

Just like a perceptron, the sigmoid neuron has inputs, $x_1, x_2, \ldots$. But instead of being just $0$ or $1$, these inputs can also take on any values between $0$ and $1$.

Also just like a perceptron, the sigmoid neuron has weights for each input, $w_1, w_2, \ldots$, and an overall bias, $b$. But the output is not $0$ or $1$. Instead, it's $\sigma(w \cdot x+b)$, where $\sigma$ is called the sigmoid function, and is defined by:

\begin{eqnarray} 
  \sigma(z) \equiv \frac{1}{1+e^{-z}}.
\end{eqnarray}

The output of a sigmoid neuron with inputs $x_1, x_2, \ldots$, weights $w_1, w_2, \ldots$, and bias $b$ is:

\begin{eqnarray} 
  \frac{1}{1+\exp(-\sum_j w_j x_j-b)}.
\end{eqnarray}
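
As a rough sketch of the formula above, the following Python computes a sigmoid neuron's output and shows that a small change in one weight produces only a small change in the output. The function names and the example inputs, weights, and bias are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Sigmoid function 1 / (1 + e^(-z)).

    Kept deliberately simple: math.exp overflows for very large negative z.
    """
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_neuron_output(inputs, weights, bias):
    """Output of a sigmoid neuron: sigmoid(sum_j w_j * x_j + b)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


inputs = (1.0, 1.0)
bias = -1.0

before = sigmoid_neuron_output(inputs, (0.5, 0.5), bias)    # z = 0, output 0.5
after = sigmoid_neuron_output(inputs, (0.5, 0.501), bias)   # z is about 0.001

# The output moves by roughly 0.00025 instead of jumping from 0 to 1.
print(before, after, after - before)
```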


<small>Ref: Michael A. Nielsen, "Neural Networks and Deep Learning", Determination Press, 2015</small>

## MNIST

### Import required modules

In [0]:
import keras
from keras.datasets import mnist
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD

Using TensorFlow backend.


### Load MNIST Dataset

In [0]:
(train_x, train_y) , (test_x, test_y) = mnist.load_data()
print(train_x.shape)
print(train_y.shape)
print(test_x.shape)
print(test_y.shape)

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
(60000, 28, 28)
(60000,)
(10000, 28, 28)
(10000,)


### Flatten the images and one-hot encode the labels

Next, we flatten each image into a vector of 784 pixels (28 * 28). We also have to convert all our labels to one-hot encoded vectors.


In [0]:
train_x = train_x.reshape(60000,784)
test_x = test_x.reshape(10000,784)

train_y = keras.utils.to_categorical(train_y,10)
test_y = keras.utils.to_categorical(test_y,10)
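
To make the two preprocessing steps concrete, here is a plain-Python sketch of what flattening and one-hot encoding do to a single example. The helper names `flatten_image` and `one_hot` are hypothetical, and the tiny 2 by 2 "image" stands in for a 28 by 28 MNIST digit; the notebook itself relies on `reshape` and `keras.utils.to_categorical` for this.

```python
def flatten_image(image):
    """Turn a 2-D grid of pixels (list of rows) into one flat list, row by row."""
    return [pixel for row in image for pixel in row]


def one_hot(label, num_classes=10):
    """Encode an integer label as a vector with a single 1.0 at index `label`."""
    encoded = [0.0] * num_classes
    encoded[label] = 1.0
    return encoded


# Toy stand-in for a 28 x 28 image.
tiny_image = [[0, 255],
              [128, 64]]

print(flatten_image(tiny_image))  # [0, 255, 128, 64]
print(one_hot(5))                 # 1.0 at index 5, 0.0 everywhere else
```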

Now our data is ready for training. The next step is to define our neural network.

In [0]:
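model = Sequential()
# First hidden layer: takes the flattened 784-pixel image as input
model.add(Dense(units=128,activation="relu",input_shape=(784,)))
# Two more hidden layers of 128 ReLU units each
model.add(Dense(units=128,activation="relu"))
model.add(Dense(units=128,activation="relu"))
# Output layer: softmax gives one probability per digit class (0 to 9)
model.add(Dense(units=10,activation="softmax"))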
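
One way to get a feel for the size of this network is to count its trainable parameters. The sketch below assumes each `Dense` layer keeps its default bias term, so a layer with `n_inputs` inputs and `n_units` units has `n_inputs * n_units` weights plus `n_units` biases. The helper name `dense_param_count` is hypothetical.

```python
def dense_param_count(n_inputs, n_units):
    """Weights plus biases for a fully connected layer (bias term assumed)."""
    return n_inputs * n_units + n_units


# Layer widths of the model above: input, three hidden layers, output.
layer_sizes = [784, 128, 128, 128, 10]

counts = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

print(counts)       # [100480, 16512, 16512, 1290]
print(sum(counts))  # 134794
```

Calling `model.summary()` on the model should report the same per-layer and total counts.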

Next, we need to specify a few components that our network needs in order to train on the data.

In [0]:
model.compile(optimizer=SGD(0.001),loss="categorical_crossentropy",metrics=["accuracy"])

The optimizer defines how the parameters are updated. Here we use stochastic gradient descent (SGD) with a learning rate of 0.001.

The loss function measures how far the network's predicted values are from the target (actual) values. Our objective is to minimize the loss by adjusting the network's parameters (its weights and biases). During training, gradient descent repeatedly updates the parameters in the direction that reduces the loss.
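
The two ideas in this paragraph can be sketched in plain Python. This is a simplified illustration, not Keras's implementation: the function names, the `eps` guard against `log(0)`, and the toy numbers are all assumptions. The loss for one example is $-\sum_i t_i \log p_i$, and one gradient descent step moves each weight against its gradient, scaled by the learning rate.

```python
import math


def categorical_crossentropy(target, predicted, eps=1e-7):
    """Loss for one example: -sum(t_i * log(p_i)) over the classes.

    `eps` is an assumed small constant that keeps log() away from zero.
    """
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


def sgd_step(weights, gradients, learning_rate=0.001):
    """One gradient descent update: w <- w - learning_rate * gradient."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]


target = [0.0, 0.0, 1.0]          # one-hot label: the true class is index 2
confident = [0.05, 0.05, 0.90]    # prediction that favors the true class
unsure = [0.40, 0.40, 0.20]       # prediction that favors the wrong classes

print(categorical_crossentropy(target, confident))  # about 0.105
print(categorical_crossentropy(target, unsure))     # about 1.609

# Made-up gradients, only to show the direction of the update.
print(sgd_step([0.5, -0.3], [2.0, -1.0]))  # approximately [0.498, -0.299]
```

The better prediction has the lower loss, which is why minimizing the loss pushes the network toward correct classifications.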

The metrics parameter specifies the metrics we would like the model to report during training.

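We feed in our training images (train_x) and their labels (train_y). We also specify a batch size, so that the training data is processed in small batches rather than all at once, and the number of epochs, that is, the number of full passes over the training set.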
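
As a rough illustration of batching, the sketch below splits 60,000 sample indices into consecutive batches of 32 and counts how many batches make up one epoch. The generator name `iterate_minibatches` is hypothetical, and unlike a typical training loop it does not shuffle the data.

```python
import math


def iterate_minibatches(num_samples, batch_size):
    """Yield (start, end) index pairs covering the dataset in order."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)


num_samples, batch_size = 60000, 32

# Number of batches, and therefore weight updates, per epoch.
steps_per_epoch = math.ceil(num_samples / batch_size)
print(steps_per_epoch)  # 1875

batches = list(iterate_minibatches(num_samples, batch_size))
print(len(batches), batches[0], batches[-1])  # 1875 (0, 32) (59968, 60000)
```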

In [0]:
model.fit(train_x,train_y,batch_size=32,epochs=10,verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f10032837f0>

The next cell evaluates the model on the test data and prints the loss followed by the accuracy. This tells us how well the model performs on images it did not see during training.

In [0]:
# evaluate returns [loss, accuracy] because metrics=["accuracy"] was passed to compile
results = model.evaluate(x=test_x,y=test_y,batch_size=32)
print(results)

[0.41174265130369403, 0.9643]
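
The second number, 0.9643, corresponds to 9,643 of the 10,000 test images being classified correctly. The idea behind the accuracy metric can be sketched as follows: a prediction counts as correct when the class with the highest predicted probability matches the class marked in the one-hot label. The helper names and the toy predictions are made up for illustration, and this is not the Keras implementation.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, one_hot_labels):
    """Fraction of examples whose most probable class matches the label."""
    correct = sum(
        argmax(p) == argmax(t) for p, t in zip(predictions, one_hot_labels)
    )
    return correct / len(one_hot_labels)


# Toy softmax outputs for four examples with three classes each.
predictions = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],  # wrong: the label below says class 1
]
labels = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
]

print(accuracy(predictions, labels))  # 0.75
```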
