## Chapter 1. 
### Using Neural Networks to Recognize Handwritten Digits

In this chapter we discuss some basics of neural networks: two types of artificial neuron (the perceptron and the sigmoid neuron) and a method for training networks (stochastic gradient descent). We then build a simple network to recognize handwritten digits.

**Perceptrons**

Perceptrons function quite simply. They take binary inputs and combine them into a single binary output by thresholding a weighted sum. For example, given inputs $x_1$, $x_2$, $x_3$, the perceptron computes the weighted sum $\sum_j w_j x_j$ and outputs 0 if the result is less than or equal to a threshold value, or 1 if it is greater than the threshold. To simplify the notation, we write the weighted sum as a dot product and move the threshold to the other side of the inequality, which gives the following output equation:

$$
\text{output}=
\begin{cases}
0, \mbox{ if } \mathbf{w}^T\mathbf{x} + b \leq 0 \\
1, \mbox{ if } \mathbf{w}^T\mathbf{x} + b > 0,
\end{cases}
$$

where $\mathbf{w}$ and $\mathbf{x}$ are the vectors of weights and inputs, and $b$ is the *bias*, which is the negative of our threshold.

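One useful feature of the perceptron is that it can compute elementary logical functions such as AND, OR, or NAND by adjusting the weights and bias appropriately. A NAND gate can be constructed, for example, by taking the two weights to be -2 and the bias to be +3. Because NAND is universal for computation, networks of perceptrons can compute any logical function. A single perceptron cannot: XOR, for example, requires more than one.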

In [10]:
def nand(x1, x2):
    # Perceptron with weights -2, -2 and bias +3
    z = -2*x1 - 2*x2 + 3
    # Output 0 when the weighted sum plus bias is <= 0, otherwise 1
    return 0 if z <= 0 else 1

In [13]:
nand(1,1)

0
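The same idea extends to other gates. The sketch below is a generic perceptron following the output equation above, checked against the truth tables for AND, OR, and NAND. The NAND weights and bias are the ones given in the text. The AND and OR values are illustrative choices (many others work), and the names `perceptron`, `GATES`, and `truth_table` are hypothetical helpers, not part of any library.

```python
def perceptron(weights, bias, inputs):
    """Return 1 if w.x + b > 0, otherwise 0."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

# (weights, bias) per gate. NAND uses the values from the text;
# AND and OR are assumed values chosen for illustration.
GATES = {
    "AND": ([2, 2], -3),
    "OR": ([2, 2], -1),
    "NAND": ([-2, -2], 3),
}

def truth_table(name):
    """Outputs for inputs (0,0), (0,1), (1,0), (1,1), in that order."""
    weights, bias = GATES[name]
    return [perceptron(weights, bias, (x1, x2))
            for x1 in (0, 1) for x2 in (0, 1)]

for name in GATES:
    print(name, truth_table(name))
```

Only the weights and bias change between gates; the perceptron itself is identical in all three cases.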

One major shortcoming of the perceptron is that it is not clear how to get a network of perceptrons to learn in a stable manner. Ideally, if the network's output is incorrect, we could slightly modify some subset of the weights to produce a small change in the output. However, weight modification in perceptron networks is a risky business: because each perceptron's output is a step function, a slight change to a weight either does nothing or flips the output from 0 to 1 (or back), and such a flip can change the output class for a variety of other inputs.
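To make the instability concrete, here is a small sketch in which a change of 0.02 in a single weight flips a perceptron's output. The weights, bias, and inputs are made-up values placed near the decision threshold on purpose; `perceptron` is a hypothetical helper defined here.

```python
def perceptron(weights, bias, inputs):
    """Return 1 if w.x + b > 0, otherwise 0."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

inputs = (1, 1)
bias = -3.0

# Assumed weights just below and just above the threshold.
before = perceptron([1.5, 1.49], bias, inputs)  # weighted sum + bias is about -0.01
after = perceptron([1.5, 1.51], bias, inputs)   # weighted sum + bias is about +0.01

print(before, after)
```

The output jumps between the two extremes with nothing in between, so there is no signal telling a learning procedure how close a weight is to the point where the output changes.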

**Sigmoid Neurons**

Sigmoid neurons, also called logistic neurons, overcome this problem by replacing the hard threshold with a smooth function. Their inputs need not be binary, and their output can take any value between 0 and 1. A sigmoid neuron's output is given by

$$
\sigma(z) = \frac{1}{1+e^{-z}} = \frac{1}{1+e^{-(\mathbf{w}^T\mathbf{x}+b)}}
$$
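A minimal sketch of this formula in plain Python follows. The function names `sigmoid` and `sigmoid_neuron` and the example weights are assumptions made for illustration. The direct formula is fine for moderate values of $z$, though `math.exp` can overflow for very large negative $z$.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(weights, bias, inputs):
    """Output of a sigmoid neuron: sigma(w.x + b)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Assumed example values: the same small weight change that would flip
# a perceptron only nudges the sigmoid neuron's output.
print(sigmoid_neuron([1.5, 1.49], -3.0, (1, 1)))
print(sigmoid_neuron([1.5, 1.51], -3.0, (1, 1)))
```

For large positive $z$ the output approaches 1 and for large negative $z$ it approaches 0, so the sigmoid neuron behaves like a perceptron far from the threshold and differs mainly near it.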

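Because the sigmoid *activation function* is smooth, small changes $\Delta w_j$ in the weights and $\Delta b$ in the bias now produce only a small change $\Delta y$ in the output $y$, which is approximately linear in those changes: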

$$
\Delta y \approx \sum_j\frac{\partial y}{\partial w_j}\Delta w_j + \frac{\partial y}{\partial b}\Delta b
$$
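This approximation can be checked numerically. The sketch below relies on the standard derivative of the logistic function, $\sigma'(z) = \sigma(z)(1-\sigma(z))$, which gives $\partial y/\partial w_j = \sigma'(z)\,x_j$ and $\partial y/\partial b = \sigma'(z)$. The weights, inputs, and perturbations are arbitrary assumed values, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(weights, bias, inputs):
    """Sigmoid neuron output y = sigma(w.x + b)."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def linear_estimate(weights, bias, inputs, d_weights, d_bias):
    """First-order estimate of the change in y from the partial derivatives."""
    y = neuron(weights, bias, inputs)
    slope = y * (1.0 - y)  # sigma'(z), shared by every partial derivative
    return sum(slope * x * dw for x, dw in zip(inputs, d_weights)) + slope * d_bias

# Assumed example values.
weights, bias, inputs = [0.5, -0.3], 0.1, (1.0, 0.5)
d_weights, d_bias = [0.01, 0.02], 0.005

new_weights = [w + dw for w, dw in zip(weights, d_weights)]
actual = neuron(new_weights, bias + d_bias, inputs) - neuron(weights, bias, inputs)
estimate = linear_estimate(weights, bias, inputs, d_weights, d_bias)

print(actual, estimate)
```

The two numbers should agree closely for perturbations this small, and the gap grows as the perturbations get larger, since the estimate ignores second-order terms.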