*As with all my tutorials, I recommend you follow along with the coding portions on your own. That way, you will learn them better.*

To run this tutorial, you will need the MNIST handwritten digits dataset in the same folder as the notebook. Download all four compressed files from http://yann.lecun.com/exdb/mnist/ and then uncompress them.

In [None]:
import numpy as np
import sys
from mnist import MNIST

# Perceptrons

Artificial Neural Networks began with the first artificial neuron: the perceptron. Developed in 1958, the perceptron was based on a simplified model of the neurons we all have in our brains.

Neurons are made of **dendrites** which sense impulses from surrounding neurons and then transmit them through the **axon** to other neurons. If the input impulses pass a specific **threshold**, then the neuron outputs a signal through the **axon**. If they don't pass the threshold, the neuron outputs nothing. In neurons, this signal is all or nothing: effectively a boolean output.

The perceptron works similarly to a neuron. Instead of bundling together dendrites, a perceptron has an **input vector ($x$)**: each entry of the vector corresponds to an input impulse. Instead of the body of a neuron, the perceptron assigns specific importance to each entry of the vector, called a **weight ($w$)**. And just like a neuron, the perceptron uses a **threshold ($b$)**, outputting $1$ if the weighted inputs pass the threshold and $0$ if the weighted inputs do not.
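
To make the weighted sum and threshold concrete, here is a minimal sketch with made-up numbers. The values of `x`, `w`, and `b` are assumptions chosen only for illustration; they don't come from any dataset.

```python
import numpy as np

# Hypothetical numbers, chosen only for illustration.
x = np.array([1.0, 0.0, 1.0])   # input vector: three input impulses
w = np.array([0.5, 0.9, 0.2])   # weights: importance of each input
b = 0.6                         # threshold

# Weighted sum: 0.5*1 + 0.9*0 + 0.2*1
weighted_sum = np.dot(w, x)

# Output 1 if the weighted sum passes the threshold, 0 otherwise.
output = 1 if weighted_sum > b else 0
print(weighted_sum, output)
```

The second input carries the largest weight, but it is switched off here, so it contributes nothing to the sum. The other two inputs are enough to pass the threshold on their own.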

In its most basic form, we can think of a perceptron as a 2-way linear classifier. If you're not familiar with what that means, I suggest you check out my **Introduction to ML: Classification** doc.

### Perceptron Demo

In [None]:
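def threshold_fn(x, b=0):
    # Fire (1) only when the score is strictly above the threshold b.
    return 1 if x > b else 0

class Perceptron(object):
    def __init__(self, input_dimension, threshold):
        self.b = threshold
        self.input_dim = input_dimension
        # One random weight in [0, 1) per input, stored as a column vector.
        self.weights = np.random.rand(self.input_dim, 1)

    def predict(self, x):
        # x is a column vector of shape (input_dim, 1); the score is w . x.
        score = np.matmul(self.weights.T, x)
        pred = threshold_fn(score, b=self.b)
        return pred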

# Artificial Neurons

Artificial neurons are very similar to the perceptron, but more general. Just like the perceptron, an artificial neuron adds up its weighted inputs. But instead of checking whether the sum passes a threshold (outputting 1 if activated, and 0 if not), an artificial neuron outputs the result of a function, called an **activation function**, applied to the weighted sum.

We can then represent an artificial neuron as a function, where $h$ is the activation function:

$$f(x) = h(\sum_{i = 1}^N w_i \cdot x_i) = h(w \cdot x)$$

In the case of a perceptron, we use the simple threshold activation for some threshold **$b$**, applied to the weighted sum $z = w \cdot x$:

$$h(z) =
\begin{cases}
1 & \text{if $z \geq b$} \\
0 & \text{otherwise} \\
\end{cases} $$

Although this is more commonly expressed by moving the threshold to the other side of the inequality, so that $b$ now stands for the negative of the threshold:

$$h(z) =
\begin{cases}
1 & \text{if $z + b \geq 0$} \\
0 & \text{otherwise} \\
\end{cases} $$
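
The two forms above agree as long as the $b$ in the second form is the negative of the threshold in the first. Here is a quick sketch to check that; the function names, weights, and inputs are hypothetical and chosen only for illustration.

```python
import numpy as np

def fires_threshold_form(w, x, threshold):
    # First form: compare the weighted sum against the threshold.
    return 1 if np.dot(w, x) >= threshold else 0

def fires_bias_form(w, x, bias):
    # Second form: add a bias term and compare against zero.
    return 1 if np.dot(w, x) + bias >= 0 else 0

# Hypothetical weights, threshold, and inputs.
w = np.array([1.0, -2.0, 0.5])
threshold = 1.0
inputs = [
    np.array([3.0, 0.0, 0.0]),  # weighted sum 3.0, above the threshold
    np.array([0.0, 1.0, 2.0]),  # weighted sum -1.0, below the threshold
    np.array([1.0, 0.0, 0.0]),  # weighted sum 1.0, exactly at the threshold
]

# Pair up the two answers for each input; the bias is the negated threshold.
results = [
    (fires_threshold_form(w, x, threshold), fires_bias_form(w, x, -threshold))
    for x in inputs
]
print(results)
```

Each pair holds the same value twice, including for the input that sits exactly on the boundary.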

Other activation functions exist, and are important for further applications of artificial neurons to build neural networks.
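
As a rough sketch of what swapping the activation function looks like, here are a few common choices applied to the same weighted sum. The helper names (`step`, `sigmoid`, `relu`, `neuron`) and the numbers are assumptions made for this example, not part of the demos in this notebook.

```python
import numpy as np

def step(z, b=0.0):
    # Threshold activation: the perceptron's all-or-nothing output.
    return 1 if z >= b else 0

def sigmoid(z):
    # Squashes any real number smoothly into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Passes positive values through unchanged and clips negatives to 0.
    return max(0.0, z)

def neuron(w, x, h):
    # f(x) = h(w . x): the weighted sum followed by the activation h.
    return h(np.dot(w, x))

# Hypothetical weights and input; the weighted sum works out to -0.5.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 0.5])

for name, h in [("step", step), ("sigmoid", sigmoid), ("tanh", np.tanh), ("relu", relu)]:
    print(name, neuron(w, x, h))
```

The weighted sum is identical in every case; only the final function changes. The threshold and ReLU both output 0 for this input, while sigmoid and tanh give graded values instead.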

### Artificial Neuron Demo

In [None]:
class ArtificialNeuron(object):
    def __init__(self, input_dimension, activation):
        self.activation = activation
        self.input_dim = input_dimension
        self.weights = np.random.rand(self.input_dim, 1)
    
    def predict(self, x):
        scr = np.matmul(self.weights.T, x)
        pred = self.activation(scr)
        return pred

# Learning With Artificial Neurons

Take a moment to think about what an artificial neuron is doing, conceptually.

Imagine we wanted to build a machine that could classify if an animal was a dog. We would first think of all the **features** that any animal might have: tail, ears, snout, legs, fur, drool, teeth, woofing, meowing, mooing, obedience, speed, curiosity, etc. Then we would think about which of these features are important in identifying the species **label** of the animal — for instance, obedience is highly indicative of dog-ness, but mooing is not. For each feature-species pair, we can assign a weight that corresponds to their relatedness. In the case of dogs, when presented with a set of features we could multiply each feature by the appropriate dog-weight, add up the scores, and if we had a high number that would mean DOG, otherwise it would be NOT DOG.

This is basically how an artificial neuron works.
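
Here is the dog example as a small sketch. The feature list, the dog-weights, the threshold, and the two example animals are all invented for illustration.

```python
import numpy as np

# Hypothetical features and hand-picked dog-weights, for illustration only.
FEATURES = ["obedience", "woofing", "meowing", "mooing", "fur"]
dog_weights = np.array([2.0, 3.0, -2.0, -3.0, 0.5])
THRESHOLD = 2.0

def classify(features):
    # Multiply each feature by its dog-weight and add up the scores.
    score = np.dot(dog_weights, features)
    return "DOG" if score > THRESHOLD else "NOT DOG"

# Feature values listed in the same order as FEATURES.
labrador = np.array([0.9, 1.0, 0.0, 0.0, 1.0])  # obedient, woofs, furry
cow = np.array([0.2, 0.0, 0.0, 1.0, 1.0])       # moos, furry, not very obedient

print(classify(labrador))
print(classify(cow))
```

Woofing and obedience push the labrador's score well above the threshold. Mooing carries a large negative weight, which drags the cow's score below it.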

What makes artificial neurons so wonderful is that we don't have to tell them which features correspond to dog-ness and not-dogness. The weights associated with each input are variable. All we need is a clever way to have a neuron automatically learn the relevance of each feature and update its weights accordingly.

### Data

In order to learn, we need **data**.

(If you're not familiar with machine learning, now would be a great time to read my **Introduction to ML: Data** doc.)

One of the earliest uses of these artificial neuron systems was for classifying handwritten digits — 0 through 9 — useful for reading postal codes automatically. In this example, the features would be the pixels of the image, and the labels would be the number that the image corresponded to. We will focus on a simplified version of this.

Note that this is a **supervised learning** problem, because we're focusing on a dataset where we have pairs of features and labels.

Let's import a well-known dataset, the **MNIST Handwritten Digits**, and limit ourselves to using just the 0s and 1s for now. The full set consists of 60k training features, 60k training labels, 10k test features, and 10k test labels.

For an explanation of how to set up these files, read: https://stackoverflow.com/a/40430149

In [None]:
# Adjust this path to wherever your uncompressed MNIST files live.
mndata = MNIST('//Users/jeremiahsafe/Documents/Data/MNIST Handwritten')

trainX, trainY = mndata.load_training()
testX, testY = mndata.load_testing()

# Keep only the training examples labeled 0 or 1.
train_idx = [i for i, ty in enumerate(trainY) if ty in (0, 1)]
trainX = [trainX[i] for i in train_idx]
trainY = [trainY[i] for i in train_idx]

# Do the same for the test set.
test_idx = [i for i, ty in enumerate(testY) if ty in (0, 1)]
testX = [testX[i] for i in test_idx]
testY = [testY[i] for i in test_idx]

We use the training set to train our algorithm. We want our computer to extract as much useful information from the training set as possible. We must COMPLETELY hide the test set from our algorithm: the whole point is to measure how well our algorithm can extrapolate patterns to unseen examples. If we expose the algorithm to the test set in any way, we're letting information leak.

*People make mistakes in separating train and test all the time. If you are writing a research paper and run your algorithm on the test set to decide whether to publish or keep working, you're still exposing information from the test set to the algorithm through your human judgment about when to publish. To avoid this problem, researchers will often set aside part of the training set, called a **holdout set**, **validation set**, or **development set** to check their algorithm's performance WITHOUT using the test set.*

To make sure we're doing things properly, we will further separate the training set into a training set (11k) and a development set (about 1.7k).

In [None]:
devX = trainX[11000:]
devY = trainY[11000:]
trainX = trainX[:11000]
trainY = trainY[:11000]

trainX = np.array(trainX).reshape(-1, 784, 1)
trainY = np.array(trainY).reshape(-1, 1)
devX = np.array(devX).reshape(-1, 784, 1)
devY = np.array(devY).reshape(-1, 1)
testX = np.array(testX).reshape(-1, 784, 1)
testY = np.array(testY).reshape(-1, 1)
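
The split above simply takes the first 11k examples in file order. If you would rather not depend on that order, one alternative is to shuffle the indices first. This is only a sketch: `shuffled_split` is a hypothetical helper, and the arrays below are tiny stand-ins rather than MNIST data.

```python
import numpy as np

def shuffled_split(xs, ys, n_train, seed=0):
    # Shuffle the indices once, then cut them into a train part and a dev part.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(xs))
    train_idx, dev_idx = order[:n_train], order[n_train:]
    return xs[train_idx], ys[train_idx], xs[dev_idx], ys[dev_idx]

# Stand-in data: 10 "examples" with 2 features each. Row i is [2i, 2i + 1].
xs = np.arange(20).reshape(10, 2)
ys = np.arange(10)

tr_x, tr_y, dv_x, dv_y = shuffled_split(xs, ys, n_train=8)
print(tr_x.shape, dv_x.shape)
```

Features and labels are indexed with the same shuffled indices, so every example keeps its own label, and no example lands in both sets.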

### Making Predictions

Let's re-use our artificial neuron from before. We'll set it up to have randomly selected weights and a threshold-based activation function.

In [None]:
activation_fn = lambda x: threshold_fn(x, b=0)
model = ArtificialNeuron(input_dimension=784, activation=activation_fn)

Now, we can ask the artificial neuron to make predictions for all the MNIST images in the development set using the random initialization. We will keep track of how many we get correct, and how many we get wrong.

In [None]:
def evaluate(model, xs, ys):
    correct = 0
    incorrect = 0
    for i in range(len(xs)):
        x, y = xs[i], ys[i]
        prediction = model.predict(x)
        if prediction == y:
            correct += 1
        else:
            incorrect += 1
    print("{} correct, {} incorrect.".format(correct, incorrect))
    
evaluate(model, devX, devY)

As expected, the system did pretty poorly. Because the weights were random initially, we would not expect performance to be much different from guessing: a 50/50 split.
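
One detail about this starting point is worth checking. `np.random.rand` draws weights from $[0, 1)$ and pixel intensities are never negative, so the weighted sum is positive for any image that isn't completely blank. With a threshold of 0, the untrained neuron therefore answers 1 every time. The sketch below illustrates this with synthetic stand-in images (random pixel values, not real MNIST digits).

```python
import numpy as np

rng = np.random.default_rng(0)

# Random weights in [0, 1), shaped like the neuron's weight column vector.
weights = rng.random((784, 1))

# Synthetic stand-in "images": 5 column vectors of pixel values in 0..255.
images = rng.integers(0, 256, size=(5, 784, 1))

# Score each image, then apply a threshold at 0.
scores = [float(np.matmul(weights.T, img)[0, 0]) for img in images]
preds = [1 if s > 0 else 0 for s in scores]
print(preds)
```

So the "guessing" here is really "always say 1", and it is right exactly as often as the label 1 appears in the data.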

### Learning

What if we ran all the training data through our system, penalizing it when it did poorly and rewarding it when it did well? After each prediction on training data, we have two pieces of information to work with: the predicted label, and the true label.

In [None]:
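for i in range(len(trainX)):
    x, y = trainX[i], trainY[i]
    prediction = model.predict(x)
    if prediction == y:
        # Correct prediction: leave the weights alone.
        continue
    elif prediction == 1:
        # Predicted 1 but the label is 0: push the score for this input down.
        model.weights -= x
    else:
        # Predicted 0 but the label is 1: push the score for this input up.
        model.weights += x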
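
To see this update rule on something small enough to trace by hand, here is a toy sketch. The four 2-D points, the all-zero starting weights (instead of random ones), and the second pass over the data are assumptions made for this example.

```python
import numpy as np

def predict(weights, x):
    # Threshold activation with b = 0, as in the model above.
    return 1 if np.dot(weights, x) > 0 else 0

# Toy data: label 1 when the first feature is larger than the second.
X = np.array([[2.0, 1.0], [1.0, 2.0], [3.0, 1.0], [1.0, 3.0]])
y = np.array([1, 0, 1, 0])

weights = np.zeros(2)  # start from zero rather than random weights

for epoch in range(2):
    for x_i, y_i in zip(X, y):
        prediction = predict(weights, x_i)
        if prediction == y_i:
            continue           # correct: no change
        elif prediction == 1:
            weights -= x_i     # said 1, should be 0: lower this input's score
        else:
            weights += x_i     # said 0, should be 1: raise this input's score

predictions = [predict(weights, x_i) for x_i in X]
print(weights, predictions)
```

The first two points each trigger an update, which leaves the weights at `[1, -1]`. After that, every point is classified correctly and the second pass changes nothing. The learned weights reward the first feature and penalize the second, which is exactly the pattern in the labels.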

### Evaluation

In [None]:
evaluate(model, devX, devY)
evaluate(model, testX, testY)

Pretty sweet.

# Further Reading

- Combining multiple artificial neurons: **Artificial Neural Networks**
- How to use data: **Introduction to ML: Data**
- Activation Functions: **Experiments with Activation Functions**
- Evaluation: **Introduction to ML: Evaluation**
- Learning techniques: **Introduction to ML: Learning**
- Common supervised datasets: **World of ML: Datasets**