This is a Jupyter Notebook showing (not explaining!) the basics of neural networks with Keras, using the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).

It assumes:

* Python 3.5+ with Jupyter Notebook (e.g. [Anaconda distribution](https://www.continuum.io/downloads))
* [Keras](https://keras.io/) 2.x with [TensorFlow](https://www.tensorflow.org/) 1.x backend

Optionally, install [stared/keras-sequential-ascii](https://github.com/stared/keras-sequential-ascii) to see architecture visualizations:

* `pip install git+https://github.com/stared/keras-sequential-ascii.git`

In [1]:
%matplotlib inline
# import seaborn as sns

from keras.datasets import mnist
from keras import utils
import numpy as np

from keras.models import Sequential
from keras.layers import Flatten, Dense, Activation
from keras.layers import Conv2D, MaxPool2D

Using TensorFlow backend.


In [2]:
from keras_sequential_ascii import sequential_model_to_ascii_printout

In [3]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz


In [4]:
# (number of examples, x, y)
X_train.shape

(60000, 28, 28)

In [5]:
X_test.shape

(10000, 28, 28)

In [6]:
# 3 -> [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.]
Y_train = utils.to_categorical(y_train)
Y_test = utils.to_categorical(y_test)
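
To make the comment above concrete, here is a minimal plain-Python sketch of what one-hot encoding does. The helper name `one_hot` is hypothetical; it is not part of Keras and only mimics the kind of output `utils.to_categorical` gives for integer labels.

```python
def one_hot(labels, num_classes=10):
    """Return one row per label, with 1.0 at the label's index and 0.0 elsewhere."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


# Hypothetical labels: the digits 3 and 0.
encoded = one_hot([3, 0])
print(encoded[0])  # the row for digit 3
```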

In [7]:
# add a channel dimension (needed for convolutions) and scale pixel values to [0, 1]

# TensorFlow backend
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.

# for Theano backend, it would be:
# X_train = X_train.reshape(-1, 1, 28, 28).astype('float32') / 255.
# X_test = X_test.reshape(-1, 1, 28, 28).astype('float32') / 255.

## Logistic Regression

Multi-class [logistic regression](https://en.wikipedia.org/wiki/Logistic_regression), expressed in Keras.
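
As a rough illustration of what this model computes: multi-class logistic regression turns a vector of linear scores into class probabilities with softmax, and training minimizes the cross-entropy against the one-hot target. The sketch below uses plain Python; the function names and the example scores are made up for illustration and are not the Keras implementation.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probs):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs) if t > 0)


scores = [2.0, 1.0, 0.1]  # hypothetical outputs of a 3-class linear layer
target = [1.0, 0.0, 0.0]  # one-hot label: class 0 is the correct one

probs = softmax(scores)
loss = categorical_crossentropy(target, probs)
print(probs, loss)
```

The loss shrinks toward zero as the probability assigned to the correct class approaches 1.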

In [8]:
model = Sequential()

model.add(Flatten(input_shape=(28, 28, 1)))  # for Theano: (1, 28, 28)
model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

In [9]:
sequential_model_to_ascii_printout(model)

           OPERATION           DATA DIMENSIONS   WEIGHTS(N)   WEIGHTS(%)

               Input   #####     28   28    1
             Flatten   ||||| -------------------         0     0.0%
                       #####         784
               Dense   XXXXX -------------------      7850   100.0%
             softmax   #####          10
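
The `WEIGHTS(N)` figure for the dense layer can be checked by hand: each output unit has one weight per input plus one bias. A small sketch (the helper name `dense_param_count` is hypothetical):

```python
def dense_param_count(n_inputs, n_outputs):
    """Weights (n_inputs * n_outputs) plus one bias per output unit."""
    return n_inputs * n_outputs + n_outputs


flattened = 28 * 28 * 1  # number of values after Flatten
params = dense_param_count(flattened, 10)
print(flattened, params)
```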


In [10]:
# pass the test set as validation data to see validation scores after each epoch
model.fit(X_train, Y_train,
          epochs=10,
          validation_data=(X_test, Y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x128331f28>

## One hidden layer

Old-school neural networks.

In [11]:
model = Sequential()

model.add(Flatten(input_shape=(28, 28, 1)))
model.add(Dense(128))
model.add(Activation('sigmoid'))
model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

In [12]:
sequential_model_to_ascii_printout(model)

           OPERATION           DATA DIMENSIONS   WEIGHTS(N)   WEIGHTS(%)

               Input   #####     28   28    1
             Flatten   ||||| -------------------         0     0.0%
                       #####         784
               Dense   XXXXX -------------------    100480    98.7%
             sigmoid   #####         128
               Dense   XXXXX -------------------      1290     1.3%
             softmax   #####          10
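
The `WEIGHTS(%)` column appears to be each layer's share of all trainable parameters. Assuming that reading of the printout, the numbers above can be reproduced as follows (variable names are illustrative):

```python
# Parameters per dense layer: inputs * outputs weights, plus one bias per output.
hidden = 784 * 128 + 128  # Flatten (784 values) -> Dense(128)
output = 128 * 10 + 10    # Dense(128) -> Dense(10)
total = hidden + output

for name, count in [("hidden", hidden), ("output", output)]:
    share = 100.0 * count / total
    print("{}: {} parameters, {:.1f}% of total".format(name, count, share))
```

Almost all parameters sit in the first dense layer, because it connects every one of the 784 input pixels to every hidden unit.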


In [13]:
model.fit(X_train, Y_train,
          epochs=10,
          validation_data=(X_test, Y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x12916bac8>

## Convolutional network

Not yet that deep.

In [17]:
model = Sequential()

model.add(Conv2D(16, (3, 3), input_shape=(28, 28, 1)))
model.add(Activation('relu'))
model.add(MaxPool2D())

model.add(Flatten())
model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

In [18]:
sequential_model_to_ascii_printout(model)

           OPERATION           DATA DIMENSIONS   WEIGHTS(N)   WEIGHTS(%)

               Input   #####     28   28    1
              Conv2D    \|/  -------------------       160     0.6%
                relu   #####     26   26   16
        MaxPooling2D   Y max -------------------         0     0.0%
                       #####     13   13   16
             Flatten   ||||| -------------------         0     0.0%
                       #####        2704
               Dense   XXXXX -------------------     27050    99.4%
             softmax   #####          10
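
The shapes and weight counts in this printout follow from simple arithmetic. The sketch below assumes the Keras defaults used in the cell above: `'valid'` padding and stride 1 for `Conv2D`, and a 2x2 pool for `MaxPool2D()`. The helper names are hypothetical.

```python
def conv2d_valid_size(size, kernel):
    """Output width/height of a stride-1 convolution without padding."""
    return size - kernel + 1


def conv2d_param_count(kernel, in_channels, filters):
    """One kernel x kernel x in_channels weight block plus one bias, per filter."""
    return kernel * kernel * in_channels * filters + filters


conv_size = conv2d_valid_size(28, 3)         # side length after the 3x3 convolution
pooled_size = conv_size // 2                 # side length after 2x2 max pooling
flattened = pooled_size * pooled_size * 16   # values fed into the dense layer

conv_params = conv2d_param_count(3, 1, 16)   # weights in the Conv2D layer
dense_params = flattened * 10 + 10           # weights in the final Dense layer

print(conv_size, pooled_size, flattened, conv_params, dense_params)
```

The convolution itself is cheap in parameters; the dense layer after `Flatten` still holds nearly all of them.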


In [19]:
model.fit(X_train, Y_train,
          epochs=10,
          validation_data=(X_test, Y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x129b8f7b8>

## Ideas

* Add one more `Conv2D` and `MaxPool2D` layer.
* Add one more dense layer (with `relu` activation) before the final `Dense(10)` layer.