<a href="https://colab.research.google.com/github/rahiakela/deep-learning--from-basics-to-practice/blob/23-keras-part-1/2_making_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Making the Model

## Setup

In [1]:
from keras.datasets import mnist
from keras.utils import to_categorical
import matplotlib.pyplot as plt

import numpy as np

from keras import backend as keras_backend
keras_backend.set_image_data_format('channels_last')

Using TensorFlow backend.


In [2]:
from keras.datasets import mnist
from keras import backend as keras_backend

# load MNIST data and save sizes
(X_train, y_train), (X_test, y_test) = mnist.load_data()

image_height = X_train.shape[1]
print(f'image_height = {image_height}')
image_width = X_train.shape[2]
print(f'image_width = {image_width}')
number_of_pixels = image_height * image_width
print(f'number_of_pixels = {number_of_pixels}')
print()

# convert to floating-point
X_train = keras_backend.cast_to_floatx(X_train)
X_test = keras_backend.cast_to_floatx(X_test)
print(f'Before scaling: \n {X_train[:1]}')
print()

# scale data to range [0, 1]
X_train /= 255.0
X_test /= 255.0
print(f'After scaling: \n {X_train[:1]}')
print()

# save the original y_train and y_test
original_y_train = y_train
original_y_test = y_test

# replace label data with one-hot encoded versions
# (the largest label value plus one gives the number of classes)
number_of_classes = 1 + int(np.max(np.append(y_train, y_test)))
print(f'number_of_classes: {number_of_classes}')

# encode each list into one-hot arrays of the size we just found
y_train = to_categorical(y_train, num_classes=number_of_classes)
y_test = to_categorical(y_test, num_classes=number_of_classes)

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
image_height = 28
image_width = 28
number_of_pixels = 784

Before scaling: 
 [[[  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
     0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
  [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
     0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
  [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
     0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
  [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
     0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
  [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
     0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
  [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   3.  18.
    18.  18. 126. 136. 175.  26. 166. 255. 2

## Motivation

The beauty of model-making in Keras is that creating the structure of
our model (that is, our neural network’s architecture) is streamlined.
There are only two steps.

First, we name the layers we want in the order we want them. This is
called specifying the model.

Second, we tell Keras how to use this model to learn. We tell it which
loss function and optimizer to use, and what data we’d like it to collect
along the way. This is called compiling the model. The compilation
step converts our specification into code that runs on the backend
we’ve chosen.

Our first model for classifying MNIST data will be simple. It will have
an input layer (which is implicit in every network), a single hidden
layer, and an output layer. The hidden and output layers will both
be fully-connected, or dense, layers.

<img src='https://github.com/rahiakela/img-repo/blob/master/simple-deep-learning-model.JPG?raw=1' width='800'/>

We’ve decided to set up our first layer to have a single neuron for each
pixel. This is a common way to configure the first layer, but it’s definitely
not required. We could use 5 neurons or 5000 if we thought that
would produce better results.

Using this “one neuron per input pixel” approach for our 28 by 28
images, our first layer requires 28×28=784 neurons.

A fully-connected layer can only take
in a 1D list. There’s no processing inside a dense layer that would let
it figure out how to get at the pixels in a 2D data structure. We’ll see
later that convolution layers have that processing, so we can give them
grids directly. But right now we’re using a dense layer, and the input to
a dense layer is a list.

So we need to convert each input sample of 28 by 28 pixels into a 1D
list of 784 values.

## Turning Grids into Lists

There are at least two ways to do this. The first is to build it right into
our neural network, using the Reshape utility layer provided by Keras.
The second is to reshape the data ourselves before training.

To convert our images into a list, we’ll convert our starting 3D input
data into a 2D grid. Each row of the grid is one sample, made up of a
list of 784 features.

<img src='https://github.com/rahiakela/img-repo/blob/master/2-d-rid.JPG?raw=1' width='800'/>

There are two ways to use reshape(). Let’s first
use the version where we call it from Numpy and pass it the array we’re
reshaping as the first argument.

The second argument to reshape() is a list with the new dimensions.
For the training data, the second argument is the list [60000, 784].

In [0]:
# reshape samples to 2D grid, one line per image
X_train = np.reshape(X_train, [X_train.shape[0], number_of_pixels])
X_test = np.reshape(X_test, [X_test.shape[0], number_of_pixels])

The other way to call reshape() is to call it as a
method on the object being reshaped. In this case, the only necessary
argument is the list containing the new dimensions.

In [0]:
# reshape samples to 2D grid, one line per image
X_train = X_train.reshape([X_train.shape[0], number_of_pixels])
X_test = X_test.reshape([X_test.shape[0], number_of_pixels])

Both of these variations produce the same results, so we can use
whichever one we prefer. We’ll use the shorter, second version in the
following discussion.
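
As a quick check that the two calling styles agree, here is a small sketch that runs both on a made-up stand-in array. The name `fake_images` and its size of 5 samples are assumptions for illustration only; it is not the MNIST data.

```python
import numpy as np

# hypothetical stand-in for X_train: 5 "images" of 28 by 28 pixels
fake_images = np.arange(5 * 28 * 28, dtype=np.float32).reshape(5, 28, 28)
pixels_per_image = fake_images.shape[1] * fake_images.shape[2]

# style 1: call reshape() from NumPy, array first, new dimensions second
flat_a = np.reshape(fake_images, [fake_images.shape[0], pixels_per_image])

# style 2: call reshape() as a method on the array itself
flat_b = fake_images.reshape([fake_images.shape[0], pixels_per_image])

print(flat_a.shape)                    # (5, 784)
print(np.array_equal(flat_a, flat_b))  # True

# each row of the grid is one image, read off row by row
print(np.array_equal(flat_a[0], fake_images[0].ravel()))  # True
```

Either style leaves the number of samples alone and only flattens the two pixel dimensions into one.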

## Creating the Model

We start by telling Keras the overall architecture of our model. Our
choices are basically “a list of layers,” and “anything else.”

The “list of layers” architecture is called the Sequential model. That’s
perfect for us, since our architecture of Figure 23.21 is just two dense
layers one after the other. In other words, they can be described as a
2-element list starting with the hidden layer and ending with the output
layer.

The “anything else” architecture is called the Functional model. This
is more flexible than the Sequential model, but requires a little more
work from us. We’ll come back to the Functional model later.

We build a model in the Sequential style using the Sequential API,
which is a collection of library calls designed to make this process easy.
The beauty of the Sequential API is that to create our model we just name our layers in order from start to finish.

The first time we add a layer to our model, Keras will automatically create
an input layer for us to hold the incoming data. Then it places our
new layer after that. We could stop right there if we wanted, and that
would be a 1-layer neural network (remember that we usually don’t
count the input layer, since it doesn’t do any processing).

But we can keep going, and add as many more layers as we like. Each
new layer takes its input from the most recently added layer. The last
layer we add in is implicitly our output layer. We never explicitly say
that we’re starting or ending. We just add in layers until we’re done.
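
To make the idea that each new layer takes its input from the most recently added layer concrete, here is a toy sketch in plain Python. The `TinySequential` class and its `run()` method are hypothetical names invented for this illustration; this is not how Keras itself is implemented.

```python
class TinySequential:
    """Toy stand-in for a list-of-layers model (not the Keras class)."""

    def __init__(self):
        self.layers = []

    def add(self, layer):
        # each new layer goes at the end of the list
        self.layers.append(layer)

    def run(self, values):
        # feed the output of each layer into the next one, in order
        for layer in self.layers:
            values = layer(values)
        return values


toy = TinySequential()
toy.add(lambda values: [v * 2 for v in values])   # first "layer": double
toy.add(lambda values: [v + 1 for v in values])   # second "layer": add one

print(toy.run([1, 2, 3]))  # [3, 5, 7]
```

Nothing marks a start or an end here either: whatever was added last produces the final output.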

In [6]:
from keras.models import Sequential

model = Sequential()




Let’s start building our model. The first layer is always the input layer.
But recall that the input layer is implicit. We don’t usually draw it, or
count it, and in the Sequential model we usually don’t even explicitly
make it.

This is fine, because the input layer does nothing but hold the feature
list for a sample. So the only thing we need to tell Keras about the input
layer is how big that list should be, and it will make the appropriate
storage for us.

We tell Keras the size of the input layer with an optional argument
called input_shape. We pass a value to this argument in the first layer
only. In other words, this argument must be included when we make
our first layer, but must not be in any others. Every type of layer that
can serve as the first layer in a sequence (including the fully-connected
layer we’ll be using), takes input_shape as an optional parameter.

Keras calls a fully-connected layer a dense layer. Note that here the
word “dense” refers to how the layer connects to the layer that precedes
it.

<img src='https://github.com/rahiakela/img-repo/blob/master/schematic-view-ense-layer.JPG?raw=1' width='800'/>

A schematic view of a Dense layer. The three colored
neurons make up the dense layer. Each of them connects to every neuron
in the preceding layer (in gray). When we create this layer, we’re only
declaring the nature of its connections to the layer before it, and we’re
saying nothing about what happens to its outputs.

To add a dense layer to our model, we create a Dense object and then
append it to the end of our model’s sequence of layers.

The necessary first argument is the size of the layer. This is just the
number of neurons. This can be, and often is, different from the number
of nodes in the preceding layer.

<img src='https://github.com/rahiakela/img-repo/blob/master/fully-connected-layer.JPG?raw=1' width='800'/>

Our fully-connected layer is shown with colored neurons,
connecting to a previous layer with gray neurons. The number of neurons
in the fully-connected layer is independent of the number of neurons in
the layer that precedes it.
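
A rough NumPy sketch of the weighted sums inside a dense layer may help here. The sizes (4 neurons in the preceding layer, 3 in the dense layer) and the names `previous_outputs`, `weights`, and `biases` are assumptions chosen for illustration, and the random weights stand in for values that training would normally find.

```python
import numpy as np

rng = np.random.default_rng(0)

previous_layer_size = 4   # neurons in the preceding layer
dense_layer_size = 3      # neurons in the dense layer

# one weight per connection: every dense neuron sees every previous neuron
weights = rng.normal(size=(previous_layer_size, dense_layer_size))
# one bias per neuron in the dense layer
biases = np.zeros(dense_layer_size)

# outputs of the preceding layer for a single sample
previous_outputs = np.array([0.5, -1.0, 2.0, 0.0])

# weighted sum for each dense neuron, before any activation is applied
dense_outputs = previous_outputs @ weights + biases

print(weights.shape)        # (4, 3)
print(dense_outputs.shape)  # (3,)
```

The two sizes are independent: changing `dense_layer_size` only changes the number of columns in `weights`.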

The first optional argument we’ll use tells Keras which activation unit
to place after each neuron in the layer. We can specify any one of the
functions built into Keras (and, as usual, listed in the documentation)
by supplying a string. Common choices are 'relu' and 'tanh'
for the ReLU and tanh functions in hidden layers, and 'softmax' or
'sigmoid' for the output layer. The default is None, which gives the linear
activation function, so for internal layers we’ll almost always want to
specify one of the other choices.
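
For a feel of what two of these activation functions do to a neuron’s weighted sum, here is a minimal NumPy sketch. The helper `relu()` and the sample values are our own for illustration; Keras supplies its own versions when we pass the string.

```python
import numpy as np

def relu(values):
    # negative values become 0, positive values pass through unchanged
    return np.maximum(values, 0.0)

# made-up weighted sums coming out of five neurons
weighted_sums = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

print(relu(weighted_sums))     # negatives replaced by 0
print(np.tanh(weighted_sums))  # values squashed into the range (-1, 1)
```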

The second optional argument we’ll use is input_shape, which defines
the size of each dimension in the input. As we saw above, we use this
only for the very first layer in a model. The value of this argument is a
list that tells Keras to build an input layer of the given shape and size,
which must match the shape and size of each sample we’ll be providing.

Since each of our samples (after processing) is a 1D list of 784 numbers,
we’ll tell Keras that our input_shape is a 1D list of 784 numbers (using
the variable number_of_pixels that we saved during pre-processing).

In [9]:
from keras.layers import Dense

# create the Dense layer
dense_layer = Dense(number_of_pixels, activation='relu', input_shape=[number_of_pixels])

# append our layer to the list of layers in model
model.add(dense_layer)





But it’s conventional to create the layer and add it to the model in a
single line.

In [0]:
model.add(Dense(number_of_pixels, activation='relu', input_shape=[number_of_pixels]))

Now we can add the next layer of our model. This will be another Dense
layer, but with 10 neurons.

We create our next Dense layer much like the previous one, but with
a few changes. In particular, we leave out the input_shape argument,
since that is only for the very first layer.

As always, the first argument, which is un-named and mandatory, is
the number of neurons. Since we’re categorizing our images into 10
classes, we’ll have 10 neurons, one for each class. We’ll use the variable
number_of_classes that we saved during pre-processing.

We often use softmax to process the outputs
of a final dense layer in a classifier in order to turn them into probabilities.
Let’s do that here. We need only name it as a string, and Keras
will take care of the rest.

In [0]:
model.add(Dense(number_of_classes, activation='softmax'))

Keep in mind that because this layer is fully-connected to the previous
layer, each of these 10 nodes receives inputs from all 784 nodes in the
hidden layer.
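
To see how softmax turns raw outputs into probabilities, here is a small NumPy sketch. The `softmax()` helper and the ten scores are made up for illustration; in our model Keras applies its own built-in version.

```python
import numpy as np

def softmax(scores):
    # subtract the largest score first so exp() cannot overflow
    shifted = scores - np.max(scores)
    exponentials = np.exp(shifted)
    return exponentials / np.sum(exponentials)

# made-up raw outputs from 10 output neurons, one per digit class
raw_scores = np.array([1.0, 2.0, 0.5, 0.1, 3.0, 0.2, 0.3, 0.0, 1.5, 0.4])
probabilities = softmax(raw_scores)

print(probabilities.sum())       # 1.0, up to floating-point rounding
print(np.argmax(probabilities))  # 4, the class with the largest score
```

Every output is positive and they sum to 1, so we can read them as the network’s confidence in each class.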

That’s the whole thing. We’ve built a deep-learning model!

In [0]:
from keras.models import Sequential

model = Sequential()
model.add(Dense(number_of_pixels, activation='relu', input_shape=[number_of_pixels]))
model.add(Dense(number_of_classes, activation='softmax'))

That’s all there is to it! Our model is complete!

We can ask Keras to print out the model in text form. This isn’t terribly
revealing for our simple example, but it can be useful for much
larger models with tens or hundreds of layers. We call the model’s
summary() method. This printout lists the layers
in the order they were placed into the network, so we read it top-down.

In [14]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
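Layer (type)                 Output Shape              Param #   
=================================================================
dense_5 (Dense)              (None, 784)               615440    
_________________________________________________________________
dense_6 (Dense)              (None, 10)                7850      
=================================================================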
Total params: 623,290
Trainable params: 623,290
Non-trainable params: 0
_________________________________________________________________


The column labeled “Output Shape” tells us the shape of the tensor
that comes out of each layer, in the form of a list of dimensions. When
we see None as an entry here, this is a placeholder for the number of
samples that are provided as a mini-batch during training. 

For example, if we have a mini-batch size of 64, then the first layer will process 64 of our samples in one shot (using the GPU if it can). The output
will be a list containing 64 rows, each with 784 elements. But since
right now Keras doesn’t know the size of the mini-batch, it uses None
to stand for “Not Yet Known.”
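
The shapes involved can be checked with a short NumPy sketch. The mini-batch size of 64 comes from the example above; the zero-filled arrays are stand-ins for real data and trained weights, and the variable names are our own.

```python
import numpy as np

batch_size = 64          # mini-batch size from the example above
number_of_inputs = 784   # one value per pixel
number_of_neurons = 784  # neurons in the first Dense layer

# stand-in data and weights; real values would come from MNIST and training
mini_batch = np.zeros((batch_size, number_of_inputs), dtype=np.float32)
weights = np.zeros((number_of_inputs, number_of_neurons), dtype=np.float32)
biases = np.zeros(number_of_neurons, dtype=np.float32)

# all 64 samples are processed in one matrix multiplication, then ReLU
layer_output = np.maximum(mini_batch @ weights + biases, 0.0)

print(layer_output.shape)  # (64, 784)
```

Only the first dimension depends on the mini-batch, which is why the summary can fill in 784 but has to leave the other entry open.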

The summary also tells us how many parameters, or weights, are used
by each layer, and then it adds those up to tell us the total number
of parameters in the model. 

We can see that:
* dense_5, the first Dense layer, has 784 neurons, each of which reads the value of each of the 784
inputs. Since each connection has a weight, there are 784×784=614,656
weights. Each neuron also has a bias term, so adding the 784 bias terms
to the number we just got gives us the 615,440 in the table. That’s a lot
of weights!
* Similarly, dense_6, the second layer, has 10 neurons, each with a connection
to each of the 784 neurons in the previous layer. Remembering
to add the 10 bias terms, we get (10×784)+10, or 7,850 parameters.

The final line adds these numbers together, telling us that the complete
model has over 600,000 parameters.
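
The arithmetic above is easy to reproduce. This sketch uses a hypothetical helper, `dense_parameter_count()`, that we wrote for illustration; it is not part of Keras.

```python
def dense_parameter_count(number_of_inputs, number_of_neurons):
    # one weight per connection, plus one bias per neuron
    return number_of_inputs * number_of_neurons + number_of_neurons

hidden_parameters = dense_parameter_count(784, 784)
output_parameters = dense_parameter_count(784, 10)

print(hidden_parameters)                      # 615440
print(output_parameters)                      # 7850
print(hidden_parameters + output_parameters)  # 623290
```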

## Compiling the Model

So far, our model is nothing more than a list of specifications. It’s a
potential model, but it’s no more a real model than blueprints for a
house are a real house. That house has to be built from the blueprints.
In our case, we need to turn our description into running code. We call
this compiling the model. When our model is compiled, it’s ready for
training.

To compile the model, we need to give Keras at least two pieces of
information.
* First, we have to tell Keras how to measure the error for each sample
(that is, how to put a number to any difference between the network’s
output and the target we want it to produce).
* Second, we have to tell it which optimizer it should use to update the
weights to reduce that error.

Let’s look at these in turn.

To measure the quality of the weights we need a loss (or cost) function.

That function will compare the one-hot label with the outputs from
our final layer. This comparison uses the idea of entropy to determine how close our match is. The name of the
loss function we want combines these two ideas into the long string
'categorical_crossentropy'.

If we have just two categories, and we’re using one output to decide
between them (perhaps setting it to a value near 0 for one category
and a value near 1 for the other), the function that evaluates the error
for that case is named 'binary_crossentropy'.

Happily, our goal here is basic categorization using multiple outputs,
so we can use the pre-built 'categorical_crossentropy' loss. That
tells the network that we want the network’s outputs to match the
numbers in our one-hot label as closely as possible.
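
As a rough illustration of how this loss rewards outputs that match the one-hot label, here is a simplified single-sample version in NumPy. The function below is our own sketch with made-up inputs; the built-in Keras loss also handles whole mini-batches and other details.

```python
import numpy as np

def categorical_crossentropy(one_hot_label, predicted_probabilities):
    # clip to avoid taking the log of exactly 0
    clipped = np.clip(predicted_probabilities, 1e-7, 1.0)
    return -np.sum(one_hot_label * np.log(clipped))

one_hot_label = np.array([0.0, 0.0, 1.0])   # the correct class is 2

confident_and_right = np.array([0.05, 0.05, 0.90])
confident_and_wrong = np.array([0.90, 0.05, 0.05])

good_loss = categorical_crossentropy(one_hot_label, confident_and_right)
bad_loss = categorical_crossentropy(one_hot_label, confident_and_wrong)

print(good_loss < bad_loss)  # True
```

Because the label is one-hot, only the probability assigned to the correct class contributes, and the loss grows quickly as that probability shrinks.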

With the loss function selected, our next job is to pick the optimizer.
Once the error has been computed, Keras gives it to the optimizer,
which uses that error to update the weights. We saw a variety of optimizers
in Chapter 19, with names like SGD, RMSprop, and Adagrad.
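
To picture what “using the error to update the weights” means, here is a sketch of the simplest possible update rule, plain gradient descent on a single weight. This is not Adam or any other Keras optimizer; the toy error function and the name `gradient_descent_step()` are assumptions made for illustration.

```python
def gradient_descent_step(weight, gradient, learning_rate):
    # move the weight a small distance against the gradient
    return weight - learning_rate * gradient

# toy error function: error = (weight - 3) ** 2, smallest at weight = 3
weight = 0.0
for _ in range(100):
    gradient = 2.0 * (weight - 3.0)   # derivative of the toy error
    weight = gradient_descent_step(weight, gradient, learning_rate=0.1)

print(round(weight, 4))  # 3.0
```

The optimizers named above refine this basic step, for example by adapting the learning rate as training proceeds.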

There are many other optional pieces of information we can give to
Keras when we compile our model. One of the most common is to provide
a list of measurements, called metrics, telling Keras what we’d
like it to measure as the model learns. We can think of these metrics as
supplemental error or loss functions, but they’re only computed and
returned to us as helpful information for understanding and monitoring
the learning process, and are not used to update the model.
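
As an example of such a metric, accuracy can be sketched in a few lines of NumPy. The labels and predictions below are made up, and this is a simplified version of the idea rather than the Keras implementation.

```python
import numpy as np

# made-up one-hot labels for 4 samples and 3 classes
one_hot_labels = np.array([[1, 0, 0],
                           [0, 1, 0],
                           [0, 0, 1],
                           [0, 1, 0]])

# made-up network outputs (each row sums to 1, as softmax would produce)
predictions = np.array([[0.8, 0.1, 0.1],
                        [0.2, 0.7, 0.1],
                        [0.6, 0.3, 0.1],
                        [0.1, 0.5, 0.4]])

# a prediction counts as correct when its largest output lines up with the 1
correct = np.argmax(predictions, axis=1) == np.argmax(one_hot_labels, axis=1)
accuracy = np.mean(correct)

print(accuracy)  # 0.75
```

Three of the four samples are classified correctly, so the accuracy is 0.75. The number is reported to us but plays no part in updating the weights.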

We compile our model by calling our model’s compile() method. This
builds everything that the model needs to actually run on our computer
with our chosen backend. Because this information is saved
along with the model object, we don’t have to save anything ourselves.


In [15]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])





Compiling a model with its compile() method and our
arguments. We’re choosing the 'categorical_crossentropy' loss
function and the 'adam' optimizer. Using these strings is a shorthand
for creating the corresponding objects with their defaults. We’re also
telling it that we’ll want it to measure and return the 'accuracy' once
we start training.

If we think we’re close but things could be better, we might decide to
create a custom optimizer and set some of the parameters to something
other than the defaults.

When we create our optimizer using the string
'adam', we’re asking for an instance of the Adam
optimizer with all of its default values. To set some of those values ourselves,
we make our own instance of an Adam object where we specify
whatever parameters we want to give values to, leaving all the others
at their defaults. We then hand that object to compile(), instead of
giving it a string.

In [0]:
from keras import optimizers

# learning_rate replaces the older lr argument name
slow_adam = optimizers.Adam(learning_rate=0.0001)
model.compile(loss='categorical_crossentropy', optimizer=slow_adam, metrics=['accuracy'])

The built-in loss functions rarely need their parameters adjusted, so
unless we’re using a custom function that we wrote ourselves, we usually
provide a string naming one of the built-in functions.

## Model Creation Summary

We started out with how to create a new model. We began by creating
an empty Sequential object. Then we added a dense, or fully-connected,
hidden layer that also specified the shape of the input layer.
We finished with another dense layer that produced 10 outputs, one
for each category.

In [0]:
from keras.models import Sequential
from keras.layers import Dense

def make_one_hidden_layer_model():

  # create an empty model
  model = Sequential()

  # add a fully-connected hidden layer with #nodes = #pixels
  model.add(Dense(number_of_pixels, activation='relu', input_shape=[number_of_pixels]))

  # add an output layer with softmax activation
  model.add(Dense(number_of_classes, activation='softmax'))

  # compile the model to turn it from specification to code
  model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

  return model

# make the model
model = make_one_hidden_layer_model()  

Building our model took only three lines of code. Compiling it took
only one. And now we’ll see that training the system also takes only one
line. But as we’ve seen, each of these lines packs in a lot of information.

Now we’re ready to hand our prepared data to our compiled model
and start learning.