# Theano, TensorFlow, Keras and Multilayer Perceptrons

## 1.1 What is Theano?
Theano is an open source project released under the BSD license and was developed by the LISA
(now MILA http://mila.umontreal.ca/) group at the University of Montreal, Quebec, Canada.
It is named after a Greek mathematician. At its heart Theano is a compiler for mathematical
expressions in Python. It knows how to take your structures and turn them into very efficient
code that uses NumPy, efficient native libraries like BLAS and native code to run as fast as
possible on CPUs or GPUs.

It uses a host of clever code optimizations to squeeze as much performance as possible from
your hardware. If you are into the nitty-gritty of mathematical optimizations in code, check out
this interesting list (http://deeplearning.net/software/theano/optimizations.html#optimizations). The actual syntax of Theano expressions is symbolic, which can be off-putting
to beginners. Specifically, expressions are defined in the abstract sense, compiled and
later used to make calculations.

Theano was specifically designed to handle the types of computation required for large
neural network algorithms used in deep learning. It was one of the first libraries of its kind
(development started in 2007) and is considered an industry standard for deep learning research
and development.

## 1.2 How to Install Theano
Theano provides extensive installation instructions for the major operating systems: Windows,
OS X and Linux. Read the Installing Theano guide for your platform (http://deeplearning.net/software/theano/install.html). Theano assumes a
working Python 2 or Python 3 environment with SciPy. There are ways to make the installation
easier, such as using Anaconda (https://www.continuum.io/downloads) to quickly set up Python and SciPy on your machine, as well
as using Docker images. With a working Python and SciPy environment, it is relatively
straightforward to install Theano using pip, for example:

``` sudo pip install Theano ```

New releases of Theano may be announced and you will want to update to get any bug fixes
and efficiency improvements. You can upgrade Theano using pip as follows:

``` sudo pip install --upgrade --no-deps theano ```

You may want to use the bleeding edge version of Theano checked directly out of GitHub.
This may be required for some wrapper libraries that make use of bleeding edge API changes.
You can install Theano directly from a GitHub checkout as follows:

``` sudo pip install --upgrade --no-deps git+https://github.com/Theano/Theano.git ```

You are now ready to run Theano on your CPU, which is just fine for the development of
small models. Large models may run slowly on the CPU. If you have an NVIDIA GPU, you may
want to look into configuring Theano to use your GPU. There is a wealth of documentation on
the Theano homepage for further configuring the library.

### 1.3 Simple Theano Example
In this section we demonstrate a simple Python script that gives you a flavor of Theano. In this
example we define two symbolic floating point variables a and b. We define an expression that
uses these variables (c = a + b). We then compile this symbolic expression into a function using
Theano that we can use later. Finally, we use our compiled expression by plugging in some real
values and performing the calculation using efficient compiled Theano code under the covers.

```python
# Example of Theano library
import theano
from theano import tensor
# declare two symbolic floating-point scalars
a = tensor.dscalar()
b = tensor.dscalar()
# create a simple symbolic expression
c = a + b
# convert the expression into a callable object that takes (a,b) and computes c
f = theano.function([a,b], c)
# bind 1.5 to 'a', 2.5 to 'b', and evaluate 'c'
result = f(1.5, 2.5)
# print the result so that running the script shows the output
print(result)
```

```
4.0
```


Running the example prints the output 4.0, which matches our expectation that 1.5 + 2.5 = 4.0.
This is a useful example as it gives you a flavor for how a symbolic expression can be defined,
compiled and used. You can see how this may be scaled up to large vector and matrix operations
required for deep learning.

### 1.4 Extensions and Wrappers for Theano
If you are new to deep learning you do not have to use Theano directly. In fact, you are highly
encouraged to use one of many popular Python projects that make Theano a lot easier to use
for deep learning. These projects provide data structures and behaviors in Python, specifically
designed to quickly and reliably create deep learning models whilst ensuring that fast and
efficient models are created and executed by Theano under the covers. The amount of Theano
syntax exposed by the libraries varies.

Keras is a wrapper library that hides Theano completely and provides a very simple API to
work with to create deep learning models. It hides Theano so well that it can in fact run as a
wrapper for another popular foundation framework called TensorFlow (discussed next).

### 1.5 More Theano Resources
Looking for some more resources on Theano? Take a look at some of the following.
1. Theano Official Homepage
http://deeplearning.net/software/theano/
2. Theano GitHub Repository
https://github.com/Theano/Theano/
3. Theano: A CPU and GPU Math Compiler in Python (2010)
http://www.iro.umontreal.ca/~lisa/pointeurs/theano_scipy2010.pdf
4. List of Libraries Built on Theano
https://github.com/Theano/Theano/wiki/Related-projects
5. List of Theano configuration options
http://deeplearning.net/software/theano/library/config.html

### 1.6 Summary
In this lesson you discovered the Theano Python library for efficient numerical computation.
You learned:
1. Theano is a foundation library used for deep learning research and development.
2. Deep learning models can be developed directly in Theano if desired.
3. The development and evaluation of deep learning models is easier with wrapper libraries like Keras.

### Next
You now know about the Theano library for numerical computation in Python. In the next
lesson you will discover the TensorFlow library released by Google that attempts to offer the
same capabilities.

## 2.1 Introduction to TensorFlow
TensorFlow is a Python library for fast numerical computing created and released by Google.
It is a foundation library that can be used to create deep learning models directly or by using
wrapper libraries that simplify the process built on top of TensorFlow. After completing this
lesson you will know:
1. About the TensorFlow library for Python.
2. How to define, compile and evaluate a simple symbolic expression in TensorFlow.
3. Where to go to get more information on the library.

Let’s get started.

### 2.2 What is TensorFlow?
TensorFlow is an open source library for fast numerical computing. It was created and is
maintained by Google and released under the Apache 2.0 open source license. The API is
nominally for the Python programming language, although there is access to the underlying
C++ API. Unlike other numerical libraries intended for use in Deep Learning like Theano,
TensorFlow was designed for use both in research and development and in production systems,
not least RankBrain in Google search (https://en.wikipedia.org/wiki/RankBrain) and the fun DeepDream project (https://en.wikipedia.org/wiki/DeepDream). It can run on single
CPU systems, GPUs as well as mobile devices and large scale distributed systems of hundreds
of machines.

### 2.3 How to Install TensorFlow
Installation of TensorFlow is straightforward if you already have a Python SciPy environment.
TensorFlow works with Python 2.7 and Python 3.3+. With a working Python and SciPy environment, it is relatively straightforward to install TensorFlow using pip. There are a number
of different distributions of TensorFlow, customized for different environments, so to
install TensorFlow you should follow the Download and Setup instructions (https://www.tensorflow.org/versions/r0.9/get_started/os_setup.html) on the TensorFlow
website.

### 2.4 Your First Examples in TensorFlow
Computation is described in terms of data flow and operations in the structure of a directed
graph.
1. Nodes: Nodes perform computation and have zero or more inputs and outputs. Data that
moves between nodes are known as tensors, which are multi-dimensional arrays of real
values.
2. Edges: The graph defines the flow of data, branching, looping and updates to state.
Special edges can be used to synchronize behavior within the graph, for example waiting
for computation on a number of inputs to complete.
3. Operation: An operation is a named abstract computation which can take input attributes
and produce output attributes. For example, you could define an add or multiply operation.
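
The three ideas above can be illustrated with a toy data flow graph in plain Python. This is only a sketch of the concept and not how TensorFlow is implemented; the names `Node`, `placeholder` and `run` are invented for this illustration. Defining the graph computes nothing, and values flow along the edges only when the graph is run.

```python
# Toy data flow graph in plain Python (illustration only, not TensorFlow).
import operator


class Node:
    """A graph node: an operation plus the nodes whose outputs feed it."""

    def __init__(self, name, op=None, inputs=()):
        self.name = name
        self.op = op                # callable applied to the input values
        self.inputs = list(inputs)  # incoming edges


def placeholder(name):
    # A node with no operation; its value is supplied when the graph runs.
    return Node(name)


def run(node, feed):
    """Evaluate `node` by first evaluating every node it depends on."""
    if node.op is None:
        return feed[node.name]
    args = [run(parent, feed) for parent in node.inputs]
    return node.op(*args)


# Define the graph: nothing is computed yet.
a = placeholder("a")
b = placeholder("b")
total = Node("add", operator.add, [a, b])
scaled = Node("mul", operator.mul, [total, a])

# Run the graph by feeding values into the placeholders.
print(run(total, {"a": 1.5, "b": 2.5}))   # 4.0
print(run(scaled, {"a": 1.5, "b": 2.5}))  # 6.0
```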

### 2.5 Simple TensorFlow Example
In this section we demonstrate a simple Python script that gives you a flavor of TensorFlow. In
this example we define two symbolic floating point variables a and b. We define an expression
that uses these variables (c = a + b). This is the same example used in the previous chapter that
introduced Theano. We then compile this symbolic expression into a function using TensorFlow
that we can use later. Finally, we use our compiled expression by plugging in some real values
and performing the calculation using efficient compiled TensorFlow code under the covers.

```python
# Example of TensorFlow library
import tensorflow as tf
# declare two symbolic floating-point scalars
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
# create a simple symbolic expression using the add function
add = tf.add(a, b)
# bind 1.5 to 'a', 2.5 to 'b', and evaluate 'c'
sess = tf.Session()
binding = {a: 1.5, b: 2.5}
c = sess.run(add, feed_dict=binding)
print(c)
```

```
4.0
```


### 2.6 More Deep Learning Models

Your TensorFlow installation comes with a number of Deep Learning models that you can use
and experiment with directly. Firstly, you need to find out where TensorFlow was installed on
your system.

```python
import os
import inspect
import tensorflow
print(os.path.dirname(inspect.getfile(tensorflow)))
```

```
/home/isaac/anaconda2/lib/python2.7/site-packages/tensorflow
```


Change to this directory and take note of the models/image/ subdirectory. Included are a number
of deep learning models with tutorial-like comments, such as:

- Multi-threaded word2vec mini-batched skip-gram model.
- Multi-threaded word2vec unbatched skip-gram model.
- CNN for the CIFAR-10 network.
- Simple, end-to-end, LeNet-5-like convolutional MNIST model example.
- Sequence-to-sequence model with an attention mechanism.

Also check the examples directory as it contains an example using the MNIST dataset.
There is also an excellent list of tutorials on the main TensorFlow website. They show how
to use different network types, different datasets and how to use the framework in various
different ways. Finally, there is the TensorFlow playground where you can experiment with
small networks right in your web browser.

## 3.1 Introduction to Keras
Two of the top numerical platforms in Python that provide the basis for deep learning research
and development are Theano and TensorFlow. Both are very powerful libraries, but both can
be difficult to use directly for creating deep learning models. In this lesson you will discover
the Keras Python library that provides a clean and convenient way to create a range of deep
learning models on top of Theano or TensorFlow. After completing this lesson you will know:
- About the Keras Python library for deep learning.
- How to configure Keras for Theano or TensorFlow.
- The standard idiom for creating models with Keras.
Let’s get started.

### 3.2 What is Keras?
Keras is a minimalist Python library for deep learning that can run on top of Theano or
TensorFlow. It was developed to make developing deep learning models as fast and easy as
possible for research and development. It runs on Python 2.7 or 3.5 and can seamlessly execute
on GPUs and CPUs given the underlying frameworks. It is released under the permissive MIT
license. Keras is developed and maintained by François Chollet, a Google engineer, using four
guiding principles:
- Modularity: A model can be understood as a sequence or a graph alone. All the concerns of a deep learning model are discrete components that can be combined in arbitrary ways.
- Minimalism: The library provides just enough to achieve an outcome, no frills and maximizing readability.
- Extensibility: New components are intentionally easy to add and use within the framework, intended for developers to trial and explore new ideas.
- Python: No separate model files with custom file formats. Everything is native Python.

### 3.3 How to Install Keras
Keras is relatively straightforward to install if you already have a working Python and SciPy
environment. You must also have an installation of Theano or TensorFlow on your system.
Keras can be installed easily using pip, as follows:

``` sudo pip install keras ```

You can check your version of Keras on the command line using the following script:

``` python -c "import keras; print(keras.__version__)" ```

You can upgrade your installation of Keras using the same method:

``` sudo pip install --upgrade keras ```

### 3.4 Theano and TensorFlow Backends for Keras
Keras is a lightweight API and rather than providing an implementation of the required
mathematical operations needed for deep learning it provides a consistent interface to efficient
numerical libraries called backends. Assuming you have both Theano and TensorFlow installed,
you can configure the backend used by Keras. The easiest way is by adding or editing the Keras
configuration file in your home directory:

``` ~/.keras/keras.json ```

The file has the following format:
```
{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}
```

In this configuration file you can change the backend property from theano (the default) to
tensorflow. Keras will then use the configuration the next time it is run. You can confirm the
backend used by Keras using the following script on the command line:


```
python -c "from keras import backend; print(backend._BACKEND)"
```

```
Using TensorFlow backend.
tensorflow
```


You can also specify the backend to use by Keras on the command line by specifying the
KERAS_BACKEND environment variable, as follows:

```
KERAS_BACKEND=theano python -c "from keras import backend; print(backend._BACKEND)"
```

```
Using Theano backend.
theano
```


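Without the environment variable, Keras falls back to the backend named in the configuration file:

```
python -c "from keras import backend; print(backend._BACKEND)"
```

```
Using TensorFlow backend.
tensorflow
```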
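
The precedence seen in these runs (the environment variable wins over the configuration file) can be sketched in plain Python. This is a minimal illustration and not Keras's own code: `write_config` and `resolve_backend` are hypothetical helpers, and the file is written to a temporary directory so that a real `~/.keras/keras.json` is left untouched.

```python
# Sketch: write a Keras-style config file and resolve the backend name.
import json
import os
import tempfile


def write_config(path, backend):
    """Write a configuration file in the format shown earlier."""
    config = {
        "image_dim_ordering": "th",
        "epsilon": 1e-07,
        "floatx": "float32",
        "backend": backend,
    }
    with open(path, "w") as handle:
        json.dump(config, handle, indent=4)


def resolve_backend(path, environ):
    """Hypothetical helper: KERAS_BACKEND, if set, overrides the file."""
    with open(path) as handle:
        backend = json.load(handle)["backend"]
    return environ.get("KERAS_BACKEND", backend)


# Use a temporary location instead of the real home directory.
config_path = os.path.join(tempfile.mkdtemp(), "keras.json")
write_config(config_path, "tensorflow")

print(resolve_backend(config_path, {}))                           # tensorflow
print(resolve_backend(config_path, {"KERAS_BACKEND": "theano"}))  # theano
```

Passing the environment in as a dictionary, rather than reading `os.environ` directly, keeps the sketch easy to experiment with.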


### 3.5 Build Deep Learning Models with Keras
The focus of Keras is the idea of a model. The main type of model is called
a Sequential, which is a linear stack of layers. You create a Sequential and add layers to it
in the order that you wish for the computation to be performed. Once defined, you compile
the model, which makes use of the underlying framework to optimize the computation to be
performed by your model. At this step you can specify the loss function and the optimizer to be used.

Once compiled, the model must be fit to data. This can be done one batch of data at a
time or by firing off the entire model training regime. This is where all the compute happens.
Once trained, you can use your model to make predictions on new data. We can summarize the
construction of deep learning models in Keras as follows:
1. Define your model. Create a Sequential model and add configured layers.
2. Compile your model. Specify loss function and optimizers and call the compile()
function on the model.
3. Fit your model. Train the model on a sample of data by calling the fit() function on
the model.
4. Make predictions. Use the model to generate predictions on new data by calling
functions such as evaluate() or predict() on the model.

### 3.6 Summary
In this lesson you discovered the Keras Python library for deep learning research and development.
You learned:
1. Keras wraps both the TensorFlow and Theano libraries, abstracting their capabilities and
hiding their complexity.
2. Keras is designed for minimalism and modularity allowing you to very quickly define deep
learning models.
3. Keras deep learning models can be developed using an idiom of defining, compiling and
fitting models that can then be evaluated or used to make predictions.

### Next
You are now up to speed with the Python libraries for deep learning and can install, configure
and use them on your workstation. Next, in
Part II, you will learn how to use the Keras API and develop your own neural network models.

# 4 Multilayer Perceptrons

### 4.1 Crash Course In Multilayer Perceptrons
Artificial neural networks are a fascinating area of study, although they can be intimidating
when just getting started. There is a lot of specialized terminology used when describing the
data structures and algorithms used in the field. In this lesson you will get a crash course in the
terminology and processes used in the field of Multilayer Perceptron artificial neural networks.
After completing this lesson you will know:
1. The building blocks of neural networks including neurons, weights and activation functions.
2. How the building blocks are used in layers to create networks.
3. How networks are trained from example data.
Let’s get started.

#### 4.2 Crash Course Overview
We are going to cover a lot of ground in this lesson. Here is an idea of what is ahead:
1. Multilayer Perceptrons.
2. Neurons, Weights and Activations.
3. Networks of Neurons.
4. Training Networks.
We will start off with an overview of Multilayer Perceptrons.

#### 4.3 Multilayer Perceptrons
The field of artificial neural networks is often just called Neural Networks or Multilayer Perceptrons,
after perhaps the most useful type of neural network. A Perceptron is a single neuron
model that was a precursor to larger neural networks. The field investigates
how simple models of biological brains can be used to solve difficult computational tasks like
the predictive modeling tasks we see in machine learning. The goal is not to create realistic
models of the brain, but instead to develop robust algorithms and data structures that we can
use to model difficult problems.

The power of neural networks comes from their ability to learn the representation in your
training data and how to best relate it to the output variable that you want to predict. In
this sense neural networks learn a mapping. Mathematically, they are capable of learning
any mapping function and have been proven to be a universal approximation algorithm. The
predictive capability of neural networks comes from the hierarchical or multilayered structure of
the networks. The data structure can pick out (learn to represent) features at different scales or
resolutions and combine them into higher-order features, for example from lines, to collections
of lines, to shapes.

#### 4.4 Neurons
The building blocks for neural networks are artificial neurons. These are simple computational
units that have weighted input signals and produce an output signal using an activation function.

![alt text](neuron.png "Title")

### 4.4.1 Neuron Weights
You may be familiar with linear regression, in which case the weights on the inputs are very
much like the coefficients used in a regression equation. Like linear regression, each neuron also
has a bias which can be thought of as an input that always has the value 1.0 and it too must be
weighted. For example, a neuron may have two inputs in which case it requires three weights.
One for each input and one for the bias.

Weights are often initialized to small random values, such as values in the range 0 to 0.3,
although more complex initialization schemes can be used. Like linear regression, larger weights
indicate increased complexity and fragility of the model. It is desirable to keep weights in the
network small and regularization techniques can be used.
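
The weighted sum described above can be written in a few lines of plain Python. This is a minimal sketch: the function names are invented for illustration, the bias weight is stored as the last entry of the weight list by convention, and the 0 to 0.3 range simply follows the example range mentioned in the text.

```python
# Sketch of a single neuron's weighted sum (before any activation function).
import random


def init_weights(n_inputs, low=0.0, high=0.3, seed=None):
    """Return n_inputs + 1 small random weights; the last is the bias weight."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n_inputs + 1)]


def weighted_sum(inputs, weights):
    """Sum each input times its weight, plus the bias weight times 1.0."""
    total = weights[-1] * 1.0  # the bias acts like an input fixed at 1.0
    for value, weight in zip(inputs, weights[:-1]):
        total += value * weight
    return total


weights = init_weights(n_inputs=2, seed=1)
print(len(weights))  # 3: one weight per input plus one for the bias

# With fixed weights: 1.0 * 0.1 + 2.0 * 0.2 + bias weight 0.05
print(weighted_sum([1.0, 2.0], [0.1, 0.2, 0.05]))
```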

### 4.4.2 Activation
The weighted inputs are summed and passed through an activation function, sometimes called a
transfer function. An activation function is a simple mapping of summed weighted input to the
output of the neuron. It is called an activation function because it governs the threshold at
which the neuron is activated and the strength of the output signal. Historically simple step
activation functions were used where if the summed input was above a threshold, for example
0.5, then the neuron would output a value of 1.0, otherwise it would output a 0.0.

Traditionally nonlinear activation functions are used. This allows the network to combine
the inputs in more complex ways and in turn provide a richer capability in the functions they
can model. Nonlinear functions like the logistic function, also called the sigmoid function, were
used that output a value between 0 and 1 with an s-shaped distribution, as was the hyperbolic
tangent function, also called Tanh, that outputs the same distribution over the range -1 to +1.
More recently the rectifier activation function has been shown to provide better results.
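
The activation functions named in this section are short enough to write out directly. The sketch below uses only the Python standard library; the function names are chosen for this illustration, and the step threshold of 0.5 follows the example in the text.

```python
# Sketch of the activation functions discussed above.
import math


def step(x, threshold=0.5):
    """Output 1.0 if the summed input is above the threshold, else 0.0."""
    return 1.0 if x > threshold else 0.0


def sigmoid(x):
    """Logistic function: s-shaped, output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Hyperbolic tangent: s-shaped, output between -1 and +1."""
    return math.tanh(x)


def rectifier(x):
    """Rectifier: passes positive values through and clips the rest to 0."""
    return max(0.0, x)


for x in (-2.0, 0.0, 2.0):
    print(x, step(x), round(sigmoid(x), 3), round(tanh(x), 3), rectifier(x))
```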



#### 4.5 Networks of Neurons
Neurons are arranged into networks of neurons. A row of neurons is called a layer and one
network can have multiple layers. The architecture of the neurons in the network is often called
the network topology.

![alt text](network.png "Title")

### 4.5.1 Input or Visible Layers
The bottom layer that takes input from your dataset is called the visible layer, because it is
the exposed part of the network. Often a neural network is drawn with a visible layer with one
neuron per input value or column in your dataset. These are not neurons as described above,
but simply pass the input value through to the next layer.

### 4.5.2 Hidden Layers
Layers after the input layer are called hidden layers because they are not directly exposed to
the input. The simplest network structure is to have a single neuron in the hidden layer that
directly outputs the value. Given increases in computing power and efficient libraries, very deep
neural networks can be constructed. Deep learning can refer to having many hidden layers in
your neural network. They are deep because they would have been unimaginably slow to train
historically, but may take seconds or minutes to train using modern techniques and hardware.

### 4.5.3 Output Layer
The final layer is called the output layer and it is responsible for outputting a value
or vector of values that correspond to the format required for the problem. The choice of
activation function in the output layer is strongly constrained by the type of problem that you
are modeling. For example:
1. A regression problem may have a single output neuron and the neuron may have no
activation function.
2. A binary classification problem may have a single output neuron and use a sigmoid
activation function to output a value between 0 and 1 to represent the probability of
predicting a value for the primary class. This can be turned into a crisp class value by
using a threshold of 0.5, snapping values less than the threshold to 0 and all other values to 1.
3. A multiclass classification problem may have multiple neurons in the output layer, one for
each class (e.g. three neurons for the three classes in the famous iris flowers classification
problem). In this case a softmax activation function may be used to output a probability
of the network predicting each of the class values. Selecting the output with the highest
probability can be used to produce a crisp class classification value.
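
The binary and multiclass cases above can be sketched in plain Python as follows. The helper names are invented for this illustration, and the raw output values passed in are made up rather than taken from a trained network.

```python
# Sketch: turning output-layer values into crisp class predictions.
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    """Turn raw output-layer values into probabilities that sum to 1."""
    largest = max(values)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def binary_class(raw_output, threshold=0.5):
    """Crisp class for a single sigmoid output neuron."""
    return 0 if sigmoid(raw_output) < threshold else 1


def multiclass_class(raw_outputs):
    """Index of the class with the highest softmax probability."""
    probabilities = softmax(raw_outputs)
    return probabilities.index(max(probabilities))


print(binary_class(-1.2))                   # 0
print(binary_class(0.8))                    # 1
print(multiclass_class([0.2, 2.5, -1.0]))   # 1
```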

#### 4.6 Training Networks
Once configured, the neural network needs to be trained on your dataset.

### 4.6.1 Data Preparation
You must first prepare your data for training on a neural network. Data must be numerical, for
example real values. If you have categorical data, such as a sex attribute with the values male
and female, you can convert it to a real-valued representation called a one hot encoding. This
is where one new column is added for each class value (two columns in the case of sex of male
and female) and a 0 or 1 is added for each row depending on the class value for that row.

This same one hot encoding can be used on the output variable in classification problems
with more than one class. This would create a binary vector from a single column that would
be easy to directly compare to the output of the neurons in the network's output layer, which, as
described above, would output one value for each class.

Neural networks require the input to be
scaled in a consistent way. You can rescale it to the range between 0 and 1, which is called normalization.
Another popular technique is to standardize it so that the distribution of each column has a
mean of zero and a standard deviation of 1. Scaling also applies to image pixel data. Data
such as words can be converted to integers, such as the frequency rank of the word in the dataset,
or handled with other encoding techniques.
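
The three preparation steps can be sketched with plain Python lists. The helper names are chosen for this illustration, the columns are assumed to contain at least two distinct values (otherwise the divisions would fail), and the standard deviation used is the population form.

```python
# Sketch of one hot encoding, normalization and standardization.
def one_hot(values):
    """Map each categorical value to a list of 0/1 flags, one per category."""
    categories = sorted(set(values))
    return [[1 if value == c else 0 for c in categories] for value in values]


def normalize(column):
    """Rescale a numeric column to the range 0 to 1."""
    low, high = min(column), max(column)
    return [(x - low) / (high - low) for x in column]


def standardize(column):
    """Rescale a numeric column to mean 0 and standard deviation 1."""
    mean = sum(column) / len(column)
    variance = sum((x - mean) ** 2 for x in column) / len(column)
    std = variance ** 0.5
    return [(x - mean) / std for x in column]


# Columns are ordered alphabetically: [female, male]
print(one_hot(["male", "female", "female"]))  # [[0, 1], [1, 0], [1, 0]]
print(normalize([10.0, 15.0, 20.0]))          # [0.0, 0.5, 1.0]
print(standardize([10.0, 15.0, 20.0]))
```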

### 4.6.2 Stochastic Gradient Descent
The classical and still preferred training algorithm for neural networks is called stochastic
gradient descent. This is where one row of data is exposed to the network at a time as input.
The network processes the input upward activating neurons as it goes to finally produce an
output value. This is called a forward pass on the network. It is the type of pass that is also
used after the network is trained in order to make predictions on new data.

The output of the network is compared to the expected output and an error is calculated.
This error is then propagated back through the network, one layer at a time, and the weights
are updated according to the amount that they contributed to the error. This clever bit of math
is called the Back Propagation algorithm. The process is repeated for all of the examples in
your training data. One round of updating the network for the entire training dataset is called
an epoch. A network may be trained for tens, hundreds or many thousands of epochs.
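
The loop described above (forward pass, error, weight update, repeated for every row and every epoch) can be sketched for the simplest possible case: a single linear neuron with no hidden layers. With only one layer there is nothing to propagate back through, so this shows the update step but not full backpropagation. The names, the step-size value of 0.05 and the toy data (the line y = 2x + 1) are all assumptions made for the illustration.

```python
# Sketch: stochastic gradient descent for one linear neuron.
import random


def predict(row, weights):
    """Forward pass for one linear neuron; weights[-1] is the bias weight."""
    return sum(x * w for x, w in zip(row, weights[:-1])) + weights[-1]


def train_sgd(rows, targets, learning_rate=0.05, epochs=500, seed=0):
    """Update the weights after every training example, for several epochs."""
    rng = random.Random(seed)
    weights = [rng.uniform(0.0, 0.3) for _ in range(len(rows[0]) + 1)]
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a new random order
        for i in order:
            error = predict(rows[i], weights) - targets[i]
            # Each weight moves against its contribution to the error.
            for j, x in enumerate(rows[i]):
                weights[j] -= learning_rate * error * x
            weights[-1] -= learning_rate * error
    return weights


# Toy data generated from y = 2 * x + 1.
rows = [[0.0], [1.0], [2.0], [3.0]]
targets = [1.0, 3.0, 5.0, 7.0]
weights = train_sgd(rows, targets)
print([round(w, 2) for w in weights])  # should be close to [2.0, 1.0]
```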

### 4.6.3 Weight Updates
The weights in the network can be updated from the errors calculated for each training example
and this is called online learning. It can result in fast but also chaotic changes to the network.

Alternatively, the errors can be saved up across all of the training examples and the network
can be updated at the end. This is called batch learning and is often more stable.

Because datasets are so large and because of computational efficiencies, the size of the
batch (the number of examples the network is shown before an update) is often reduced to a
small number, such as tens or hundreds of examples. The amount that weights are updated is
controlled by a configuration parameter called the learning rate. It is also called the step size
and controls the step or change made to network weights for a given error. Often small learning
rates are used such as 0.1 or 0.01 or smaller. The update equation can be complemented with
additional configuration terms that you can set.
1. Momentum is a term that incorporates the properties from the previous weight update
to allow the weights to continue to change in the same direction even when there is less
error being calculated.
2. Learning Rate Decay is used to decrease the learning rate over epochs to allow the
network to make large changes to the weights at the beginning and smaller fine tuning
changes later in the training schedule.
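
Both terms can be sketched in a few lines. The text does not give exact formulas, so the sketch assumes two common choices: classical momentum (a running velocity that carries over part of the previous update) and time-based decay (the rate divided by a factor that grows with the epoch number). The function names and the toy objective f(w) = w² are invented for the illustration.

```python
# Sketch: a weight update with momentum and learning rate decay.
def decayed_rate(initial_rate, decay, epoch):
    """Time-based decay: the learning rate shrinks as the epoch count grows."""
    return initial_rate / (1.0 + decay * epoch)


def momentum_update(weight, gradient, velocity, learning_rate, momentum=0.9):
    """Return the new weight and velocity after one momentum step."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity


# Minimize f(w) = w ** 2, whose gradient is 2 * w, starting from w = 5.0.
weight, velocity = 5.0, 0.0
for epoch in range(200):
    rate = decayed_rate(initial_rate=0.1, decay=0.01, epoch=epoch)
    weight, velocity = momentum_update(weight, 2.0 * weight, velocity, rate)

print(abs(weight) < 0.05)  # the weight has moved close to the minimum at 0
```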

### 4.6.4 Prediction
Once a neural network has been trained it can be used to make predictions. You can make
predictions on test or validation data in order to estimate the skill of the model on unseen data.
You can also deploy it operationally and use it to make predictions continuously. The network
topology and the final set of weights are all that you need to save from the model. Predictions
are made by providing the input to the network and performing a forward-pass allowing it to
generate an output that you can use as a prediction.
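
A forward pass through a small saved network can be sketched as follows. The weights here are made up rather than trained, sigmoid activations are assumed in every layer, and the nested-list layout (one list per layer, one weight list per neuron, bias weight last) is just one possible convention. Because that structure holds both the topology and the weights, writing it to JSON and reading it back is enough to reproduce the predictions.

```python
# Sketch: making a prediction from saved topology and weights.
import json
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, layer):
    """Compute the output of every neuron in one layer.

    Each neuron is a list of weights; the last entry is the bias weight.
    """
    outputs = []
    for neuron in layer:
        total = neuron[-1] + sum(x * w for x, w in zip(inputs, neuron[:-1]))
        outputs.append(sigmoid(total))
    return outputs


def predict(network, row):
    """Forward pass: feed the row through each layer in turn."""
    values = row
    for layer in network:
        values = layer_forward(values, layer)
    return values


# Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
network = [
    [[0.5, -0.4, 0.1], [-0.3, 0.8, 0.0]],  # hidden layer
    [[1.2, -0.7, 0.05]],                   # output layer
]

# Save and restore the network, then check the predictions agree.
restored = json.loads(json.dumps(network))
print(predict(restored, [1.0, 0.0]) == predict(network, [1.0, 0.0]))  # True
```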

### Summary
In this lesson you discovered artificial neural networks for machine learning. You learned:
1. How neural networks are not models of the brain but are instead computational models
for solving complex machine learning problems.
2. That neural networks are composed of neurons that have weights and activation functions.
3. That networks are organized into layers of neurons and are trained using stochastic gradient
descent.
4. That it is a good idea to prepare your data before training a neural network model.

### Next
You now know the basics of neural network models. In the next section you will develop your
very first Multilayer Perceptron model in Keras.