Merge pull request #111 from bartvm/docs

Installation instructions and quick-start tutorial
mila-iqia · Jan 18, 2015 · 2ae9fb8
2 parents 9656c02 + fa35111
commit 2ae9fb8

Showing 13 changed files with 235 additions and 82 deletions.
diff --git a/README.rst b/README.rst
@@ -6,19 +6,25 @@
 
 .. image:: https://readthedocs.org/projects/blocks/badge/?version=latest
    :target: https://blocks.readthedocs.org/
-   :alt: Documentation Status
 
 Blocks
 ======
+Blocks is a framework that helps you build neural network models on top of
+Theano. Currently it supports and provides:
 
-Bricks and blocks are Theano functions with parameters. Furthermore, the
-plan is to support:
+* Constructing parametrized Theano operations, called "bricks"
+* Pattern matching to select variables and bricks in large models
+* A pipeline for loading and iterating over training data
+* Algorithms to optimize your model
+* Automatic creation of monitoring channels (*limited support*)
+* Application of graph transformations, such as dropout (*limited support*)
 
-* Lazy initialization
+In the future we also hope to support:
+
+* Saving and resuming of training
+* Monitoring and analyzing values during training progress (on the training set
+  as well as on test sets)
 * Dimension, type and axes-checking
-* Automatic creation of monitoring channels
-* Easy pattern matching to select the bricks you want in large graphs
-* Application of graph transformations, such as dropout
 
 Please see the documentation_ for more information.
 

diff --git a/blocks/bricks/__init__.py b/blocks/bricks/__init__.py
@@ -1059,6 +1059,8 @@ def apply(self, input_):
 Tanh = _activation_factory('Tanh', tensor.tanh)
 Sigmoid = _activation_factory('Sigmoid', tensor.nnet.sigmoid)
 Softmax = _activation_factory('Softmax', tensor.nnet.softmax)
+Rectifier = _activation_factory('Rectifier',
+                                lambda x: tensor.switch(x > 0, x, 0))
 
 
 class Sequence(Brick):
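For reference, the new `Rectifier` lambda, `tensor.switch(x > 0, x, 0)`, selects `x` where it is positive and `0` elsewhere, which is the elementwise `max(0, x)`. A minimal plain-Python sketch of that equivalence follows. It does not use Theano, and the helpers `switch` and `rectifier` are illustrative stand-ins, not part of Blocks.

```python
def switch(condition, if_true, if_false):
    """Scalar stand-in for an elementwise select: pick one of two values."""
    return if_true if condition else if_false


def rectifier(values):
    """Apply switch(x > 0, x, 0) to each element of a list."""
    return [switch(x > 0, x, 0) for x in values]


sample = [-2.0, -0.5, 0.0, 0.5, 3.0]
rectified = rectifier(sample)

# The select-based form agrees with max(0, x) on every element.
assert rectified == [max(0, x) for x in sample]
```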

diff --git a/blocks/bricks/sequence_generators.py b/blocks/bricks/sequence_generators.py
@@ -64,7 +64,7 @@ class BaseSequenceGenerator(Initializable):
 
     | A scheme of the algorithm described above follows.
 
-    .. image:: sequence_generator_scheme.png
+    .. image:: _static/sequence_generator_scheme.png
             :height: 500px
             :width: 500px
 

diff --git a/docs/_static/.gitignore b/docs/_static/.gitignore
diff --git a/docs/_static/mnist.png b/docs/_static/mnist.png
diff --git a/docs/sequence_generator_scheme.png → docs/_static/sequence_generator_scheme.png b/docs/sequence_generator_scheme.png → docs/_static/sequence_generator_scheme.png
diff --git a/docs/blocks.rst → docs/bricks.rst b/docs/blocks.rst → docs/bricks.rst
@@ -1,5 +1,3 @@
-.. _bricks:
-
 Bricks
 ======
 

diff --git a/docs/getting_started.rst → docs/bricks_overview.rst b/docs/getting_started.rst → docs/bricks_overview.rst
@@ -1,5 +1,5 @@
-Getting started
-===============
+Bricks
+======
 
 Blocks is a framework that is supposed to make it easier to build complicated
 neural network models on top of Theano_. In order to do so, we introduce the
@@ -104,7 +104,7 @@ explicitly. Consider the following example:
     >>> linear3.params
     [W, b]
 
-Nested blocks
+Nested bricks
 -------------
 
 Many neural network models, especially more complex ones, can be considered
@@ -174,11 +174,3 @@ bricks' configuration.
 
 .. _machine translation models: http://arxiv.org/abs/1409.0473
 .. _here: :class:`blocks.bricks.Brick`
-
-Examples
---------
-
-You can find examples using the Groundhog main loop in the folder
-``blocks/groundhog/examples``.  Case studies of language modeling, Markov
-chains and sinewave generation are available. They are planned to be replaced
-by PyLearn2 based examples in the near future.
diff --git a/docs/index.rst b/docs/index.rst
@@ -6,41 +6,52 @@
 
 .. image:: https://readthedocs.org/projects/blocks/badge/?version=latest
    :target: https://blocks.readthedocs.org/
-   :alt: Documentation Status
 
 |
 
 Welcome to Blocks's documentation!
 ==================================
-
 Blocks is a framework that helps you build neural network models on top of
-Theano. It also helps you manage your model by doing error-checking, creating
-monitoring channels, and allowing for easy configuration of your model. Features
-include:
+Theano. Currently it supports and provides:
 
-* Dimension, type and axes-checking
-* Automatic creation of monitoring channels
-* Easy pattern matching to select the bricks you want in large graphs
-* Lazy initialization of bricks
-* Application of graph transformations, such as dropout
+* Constructing parametrized Theano operations, called "bricks"
+* Pattern matching to select variables and bricks in large models
+* A pipeline for loading and iterating over training data
+* Algorithms to optimize your model
+* Automatic creation of monitoring channels (*limited support*)
+* Application of graph transformations, such as dropout (*limited support*)
 
-Table of contents
------------------
+In the future we also hope to support:
 
+* Saving and resuming of training
+* Monitoring and analyzing values during training progress (on the training set
+  as well as on test sets)
+* Dimension, type and axes-checking
+
+Getting started
+---------------
 .. toctree::
+   setup
+   quickstart
 
-   getting_started
+In-depth
+--------
+.. toctree::
+   bricks_overview
    configuration
-   blocks
+   developer_guidelines
+
+API Reference
+-------------
+.. toctree::
+   bricks
    initialization
    datasets
    utils
    serialization
    graph
-   developer_guidelines
 
 Indices and tables
 ==================
-
 * :ref:`genindex`
 * :ref:`modindex`
diff --git a/docs/quickstart.rst b/docs/quickstart.rst
@@ -0,0 +1,164 @@
+Quickstart
+==========
+
+In this quick-start tutorial we will use the Blocks framework to train a
+`multilayer perceptron`_ (MLP) to perform handwriting recognition on the `MNIST
+handwritten digit database`_.
+
+The Task
+--------
+MNIST is a dataset which consists of 70,000 handwritten digits. Each digit is a
+grayscale image of 28 by 28 pixels. Our task is to classify each of the images
+into one of the 10 categories representing the numbers from 0 to 9.
+
+.. figure:: _static/mnist.png
+   :align: center
+
+   Sample MNIST digits
+
+The Model
+---------
+We will train a simple MLP with a single hidden layer that uses the rectifier_
+activation function. Our output layer will consist of a softmax_ function with
+10 units; one for each class. Mathematically speaking, our model is parametrized
+by the weight matrices :math:`\mathbf{W}_h` and :math:`\mathbf{W}_y`, and bias
+vectors :math:`\mathbf{b}_h` and :math:`\mathbf{b}_y`. The rectifier activation
+function is defined as
+
+.. math:: \mathrm{ReLU}(\mathbf{x})_i = \max(0, \mathbf{x}_i)
+
+and our softmax output function is defined as
+
+.. math:: \mathrm{softmax}(\mathbf{x})_i = \frac{e^{\mathbf{x}_i}}{\sum_{j=1}^n e^{\mathbf{x}_j}}
+
+Hence, our complete model is
+
+.. math:: f(\mathbf{x}) = \mathrm{softmax}(\mathbf{W}_y\mathrm{ReLU}(\mathbf{W}_h\mathbf{x} + \mathbf{b}_h) + \mathbf{b}_y)
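To make the model formula concrete, here is a minimal plain-Python sketch of the forward pass. It is not how Blocks or Theano compute it: the toy dimensions, the weight values and the helper names (`relu`, `softmax`, `affine`, `forward`) are assumptions chosen for illustration. Subtracting the maximum before exponentiating is a common numerical-stability trick that leaves the softmax output unchanged.

```python
import math


def relu(x):
    """Elementwise max(0, x_i)."""
    return [max(0.0, v) for v in x]


def softmax(x):
    """exp(x_i) / sum_j exp(x_j), shifted by max(x) for numerical stability."""
    shift = max(x)
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def affine(W, x, b):
    """Compute W x + b, with the matrix W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bias
            for row, bias in zip(W, b)]


def forward(x, W_h, b_h, W_y, b_y):
    """f(x) = softmax(W_y ReLU(W_h x + b_h) + b_y)."""
    hidden = relu(affine(W_h, x, b_h))
    return softmax(affine(W_y, hidden, b_y))


# Toy example: 3 inputs, 2 hidden units, 2 classes (hypothetical values).
W_h = [[1.0, -1.0, 0.5], [0.0, 2.0, -1.0]]
b_h = [0.0, 0.1]
W_y = [[1.0, -1.0], [-1.0, 1.0]]
b_y = [0.0, 0.0]
probs = forward([0.2, 0.4, 0.6], W_h, b_h, W_y, b_y)
```

The entries of `probs` are positive and sum to one, which is what lets the output be read as a categorical distribution over classes.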
+
+Since the output of a softmax represents a categorical probability distribution
+we can consider :math:`f(\mathbf{x}) = \hat p(\mathbf{y} \mid \mathbf{x})`,
+where :math:`\mathbf{x}` is the 784-dimensional (28 × 28) input, and
+:math:`\mathbf{y}` the probability distribution of it belonging to classes
+:math:`i = 0,\dots,9`. We can train the parameters of our model by minimizing
+the negative log-likelihood, i.e. the categorical cross-entropy between our
+model's output and the target distribution. That is, we minimize the sum of
+
+.. math:: - \sum_{i=0}^{9} p(\mathbf{y} = i) \log \hat p(\mathbf{y} = i \mid \mathbf{x})
+
+over all examples. We do so by using `stochastic gradient descent`_ (SGD) on
+mini-batches.
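For a single example whose target is one-hot, this cost reduces to the negative log of the probability the model assigns to the correct class. A minimal sketch, with a hypothetical helper name and made-up probabilities:

```python
import math


def cross_entropy(target, predicted):
    """Categorical cross-entropy between a target and a predicted distribution.

    Terms where the target probability is zero contribute nothing and are
    skipped.
    """
    return -sum(p * math.log(q) for p, q in zip(target, predicted) if p > 0)


predicted = [0.1, 0.7, 0.2]  # model output over three classes (made up)
target = [0.0, 1.0, 0.0]     # one-hot: the correct class is 1
loss = cross_entropy(target, predicted)
```

Here `loss` equals `-math.log(0.7)`; a model that put all its mass on the correct class would have a loss of zero.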
+
+Building the model
+------------------
+Constructing the model with Blocks is very simple. We start by defining the
+input variable using Theano.
+
+.. tip::
+   Want to follow along with the Python code? If you are using IPython, enable
+   the `doctest mode`_ using the special ``%doctest_mode`` command so that you
+   can copy-paste the examples below (including the ``>>>`` prompts) straight
+   into the IPython interpreter.
+
+>>> from theano import tensor
+>>> x = tensor.matrix('features')
+
+Note that we picked the name ``'features'`` for our input. This is important,
+because the name needs to match the name of the data source we want to train on.
+MNIST defines two data sources: ``'features'`` and ``'targets'``.
+
+For the sake of this tutorial, we will go through building an MLP the long way.
+For a much quicker way, skip right to the end of this section. We begin with
+applying the linear transformations and activations.
+
+>>> from blocks.bricks import Linear, Rectifier, Softmax
+>>> input_to_hidden = Linear(name='input_to_hidden', input_dim=784, output_dim=100)
+>>> h = Rectifier().apply(input_to_hidden.apply(x))
+>>> hidden_to_output = Linear(name='hidden_to_output', input_dim=100, output_dim=10)
+>>> y_hat = Softmax().apply(hidden_to_output.apply(h))
+
+Blocks uses "bricks" to build models. Bricks are parametrized Theano ops. What
+this means is that we start by initializing them with certain parameters, e.g.
+``input_dim``. After initialization we can apply our bricks on Theano variables
+to build the model we want.
+
+Now that we have built our model, let's define the cost to minimize. For this,
+we will need the Theano variable representing the target labels.
+
+>>> y = tensor.lmatrix('targets')
+>>> from blocks.bricks.cost import CategoricalCrossEntropy
+>>> cost = CategoricalCrossEntropy().apply(y.flatten(), y_hat)
+
+That's it! But creating a simple MLP this way is rather cumbersome. In practice,
+we would have simply used the :class:`~blocks.bricks.MLP` class.
+
+>>> from blocks.bricks import MLP
+>>> mlp = MLP(activations=[Rectifier(), Softmax()], dims=[784, 100, 10]).apply(x)
+
+Training your model
+-------------------
+Besides helping you build models, Blocks also provides the other main features
+needed to train a model. It has a set of training algorithms (like SGD), an
+interface to datasets, and a training loop that allows you to monitor and
+control the training process.
+
+We want to train our model on the training set of MNIST.
+
+>>> from blocks.datasets.mnist import MNIST
+>>> mnist = MNIST("train")
+
+Datasets only provide an interface to the data. For actual training, we will
+need to iterate over the data in minibatches. This is done by initiating a data
+stream which makes use of a particular iteration scheme. We will use an
+iteration scheme that iterates over our MNIST examples sequentially in batches
+of size 256.
+
+>>> from blocks.datasets import DataStream
+>>> from blocks.datasets.schemes import SequentialScheme
+>>> data_stream = DataStream(mnist, iteration_scheme=SequentialScheme(
+...     num_examples=mnist.num_examples, batch_size=256))
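As a rough illustration of what a sequential scheme yields, the indices `0` to `num_examples - 1` are cut into consecutive slices of `batch_size`, with a shorter final slice when the sizes do not divide evenly. The sketch below is plain Python, not the `SequentialScheme` implementation, and `sequential_batches` is a hypothetical helper. It assumes the 60,000 examples of the MNIST training set.

```python
def sequential_batches(num_examples, batch_size):
    """Yield lists of consecutive example indices, one list per batch."""
    for start in range(0, num_examples, batch_size):
        stop = min(start + batch_size, num_examples)
        yield list(range(start, stop))


# MNIST training set size and the batch size used in the tutorial.
batches = list(sequential_batches(60000, 256))
num_batches = len(batches)          # ceil(60000 / 256)
last_batch_size = len(batches[-1])  # the remainder after 234 full batches
```

Under these assumptions one epoch takes 235 iterations, which agrees with the `iterations_done: 235` reported in the training log of this tutorial.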
+
+As our algorithm we will use straightforward SGD with a fixed learning rate.
+
+>>> from blocks.algorithms import GradientDescent, SteepestDescent
+>>> algorithm = GradientDescent(cost=cost, step_rule=SteepestDescent(learning_rate=0.1))
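The update behind steepest descent with a fixed learning rate is `parameter <- parameter - learning_rate * gradient`. A minimal sketch on a one-dimensional quadratic, independent of Blocks; the function names and the toy cost are illustrative assumptions:

```python
def sgd_step(parameter, gradient, learning_rate):
    """Move the parameter a fixed fraction of the way down the gradient."""
    return parameter - learning_rate * gradient


def gradient_of_cost(w):
    """Gradient of the toy cost (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)


w = 0.0
for _ in range(100):
    w = sgd_step(w, gradient_of_cost(w), learning_rate=0.1)
```

With this learning rate each step shrinks the distance to the minimum by a factor of 0.8, so `w` ends up very close to 3.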
+
+That's all we need! We can use the :class:`~blocks.main_loop.MainLoop` to
+combine all the different pieces. Let's train our model for a single epoch and
+print the progress to see how it works.
+
+>>> from blocks.main_loop import MainLoop
+>>> from blocks.extensions import FinishAfter, Printing
+>>> main_loop = MainLoop(model=mlp, data_stream=data_stream, algorithm=algorithm,
+...                      extensions=[FinishAfter(after_n_epochs=1), Printing()])
+>>> main_loop.run() # doctest: +SKIP
+-------------------------------------------------------------------------------
+BEFORE FIRST EPOCH
+-------------------------------------------------------------------------------
+Training status:
+     iterations_done: 0
+     epochs_done: 0
+Log records from the iteration 0:
+-------------------------------------------------------------------------------
+AFTER ANOTHER EPOCH
+-------------------------------------------------------------------------------
+Training status:
+     iterations_done: 235
+     epochs_done: 1
+Log records from the iteration 235:
+     training_finish_requested: True
+-------------------------------------------------------------------------------
+TRAINING HAS BEEN FINISHED:
+-------------------------------------------------------------------------------
+Training status:
+     iterations_done: 235
+     epochs_done: 1
+Log records from the iteration 235:
+     training_finish_requested: True
+     training_finished: True
+
+.. _multilayer perceptron: https://en.wikipedia.org/wiki/Multilayer_perceptron
+.. _MNIST handwritten digit database: http://yann.lecun.com/exdb/mnist/
+.. _rectifier: https://en.wikipedia.org/wiki/Rectifier_%28neural_networks%29
+.. _softmax: https://en.wikipedia.org/wiki/Softmax
+.. _stochastic gradient descent: https://en.wikipedia.org/wiki/Stochastic_gradient_descent
+.. _doctest mode: http://ipython.org/ipython-doc/dev/interactive/tips.html#run-doctests
diff --git a/docs/setup.rst b/docs/setup.rst
@@ -0,0 +1,25 @@
+Installation
+============
+The easiest way to install Blocks is using the Python package manager ``pip``.
+Blocks isn't listed yet on the Python Package Index (PyPI), so you will have to
+grab it directly from GitHub.
+
+.. code-block:: bash
+
+   pip install --upgrade --no-deps git+git://github.com/bartvm/blocks.git --user
+
+If you have administrative rights, remove ``--user`` to install the package
+system-wide. The ``--no-deps`` flag is there to make sure that ``pip`` doesn't
+try to update NumPy and SciPy, possibly overwriting the optimised version on
+your system with a newer but slower version.
+
+If you want to update Blocks, simply repeat the command above to pull the latest
+version from GitHub.
+
+Requirements
+------------
+Blocks' only requirements are Theano and six. We develop using the bleeding-edge
+version of Theano, so be sure to follow the `relevant installation
+instructions`_ to make sure that your Theano version is up to date.
+
+.. _relevant installation instructions: http://deeplearning.net/software/theano/install.html#bleeding-edge-install-instructions
diff --git a/examples/mnist.py b/examples/mnist.py