Merge pull request #29 from ragavvenkatesan/dev

Batch Norm Tutorial added.
ragavvenkatesan · Feb 27, 2017 · 5039773
2 parents 7199de3 + d0085bd
Showing 4 changed files with 57 additions and 10 deletions.
diff --git a/docs/source/pantry/tutorials/batch_norm.rst b/docs/source/pantry/tutorials/batch_norm.rst
@@ -0,0 +1,48 @@
+.. _batch_norm:
+
+Batch Normalization.
+====================
+
+Batch normalization has become an important operation for faster and more stable learning of neural
+networks. In batch norm we do the following:
+
+.. math::
+
+    x = (\frac{x - \mu_b}{\sigma_b})\gamma + \beta
+
+:math:`x` is the input (and the output) of this operation, :math:`\mu_b` and :math:`\sigma_b`
+are the mean and the standard deviation of the minibatch of :math:`x` supplied. :math:`\gamma` and
+:math:`\beta` are learnt using back propagation. The layer also stores a running mean and a running
+variance, which are used at inference time.
+
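As a rough illustration of the operation described above, here is a minimal pure-Python sketch for a single feature over one minibatch. It is not yann's implementation: the function names, the ``momentum`` update rule for the running statistics, and the small ``eps`` added under the square root are all assumptions made for this example.

```python
import math


def batch_norm_train(x, gamma, beta, running_mean, running_var,
                     momentum=0.9, eps=1e-5):
    """Normalize one feature over a minibatch (a list of floats).

    Hypothetical helper: returns the normalized batch together with the
    updated running mean and running variance.
    """
    n = len(x)
    mu_b = sum(x) / n                              # minibatch mean
    var_b = sum((v - mu_b) ** 2 for v in x) / n    # minibatch variance
    scale = gamma / math.sqrt(var_b + eps)         # eps avoids division by zero
    y = [(v - mu_b) * scale + beta for v in x]
    # Exponential moving averages (assumed rule), kept for inference time.
    running_mean = momentum * running_mean + (1.0 - momentum) * mu_b
    running_var = momentum * running_var + (1.0 - momentum) * var_b
    return y, running_mean, running_var


def batch_norm_infer(x, gamma, beta, running_mean, running_var, eps=1e-5):
    """At inference, use the stored running statistics instead of batch ones."""
    scale = gamma / math.sqrt(running_var + eps)
    return [(v - running_mean) * scale + beta for v in x]


batch = [2.0, 4.0, 6.0, 8.0]
y, rm, rv = batch_norm_train(batch, gamma=1.0, beta=0.0,
                             running_mean=0.0, running_var=1.0)
```

With ``gamma = 1`` and ``beta = 0`` the training output has zero mean over the minibatch. The inference function gives the same values only once the running statistics have converged to the batch statistics.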
+By default batch normalization can be performed on convolution and dot product layers using 
+the argument ``batch_norm = True`` supplied to the :mod:`yann.network.add_layer` method. This 
+will apply the batch normalization before the activation and after the core layer operation. 
+
+This is the technique described in the original batch normalization paper [1]. Some
+modern networks, such as the Residual network [2], [3], use a re-ordered version of layer operations
+that requires the batch norm to be applied post-activation. This ordering is used particularly with
+ReLU or Maxout networks [4], [5]. Therefore we also provide a layer type ``batch_norm``, which
+creates a layer that simply performs batch normalization on the input supplied. These layers can be
+used to create a post-activation batch normalization.
+
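The difference between the two orderings can be sketched in plain Python, independent of yann. The helpers below are hypothetical, and :math:`\gamma = 1`, :math:`\beta = 0` are assumed so that only the ordering differs.

```python
import math


def normalize(x, eps=1e-5):
    """Batch-normalize one feature with gamma = 1 and beta = 0 (assumed)."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    return [(v - mu) / math.sqrt(var + eps) for v in x]


def relu(x):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in x]


def bn_before_activation(z):
    # Ordering used when batch norm sits between the core layer operation
    # and the activation: core op -> batch norm -> activation.
    return relu(normalize(z))


def bn_after_activation(z):
    # Post-activation ordering, i.e. a separate batch norm layer placed
    # after the activation: core op -> activation -> batch norm.
    return normalize(relu(z))


z = [-3.0, -1.0, 1.0, 3.0]   # stand-in for the output of a core layer
before = bn_before_activation(z)
after = bn_after_activation(z)
```

In the first ordering the values passed on are non-negative, so their mean is no longer zero. In the second ordering the values passed on have zero mean over the minibatch and can be negative.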
+This tutorial demonstrates the use of both these techniques using the same network architecture
+used in the :ref:`lenet` tutorial. The code for these can be found in the following module methods
+in :mod:`pantry.tutorials`.
+
+.. rubric:: References
+
+.. [#]   Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network 
+         training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).
+.. [#]   He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on 
+         Computer Vision. Springer International Publishing, 2016.
+.. [#]   He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the 
+         IEEE Conference on Computer Vision and Pattern Recognition. 2016.
+.. [#]   Nair, Vinod, and Geoffrey E. Hinton. "Rectified linear units improve restricted boltzmann 
+         machines." Proceedings of the 27th International Conference on Machine Learning (ICML-10). 
+         2010.
+.. [#]   Goodfellow, Ian J., et al. "Maxout networks." arXiv preprint arXiv:1302.4389 (2013).
+         
+.. automodule:: pantry.tutorials.lenet
+   :members:
+
diff --git a/docs/source/trailer.rst b/docs/source/trailer.rst
@@ -17,10 +17,10 @@ After considerable effort being put in to make this toolbox modular, and testing
 using the toolbox on some of my own research. This toolbox, which began as a pet project, is
 something that I am now proud to share with the rest of the DL community.
 
-This toolbox is also used with the course `CSE 591: Introduction to Deep Learning for 
-Computer Vision`_ at ASU in Spring of 2017. With more features being added into the toolbox, I figured
-I would clean it up, formalize it and write some good documentation so that interested people could 
-use it after. Thus after being rechristened as Yann, this toolbox was born.
+This toolbox is also used with the course `CSE 591`_ at ASU in Spring of 2017. With more features 
+being added into the toolbox, I figured I would clean it up, formalize it and write some good 
+documentation so that interested people could use it after. Thus after being rechristened as Yann, 
+this toolbox was born.
 
 .. warning ::
     
@@ -139,6 +139,4 @@ Those marked * are not fully tested yet.
 .. _skdata's: https://jaberg.github.io/skdata/
 .. _Fuel: https://github.com/mila-udem/fuel
 .. _Sebastien Bubeck's: https://blogs.princeton.edu/imabandit/2013/04/01/acceleratedgradientdescent/
-.. _CSE 591: Introduction to Deep Learning for Computer Vision: http://www.ragav.net/cse591
-
-
+.. _CSE 591: http://www.ragav.net/cse591
diff --git a/docs/source/tutorial.rst b/docs/source/tutorial.rst
@@ -19,6 +19,7 @@ tutorial just in case though.
    pantry/tutorials/autoencoder
    pantry/tutorials/lenet
    pantry/tutorials/gan
+   pantry/tutorials/batch_norm
    pantry/tutorials/mat2yann   
 
 .. Todo::

diff --git a/pantry/tutorials/lenet.py b/pantry/tutorials/lenet.py
@@ -469,9 +469,9 @@ def lenet_maxout_batchnorm_after_activation ( dataset= None, verbose = 1 ):
     if dataset is None:
         print " creating a new dataset to run through"
         from yann.special.datasets import cook_cifar10 
-        from yann.special.datasets import cook_mnist 
-        # data = cook_cifar10 (verbose = 2)
-        data = cook_mnist()
+        # from yann.special.datasets import cook_mnist 
+        data = cook_cifar10 (verbose = 2)
+        # data = cook_mnist()
         dataset = data.dataset_location()
 
     lenet5 ( dataset, verbose = 2 )