# Logistic Regression

## Logistic Regression:
Logistic regression is a linear classifier, and it is usually one of the first and easiest-to-learn machines encountered in deep learning studies. Despite its simplicity and lack of hierarchy, a logistic regressor is a powerful machine because much real-world data is linear (or, often, piecewise linear at best). This notebook is a tutorial on implementing logistic regression for digit recognition on the MNIST dataset using the yann toolbox. It briefly goes over some theory of logistic regression; for an in-depth study, refer to the [book](http://www.convolution.network) or the [course materials](http://www.course.convolution.network).

Logistic regression is typically modelled for a dataset $ D = \{ (x_i, y_i) \vert x_i \in \mathbb{R}^d, y_i \in \{1, 2, 3, \dots, c\} \} $ as,

$$ \hat{y} = \phi\left(w_0 + w_1x_1 + w_2x_2 + \dots + w_dx_d \right), $$

or, writing $x_0 = 1$ so that $w_0$ acts as the bias,

$$ \hat{y} = \phi\left(\sum_{j=0}^d w_j x_j\right) = \phi\left(\mathbf{w}^T \mathbf{x}\right), $$

where,

$$ \phi(\tau) = \frac{1}{1+{e}^{-\tau}} $$

If $\phi(\tau) > 0.5$, the sample $x$ is classified as positive; otherwise it is classified as negative.
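
The decision rule above can be sketched in a few lines of NumPy. This is only an illustration: the helper names `sigmoid` and `classify`, as well as the weights and samples, are hypothetical values picked for the example and do not come from the toolbox.

```python
import numpy as np

def sigmoid(tau):
    """Logistic function phi(tau) = 1 / (1 + exp(-tau))."""
    return 1.0 / (1.0 + np.exp(-tau))

def classify(x, w, b):
    """Return 1 (positive) if phi(w.x + b) > 0.5, else 0 (negative)."""
    return int(sigmoid(np.dot(w, x) + b) > 0.5)

# Hypothetical weights, bias and samples, chosen only for illustration.
w = np.array([2.0, -1.0])
b = 0.5
positive = classify(np.array([1.0, 0.5]), w, b)   # tau =  2.0, phi > 0.5
negative = classify(np.array([-1.0, 1.0]), w, b)  # tau = -2.5, phi < 0.5
```

Since $\phi(0) = 0.5$, thresholding $\phi(\tau)$ at 0.5 is the same as checking the sign of $\tau$.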

In the above equations, the only unknown we need to calculate is $W$, the parameter vector (the weights $w$ above), which also contains the bias $b$ (the $w_0$ term). $W$ can be calculated using the gradient descent optimization technique: we start with random values for $W$ and repeatedly update them using the gradient of the negative log likelihood error function. To learn more about gradient descent and other optimization techniques for logistic regression, check the sections on optimization techniques in the book or the lecture materials.
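
As a rough illustration of this procedure, here is a minimal NumPy sketch of batch gradient descent on the negative log likelihood for the two-class case. It is not how yann implements training: the function names, the learning rate, the number of epochs and the synthetic two-blob data are all assumptions made for the example.

```python
import numpy as np

def sigmoid(tau):
    return 1.0 / (1.0 + np.exp(-tau))

def nll(w, Xb, y):
    """Mean negative log likelihood of binary labels y under weights w."""
    p = sigmoid(Xb.dot(w))
    eps = 1e-12  # guards against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def fit(X, y, learning_rate=0.1, epochs=200, seed=0):
    """Plain batch gradient descent; returns weights and per-epoch losses."""
    rng = np.random.RandomState(seed)
    # Prepend a column of ones so that w[0] plays the role of the bias.
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = 0.01 * rng.randn(Xb.shape[1])  # small random starting point
    losses = []
    for _ in range(epochs):
        # Gradient of the mean negative log likelihood with respect to w.
        grad = Xb.T.dot(sigmoid(Xb.dot(w)) - y) / len(y)
        w -= learning_rate * grad
        losses.append(nll(w, Xb, y))
    return w, losses

# Synthetic toy data: two Gaussian blobs labelled 0 and 1.
rng = np.random.RandomState(1)
X = np.vstack([rng.randn(50, 2) - 2.0, rng.randn(50, 2) + 2.0])
y = np.concatenate([np.zeros(50), np.ones(50)])
w, losses = fit(X, y)
```

The list `losses` should decrease from epoch to epoch, which is the same behaviour as the `Cost` column that yann prints during training.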

In this notebook, we assume you have finished setting up yann before starting this tutorial. If you haven't done so already, you can follow the [Installation Guide](http://yann.readthedocs.io/en/master/setup.html) for yann setup. To install quickly without many dependencies, run the following command:
<pre><code>pip install git+git://github.com/ragavvenkatesan/yann.git</code></pre>
If there is an error installing **skdata**, you might want to install **numpy** and **scipy** independently first and then run the above command. Note that this installer does not enable many of the toolbox's options; for those you need to go through the complete install described on the Installation Guide page.

The easiest way to get going with Yann is to follow this quick start guide. If you are not satisfied and want a more detailed introduction to the toolbox, you may refer to the [Tutorials](http://yann.readthedocs.io/en/master/tutorial.html#tutorial) and the [Structure of the Yann network](http://yann.readthedocs.io/en/master/organization.html#organization). This tutorial was also presented in CSE591 at ASU and the video of the presentation is available.


In [1]:
from IPython.display import YouTubeVideo
YouTubeVideo("0NFvfg8CItQ",theme="light", color="red")

Verify that the installed version of theano is indeed 0.9 or greater by running the following in a python shell:

In [None]:
import theano
theano.__version__

If the version is older than 0.9, you can upgrade theano by doing the following:
<pre><code>pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git
</code></pre>
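
The version check can also be done programmatically rather than by eye. The sketch below uses a hypothetical helper, `version_at_least`, which is not part of yann or theano; it compares only the leading numeric fields of a version string against a minimum.

```python
import re

def version_at_least(version, minimum=(0, 9)):
    """True if the leading numeric fields of `version` are >= `minimum`."""
    fields = re.findall(r"\d+", version)[:len(minimum)]
    return tuple(int(f) for f in fields) >= minimum

# With theano installed one would call: version_at_least(theano.__version__)
# The strings below are example inputs only.
checks = [version_at_least(v) for v in ("0.8.2", "0.9.0", "0.10.0", "1.0.4")]
# checks == [False, True, True, True]
```

Comparing integer tuples avoids the pitfall of comparing version strings directly, where `"0.10"` would sort before `"0.9"`.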


In this tutorial, we will learn both the toolbox and how to implement logistic regression at the same time. Hopefully, this will give a nice introduction to the various features and API commands of the toolbox, making further tutorials easier. 

The start and the end of the Yann toolbox is the ***`network`*** module. The ***`yann.network.network`*** object is where all the magic happens. Everything is manipulated through the network object. Run the following code to import the ***`network`*** module and create a ***`network`*** object.

In [2]:
from yann.network import network
net = network()

 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29




. Initializing the network


Using gpu device 0: GeForce GT 750M (CNMeM is disabled, cuDNN 5103)


*Voila!* We have thus created a new network. The network doesn’t have any layers or modules in it yet. This can be verified by probing the ***`net.layers`*** property of the ***`net`*** object. ***`net.layers`*** is a dictionary in which each key is the id of a layer and each value is a ***`yann.layers`*** object. 

In [3]:
net.layers

{}

This produces an output which is essentially an empty dictionary {} because we did not add any layers to the network. Let’s add some layers! We can begin with an ***`input`*** layer, which is where any neural network begins. 

Before we do that, we need some data to train the network. The toolbox comes with a port to [skdata](https://github.com/jaberg/skdata), through which the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) of handwritten digits can be built. 

To cook an MNIST dataset for yann, run the following code:

In [4]:
from yann.special.datasets import cook_mnist
data = cook_mnist()

. Setting up dataset 
.. setting up skdata
... Importing mnist from skdata
.. setting up dataset
.. training data
.. validation data 
.. testing data 
. Dataset 25348 is created.
. Time taken is 1.583292 seconds


Running this code will print a statement to the effect of ***``>>Dataset xxxxx`` is created***. The five digits marked ***``xxxxx``*** in the statement are the codeword for the dataset. The actual dataset is now located at ***``_datasets/_dataset_xxxxx/``***, relative to the directory from which this code was called. The MNIST dataset is imported, converted to a format consumable by yann, and stored at this location. Refer to the [Tutorials](http://yann.readthedocs.io/en/master/tutorial.html#tutorial) on how to convert your own dataset for yann. You can check the location of the dataset using the ***`data.dataset_location()`*** function.

In [None]:
data.dataset_location()

So what is in the dataset that was created? Every dataset contains three subdirectories: train, test and valid. Each of these in turn contains .pkl files. The files are just dumps of data with two variables: ***`x`*** containing the data and ***`y`*** containing the labels. Each file corresponds to a batch of data, which may be broken down further into many minibatches while training. The MNIST dataset cooked using the default cook method will contain only one such file in each directory. The dataset is created with a minibatch size of 500. There are 100 train minibatches in one batch, and 20 minibatches each for test and valid.

The first layer that we need to add to our network now is an input layer. Every ***``input``*** layer requires a dataset to be associated with it. Let us create this layer with the MNIST dataset we just created.
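
The split sizes implied by these numbers can be checked with a little arithmetic. The snippet below only multiplies the figures quoted above; the variable names are illustrative and are not yann identifiers.

```python
MINIBATCH_SIZE = 500

# Number of minibatches per split, as quoted in the text above.
minibatches = {"train": 100, "valid": 20, "test": 20}

# Samples per split = minibatches per batch * samples per minibatch.
samples = {split: n * MINIBATCH_SIZE for split, n in minibatches.items()}
total = sum(samples.values())
# samples -> train: 50000, valid: 10000, test: 10000; total -> 70000
```

The total of 70,000 matches the size of the full MNIST collection (60,000 training and 10,000 testing images), with 10,000 of the training images held out for validation.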

In [5]:
dataset_params  = { "dataset": data.dataset_location(), "n_classes" : 10 }
net.add_layer(type = "input", dataset_init_args = dataset_params)

.. Adding input layer 0


This piece of code creates and adds a new ***`datastream`*** module to the ***`net`***. Modules are similar to layers in yann. Modules support the network. This command also automatically wires up the newly added ***`input`*** layer with this (the last created) ***`datastream`***. Confirm this by checking ***`net.datastream`***. 


In [None]:
net.datastream

***`net.datastream`***, as can be seen, is also a dictionary similar to ***`net.layers`***.

Let us now build a ***`classifier`*** layer. The default classifier that yann is set up with is the logistic regression classifier. Refer to the [Toolbox Documentation](http://yann.readthedocs.io/en/master/yann/index.html#yann) or the [Tutorials](http://yann.readthedocs.io/en/master/tutorial.html#tutorial) for other types of layers. Let us create this ***`classifier`*** layer for now.

In [6]:
net.add_layer(type = "classifier" , num_classes = 10)
net.add_layer(type = "objective")

.. Adding classifier layer 1
.. Adding flatten layer 2
.. Adding objective layer 3


The ***`objective`*** layer creates the loss function from the classifier, which can be used as a learning metric. It also provides a scope for other modules such as the optimizer module. Refer to [Structure of the Yann network](http://yann.readthedocs.io/en/master/organization.html#organization) and the [Toolbox Documentation](http://yann.readthedocs.io/en/master/yann/index.html#yann) for more details on modules. 
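
To make the idea of a loss built on top of a classifier concrete, the following is a minimal NumPy sketch of a negative log likelihood computed from softmax outputs for ten classes. It is an illustration only and not yann's implementation: the helper names `softmax` and `negative_log_likelihood`, the random stand-in minibatch and the zero-initialised weights are all assumptions made for the example.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def negative_log_likelihood(probabilities, labels):
    """Mean of -log(probability assigned to the correct class)."""
    picked = probabilities[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked))

rng = np.random.RandomState(0)
x = rng.rand(500, 784)                 # stand-in minibatch: 500 flattened 28x28 images
labels = rng.randint(0, 10, size=500)  # stand-in labels for 10 classes
W = np.zeros((784, 10))                # untrained weights
b = np.zeros(10)                       # untrained biases

probabilities = softmax(x.dot(W) + b)  # shape (500, 10)
loss = negative_log_likelihood(probabilities, labels)
```

With all-zero weights every class receives probability 0.1, so the loss equals $\ln 10 \approx 2.30$; a classifier that has learned anything useful should reach a lower value.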

By default we add a negative log likelihood loss that we want to minimize. Now that our network is created and constructed we can check the layers in our network with ***`net.layers`***.

In [None]:
net.layers

The keys of the dictionary, such as ***`'1'`***, ***`'0'`*** and ***`'2'`***, are the ***`id`***s of the layers. We could have created a layer with a custom id by supplying an id argument to the ***`add_layer`*** method. To get a better idea of what the network looks like, you can use the ***`pretty_print`*** method in yann.

In [None]:
net.pretty_print()

***`net.pretty_print`*** typically prints all the details of the network and its layers. Some of the properties can be accessed individually for every layer. For instance, we can acquire a particular layer's properties as follows:

In [None]:
print(net.layers['1'].output_shape)
print(net.layers['1'].activation)
print(net.layers['1'].active)
print(net.layers['1'].destination)
print(net.layers['1'].origin)
# more available options can be found using the following:
dir( net.layers['1'] )

Most of these probes should be obvious. Here are some interesting ones. The ***`origin`*** and ***`destination`*** options provide lists of the layer ids that feed into and out of this layer.

Now our network is finally ready to be trained. Before training, we need to build an ***`optimizer`*** and other tools, but for now let us use the default ones. Once all of this is done, yann requires that the network be *`cooked`*. For more details on cooking, refer to [Structure of the Yann network](http://yann.readthedocs.io/en/master/organization.html#organization). For now, let us imagine that cooking a network finalizes the wiring and architecture, caches and prepares the first batch of data, prepares the modules, and in general prepares the network for training using back propagation.

In [7]:
net.cook()

.. Cooking the network
.. Setting up the visualizer
.. Setting up the resultor
.. Setting up the optimizer
.. All checks complete, cooking continues


Cooking takes a few seconds and might print what it is doing along the way. Once cooked, we may notice, for instance, that the network has an ***`optimizer`*** module.

In [8]:
net.optimizer

{'main': <yann.modules.optimizer.optimizer at 0x11a50a250>}

To train the model that we have just cooked, we can use the ***`train`*** function that becomes available to us once the network is cooked.

In [9]:
net.train()

| training  100% Time: 0:00:00                                                 


. Training
. 

.. Epoch: 0 Era: 0
.. Validation accuracy : 78.15
.. Best validation accuracy
.. Cost                : 1.89473

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.00999999977648
... Momentum            : None
. 

.. Epoch: 1 Era: 0
.. Validation accuracy : 82.05
.. Best validation accuracy
.. Cost                : 1.37419

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.00949999969453
... Momentum            : None
. 

.. Epoch: 2 Era: 0
.. Validation accuracy : 83.92
.. Best validation accuracy
.. Cost                : 1.11866

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.00902500003576
... Momentum            : None
. 

.. Epoch: 3 Era: 0
.. Validation accuracy : 84.7
.. Best validation accuracy
.. Cost                : 0.973776

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.00857375003397
... Momentum            : None
. 

.. Epoch: 4 Era: 0
.. Validation accuracy : 85.41
.. Best validation accuracy
.. Cost                : 0.881237

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.00814506225288
... Momentum            : None
. 

.. Epoch: 5 Era: 0
.. Validation accuracy : 85.96
.. Best validation accuracy
.. Cost                : 0.816992

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.0077378093265
... Momentum            : None
. 

.. Epoch: 6 Era: 0
.. Validation accuracy : 86.34
.. Best validation accuracy
.. Cost                : 0.769699

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.00735091883689
... Momentum            : None
. 

.. Epoch: 7 Era: 0
.. Validation accuracy : 86.6
.. Best validation accuracy
.. Cost                : 0.733368

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.00698337284848
... Momentum            : None
. 

.. Epoch: 8 Era: 0
.. Validation accuracy : 86.85
.. Best validation accuracy
.. Cost                : 0.704547

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.00663420418277
... Momentum            : None
. 

.. Epoch: 9 Era: 0
.. Validation accuracy : 87.13
.. Best validation accuracy
.. Cost                : 0.681105

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.00630249409005
... Momentum            : None
. 

.. Epoch: 10 Era: 0
.. Validation accuracy : 87.31
.. Best validation accuracy
.. Cost                : 0.661658

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.0059873694554
... Momentum            : None
. 

.. Epoch: 11 Era: 0
.. Validation accuracy : 87.49
.. Best validation accuracy
.. Cost                : 0.645261

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.00568800093606
... Momentum            : None
. 

.. Epoch: 12 Era: 0
.. Validation accuracy : 87.64
.. Best validation accuracy
.. Cost                : 0.631251

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.00540360109881
... Momentum            : None
. 

.. Epoch: 13 Era: 0
.. Validation accuracy : 87.74
.. Best validation accuracy
.. Cost                : 0.619145

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.00513342116028
... Momentum            : None
. 

.. Epoch: 14 Era: 0
.. Validation accuracy : 87.76
.. Best validation accuracy
.. Cost                : 0.608586

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.0048767500557
... Momentum            : None
. 

.. Epoch: 15 Era: 0
.. Validation accuracy : 87.82
.. Best validation accuracy
.. Cost                : 0.599299

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.0046329125762
... Momentum            : None
. 

.. Epoch: 16 Era: 0
.. Validation accuracy : 87.95
.. Best validation accuracy
.. Cost                : 0.591074

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.00440126704052
... Momentum            : None
. 

.. Epoch: 17 Era: 0
.. Validation accuracy : 87.97
.. Best validation accuracy
.. Cost                : 0.583745

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.00418120389804
... Momentum            : None
. 

.. Epoch: 18 Era: 0
.. Validation accuracy : 88.01
.. Best validation accuracy
.. Cost                : 0.577177

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.00397214386612
... Momentum            : None
. 

.. Epoch: 19 Era: 0
.. Validation accuracy : 88.07
.. Best validation accuracy
.. Cost                : 0.571264

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.00377353676595
... Momentum            : None
. 

.. Epoch: 20 Era: 1
.. Validation accuracy : 88.06
.. Cost                : 0.567661

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.0010000000475
... Momentum            : None
. 

.. Epoch: 21 Era: 1
.. Validation accuracy : 88.08
.. Best validation accuracy
.. Cost                : 0.566264

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.000950000016019
... Momentum            : None
. 

.. Epoch: 22 Era: 1
.. Validation accuracy : 88.09
.. Best validation accuracy
.. Cost                : 0.564952

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.000902500003576
... Momentum            : None
. 

.. Epoch: 23 Era: 1
.. Validation accuracy : 88.11
.. Best validation accuracy
.. Cost                : 0.563721

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.000857374980114
... Momentum            : None
. 

.. Epoch: 24 Era: 1
.. Validation accuracy : 88.12
.. Best validation accuracy
.. Cost                : 0.562563

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.000814506202005
... Momentum            : None
. 

.. Epoch: 25 Era: 1
.. Validation accuracy : 88.11
.. Cost                : 0.561475

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.000773780862801
... Momentum            : None
. 

.. Epoch: 26 Era: 1
.. Validation accuracy : 88.13
.. Best validation accuracy
.. Cost                : 0.560451

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.000735091802198
... Momentum            : None
. 

.. Epoch: 27 Era: 1
.. Validation accuracy : 88.15
.. Best validation accuracy
.. Cost                : 0.559487

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.000698337214999
... Momentum            : None
. 

.. Epoch: 28 Era: 1
.. Validation accuracy : 88.16
.. Best validation accuracy
.. Cost                : 0.55858

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.00066342036007
... Momentum            : None
. 

.. Epoch: 29 Era: 1
.. Validation accuracy : 88.17
.. Best validation accuracy
.. Cost                : 0.557724

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.000630249327514
... Momentum            : None
. 

.. Epoch: 30 Era: 1
.. Validation accuracy : 88.18
.. Best validation accuracy
.. Cost                : 0.556918

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.000598736864049
... Momentum            : None
. 

.. Epoch: 31 Era: 1
.. Validation accuracy : 88.18
.. Cost                : 0.556157

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.000568800023757
... Momentum            : None
. 

.. Epoch: 32 Era: 1
.. Validation accuracy : 88.19
.. Best validation accuracy
.. Cost                : 0.555439

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.000540359993465
... Momentum            : None
. 

.. Epoch: 33 Era: 1
.. Validation accuracy : 88.2
.. Best validation accuracy
.. Cost                : 0.554762

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.00051334197633
... Momentum            : None
. 

.. Epoch: 34 Era: 1
.. Validation accuracy : 88.19
.. Cost                : 0.554122

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.000487674871692
... Momentum            : None
. 

.. Epoch: 35 Era: 1
.. Validation accuracy : 88.2
.. Cost                : 0.553518

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.000463291129563
... Momentum            : None
. 

.. Epoch: 36 Era: 1
.. Validation accuracy : 88.23
.. Best validation accuracy
.. Cost                : 0.552948

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.000440126575995
... Momentum            : None
. 

.. Epoch: 37 Era: 1
.. Validation accuracy : 88.23
.. Cost                : 0.552409

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.000418120238464
... Momentum            : None
. 

.. Epoch: 38 Era: 1
.. Validation accuracy : 88.23
.. Cost                : 0.551899

| validation  100% Time: 0:00:00                                               
| training  100% Time: 0:00:00                                                 



... Learning Rate       : 0.000397214229451
... Momentum            : None
. 

.. Epoch: 39 Era: 1
.. Validation accuracy : 88.24
.. Best validation accuracy
.. Cost                : 0.551417
... Learning Rate       : 0.000377353513613
... Momentum            : None
.. Training complete.Took 0.923233316667 minutes



This will print the progress for each epoch and show the validation accuracy after each epoch, measured on a validation set that is independent of the training set. By default, the training might run for 40 epochs: 20 at a higher learning rate and 20 more at a fine-tuning learning rate. The learning rate is printed after each epoch, along with the negative log likelihood loss.
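
The learning rates printed in the log above are consistent with a simple geometric decay: each era starts from a base rate (0.01, then 0.001) that is multiplied by 0.95 after every epoch. The sketch below reproduces those numbers. The factor 0.95 and the helper `learning_rates` are inferred from the printed values for illustration and are not taken from yann's source.

```python
def learning_rates(start, decay=0.95, epochs=20):
    """Learning rate used at each epoch of an era under geometric decay."""
    return [start * decay ** k for k in range(epochs)]

era0 = learning_rates(0.01)   # first 20 epochs, higher learning rate
era1 = learning_rates(0.001)  # next 20 epochs, fine-tuning learning rate

# The last value of each era is start * 0.95 ** 19, roughly 0.00377 and 0.000377,
# which agrees with the final learning rates printed for each era in the log.
```

The small differences in the trailing digits of the logged values are what one would expect from single-precision arithmetic.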

Every layer also has a ***`layer.output`*** object. The ***`output`*** can be probed directly using the ***`layer_activity`*** method, as long as the layer is directly or indirectly associated with a ***datastream*** module through an ***`input`*** layer and the network has been cooked. We need to do this because the output object is typically a theano computation graph; ***`layer_activity`*** evaluates this graph for the currently loaded minibatch. As a trial, let us observe the activity of the classifier layer (id ***`'1'`***). The layer activity is just a ***`numpy`*** array of numbers, so to save screen space we print only its shape instead of the whole array.

In [11]:
print(net.layer_activity(id='1').shape)
print(net.layers['1'].output_shape)

(500, 10)
(500, 10)


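The second line of code verifies the shape produced by the first line. An interesting layer output is the output of the ***`objective`*** layer, which gives us the current negative log likelihood of the network, the one that we are trying to minimize. Note that in the run shown in this notebook a flatten layer with id ***`'2'`*** was also added automatically (see the log printed when the layers were added), so the call below displays that layer's output; to read the loss, pass the id reported for the objective layer instead.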

In [12]:
net.layer_activity(id = '2')

array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32)

Once we are done training, we can run the network feedforward on the testing set to produce a generalization performance result.

In [13]:
net.test()



.. Testing
.. Testing accuracy : 87.84


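
For reference, an accuracy figure like the one reported above is usually computed by taking the most probable class for each sample and counting how often it matches the label. The sketch below shows that computation on a tiny, made-up example; the `accuracy` helper and the numbers are illustrative assumptions and not yann's own code.

```python
import numpy as np

def accuracy(probabilities, labels):
    """Percentage of samples whose most probable class equals the label."""
    predictions = np.argmax(probabilities, axis=1)
    return 100.0 * np.mean(predictions == labels)

# Made-up class probabilities for four samples and three classes.
probabilities = np.array([[0.7, 0.2, 0.1],
                          [0.1, 0.8, 0.1],
                          [0.3, 0.3, 0.4],
                          [0.6, 0.3, 0.1]])
labels = np.array([0, 1, 2, 1])

score = accuracy(probabilities, labels)  # three of four correct -> 75.0
```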


Congratulations, you now know how to use the yann toolbox successfully. A full-fledged version of the logistic regression code that we implemented here can be found [here](https://github.com/ragavvenkatesan/yann/blob/master/pantry/tutorials/log_reg.py). That piece of code also has inline comments that briefly discuss other options that could be supplied to some of the function calls we made here and that explain the processes better.
Hope you liked this quick start guide to the Yann toolbox. Have fun!