# 3. Getting Started

* Songorithm / Songorithm ML: Part 3 - Theano [1]
* 김무성    

# Contents

* 3.1 Download
* 3.2 Datasets
    - 3.2.1 MNIST Dataset
* 3.3 Notation
    - 3.3.1 Dataset notation
    - 3.3.2 Math Conventions
    - 3.3.3 List of Symbols and acronyms
    - 3.3.4 Python Namespaces
* 3.4 A Primer on Supervised Optimization for Deep Learning
    - 3.4.1 Learning a Classifier
        - Zero-One Loss
        - Negative Log-Likelihood Loss
    - 3.4.2 Stochastic Gradient Descent
    - 3.4.3 Regularization
        - L1 and L2 regularization
        - Early-Stopping
    - 3.4.4 Testing
    - 3.4.5 Recap
* 3.5 Theano/Python Tips
    - 3.5.1 Loading and Saving Models
        - Pickle the numpy ndarrays from your shared variables
        - Do not pickle your training or test functions for long-term storage
    - 3.5.2 Plotting Intermediate Results

In [None]:
docker pull jupyter/scipy-notebook

In [None]:
https://github.com/jupyter/docker-stacks/tree/master/scipy-notebook

In [None]:
docker run -d -p 8888:8888 -e GRANT_SUDO=yes --name run_theano jupyter/scipy-notebook

In [None]:
docker-machine ls

In [None]:
docker ps

In [None]:
http://ip:port/

In [None]:
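The image currently defaults to Python 3.

In [None]:
To switch to Python 2: source activate python2

In [None]:
conda install theano (run this in both the Python 2 and Python 3 environments; you can skip the version you will not use)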

# 3.1 Download

* git clone https://github.com/lisa-lab/DeepLearningTutorials.git

In [13]:
!ls ~/work/DeepLearningTutorials

LICENSE.txt  README.rst  code  data  doc  issues_closed  issues_open  misc


In [14]:
!ls ~/work/DeepLearningTutorials/code

DBN.py		      dA.py		  logistic_cg.py   rbm.py     utils.py
SdA.py		      hmc		  logistic_sgd.py  rnnrbm.py
cA.py		      imdb.py		  lstm.py	   rnnslu.py
convolutional_mlp.py  imdb_preprocess.py  mlp.py	   test.py


# 3.2 Datasets

* 3.2.1 MNIST Dataset

## 3.2.1 MNIST Dataset

<img src="figures/cap3.1.png" width=600 />

* mnist.pkl.gz - http://deeplearning.net/data/mnist/mnist.pkl.gz

In [1]:
!ls

3_Getting_Started.ipynb  figures


In [2]:
!wget http://deeplearning.net/data/mnist/mnist.pkl.gz

converted 'http://deeplearning.net/data/mnist/mnist.pkl.gz' (ANSI_X3.4-1968) -> 'http://deeplearning.net/data/mnist/mnist.pkl.gz' (UTF-8)
--2015-11-23 04:23:26--  http://deeplearning.net/data/mnist/mnist.pkl.gz
Resolving deeplearning.net (deeplearning.net)... 132.204.26.28
Connecting to deeplearning.net (deeplearning.net)|132.204.26.28|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16168813 (15M) [application/x-gzip]
Saving to: 'mnist.pkl.gz'


2015-11-23 04:24:21 (319 KB/s) - 'mnist.pkl.gz' saved [16168813/16168813]



In [1]:
import cPickle, gzip, numpy
# Load the dataset
f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()
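
The cell above relies on Python 2's `cPickle`. Since the Docker image defaults to Python 3, here is a minimal sketch of the same load for Python 3. It assumes the file was pickled under Python 2, which is why `encoding='latin1'` is passed; the helper name `load_pickled_gzip` is made up for illustration.

```python
import gzip
import pickle


def load_pickled_gzip(path):
    """Load one pickled object from a gzip-compressed file (hypothetical helper)."""
    with gzip.open(path, 'rb') as f:
        # encoding='latin1' lets Python 3 decode byte strings and numpy
        # arrays that were pickled under Python 2
        return pickle.load(f, encoding='latin1')
```

With the downloaded file in the working directory, the call would look like `train_set, valid_set, test_set = load_pickled_gzip('mnist.pkl.gz')`.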

In [3]:
def shared_dataset(data_xy):
    """ Function that loads the dataset into shared variables
    
    The reason we store our dataset in shared variables is to allow
    Theano to copy it into the GPU memory (when code is run on GPU).
    Since copying data into the GPU is slow, copying a minibatch every
    time it is needed (the default behaviour if the data is not in a
    shared variable) would lead to a large decrease in performance.
    """
    data_x, data_y = data_xy
    shared_x = theano.shared(numpy.asarray(data_x, dtype=theano.config.floatX))
    shared_y = theano.shared(numpy.asarray(data_y, dtype=theano.config.floatX))
    # When storing data on the GPU it has to be stored as floats,
    # therefore we store the labels as ``floatX`` as well
    # (``shared_y`` does exactly that). During our computations, however,
    # we need them as ints (we use labels as indices, and float indices
    # make no sense), so instead of returning ``shared_y`` directly
    # we cast it to int32. This little hack gets around the issue.
    return shared_x, T.cast(shared_y, 'int32')

In [4]:
import theano
import theano.tensor as T

test_set_x, test_set_y = shared_dataset(test_set)
valid_set_x, valid_set_y = shared_dataset(valid_set)
train_set_x, train_set_y = shared_dataset(train_set)

batch_size = 500 # size of the minibatch

# accessing the third minibatch of the training set
data  = train_set_x[2 * batch_size: 3 * batch_size]
label = train_set_y[2 * batch_size: 3 * batch_size]
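
The slice bounds for a minibatch follow directly from its index and the batch size. The sketch below spells out that arithmetic in plain Python; the function names are hypothetical and not part of the tutorial code.

```python
def minibatch_bounds(index, batch_size):
    """Return the (start, stop) slice bounds of minibatch number `index` (0-based)."""
    return index * batch_size, (index + 1) * batch_size


def count_full_batches(n_examples, batch_size):
    """Number of complete minibatches; a trailing partial batch is dropped."""
    return n_examples // batch_size


# the third minibatch (index 2) with 500 examples per batch
start, stop = minibatch_bounds(2, 500)  # (1000, 1500)
```

The same bounds can be used to slice a shared variable, a numpy array, or a plain list.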

# 3.3 Notation

* 3.3.1 Dataset notation
* 3.3.2 Math Conventions
* 3.3.3 List of Symbols and acronyms
* 3.3.4 Python Namespaces

## 3.3.1 Dataset notation

## 3.3.2 Math Conventions

<img src="figures/cap3.2.png" width=600 />

<img src="figures/cap3.3.png" width=600 />

## 3.3.3 List of Symbols and acronyms

<img src="figures/cap3.4.png" width=600 />

## 3.3.4 Python Namespaces

In [6]:
import theano
import theano.tensor as T
import numpy

# 3.4 A Primer on Supervised Optimization for Deep Learning

* 3.4.1 Learning a Classifier
* 3.4.2 Stochastic Gradient Descent
* 3.4.3 Regularization 
* 3.4.4 Testing
* 3.4.5 Recap

## 3.4.1 Learning a Classifier

* Zero-One Loss
* Negative Log-Likelihood Loss

### Zero-One Loss

#### Reference:

<img src="http://fa.bianp.net/talks/trento_may_2015/img/logistic.svg" width=600 />

<img src="figures/cap3.5.png" width=600 />

<img src="figures/cap3.6.png" width=600 />

In [None]:
# zero_one_loss is a Theano variable representing a symbolic
# expression of the zero-one loss; to get the actual value this
# symbolic expression has to be compiled into a Theano function
# (see the Theano tutorial for more details).
# p_y_given_x is assumed to be a matrix with one row of class
# probabilities per example, and y the vector of integer labels.
zero_one_loss = T.sum(T.neq(T.argmax(p_y_given_x, axis=1), y))
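
To make the symbolic expression concrete, here is a minimal plain-Python sketch of the same quantity: count the examples whose most probable class differs from the label. The probabilities and labels are made-up toy values, and the helper names are illustrative only.

```python
def argmax(row):
    """Index of the largest entry in a list of scores."""
    return max(range(len(row)), key=lambda k: row[k])


def zero_one_loss(p_y_given_x, y):
    """Count the examples whose most probable class differs from the label."""
    predictions = [argmax(row) for row in p_y_given_x]
    return sum(1 for pred, label in zip(predictions, y) if pred != label)


# made-up class probabilities for 3 examples and 3 classes
p_y_given_x = [[0.7, 0.2, 0.1],
               [0.1, 0.3, 0.6],
               [0.2, 0.5, 0.3]]
y = [0, 2, 0]
errors = zero_one_loss(p_y_given_x, y)  # 1: only the third example is misclassified
```

Because the count changes in jumps as the predictions flip, this loss has no useful gradient, which is why training uses the negative log-likelihood instead.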

### Negative Log-Likelihood Loss

<img src="figures/cap3.7.png" width=600 />

<img src="figures/cap3.8.png" width=600 />

In [None]:
# NLL is a symbolic variable; to get the actual value of NLL, this symbolic
# expression has to be compiled into a Theano function (see the Theano
# tutorial for more details)
NLL = -T.sum(T.log(p_y_given_x)[T.arange(y.shape[0]), y])
# note on syntax: T.arange(y.shape[0]) is a vector of integers [0,1,2,...,len(y)-1].
# Indexing a matrix M by the two vectors [0,1,...,K], [a,b,...,k] returns the
# elements M[0,a], M[1,b], ..., M[K,k] as a vector.  Here, we use this
# syntax to retrieve the log-probability of the correct labels, y.
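
The indexing trick can be checked by hand on a small example. The following plain-Python sketch picks the probability of the correct class from each row and sums the negative logs; the toy probabilities and labels are assumptions chosen for illustration.

```python
import math


def negative_log_likelihood(p_y_given_x, y):
    """Sum of -log P(correct class) over all examples."""
    return -sum(math.log(row[label]) for row, label in zip(p_y_given_x, y))


# made-up class probabilities for 3 examples and 3 classes
p_y_given_x = [[0.7, 0.2, 0.1],
               [0.1, 0.3, 0.6],
               [0.2, 0.5, 0.3]]
y = [0, 2, 0]
# picks 0.7, 0.6 and 0.2, so nll = -(log 0.7 + log 0.6 + log 0.2)
nll = negative_log_likelihood(p_y_given_x, y)
```

The third example contributes most to the total because the model assigns its correct class a probability of only 0.2.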

## 3.4.2 Stochastic Gradient Descent

#### Reference [3]

- http://vision.stanford.edu/teaching/cs231n/slides/lecture4.pdf

In [None]:
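# GRADIENT DESCENT
# f, grad_f and stopping_condition are placeholder callables supplied by the caller
def gradient_descent(f, grad_f, params, learning_rate, stopping_condition):
    while True:
        loss = f(params)
        d_loss_wrt_params = grad_f(params) # compute gradient
        params -= learning_rate * d_loss_wrt_params
        if stopping_condition(loss, params):
            return params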
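
As a concrete instance of the loop above, the sketch below minimises a one-dimensional quadratic. The function name `minimize_1d`, the learning rate, and the stopping rule (stop once the update becomes tiny) are all assumptions made for illustration.

```python
def minimize_1d(grad, params, learning_rate=0.1, tolerance=1e-8, max_steps=10000):
    """Plain gradient descent on a scalar parameter (illustrative only)."""
    for _ in range(max_steps):
        step = learning_rate * grad(params)
        params -= step
        if abs(step) < tolerance:  # stopping condition: the update is tiny
            break
    return params


# minimise f(p) = (p - 3)^2, whose gradient is 2 * (p - 3)
minimum = minimize_1d(lambda p: 2.0 * (p - 3.0), params=0.0)
```

With this learning rate each step shrinks the distance to the minimum at `p = 3` by a factor of 0.8, so the loop stops after a few dozen iterations.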

In [None]:
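# STOCHASTIC GRADIENT DESCENT
# f, grad_f and stopping_condition are placeholder callables supplied by the caller
def stochastic_gradient_descent(f, grad_f, params, training_set, learning_rate, stopping_condition):
    # imagine training_set as an infinite generator that may repeat
    # examples (if there is only a finite training set)
    for (x_i, y_i) in training_set:
        loss = f(params, x_i, y_i)
        d_loss_wrt_params = grad_f(params, x_i, y_i) # compute gradient
        params -= learning_rate * d_loss_wrt_params
        if stopping_condition(loss, params):
            return params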

In [None]:
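# MINIBATCH STOCHASTIC GRADIENT DESCENT
# f, grad_f and stopping_condition are placeholder callables supplied by the caller
def minibatch_stochastic_gradient_descent(f, grad_f, params, train_batches, learning_rate, stopping_condition):
    # imagine train_batches as an infinite generator that may repeat examples
    for (x_batch, y_batch) in train_batches:
        loss = f(params, x_batch, y_batch)
        d_loss_wrt_params = grad_f(params, x_batch, y_batch) # compute gradient using theano
        params -= learning_rate * d_loss_wrt_params
        if stopping_condition(loss, params):
            return params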
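
To see the minibatch variant at work without Theano, here is a small self-contained sketch that fits a single slope `w` in `y = w * x` by minibatch SGD on the mean squared error. The toy data, hyperparameters, and function name are assumptions chosen only for illustration.

```python
import random


def fit_slope_minibatch_sgd(data, w=0.0, learning_rate=0.05, batch_size=4,
                            n_epochs=200, seed=0):
    """Fit y ~ w * x by minibatch SGD on the mean squared error."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(n_epochs):
        rng.shuffle(data)  # visit the examples in a new order each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # gradient of mean((w*x - y)^2) with respect to w over the batch
            grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * grad
    return w


# noiseless toy data generated with slope 2
toy_data = [(x / 10.0, 2.0 * x / 10.0) for x in range(1, 11)]
w_hat = fit_slope_minibatch_sgd(toy_data)
```

Averaging the gradient over a batch reduces its variance compared with single-example SGD while still being far cheaper than a pass over the full training set.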

In [None]:
# Minibatch Stochastic Gradient Descent

# assume loss is a symbolic description of the loss function given
# the symbolic variables params (shared variable), x_batch, y_batch;

# compute gradient of loss with respect to params
d_loss_wrt_params = T.grad(loss, params)

# compile the MSGD step into a theano function
updates = [(params, params - learning_rate * d_loss_wrt_params)]
MSGD = theano.function([x_batch,y_batch], loss, updates=updates)

for (x_batch, y_batch) in train_batches:
    # here x_batch and y_batch are elements of train_batches and
    # therefore numpy arrays; function MSGD also updates the params 
    print('Current loss is ', MSGD(x_batch, y_batch))
    if stopping_condition_is_met:  # placeholder flag, set by your own stopping rule
        break

## 3.4.3 Regularization 

* L1 and L2 regularization
* Early-Stopping

### L1 and L2 regularization

<img src="figures/cap3.9.png" width=600 />

#### Reference [2]:

<img src="https://github.com/songorithm/ML/raw/f9f2c631f14613f5051eed28f70ffeb130d9c219/part2/study04/dml07/figures/cap7.21.png" width=600 />

In [None]:
# symbolic Theano variable that represents the L1 regularization term
L1 = T.sum(abs(param))

# symbolic Theano variable that represents the squared L2 term
L2_sqr = T.sum(param ** 2)

# the loss
loss = NLL + lambda_1 * L1 + lambda_2 * L2_sqr
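
A quick numeric check of the regularized loss, written in plain Python. The parameter values, the NLL value, and both lambda coefficients are made-up numbers, and the helper names are hypothetical.

```python
def l1_norm(params):
    """Sum of absolute parameter values (the L1 term)."""
    return sum(abs(p) for p in params)


def l2_sqr(params):
    """Sum of squared parameter values (the squared L2 term)."""
    return sum(p * p for p in params)


def regularized_loss(nll, params, lambda_1, lambda_2):
    """NLL plus weighted L1 and squared L2 penalties."""
    return nll + lambda_1 * l1_norm(params) + lambda_2 * l2_sqr(params)


params = [0.5, -1.5, 2.0]
# L1 = 4.0 and L2_sqr = 6.5, so the total is about 1.0 + 0.04 + 0.0065
total = regularized_loss(nll=1.0, params=params, lambda_1=0.01, lambda_2=0.001)
```

Setting either lambda to zero switches the corresponding penalty off, which is a convenient way to compare regularized and unregularized runs.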

### Early-Stopping

#### Reference [2]:

<img src="https://github.com/songorithm/ML/raw/f9f2c631f14613f5051eed28f70ffeb130d9c219/part2/study04/dml07/figures/cap7.44.png" width=600 />

<img src="https://github.com/songorithm/ML/raw/f9f2c631f14613f5051eed28f70ffeb130d9c219/part2/study04/dml07/figures/cap7.48.png" width=600 />

In [None]:
import copy
import time
import numpy

# early-stopping parameters
patience = 5000 # look at this many examples regardless
patience_increase = 2 # wait this much longer when a new best is
                        # found
improvement_threshold = 0.995 # a relative improvement of this much is
                               # considered significant
validation_frequency = min(n_train_batches, patience // 2) # go through this many
                              # minibatches before checking the network
                              # on the validation set; in this case we
                              # check every epoch
best_params = None
best_validation_loss = numpy.inf
test_score = 0.
start_time = time.time()

done_looping = False
epoch = 0

while (epoch < n_epochs) and (not done_looping):
    # Report "1" for first epoch, "n_epochs" for last epoch
    epoch = epoch + 1
    for minibatch_index in range(n_train_batches):

        d_loss_wrt_params = ... # compute gradient
        params -= learning_rate * d_loss_wrt_params # gradient descent
        
        # iteration number. We want it to start at 0.
        iter = (epoch - 1) * n_train_batches + minibatch_index
        # note that if we do `iter % validation_frequency` it will be
        # true for iter = 0 which we do not want. We want it true for
        # iter = validation_frequency - 1.
        if (iter + 1) % validation_frequency == 0:

            this_validation_loss = ... # compute zero-one loss on validation set 
            
            if this_validation_loss < best_validation_loss:
                                       
                # improve patience if loss improvement is good enough
                if this_validation_loss < best_validation_loss * improvement_threshold:
                    patience = max(patience, iter * patience_increase)
                
                best_params = copy.deepcopy(params)
                best_validation_loss = this_validation_loss

        if patience <= iter:
            done_looping = True
            break

# POSTCONDITION :
# best_params refers to the best out-of-sample parameters observed during the optimization
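
The patience rule is easier to follow on a short trace. The sketch below applies the same rule to a precomputed list of validation losses, treating each entry as one iteration with a validation check. The loss values, the small `patience=4`, and the function name are assumptions for illustration; a real run would compute the losses from a model.

```python
def early_stopping(validation_losses, patience=4, patience_increase=2,
                   improvement_threshold=0.995):
    """Run the patience rule over a precomputed list of validation losses.

    Returns (best_iter, best_loss, last_iter), where last_iter is the
    iteration at which the loop stopped.
    """
    best_loss = float('inf')
    best_iter = None
    last_iter = -1
    for it, loss in enumerate(validation_losses):
        last_iter = it
        if loss < best_loss:
            # extend patience only when the improvement is significant
            if loss < best_loss * improvement_threshold:
                patience = max(patience, it * patience_increase)
            best_loss, best_iter = loss, it
        if patience <= it:
            break
    return best_iter, best_loss, last_iter


losses = [1.0, 0.8, 0.6, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8]
best_iter, best_loss, last_iter = early_stopping(losses)  # (3, 0.5, 6)
```

Here the best loss occurs at iteration 3, which raises patience to 6; no later value improves on it, so the loop stops at iteration 6 without looking at the remaining entries.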

## 3.4.4 Testing

## 3.4.5 Recap

# 3.5 Theano/Python Tips

* 3.5.1 Loading and Saving Models
* 3.5.2 Plotting Intermediate Results

## 3.5.1 Loading and Saving Models

* Pickle the numpy ndarrays from your shared variables
* Do not pickle your training or test functions for long-term storage

### Pickle the numpy ndarrays from your shared variables

In [None]:
import cPickle
save_file = open('path', 'wb') # this will overwrite current contents
cPickle.dump(w.get_value(borrow=True), save_file, -1) # the -1 is for HIGHEST_PROTOCOL
cPickle.dump(v.get_value(borrow=True), save_file, -1) # .. and it triggers much more efficient
cPickle.dump(u.get_value(borrow=True), save_file, -1) # .. storage than numpy's default
save_file.close()

In [None]:
save_file = open('path', 'rb') # read back in the same order the values were dumped
w.set_value(cPickle.load(save_file), borrow=True)
v.set_value(cPickle.load(save_file), borrow=True)
u.set_value(cPickle.load(save_file), borrow=True)
save_file.close()
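
The pattern of several `dump` calls followed by the same number of `load` calls can be tried without Theano. This is a minimal round-trip sketch using Python 3's `pickle` and a temporary file; plain lists stand in for the ndarrays, and the helper names are hypothetical.

```python
import os
import pickle
import tempfile


def save_params(path, params):
    """Dump each parameter value to the same file, one after another."""
    with open(path, 'wb') as f:
        for value in params:
            pickle.dump(value, f, pickle.HIGHEST_PROTOCOL)


def load_params(path, n_params):
    """Read back `n_params` values in the order they were written."""
    with open(path, 'rb') as f:
        return [pickle.load(f) for _ in range(n_params)]


path = os.path.join(tempfile.mkdtemp(), 'params.pkl')
save_params(path, [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]])
restored = load_params(path, 3)
```

The values come back strictly in write order, so the loading code has to know how many objects were dumped and what each one represents.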

### Do not pickle your training or test functions for long-term storage

## 3.5.2 Plotting Intermediate Results

In [1]:
!ls

3_Getting_Started.ipynb  figures  logistic_sgd.py  mnist.pkl.gz


# References

* [1] Deep Learning Tutorial - http://deeplearning.net/tutorial/deeplearning.pdf
* [2] 
* [3] Optimization, higher-level representations, image features - http://vision.stanford.edu/teaching/cs231n/slides/lecture4.pdf