# Lecture Summary

## @ NeuralNet Overview

The linear model covered in Lecture #1 (Lec1) is not yet deep learning.  
To summarize the Lec1 process:  
**(a) compute scores with a linear model (Wx+b), (b) turn them into probabilities that sum to 1 with softmax, and (c) train so that the output matches the class label (this implements multinomial logistic regression, a multi-class classifier; loss function: cross entropy; gradient descent is used to find the optimum)**
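
The three steps summarized above can be sketched in plain NumPy. This is only an illustrative sketch, not the course's code: the toy sizes, the random inputs, and the helper names `softmax` and `cross_entropy` are assumptions made for the example.

```python
import numpy as np

def softmax(scores):
    """Turn each row of scores into probabilities that sum to 1."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, one_hot_labels):
    """Mean cross entropy between predicted probabilities and 1-hot labels."""
    eps = 1e-12  # avoid log(0)
    return float(-np.mean(np.sum(one_hot_labels * np.log(probs + eps), axis=1)))

# Hypothetical toy problem: 2 examples, 3 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 3))
W = rng.normal(size=(3, 3))
b = np.zeros(3)
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])

scores = X @ W + b                   # (a) linear scores, Wx + b
probs = softmax(scores)              # (b) probabilities summing to 1 per row
loss = cross_entropy(probs, labels)  # (c) compare with the class labels
```

Gradient descent would then adjust `W` and `b` to reduce `loss`.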

![s1](img/L2-snap1.png)

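This Udacity (Google) course sets aside the neuron metaphor and the mathematical explanations usually used to introduce neural networks. Instead, it explains how to solve deep network problems from a "lazy engineer" viewpoint, using a CS-style problem-solving approach. (The "Lazy Engineer, No Neuron" approach seems interesting and efficient from a non-CS perspective.)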

To solve non-linear problems (problems of the form W\*X1\*X2 rather than W1\*X1+W2\*X2), a non-linear function such as ReLU (rectified linear unit) must be used as the activation function.
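
A small NumPy sketch may make the need for a non-linearity concrete. It shows that two stacked linear layers collapse into a single linear layer unless something like ReLU sits between them. The array sizes and the names `relu`, `W1`, and `W2` are assumptions chosen for illustration.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))    # hypothetical inputs
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 2))

# Without an activation, two linear layers equal one linear layer (W1 @ W2).
stacked_linear = (x @ W1) @ W2
single_linear = x @ (W1 @ W2)

# With ReLU in between, the result is generally no longer a single linear map.
with_relu = relu(x @ W1) @ W2
```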


![snap2](img/L2-snap2a.png)

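From a software engineer's viewpoint, the non-linear problem can be handled as follows: an extra computation step is added in which the linear output is passed through H ReLU functions.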

![snap2b](img/L2-snap2b.png)

In conclusion, a deep network is built as the following chain of operations.

![snap2b](img/L2-snap2c.png)
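
As a rough sketch of that chain of operations, the forward pass of a network with one hidden layer of H ReLU units could look like this in NumPy. The function name `forward`, the layer sizes, and the small random initial weights are assumptions for the example, not values from the lecture.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    """One hidden layer: linear -> ReLU -> linear -> softmax."""
    hidden = np.maximum(0.0, X @ W1 + b1)   # H ReLU units
    logits = hidden @ W2 + b2
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical sizes: N examples, D inputs, H hidden units, K classes.
N, D, H, K = 5, 784, 16, 10
rng = np.random.default_rng(0)
X = rng.normal(size=(N, D))
W1 = rng.normal(scale=0.01, size=(D, H))
b1 = np.zeros(H)
W2 = rng.normal(scale=0.01, size=(H, K))
b2 = np.zeros(K)

probs = forward(X, W1, b1, W2, b2)  # one row of class probabilities per example
```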

## @ Optimization
* Loss Function: Cross Entropy
![lossfunction](img/L2-loss.png )

* Gradient Descent
![GD](img/L2-GD.png)

* Stochastic Gradient Descent(SGD)
![SGD](img/L2-SGDa.png)

* Ways to improve SGD performance
![SGD](img/L2-SGDb.png)
![SGD](img/L2-SGDc.png)
![SGD](img/L2-SGDd.png)
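
To make the optimization items above concrete, here is a minimal NumPy sketch of minibatch SGD for softmax regression with a cross-entropy loss. It is an assumption-laden illustration: the synthetic three-cluster data, the learning rate, the batch size, and the helper `loss_and_grads` are all made up for the example and are not taken from the lecture.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grads(X, Y, W, b):
    """Mean cross-entropy loss of softmax(XW + b) and its gradients."""
    probs = softmax(X @ W + b)
    loss = -np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1))
    dlogits = (probs - Y) / X.shape[0]
    return loss, X.T @ dlogits, dlogits.sum(axis=0)

# Synthetic, well-separated toy data (hypothetical, not notMNIST).
rng = np.random.default_rng(0)
num_examples, num_features, num_classes = 600, 2, 3
centers = np.array([[0.0, 4.0], [4.0, -4.0], [-4.0, -4.0]])
class_ids = rng.integers(0, num_classes, size=num_examples)
X = centers[class_ids] + rng.normal(size=(num_examples, num_features))
Y = np.eye(num_classes)[class_ids]

W = np.zeros((num_features, num_classes))
b = np.zeros(num_classes)
learning_rate, batch_size = 0.05, 32
initial_loss, _, _ = loss_and_grads(X, Y, W, b)

for step in range(300):
    # SGD: estimate the gradient from a small random batch instead of all data.
    idx = rng.choice(num_examples, size=batch_size, replace=False)
    _, dW, db = loss_and_grads(X[idx], Y[idx], W, b)
    W -= learning_rate * dW
    b -= learning_rate * db

final_loss, _, _ = loss_and_grads(X, Y, W, b)
```

Plain gradient descent would call `loss_and_grads(X, Y, W, b)` on the full data at every step. SGD trades a noisier gradient estimate for much cheaper steps.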


## @ Evaluation&Validation (when using SGD)

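A large learning step that trains quickly is not necessarily good: training can end with the loss still high, so a smaller learning step is generally better.
SGD in particular has many hyperparameters. One option is ADAGRAD (the lecture only mentions it as an option without a detailed explanation); alternatively, as noted above, the learning rate can be lowered over time.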
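
Since the lecture does not go into detail, the following is only a hedged sketch of the two options just mentioned: an exponential learning-rate decay schedule and the standard AdaGrad update rule. The function names, the schedule constants, and the toy objective f(w) = w^2 are assumptions made for illustration.

```python
import numpy as np

def decayed_learning_rate(initial_rate, step, decay_steps, decay_rate):
    """Exponential decay: the rate is multiplied by `decay_rate` every `decay_steps` steps."""
    return initial_rate * decay_rate ** (step / decay_steps)

def adagrad_update(param, grad, grad_sq_sum, learning_rate, eps=1e-8):
    """One AdaGrad step: per-parameter rates shrink as squared gradients accumulate."""
    grad_sq_sum = grad_sq_sum + grad ** 2
    param = param - learning_rate * grad / (np.sqrt(grad_sq_sum) + eps)
    return param, grad_sq_sum

# Hypothetical schedule: start at 0.5 and halve every 1000 steps.
rates = [decayed_learning_rate(0.5, s, 1000, 0.5) for s in (0, 1000, 2000)]

# Minimize f(w) = w^2 (gradient 2w) with AdaGrad, starting from w = 3.
w, acc = np.array([3.0]), np.zeros(1)
for _ in range(200):
    w, acc = adagrad_update(w, 2 * w, acc, learning_rate=0.5)
```

AdaGrad reduces the amount of manual learning-rate tuning because each parameter's effective step size shrinks automatically as gradients accumulate.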

![eval](img/L2-evala.png)
![eval](img/L2-evalb.png)
![eval](img/L2-evalc.png)

----
# Assignment ( #2 )

Previously in 1_notmnist.ipynb, we created a pickle with formatted datasets for training, development and testing on the notMNIST dataset.  
The goal of this assignment is to progressively train deeper and more accurate models using TensorFlow.

**The goal of Assignment #2 is to make the model built in Assignment #1 deeper and more accurate.**  
[Reference code 1](https://github.com/Arn-O/udacity-deep-learning/blob/master/1_notmnist.ipynb)  
[Reference code 2](https://github.com/santiaago/udacity.ud730.deeplearning)

In [32]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

#ymjung
from scipy import stats

Reload the data generated in Assignment #1.

In [33]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a shape that's more adapted to the models we're going to train:
- data as a flat matrix,
- labels as float 1-hot encodings.

In other words: reshape the data into a flat matrix suitable for model training, and label it with 1-hot encoding.

In [34]:
if False:
    # Supplementary notes on reformat():
    # passing -1 to reshape lets NumPy infer that dimension from the remaining ones.
    # Since a is a 2x5 array (10 elements), reshape((5,-1)) infers the number of columns (2).
    a = np.array([[1,2,3,4,5],[6,7,8,9,10]])
    print (stats.describe(a))
    print ("\n")
    print (a.reshape((1,10)))
    print (a.reshape((-1,10))) 
    print (a.reshape((5,-1))) # the unspecified value is inferred to be 2

    # Labels before 1-hot encoding
    print (train_labels)
    print (valid_labels)
    print (test_labels)
    print (test_labels[-10])
    
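    # How np.arange(10) == test_labels[:,None] works: test_labels is reshaped into an N*1 column
    # and compared against 0~9 (arange(10), a 1*10 array); broadcasting yields an N*10 boolean
    # array that is True where the column index equals the label, which is then cast to float.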
    print (test_labels[:,None])
    print (np.arange(10)==test_labels[:,None])

In [35]:
image_size = 28
num_labels = 10

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  # Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels


train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 784) (200000, 10)
Validation set (10000, 784) (10000, 10)
Test set (10000, 784) (10000, 10)
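
The two steps inside `reformat` can be checked on a tiny example. This sketch uses made-up labels and made-up 2x2 "images"; the helper name `one_hot` is an assumption introduced only for the illustration.

```python
import numpy as np

def one_hot(labels, num_labels):
    """Compare an (N, 1) column of labels against 0..num_labels-1 via broadcasting."""
    return (np.arange(num_labels) == labels[:, None]).astype(np.float32)

labels = np.array([0, 2, 1])
encoded = one_hot(labels, num_labels=3)
# encoded is:
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]

# Flattening: three hypothetical 2x2 "images" become three rows of 4 values.
images = np.arange(12).reshape((3, 2, 2))
flat = images.reshape((-1, 2 * 2)).astype(np.float32)
# flat.shape is (3, 4)
```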


We're first going to train a multinomial logistic regression using simple gradient descent.  
TensorFlow works like this:  
- First you describe the computation that you want to see performed: what the inputs, the variables, and the operations look like. These get created as nodes over a computation graph. This description is all contained within the block below:  

    ```python
    with graph.as_default():
        ...
    ```

- Then you can run the operations on this graph as many times as you want by calling session.run(), providing it outputs to fetch from the graph that get returned. This runtime operation is all contained in the block below:  

    ```python
    with tf.Session(graph=graph) as session:
        ...
    ```

Let's load all the data into TensorFlow and build the computation graph corresponding to our training:

**The code follows TensorFlow's workflow: construct the graph, then run a session.**

Build graph:

In [36]:
# With gradient descent training, even this much data is prohibitive.
# Subset the training data for faster turnaround.
train_subset = 10000
graph = tf.Graph()

with graph.as_default():
    # Input data.
    # Load the training, validation and test data into constants that are
    # attached to the graph.
    tf_train_dataset = tf.constant(train_dataset[:train_subset, :])
    tf_train_labels = tf.constant(train_labels[:train_subset])
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    # These are the parameters that we are going to be training. The weight
    # matrix will be initialized using random values following a (truncated)
    # normal distribution. The biases get initialized to zero.
    weights = tf.Variable(tf.truncated_normal([image_size * image_size, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))

    # Training computation.
    # We multiply the inputs with the weight matrix, and add biases. We compute
    # the softmax and cross-entropy (it's one operation in TensorFlow, because
    # it's very common, and it can be optimized). We take the average of this
    # cross-entropy across all training examples: that's our loss.
    logits = tf.matmul(tf_train_dataset, weights) + biases
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))

    # Optimizer.
    # We are going to find the minimum of this loss using gradient descent.
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # Predictions for the training, validation, and test data.
    # These are not part of training, but merely here so that we can report
    # accuracy figures as we train.

    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(tf.matmul(tf_valid_dataset, weights) + biases)
    test_prediction = tf.nn.softmax(tf.matmul(tf_test_dataset, weights) + biases)

Compute:

In [37]:
num_steps = 801

def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels,1))
            / predictions.shape[0])

with tf.Session(graph = graph) as session:
    tf.initialize_all_variables().run()
    print('Initialized')
    for step in range(num_steps):
        # Run the computations. We tell .run() that we want to run the optimizer,
        # and get the loss value and the training predictions returned as numpy arrays.
        _, l, predictions = session.run([optimizer, loss, train_prediction])
        if (step % 100 ==0 ):
            print ('Loss at step %d: %f' %(step,l))
            print ('Training accuracy: %.1f%%' % accuracy(predictions, train_labels[:train_subset, :]))
            # Calling .eval() on valid_prediction is basically like calling run(), but
            # just to get that one numpy array. Note that it recomputes all its graph
            # dependencies.
            print ('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
    print ('Test Accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))



Initialized
Loss at step 0: 16.987694
Training accuracy: 2.4%
Validation accuracy: 12.4%
Loss at step 100: 0.482089
Training accuracy: 95.3%
Validation accuracy: 12.5%
Loss at step 200: 0.267107
Training accuracy: 96.9%
Validation accuracy: 12.0%
Loss at step 300: 0.164763
Training accuracy: 97.7%
Validation accuracy: 11.5%
Loss at step 400: 0.108455
Training accuracy: 98.3%
Validation accuracy: 11.3%
Loss at step 500: 0.075428
Training accuracy: 98.8%
Validation accuracy: 11.1%
Loss at step 600: 0.054257
Training accuracy: 99.1%
Validation accuracy: 10.9%
Loss at step 700: 0.039811
Training accuracy: 99.3%
Validation accuracy: 10.9%
Loss at step 800: 0.030040
Training accuracy: 99.4%
Validation accuracy: 10.8%
Test Accuracy: 10.9%
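
The `accuracy` helper used in the run above can be checked by hand on a small example. The predictions and labels below are made-up values for illustration only.

```python
import numpy as np

def accuracy(predictions, labels):
    """Percentage of rows where the arg-max prediction matches the 1-hot label."""
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])

# Hypothetical predictions for 4 examples over 3 classes.
predictions = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1],
                        [0.3, 0.3, 0.4],
                        [0.6, 0.3, 0.1]])
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype=np.float32)

result = accuracy(predictions, labels)  # 2 of 4 rows match -> 50.0
```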


Let's now switch to stochastic gradient descent training instead, which is much faster.  


The graph will be similar, except that instead of holding all the training data in a constant node, we create a Placeholder node which will be fed actual data at every call of `session.run()`.  

### This time, unlike above, construct the graph using SGD:

In [27]:
batch_size = 128
graph = tf.Graph()

with graph.as_default():
    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    weights = tf.Variable(tf.truncated_normal([image_size * image_size, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))

    # Training computation.
    logits = tf.matmul(tf_train_dataset, weights) + biases
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))

    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(tf.matmul(tf_valid_dataset, weights) + biases)
    test_prediction = tf.nn.softmax(tf.matmul(tf_test_dataset, weights) + biases)

Run:

In [28]:
num_steps = 3001

with tf.Session(graph = graph) as session:
    tf.initialize_all_variables().run()
    print ("Initialized")
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        # Note: we could use better randomization across epochs.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        
        # Generate a minibatch
        batch_data = train_dataset[offset: (offset + batch_size), :]
        batch_labels = train_labels[offset: (offset + batch_size), :]
        
        # Prepare a dictionary telling the session where to feed the minibatch.
        # The key of the dictionary is the placeholder node of the graph to be fed,
        # and the value is the numpy array to feed to it.
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict = feed_dict)
        
        if (step % 500 ==0):
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            print("Validation accuracy: %.1f%%" % accuracy(valid_prediction.eval(), valid_labels))    
            
    print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
    


Initialized
Minibatch loss at step 0: 24.483419
Minibatch accuracy: 2.3%
Validation accuracy: 13.9%
Minibatch loss at step 500: 0.577080
Minibatch accuracy: 94.5%
Validation accuracy: 21.4%
Minibatch loss at step 1000: 0.452049
Minibatch accuracy: 96.1%
Validation accuracy: 20.0%
Minibatch loss at step 1500: 0.406569
Minibatch accuracy: 98.4%
Validation accuracy: 17.4%
Minibatch loss at step 2000: 0.380601
Minibatch accuracy: 96.9%
Validation accuracy: 19.3%
Minibatch loss at step 2500: 1.554680
Minibatch accuracy: 85.2%
Validation accuracy: 26.0%
Minibatch loss at step 3000: 0.991193
Minibatch accuracy: 94.5%
Validation accuracy: 18.0%
Test accuracy: 20.0%
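
The minibatch offset formula in the loop above is easier to follow on small numbers. The sketch below reproduces only that formula; the helper name `minibatch_offsets` is hypothetical. It shows that the offset wraps around, so every slice still contains a full batch.

```python
def minibatch_offsets(num_examples, batch_size, num_steps):
    """Offsets produced by (step * batch_size) % (num_examples - batch_size)."""
    return [(step * batch_size) % (num_examples - batch_size)
            for step in range(num_steps)]

# Small hypothetical case: 10 examples, batches of 4.
small = minibatch_offsets(num_examples=10, batch_size=4, num_steps=5)
# small == [0, 4, 2, 0, 4]

# Sizes from this assignment: 200000 training examples, batches of 128.
offsets = minibatch_offsets(200000, 128, 3001)
```

With the assignment's sizes, the offset first wraps at step 1562, so 3001 steps amount to slightly less than two passes over the training data.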


## Problem
Turn the logistic regression example with SGD into a 1-hidden layer neural network with rectified linear units (nn.relu()) and 1024 hidden nodes. This model should improve your validation / test accuracy.

In [38]:
batch_size = 128

graph = tf.Graph()

with graph.as_default():
    # input data
    # for training data, we use a placeholder to be fed at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32, shape = (batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape = (batch_size, num_labels) )
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    
    """ hidden layer """
    hidden_nodes = 1024
    # variable for hidden layer
    hidden_weights = tf.Variable(tf.truncated_normal([image_size * image_size, hidden_nodes]))
    hidden_biases = tf.Variable(tf.zeros([hidden_nodes]))
    hidden_layer = tf.nn.relu(tf.matmul( tf_train_dataset, hidden_weights) + hidden_biases)# activation function
    
    #Variable
    weights = tf.Variable(tf.truncated_normal([hidden_nodes, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))
    
    #Training computation
    logits = tf.matmul(hidden_layer, weights) + biases
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
    
    #Optimizer
    optimizer = tf.train.GradientDescentOptimizer(.5).minimize(loss)
    
    #Predictions
    train_prediction = tf.nn.softmax(logits)
    valid_relu = tf.nn.relu(tf.matmul(tf_valid_dataset, hidden_weights) + hidden_biases)
    valid_prediction = tf.nn.softmax(tf.matmul(valid_relu, weights)+biases)
    
    test_relu = tf.nn.relu(tf.matmul(tf_test_dataset, hidden_weights)+hidden_biases)
    test_prediction = tf.nn.softmax(tf.matmul(test_relu, weights) + biases)

Execute:

In [42]:
num_steps = 3001

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print("Initialized")
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        # Note: could use better randomization across epochs
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        # Generate a minibatch
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset: (offset + batch_size), :]
        # Prepare a dictionary telling the session where to feed the mini batch
        # The Key of the dictionary is the placeholder node of the graph to be fed, 
        # and the value is the numpy array to feed to it
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels: batch_labels}
        _,l,predictions = session.run([optimizer, loss, train_prediction], feed_dict = feed_dict)
        if (step % 500 == 0):
            print ("Minibatch loss at step %d: %f" % (step, l))
            print ("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            print ("Validation accuracy: %.1f%%" % accuracy(valid_prediction.eval(), valid_labels))
    print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels) )


Initialized
Minibatch loss at step 0: 278.124908
Minibatch accuracy: 1.6%
Validation accuracy: 10.0%
Minibatch loss at step 500: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 11.1%
Minibatch loss at step 1000: 0.000009
Minibatch accuracy: 100.0%
Validation accuracy: 13.8%
Minibatch loss at step 1500: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 10.1%
Minibatch loss at step 2000: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 18.6%
Minibatch loss at step 2500: 10.274952
Minibatch accuracy: 89.8%
Validation accuracy: 28.6%
Minibatch loss at step 3000: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 16.2%
Test accuracy: 16.4%
