# Advanced Vision

- **Instructor**: Jongwoo Lim / Jiun Bae
- **Email**: [jlim@hanyang.ac.kr](mailto:jlim@hanyang.ac.kr) / [jiunbae.623@gmail.com](mailto:jiunbae.623@gmail.com)

## Neural Network Example

In this example you will practice building a simple neural network written using only [NumPy](https://www.numpy.org), the fundamental package for scientific computing with Python. The goals of this example are as follows:

- Understand **Neural Networks** and how they work.
- Learn the basics of how to **write and use code** (*NumPy*).

*If you are more familiar with PyTorch or TensorFlow (or Keras), you might wonder why we write everything from the ground up in NumPy instead of using a framework's built-in layers. Doing so is essential for understanding how a neural network works, and once you understand it, writing it in code is not too difficult.*

This example is written as an [IPython Notebook](https://ipython.org/notebook.html), an interactive computational environment in which you can run code directly.

### Environments

In this assignment, we assume the following environment.

[Python](https://www.python.org) is a programming language that lets you work quickly and integrate systems more effectively. It is widely used in many fields, including machine learning.

[PyTorch](https://pytorch.org) is an open source deep learning platform that provides a seamless path from research to production.

[TensorFlow](https://www.tensorflow.org) is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state of the art in ML and lets developers easily build and deploy ML-powered applications.

The [CUDA®](https://developer.nvidia.com/cuda-zone) Toolkit provides high-performance GPU-accelerated computation. In deep learning, a model takes a very long time to train without GPU acceleration. ~~Even with the GPU, it still takes a lot of time.~~


- [Python3](https://www.python.org/downloads/) (3.6 or above recommended)
- [PyTorch](https://pytorch.org) (1.0 recommended)
- [TensorFlow](https://tensorflow.org) (1.13.0 or above but below 2.0 recommended; *there are huge differences between 2.0 and earlier versions*)
- [NumPy](http://www.numpy.org), the fundamental package for scientific computing with Python


- (Optional) [Anaconda](https://www.anaconda.com/distribution/#download-section), a *popular Python data science platform*
- (Optional) [Jupyter](https://jupyter.org/) (Notebook or Lab)
- (Optional) [CUDA](https://developer.nvidia.com/cuda-downloads) for GPU support


Python packages can be installed with `pip install [package name]` or, when using **Anaconda**, with `conda install [package name]`.
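
To confirm that the packages are importable before running the notebook, a quick check along these lines may help. This is a minimal sketch: the helper name `missing_packages` is hypothetical, and the list of module names is an assumption based on the imports used later in this notebook.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names imported later in this notebook (PIL is provided by Pillow).
required = ['numpy', 'idx2numpy', 'PIL']
print('Missing:', missing_packages(required) or 'nothing')
```

Any name reported as missing can then be installed with `pip` or `conda` as described above.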

*If you have trouble installing or run into any other problem, please contact the TA or jiun.maydev@gmail.com.*

# Code

### Import packages

NumPy is the basic scientific computing package that is customarily used.

In [None]:
from pathlib import Path

In [None]:
import numpy as np

## Load MNIST dataset

TensorFlow provides the MNIST dataset as binary archive files ([link](https://chromium.googlesource.com/external/github.com/tensorflow/tensorflow/+/r0.7/tensorflow/g3doc/tutorials/mnist/download/index.md)).
In this example, the data files have already been downloaded, so just load the dataset from the data directory (`../data/MNIST/raw` in the code below).

In [None]:
import idx2numpy

In [None]:
data_dir = Path('../data/MNIST/raw')

In [None]:
train_images = idx2numpy.convert_from_file(str(data_dir.joinpath('train-images-idx3-ubyte')))
train_labels = idx2numpy.convert_from_file(str(data_dir.joinpath('train-labels-idx1-ubyte')))
test_images = idx2numpy.convert_from_file(str(data_dir.joinpath('t10k-images-idx3-ubyte')))
test_labels = idx2numpy.convert_from_file(str(data_dir.joinpath('t10k-labels-idx1-ubyte')))
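
The MNIST files are stored in the IDX format: a 4-byte header (two zero bytes, a data-type code, and the number of dimensions), followed by one big-endian 32-bit size per dimension and then the raw data. The sketch below illustrates that layout. It is not how `idx2numpy` is implemented; `read_idx_ubyte` is a hypothetical helper that assumes unsigned-byte data (type code `0x08`), which is what the MNIST files contain.

```python
import struct

import numpy as np


def read_idx_ubyte(raw: bytes) -> np.ndarray:
    """Parse an IDX buffer holding unsigned-byte data (type code 0x08)."""
    # Header: two zero bytes, a data-type code, and the number of dimensions.
    zero, dtype_code, ndim = struct.unpack('>HBB', raw[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError('not an unsigned-byte IDX buffer')
    # Each dimension size is a big-endian 32-bit unsigned integer.
    shape = struct.unpack(f'>{ndim}I', raw[4:4 + 4 * ndim])
    data = np.frombuffer(raw, dtype=np.uint8, offset=4 + 4 * ndim)
    return data.reshape(shape)


# Build a tiny in-memory IDX buffer holding two 2x3 "images".
header = struct.pack('>HBB', 0, 0x08, 3) + struct.pack('>3I', 2, 2, 3)
sample = read_idx_ubyte(header + bytes(range(12)))
print(sample.shape)  # (2, 2, 3)
```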

## (Optional) Visualize

In [None]:
from PIL import Image
from IPython.display import display

def show(ary):
    display(Image.fromarray(ary))

In [None]:
for image, label, _ in zip(train_images, train_labels, range(10)):
    print(label)
    show(image.reshape((28, 28)))

## Preprocessing

The data must be preprocessed before training the network. If you inspect the first image in the training set, you will see that the pixel values fall in the range of 0 to 255. We scale these values to a range of 0 to 1 before feeding them to the neural network model by dividing them by 255. It's important that the training set and the testing set are preprocessed in the same way:

In [None]:
train_images = np.expand_dims(train_images, -1)
test_images = np.expand_dims(test_images, -1)

train_images = train_images / 255.
test_images = test_images / 255.

In [None]:
num_classes = 10
train_labels = np.eye(num_classes)[train_labels]
test_labels = np.eye(num_classes)[test_labels]
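
The same steps can be followed on a toy input to see what each one does. This is a small sketch; the arrays `toy_images` and `toy_labels` are made up for illustration and are not part of the dataset.

```python
import numpy as np

# Three hypothetical 2x2 "images" with raw pixel values in [0, 255].
toy_images = np.array([[[0, 255], [51, 102]]] * 3, dtype=np.uint8)
toy_labels = np.array([0, 2, 1])

# Add a trailing channel axis, then scale the pixels to [0, 1].
scaled = np.expand_dims(toy_images, -1) / 255.
print(scaled.shape)                # (3, 2, 2, 1)
print(scaled.min(), scaled.max())  # 0.0 1.0

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the label array converts every label at once.
one_hot = np.eye(3)[toy_labels]
print(one_hot)
```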

## Network

This is a simple network built from dense (fully connected) layers, each followed by a sigmoid activation. The code is quite simple.

The whole network architecture is as follows:

- Dense
- Sigmoid
- Dense
- Sigmoid
- Dense
- Sigmoid
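
One way to picture how a stack like this is applied is to chain the layers with `functools.reduce`, feeding the output of each layer into the next. The sketch below uses plain functions instead of layer classes; the helper `dense`, the layer sizes, and the batch size are assumptions chosen only for illustration.

```python
from functools import reduce

import numpy as np

rng = np.random.default_rng(0)


def dense(input_units, output_units):
    """Return a function computing W x + b for fixed random W and zero b."""
    W = rng.standard_normal((output_units, input_units)) * .01
    b = np.zeros((output_units, 1))
    return lambda x: np.dot(W, x) + b


def sigmoid(x):
    return 1. / (1. + np.exp(-x))


layers = [dense(784, 128), sigmoid, dense(128, 10), sigmoid]
x = rng.random((784, 32))  # (features, batch)

# The first list element seeds the accumulator; each layer then transforms it.
out = reduce(lambda inputs, layer: layer(inputs), [x, *layers])
print(out.shape)  # (10, 32)
```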

### Dense Layer

In [None]:
class Layer:
    pass

class Dense(Layer):
    def __init__(self, input_units, output_units):
        self.weights = np.random.randn(output_units, input_units) * .01
        self.biases = np.random.randn(output_units, 1) * .1
        
    def forward(self, inputs):
        self.inputs = inputs
        
        return np.dot(self.weights, inputs) + self.biases
      
    def backward(self, grads):
        size = np.size(grads, -1)

        self.grad_weights = np.dot(grads, self.inputs.T) / size
        self.grad_biases = np.sum(grads, axis=1, keepdims=True) / size

        return np.dot(self.weights.T, grads)

    def update(self, lr: float = .01):
        # Here we perform a stochastic gradient descent step.
        self.weights = self.weights - lr * self.grad_weights
        self.biases = self.biases - lr * self.grad_biases
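
A common way to gain confidence in a hand-written backward pass is to compare it with finite differences. The sketch below repeats the formulas used in `Dense.backward` on small random arrays. The scalar loss, the helper `numerical_grad`, and all array sizes are assumptions made for this check only.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))  # (output_units, input_units)
b = rng.standard_normal((3, 1))
X = rng.standard_normal((4, 5))  # (input_units, batch)
G = rng.standard_normal((3, 5))  # upstream gradient, same shape as the output


def scalar_loss(W_, X_):
    # A made-up scalar whose gradient w.r.t. the layer output is exactly G.
    return np.sum(G * (np.dot(W_, X_) + b))


def numerical_grad(f, A, h=1e-6):
    """Central finite differences of f() with respect to every entry of A."""
    grad = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        orig = A[idx]
        A[idx] = orig + h
        plus = f()
        A[idx] = orig - h
        minus = f()
        A[idx] = orig  # restore the entry before moving on
        grad[idx] = (plus - minus) / (2 * h)
    return grad


batch_size = X.shape[-1]
analytic_dW = np.dot(G, X.T) / batch_size  # same formula as Dense.backward
analytic_dX = np.dot(W.T, G)               # value returned by Dense.backward

numeric_dW = numerical_grad(lambda: scalar_loss(W, X), W) / batch_size
numeric_dX = numerical_grad(lambda: scalar_loss(W, X), X)

print(np.allclose(analytic_dW, numeric_dW, atol=1e-6))
print(np.allclose(analytic_dX, numeric_dX, atol=1e-6))
```

Note that the weight gradient is averaged over the batch while the gradient passed to the previous layer is not, which mirrors the layer above.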

### Activation Function: Sigmoid

In [None]:
class Sigmoid(Layer):
    def forward(self, inputs):
        self.inputs = inputs
        return 1. / (1. + np.exp(-inputs))

    def backward(self, grads):
        r = self.forward(self.inputs)
        return grads * r * (1. - r)
    
    def update(self, lr):
        pass
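
The backward pass relies on the identity that the derivative of the sigmoid is `r * (1 - r)`, where `r` is the sigmoid output. A minimal sketch to check this numerically, with a standalone `sigmoid` function and evaluation points chosen only for illustration:

```python
import numpy as np


def sigmoid(x):
    return 1. / (1. + np.exp(-x))


x = np.linspace(-5., 5., 11)
h = 1e-5

# Analytic derivative used in Sigmoid.backward versus central differences.
analytic = sigmoid(x) * (1. - sigmoid(x))
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

print(np.max(np.abs(analytic - numeric)))  # very small
print(analytic.max())  # 0.25, reached at x = 0
```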

### Training

In [None]:
from typing import List
from functools import reduce


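def fit(networks: List[Layer], X, y, train=True, epsilon=1e-7):
    # Flatten every image into a vector: (batch, 28, 28, 1) -> (batch, 784)
    X = np.reshape(X, (X.shape[0], -1))
    
    # Forward: layers work on (features, batch), so transpose in and out
    preds = reduce(lambda inputs, layer: layer.forward(inputs), [X.T, *networks]).T
    
    # Compute Loss (cross-entropy averaged over the batch);
    # clipping keeps the log and the divisions below finite
    clipped = np.clip(preds, epsilon, 1. - epsilon)
    loss = -np.sum(y * np.log(clipped)) / np.size(preds, 0)
    
    if train:
        # Backward: gradient of the element-wise binary cross-entropy w.r.t. preds
        grads = -(np.divide(y, clipped) - np.divide(1 - y, 1 - clipped))
        grads = reduce(lambda grads, layer: layer.backward(grads), [grads.T, *reversed(networks)])
    
        # Update (`lr` is the global learning rate defined in the Prepare section)
        for layer in networks:
            layer.update(lr)
    
    return loss.mean(), preds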
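
To make the loss and the initial gradient in `fit` concrete, here is a small worked example. The labels and predictions are made up, and the clipping with `epsilon` is left out for readability.

```python
import numpy as np

# Two hypothetical samples, three classes.
y = np.array([[1., 0., 0.],
              [0., 1., 0.]])
preds = np.array([[0.8, 0.1, 0.3],
                  [0.2, 0.5, 0.4]])

# Loss as in `fit`: only the true-class prediction of each sample contributes.
loss = -np.sum(y * np.log(preds)) / np.size(preds, 0)
print(loss)  # equals -(log 0.8 + log 0.5) / 2

# Gradient handed to the last layer, as in `fit`:
# -1 / p where the label is 1, and 1 / (1 - p) where the label is 0.
grads = -(np.divide(y, preds) - np.divide(1 - y, 1 - preds))
print(grads)
```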

## Prepare

In [None]:
def get_batch(datasets, batch):
    # Yield consecutive (images, labels) slices of `batch` samples each;
    # the final batch is smaller when the dataset size is not a multiple of `batch`.
    images, labels = datasets
    for start in range(0, len(images), batch):
        yield images[start:start + batch], labels[start:start + batch]

In [None]:
np.random.seed(42)

In [None]:
lr = .1
batch = 128
epochs = 10

In [None]:
network = [
    Dense(28*28, 10),
    Sigmoid(),
]

## Train

In [None]:
for epoch in range(epochs):
    train_loss, test_loss, test_acc = 0, 0, 0
    train_batches, test_batches = 0, 0

    # Train scope
    for images, labels in get_batch((train_images, train_labels), batch):
        loss, _ = fit(network, images, labels)
        train_loss += loss
        train_batches += 1
        
    # Evaluation scope (no parameter updates)
    for images, labels in get_batch((test_images, test_labels), batch):
        loss, preds = fit(network, images, labels, train=False)

        test_loss += loss
        test_acc += (preds.argmax(axis=-1) == labels.argmax(axis=-1)).mean()
        test_batches += 1

    # Average over the number of batches actually processed
    print(f'Epoch: {epoch}')
    print(f'\tTrain Loss: {train_loss / train_batches}')
    print(f'\tTest Loss: {test_loss / test_batches}')
    print(f'\tTest Acc: {test_acc / test_batches}')

## Test

In [None]:
for image, label, _ in zip(test_images, test_labels, range(10)):
    show((image[:, :, 0] * 255.).astype(np.uint8))
    # Flatten the image into a (784, 1) column vector and run a forward pass
    inputs = np.reshape(image, (-1, 1))
    prediction = reduce(lambda inputs, layer: layer.forward(inputs), [inputs, *network])
    print(f'Label: {label.argmax(-1)}, Prediction: {prediction.argmax()}')

## Q1. More complex model

We can create more complex models by adding layers to the network.

What makes the next model (*two dense layers*) different from the previous one-layer model?

In [None]:
network = [
    Dense(28*28, 128),
    Sigmoid(),
    Dense(128, 10),
    Sigmoid(),
]

## Q2. What is an activation function?

What is the role of the Sigmoid in the middle? What would happen without it?

What other functions could replace Sigmoid?

In [None]:
network = [
    Dense(28*28, 128),
    Dense(128, 10),
    Sigmoid(),
]

## Q3. And then more, more, more ... layers?

What will happen if you stack a lot of layers?

In [None]:
network = [
    Dense(28*28, 600),
    Sigmoid(),
    Dense(600, 500),
    Sigmoid(),
    Dense(500, 400),
    Sigmoid(),
    Dense(400, 300),
    Sigmoid(),
    Dense(300, 200),
    Sigmoid(),
    Dense(200, 100),
    Sigmoid(),
    Dense(100, 10),
    Sigmoid(),
]

## Q4. Parameters

Check the size of the model parameters.

In [None]:
def sizeof(model):
    # Total number of trainable parameters; layers without weights/biases contribute 0
    return sum(getattr(layer, attr, np.empty(0)).size for attr in ['weights', 'biases'] for layer in model)
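
As a sanity check, the parameter count of a dense layer is `input_units * output_units + output_units`. The sketch below counts parameters for two of the models in this notebook. The stand-in classes and the helper `count_parameters` are hypothetical and only assume the attribute names `weights` and `biases` used by the `Dense` layer above.

```python
import numpy as np


class Dense:
    # Stand-in with the same parameter shapes as the Dense layer above.
    def __init__(self, input_units, output_units):
        self.weights = np.zeros((output_units, input_units))
        self.biases = np.zeros((output_units, 1))


class Sigmoid:
    pass  # no trainable parameters


def count_parameters(model):
    """Sum the sizes of all weight and bias arrays found in the model."""
    return sum(getattr(layer, attr).size
               for layer in model
               for attr in ('weights', 'biases')
               if hasattr(layer, attr))


one_layer = [Dense(28 * 28, 10), Sigmoid()]
two_layers = [Dense(28 * 28, 128), Sigmoid(), Dense(128, 10), Sigmoid()]

print(count_parameters(one_layer))   # 7850 = 784 * 10 + 10
print(count_parameters(two_layers))  # 101770 = (784 * 128 + 128) + (128 * 10 + 10)
```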