In [1]:
import keras
keras.__version__

Using TensorFlow backend.


'2.2.4'

# A first look at a neural network

This notebook contains the code samples found in Chapter 2, Section 1 of [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.

----

We will now take a look at a **first concrete example of a neural network**, which makes use of the Python library Keras to learn **to classify 
hand-written digits**. 

Unless you already have experience with Keras or similar libraries, you will not understand everything about this 
first example right away. You probably haven't even installed Keras yet. Don't worry, that is perfectly fine. 

In the next chapter, we will 
review each element in our example and explain them in detail. 

So don't worry if some steps seem arbitrary or look like magic to you! 
We've got to start somewhere.

- The problem we are trying to solve here is **to classify grayscale images of handwritten digits (28 pixels by 28 pixels), into their 10 categories (0 to 9)**. 

- **The dataset** we will use is the **MNIST dataset**, a classic dataset in the machine learning community, which has been 
around for almost as long as the field itself and has been very intensively studied. 
- It's a set of **60,000 training images**, plus **10,000 test images**, assembled by the National Institute of Standards and Technology (the NIST in MNIST) in the 1980s. (A validation set will also be split off from the 60,000 training images.)
- You can think of "solving" MNIST 
as the "Hello World" of deep learning -- it's what you do to verify that your algorithms are working as expected. 
- As you become a machine 
learning practitioner, you will see MNIST come up over and over again, in scientific papers, blog posts, and so on.

### Dataset

70% - train: train a model

10% - validation: model selection
**A validation set is a must.**

10% - test: generalization (measuring performance)

No insight of any kind may be drawn from the test data. The test data is used only after the model has been chosen; it must never be touched before then.
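The split described above can be sketched with NumPy alone. This is a minimal sketch under stated assumptions, not code from the notebook: `split_train_validation` is a hypothetical helper, and the stand-in arrays and the validation size of 100 are made up so the example runs without downloading MNIST.

```python
import numpy as np

def split_train_validation(images, labels, num_validation, seed=0):
    """Shuffle once, then hold out `num_validation` samples.

    Hypothetical helper for illustration; not part of Keras.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    val_idx, train_idx = order[:num_validation], order[num_validation:]
    return (images[train_idx], labels[train_idx]), (images[val_idx], labels[val_idx])

# Stand-in data with MNIST-like shapes (assumed values, not the real dataset).
images = np.zeros((600, 28, 28), dtype=np.uint8)
labels = np.arange(600) % 10

(part_x, part_y), (val_x, val_y) = split_train_validation(images, labels, num_validation=100)
print(part_x.shape, val_x.shape)
```

The test set plays no role here: the validation samples come only from the training data, so the test data stays untouched until a model has been selected.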

### Neural network
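- A neural network is made up of weights. Training means updating those weights, so the ultimate goal is to find the optimal weights.
- The aim is to make the difference between the predicted values and the correct answers as small as possible.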
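To make "updating the weights to shrink the difference" concrete, here is a minimal sketch on a toy problem with a single weight. Everything in it (the data, the learning rate, the number of steps, and the use of plain gradient descent on a mean squared error) is an assumption chosen for illustration; the network in this notebook has hundreds of thousands of weights and is trained by Keras.

```python
# Toy problem (assumed for illustration): learn w so that w * x matches y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0               # initial weight
learning_rate = 0.01  # step size, chosen by hand

for step in range(200):
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad  # one weight update

# Mean squared difference between predictions and correct answers.
loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(w, loss)
```

After the loop, `w` ends up very close to 2 and the loss very close to 0, which is what "finding the optimal weight" means in this toy setting.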

<img src = "images/mnist.png">

The MNIST dataset comes pre-loaded in Keras, in the form of a set of four Numpy arrays:

In [2]:
from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

## Train dataset, Test dataset, validation dataset
- `train_images` and `train_labels` form the "training set", the data that the model will learn from. 
- The model will then be tested on the 
"test set", `test_images` and `test_labels`. 
- Our images are encoded as Numpy arrays, and the labels are simply an array of digits, ranging 
from 0 to 9. 
- There is a one-to-one correspondence between the images and the labels.

Let's have a look at the training data:

In [3]:
train_images.shape  # 60,000 images, each 28x28

(60000, 28, 28)

In [59]:
len(train_images[0])  # each image has 28 rows

28

In [60]:
len(train_labels)

60000

In [61]:
train_labels

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

Let's have a look at the test data:

In [62]:
test_images.shape

(10000, 28, 28)

In [63]:
len(test_labels)

10000

In [64]:
test_labels

array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

### Our workflow will be as follows:
- First, we will present our neural network with the training data, `train_images` and `train_labels`. 
- The 
network will then learn to associate images and labels. 
- Finally, we will ask the network to produce predictions for `test_images`, and we 
will verify if these predictions match the labels from `test_labels`.

Let's build our network -- again, remember that you aren't supposed to understand everything about this example just yet.

## WORKFLOW


### 1. Problem definition
- Focusing only on the data can make the work dull, so think about which problem can actually be solved.

### 2. data availability


### 3. data preparation 
- collection
- preprocessing

### 4. NN architecture design (input / output)
- data representation

### 5. training

### 6. inference with test data set

In [65]:
from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))  # each 28x28 image is flattened into one vector
network.add(layers.Dense(10, activation='softmax'))

#### Fully connected layer (`Dense`): every node is connected to all nodes of the previous layer
- `Dense(512)`: this layer has 512 nodes, each connected to all 784 input values.

#### `Dense(10)`: we want the probability that the input is each of the digits 0 to 9, so each output node computes the probability of one class. That requires 10 nodes.


- Input layer: 28*28 = 784 input nodes
- Hidden layer: 512 nodes
- Output layer: 10 nodes


#### ReLU: max(x, 0)
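As a rough illustration of what one fully connected layer with ReLU computes, here is a minimal NumPy sketch. The `relu` and `dense` helpers and the random weights are assumptions for illustration only; Keras initializes and stores its weights differently.

```python
import numpy as np

def relu(x):
    # Element-wise max(x, 0).
    return np.maximum(x, 0)

def dense(x, weights, bias):
    # A fully connected layer: every output unit sees every input value.
    return relu(x @ weights + bias)

rng = np.random.default_rng(0)
x = rng.random((1, 28 * 28))               # one flattened image (random stand-in)
weights = rng.normal(size=(28 * 28, 512))  # one column of weights per output unit
bias = np.zeros(512)

hidden = dense(x, weights, bias)
print(relu(np.array([-2.0, 0.0, 3.0])))  # negative values are clipped to zero
print(hidden.shape)                      # (1, 512)
```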


In [66]:
network.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_5 (Dense)              (None, 512)               401920    
_________________________________________________________________
dense_6 (Dense)              (None, 10)                5130      
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
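The parameter counts in the summary above can be checked by hand: a `Dense` layer has one weight per (input, unit) pair plus one bias per unit. The helper name `dense_param_count` below is made up for this sketch.

```python
def dense_param_count(num_inputs, num_units):
    # One weight per (input, unit) pair, plus one bias per unit.
    return num_inputs * num_units + num_units

hidden_params = dense_param_count(28 * 28, 512)  # 784 inputs -> 512 units
output_params = dense_param_count(512, 10)       # 512 inputs -> 10 units
print(hidden_params, output_params, hidden_params + output_params)
```

This prints `401920 5130 407050`, matching the `Param #` column and the total reported by `network.summary()`.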


### Architecture of Deep Neural Networks
- The core building block of neural networks is the "**layer**", a data-processing module which you can conceive as a "filter" for data. 
- Some 
data comes in, and comes out in a more useful form. 
- Precisely, layers extract **representations** out of the data fed into them -- hopefully 
**representations** that are more meaningful for the problem at hand. 
- Most of deep learning really consists of chaining together simple layers 
which will implement a form of progressive "data distillation". 
- A deep learning model is like a sieve for data processing, made of a 
succession of increasingly refined data filters -- the "layers".

**Example**: 
- Here our network consists of a sequence of two `Dense` layers, which are densely-connected (also called "fully-connected") neural layers. 
(Only layers that have weights are counted, so the input layer is excluded.)

- The second (and last) layer is a 10-way "softmax" layer, which means it will return an array of 10 probability scores (summing to 1). 


-> Softmax is used as the activation function for multiclass classification.


- Each score will be the probability that the current digit image belongs to one of our 10 digit classes.


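-> Each value is the probability that the input belongs to the corresponding class. There are 10 probabilities in total, and the class with the highest probability becomes the prediction for that input.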
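A minimal NumPy sketch of softmax followed by picking the most probable class. The raw scores below are made-up values, not outputs of the trained network.

```python
import numpy as np

def softmax(scores):
    # Subtract the max first for numerical stability; the result is unchanged.
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Made-up raw scores for the 10 digit classes (index = digit).
scores = np.array([1.0, 0.5, 0.1, 3.2, 0.0, -1.0, 0.3, 2.0, 0.7, -0.4])
probs = softmax(scores)
predicted_digit = int(np.argmax(probs))
print(probs.sum(), predicted_digit)
```

The ten probabilities sum to 1 (up to floating-point rounding), and the predicted digit is 3 because index 3 has the largest score.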


To make our network ready for training, we need to pick three more things, as part of the "compilation" step:

* A loss function: this is how the network will be able to measure how good a job it is doing on its training data, and thus how it will be able to steer itself in the right direction.

- Loss function (error function): a function that defines the **difference** between the predicted value and the actual correct value. Cross-entropy and Euclidean loss are commonly used.
- See the documentation.

* An optimizer: this is the mechanism through which the network will update itself based on the data it sees and its loss function.

-> The optimizer defines how the weights are learned.

* Metrics to monitor during training and testing. Here we will only care about accuracy (the fraction of the images that were correctly 
classified).

-> metric: performance measurement

The exact purpose of the loss function and the optimizer will be made clear throughout the next two chapters.

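- A network is made up of layers (input - hidden - output).
- Why can more hidden layers improve performance? They can represent more complex functions and learn high-level features.
- A major advantage of neural networks is that they can learn features for unknown knowledge on their own, directly from data.
- Enough data is required: performance is good under the assumption that sufficient data is available.
- They are black-box models, which is a drawback: it is hard to interpret why the model produced a given result.
- ★ The most basic assumption is that there is enough data and that the architecture is able to represent it. ★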


##### deep learning = representation learning

In [67]:
# Compile the model

network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy', 
                metrics=['accuracy']) #performance measurement
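To give a feel for what `categorical_crossentropy` measures, here is a minimal NumPy sketch of the textbook formula. It is not the Keras implementation: the function body, the `eps` clipping value, and the example probabilities are all assumptions for illustration.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true: one-hot rows; y_pred: predicted probabilities per class.
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0.0, 1.0, 0.0]])        # the correct class is index 1
confident = np.array([[0.05, 0.90, 0.05]])  # high probability on the right class
unsure = np.array([[0.40, 0.30, 0.30]])     # low probability on the right class

print(categorical_crossentropy(y_true, confident))  # smaller loss
print(categorical_crossentropy(y_true, unsure))     # larger loss
```

The loss only looks at the probability assigned to the correct class, so a confident correct prediction is rewarded with a lower value.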

### Preprocessing
- Convert the input data shape: before training, we will preprocess our data by reshaping it into the shape that the network expects.

-> Each image is a 28*28 2D array, so it has to be flattened into a single vector.


- Normalization: scaling it so that all values are in the `[0, 1]` interval. 

-> Each pixel of an image takes a value from 0 to 255.


Previously, our training images for instance were stored in an array of shape `(60000, 28, 28)` of type `uint8` with 
values in the `[0, 255]` interval. 

We transform it into a `float32` array of shape `(60000, 28 * 28)` with values between 0 and 1.

In [68]:
train_images = train_images.reshape((60000, 28 * 28)) 
train_images = train_images.astype('float32') / 255  # pixel values lie in 0-255, so divide by 255

# The test data must be scaled exactly like the training data.

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

We also need to **categorically encode the labels**, a step which we explain in chapter 3 (multi-class classification):

In [69]:
from keras.utils import to_categorical ### one-hot vector

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
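What categorical (one-hot) encoding does can be sketched in plain NumPy. The `one_hot` helper is a hypothetical stand-in written for illustration; it is not how `to_categorical` is implemented.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype='float32')[labels]

sample = np.array([5, 0, 4])  # the first three training labels shown earlier
encoded = one_hot(sample)
print(encoded.shape)  # (3, 10)
print(encoded[0])     # all zeros except a 1 at index 5
```

Each label becomes a vector of length 10, which lines up with the 10 probabilities produced by the softmax layer.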

We are now ready to train our network, which in Keras is done via a call to the `fit` method of the network: 
we "fit" the model to its training data.

In [70]:
network.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.callbacks.History at 0x1cb8013d240>

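#### epoch: the number of full training passes over the entire training set
#### batch_size: the number of samples fed to the network at once (128 here)

- Each batch of 128 samples produces one loss value, which drives one weight update.

- 60000 / 128 = 468.75, so one epoch takes 469 iterations (the last batch is smaller).

-> With `epochs=5`, this whole pass is repeated five times.

-> The mini-batch size needs to be chosen with care. Our network distinguishes 10 classes, so a batch size of 3 would make little sense: each weight update would be based on only 3 samples. As a rule of thumb, the batch should hold at least 10 samples in this setting.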
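The iteration count above is simple arithmetic and can be checked directly. The variable names below are chosen for this sketch.

```python
import math

num_samples = 60000
batch_size = 128
epochs = 5

steps_per_epoch = math.ceil(num_samples / batch_size)  # the last batch is smaller
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, last_batch_size, total_updates)
```

This prints `469 96 2345`: 468 full batches plus one final batch of 96 samples per epoch, and 2,345 weight updates over five epochs.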

Two quantities are being displayed during training: the "loss" of the network over the training data, and the accuracy of the network over 
the training data.

We quickly reach an accuracy of 0.989 (i.e. 98.9%) on the training data. Now let's check that our model performs well on the test set too:

In [71]:
test_loss, test_acc = network.evaluate(test_images, test_labels)



In [72]:
print('test_acc:', test_acc)

test_acc: 0.9821000099182129



Our test set accuracy turns out to be 98.2% -- somewhat lower than the training set accuracy. 
This gap between training accuracy and test accuracy is an example of "**overfitting**", 
the fact that machine learning models tend to perform worse on new data than on their training data. 
Overfitting will be a central topic in chapter 3.

- This concludes our very first example -- you just saw how we could build and train a neural network to classify handwritten digits, in 
less than 20 lines of Python code. 
- In the next chapter, we will go in detail over every moving piece we just previewed, and clarify what is really 
going on behind the scenes. 
- You will learn about "tensors", the data-storing objects going into the network, about tensor operations, which 
layers are made of, and about gradient descent, which allows our network to learn from its training examples.

### Overfitting
- The model memorizes even patterns that are not important for learning.
- It can occur when too little training data is available.