In [4]:
# Install Keras (the leading "!" runs pip as a shell command from the notebook)
!pip install keras

In [5]:
import keras
keras.__version__

In [None]:
!nvidia-smi

Fri Nov 25 06:18:08 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   66C    P0    31W /  70W |    996MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

# A first look at a neural network

This notebook contains the code samples found in Chapter 2, Section 1 of [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.

----

<p align="left"><img src="https://drive.google.com/uc?export=download&id=1t0vNrvVcu1dyqiYxAgKdUV7RH-kocAG5" width="800"/></p>




We will now take a look at a **first concrete example of a neural network**, which makes use of the Python library Keras to learn **to classify 
hand-written digits**. 

Unless you already have experience with Keras or similar libraries, you will not understand everything about this 
first example right away. You probably haven't even installed Keras yet. Don't worry, that is perfectly fine. 

In the next chapter, we will 
review each element in our example and explain them in detail. 

So don't worry if some steps seem arbitrary or look like magic to you! 
We've got to start somewhere.

- The problem we are trying to solve here is **to classify grayscale images of handwritten digits (28 pixels by 28 pixels), into their 10 categories (0 to 9)**. 

- **The dataset** we will use is the **MNIST dataset**, a classic dataset in the machine learning community, which has been 
around for almost as long as the field itself and has been very intensively studied. 
- It's a set of **60,000 training images**, plus **10,000 test 
images**, assembled by the National Institute of Standards and Technology (the NIST in MNIST) in the 1980s. 
- You can think of "solving" MNIST 
as the "Hello World" of deep learning -- it's what you do to verify that your algorithms are working as expected. 
- As you become a machine 
learning practitioner, you will see MNIST come up over and over again, in scientific papers, blog posts, and so on.

<p align="left"><img src="https://drive.google.com/uc?export=view&id=1JWgEwdPYlhItjkTX33Ic1o9MvjMdlo0o" width="600"/></p>

The MNIST dataset comes pre-loaded in Keras, in the form of a set of four NumPy arrays:

In [None]:
from tensorflow.keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

## Train dataset, Test dataset, validation dataset
- `train_images` and `train_labels` form the "training set", the data that the model will learn from. 
- The model will then be tested on the 
"test set", `test_images` and `test_labels`. 
- Our images are encoded as NumPy arrays, and the labels are simply an array of digits, ranging from 0 to 9. 
- There is a one-to-one correspondence between the images and the labels.

Let's have a look at the training data:

In [None]:
train_images.shape

(60000, 28, 28)

In [None]:
len(train_labels)

60000

In [None]:
train_labels

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In [None]:
train_images.dtype

dtype('uint8')

Let's have a look at the test data:

In [None]:
test_images.shape

(10000, 28, 28)

In [None]:
len(test_labels)

10000

In [None]:
test_labels

array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

In [None]:
test_images.dtype

dtype('uint8')
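The shapes and dtypes shown above can be explored without downloading anything. The following is a minimal sketch that uses a small synthetic stand-in for MNIST (random pixels, an assumption made only for illustration) to show how the first axis indexes samples and how each image pairs with exactly one label. The names `images`, `labels`, and `sample_index` are hypothetical.

```python
import numpy as np

# Synthetic stand-in for MNIST (assumption: random pixels, not real digits),
# laid out like the real arrays: (num_samples, height, width) plus one label each.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=6, dtype=np.uint8)

# Axis 0 indexes samples, so images[i] is paired with labels[i].
sample_index = 3
digit = images[sample_index]   # a single 28x28 grayscale image
label = labels[sample_index]   # its integer class, between 0 and 9

print(images.ndim, images.shape, images.dtype)
print(digit.shape, int(label))
```

With the real dataset, the same indexing applies to `train_images` and `train_labels`.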

### Our workflow will be as follows: 
- First, we will present our neural network with the training data, `train_images` and `train_labels`. 
- The network will then learn to associate images and labels. 
- Finally, we will ask the network to produce predictions for `test_images`, and we will verify whether these predictions match the labels from `test_labels`.

Let's build our network -- again, remember that you aren't supposed to understand everything about this example just yet.

In [None]:
from tensorflow.keras import models  # provides Sequential, used below
from tensorflow.keras import layers

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(256, activation='relu'))
network.add(layers.Dense(128, activation='relu'))
network.add(layers.Dense(10, activation='softmax'))

### Architecture of Deep Neural Networks
- The core building block of neural networks is the "**layer**", a data-processing module which you can conceive as a "filter" for data. 
- Some 
data comes in, and comes out in a more useful form. 
- Precisely, layers extract **representations** out of the data fed into them -- hopefully **representations** that are more meaningful for the problem at hand.  
(<font color="blue">**A layer extracts, from the data fed into it, representations that are more meaningful for the problem at hand.**</font>)
- Most of deep learning really consists of chaining together simple layers 
which will implement a form of progressive "data distillation".  
(<font color="blue">**Most deep learning models are built by chaining simple layers together, progressively refining the data.**</font>)
- A deep learning model is like a sieve for data processing, made of a 
succession of increasingly refined data filters -- the "layers".  
(<font color="blue">**A deep learning model is like a sieve for data processing, built from a succession of data-refining filters (the layers).**</font>)
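To make the idea of a layer as a data filter more concrete, here is a minimal NumPy sketch of what a `Dense` layer with a `relu` activation computes: an affine transform followed by an element-wise maximum with zero. The weights are hypothetical random values chosen for illustration (a real network learns them during training), and `dense_relu` is an illustrative helper, not a Keras function.

```python
import numpy as np

def dense_relu(inputs, weights, bias):
    """Affine transform followed by ReLU: max(0, inputs @ weights + bias)."""
    return np.maximum(0.0, inputs @ weights + bias)

rng = np.random.default_rng(0)
batch = rng.random((4, 784)).astype("float32")   # 4 flattened 28x28 images

# Hypothetical small random weights; training would adjust these values.
w1 = rng.normal(0, 0.05, (784, 512)).astype("float32")
b1 = np.zeros(512, dtype="float32")
w2 = rng.normal(0, 0.05, (512, 256)).astype("float32")
b2 = np.zeros(256, dtype="float32")

# Chaining layers: each output becomes the next layer's input.
hidden1 = dense_relu(batch, w1, b1)     # shape (4, 512)
hidden2 = dense_relu(hidden1, w2, b2)   # shape (4, 256)
print(hidden1.shape, hidden2.shape)
```

Each call maps the batch to a new representation with a different width, which is the "chaining" the bullets above describe.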




**Example**: 
- Here our network consists of a sequence of four `Dense` layers, which are densely-connected (also called "fully-connected") neural layers. 
- The fourth (and last) layer is <U>a 10-way "softmax" layer</U>, which means it will return an array of 10 probability scores (summing to 1). 
- Each score will be the probability that the current digit image belongs to one of our 10 digit classes.
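As a rough illustration of what the 10-way softmax layer returns, the sketch below turns ten raw scores into probabilities that sum to 1 and picks the most likely digit. The input scores are made up for the example, and `softmax` here is a plain NumPy helper rather than the Keras implementation.

```python
import numpy as np

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Made-up raw scores, one per digit class 0..9 (assumption for illustration).
scores = np.array([1.0, 2.0, 0.5, 0.1, 3.0, 0.0, -1.0, 0.3, 1.5, 0.2])
probs = softmax(scores)

predicted_digit = int(np.argmax(probs))   # index of the highest probability
print(probs.round(3))
print("sum:", float(probs.sum()), "predicted digit:", predicted_digit)
```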


To make our network ready for training, we need to pick three more things, as part of <B>the "compilation" step</B>:

* <B>A loss function</B>: this is how the network will be able to measure how good a job it is doing on its training data, and thus how it will be 
able to steer itself in the right direction.  
(how the network's performance on the training data is measured)
* <B>An optimizer</B>: this is the mechanism through which the network will update itself based on the data it sees and its loss function.  
(how the network is updated based on the input data and the loss function)
* <B>Metrics to monitor during training and testing</B>. Here we will only care about accuracy (the fraction of the images that were correctly 
classified).  

The exact purpose of the loss function and the optimizer will be made clear throughout the next two chapters.

In [None]:
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])
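The loss and the metric named in the compile step can be sketched in a few lines of NumPy. This is a simplified illustration, not Keras's implementation: it assumes one-hot labels and predicted probabilities, and the tiny three-class arrays are invented for brevity. The helper names `categorical_crossentropy` and `accuracy` are hypothetical stand-ins.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the label."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# Invented example: three samples, three classes (MNIST would have ten).
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype="float32")
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.5, 0.3, 0.2]], dtype="float32")

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print("loss:", round(loss, 4), "accuracy:", round(acc, 4))
```

The third sample is misclassified, so it contributes the largest term to the loss and lowers the accuracy to two out of three.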

### Preprocessing
- Convert the input data shape: before training, we will preprocess our data by reshaping it into the shape that the network expects.  
(reshape the data to the size the network expects)
- Normalization: we also scale the data so that all values are in the `[0, 1]` interval. 

Previously, our training images for instance were stored in an array of shape `(60000, 28, 28)` of type `uint8` with 
values in the <U>`[0, 255]` interval</U>. 

We transform it into a `float32` array of shape `(60000, 28 * 28)` with values <font color="blue"><U>between 0 and 1</U></font>.

In other words, this example changes three things:  
- shape: `(60000, 28, 28)` --> `(60000, 28*28)`
- type: `uint8` --> `float32`
- Normalization: `[0, 255]` --> `[0, 1]`

In [None]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

We also need to **categorically encode the labels**, a step which we explain in chapter 3 (multi-class classification):

In [None]:
from tensorflow.keras.utils import to_categorical  # same package as the other imports

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
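Categorical (one-hot) encoding turns each integer label into a vector of zeros with a single 1 at the label's index. The following NumPy sketch illustrates the idea; `one_hot` is a hypothetical helper written for this example and is not the Keras `to_categorical` implementation. The sample labels are the first three training labels shown earlier.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a 1 at each label's index."""
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

labels = np.array([5, 0, 4], dtype=np.uint8)   # first three training labels
encoded = one_hot(labels)

print(encoded.shape)
print(encoded[0])                  # a 1 at index 5, zeros elsewhere
print(np.argmax(encoded, axis=1))  # argmax recovers the original labels
```

This is also why the output layer has 10 units: each one-hot label has the same shape as one row of the network's softmax output.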


We are now ready to train our network, which in Keras is done via a call to <B>the `fit` method</B> of the network: 
we "fit" the model to its training data.  
(In Keras, calling the `fit` method trains the model on the training data.)

In [None]:
network.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f63c01546d0>

Two quantities are being displayed during training: 
- the <B>"loss"</B> of the network over the training data
- the <B>accuracy</B> of the network over the training data.

We quickly reach an accuracy of 0.989 (i.e. 98.9%) on the training data.  
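The `epochs=5` and `batch_size=128` arguments passed to `fit` determine how many weight updates take place. The arithmetic below is a small sketch, assuming (as is the default for NumPy inputs) that the final, smaller batch of each epoch is kept rather than dropped.

```python
import math

num_samples = 60000   # size of the training set
batch_size = 128
epochs = 5

# One weight update per batch; the last batch of an epoch may be smaller.
steps_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)  # 469 96 2345
```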

<B>Now let's check that our model performs well on the test set too</B>:  
(<font color="blue">The `evaluate` method lets us check the model's generalization performance, that is, its performance on the test dataset.</font>)

In [None]:
test_loss, test_acc = network.evaluate(test_images, test_labels)  # default batch size: 32



In [None]:
print('test_acc:', test_acc)

test_acc: 0.9793000221252441



Our test set accuracy turns out to be 97.9% -- that's quite a bit lower than the training set accuracy.   
<font color="blue">This gap between training accuracy and test accuracy is an example of "**overfitting**"</font>, 
the fact that machine learning models tend to perform worse on new data than on their training data. 
Overfitting will be a central topic in chapter 3.
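The size of the gap can be computed directly from the two figures reported above. This is a minimal sketch using the training accuracy quoted in the text and the printed test accuracy (rounded); the variable names are illustrative.

```python
train_acc = 0.989    # training accuracy quoted in the text
test_acc = 0.9793    # test accuracy printed above, rounded to four decimals

# A positive gap means the model does better on data it has already seen.
gap = train_acc - test_acc
print(f"gap: {gap:.4f} ({gap * 100:.2f} percentage points)")
```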

- This concludes our very first example -- you just saw how we could build and train a neural network to classify handwritten digits, in 
less than 20 lines of Python code. 
- In the next chapter, we will go in detail over every moving piece we just previewed, and clarify what is really 
going on behind the scenes. 
- You will learn about "tensors", the data-storing objects going into the network, about tensor operations, which 
layers are made of, and about gradient descent, which allows our network to learn from its training examples.  
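(The next chapter covers <B>tensors (the form in which data fed into the network is stored), tensor operations (the operations that layers are made of), and gradient descent (the method by which the network learns from training samples)</B>.)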

<p align="left"><img src="https://mml.pstatic.net/www/mobile/edit/20240320_1095/upload_17109012439022q3Q5.gif" width="800"/></p>



<p align="left"><img src="https://drive.google.com/uc?export=view&id=1kYZcfREcHsqOtjItmicEXcwByG6UkGhm" width="800"/></p>