# Category 3 - Cats vs Dogs Classification

* Classification model built with a Convolutional Neural Network (CNN)
* Data preprocessing with tensorflow-datasets

Computer Vision with CNNs
<br>
<br>For this exercise you will build a cats v dogs classifier
<br>using the Cats v Dogs dataset from TFDS.
<br>Be sure to use the final layer as shown 
<br>    **(Dense, 2 neurons, softmax activation)**
<br>
<br>The testing infrastructure will **resize all images to 224x224**
<br>with **3 bytes of color depth** (3 RGB channels). Make sure your input layer accepts
<br>images of that shape, or the tests will fail.
<br>
<br>Make sure your output layer is exactly as specified here, or the 
<br>tests will fail.


# 1. Import

In [1]:
import tensorflow as tf
import tensorflow_datasets as tfds

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import ModelCheckpoint

# 2. Load dataset

* [View the Cats vs Dogs dataset documentation](https://www.tensorflow.org/datasets/catalog/cats_vs_dogs?hl=ko)

In [2]:
dataset_name = 'cats_vs_dogs'

train_dataset = tfds.load(name=dataset_name, split='train[:80%]')
valid_dataset = tfds.load(name=dataset_name, split='train[80%:]')

Downloading and preparing dataset cats_vs_dogs/4.0.0 (download: 786.68 MiB, generated: Unknown size, total: 786.68 MiB) to /root/tensorflow_datasets/cats_vs_dogs/4.0.0...
Dataset cats_vs_dogs downloaded and prepared to /root/tensorflow_datasets/cats_vs_dogs/4.0.0. Subsequent calls will reuse this data.


# 3. Preprocessing

In [3]:
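def preprocess(data):
  # Unpack the image and its label from the TFDS feature dict
  x = data['image']
  y = data['label']
  
  # Normalize pixel values from [0, 255] to [0, 1]
  x = tf.cast(x, tf.float32) / 255.0

  # Resize to (224, 224) to match the model's input shape
  x = tf.image.resize(x, size=(224, 224))

  return x, y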

In [4]:
batch_size = 32

In [5]:
train_data = train_dataset.map(preprocess).batch(batch_size)
valid_data = valid_dataset.map(preprocess).batch(batch_size)
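`Dataset.batch(batch_size)` groups consecutive elements, and with its default `drop_remainder=False` the last batch keeps whatever is left over, so it can hold fewer than 32 images. The plain-Python sketch below mimics that grouping on a list of 70 hypothetical examples; `batch_like_tf` is a made-up helper for illustration, not part of `tf.data`.

```python
import math

def batch_like_tf(items, batch_size, drop_remainder=False):
    """Hypothetical helper: mimic tf.data.Dataset.batch on a Python list."""
    batches = [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]
    # With drop_remainder=True, a short final batch is discarded.
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches

batch_size = 32
examples = list(range(70))  # 70 hypothetical examples
sizes = [len(b) for b in batch_like_tf(examples, batch_size)]
print(sizes)  # [32, 32, 6]

# Steps per epoch when the remainder is kept
print(math.ceil(len(examples) / batch_size))  # 3
```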

# 4. Define the Model (Sequential)

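1. `input_shape` is (height, width, color_channels). For the cats vs dogs problem it is (224, 224, 3).
2. Stack more layers, increasing the number of filters (output channels) in the deeper ones.
3. Apply `activation='relu'` to the hidden Dense layers.
4. In a classification model, the number of outputs of the final layer must **equal** the number of classes.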
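With two classes, the final `Dense(2, activation='softmax')` layer turns two raw scores (logits) into two probabilities that sum to 1, and the predicted class is the index of the larger one. The plain-Python sketch below shows that step with made-up logits; it assumes the TFDS label order `['cat', 'dog']` (0 = cat, 1 = dog).

```python
import math

def softmax(logits):
    # Subtracting the max improves numerical stability without changing the result
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

class_names = ['cat', 'dog']  # assumed label order: 0 = cat, 1 = dog
logits = [0.5, 2.0]           # hypothetical raw outputs of the last layer
probs = softmax(logits)
pred = max(range(len(probs)), key=lambda i: probs[i])
print([round(p, 3) for p in probs], class_names[pred])  # [0.182, 0.818] dog
```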

In [7]:
model = Sequential([
                    Conv2D(64, (3, 3), input_shape=(224, 224, 3), activation='relu'),
                    MaxPooling2D(2, 2),
                    Conv2D(64, (3, 3), activation='relu'),
                    MaxPooling2D(2, 2),
                    Conv2D(128, (3, 3), activation='relu'),
                    MaxPooling2D(2, 2),
                    Conv2D(128, (3, 3), activation='relu'),
                    MaxPooling2D(2, 2),
                    Conv2D(256, (3, 3), activation='relu'),
                    MaxPooling2D(2, 2),
                    Flatten(),
                    Dropout(0.5),
                    Dense(512, activation='relu'),
                    Dense(216, activation='relu'),
                    Dense(2, activation='softmax')
])

In [8]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_5 (Conv2D)            (None, 222, 222, 64)      1792      
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 111, 111, 64)      0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 109, 109, 64)      36928     
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 54, 54, 64)        0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 52, 52, 128)       73856     
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 26, 26, 128)       0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 24, 24, 128)       1
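The output shapes and parameter counts in the summary (cut off above at `conv2d_8`) follow from simple arithmetic. A 3x3 `Conv2D` with the default `padding='valid'` shrinks each spatial side by 2 and has `(3*3*in_channels + 1) * filters` parameters (the +1 is the bias per filter), `MaxPooling2D(2, 2)` halves each side rounding down, and a `Dense` layer has `inputs * units + units` parameters. The plain-Python sketch below replays the layer list of the model above; it is a hand calculation, not a Keras API.

```python
def conv_out(size, kernel=3):
    # 'valid' padding, stride 1: output side = input side - kernel + 1
    return size - kernel + 1

def pool_out(size, pool=2):
    # pool size 2, stride equal to the pool size, 'valid' padding: floor division
    return size // pool

def summarize(input_side=224, input_channels=3,
              conv_filters=(64, 64, 128, 128, 256), dense_units=(512, 216, 2)):
    rows, side, channels = [], input_side, input_channels
    for filters in conv_filters:
        side = conv_out(side)
        rows.append(("Conv2D", (side, side, filters), (3 * 3 * channels + 1) * filters))
        side = pool_out(side)
        rows.append(("MaxPooling2D", (side, side, filters), 0))
        channels = filters
    units_in = side * side * channels  # size of the Flatten output
    rows.append(("Flatten", (units_in,), 0))
    rows.append(("Dropout", (units_in,), 0))
    for units in dense_units:
        rows.append(("Dense", (units,), units_in * units + units))
        units_in = units
    return rows

rows = summarize()
for name, shape, params in rows:
    print(f"{name:<14}{str(shape):<20}{params}")
print("Total params:", sum(p for _, _, p in rows))
```

The first rows reproduce the values printed by `model.summary()` (1792, 36928, 73856), and the calculation shows that the `Flatten` layer feeds 5 x 5 x 256 = 6400 values into `Dense(512)`, which holds most of the model's parameters.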

# 5. Compile

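1. For the `optimizer`, use 'adam', a widely used optimizer that generally works well with its default settings.
2. Setting the `loss`:
  * If the output layer's activation is `sigmoid`: `binary_crossentropy`
  * If the output layer's activation is `softmax`: 
    * One-hot encoded labels: `categorical_crossentropy`
    * Integer labels (not one-hot encoded): `sparse_categorical_crossentropy`
3. Setting `metrics` to 'acc' or 'accuracy' lets you monitor accuracy during training.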
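For a single example, `sparse_categorical_crossentropy` with an integer label and `categorical_crossentropy` with the equivalent one-hot label compute the same value, `-log(p[true_class])`; they differ only in the label format they expect. The TFDS labels here are integers (0 or 1), which is why the sparse variant is used below. The plain-Python sketch below shows the math on a hypothetical softmax output; it is not the Keras implementation, which also clips probabilities and averages over the batch.

```python
import math

def categorical_ce(one_hot, probs):
    # -sum(y_i * log(p_i)) over classes; zero terms are skipped
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)

def sparse_categorical_ce(label, probs):
    # Same loss, but the true class is given as an integer index
    return -math.log(probs[label])

def binary_ce(label, p):
    # For a single sigmoid output p = P(class 1)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

probs = [0.2, 0.8]  # hypothetical softmax output for one image
label = 1           # integer label, as produced by TFDS
one_hot = [0, 1]    # the same label, one-hot encoded

# All three print about 0.2231 (= -log(0.8))
print(sparse_categorical_ce(label, probs))
print(categorical_ce(one_hot, probs))
print(binary_ce(label, probs[1]))
```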

In [9]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])

# 6. ModelCheckpoint

In [10]:
checkpoint_path = 'my_checkpoint.ckpt'
checkpoint = ModelCheckpoint(filepath=checkpoint_path,
                             save_weights_only=True,
                             save_best_only=True,
                             monitor='val_loss',
                             verbose=1)
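With `monitor='val_loss'` and `save_best_only=True`, the callback remembers the lowest `val_loss` seen so far and writes the weights only when an epoch's value is strictly lower (for a loss, the default `mode='auto'` resolves to minimizing). The plain-Python sketch below replays that rule on hypothetical values; `replay_checkpoint` is a made-up helper, not the Keras source.

```python
import math

def replay_checkpoint(val_losses):
    """Hypothetical helper: return the 1-based epochs at which weights would be saved."""
    best = math.inf
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # only a strictly lower value counts as an improvement
            best = loss
            saved.append(epoch)
    return saved

# Hypothetical validation losses for five epochs
print(replay_checkpoint([0.69, 0.70, 0.68, 0.68, 0.59]))  # [1, 3, 5]
```

Because every save overwrites the same `checkpoint_path`, loading that file after training restores the weights from the epoch with the lowest `val_loss`, not the last epoch.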

# 7. Fit

In [11]:
model.fit(train_data,
          validation_data=valid_data,
          epochs=20,
          callbacks=[checkpoint])

Epoch 1/20

Epoch 00001: val_loss improved from inf to 0.69182, saving model to my_checkpoint.ckpt
Epoch 2/20

Epoch 00002: val_loss did not improve from 0.69182
Epoch 3/20

Epoch 00003: val_loss improved from 0.69182 to 0.68019, saving model to my_checkpoint.ckpt
Epoch 4/20

Epoch 00004: val_loss improved from 0.68019 to 0.68017, saving model to my_checkpoint.ckpt
Epoch 5/20

Epoch 00005: val_loss improved from 0.68017 to 0.59445, saving model to my_checkpoint.ckpt
Epoch 6/20

Epoch 00006: val_loss improved from 0.59445 to 0.52578, saving model to my_checkpoint.ckpt
Epoch 7/20

Epoch 00007: val_loss improved from 0.52578 to 0.45779, saving model to my_checkpoint.ckpt
Epoch 8/20

Epoch 00008: val_loss improved from 0.45779 to 0.41090, saving model to my_checkpoint.ckpt
Epoch 9/20

Epoch 00009: val_loss did not improve from 0.41090
Epoch 10/20

Epoch 00010: val_loss did not improve from 0.41090
Epoch 11/20

Epoch 00011: val_loss improved from 0.41090 to 0.38241, saving model to my_check

<keras.callbacks.History at 0x7f213bd059d0>

# 8. Load Weights

In [12]:
model.load_weights(checkpoint_path)

<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f20c214fc90>