## 5.1 Introduction to convnets (CNNs)

* A simple convnet example

In [None]:
from tensorflow.keras import layers
from tensorflow.keras import models

model = models.Sequential() 
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))  # params: (3*3*1 + 1) * 32 = 320
model.add(layers.MaxPooling2D((2, 2)))  # (2, 2) is the pooling window (height, width)
model.add(layers.Conv2D(64, (3, 3), activation='relu'))  # params: (3*3*32 + 1) * 64 = 18,496. Note that a 3*3 kernel also spans all 32 input channels, so one filter has 3*3*32 weights + 1 bias
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))  # params: (3*3*64 + 1) * 64 = 36,928

* A CNN takes as input tensors of shape `(image_height, image_width, image_channels)` (not including the batch dimension).

In [None]:
model.summary()  # the first entry of every output shape is None because that axis is the batch dimension

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 3, 3, 64)          36928     
                                                                 
Total params: 55,744
Trainable params: 55,744
Non-traina

* Note that the output of every `Conv2D` and `MaxPooling2D` layer is a 3D tensor of shape `(height, width, channels)`.
* The width and height dimensions tend to shrink as you go deeper in the network.
* The number of channels is controlled by the first argument passed to the `Conv2D` layers (32 or 64).
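
The parameter counts and output shapes in the summary above can be checked by hand. Below is a minimal sketch in plain Python (no Keras needed). The helper names `conv2d_params` and `valid_output_size` are made up for illustration and are not Keras functions. It assumes the defaults used in this model: unpadded (`valid`) convolutions with stride 1, and 2\*2 pooling with stride 2.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter; each filter spans every input channel."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def valid_output_size(size, window, stride=1):
    """Height or width after an unpadded ('valid') convolution or pooling step."""
    return (size - window) // stride + 1


# Parameter counts of the three Conv2D layers.
params = [
    conv2d_params(3, 3, 1, 32),
    conv2d_params(3, 3, 32, 64),
    conv2d_params(3, 3, 64, 64),
]
print(params, sum(params))

# Height/width through conv -> pool -> conv -> pool -> conv, starting at 28.
size = 28
sizes = []
for window, stride in [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]:
    size = valid_output_size(size, window, stride)
    sizes.append(size)
print(sizes)
```

The printed values should line up with the `Param #` and `Output Shape` columns of `model.summary()`.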

In [None]:
model.add(layers.Flatten())  # an operation that unrolls the 3D tensor into a 1D vector
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

In [None]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 3, 3, 64)          36928     
                                                                 
 flatten (Flatten)           (None, 576)               0
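
The `576` in the `Flatten` row is simply 3\*3\*64. As a rough check, the parameter counts of the two `Dense` layers can be worked out with the usual `(inputs + 1) * units` rule. This is a sketch with a made-up helper name (`dense_params`); the numbers are computed from the formula, not copied from the output above.

```python
def dense_params(inputs, units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return (inputs + 1) * units


flat = 3 * 3 * 64                    # size of the flattened (3, 3, 64) feature map
dense_1 = dense_params(flat, 64)     # Dense(64) on top of Flatten
dense_2 = dense_params(64, 10)       # Dense(10) softmax classifier
total = 55_744 + dense_1 + dense_2   # 55,744 = convolutional part, from the first summary
print(flat, dense_1, dense_2, total)
```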

* Training the CNN on the MNIST digits

In [None]:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [None]:
train_images.shape

(60000, 28, 28)

In [None]:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28, 28, 1))  # add a channel axis: (60000, 28, 28) -> (60000, 28, 28, 1)
train_images = train_images.astype('float32') / 255  # convert the 0-255 integers to floats and rescale them to [0, 1]

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

train_labels = to_categorical(train_labels)  # one-hot encode the labels
test_labels = to_categorical(test_labels)

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])  # metrics lists what to monitor during training; here, accuracy
model.fit(train_images, train_labels, epochs=5, batch_size=64)  # each update computes the gradient on a mini-batch of 64 randomly drawn samples

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fd3d40a5490>
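
To make the preprocessing in the cell above concrete, here is a small plain-Python sketch of what one-hot encoding and pixel rescaling do, plus the number of gradient updates per epoch. The function `one_hot` is a hypothetical stand-in written for this note; it only mimics the basic behaviour of `to_categorical` for integer labels.

```python
import math


def one_hot(labels, num_classes):
    """Return one row per label with a single 1.0 at the label's index."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


encoded = one_hot([5, 0, 4], num_classes=10)
print(encoded[0])

# Pixel rescaling: integers in 0..255 become floats in [0, 1].
pixels = [0, 128, 255]
scaled = [p / 255 for p in pixels]
print(scaled)

# Gradient updates per epoch with 60,000 samples and batch_size=64
# (the last batch is smaller than 64).
updates_per_epoch = math.ceil(60000 / 64)
print(updates_per_epoch)
```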

In [None]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
test_acc

> ### The convolution operation

* The difference between a densely connected layer and a convolution layer
  * `Dense` layers learn global patterns in their input feature space.
  * Convolution layers learn local patterns: in the case of images, patterns found in small 2D windows of the inputs.
  
   ><img src="https://drive.google.com/uc?id=15vIiBlYgRb94J-nS7z9_n47W5m5kO-n2" width="300">
  
* Properties of CNNs
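  * The patterns they learn are translation invariant.<br>
  Translation invariance is a very important property.<br>
  A convolutional layer is translation *equivariant*: translating the input and then convolving gives the same result as convolving and then translating.<br>
  Translation *invariant* means that the result stays the same even after the input is translated.<br>
  Invariant and equivariant are not the same thing.<br>
  A single convolution is equivariant to translation, whereas a convolutional network that stacks several convolutions (together with pooling) is approximately invariant to translation.<br>
  (Note to self: this is still unclear; rewatch the first lecture of week 8, around the 20-minute mark.)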
  * They can learn spatial hierarchies of patterns.
  
  ><img src="https://drive.google.com/uc?id=163vTpjJeDI7rPPwIcjAFZlK1l7vb83WV" width="400">
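
The difference between equivariance and invariance is easier to see on a tiny 1D example. The sketch below uses plain Python lists; `correlate_valid` and `shift_right` are illustrative helpers written for this note, and the example assumes the pattern stays away from the borders so that nothing is cut off by the shift.

```python
def correlate_valid(signal, kernel):
    """Slide the kernel over the signal without padding (what a conv layer computes)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def shift_right(values, steps):
    """Translate a sequence to the right, filling the vacated positions with zeros."""
    return [0] * steps + values[: len(values) - steps]


signal = [0, 0, 1, 2, 1, 0, 0, 0, 0]
kernel = [1, 0, -1]

conv_then_shift = shift_right(correlate_valid(signal, kernel), 2)
shift_then_conv = correlate_valid(shift_right(signal, 2), kernel)

# Equivariance: the feature map moves along with the input.
print(conv_then_shift == shift_then_conv)

# Invariance: a global max over the feature map ignores where the pattern is.
print(max(correlate_valid(signal, kernel)), max(shift_then_conv))
```

The convolution output shifts together with the input (equivariance), while a pooled summary such as the global maximum does not change at all (invariance).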
  
* Convolutions operate over 3D tensors, called *feature maps*, with two spatial axes (*height* and *width*) as well as *depth* axis (also called the *channels* axis).
  * For an RGB image, the dimension of the depth axis is 3.
  
* The convolution operation extracts patches from its input feature map and applies the same transformation to all of these patches, producing an *output feature map*.
  * The output feature map is still a 3D tensor with user-specified depth.
  * The different channels in that depth axis stand for *filters*.<br>
  The output depth is determined by the number of convolution filters.
  
  ><img src="https://drive.google.com/uc?id=164ODzRvJ43VcwgIRKT6T88sFG4T6Pqs_" width="600">
  
* Convolutions are defined by two key parameters (the two core hyperparameters).
  * *Size of the patches extracted from the inputs*: Typically, 3\*3 or 5\*5 (the kernel size)
  * *Depth of the output feature map* (the number of filters, which determines the depth seen by the next layer)
  
* A convolution works by sliding windows of size 3\*3 or 5\*5 over the 3D input feature map.
  * At every possible location, it extracts the 3D patch of surrounding features, then transforms that patch (via a tensor product with the same learned weight matrix, called the *convolution kernel*) into a 1D vector of shape `(output_depth,)`.
  * All of these vectors are then spatially reassembled into a 3D output map of shape `(height, width, output_depth)`.
  * Every spatial location in the output feature map corresponds to the same location in the input feature map.

  ><img src="https://drive.google.com/uc?id=16I-SdIdrbCbIVIDoSs1pbAoO6Otry9nY" width="600"><br>The dot-product result above is produced by the kernel: there are three filters, each with a 3\*3 window, which is why the output looks this way.<br>Applying the hyperparameters (3, (3, 3)), i.e. 3 filters with a 3\*3 kernel, to an input of shape (5, 5, 2) gives an output of shape (3, 3, 3).<br>In (3, 3, 3), the first two 3s come from 5 - 3 + 1, and the last 3 is the number of filters.
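
The sliding-window procedure described above can be written out in a few loops. This is a deliberately slow, plain-Python sketch for a (5, 5, 2) input and 3 filters of size 3\*3; the function name `conv2d_valid` and the constant test values are assumptions chosen for illustration, not how Keras implements `Conv2D` internally.

```python
def conv2d_valid(feature_map, kernels, biases):
    """Unpadded 2D convolution.

    feature_map: nested list of shape [H][W][C_in]
    kernels:     nested list of shape [F][kh][kw][C_in]
    returns:     nested list of shape [H - kh + 1][W - kw + 1][F]
    """
    height, width = len(feature_map), len(feature_map[0])
    kh, kw = len(kernels[0]), len(kernels[0][0])
    output = []
    for i in range(height - kh + 1):
        row = []
        for j in range(width - kw + 1):
            vector = []  # one value per filter: the 1D vector of shape (output_depth,)
            for kernel, bias in zip(kernels, biases):
                total = bias
                for di in range(kh):
                    for dj in range(kw):
                        for c, weight in enumerate(kernel[di][dj]):
                            total += weight * feature_map[i + di][j + dj][c]
                vector.append(total)
            row.append(vector)
        output.append(row)
    return output


# Input of shape (5, 5, 2) filled with ones.
fmap = [[[1.0, 1.0] for _ in range(5)] for _ in range(5)]
# Three 3*3 filters; filter f has every weight equal to f + 1.
kernels = [[[[float(f + 1)] * 2 for _ in range(3)] for _ in range(3)] for f in range(3)]
biases = [0.0, 0.0, 0.0]

out = conv2d_valid(fmap, kernels, biases)
shape = (len(out), len(out[0]), len(out[0][0]))
print(shape, out[0][0])
```

Each output location holds one number per filter, so the output depth equals the number of filters, and each spatial side is 5 - 3 + 1 = 3.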
  
* Note that the output width and height may differ from the input width and height.
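  * Border effects<br>
  Without padding, a window cannot be centered on the border pixels, so the output is smaller than the input. With a stride, any trailing rows or columns that do not fill a complete window are dropped as well.<br>
  For example, a 3\*3 input with a 2\*2 kernel and stride 2 discards the last row and column.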
  * The use of *strides*
  

* **Border effects and padding**<br>
In short: convolving a 5\*5 input with a 3\*3 kernel gives a 3\*3 output, and zero padding is what lets you bring the output back to 5\*5.

  * Consider a 5\*5 feature map and convolution operation with kernel size 3\*3. Then, the output feature map will be 3\*3.

  ><img src="https://drive.google.com/uc?id=168uo4gVzAYTHs1THFG-mMWYyM4jz5JOw" width="700">
  
  * If you want to get an output feature map with the same spatial dimensions as the input, you can use *padding*.
    * Padding consists of adding an appropriate number of rows and columns on each side of the input feature map.
    
  ><img src="https://drive.google.com/uc?id=16AP8rF498xwxNkjAkTA78Ny7mAOav0xF" width="700">
    
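  * In `Conv2D` layers, padding is configurable via the `padding` argument, which takes two values: `valid` and `same`.
  <br>In other words, the `padding` argument of `Conv2D` can be set to either `valid` or `same`.<br> `valid` means no padding, so only valid window locations are used.<br> `same` means the input is padded so that the output has the same width and height as the input (at stride 1).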
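
The effect of `padding` and `strides` on the output size can be summarised in two small formulas. The sketch below is plain Python with hypothetical helper names (`conv_output_size`, `same_padding_total`); the arithmetic follows the usual `valid`/`same` conventions.

```python
import math


def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Output height or width of a convolution along one axis."""
    if padding == "valid":
        return (size - kernel) // stride + 1
    if padding == "same":
        return math.ceil(size / stride)
    raise ValueError("padding must be 'valid' or 'same'")


def same_padding_total(size, kernel, stride=1):
    """Total number of zero rows (or columns) that 'same' padding adds along one axis."""
    out = math.ceil(size / stride)
    return max((out - 1) * stride + kernel - size, 0)


print(conv_output_size(5, 3))                    # 5*5 input, 3*3 kernel, no padding
print(conv_output_size(5, 3, padding="same"))    # zero padding keeps the size
print(same_padding_total(5, 3))                  # one row/column on each side
print(conv_output_size(3, 2, stride=2))          # 3*3 input, 2*2 kernel, stride 2
```

The last call is the border-effect example from the notes: only one full window fits, so the final row and column of the input are never used.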
    
    

* **Convolution strides**

  * The distance between two successive windows is a parameter of the convolution, called its *stride*.
  
  * It is possible to have *strided convolutions*: convolutions with a stride higher than 1.

  ><img src="https://drive.google.com/uc?id=16M-MZjLIS0qpZpzVILlwrYuA8X5xcEq8" width="700">
  
  * Using stride 2 means that the width and height of the feature map are downsampled by a factor of 2.
  
  * To downsample feature maps, we can also use the *max-pooling* operation.
 

> ### The max-pooling operation

* Max pooling consists of extracting windows from the input feature maps and outputting the max value of each channel.
  * It is conceptually similar to convolution, except that instead of transforming local patches via a learned linear transformation (the convolution kernel), they are transformed via a hardcoded `max` tensor operation.
  
* Max pooling is usually done with 2\*2 windows and stride 2, in order to downsample the feature maps by a factor of 2.

* Why downsample feature maps?

In [None]:
model_no_max_pool = models.Sequential() 
model_no_max_pool.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1))) 
model_no_max_pool.add(layers.Conv2D(64, (3, 3), activation='relu')) 
model_no_max_pool.add(layers.Conv2D(64, (3, 3), activation='relu'))

model_no_max_pool.summary()
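
The shapes that this summary reports can be worked out by hand. The following sketch uses plain arithmetic; the `Dense(512)` layer in it is hypothetical (it is not part of `model_no_max_pool`) and is only there to show how large a classifier on top of this feature map would become.

```python
size = 28
receptive_field = 1
for _ in range(3):             # three unpadded 3*3 convolutions, no pooling
    size = size - 3 + 1
    receptive_field += 3 - 1   # each 3*3 layer widens the input window by 2

coefficients = size * size * 64            # values in the final feature map, per sample
dense_params = (coefficients + 1) * 512    # hypothetical Dense(512) stacked on top
print(size, receptive_field, coefficients, dense_params)
```

Without downsampling, the last layer still only sees a 7\*7 window of the original image, while the final feature map is large enough to make a dense classifier on top very expensive.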

* The reason to use downsampling is to reduce the number of feature-map coefficients to process, as well as induce spatial-filter hierarchies by making successive convolution layers look at increasingly large windows.

* Max pooling is not the only way to downsample.
  * Strided convolutions, average pooling, etc.
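
Max pooling and average pooling differ only in the function applied to each window. A minimal plain-Python sketch for a single channel follows; `pool2d` and `mean` are illustrative helpers written for this note, not Keras functions.

```python
def pool2d(channel, size=2, stride=2, reduce=max):
    """Apply `reduce` to every size*size window of a 2D list (one channel)."""
    rows, cols = len(channel), len(channel[0])
    output = []
    for i in range(0, rows - size + 1, stride):
        row = []
        for j in range(0, cols - size + 1, stride):
            window = [channel[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(reduce(window))
        output.append(row)
    return output


def mean(values):
    return sum(values) / len(values)


channel = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
max_pooled = pool2d(channel)               # 2*2 windows, stride 2
avg_pooled = pool2d(channel, reduce=mean)  # same windows, averaged instead
print(max_pooled)
print(avg_pooled)
```

Both versions halve the height and width; in a real layer the same operation is applied independently to every channel.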

## 5.2 Training a convnet from scratch on a small dataset

* Having to train an image-classification model using very little data is a common situation.

* Here, we will review several strategies to tackle the small dataset problem.
  * Data augmentation
  * Feature extraction with a pretrained network
  * Fine-tuning a pretrained network
  
* The Dogs vs. Cats dataset (https://www.kaggle.com/c/dogs-vs-cats/data)
  * Download URL: https://drive.google.com/uc?id=1AmgANN-SJmCMtLs6CTsZOyY9_W5DVCMT
  * Medium-resolution color JPEGs
  * 25,000 images of dogs and cats (12,500 from each class)
  * We will use a subset of this dataset.
    * A training set with 1,000 samples of each class
    * A validation set with 500 samples of each class
    * A test set with 500 samples of each class
    
* **Load the dataset**

In [None]:
# mount Google Drive
from google.colab import drive
drive.mount('/content/gdrive')

# unzip
import zipfile, os, shutil

dataset = '/content/gdrive/My Drive/deep_learning/deep_learning_lecture/dogs_vs_cats_subset.zip'
dst_path = '/content/dogs_vs_cats_subset'  # Most important part: with too many files the data cannot be handled on Google Drive, so it is stored under /content on the Colab instance and loaded from there
dst_file = os.path.join(dst_path, 'dogs_vs_cats_subset.zip')

if not os.path.exists(dst_path):
  os.makedirs(dst_path)

# copy zip file
shutil.copy(dataset, dst_file)
  
with zipfile.ZipFile(dst_file, 'r') as file:
  file.extractall(dst_path)

Mounted at /content/gdrive


In [None]:
%cd /content

/content


In [None]:
!pwd

/content


In [None]:
!ls -al

total 24
drwxr-xr-x 1 root root 4096 Apr 26 03:41 .
drwxr-xr-x 1 root root 4096 Apr 26 03:36 ..
drwxr-xr-x 1 root root 4096 Apr 19 14:22 .config
drwxr-xr-x 4 root root 4096 Apr 26 03:41 dogs_vs_cats_subset
drwx------ 5 root root 4096 Apr 26 03:41 gdrive
drwxr-xr-x 1 root root 4096 Apr 19 14:23 sample_data


In [None]:
%cd dogs_vs_cats_subset/

/content/dogs_vs_cats_subset


In [None]:
!ls -al

total 88752
drwxr-xr-x 4 root root     4096 Apr 26 03:41 .
drwxr-xr-x 1 root root     4096 Apr 26 03:41 ..
-rw------- 1 root root 90863632 Apr 26 03:41 dogs_vs_cats_subset.zip
drwxr-xr-x 3 root root     4096 Apr 26 03:41 __MACOSX
drwxr-xr-x 5 root root     4096 Apr 26 03:41 subset


In [None]:
%cd subset

/content/dogs_vs_cats_subset/subset


In [None]:
!ls -al

total 36
drwxr-xr-x 5 root root  4096 Apr 26 03:41 .
drwxr-xr-x 4 root root  4096 Apr 26 03:41 ..
-rw-r--r-- 1 root root 12292 Apr 26 03:41 .DS_Store
drwxr-xr-x 4 root root  4096 Apr 26 03:41 test
drwxr-xr-x 4 root root  4096 Apr 26 03:41 train
drwxr-xr-x 4 root root  4096 Apr 26 03:41 validation


In [None]:
%cd train

/content/dogs_vs_cats_subset/subset/train


In [None]:
!ls -al

total 92
drwxr-xr-x 4 root root  4096 Apr 26 03:41 .
drwxr-xr-x 5 root root  4096 Apr 26 03:41 ..
drwxr-xr-x 2 root root 36864 Apr 26 03:41 cats
drwxr-xr-x 2 root root 36864 Apr 26 03:41 dogs
-rw-r--r-- 1 root root  8196 Apr 26 03:41 .DS_Store


In [None]:
train_cats_dir = os.path.join(dst_path, 'subset/train/cats')
train_dogs_dir = os.path.join(dst_path, 'subset/train/dogs')

validation_cats_dir = os.path.join(dst_path, 'subset/validation/cats')
validation_dogs_dir = os.path.join(dst_path, 'subset/validation/dogs')

test_cats_dir = os.path.join(dst_path, 'subset/test/cats')
test_dogs_dir = os.path.join(dst_path, 'subset/test/dogs')

print('total training cat images:', len(os.listdir(train_cats_dir)))
print('total training dog images:', len(os.listdir(train_dogs_dir)))

print('total validation cat images:', len(os.listdir(validation_cats_dir)))
print('total validation dog images:', len(os.listdir(validation_dogs_dir)))

print('total test cat images:', len(os.listdir(test_cats_dir)))
print('total test dog images:', len(os.listdir(test_dogs_dir)))

total training cat images: 1000
total training dog images: 1000
total validation cat images: 500
total validation dog images: 500
total test cat images: 500
total test dog images: 500


* **Building the network**
  * Note that we are dealing with bigger images and a more complex problem than MNIST.
  * We will make the network larger.
  * Here, we start from inputs of size 150\*150, and end up with feature maps of size 7\*7 just before the `Flatten` layer.
  * The depth of the feature maps progressively increases in the network, whereas the size of the feature maps decreases. This is a pattern you'll see in almost all CNNs.
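
The 150\*150 to 7\*7 progression can be traced by hand. The sketch below assumes the architecture defined in the next cell (four blocks of an unpadded 3\*3 convolution followed by 2\*2 max pooling, with 128 channels in the last block); the helper name `conv_pool_sizes` is made up for this note.

```python
def conv_pool_sizes(size, blocks):
    """Height/width after each Conv2D (3*3, valid) and each MaxPooling2D (2*2)."""
    sizes = []
    for _ in range(blocks):
        size = size - 3 + 1   # unpadded 3*3 convolution
        sizes.append(size)
        size = size // 2      # 2*2 max pooling with stride 2 (odd sizes round down)
        sizes.append(size)
    return sizes


sizes = conv_pool_sizes(150, blocks=4)
flat = sizes[-1] * sizes[-1] * 128      # input size of the Flatten layer's output
dense_params = (flat + 1) * 512         # parameters of the Dense(512) layer
print(sizes, flat, dense_params)
```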


In [None]:
from tensorflow.keras import layers 
from tensorflow.keras import models

model = models.Sequential() 
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3))) 
model.add(layers.MaxPooling2D((2, 2))) 
model.add(layers.Conv2D(64, (3, 3), activation='relu')) 
model.add(layers.MaxPooling2D((2, 2))) 
model.add(layers.Conv2D(128, (3, 3), activation='relu')) 
model.add(layers.MaxPooling2D((2, 2))) 
model.add(layers.Conv2D(128, (3, 3), activation='relu')) 
model.add(layers.MaxPooling2D((2, 2))) 
model.add(layers.Flatten()) 
model.add(layers.Dense(512, activation='relu')) 
model.add(layers.Dense(1, activation='sigmoid'))

In [None]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 148, 148, 32)      896       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 74, 74, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 72, 72, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 36, 36, 64)       0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 34, 34, 128)       73856     
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 17, 17, 128)      0

In [None]:
from tensorflow.keras import optimizers

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(learning_rate=1e-4),
              metrics=['acc'])

* **Data preprocessing**

  * Currently, the data is stored on a drive as JPEG files. So we need the following steps:
    * Read the picture files.
    * Decode the JPEG content to RGB grids of pixels.
    * Convert these into floating-point tensors.
    * Rescale the pixel values (between 0 and 255) to the `[0,1]` interval.
    
  * Keras has a module with image-processing helper tools, located at `keras.preprocessing.image`.
  
  * In particular, it contains the class `ImageDataGenerator`, which lets us quickly set up Python generators that can automatically turn image files on disk into batches of preprocessed tensors.

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_dir = os.path.join(dst_path, 'subset/train')
validation_dir = os.path.join(dst_path, 'subset/validation')

train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(train_dir,
                                                    target_size=(150,150),  # resize every image to 150*150 (a 100*100 image is enlarged, a 200*200 image is shrunk)
                                                    batch_size=20,          # number of images yielded per batch
                                                    class_mode='binary')    # class_mode decides how the classes are turned into labels;
                                                                            # with two classes (cats and dogs), the labels are 0 or 1

validation_generator = test_datagen.flow_from_directory(validation_dir, 
                                                        target_size=(150,150),
                                                        batch_size=20,
                                                        class_mode='binary')

Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.


In [None]:
for data_batch, labels_batch in train_generator:
  print('data batch shape:', data_batch.shape)
  print('labels batch shape:', labels_batch.shape)
  break

data batch shape: (20, 150, 150, 3)
labels batch shape: (20,)


In [None]:
labels_batch

array([1., 1., 1., 0., 0., 1., 1., 0., 0., 0., 1., 1., 1., 0., 0., 1., 1.,
       0., 1., 0.], dtype=float32)

In [None]:
print(data_batch[0].shape, data_batch[0])

(150, 150, 3) [[[0.7411765  0.5921569  0.43921572]
  [0.7372549  0.5882353  0.43529415]
  [0.7294118  0.5803922  0.427451  ]
  ...
  [0.5568628  0.5568628  0.40784317]
  [0.5568628  0.5568628  0.40784317]
  [0.5568628  0.5568628  0.40784317]]

 [[0.7176471  0.5686275  0.4156863 ]
  [0.7137255  0.5647059  0.41176474]
  [0.7058824  0.5568628  0.4039216 ]
  ...
  [0.5568628  0.5568628  0.40784317]
  [0.5568628  0.5568628  0.40784317]
  [0.5568628  0.5568628  0.40784317]]

 [[0.70980394 0.5568628  0.4039216 ]
  [0.7019608  0.54901963 0.39607847]
  [0.69803923 0.54509807 0.3921569 ]
  ...
  [0.5568628  0.5568628  0.40784317]
  [0.5568628  0.5568628  0.40784317]
  [0.5568628  0.5568628  0.40784317]]

 ...

 [[0.654902   0.5294118  0.32941177]
  [0.65882355 0.53333336 0.34117648]
  [0.6666667  0.5411765  0.34901962]
  ...
  [0.7294118  0.6156863  0.45098042]
  [0.7294118  0.6156863  0.45098042]
  [0.7294118  0.6156863  0.45098042]]

 [[0.654902   0.5294118  0.32941177]
  [0.65882355 0.5333333

* Let's fit the model to the data using the generator. The original API for this is the `fit_generator` method; in recent TensorFlow 2 releases, `fit` accepts generators directly and `fit_generator` is deprecated.
  * It expects as its first argument a Python generator that will yield batches of inputs and targets indefinitely.
  * Because the data is being generated endlessly, the Keras model needs to know how many samples to draw from the generator before declaring an epoch over.
    * This is the role of the `steps_per_epoch` argument.
    * In this case, each batch contains 20 samples, so it takes 100 batches to see all 2,000 training samples once.
    * Similarly, the `validation_steps` argument is required if you pass a generator as `validation_data`.<br> (Note to self: revisit this part, lecture 2 at around the 28-minute mark; it is still unclear.)
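
The relationship between an endless generator, `steps_per_epoch`, and `validation_steps` can be illustrated without Keras. In the sketch below, `endless_batches` is a hypothetical stand-in for a directory iterator (it does not shuffle or load images); only the counting logic is the point.

```python
import itertools
import math


def endless_batches(samples, batch_size):
    """Yield batches forever, wrapping around the data when it is exhausted."""
    while True:
        for start in range(0, len(samples), batch_size):
            yield samples[start:start + batch_size]


train_samples = list(range(2000))   # stand-in for the 2,000 training images
batch_size = 20

steps_per_epoch = math.ceil(len(train_samples) / batch_size)
validation_steps = math.ceil(1000 / batch_size)   # 1,000 validation images
print(steps_per_epoch, validation_steps)

# Drawing exactly steps_per_epoch batches covers every training sample once.
epoch = list(itertools.islice(endless_batches(train_samples, batch_size), steps_per_epoch))
seen = sum(len(batch) for batch in epoch)
print(seen)
```

Because the generator never stops on its own, the step counts are what tell the training loop where one epoch (or one validation pass) ends.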

In [None]:
# `fit` accepts generators directly; `fit_generator` is deprecated in TensorFlow 2.
history = model.fit(train_generator,
                    steps_per_epoch=100,  # number of iterations (batches) per epoch
                    epochs=30,
                    validation_data=validation_generator,
                    validation_steps=50)


Epoch 1/30
Epoch 2/30

KeyboardInterrupt: ignored

In [None]:
model.save('cats_and_dogs_subset_1.h5')

In [None]:
import matplotlib.pyplot as plt

acc = history.history['acc'] 
val_acc = history.history['val_acc'] 
loss = history.history['loss'] 
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc') 
plt.plot(epochs, val_acc, 'b', label='Validation acc') 
plt.title('Training and validation accuracy') 
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss') 
plt.plot(epochs, val_loss, 'b', label='Validation loss') 
plt.title('Training and validation loss') 
plt.legend()

plt.show()

* **Data augmentation**

  * Overfitting is caused by having too few samples to learn from, rendering you unable to train a model that can generalize to new data.

  * Data augmentation takes the approach of generating more training data from existing training samples, by *augmenting* the samples via a number of random transformations.
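
To get a feel for what "random transformations" means, here is a toy sketch on a tiny 2D image stored as a list of rows. The functions `horizontal_flip`, `shift_horizontally`, and `augment` are written for this note only; they imitate the spirit of `horizontal_flip=True`, `width_shift_range`, and `fill_mode='nearest'` but are not the Keras implementation.

```python
import random


def horizontal_flip(image):
    """Mirror each row of a 2D image (a list of rows)."""
    return [row[::-1] for row in image]


def shift_horizontally(image, offset):
    """Shift columns by `offset` pixels, filling gaps with the nearest edge pixel."""
    width = len(image[0])
    return [
        [row[min(max(col - offset, 0), width - 1)] for col in range(width)]
        for row in image
    ]


def augment(image, rng, max_shift=1):
    """Apply a random flip followed by a random horizontal shift."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    return shift_horizontally(image, rng.randint(-max_shift, max_shift))


image = [[1, 2, 3],
         [4, 5, 6]]
rng = random.Random(0)   # seeded so repeated runs give the same variants
variants = [augment(image, rng) for _ in range(4)]
for variant in variants:
    print(variant)
```

Every variant has the same shape and the same label as the original, which is why augmentation adds variety without adding genuinely new information.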

In [None]:
datagen = ImageDataGenerator(rotation_range=40,
                             width_shift_range=0.2,
                             height_shift_range=0.2,
                             shear_range=0.2,
                             zoom_range=0.2,
                             horizontal_flip=True,
                             fill_mode='nearest')

In [None]:
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing import image

fnames = [os.path.join(train_cats_dir, fname) for fname in os.listdir(train_cats_dir)]

img_path = fnames[3]

img = image.load_img(img_path, target_size=(150,150))

x = image.img_to_array(img)
x = x.reshape((1,) + x.shape)

i=0
for batch in datagen.flow(x, batch_size=1):
  plt.figure(i)
  imgplot = plt.imshow(image.array_to_img(batch[0]))
  i += 1
  if i%4 == 0: break

In [None]:
model = models.Sequential() 
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3))) 
model.add(layers.MaxPooling2D((2, 2))) 
model.add(layers.Conv2D(64, (3, 3), activation='relu')) 
model.add(layers.MaxPooling2D((2, 2))) 
model.add(layers.Conv2D(128, (3, 3), activation='relu')) 
model.add(layers.MaxPooling2D((2, 2))) 
model.add(layers.Conv2D(128, (3, 3), activation='relu')) 
model.add(layers.MaxPooling2D((2, 2))) 
model.add(layers.Flatten()) 
model.add(layers.Dropout(0.5)) # add dropout
model.add(layers.Dense(512, activation='relu')) 
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(learning_rate=1e-4),  # `lr` is a deprecated alias of `learning_rate`
              metrics=['acc'])

In [None]:
train_datagen = ImageDataGenerator(rescale=1./255,
                                   rotation_range=40,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   fill_mode='nearest')

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(train_dir,
                                                    target_size=(150,150),
                                                    batch_size=20,
                                                    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(validation_dir,
                                                        target_size=(150,150),
                                                        batch_size=20,
                                                        class_mode='binary')

history = model.fit(train_generator,  # `fit` accepts generators; `fit_generator` is deprecated
                    steps_per_epoch=100,
                    epochs=100,
                    validation_data=validation_generator,
                    validation_steps=50)

In [None]:
model.save('cats_and_dogs_small_2.h5')

In [None]:
import matplotlib.pyplot as plt

acc = history.history['acc'] 
val_acc = history.history['val_acc'] 
loss = history.history['loss'] 
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc') 
plt.plot(epochs, val_acc, 'b', label='Validation acc') 
plt.title('Training and validation accuracy') 
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss') 
plt.plot(epochs, val_loss, 'b', label='Validation loss') 
plt.title('Training and validation loss') 
plt.legend()

plt.show()

## 5.3 Using a pretrained convnet

* A common and highly effective approach to deep learning on small image datasets is to use a pretrained network.
  * A *pretrained network* is a saved network that was previously trained on a large dataset.
  * For instance, you might train a network on ImageNet (where classes are mostly animals and everyday objects) and then repurpose this trained network for identifying furniture items in images.

* Such portability of learned features across different problems is a key advantage of deep learning compared to many other approaches.

* Here, let's consider a large CNN trained on the ImageNet dataset (1.4M labeled images and 1,000 different classes).

* We will use the VGG16 architecture, developed by Karen Simonyan and Andrew Zisserman in 2014.

> ### Feature extraction

* Feature extraction consists of using the representations learned by a previous network to extract interesting features from new samples.

  ><img src="https://drive.google.com/uc?id=16Qbe2uu4I0iR3yCRwEKnaUwLxQ7Sx-uk" width="700">

* Why only reuse the convolutional base? Could we reuse the densely connected classifier as well?

* Note that the level of generality (and therefore reusability) of the representations extracted by specific convolution layers depends on the depth of the layer in the model. 
  * Layers that come earlier in the model extract local, highly generic feature maps (such as visual edges, colors, and textures), whereas layers that are higher up extract more-abstract concepts (such as “cat ear” or “dog eye”).

* The VGG16 model comes prepackaged with Keras.
  * `keras.applications` module
  * Other models: Xception, Inception V3, ResNet 50, ...







In [None]:
from tensorflow.keras.applications import VGG16

conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(150, 150, 3))

In [None]:
conv_base.summary()
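
The final output shape of the convolutional base can be anticipated without running the summary. The sketch below relies on two facts about VGG16: its convolutions use `same` padding (so they keep height and width), and each of its five blocks ends with a 2\*2 max pooling that halves them. The variable names are just for illustration.

```python
size = 150
block_channels = [64, 128, 256, 512, 512]   # output channels of the five VGG16 blocks

shapes = []
for channels in block_channels:
    size = size // 2            # the pooling layer at the end of each block
    shapes.append((size, size, channels))

final_shape = shapes[-1]
flat_features = final_shape[0] * final_shape[1] * final_shape[2]
print(shapes)
print(final_shape, flat_features)
```

This is where the `(4, 4, 512)` feature shape and the `4 * 4 * 512` input dimension used in the feature-extraction code come from.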

* There are two ways we could proceed:
  * Option 1) Extract features using the convolutional base, and then save them on disk.
  * Option 2) Extend the model by adding `Dense` layers on top, and train it.

* **Option 1)**
  * Extracting features using the pretrained convolutional base

In [None]:
import os 
import numpy as np 

from tensorflow.keras.preprocessing.image import ImageDataGenerator

base_dir = '/content/dogs_vs_cats_subset/subset'
train_dir = os.path.join(base_dir, 'train') 
validation_dir = os.path.join(base_dir, 'validation') 
test_dir = os.path.join(base_dir, 'test')

datagen = ImageDataGenerator(rescale=1./255) 
batch_size = 20

def extract_features(directory, sample_count):
  features = np.zeros(shape=(sample_count, 4, 4, 512))
  labels = np.zeros(shape=(sample_count))
  generator = datagen.flow_from_directory(directory,
                                          target_size=(150, 150),
                                          batch_size=batch_size,
                                          class_mode='binary')
  i=0
  for inputs_batch, labels_batch in generator:
    features_batch = conv_base.predict(inputs_batch)
    features[i*batch_size: (i+1)*batch_size] = features_batch
    labels[i*batch_size: (i+1)*batch_size] = labels_batch
    i += 1
    if i*batch_size >= sample_count:
      break
  return features, labels

train_features, train_labels = extract_features(train_dir, 2000)
validation_features, validation_labels = extract_features(validation_dir, 1000)
test_features, test_labels = extract_features(test_dir, 1000)

# reshape
train_features = np.reshape(train_features, (2000, 4*4*512))
validation_features = np.reshape(validation_features, (1000, 4*4*512))
test_features = np.reshape(test_features, (1000, 4*4*512))

In [None]:
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras import optimizers

model = models.Sequential() 
model.add(layers.Dense(256, activation='relu', input_dim=4 * 4 * 512)) 
model.add(layers.Dropout(0.5)) 
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer=optimizers.RMSprop(learning_rate=2e-5),  # `lr` is a deprecated alias of `learning_rate`
              loss='binary_crossentropy', 
              metrics=['acc'])

history = model.fit(train_features, 
                    train_labels,
                    epochs=30, 
                    batch_size=20, 
                    validation_data=(validation_features, validation_labels))

In [None]:
import matplotlib.pyplot as plt

acc = history.history['acc'] 
val_acc = history.history['val_acc'] 
loss = history.history['loss'] 
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc') 
plt.plot(epochs, val_acc, 'b', label='Validation acc') 
plt.title('Training and validation accuracy') 
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss') 
plt.plot(epochs, val_loss, 'b', label='Validation loss') 
plt.title('Training and validation loss') 
plt.legend()

plt.show()

* **Option 2)**
  * Extending the `conv_base` model and running it end to end on the inputs

In [None]:
from tensorflow.keras import models
from tensorflow.keras import layers

model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

In [None]:
model.summary()

* Before you compile and train the model, it is very important to freeze the convolutional base.
  * *Freezing* a layer or a set of layers means preventing their weights from being updated during training.

* In Keras, you freeze a network by setting its `trainable` attribute to `False`.

In [None]:
print('This is the number of trainable weights ' 
      'before freezing the conv base:', len(model.trainable_weights))

In [None]:
conv_base.trainable = False

In [None]:
print('This is the number of trainable weights ' 
      'after freezing the conv base:', len(model.trainable_weights))
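
The two numbers printed by the cells above can be predicted by counting weight tensors. This is a back-of-the-envelope sketch, assuming each `Conv2D` and `Dense` layer contributes exactly two trainable tensors (a kernel and a bias) and that pooling and flatten layers contribute none.

```python
conv_layers_in_vgg16 = 13   # 2 + 2 + 3 + 3 + 3 convolutions across the five blocks
dense_layers_added = 2      # Dense(256) and Dense(1) stacked on top
tensors_per_layer = 2       # kernel and bias

before_freezing = (conv_layers_in_vgg16 + dense_layers_added) * tensors_per_layer
after_freezing = dense_layers_added * tensors_per_layer
print(before_freezing, after_freezing)
```

After freezing, only the weights of the two newly added `Dense` layers are updated during training.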

In [None]:
train_datagen = ImageDataGenerator(rescale=1./255,
                                   rotation_range=40,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   fill_mode='nearest')

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(train_dir,
                                                    target_size=(150,150),
                                                    batch_size=20,
                                                    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(validation_dir,
                                                        target_size=(150,150),
                                                        batch_size=20,
                                                        class_mode='binary')

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(learning_rate=1e-4),  # `lr` is a deprecated alias
              metrics=['acc'])

history = model.fit(train_generator,  # `fit` accepts generators; `fit_generator` is deprecated
                    steps_per_epoch=100,
                    epochs=30,
                    validation_data=validation_generator,
                    validation_steps=50)

> ### Fine-tuning

* Another widely used technique for model reuse is *fine-tuning*.
  * It consists of unfreezing a few of the top layers of a frozen model base, and jointly training both the newly added part of the model and these top layers.

    ><img src="https://drive.google.com/uc?id=16YCVUxDsZ4Qlt5A05jjizSIcOPKVjjJP" width="700">
  
* The steps for fine-tuning a network
  * Add the custom network on top of an already-trained base network.
  * Freeze the base network.
  * Train the part you added.
  * Unfreeze some layers in the base network.
  * Jointly train both these layers and the part you added.

In [None]:
conv_base.summary()

* We will fine-tune the last three convolutional layers.
  * All layers up to `block4_pool` should be frozen.
  
* Why not fine-tune more layers? Why not fine-tune the entire convolutional base?

In [None]:
conv_base.trainable = True

set_trainable = False
for layer in conv_base.layers:
  if layer.name == 'block5_conv1':
    set_trainable = True
  if set_trainable:
    layer.trainable = True
  else:
    layer.trainable = False
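
To see which layers the loop above leaves trainable, the same logic can be run on stand-in objects. `FakeLayer` is a hypothetical class used only for this note, and the layer names are an assumption based on the usual VGG16 naming (the name of the input layer in particular may differ between versions).

```python
class FakeLayer:
    """Stand-in for a Keras layer: only a name and a trainable flag."""

    def __init__(self, name):
        self.name = name
        self.trainable = True


# Assumed VGG16 layer names: an input layer, then five conv blocks ending in a pool.
names = ["input_1"]
for block, convs in enumerate([2, 2, 3, 3, 3], start=1):
    names += [f"block{block}_conv{i}" for i in range(1, convs + 1)]
    names.append(f"block{block}_pool")
layers = [FakeLayer(name) for name in names]

# Same rule as above: freeze everything before block5_conv1, unfreeze the rest.
set_trainable = False
for layer in layers:
    if layer.name == "block5_conv1":
        set_trainable = True
    layer.trainable = set_trainable

trainable = [layer.name for layer in layers if layer.trainable]
frozen = [layer.name for layer in layers if not layer.trainable]
print(trainable)
print(frozen[-1])
```

Everything up to and including `block4_pool` stays frozen, and only the three `block5` convolutions (plus the weightless pooling layer) are marked trainable.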

In [None]:
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(learning_rate=1e-5),  # very low learning rate (`lr` is a deprecated alias)
              metrics=['acc'])

history = model.fit(train_generator,  # `fit` accepts generators; `fit_generator` is deprecated
                    steps_per_epoch=100,
                    epochs=100,
                    validation_data=validation_generator,
                    validation_steps=50)

* **Exercise**
  * Evaluate the final model on the test data. 