---
title: "Computer vision & CNNs: Transfer Learning"
output:
  html_notebook:
    toc: yes
    toc_float: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
ggplot2::theme_set(ggplot2::theme_bw())
```
In this module, we are going to use a ___pretrained___ CNN model to perform
image classification on our dogs vs. cats images.
Learning objectives:

- Why pretrained models can be efficient and effective.
- How to perform feature extraction with a pretrained model.
- How to fine tune a pretrained model and train it end-to-end.
# Requirements
```{r}
library(keras)
library(glue)
library(ggplot2)
```
# Data
We are working with the same dogs and cats images as before.
```{r image-file-paths}
# define the directories:
if (stringr::str_detect(here::here(), "conf-2020-user")) {
  image_dir <- "/home/conf-2020-user/data/dogs-vs-cats"
} else {
  image_dir <- here::here("materials", "data", "dogs-vs-cats")
}

train_dir <- file.path(image_dir, "train")
valid_dir <- file.path(image_dir, "validation")
test_dir <- file.path(image_dir, "test")
```
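
If you want to confirm the data are where you expect, a quick sanity check of
the image counts can help. This is a small sketch; it assumes each split
directory contains the `cats`/`dogs` class subfolders used in earlier modules.

```{r count-images}
# Count images under each split (recursive to include the class subfolders)
sapply(
  c(train = train_dir, validation = valid_dir, test = test_dir),
  function(dir) length(list.files(dir, recursive = TRUE))
)
```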
# Transfer Learning
There are _two main_ ways we can apply a pretrained CNN model to a new image
classification task:

1. __Feature extraction__: Use the convolutional base to do feature engineering
   on our images and then feed the results into a new densely connected
   classifier.
   - Most efficient
   - Does not require GPUs
   - Does not "personalize" feature extraction to the problem at hand
   - Likely leaves room for improvement
2. __Fine tune a pretrained model and run end-to-end__: Build a full sequential
   model with the convolutional base and a new densely connected classifier, and
   train the entire model with some or all of the convolutional base layers
   frozen.
   - Computationally demanding
   - Often requires GPUs
   - Tweaks pretrained convolution layers to extract problem-specific features
   - Maximizes performance
![](images/swapping_fc_classifier.png)
# Transfer Learning: Feature Extraction
## Get our pretrained model
There are several pretrained models available via keras; they all start with
`application_`. Here we'll use the VGG16 model: its structure is intuitive to
understand, it does a good job with this task, and it works well for teaching
purposes.

Best practices for picking pretrained models:

- Start with smaller models (e.g. VGG16, VGG19, ResNet34); the sketch below
  shows a quick way to compare model sizes.
- Check out the data the model was built on and see how it aligns to your data.
  Most common is [ImageNet](http://image-net.org/about-overview).
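
As a rough way to compare candidate bases by size, you can instantiate a couple
of the `application_*()` models without weights and count their parameters.
This is a quick sketch: `weights = NULL` skips the weight download and just
builds the architecture.

```{r compare-base-sizes}
# Compare parameter counts of two candidate convolutional bases
bases <- list(
  vgg16    = application_vgg16(weights = NULL, include_top = FALSE, input_shape = c(150, 150, 3)),
  resnet50 = application_resnet50(weights = NULL, include_top = FALSE, input_shape = c(150, 150, 3))
)

sapply(bases, count_params)
```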
Applying pretrained models that are already supplied by keras is simple:

- `weights`: The weights to use. Most pretrained models are built on
  ImageNet, and using these weights tends to do well.
- `include_top`: Whether to include the fully connected dense classifier.
  Typically we want the classifier to be specific to our problem, so we exclude it.
- `input_shape`: The shape of our inputs (150x150 pixel images w/ 3 color channels).

__Tip__: Check out the [tfhub](https://github.com/rstudio/tfhub) package, which
makes it easy to interact with [TensorFlow Hub](https://www.tensorflow.org/hub),
a library for publication, discovery, and consumption of reusable models.
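
For illustration only, here is a minimal tfhub sketch based on the pattern in
the tfhub README. The module handle is one of the ImageNet feature-vector
modules published on TensorFlow Hub; we haven't validated this particular
module against our dataset.

```{r tfhub-sketch, eval=FALSE}
library(tfhub)

# Swap the convolutional base for a TF Hub feature-vector module
hub_model <- keras_model_sequential() %>%
  layer_hub(
    handle = "https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/4",
    input_shape = c(224, 224, 3)
  ) %>%
  layer_dense(units = 1, activation = "sigmoid")
```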
```{r pretrained-model}
conv_base <- application_vgg16(
  weights = "imagenet",
  include_top = FALSE,
  input_shape = c(150, 150, 3)
)
```
```{r vgg16-model-structure}
summary(conv_base)
```
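
If you just want the shape of the features the base will produce, rather than
reading the full summary, you can query the model's output shape directly:

```{r vgg16-output-shape}
# The base maps 150x150x3 images to 4x4x512 feature maps
conv_base$output_shape
```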
## Extracting features using the pretrained convolutional base
This may seem a little daunting at first, but it is just a more manual version
of what you have already been doing. Here we create a function that will:

1. Create empty tensors to hold our transformed features and labels.
2. Create our data batch importing generator. Here we use a batch size of 20
   simply because we have 2000 training images and 1000 validation images, and
   a batch size of 20 divides evenly into both of these values.
3. Loop through our training data:
   a. import a batch of images,
   b. apply the pretrained base to the images to output the predicted features,
   c. add these new features and the labels to our tensors,
   d. continue until the number of iterations times the batch size equals or
      exceeds our total number of samples.
```{r image-generator-feature-extraction}
datagen <- image_data_generator(rescale = 1/255)
batch_size <- 20

extract_features <- function(directory, sample_count) {

  features <- array(0, dim = c(sample_count, 4, 4, 512))   # step 1
  labels <- array(0, dim = c(sample_count))                # step 1

  generator <- flow_images_from_directory(                 # step 2
    directory = directory,
    generator = datagen,
    target_size = c(150, 150),
    batch_size = batch_size,
    class_mode = "binary"
  )

  i <- 0
  while (TRUE) {                                           # step 3
    message("Processing batch ", i + 1, " of ", ceiling(sample_count / batch_size))
    batch <- generator_next(generator)                     # step 3a
    inputs_batch <- batch[[1]]
    labels_batch <- batch[[2]]
    features_batch <- conv_base %>% predict(inputs_batch)  # step 3b
    index_range <- ((i * batch_size) + 1):((i + 1) * batch_size)
    features[index_range,,,] <- features_batch             # step 3c
    labels[index_range] <- labels_batch                    # step 3c
    i <- i + 1
    if (i * batch_size >= sample_count) break              # step 3d
  }

  list(
    features = features,
    labels = labels
  )
}
```
Let's apply this function to our training, validation, and test data.

**Without a GPU this will take approximately 5 minutes to execute.**
```{r execute-feature-extraction}
train <- extract_features(train_dir, 2000)
validation <- extract_features(valid_dir, 1000)
test <- extract_features(test_dir, 1000)
```
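
Before reshaping, it's worth confirming what came back: a 4D feature tensor and
a label vector for each split.

```{r check-extracted-features}
# 2000 training samples, each a 4 x 4 x 512 feature map
dim(train$features)
length(train$labels)
```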
## Reshape features
The extracted features will be a 4D tensor (samples, 4, 4, 512). We can see this
in the last layer of our `conv_base` model above (`block5_pool (MaxPooling2D)`).
Consequently, we need to reshape (flatten) these into a 2D tensor to feed into
a densely connected classifier. This results in a 2D tensor of size
(samples, 4 * 4 * 512 = 8192).
```{r reshape-features}
reshape_features <- function(features) {
  array_reshape(features, dim = c(nrow(features), 4 * 4 * 512))
}
train$features <- reshape_features(train$features)
validation$features <- reshape_features(validation$features)
test$features <- reshape_features(test$features)
dim(train$features)
```
## Define densely connected classifier
We've extracted and flattened our features from the convolution layers so now
we only need to build the densely connected classifier portion of our model.
```{r model-classifier}
model <- keras_model_sequential() %>%
  layer_dense(units = 256, activation = "relu", input_shape = 4 * 4 * 512) %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 1, activation = "sigmoid")

summary(model)
```
Now we can compile and train our model. This will train quickly, taking
approximately 1 min when trained on your local CPU. Our validation loss also
improves over the previous CNN we built.
```{r train-model}
model %>% compile(
  optimizer = optimizer_rmsprop(lr = 0.0001),
  loss = "binary_crossentropy",
  metrics = c("accuracy")
)

history1 <- model %>% fit(
  train$features, train$labels,
  epochs = 30,
  batch_size = 32,
  validation_data = list(validation$features, validation$labels),
  callbacks = list(
    callback_early_stopping(patience = 10),
    callback_reduce_lr_on_plateau(patience = 2)
  )
)
```
So we achieved a significant decrease in our loss score, increased our accuracy
to 90%, and did so in a fraction of the time!
```{r second-model-results}
best_epoch <- which.min(history1$metrics$val_loss)
best_loss <- history1$metrics$val_loss[best_epoch] %>% round(3)
best_acc <- history1$metrics$val_accuracy[best_epoch] %>% round(3)
glue("Our optimal loss is {best_loss} with an accuracy of {best_acc}")
```
```{r plot-model1-history}
plot(history1) +
  scale_x_continuous(limits = c(0, length(history1$metrics$val_loss)))
```
# Transfer Learning: End-to-End
⚠️⚠️ ONLY RUN ON GPU!! ⚠️⚠️
The above approach performed pretty well. However, we can see that we are still
overfitting, which may be reducing model performance. An alternative approach is
to run a pretrained model end-to-end. This approach is much slower and more
computationally intensive; however, it offers greater flexibility in using and
adjusting the pretrained model because it lets you:

1. use data augmentation to decrease overfitting (and usually increase model
   performance), and
2. fine tune parts of the pretrained model.

The following approach simply plugs the pretrained convolutional base into a
sequential model but ___freezes___ the convolutional base weights.
## Combining a densely-connected neural network with the convolutional base
In this case we can literally plug in our `conv_base` within our model
architecture.
```{r CNN-base-and-classifier}
model <- keras_model_sequential() %>%
  conv_base %>%
  layer_flatten() %>%
  layer_dense(units = 256, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")

model
```
## Freezing layers
Before you compile and train the model, it's important to freeze the convolutional
base weights. This prevents the weights from being updated during training. If
you don't do this, the representations learned by the pretrained model will be
modified and, potentially, completely destroyed. Note that freezing only takes
effect once the model is compiled, which we do below.
```{r freeze-parameters}
cat(length(model$trainable_weights), "trainable weight tensors before freezing.\n")
freeze_weights(conv_base)
cat(length(model$trainable_weights), "trainable weight tensors after freezing.\n")
```
## Training the model end-to-end with a frozen convolutional base
The following trains the model end-to-end using all CNN logic that you have seen
before:
1. data augmentation
2. image generator
3. compile our model
4. train our model
```{r train-end-to-end}
train_datagen <- image_data_generator(
  rescale = 1/255,
  rotation_range = 40,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  horizontal_flip = TRUE,
  fill_mode = "nearest"
)

test_datagen <- image_data_generator(rescale = 1/255)

train_generator <- flow_images_from_directory(
  train_dir,
  train_datagen,
  target_size = c(150, 150),
  batch_size = 20,
  class_mode = "binary"
)

validation_generator <- flow_images_from_directory(
  valid_dir,
  test_datagen,
  target_size = c(150, 150),
  batch_size = 20,
  class_mode = "binary"
)

model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 1e-5),
  metrics = c("accuracy")
)

history2 <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = 100,
  epochs = 30,
  validation_data = validation_generator,
  validation_steps = 50
)
```
```{r plot-model2-history}
plot(history2)
```
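
If you'd like to score this model on the held-out test images, here is a
generator-based evaluation sketch. It reuses `test_datagen` and assumes the
1000 test images from above, hence 50 steps of 20.

```{r evaluate-end-to-end, eval=FALSE}
# Evaluate the frozen-base model on the held-out test set
test_generator <- flow_images_from_directory(
  test_dir,
  test_datagen,
  target_size = c(150, 150),
  batch_size = 20,
  class_mode = "binary"
)

model %>% evaluate_generator(test_generator, steps = 50)
```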
# Transfer Learning: Fine Tune
Another widely used technique with pretrained models is to unfreeze a few of
the top layers of the convolutional base and allow those weights to be updated.
Recall that the early layers in a CNN identify detailed edges and shapes, while
later layers put these edges and shapes together to make higher-order parts of
the images we are trying to classify (e.g. cat ears, dog tails).

The more our images deviate from the images used to train the pretrained model,
the more likely you will want to retrain the last few layers, which makes those
higher-order features more relevant to your problem.
To fine-tune a pretrained model you:

1. Add your custom network on top of an already-trained base network (executed
   in the `CNN-base-and-classifier` code chunk).
2. Freeze the base network (executed in the `freeze-parameters` code chunk).
3. Train the part you added (executed in the `train-end-to-end` code chunk).
4. Unfreeze some layers in the base network.
5. Jointly train both these layers and the part you added.

We already did steps 1-3. The following executes steps 4 and 5.
## Unfreeze some layers in the base network
```{r unfreeze-some-layers}
unfreeze_weights(conv_base, from = "block3_conv1")
```
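
Before recompiling, you can verify which layers will now be updated by
inspecting each layer's `trainable` flag:

```{r check-trainable-layers}
# Which base layers will now be updated during training?
data.frame(
  layer     = sapply(conv_base$layers, function(l) l$name),
  trainable = sapply(conv_base$layers, function(l) l$trainable)
)
```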
## Jointly train both these layers and the part you added
```{r fine-tune-model}
model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 1e-5),
  metrics = c("accuracy")
)

history3 <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = 100,
  epochs = 100,
  validation_data = validation_generator,
  validation_steps = 50
)
```
```{r plot-model3-history}
plot(history3)
```
# Takeaways

* Pretrained models can be efficient and effective for problems that align to
  common computer vision (and other) tasks.
* Many pre-existing models exist on [TensorFlow Hub](https://www.tensorflow.org/hub)
  and are worth researching.
* Three main approaches exist for transfer learning; the decision depends on
  your desire to maximize efficiency (compute time) vs. effectiveness (accuracy):
  - feature extraction
  - end-to-end training with existing CNN weights
  - end-to-end training while fine-tuning latter CNN layer weights
[🏠](https://github.com/rstudio-conf-2020/dl-keras-tf)