In [107]:
%run ../talktools.py


# Transfer Learning

So far, we've focused on building features and learning models using the data we wish to predict on. This makes a lot of sense for most of our problems: we directly optimize the thing (metric) we care about using the optimization machinery of `keras`, `sklearn`, etc.

However, there are times when we might not want to do this (or don't need to).

- Perhaps the model training time is very long and we're impatient.
- Perhaps the task we're working on is very similar to a task we've already solved.
- Perhaps we don't have enough training data to learn a credible model.

This is where **transfer learning** comes in. The idea is to use another model to help us solve our current task.

We've already seen this a bit: yesterday we took the intermediate layers of an auto-encoder (trained to get good image reconstruction) to build a random forest model with the bottleneck features. We also used pretrained vectors (word2vec) to featurize documents into vectors (NLP).

There are two regimes where you might use transfer learning:

- The data and predictions of another model are very similar to the data you have and what you want to predict (e.g., use an off-the-shelf 2D convnet trained on cats and dogs to predict whether your image has a cat or a dog in it).

- The input data is similar to your input data, but the predictions are different.

Depending on how much training data you have and how similar your problem is, you might try different approaches:

<img src="https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2017/05/31112715/finetune1.jpg">
Source: https://www.analyticsvidhya.com/blog/2017/06/transfer-learning-the-art-of-fine-tuning-a-pre-trained-model/

`tensorflow.keras.applications` contains deep learning models that are made available alongside pre-trained weights. These models can be used for prediction, feature extraction, and fine-tuning. Weights are downloaded automatically when instantiating a model. They are stored at `~/.keras/models/`.

Models for image classification with weights trained on ImageNet:
 - Xception
 - VGG16
 - VGG19
 - ResNet, ResNetV2, ResNeXt
 - InceptionV3
 - InceptionResNetV2
 - MobileNet
 - MobileNetV2
 - DenseNet
 - NASNet
 
 E.g., VGG16:
 <img src="https://qph.fs.quoracdn.net/main-qimg-e657c195fc2696c7d5fc0b1e3682fde6">
 
 E.g., MobileNet:
 
 <img src="https://cdn-images-1.medium.com/max/1600/1*lrxsPkbVrrIPVmr7jy-noA.png">
 
 The following is a modified version of the workflow in https://towardsdatascience.com/transfer-learning-using-mobilenet-and-keras-c75daf7ff299

In [None]:
%matplotlib inline

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Image

import tensorflow.keras
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Dense, Activation, Input, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.preprocessing import image
from tensorflow.keras.models import Model
from tensorflow.keras.applications import mobilenet_v2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input, decode_predictions

In [None]:
mobile = tensorflow.keras.applications.mobilenet_v2.MobileNetV2()

In [None]:
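def prepare_image_file(file):
    """Load an image file and return a MobileNetV2-ready batch of shape (1, 224, 224, 3)."""
    img_path = ""
    img = image.load_img(img_path + file, target_size=(224, 224))
    img_array = image.img_to_array(img)  # pixel values in [0, 255]
    img_array_expanded_dims = np.expand_dims(img_array, axis=0)  # add the batch axis
    return tensorflow.keras.applications.mobilenet_v2.preprocess_input(img_array_expanded_dims)

from skimage.color import gray2rgb
from skimage.transform import resize

def prepare_gray_array(arr):
    """Turn a 2D grayscale array with values in [0, 255] into a MobileNetV2-ready batch."""
    # preserve_range=True keeps the pixel values in [0, 255]; by default `resize`
    # rescales uint8 input to [0, 1], which `preprocess_input` does not expect
    arr = resize(arr, (224, 224), preserve_range=True)
    arr = gray2rgb(arr)  # replicate the gray channel into 3 identical RGB channels
    img_array_expanded_dims = np.expand_dims(arr, axis=0)  # add the batch axis
    return tensorflow.keras.applications.mobilenet_v2.preprocess_input(img_array_expanded_dims)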
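
For MobileNetV2, `preprocess_input` maps pixel values from [0, 255] to [-1, 1]. The arithmetic can be sketched in plain numpy; `scale_like_mobilenet_v2` below is a hypothetical stand-in written for illustration, not the Keras function itself. The sketch also shows why the input range matters: an image that has already been rescaled to [0, 1] ends up squeezed into a narrow band just above -1.

```python
import numpy as np

def scale_like_mobilenet_v2(x):
    """Hypothetical stand-in for MobileNetV2's preprocess_input: [0, 255] -> [-1, 1]."""
    return np.asarray(x, dtype=np.float32) / 127.5 - 1.0

# black, mid-gray, and white pixels on the usual 0-255 scale
pixels = np.array([0.0, 127.5, 255.0])
scaled = scale_like_mobilenet_v2(pixels)
print(scaled)

# an input that was already rescaled to [0, 1] is squeezed into a narrow band near -1
already_unit = np.array([0.0, 0.5, 1.0])
squeezed = scale_like_mobilenet_v2(already_unit)
print(squeezed.min(), squeezed.max())
```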

In [None]:
Image(filename='imgs/German_Shepherd.jpg')

In [None]:
preprocessed_image = prepare_image_file('imgs/German_Shepherd.jpg')
predictions = mobile.predict(preprocessed_image)
results = decode_predictions(predictions)
results
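
`decode_predictions` turns each row of class probabilities into a list of the highest-scoring classes (five by default). The ranking step looks roughly like the numpy sketch below. The helper `top_k_labels` and the three-class label list are made up for illustration; the real function works with the 1000 ImageNet class names.

```python
import numpy as np

def top_k_labels(probs, labels, k=5):
    """Return the k (label, score) pairs with the highest scores, best first."""
    order = np.argsort(probs)[::-1][:k]
    return [(labels[i], float(probs[i])) for i in order]

labels = ["cat", "dog", "shoe"]      # hypothetical class names
probs = np.array([0.1, 0.7, 0.2])    # hypothetical softmax output for one image
top2 = top_k_labels(probs, labels, k=2)
print(top2)
```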

In [None]:
preprocessed_image.shape

How do you think we'll do on Fashion-MNIST?

In [None]:
from tensorflow.keras.utils import to_categorical

fashion_mnist = tensorflow.keras.datasets.fashion_mnist

nb_classes = 10
batch_size = 32

(x_train, y_train),(x_test, y_test) = fashion_mnist.load_data()
# x_train, x_test = x_train / 255.0, x_test / 255.0  # scale the images to 0-1

x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))  # adapt this if using `channels_first` image data format
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))  # adapt this if using `channels_first` image data format

# convert class vectors to binary class matrices
Y_train =  to_categorical(y_train, nb_classes)
Y_test =  to_categorical(y_test, nb_classes)

input_shape = x_train[0].shape  # (28, 28, 1): the reshape above already added the channel axis
input_img = Input(shape=input_shape)
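
`to_categorical` converts integer class labels into one-hot rows. An equivalent numpy sketch on a few made-up labels (not the Fashion-MNIST arrays) shows the resulting shape:

```python
import numpy as np

n_classes = 10
int_labels = np.array([9, 0, 3])  # made-up integer labels
one_hot = np.eye(n_classes, dtype=np.float32)[int_labels]  # row i is the one-hot vector for int_labels[i]
print(one_hot.shape)
print(one_hot[0])
```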

In [None]:
plt.imshow((x_train[0,:,:,0]))

In [None]:
arr = x_train[0,:,:,0]
arr_out = prepare_gray_array(arr)
plt.imshow(arr_out[0,:,:,2])
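
The shape bookkeeping in `prepare_gray_array` is easy to check with numpy alone. This sketch swaps in nearest-neighbour upsampling via `np.repeat` (224 = 8 × 28) for `skimage.transform.resize`, which interpolates, so the pixel values differ but the shapes match. The input is a stand-in array, not a Fashion-MNIST image.

```python
import numpy as np

gray = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)  # stand-in 28x28 image

# nearest-neighbour upsampling: repeat every pixel 8 times along each axis
big = np.repeat(np.repeat(gray, 8, axis=0), 8, axis=1)  # (224, 224)
rgb = np.stack([big, big, big], axis=-1)                # (224, 224, 3)
batch = np.expand_dims(rgb, axis=0)                     # (1, 224, 224, 3)
print(batch.shape)
```

All three channels are identical copies of the gray image, so plotting any one of them shows the same picture.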

In [None]:
preprocessed_image = prepare_gray_array(arr)
predictions = mobile.predict(preprocessed_image)
results = decode_predictions(predictions)
results

The ImageNet class labels don't match our clothing categories, but we might still be able to use the pretrained network to classify our images.

In [None]:
from tensorflow.keras.applications import MobileNetV2

# load MobileNetV2 with ImageNet weights, dropping the final 1000-way classification layer
base_model = MobileNetV2(weights='imagenet', include_top=False)

x = base_model.output
x = GlobalAveragePooling2D()(x)  # average each feature map down to a single value
x = Dense(32, activation='relu')(x)  # small hidden dense layer
preds = Dense(10, activation='softmax')(x)  # final layer: one softmax output per Fashion-MNIST class

model=Model(inputs=base_model.input,outputs=preds)
for i,layer in enumerate(model.layers):
  print(i,layer.name, layer.trainable)
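
For a 224×224 input, the MobileNetV2 base outputs a 7×7×1280 feature map. `GlobalAveragePooling2D` averages over the 7×7 grid, and the two dense layers on top add only a small number of weights. A numpy sketch of both steps, using a random stand-in feature map rather than real network output:

```python
import numpy as np

rng = np.random.default_rng(0)
feature_map = rng.normal(size=(1, 7, 7, 1280))  # stand-in for the base output on one image

pooled = feature_map.mean(axis=(1, 2))  # global average pooling over the two spatial axes
print(pooled.shape)

def dense_params(n_in, n_out):
    """Number of weights plus biases in a fully connected layer."""
    return n_in * n_out + n_out

head_params = dense_params(1280, 32) + dense_params(32, 10)
print(head_params)
```

The new head therefore has about 41 thousand parameters, compared with a little over two million in the MobileNetV2 base.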

Let's make sure that all the pretrained weights are non-trainable. We will only train the dense layers we added at the end.

In [None]:
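# freeze every layer of the pretrained base; only the newly added head stays trainable
# (this avoids a hard-coded layer index, which may differ between Keras versions)
for layer in base_model.layers:
    layer.trainable = False
for layer in model.layers[len(base_model.layers):]:
    layer.trainable = True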

for i,layer in enumerate(model.layers):
  print(i,layer.name, layer.trainable)
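
Freezing the base amounts to treating it as a fixed feature extractor and fitting only the head. The toy numpy sketch below illustrates the idea under loud assumptions: the "pretrained base" is just a fixed random projection followed by a ReLU, the data are synthetic, and the head is a logistic regression trained by gradient descent. None of it is the MobileNetV2 model above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained base (assumption: fixed random projection + ReLU)
W_base = rng.normal(size=(20, 64))
W_base_before = W_base.copy()

def frozen_features(inputs):
    """Apply the frozen base; W_base is never updated during training."""
    return np.maximum(inputs @ W_base, 0.0)

# Synthetic two-class data
n = 200
X_toy = rng.normal(size=(n, 20))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(float)

feats = frozen_features(X_toy)
feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)  # standardize features

# Trainable head: logistic regression fitted by plain gradient descent
w_head = np.zeros(feats.shape[1])
b_head = 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w_head + b_head)))
    w_head -= lr * feats.T @ (p - y_toy) / n
    b_head -= lr * np.mean(p - y_toy)

pred = (feats @ w_head + b_head) > 0
train_acc = np.mean(pred == (y_toy > 0.5))
print(train_acc)
```

Only `w_head` and `b_head` change during the loop, which is the same division of labour as setting `trainable=False` on the base layers.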

In [None]:
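train_x_conv = []
for img in x_train[:3000]:  # only the first 3000 training images
    # upsample to the 224x224 input size, keeping pixel values in [0, 255]
    arr = resize(img[:,:,0], (224, 224), preserve_range=True)
    arr = gray2rgb(arr)  # replicate the gray channel into 3 RGB channels
    train_x_conv.append(arr.astype(np.float32))
# rescale to [-1, 1], the input range MobileNetV2 was trained on
train_x_conv = preprocess_input(np.array(train_x_conv))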

In [None]:
train_x_conv.shape

In [None]:
model.compile(optimizer='Adam',loss='categorical_crossentropy',metrics=['accuracy'])

model.fit(x=train_x_conv, 
          y=Y_train[:3000], 
          epochs=20,
          batch_size=batch_size)
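
When reading the training log, it helps to know the loss of an uninformed model. With 10 classes, a classifier that assigns probability 1/10 to every class has a categorical cross-entropy of ln 10 ≈ 2.30 (and an expected accuracy of 10% on balanced classes), so losses near that value suggest the head has not learned anything yet. A numpy sketch of the loss, with a hypothetical helper name and made-up labels:

```python
import numpy as np

def categorical_crossentropy_np(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.eye(10)[[3, 7]]        # two made-up one-hot labels
uniform = np.full((2, 10), 0.1)    # "no idea" predictions: 1/10 for every class
baseline = categorical_crossentropy_np(y_true, uniform)
print(baseline)
```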

How did we do relative to directly learning a model on the data itself?

## Transfer Learning in NLP

In [1]:
%%html
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Here are the materials for our <a href="https://twitter.com/NAACLHLT?ref_src=twsrc%5Etfw">@NAACLHLT</a> tutorial on Transfer Learning in NLP with <a href="https://twitter.com/Thom_Wolf?ref_src=twsrc%5Etfw">@Thom_Wolf</a> <a href="https://twitter.com/swabhz?ref_src=twsrc%5Etfw">@swabhz</a> <a href="https://twitter.com/mattthemathman?ref_src=twsrc%5Etfw">@mattthemathman</a>:<br>Slides: <a href="https://t.co/54KVG0K85z">https://t.co/54KVG0K85z</a><br>Colab: <a href="https://t.co/iqWPtVFSVg">https://t.co/iqWPtVFSVg</a><br>Code: <a href="https://t.co/bka5EsuYtP">https://t.co/bka5EsuYtP</a><a href="https://twitter.com/hashtag/NAACLTransfer?src=hash&amp;ref_src=twsrc%5Etfw">#NAACLTransfer</a> <a href="https://t.co/6wPZu9bmc7">pic.twitter.com/6wPZu9bmc7</a></p>&mdash; Sebastian Ruder (@seb_ruder) <a href="https://twitter.com/seb_ruder/status/1135223959828537344?ref_src=twsrc%5Etfw">June 2, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
                            