
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 1200px">
</div>

# Transfer Learning

The idea behind transfer learning is to take knowledge from one model doing some task, and transfer it to build another model doing a similar task.

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) In this lesson you:<br>
 - Motivate why transfer learning is a promising frontier for deep learning 
 - Compare transfer learning approaches
 - Perform transfer learning to create a cat vs dog vs founder classifier

In [3]:
%run "./Includes/Classroom-Setup"

### Why Transfer Learning

In 2016, Andrew Ng claimed that transfer learning will be the next driver of commercial machine learning success after supervised learning.  Why?<br><br>

- A fundamental assumption of most machine learning approaches is that you train a model from scratch on a new dataset
- Transfer learning stores knowledge gained from solving one problem and applies it to a different, related problem
- More closely resembles human learning

What types of features could be transferred from one task to the next in the following cases?<br><br>

- Image recognition
- Natural language processing
- Speech recognition
- Time series

### Common Pre-Trained Models

Keras exposes a number of deep learning models (architectures) along with pre-trained weights.  They are available in the `tensorflow.keras.applications` package and [the full list is available here.](https://www.tensorflow.org/api_docs/python/tf/keras/applications)  

Transfer learning...<br><br>

- Saves a lot of time and resources over training models from scratch
- Often starts from models pre-trained on the ImageNet dataset
- Repurposes earlier layers that encode lower-level features (e.g. edges in images)
- Uses custom final layers specific to the new task

Below is a comparison of common reference architectures and pre-trained weights used in transfer learning:

| Network | Year | ImageNet Top-5 Accuracy | # of Params | Floating Point Operations |
|---------|------|-------------------------|-------------|---------------------------|
| AlexNet | 2012 | 84.7% | 62M | 1.5B |
| VGGNet | 2014 | 92.3% | 138M | 19.6B |
| Inception | 2014 | 93.3% | 6.4M | 2B |
| ResNet-152 | 2015 | 95.5% | 60.3M | 11B |

<img alt="Side Note" title="Side Note" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.05em; transform:rotate(15deg)" src="https://files.training.databricks.com/static/images/icon-note.webp"/> [See this article for a full comparison.](https://towardsdatascience.com/the-w3h-of-alexnet-vggnet-resnet-and-inception-7baaaecccc96)

### The Implementation Details

We want to make a classifier that distinguishes between Databricks founders and the pets of Databricks employees. To do this, we'll use VGG16, but instead of predicting 1000 classes, we will predict 3 classes (cat, dog, or founder).  We have 3 options:<br><br>

1. Use **only the architecture from VGG16** and initialize the weights at random. This is a computationally expensive approach and requires a lot of data, as you are training roughly 138 million weights from scratch.
2. Use **both the architecture from VGG16 and weights** pre-trained on ImageNet.  Leave earlier layers frozen and retrain the later layers.  This is less computationally expensive and still requires a good amount of data.
3. Use both the architecture from VGG16 and the weights, but **freeze the entire network, and add an additional layer.**  In this case, we would only train the final classification layer specific to our problem.  This is fast and works with small amounts of data.

Since our dataset is small and similar to the task VGG16 was trained on, we'll choose option 3.

<img alt="Best Practice" title="Best Practice" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.3em" src="https://files.training.databricks.com/static/images/icon-blue-ribbon.svg"/> Would you want to use a high or low learning rate for transfer learning? Or different learning rates for different layers?

Notice how few images we have in our training data. Will our neural network be able to distinguish between animals and founders?

In [8]:
from pyspark.sql.functions import lit

df_cats = spark.read.format("image").load("/mnt/training/dl/img/cats/*.jpg").withColumn("label", lit("cat"))
df_dogs = spark.read.format("image").load("/mnt/training/dl/img/dogs/*.jpg").withColumn("label", lit("dog"))
df_founder = spark.read.format("image").load("/mnt/training/dl/img/founders/*").withColumn("label", lit("founder"))

display(df_cats)

Do a custom train/test split so that the samples are stratified: every class appears in both the training set and the test set.

In [10]:
import pandas as pd

cat_data = df_cats.toPandas()
dog_data = df_dogs.toPandas()
founder_data = df_founder.toPandas()

# DataFrame.append was removed in pandas 2.0, so combine the per-class slices with pd.concat
train_data = pd.concat([cat_data.iloc[:5], dog_data.iloc[:5], founder_data.iloc[:4]])
train_data["path"] = train_data["image"].apply(lambda x: x["origin"].replace("dbfs:/","/dbfs/"))

test_data = pd.concat([cat_data.iloc[5:], dog_data.iloc[5:], founder_data.iloc[4:]])
test_data["path"] = test_data["image"].apply(lambda x: x["origin"].replace("dbfs:/","/dbfs/"))

print(f"Train data samples: {len(train_data)} \tTest data samples: {len(test_data)}")
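
To make the idea of a stratified split concrete, here is a minimal sketch in plain Python. The helper `stratified_split`, the file names, and the per-class image counts are all hypothetical and only illustrate the logic; the notebook itself slices pandas DataFrames as shown above.

```python
from collections import defaultdict


def stratified_split(samples, n_train_per_label):
    """Split (path, label) pairs so each label contributes a fixed number of training rows."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    train, test = [], []
    for label, rows in by_label.items():
        k = n_train_per_label[label]
        train.extend(rows[:k])   # first k rows of this label go to training
        test.extend(rows[k:])    # the remainder goes to the test set
    return train, test


# Hypothetical file names and counts; the real paths come from the Spark image DataFrames.
samples = (
    [(f"cat_{i}.jpg", "cat") for i in range(7)]
    + [(f"dog_{i}.jpg", "dog") for i in range(7)]
    + [(f"founder_{i}.jpg", "founder") for i in range(6)]
)

train, test = stratified_split(samples, {"cat": 5, "dog": 5, "founder": 4})
print(f"Train: {len(train)}  Test: {len(test)}")
print("Labels in test:", sorted({label for _, label in test}))
```

Because the split is done per label, no class can end up missing from either set, which matters a lot when there are only a handful of images per class.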

Here we use the Keras functional API, which is a more flexible way to create models than the `tf.keras.Sequential` API. The functional API can handle models with non-linear topology, models with shared layers, and models with multiple inputs or outputs. The main idea is that a deep learning model is usually a directed acyclic graph (DAG) of layers, so the functional API is a way to build graphs of layers.

In [12]:
import tensorflow as tf
from tensorflow.keras import applications
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Model 
from tensorflow.keras.layers import Dense
tf.random.set_seed(42)

img_height = 224
img_width = 224

# Load original model with pretrained weights from imagenet
model = applications.VGG16(weights="imagenet", input_shape=(img_height, img_width, 3))

# Freeze all previous layers
model.trainable = False 
    
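# Add custom new layer for 3 classes
# Note: include_top defaults to True, so this layer is stacked on VGG16's 1000-class softmax output
x = model.output
predictions = Dense(3, activation="softmax")(x)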

model_final = Model(inputs=model.input, outputs=predictions)
model_final.summary()

Now we only have to train the 3,003 parameters of the new layer instead of all 138,360,547 parameters present in our network architecture.
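
The two numbers above can be checked with a little arithmetic. The sketch below assumes the standard VGG16 parameter count of 138,357,544 (with its 1000-way classifier included) and a new dense layer fed by those 1000 outputs; `dense_params` is a hypothetical helper written for this illustration.

```python
def dense_params(n_inputs, n_units):
    """A fully connected layer has one weight per input/unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units


vgg16_params = 138_357_544         # frozen VGG16, including its 1000-way classifier
new_head = dense_params(1000, 3)   # Dense(3) stacked on the 1000 class scores

print(new_head)                    # 3003 trainable parameters
print(vgg16_params + new_head)     # 138360547 parameters in total
```

Only about 0.002% of the network is trained, which is why this approach is fast and can work with very little data.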

To train the model, we use the [ImageDataGenerator](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator) class. Generators are useful when your data is very large, as you only need to load one batch of data into memory at a time. In general, ImageDataGenerator is used to configure random transformations and normalization operations to be done on your image data during training, as well as to instantiate generators of augmented image batches (and their labels).

These generators can be used with Keras model methods that accept data generators as inputs. [flow_from_dataframe()](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator#flow_from_dataframe) takes the dataframe and the path to a directory and generates batches.
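
As a rough illustration of why generators keep memory usage low, and of how the number of steps per epoch relates to the batch size, here is a plain Python stand-in. It is not how `ImageDataGenerator` is implemented; `batch_generator` and the file names are hypothetical.

```python
import math


def batch_generator(paths, batch_size):
    """Yield successive batches so that only one batch is materialized at a time."""
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


paths = [f"img_{i}.jpg" for i in range(14)]  # hypothetical file names

for batch_size in (2, 4):
    n_batches = sum(1 for _ in batch_generator(paths, batch_size))
    steps_floor = len(paths) // batch_size            # ignores a final partial batch
    steps_ceil = math.ceil(len(paths) / batch_size)   # counts a final partial batch
    print(batch_size, n_batches, steps_floor, steps_ceil)
```

With 14 samples and a batch size of 2, both formulas give 7 steps. With a batch size of 4, floor division gives 3 steps and silently skips the last 2 samples, whereas rounding up gives 4 steps and covers every sample.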

In [14]:
import mlflow.tensorflow

# Check out the MLflow UI as this runs
mlflow.tensorflow.autolog(every_n_iter=2)

model_final.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Loading training data
batch_size = 2
train_datagen = ImageDataGenerator(preprocessing_function=applications.vgg16.preprocess_input)
train_generator = train_datagen.flow_from_dataframe(dataframe=train_data, 
                                                    directory=None, 
                                                    x_col="path", 
                                                    y_col="label", 
                                                    class_mode="categorical", 
                                                    target_size=(img_height, img_width), 
                                                    batch_size=batch_size)

print(f"Class labels: {train_generator.class_indices}")

step_size = train_generator.n//train_generator.batch_size

# Train the model 
model_final.fit(train_generator, epochs=10, steps_per_epoch=step_size, verbose=2)

## Evaluate the Accuracy

In [16]:
# Evaluate model on test set
test_datagen = ImageDataGenerator(preprocessing_function=applications.vgg16.preprocess_input)

# Small dataset so we can evaluate it in one batch
batch_size = 8

test_generator = test_datagen.flow_from_dataframe(
  dataframe=test_data, 
  directory=None, 
  x_col="path", 
  y_col="label", 
  class_mode="categorical", 
  target_size=(img_height, img_width),
  shuffle=False,
  batch_size=batch_size
)

import math

# Round up so that a final partial batch is still evaluated
step_size = math.ceil(test_generator.n / test_generator.batch_size)

eval_results = model_final.evaluate(test_generator, steps=step_size)
print(f"Loss: {eval_results[0]}. Accuracy: {eval_results[1]}")

## Visualize the Results

In [18]:
import pandas as pd

predictions = pd.DataFrame({
  "Prediction": model_final.predict(test_generator, steps=step_size).argmax(axis=1),
  "True": test_generator.classes,
  "Path": test_data["path"].apply(lambda x: x.replace("/dbfs/", "dbfs:/"))
}).replace({v: k for k, v in train_generator.class_indices.items()})

all_images_df = df_cats.union(df_dogs).union(df_founder).drop("label")
predictions_df = spark.createDataFrame(predictions)

display(all_images_df.join(predictions_df, predictions_df.Path==all_images_df.image.origin))
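
The decoding step in the cell above (taking the `argmax` of each prediction and mapping it back to a class name by inverting `class_indices`) can be sketched without Keras or pandas. The probabilities, the true labels, and the alphabetical `class_indices` mapping below are made-up assumptions for illustration, and `decode_predictions` is a hypothetical helper.

```python
def decode_predictions(probabilities, class_indices):
    """Map each row of class probabilities to the label with the highest score."""
    index_to_label = {index: label for label, index in class_indices.items()}
    decoded = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)  # argmax over the row
        decoded.append(index_to_label[best])
    return decoded


class_indices = {"cat": 0, "dog": 1, "founder": 2}  # assumed mapping, for illustration only
probabilities = [                                   # made-up softmax outputs
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]
true_labels = ["cat", "founder", "cat"]             # made-up ground truth

predicted = decode_predictions(probabilities, class_indices)
accuracy = sum(p == t for p, t in zip(predicted, true_labels)) / len(true_labels)
print(predicted, f"accuracy={accuracy:.2f}")
```

Comparing decoded labels against the true labels row by row is also a quick way to see which classes the model confuses, which the joined image table above lets you inspect visually.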


&copy; 2020 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>