# Transfer Learning

In the image transfer learning example, we use a pre-trained Inception_V1 model as an image feature transformer and train a separate linear classifier to solve the dogs-vs-cats classification problem.

In this notebook we are going to take a slightly different approach. We will still use a pre-trained Inception_V1 model, but this time we will modify the pre-trained model itself: freeze its first few layers, replace the classifier on top, and then fine-tune the whole model.

# Preparation

## Get the dogs-vs-cats datasets

Download the training dataset from https://www.kaggle.com/c/dogs-vs-cats and extract it. 

The following commands copy about 1100 images of cats and dogs into demo/cats and demo/dogs, respectively.
```shell
mkdir -p demo/dogs
mkdir -p demo/cats
cp train/cat.7* demo/cats
cp train/dog.7* demo/dogs
```
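
For environments without a POSIX shell, the same copy step can be sketched in Python. This is only an illustrative equivalent of the commands above: the helper name `copy_subset` and its `prefix` parameter are hypothetical, and the directory layout (`train/` as source, `demo/` as target) is taken from the shell snippet.

```python
import shutil
from pathlib import Path


def copy_subset(train_dir, demo_dir, prefix="7"):
    """Copy cat.<prefix>* and dog.<prefix>* from train_dir into demo_dir/cats and demo_dir/dogs.

    Returns the number of files matched per animal.
    """
    train_dir, demo_dir = Path(train_dir), Path(demo_dir)
    counts = {}
    for animal in ("cat", "dog"):
        target = demo_dir / (animal + "s")
        target.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
        matches = sorted(train_dir.glob(f"{animal}.{prefix}*"))
        for src in matches:
            shutil.copy(src, target / src.name)
        counts[animal] = len(matches)
    return counts
```

Calling `copy_subset("train", "demo")` from the directory that contains the extracted `train` folder should leave the same files in place as the shell commands.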

## Get the pre-trained Inception-V1 model

Download the pre-trained Inception-V1 model from [Zoo](https://s3-ap-southeast-1.amazonaws.com/bigdl-models/imageclassification/imagenet/bigdl_inception-v1_imagenet_0.4.0.model).
Alternatively, you may download a pre-trained Caffe, TensorFlow, or Keras model. Please refer to the programming guide in the [BigDL](https://bigdl-project.github.io/) documentation.

In [12]:
import re

from bigdl.nn.criterion import CrossEntropyCriterion
from pyspark import SparkConf
from pyspark.ml import Pipeline
from pyspark.sql.functions import col, udf
from pyspark.sql.types import DoubleType, StringType

from zoo.common.nncontext import *
from zoo.feature.image import *
from zoo.pipeline.api.keras.layers import Dense, Input, Flatten
from zoo.pipeline.api.keras.models import *
from zoo.pipeline.api.net import *
from zoo.pipeline.nnframes import *

In [13]:
sparkConf = SparkConf().setAppName("ImageTransferLearningExample")
sc = get_nncontext(sparkConf)

In [14]:
from bigdl.dlframes.dl_image_reader import *

In [15]:
model_path = "/home/xiaxue/data/bigdl_inception-v1_imagenet_0.4.0.model"
image_path = "/home/xiaxue/data/demo/"
imageDF = NNImageReader.readImages(image_path, sc)

In [11]:
getName = udf(lambda row:
                  re.search(r'(cat|dog)\.([\d]*)\.jpg', row[0], re.IGNORECASE).group(0),
                  StringType())
getLabel = udf(lambda name: 1.0 if name.startswith('cat') else 2.0, DoubleType())
labelDF = imageDF.withColumn("name", getName(col("image"))) \
        .withColumn("label", getLabel(col('name')))
(trainingDF, validationDF) = labelDF.randomSplit([0.9, 0.1])
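
To make the labelling logic easier to check outside Spark, here is a plain-Python sketch of what the two UDFs compute for a single row. The function names and the sample path are hypothetical; the regular expression and the label values are copied from the cell above. Labels start at 1 rather than 0, which is assumed here to match the 1-based class indices expected by the BigDL criterion used later.

```python
import re

# Same pattern as the getName UDF.
NAME_PATTERN = re.compile(r'(cat|dog)\.([\d]*)\.jpg', re.IGNORECASE)


def name_from_origin(origin):
    """Return the file name, e.g. 'cat.7001.jpg', found in an image path or URI."""
    match = NAME_PATTERN.search(origin)
    if match is None:
        raise ValueError("not a cat/dog image path: %r" % origin)
    return match.group(0)


def label_from_name(name):
    """Map cats to 1.0 and dogs to 2.0, as the getLabel UDF does."""
    return 1.0 if name.startswith('cat') else 2.0


# Hypothetical path, for illustration only.
sample_name = name_from_origin("file:/data/demo/cats/cat.7001.jpg")
sample_label = label_from_name(sample_name)
```

Note that the pattern is case-insensitive while `startswith('cat')` is not, so a file named `Cat.1.jpg` would be labelled as a dog; the Kaggle file names are all lower case, so this does not matter for this dataset.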

Compose a pipeline that includes the feature transformer, the pre-trained model, and a logistic regression classifier.

In [6]:
transformer = ChainedPreprocessing(
        [RowToImageFeature(), ImageResize(256, 256), ImageCenterCrop(224, 224),
         ImageChannelNormalize(123.0, 117.0, 104.0), ImageMatToTensor(), ImageFeatureToTensor()])

creating: createRowToImageFeature
creating: createImageResize
creating: createImageCenterCrop
creating: createImageChannelNormalize
creating: createImageMatToTensor
creating: createImageFeatureToTensor
creating: createChainedPreprocessing
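
The arithmetic behind the resize, crop, and normalize steps can be sketched without any image library. This is a simplified illustration, not the library's implementation: it assumes the crop window is centred symmetrically and that the three arguments to `ImageChannelNormalize` are per-channel means with a standard deviation of 1. Both helper names are hypothetical.

```python
def center_crop_box(width, height, crop_w, crop_h):
    """Return (left, top, right, bottom) of a crop window centred in the image."""
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


def subtract_channel_means(pixel, means=(123.0, 117.0, 104.0)):
    """Subtract the per-channel means from one three-channel pixel."""
    return tuple(value - mean for value, mean in zip(pixel, means))


# A 256x256 image cropped to 224x224 keeps the window starting 16 pixels in.
box = center_crop_box(256, 256, 224, 224)

# A made-up pixel value, for illustration only.
centred = subtract_channel_means((123.0, 127.0, 100.0))
```

Under these assumptions, `box` is `(16, 16, 240, 240)` and `centred` is `(0.0, 10.0, -4.0)`.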


In [7]:
# The zoo Keras-style Model shadows BigDL's Model here, so import the latter under an alias.
from bigdl.nn.layer import Model as BigDLModel
preTrainedNNModel = NNModel(BigDLModel.loadModel(model_path), transformer) \
        .setFeaturesCol("image") \
        .setPredictionCol("embedding")

In [8]:
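# Import BigDL's Sequential, Linear and LogSoftMax; the star imports above only bring in the zoo Keras-style Sequential.
from bigdl.nn.criterion import ClassNLLCriterion
from bigdl.nn.layer import Sequential, Linear, LogSoftMax
lrModel = Sequential().add(Linear(1000, 2)).add(LogSoftMax())
classifier = NNClassifier(lrModel, ClassNLLCriterion(), SeqToTensor([1000])) \
        .setLearningRate(0.003).setBatchSize(40).setMaxEpoch(20).setFeaturesCol("embedding")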
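
The classifier head above pairs a log-softmax output with a negative log-likelihood criterion. As a rough illustration of what that combination computes for one sample, here is a pure-Python sketch; the function names and the example scores are hypothetical, and the 1-based label convention is an assumption carried over from the labels built earlier (1.0 for cats, 2.0 for dogs).

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw scores."""
    peak = max(scores)
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_sum for s in scores]


def nll_loss(log_probs, label):
    """Negative log-likelihood of a 1-based class label."""
    return -log_probs[int(label) - 1]


# Made-up scores from the two output units: the model favours class 1 (cat).
log_probs = log_softmax([2.0, 0.0])
loss_if_cat = nll_loss(log_probs, 1.0)  # small loss: prediction agrees with the label
loss_if_dog = nll_loss(log_probs, 2.0)  # larger loss: prediction disagrees
```

With equal scores for both classes the loss is `log(2)` regardless of the label, which is a useful sanity check for the starting loss of an untrained two-class head.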

In [9]:
pipeline = Pipeline(stages=[preTrainedNNModel, classifier])

In [10]:
catdogModel = pipeline.fit(trainingDF)
predictionDF = catdogModel.transform(validationDF).cache()
predictionDF.show()

In [11]:
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

evaluator = MulticlassClassificationEvaluator(
        labelCol="label", predictionCol="prediction", metricName="accuracy")
accuracy = evaluator.evaluate(predictionDF)

The expected test error should be less than 10%.

In [12]:
print("Test Error = %g " % (1.0 - accuracy))
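
For reference, the accuracy metric and the reported test error reduce to simple counting. The sketch below uses made-up labels and predictions rather than results from this notebook, and the helper name `fraction_correct` is hypothetical.

```python
def fraction_correct(labels, predictions):
    """Share of predictions that equal the corresponding label."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    correct = sum(1 for label, pred in zip(labels, predictions) if label == pred)
    return correct / len(labels)


# Illustrative values only: 4 of 5 predictions match.
labels = [1.0, 2.0, 2.0, 1.0, 1.0]
predictions = [1.0, 2.0, 1.0, 1.0, 1.0]
test_error = 1.0 - fraction_correct(labels, predictions)
print("Test Error = %g " % test_error)
```

Here four of five predictions are correct, so the sketch reports a test error of 0.2.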