# Introduction

If you are here to copy and paste code and learn how to use Keras, TensorFlow, PyTorch, or a pre-trained model for transfer learning, this may not be the right notebook for you. I published [another notebook](https://www.kaggle.com/dardodel/petfinder-keras-inceptionv3-use-images-only) that shows how to use the pre-trained InceptionV3 (or any other available pre-trained model) as the backbone for your transfer learning task.

During this competition, I learned a lot about the dataset itself and the factors we need to consider when building a model for this problem. I hope you can learn from this notebook and apply these lessons in your own modeling.

Obviously, one solution to this problem is transfer learning (I bet the majority of published notebooks follow that path: one in TensorFlow, another in PyTorch, one with InceptionV3, another with ResNet101, and so on). If you check the leaderboard, you will see that the majority of scores are above 19, or even 21.2 (you will find out where this number comes from in a minute). I'll try to explain the possible reasons why most participants get such a high score.

# Motivation

I got a score of ~18.5 and tried to improve it by testing thousands of options, including (but not limited to) different pre-trained models, different FC-layer architectures, different types of image augmentation, and other kinds of regularization. None of them improved the model as much as I expected. But I found a key fact when I compared the predictions with the actual values: my model was not able to predict the two ends of the Pawpularity distribution, i.e., Pawpularity values below 20 and above 80. Before I dive deeper, let's plot the Pawpularity distribution. Have you done that? Plotting the target distribution is always eye-opening when solving a machine learning problem, and it helps you design a better model architecture.

# Lesson 1 - Pawpularity Distribution (Median)

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
main_folder = '/kaggle/input/petfinder-pawpularity-score/'

train_image_folder = main_folder + 'train'
train = pd.read_csv(main_folder + 'train.csv')
train['img_fnm'] = train.Id.apply(lambda s: train_image_folder + '/' + s + '.jpg')

In [None]:
# Coarser grid
a = sns.histplot(data=train, x="Pawpularity", binwidth=4) 

In [None]:
# Finer grid
a = sns.histplot(data=train, x="Pawpularity", binwidth=2) 

Yes. The Pawpularity is not evenly distributed. 1) It is right-skewed: most of the mass sits at low values, with a long tail toward high values, and 2) there is a large concentration at Pawpularity=100 as well as at Pawpularity below 4-5. The highest frequency belongs to ~26 < Pawpularity < ~34.
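
The direction of the skew can also be checked numerically rather than by eye. Below is a minimal, dependency-free sketch on a toy list of scores (hypothetical values, not the competition data): when the mean sits above the median and the skewness is positive, the long tail points toward the high scores.

```python
import statistics

# Toy scores (hypothetical, NOT the competition data): most values are low,
# with a few very high ones, mimicking the shape of the histogram.
toy_scores = [5, 20, 25, 28, 30, 32, 35, 40, 60, 100]

mean = statistics.fmean(toy_scores)
median = statistics.median(toy_scores)
std = statistics.pstdev(toy_scores)

# Population skewness: the average cubed z-score.
# Positive -> long tail toward high values; negative -> toward low values.
skewness = sum(((s - mean) / std) ** 3 for s in toy_scores) / len(toy_scores)

print(f"mean={mean:.1f}, median={median:.1f}, skewness={skewness:.2f}")
```

On the real data, `train.Pawpularity.skew()` and `train.Pawpularity.median()` should give the same kind of summary.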

Now, given this information, let's take a look at the results of my model. How does the predicted Pawpularity look compared with the ground truth?

I load the model that I already trained and predict the Pawpularity of the train set. We will find some interesting facts in a moment.

In [None]:
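# Import through tensorflow.keras to stay consistent with the cells further down
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import ImageDataGenerator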

target_size = 499

#prediction
test_datagen = ImageDataGenerator(rescale=1/255)

BATCH = 32

# Image-only generator for prediction
def gen_flow_for_two_inputs_test(datagen, batch, x_train, shuffle=True):
    """
    Args:
        datagen(image.ImageDataGenerator): data generator
        batch(int): batch size 
        x_train: dataframe holding the image paths ('img_fnm') and 'Id'
        shuffle(bool): bool to shuffle data
    """
    # Pass 'Id' as y_col so each batch carries the sample ids instead of labels
    batch = datagen.flow_from_dataframe(x_train, batch_size=batch, shuffle=shuffle, 
                                        x_col='img_fnm', y_col='Id', class_mode = 'raw',
                                        target_size=(target_size, target_size))
    while True:
        # Advance the iterator with the built-in next()
        batch_image, batch_index = next(batch)
        # The ids are not needed here; yield a dummy label alongside the images
        yield batch_image, np.zeros(1)


model_dir = '../input/petfinder-inceptionv3-499-64-64-22-1-bestmodel/InceptionV3_499_64_64_22_1.h5' ## Best model
model = load_model(model_dir)



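# Model.predict accepts generators directly (predict_generator is deprecated in TF 2.x).
# ravel() flattens the (n_samples, 1) output so it fits into a single column.
train['Pawpularity_Pred'] = model.predict(
    gen_flow_for_two_inputs_test(test_datagen, BATCH, train, shuffle=False),
    steps = int(np.ceil(train.shape[0] / BATCH)), verbose = 1).ravel()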

Let's compare the distribution of actual and predicted Pawpularity:

In [None]:
sns.set(rc = {'figure.figsize':(10,6)})
a = sns.histplot(data=train, x="Pawpularity",      binwidth=4, color = 'blue', alpha=0.4) 
a = sns.histplot(data=train, x="Pawpularity_Pred", binwidth=4, color = 'red' , alpha=0.4)

This is very interesting! The distribution of the predictions does not match the ground truth. We did not do well for Pawpularity values below ~20 and above ~60, especially for Pawpularity = 100. The predictions have a higher frequency/density around 25-40.

Let's compare the quantiles (and the mean) of the actual and predicted values:

In [None]:
cols = ['Actual / Prediction', 'Q_1', 'Q_5', 'Q_10', 'Q_25', 'Q_50', 
        'Mean','Q_75', 'Q_90','Q_95' ,'Q_99']
def get_quantiles(data_):
    return [round(np.quantile(data_, q= 0.01),1), 
            round(np.quantile(data_, q= 0.05),1), 
            round(np.quantile(data_, q= 0.10),1),
            round(np.quantile(data_, q= 0.25),1),
            round(np.quantile(data_, q= 0.50),1),
            round(np.mean(data_),1),
            round(np.quantile(data_, q= 0.75),1),
            round(np.quantile(data_, q= 0.90),1),
            round(np.quantile(data_, q= 0.95),1),
            round(np.quantile(data_, q= 0.99),1)]

Q_df = pd.DataFrame(data = [['Pawpularity_Actual'] + get_quantiles(train.Pawpularity)], columns = cols)
Q_df = pd.concat([Q_df, pd.DataFrame([['Pawpularity_Prediction'] + 
                                      get_quantiles(train.Pawpularity_Pred)], columns = cols)])
display(Q_df)

This table clearly shows that the model tends to spit out values near the median of Pawpularity. The lower percentiles of the predictions are much higher than those of the actual values, and the higher percentiles of the predictions are lower than those of the actual values. This tells me that the distribution of predicted Pawpularity has a lower variance than the ground truth (this can be seen in the histogram as well).
Note how close (almost identical) the medians of the predictions and the actual values are.

There are two potential reasons to explain this: 

1) There are no clear differences/clues in the images that separate pets with very low/high Pawpularity from those in between (around the median). The model cannot learn this from the current data, so its safest option is to output a central value of Pawpularity (close to the median).

2) The number of samples around the median is much higher than in the tails, so the model prefers to satisfy them to bring the total loss (MSE) down (this concentration of samples is also what places the median in that area). In addition, we have relatively few samples with low and high Pawpularity, so the model has less chance to learn about them.


Now... Interestingly, if you output the median for all samples in the test set (for submission), you will get a score of 21.2 (this is where that number in the Introduction comes from).
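
To make the "constant prediction" baseline concrete, here is a small sketch that scores a constant guess with RMSE. It assumes the metric behaves like a root mean squared error and uses toy scores (hypothetical values, not the competition data), so the numbers it prints are not the leaderboard numbers. It also illustrates a general fact: under squared error the best constant is the mean, and the median is usually only slightly worse.

```python
import math
import statistics

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Toy scores (hypothetical, NOT the competition data).
toy_scores = [5, 20, 25, 28, 30, 32, 35, 40, 60, 100]

median_value = statistics.median(toy_scores)
mean_value = statistics.fmean(toy_scores)

# Score a model that ignores the image and always returns one number.
rmse_median = rmse(toy_scores, [median_value] * len(toy_scores))
rmse_mean = rmse(toy_scores, [mean_value] * len(toy_scores))

print(f"constant median: RMSE={rmse_median:.2f}")
print(f"constant mean:   RMSE={rmse_mean:.2f}")
```

Any trained model should be judged against this kind of baseline: a score close to it means the model has learned little beyond the center of the distribution.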

### Let's explore more ... 

I create a column in the train data frame that stores the absolute difference between the actual and predicted values.

In [None]:
train['actual_minus_pred_abs'] = abs(train.Pawpularity - train.Pawpularity_Pred)

I am going to plot this absolute difference across the entire range of Pawpularity to see where it is low, high, or typical. To reduce the noise and get a clean plot, I bucket the Pawpularity values.

In [None]:
bandwidth = 5
train['Pawpularity_group'] = train.Pawpularity.apply(lambda num: (num-1)// bandwidth)
train_groupby = train[['actual_minus_pred_abs', 
                       'Pawpularity_group']].groupby('Pawpularity_group').mean().reset_index()
train_groupby['Pawpularity'] = train_groupby.Pawpularity_group * bandwidth
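
The bucketing expression is easy to misread at the edges, so here is a tiny standalone check of the same arithmetic. The helper name `bucket_label` is only for illustration. Note that the label shown on the x-axis is one below the first score in the bucket: label 0 covers scores 1 to 5, and label 95 covers scores 96 to 100.

```python
bandwidth = 5

def bucket_label(score, width=bandwidth):
    """Return the x-axis label of the bucket a score falls into."""
    group = (score - 1) // width   # 1..5 -> 0, 6..10 -> 1, ..., 96..100 -> 19
    return group * width           # label = group index times the bucket width

for score in (1, 5, 6, 50, 96, 100):
    print(score, "->", bucket_label(score))
```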

Finally, plot it ...

In [None]:
sns.set(rc = {'figure.figsize':(10,5)})
ax = sns.barplot(x="Pawpularity", y="actual_minus_pred_abs", data=train_groupby, ci = None)

This plot shows how the error is distributed across the Pawpularity range. As expected, the error is very low around the median, while it spikes at the two ends. Although we don't have many samples at the two ends, those images impact the mean squared error significantly because the errors are squared. Therefore, if you can find a way to improve your model's accuracy at the two ends (without sacrificing accuracy in the middle range), you will get a much better result.
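
A quick back-of-the-envelope sketch shows how strongly the squaring amplifies the tails. The error values below are hypothetical round numbers chosen for illustration, not measurements from the model.

```python
# Hypothetical absolute errors: 90 "middle" samples that are off by 10 points
# and 10 "tail" samples that are off by 50 points.
middle_errors = [10.0] * 90
tail_errors = [50.0] * 10

sq_middle = sum(e ** 2 for e in middle_errors)  # 90 * 100
sq_tail = sum(e ** 2 for e in tail_errors)      # 10 * 2500

tail_share_of_samples = len(tail_errors) / (len(middle_errors) + len(tail_errors))
tail_share_of_sq_error = sq_tail / (sq_middle + sq_tail)

print(f"tails are {tail_share_of_samples:.0%} of the samples")
print(f"tails carry {tail_share_of_sq_error:.0%} of the squared error")
```

In this toy setup, a tenth of the samples accounts for most of the squared error, which is why improving the two ends pays off so much.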

### If you have reached up to this point, you probably enjoyed it. Please support this notebook by upvoting. Thanks :-)

Now I am curious: instead of predicting the Pawpularity itself, how well can the model predict the region the Pawpularity belongs to (lower, middle, or upper)? I designed a classification model. This time, I used both the images and the metadata. (Note that, so far, I've only used images for prediction; see [my other notebook](https://www.kaggle.com/dardodel/petfinder-keras-inceptionv3-use-images-only).)

The lower region contains the samples with Pawpularity below 20, and the upper region those with Pawpularity above 60. I then define a narrow middle region (Pawpularity above 30 and below 36) just to keep the number of samples in each class (almost) equal.


In [None]:
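train['class_'] = 3  # 3 = outside the three regions (dropped below)
# Assign through train.loc in one step; chained train.class_.loc[...] may write to a copy
train.loc[train.Pawpularity < 20, 'class_'] = 0  # Lower Region
train.loc[(train.Pawpularity > 30) & (train.Pawpularity < 36), 'class_'] = 1 # Middle Region
train.loc[train.Pawpularity > 60, 'class_'] = 2  # Upper Region

# keep only these three regions (copy so later column assignments are safe)
train_class = train[train.class_.isin([0,1,2])].copy()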

# Plot these regions
a = sns.histplot(data=train_class, x="Pawpularity", binwidth=3)

In [None]:
## How many samples each class has:
train_class[['class_', 'Id']].groupby('class_').count().reset_index()

In [None]:
# Split the data into Train and Test (valid) sets
from sklearn.model_selection import train_test_split
trainset_class, validset_class = \
train_test_split(train_class, test_size = 0.2, random_state=12345)

Define the CNN classifier. Thanks again to [this notebook.](https://www.kaggle.com/genichiroshimizu/keras-multi-imput-image-resnet50-meta-nn)

In [None]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense,concatenate, Dropout, Flatten, Input, BatchNormalization, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications.inception_v3 import InceptionV3
    
target_size = 299 # This is lower than what I used in my final model (499)

input1 = Input(shape=(target_size, target_size, 3))
input2 = Input(shape=(12,))
CNN_Model = InceptionV3(input_tensor = input1 , include_top = False, weights = 'imagenet')

x1 = CNN_Model.output
x1 = GlobalAveragePooling2D()(x1)
x1 = Flatten()(x1)
x1 = Dropout(0.25)(x1)
x1 = Dense(32, activation='relu')(x1)
x1 = Model(inputs=input1, outputs=x1)

x2 = Model(inputs=input2, outputs=input2)

combined = concatenate([x1.output, x2.output])
combined = BatchNormalization()(combined)

z = Dropout(0.25)(combined)
z = Dense(3, activation='softmax')(z)

model = Model(inputs=[CNN_Model.input, x2.input], outputs=z)
model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=5e-5), metrics=['accuracy'])
# model.summary()

In [None]:
from tensorflow.keras.utils import to_categorical

def gen_flow_for_two_inputs(datagen, batch, x_train, shuffle=True):
    """
    Args:
        datagen(image.ImageDataGenerator): data generator
        batch(int): batch size 
        x_train: dataframe holding the image paths, the metadata and the 'class_' label
        shuffle(bool): bool to shuffle data
    """
    # Pass index to the 2nd parameter instead of labels
    x_train_2 = x_train.set_index('Id')
    batch = datagen.flow_from_dataframe(x_train, batch_size=batch, shuffle=shuffle, 
                                        x_col='img_fnm', y_col='Id', class_mode = 'raw',
                                        target_size=(target_size, target_size))
    while True:
        # Advance the iterator; batch_index holds the 'Id' values of this batch
        batch_image, batch_index = next(batch)
        yield [batch_image, 
               x_train_2.loc[batch_index, 
                           ['Subject Focus', 'Eyes', 'Face', 'Near', 'Action', 'Accessory', 
                            'Group', 'Collage', 'Human', 'Occlusion', 'Info', 'Blur']].values], \
                   to_categorical(x_train_2.loc[batch_index, 'class_'], num_classes = 3)

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping

train_datagen = ImageDataGenerator(rescale = 1./255.,
                                   rotation_range = 30,
                                   width_shift_range = 0.2,
                                   height_shift_range = 0.2,
                                   shear_range = 0.1,
                                   zoom_range = 0.3,
                                   horizontal_flip = True)

val_datagen = ImageDataGenerator(rescale = 1./255.)

EPOCH = 5
BATCH = 32

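# Note: with EPOCH = 5, patience=50 can never trigger, so this callback is effectively a no-op here
early_stopping =  EarlyStopping(monitor='val_loss', min_delta=1.0, patience=50)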

log = model.fit(
    x = gen_flow_for_two_inputs(train_datagen, BATCH, trainset_class),
    steps_per_epoch = np.ceil(trainset_class.shape[0] / BATCH),
    validation_data = gen_flow_for_two_inputs(val_datagen, BATCH, validset_class),
    validation_steps = np.ceil(validset_class.shape[0] / BATCH),
    epochs = EPOCH,
    callbacks=[early_stopping])

I didn't want to put much time and effort into improving this model (that is outside the scope of this notebook), but this is roughly the accuracy we get from it: almost 50% on the validation set. Let's get the predictions for the train set.

In [None]:
#prediction
test_datagen = ImageDataGenerator(rescale=1/255)

BATCH = 32

# Generator for 2 input
def gen_flow_for_two_inputs_test(datagen, batch, x_train, shuffle=True):
    """
    Args:
        datagen(image.ImageDataGenerator): data generator
        batch(int): batch size 
        x_train: dataframe holding the image paths and the metadata
        shuffle(bool): bool to shuffle data
    """
    # Pass index to the 2nd parameter instead of labels
    x_train_2 = x_train.set_index('Id')
    batch = datagen.flow_from_dataframe(x_train, batch_size=batch, shuffle=shuffle, 
                                        x_col='img_fnm', y_col='Id', class_mode = 'raw',
                                        target_size=(target_size, target_size))
    while True:
        # Advance the iterator; batch_index holds the 'Id' values of this batch
        batch_image, batch_index = next(batch)
        # Look up the metadata by 'Id'; the label slot is a dummy array
        yield [batch_image, 
               x_train_2.loc[batch_index, 
                           ['Subject Focus', 'Eyes', 'Face', 'Near', 'Action', 'Accessory', 
                            'Group', 'Collage', 'Human', 'Occlusion', 'Info', 'Blur']].values], np.zeros(1)
        
# Model.predict accepts generators directly (predict_generator is deprecated in TF 2.x)
pred = model.predict(
    gen_flow_for_two_inputs_test(test_datagen, BATCH, trainset_class, shuffle=False),
    steps = int(np.ceil(trainset_class.shape[0] / BATCH)), verbose = 1)

trainset_class['predicted_class'] = pred.argmax(axis = 1)

Create the confusion matrix:

In [None]:
y_actu = pd.Series(trainset_class.class_, name='Actual')
y_pred = pd.Series(trainset_class.predicted_class, name='Predicted')
df_confusion = pd.crosstab(y_actu, y_pred)
display(df_confusion)

As expected, this confusion matrix also confirms that the model struggles to distinguish the upper/lower regions from the middle one, with the lower region showing a higher rate of confusion. That still makes sense, because the lower region is closer to the middle region than the upper region is.
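
Raw counts are hard to compare when the classes are not exactly the same size, so it can help to turn each row of the confusion matrix into rates. The sketch below does that on made-up counts (hypothetical numbers, not the output of the model above). The diagonal rate is the per-class recall.

```python
# Hypothetical confusion counts: outer key = actual class, inner key = predicted class.
confusion = {
    "lower":  {"lower": 50, "middle": 35, "upper": 15},
    "middle": {"lower": 30, "middle": 45, "upper": 25},
    "upper":  {"lower": 10, "middle": 30, "upper": 60},
}

def row_rates(matrix):
    """Divide every row by its total so each row sums to 1."""
    return {
        actual: {pred: count / sum(row.values()) for pred, count in row.items()}
        for actual, row in matrix.items()
    }

rates = row_rates(confusion)
recall = {actual: rates[actual][actual] for actual in rates}

for actual, row in rates.items():
    print(actual, {pred: round(rate, 2) for pred, rate in row.items()})
```

With pandas, `pd.crosstab(y_actu, y_pred, normalize='index')` should produce the same row-normalized view directly.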

# Lesson 2 - Image Augmentation (Be Careful)

Note that the Pawpularity score is generated from the click rate on the pet image. To me, this score is a combination of the popularity and the eye-catching-ness (is that a word at all?) of the image. A person first needs to be interested in the pet (cat or dog, the breed, etc.); then, how eye-catching is the image? How does the background color affect it? How does the pet's position in the image affect it? Is the pet far from the camera or close to it? I call all of these features eye-catching-ness (what is the correct word?!).

You need to pay close attention to this fact when you augment the images for generalization. You cannot zoom in or out too much, or rotate the image too much. Zooming out may make the pet's face too small, so the image may no longer deserve its Pawpularity score because it is no longer as eye-catching (adorable) as before. Shifting horizontally and vertically is probably less impactful.

Image augmentation for classification is slightly different. If you rotate the image of a car or a dog, even by 90 degrees, it is still a car or a dog, respectively. But this is not true for this specific problem (Pawpularity score).

So, there is a sweet spot. NN models may suffer from having no augmentation (poor generalization); on the other hand, strong augmentation may change the image's true Pawpularity.
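
As a starting point for that search, here is a sketch of a milder set of augmentation settings than the ones used in the classifier cell above. Every value is a hypothetical starting point, not a tuned result. The keys are the same `ImageDataGenerator` arguments already used in this notebook.

```python
# Milder augmentation settings (hypothetical values to start a search from).
mild_augmentation = dict(
    rescale=1.0 / 255.0,
    rotation_range=10,        # small tilt only, so the pose stays natural
    width_shift_range=0.1,    # shifts are probably the least harmful
    height_shift_range=0.1,
    zoom_range=0.1,           # avoid shrinking the pet's face too much
    horizontal_flip=True,     # a mirrored pet should be just as eye-catching
)

# Usage inside the notebook (requires TensorFlow/Keras):
# train_datagen = ImageDataGenerator(**mild_augmentation)
print(mild_augmentation)
```

Comparing the validation score of a few such settings (none, mild, strong) is one way to locate the sweet spot.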

# Lesson 3 - Optimum Input Image Size

Obviously, the input image size is important. The common image size used for pre-trained CNN models is around 299x299. As long as the run time allows (which we don't care much about here), we can choose higher resolutions. I found that 399 and 499 produce higher accuracy. Here, too, we should find the sweet spot: if you go to an even higher resolution, you may lose overall accuracy. Larger models have more parameters to learn, but we only have around 10,000 samples (overfitting?).

# Lesson 4 - Do I need AveragePooling?

It is common practice to put an average-pooling layer after the Conv layers (before the fully connected layers). I found that removing the pooling layer and flattening the final layer of the pre-trained model results in higher accuracy. Flattening without pooling retains more information (on the other hand, it increases the number of parameters, so pay attention to that as well).
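
The parameter cost of flattening can be estimated with simple arithmetic. The sketch below assumes InceptionV3's final feature map at a 299x299 input (8 x 8 x 2048) followed by a 32-unit dense layer like the one in the classifier above. The layer width of the final model may differ, so treat the numbers as an illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

# Final InceptionV3 feature map for a 299x299 input.
height, width, channels = 8, 8, 2048
units = 32  # width of the first dense layer (assumed, as in the classifier above)

# Global average pooling keeps one value per channel.
pooled_params = dense_params(channels, units)
# Flattening keeps every spatial position of every channel.
flattened_params = dense_params(height * width * channels, units)

print(f"after pooling:    {pooled_params:,} parameters")
print(f"after flattening: {flattened_params:,} parameters")
```

The first dense layer grows by roughly the number of spatial positions (64x here), and the gap widens further at larger input sizes, which is worth keeping in mind with only around 10,000 training images.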

# Lesson 5 - Metadata

I know there are many notebooks out there predicting the Pawpularity from the metadata. However, I personally did not benefit from that information. I found that the images by themselves are more powerful in predicting the Pawpularity (no offense)!

### I hope you found this notebook helpful and informative. Please upvote, share it with your friends, and comment your thoughts and feedback. Thank you :)