Hello fellow Kagglers,

This notebook is a naive baseline built on the mean Pawpularity scores of cats and dogs, which results in an LB score of 20.19652. This is 0.30922 better than predicting the global mean Pawpularity score for every image, which results in an LB score of 20.50574.

The idea is to use [YOLOV5](https://github.com/ultralytics/yolov5) to classify each image as either cat or dog and predict the mean Pawpularity score of that pet type for that image. As will be demonstrated, cats and dogs have different average Pawpularity scores. Dogs have an average Pawpularity score of 42.10, whereas cats only have an average Pawpularity score of 34.52, making dogs objectively more cute.

The takeaway message should be that the pet type, cat/dog, does affect the Pawpularity score. Taking the pet type into account could thus potentially improve the LB score!
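
To make the comparison concrete, here is a minimal sketch on made-up numbers (the scores and labels are illustrative, not competition data). It assumes the leaderboard metric is RMSE, and `rmse` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up toy data, NOT taken from the competition
scores = np.array([30.0, 36.0, 38.0, 40.0, 44.0, 46.0])
labels = np.array(['cat', 'cat', 'cat', 'dog', 'dog', 'dog'])

# Baseline 1: predict the global mean for every image
global_pred = np.full_like(scores, scores.mean())

# Baseline 2: predict the mean of the image's own class
class_pred = np.empty_like(scores)
for label in np.unique(labels):
    mask = labels == label
    class_pred[mask] = scores[mask].mean()

rmse_global = rmse(scores, global_pred)
rmse_class = rmse(scores, class_pred)
print(f'global mean RMSE: {rmse_global:.3f}, class mean RMSE: {rmse_class:.3f}')
```

On the data the means were computed from, the class mean cannot do worse than the global mean. On unseen data the gain depends on how well the class means and the classifier generalize.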

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from tqdm.notebook import tqdm
from multiprocessing import cpu_count

import torch
import imageio
import sys

print(f'python version: {sys.version}')
print(f'torch version: {torch.__version__}')

# Test

In [None]:
# Load Test CSV
test = pd.read_csv('/kaggle/input/petfinder-pawpularity-score/test.csv')

In [None]:
# Add File path to test
def get_image_file_path(image_id):
    return f'/kaggle/input/petfinder-pawpularity-score/test/{image_id}.jpg'

test['file_path'] = test['Id'].apply(get_image_file_path)

In [None]:
display(test.head())

In [None]:
display(test.info())

# Train

Use the train DataFrame with a cat/dog classification from [this](https://www.kaggle.com/markwijkhuizen/petfinder-eda-yolov5-obj-detection-tfrecords) notebook.

In [None]:
# Train DataFrame with YOLOV5 info
train = pd.read_pickle('/kaggle/input/petfinder-yolov5-kfold-tfrecords/train.pkl')

In [None]:
display(train.head())

In [None]:
display(train.info())

# Cat/Dog/Unknown Mean Pawpularity Scores

In [None]:
# Cat/Dog/Unknown distribution
display(train['label'].value_counts(normalize=True).to_frame() * 100)

This next cell shows what this notebook is all about. Dogs have on average a Pawpularity score of 42.10, whereas cats only have an average Pawpularity score of 34.52.

In [None]:
# Mean Pawpularity scores per pet
DOG_MEAN = train.loc[train['label'] == 'dog', 'Pawpularity'].mean()
CAT_MEAN = train.loc[train['label'] == 'cat', 'Pawpularity'].mean()
GLOBAL_MEAN = train['Pawpularity'].mean()

# Dogs seem to be more Pawpular than cats on average
print(f'DOG_MEAN: {DOG_MEAN:.2f}, CAT_MEAN: {CAT_MEAN:.2f}, GLOBAL_MEAN: {GLOBAL_MEAN:.2f}')

The next histogram shows that cats tend to have lower Pawpularity scores than dogs. The most surprising observation is the large number of dogs with a Pawpularity score near 100.

In [None]:
# Plot Pawpularity distribution for cats and dogs
plt.figure(figsize=(15, 8))
plt.title('Pawpularity distribution for Cats and Dogs', size=24)
train.loc[train['label'] != 'unknown'].groupby('label')['Pawpularity'].plot(kind='hist', bins=32, alpha=0.50)
plt.legend(prop={'size': 16})
plt.show()

# Load YOLOV5 Offline

In [None]:
# Hacky way of loading YOLOV5 offline, don't try this at home

# Add YOLOV5 master to cache
!cp -R '/kaggle/input/yolov5/torch/root/.cache/torch' '/root/.cache/torch'
# Add Ultralytics (whatever this is) to the config folder
!cp -R '/kaggle/input/yolov5/ultralytics/root/.config/Ultralytics' '/root/.config/Ultralytics'

In [None]:
# Load Best Performing YOLOV5X Model
yolov5x6_model = torch.hub.load('ultralytics/yolov5', 'yolov5x6')

In [None]:
def get_pet_label(file_path):
    # Read Image
    image = imageio.imread(file_path)
    
    # Get YOLOV5 results using Test Time Augmentation for better results
    results = yolov5x6_model(image, augment=True)
    
    # Save info for each pet
    for x1, y1, x2, y2, confidence, label_int in results.xyxy[0].cpu().detach().numpy():
        # Map integer encoded label to label
        label = results.names[int(label_int)]
        # Objects detected are already sorted on confidence, return first cat or dog
        if label in ['cat', 'dog']:
            return label
        
    # Could not detect pet, "unknown" label
    return 'unknown'
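
The selection rule inside `get_pet_label` can be checked without loading the model. The sketch below uses a hypothetical helper, `first_pet_label`, that takes plain `(label, confidence)` pairs. That input format is an assumption made for illustration, and the helper sorts by confidence itself instead of relying on the detector's output order.

```python
def first_pet_label(detections, pet_labels=('cat', 'dog')):
    """Return the most confident cat/dog label, or 'unknown' if none is found.

    `detections` is an iterable of (label, confidence) pairs; this input
    format is an assumption made for illustration.
    """
    # Sort explicitly so the result does not depend on the input order
    for label, _confidence in sorted(detections, key=lambda d: d[1], reverse=True):
        if label in pet_labels:
            return label
    # No cat or dog among the detections
    return 'unknown'

# Hypothetical detections for three images
print(first_pet_label([('person', 0.91), ('dog', 0.88), ('cat', 0.40)]))
print(first_pet_label([('cat', 0.35), ('dog', 0.72)]))
print(first_pet_label([('couch', 0.80)]))
```

Note that an image containing both a cat and a dog gets the label of whichever detection is more confident, which matches the behavior of the loop above.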

# Predictions

In [None]:
# Submission dictionary to create DataFrame from
submission_dict = { 'Id': [], 'Pawpularity': [] }
for pet_id, file_path in tqdm(test[['Id', 'file_path']].itertuples(index=False), total=len(test)):
    submission_dict['Id'].append(pet_id)
    
    # get pet label and assign mean Pawpularity score
    label = get_pet_label(file_path)
    if label == 'cat':
        submission_dict['Pawpularity'].append(CAT_MEAN)
    elif label == 'dog':
        submission_dict['Pawpularity'].append(DOG_MEAN)
    else:
        submission_dict['Pawpularity'].append(GLOBAL_MEAN)
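
Once the labels are known, the if/elif chain above amounts to a lookup with a fallback. A minimal sketch on made-up DataFrames, assuming the labels have already been stored in a `label` column; `toy_train`, `toy_test`, and `label_means` are hypothetical names used only here.

```python
import pandas as pd

# Made-up training data, NOT taken from the competition
toy_train = pd.DataFrame({
    'label': ['cat', 'cat', 'dog', 'dog', 'unknown'],
    'Pawpularity': [30, 40, 40, 50, 20],
})

# Per-label means; 'unknown' is dropped so it falls back to the global mean
label_means = (
    toy_train[toy_train['label'] != 'unknown']
    .groupby('label')['Pawpularity']
    .mean()
)
global_mean = toy_train['Pawpularity'].mean()

# Made-up test labels, as if produced by get_pet_label
toy_test = pd.DataFrame({'Id': ['a', 'b', 'c'], 'label': ['dog', 'unknown', 'cat']})

# Look up the class mean, falling back to the global mean for other labels
toy_test['Pawpularity'] = toy_test['label'].map(label_means).fillna(global_mean)
print(toy_test)
```

This gives the same assignments as the loop, but the model inference itself still has to run once per image to produce the labels.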

In [None]:
# Create DataFrame from dictionary
submission = pd.DataFrame.from_dict(submission_dict)

In [None]:
display(submission)

In [None]:
# Save submission as CSV
submission.to_csv('submission.csv', index=False)