# <div style="text-align:center; border: 2px solid #FFA500; border-radius: 25px"><span style="color:purple">Pawpularity Prediction</span></div>
In this competition, our goal is to predict engagement with a pet's profile based on the appearance of its photo; for example, which kinds of pictures are likely to attract someone's attention (including the pet's name, using props in the picture, using multiple pictures, using accessories, etc.).

We are provided with two kinds of data:
- Image data (the pet photos)
- Tabular data (metadata, i.e. data about data, describing each photo)


*We want to predict the **Pawpularity score**. We can expect pets with attractive photos to generate more interest and be adopted faster.*
 

# Imports

In [None]:
import warnings
warnings.filterwarnings('ignore')

import random
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

pd.set_option('display.max_colwidth',None)

Loading the `csv` files:

In [None]:
train = pd.read_csv('../input/petfinder-pawpularity-score/train.csv')
test = pd.read_csv('../input/petfinder-pawpularity-score/test.csv')
sample_submission = pd.read_csv('../input/petfinder-pawpularity-score/sample_submission.csv')

In [None]:
train.head(2)

In [None]:
test.head(2)

In [None]:
sample_submission.head(2)

In [None]:
print('train dataset shape: ', train.shape)
print('test dataset shape: ', test.shape)
print('submission dataset shape: ', sample_submission.shape)

There are no `null` values in the datasets:

In [None]:
print('train dataset Info:')
train.info()  # info() prints its summary directly and returns None
print('test dataset Info:')
test.info()
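
`info()` reports non-null counts per column, but the "no nulls" claim can also be checked programmatically. A minimal sketch on a small hypothetical frame (the toy values, including the deliberate `NaN`, are illustrative and not from the competition files); on the real data the same calls would be `train.isna().sum()` and `test.isna().sum()`:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for train/test (hypothetical values, not competition data)
df = pd.DataFrame({
    'Id': ['a', 'b', 'c'],
    'Eyes': [1, 0, 1],
    'Pawpularity': [30, np.nan, 55],
})

# isna() marks missing cells; sum() counts them per column
null_counts = df.isna().sum()
print(null_counts)

# True only if no cell in the whole frame is missing
has_no_nulls = not df.isna().any().any()
print(has_no_nulls)
```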

In [None]:
print('train dataset describe: ', train.describe())
print('\n')
print('test dataset describe: ', test.describe())

Description of the features in the train and test sets:

 Each pet photo is labeled with the value of **1 (Yes)** or **0 (No)** for each of the following features:

- Focus - Pet stands out against uncluttered background, not too close / far.
- Eyes - Both eyes are facing front or near-front, with at least 1 eye / pupil decently clear.
- Face - Decently clear face, facing front or near-front.
- Near - Single pet taking up significant portion of photo (roughly over 50% of photo width or height).
- Action - Pet in the middle of an action (e.g., jumping).
- Accessory - Accompanying physical or digital accessory / prop (e.g. toy, digital sticker), excluding collar and leash.
- Group - More than 1 pet in the photo.
- Collage - Digitally-retouched photo (e.g. with digital photo frame, combination of multiple photos).
- Human - Human in the photo.
- Occlusion - Specific undesirable objects blocking part of the pet (e.g. human, cage or fence). Note that not all blocking objects are considered occlusion.
- Info - Custom-added text or labels (e.g. pet name, description).
- Blur - Noticeably out of focus or noisy, especially for the pet’s eyes and face. For Blur entries, “Eyes” column is always set to 0.
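
The description above implies two properties that can be verified directly: every feature column holds only `0`/`1` labels, and `Blur == 1` should never occur together with `Eyes == 1`. A minimal sketch on a hypothetical two-column frame (made-up rows); on the real data the same expressions would be applied to `train`:

```python
import pandas as pd

# Hypothetical rows mimicking two of the binary metadata columns
df = pd.DataFrame({
    'Eyes': [1, 0, 0, 1],
    'Blur': [0, 1, 1, 0],
})

# Rows that would violate the rule "Blur == 1 implies Eyes == 0"
violations = df[(df['Blur'] == 1) & (df['Eyes'] == 1)]
print(len(violations))

# Every column should contain only 0/1 labels
is_binary = df.isin([0, 1]).all().all()
print(is_binary)
```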

# Exploratory Data Analysis | EDA


#### `Pawpularity` distribution

In [None]:
plt.figure(figsize=(10,5))
sns.set_palette("pastel")
sns.histplot(data=train, x='Pawpularity', kde = True)
plt.axvline(train['Pawpularity'].mean(),c = 'red', ls = '--', lw = 3)

We have no **NULL** values:

In [None]:
plt.figure(figsize=(10,5))
sns.heatmap(train.isna(), cbar=False)

### Checking for correlation

There are no notably strong correlations overall, but `Eyes`/`Face` and `Collage`/`Info` show fairly strong pairwise correlations.

In [None]:
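# drop the string Id column: recent pandas versions raise on non-numeric columns in corr()
sns.clustermap(train.drop(columns='Id').corr())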

**Positive Correlation**
- Human - Occlusion => **(0.63)** (if a person appears in front of the pet in the photo, part of the animal's body may be covered.)

**Negative Correlation**
- Blur - Eyes => **(-0.51)** (When `Blur` is 1, `Eyes` is always set to 0, as stated in the competition's data description.)
- Near - Group => **(-0.32)**
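
Because both columns in each of these pairs are 0/1 labels, the Pearson value in `train.corr()` is the phi coefficient of their 2x2 contingency table, which is why a deterministic rule such as Blur/Eyes produces a clearly negative value. A minimal sketch with two hypothetical binary columns (the values are invented for illustration) showing that the contingency-table formula and pandas' Pearson correlation agree:

```python
import numpy as np
import pandas as pd

# Two hypothetical binary columns (illustrative values only)
df = pd.DataFrame({
    'Human':     [1, 1, 1, 0, 0, 0, 0, 1],
    'Occlusion': [1, 1, 0, 0, 0, 0, 1, 1],
})

# 2x2 contingency counts: n11 = both 1, n00 = both 0, etc.
n11 = ((df['Human'] == 1) & (df['Occlusion'] == 1)).sum()
n10 = ((df['Human'] == 1) & (df['Occlusion'] == 0)).sum()
n01 = ((df['Human'] == 0) & (df['Occlusion'] == 1)).sum()
n00 = ((df['Human'] == 0) & (df['Occlusion'] == 0)).sum()

# Phi coefficient from the contingency table
phi = (n11 * n00 - n10 * n01) / np.sqrt(
    (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
)

pearson = df['Human'].corr(df['Occlusion'])  # Pearson is the default method
print(round(phi, 4), round(pearson, 4))  # prints 0.5 0.5
```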

In [None]:
plt.figure(figsize=(12,8))
sns.heatmap(train.drop(columns='Id').corr(), linewidths=0.5, linecolor='white', annot=True,
            cmap='RdYlGn', cbar_kws={'shrink': 0.5})

In [None]:
print(train.drop(columns='Id').corr("pearson")['Pawpularity'].sort_values(ascending=False))
print("")
print(train.drop(columns='Id').corr("kendall")['Pawpularity'].sort_values(ascending=False))
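
Each Pearson entry above correlates a 0/1 feature with the continuous `Pawpularity` target, which is the point-biserial correlation: it grows with the gap between the mean score of photos with the feature and photos without it. A minimal sketch on hypothetical values (not competition data) checking the textbook formula r = (m1 - m0) / s * sqrt(p(1 - p)), with `s` the population standard deviation, against pandas:

```python
import numpy as np
import pandas as pd

# Hypothetical binary feature and continuous target (illustrative values only)
df = pd.DataFrame({
    'Face':        [1, 1, 0, 1, 0, 0, 1, 0],
    'Pawpularity': [40, 55, 20, 35, 25, 30, 60, 15],
})

x = df['Face']
y = df['Pawpularity']

m1 = y[x == 1].mean()   # mean target where the feature is 1
m0 = y[x == 0].mean()   # mean target where the feature is 0
p = x.mean()            # share of rows with the feature set to 1
s = y.std(ddof=0)       # population standard deviation of the target

# Point-biserial correlation
r_pb = (m1 - m0) / s * np.sqrt(p * (1 - p))

print(round(r_pb, 4), round(x.corr(y), 4))
```

Read this way, a small correlation for a feature simply means that the mean `Pawpularity` barely differs between its `0` and `1` groups.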

In [None]:
# sns.pairplot(train)

Counting the `0`/`1` distribution of each feature column in `train` (all columns except `Id` and `Pawpularity`):

In [None]:
plt.figure(figsize=(20,10))

# train.columns[1:13] are the 12 binary features (Id and Pawpularity are skipped)
for i, col in enumerate(train.columns[1:13]):
    plt.subplot(3, 4, i + 1)
    sns.countplot(data=train, x=col)
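
The same information can be summarized numerically: for a 0/1 column, the mean is the fraction of rows labeled `1`. A minimal sketch on hypothetical columns (made-up values); on the real data the equivalent would be `train.drop(columns=['Id', 'Pawpularity']).mean()`:

```python
import pandas as pd

# Hypothetical binary feature columns (illustrative values only)
df = pd.DataFrame({
    'Eyes':  [1, 1, 0, 1],
    'Human': [0, 0, 1, 0],
    'Blur':  [0, 0, 0, 0],
})

# For 0/1 columns, the mean is the fraction of rows labeled 1
share_of_ones = df.mean().sort_values(ascending=False)
print(share_of_ones)

# Raw counts of 0s and 1s per column; reindex keeps both rows even when a value never occurs
counts = df.apply(lambda col: col.value_counts().reindex([0, 1], fill_value=0))
print(counts)
```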

### Exploring `Pawpularity`

Plotting 4 random images from the `train` folder:

In [None]:
rows, cols = 2, 2
fig, axs = plt.subplots(rows, cols, figsize=(12,10))
fig.subplots_adjust(top=0.99, bottom=0.01, hspace=0.2, wspace=0.4)
# one random training image per subplot
for ax in axs.ravel():
    random_image = random.randint(0, len(train) - 1)
    img = mpimg.imread('../input/petfinder-pawpularity-score/train/' + train['Id'].iloc[random_image] + '.jpg')
    ax.imshow(img)
    ax.axis('off')
    ax.set_title(f'Pawpularity: {train["Pawpularity"].iloc[random_image]}', fontsize=20)
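
Drawing an index with `random.randint` once per subplot can pick the same image twice and gives a different grid on every run. One possible alternative, sketched here under the assumption that distinct, reproducible picks are wanted, is `DataFrame.sample` without replacement; the toy frame and `random_state=42` are arbitrary choices for illustration:

```python
import pandas as pd

# Hypothetical stand-in for train (ids and scores are made up)
df = pd.DataFrame({
    'Id': [f'img_{k}' for k in range(10)],
    'Pawpularity': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# Draw 4 distinct rows; random_state makes the draw reproducible
picked = df.sample(n=4, random_state=42)

for row in picked.itertuples(index=False):
    # In the notebook, mpimg.imread(...) and ax.imshow(...) would go here
    print(row.Id, row.Pawpularity)
```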

Plotting images with `Pawpularity == 100`

In [None]:
pawpularity_100 = train[train['Pawpularity'] == 100]

rows, cols = 2, 2
fig, axs = plt.subplots(rows, cols, figsize=(12,10))
fig.subplots_adjust(top=0.99, bottom=0.01, hspace=0.2, wspace=0.4)
# one random score-100 image per subplot (index labels are used with .loc)
for ax in axs.ravel():
    random_image = random.choice(list(pawpularity_100.index))
    img = mpimg.imread('../input/petfinder-pawpularity-score/train/' + pawpularity_100.loc[random_image, 'Id'] + '.jpg')
    ax.imshow(img)
    ax.axis('off')
    ax.set_title(f'Pawpularity: {pawpularity_100.loc[random_image, "Pawpularity"]}', fontsize=20)

Now let's look at the `1`/`0` distribution of the features for images with a `Pawpularity` score of `100`.

We can see that, in general, the shares of `1` and `0` differ for each feature.

In [None]:
colors = ['#c2c2f0','#ffb3e6']

In [None]:
fig, ax = plt.subplots(3, 3, figsize=(14, 20))

for a in ax.ravel():
    a.set(xticks=[], yticks=[])

for r in range(3):
    # columns 1..3 are the first three binary features
    label = pawpularity_100.columns[r + 1]
    # fix the order to [0, 1] so it matches the pie labels below
    count = pawpularity_100[label].value_counts().reindex([0, 1], fill_value=0)
    for i in [1, 0]:
        # sample only among images whose feature value actually equals i
        subset = pawpularity_100[pawpularity_100[label] == i]
        c = 0 if i == 1 else 2
        ax[r, c].set_title(f'{label}={i}')
        if subset.empty:
            continue
        random_image = random.choice(list(subset.index))
        img = plt.imread('../input/petfinder-pawpularity-score/train/' + subset.loc[random_image, 'Id'] + '.jpg')
        ax[r, c].imshow(img)
    ax[r, 1].pie(count, labels=[0, 1], autopct='%1.1f%%', wedgeprops={'linewidth': 3}, colors=colors)
    ax[r, 1].set_title(f'{label}', fontweight='bold', fontsize=20)

fig.tight_layout()
plt.show()
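
The pie charts only describe the score-100 images, so whether a split is unusual depends on how it compares with the whole training set. A minimal sketch of that comparison on a hypothetical frame (values invented for illustration); on the real data, `df` would be `train` and `features` the 12 binary columns:

```python
import pandas as pd

# Hypothetical stand-in for train (values are made up for illustration)
df = pd.DataFrame({
    'Eyes':        [1, 1, 0, 1, 0, 1],
    'Human':       [0, 1, 0, 0, 1, 0],
    'Pawpularity': [100, 100, 20, 35, 100, 50],
})
features = ['Eyes', 'Human']

top = df[df['Pawpularity'] == 100]

# Share of 1s per feature: among score-100 images vs. the whole set
comparison = pd.DataFrame({
    'score_100': top[features].mean(),
    'all': df[features].mean(),
})
comparison['difference'] = comparison['score_100'] - comparison['all']
print(comparison)
```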

`Pawpularity` analysis for animals with the lowest score

In [None]:
pawpularity_min = train['Pawpularity'].min()

pawpularity_min

In [None]:
pawpularity_minimum = train[train['Pawpularity'] == pawpularity_min]

rows, cols = 1, 2
fig, axs = plt.subplots(rows, cols, figsize=(12,10))
fig.subplots_adjust(top=0.99, bottom=0.01, hspace=0.2, wspace=0.4)
# one random lowest-score image per subplot (index labels are used with .loc)
for ax in axs.ravel():
    random_image = random.choice(list(pawpularity_minimum.index))
    img = mpimg.imread('../input/petfinder-pawpularity-score/train/' + pawpularity_minimum.loc[random_image, 'Id'] + '.jpg')
    ax.imshow(img)
    ax.axis('off')
    ax.set_title(f'Pawpularity: {pawpularity_minimum.loc[random_image, "Pawpularity"]}', fontsize=20)