#### What are you trying to do in this notebook?
In this notebook, we predict the “Pawpularity” of pet photos. The competition provides raw images and tabular metadata; the models below are trained and tested on the metadata from PetFinder.my's thousands of pet profiles. Winning solutions will offer accurate recommendations that improve animal welfare.

#### Why are you trying it?
PetFinder.my is Malaysia’s leading animal welfare platform, featuring over 180,000 animals with 54,000 happily adopted. PetFinder collaborates closely with animal lovers, media, corporations, and global organizations to improve animal welfare.

Currently, PetFinder.my uses a basic Cuteness Meter to rank pet photos. It analyzes picture composition and other factors compared to the performance of thousands of pet profiles. While this basic tool is helpful, it's still in an experimental stage and the algorithm could be improved.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import numpy as np
import pandas as pd
import os
from glob import glob
import matplotlib.pyplot as plt
import seaborn as sns
from xgboost import XGBRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.metrics import mean_squared_error

In [None]:
df = pd.read_csv('../input/petfinder-pawpularity-score/train.csv')

In [None]:
print('DataFrame shape:', df.shape)
df.head()

In [None]:
df.describe()

In [None]:
sns.set(rc={'figure.figsize':(15,15), "lines.linewidth": 2.5})
sns.set_style("white")
f, axes = plt.subplots(3, 3)
sns.boxplot(data=df, x='Eyes', y='Pawpularity', ax=axes[0, 0])
sns.boxplot(data=df, x='Face', y='Pawpularity', ax=axes[0, 1])
sns.boxplot(data=df, x='Near', y='Pawpularity', ax=axes[0, 2])
sns.boxplot(data=df, x='Action', y='Pawpularity', ax=axes[1, 0])
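# This panel originally repeated 'Face'; 'Group' is an assumed replacement from the remaining metadata columns
sns.boxplot(data=df, x='Group', y='Pawpularity', ax=axes[1, 1])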
sns.boxplot(data=df, x='Accessory', y='Pawpularity', ax=axes[1, 2])
sns.boxplot(data=df, x='Collage', y='Pawpularity', ax=axes[2, 0])
sns.boxplot(data=df, x='Human', y='Pawpularity', ax=axes[2, 1])
sns.boxplot(data=df, x='Occlusion', y='Pawpularity', ax=axes[2, 2])
plt.subplots_adjust(wspace = 0.3, hspace = 0.3)
plt.show()

In [None]:
sns.set(rc={'figure.figsize':(10,5), "lines.linewidth": 2.5})
# histplot with kde=True replaces the deprecated distplot
sns.histplot(df["Pawpularity"], kde=True, label="Pawpularity")

In [None]:
df = df.loc[(df["Pawpularity"]<100) & (df["Pawpularity"]>3)]
X = df.iloc[:,1:-1]
y = df.iloc[:,-1]
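
To make the filtering and slicing above concrete, here is a minimal sketch on a made-up frame. The values and the reduced column set are assumptions for illustration, not the competition data. It shows that the filter keeps scores strictly between 3 and 100, and that `iloc[:, 1:-1]` drops the first column (`Id`) and the last column (the target).

```python
import pandas as pd

# Toy stand-in for train.csv; values are invented for illustration.
toy = pd.DataFrame({
    "Id": ["a", "b", "c", "d"],
    "Eyes": [1, 0, 1, 1],
    "Face": [1, 1, 0, 1],
    "Pawpularity": [2, 35, 100, 60],
})

# Same filter as above: keep scores strictly between 3 and 100.
kept = toy.loc[(toy["Pawpularity"] < 100) & (toy["Pawpularity"] > 3)]

X_toy = kept.iloc[:, 1:-1]  # every column except the first (Id) and the last (target)
y_toy = kept.iloc[:, -1]    # last column: Pawpularity

print(list(X_toy.columns))  # ['Eyes', 'Face']
print(y_toy.tolist())       # [35, 60]
```

Positional slicing like this depends on the column order of the CSV, so selecting columns by name is a safer choice if the file layout could change.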

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
model = XGBRegressor(
    learning_rate=0.1,
    n_estimators=1000,
    max_depth=5,
    min_child_weight=1,
    gamma=0,
    subsample=0.8,
    colsample_bytree=0.8,
    n_jobs=4,          # replaces the deprecated `nthread` alias
    random_state=42,   # replaces the deprecated `seed` alias
)
model.fit(X_train, y_train)

In [None]:
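# random_state only takes effect with shuffle=True; newer scikit-learn raises an error otherwise
kfold = KFold(n_splits=10, shuffle=True, random_state=42)
# With no `scoring` argument, cross_val_score uses the regressor's default score (R^2)
results = cross_val_score(model, X_train, y_train, cv=kfold)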
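
The cross-validation call above returns one score per fold, and with default settings those scores are R² values rather than errors. As a hedged sketch, the snippet below shows one way to get per-fold RMSE instead, using scikit-learn's `neg_root_mean_squared_error` scoring string. The data is random and purely illustrative, and the names `X_demo`, `y_demo`, and `rmse_per_fold` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption): binary flags and scores in 1..100.
rng = np.random.default_rng(42)
X_demo = rng.integers(0, 2, size=(200, 5))
y_demo = rng.integers(1, 101, size=200).astype(float)

cv = KFold(n_splits=5, shuffle=True, random_state=42)

# scikit-learn maximizes scores, so RMSE is returned negated.
neg_rmse = cross_val_score(
    DecisionTreeRegressor(max_depth=3, random_state=42),
    X_demo,
    y_demo,
    scoring="neg_root_mean_squared_error",
    cv=cv,
)
rmse_per_fold = -neg_rmse

print("RMSE per fold:", np.round(rmse_per_fold, 2))
print("mean %.2f, std %.2f" % (rmse_per_fold.mean(), rmse_per_fold.std()))
```

Reporting the mean and spread across folds gives a steadier estimate than a single train/test split, and it is in the same units as the hold-out error.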

In [None]:
y_test_pred = model.predict(X_test)
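# This is the root mean squared error (RMSE), not the MSE; taking the square root
# explicitly avoids the `squared` argument, which is deprecated in recent scikit-learn
rmse = np.sqrt(mean_squared_error(y_test, y_test_pred))
rmse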
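
The number reported above is a root mean squared error (RMSE): the square root of the average squared difference between true and predicted scores. The following sketch computes it by hand on invented numbers and compares it with a constant predictor that always outputs the mean. The helper `rmse_by_hand` and all values are hypothetical and for illustration only.

```python
import math

def rmse_by_hand(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Invented values for illustration.
y_true = [30.0, 40.0, 50.0]
y_pred = [33.0, 36.0, 50.0]

# Baseline: predict the mean of the true values for every sample.
mean_value = sum(y_true) / len(y_true)
baseline_pred = [mean_value] * len(y_true)

model_rmse = rmse_by_hand(y_true, y_pred)            # sqrt((9 + 16 + 0) / 3)
baseline_rmse = rmse_by_hand(y_true, baseline_pred)  # sqrt((100 + 0 + 100) / 3)

print(round(model_rmse, 3), round(baseline_rmse, 3))
```

Comparing against a mean-only baseline like this is a quick check of whether the features add anything beyond the average score.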

In [None]:
model_dtrgr = DecisionTreeRegressor()
model_dtrgr.fit(X_train, y_train)

In [None]:
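kfold_dtrgr = KFold(n_splits=10, shuffle=True, random_state=42)
# Cross-validate the decision tree itself (the original call reused the XGBoost model and its splitter)
results_dtrgr = cross_val_score(model_dtrgr, X_train, y_train, cv=kfold_dtrgr)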

In [None]:
y_test_pred_dtrgr = model_dtrgr.predict(X_test)
# RMSE of the decision tree on the hold-out set
rmse_dtrgr = np.sqrt(mean_squared_error(y_test, y_test_pred_dtrgr))
rmse_dtrgr

In [None]:
df_test = pd.read_csv('../input/petfinder-pawpularity-score/test.csv')

In [None]:
# Build the submission from named columns so Id stays a string and predictions stay numeric
output = pd.DataFrame({
    'Id': df_test['Id'],
    'Pawpularity': model.predict(df_test.iloc[:, 1:]),
})
output.to_csv('submission.csv', encoding='utf-8', index=False)
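
Before writing a submission file, it can help to sanity-check the predictions. The sketch below assumes that valid Pawpularity scores lie between 1 and 100 and clips out-of-range regression outputs accordingly. The ids and raw predictions are invented, and the clipping step is a suggestion, not part of the original pipeline.

```python
import numpy as np
import pandas as pd

# Invented ids and raw model outputs for illustration.
ids = ["id_a", "id_b", "id_c"]
raw_preds = np.array([-2.5, 41.7, 130.0])

# Assumption: valid scores lie in [1, 100], so clip anything outside that range.
submission = pd.DataFrame({
    "Id": ids,
    "Pawpularity": np.clip(raw_preds, 1, 100),
})

# Basic checks on the layout and value range before saving.
assert list(submission.columns) == ["Id", "Pawpularity"]
assert submission["Pawpularity"].between(1, 100).all()

print(submission)
```

A tree ensemble trained on in-range targets rarely predicts far outside them, so the clip is mostly a safeguard.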

In [None]:
output

#### Did it work?
If successful, solutions like this one will be adapted into AI tools that guide shelters and rescuers around the world in improving the appeal of their pet profiles, automatically enhancing photo quality and recommending composition improvements. As a result, stray dogs and cats can find their "furever" homes much faster. With a little assistance from the Kaggle community, many precious lives could be saved and more happy families created.

#### What did you not understand about this process?
Everything needed is provided on the competition data page, and I had no problems working through it. If anything in this notebook is unclear, please leave a comment.

#### What else do you think you can try as part of this approach?
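In this competition, the task is to predict engagement with a pet's profile based on the photograph for that profile. The models above use only the tabular metadata, so a natural next step is to bring the photographs themselves into the model.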