# <u><center>Adopt a Pet</center></u>
<center>Authored by: Pratham Tripathi</center>

## <u>Aim:</u> 
To predict the category of a pet from its given attributes (such as length, height, etc.).

## <u>Approach:</u>
The approach is a supervised machine learning classification model. We used two basic algorithms: K-Nearest Neighbors (KNN) and a Decision Tree Classifier.
### 1. <u>K-Nearest Neighbors (KNN):</u> 
The k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. It was first developed by Evelyn Fix and Joseph Hodges in 1951 and later expanded by Thomas Cover.[1] In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

- In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors: it is assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, the object is simply assigned to the class of that single nearest neighbor.
- In k-NN regression, the output is a property value for the object. This value is the average of the values of its k nearest neighbors.
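The sketch below makes the classification and regression rules concrete. It is a minimal from-scratch version, not scikit-learn's implementation, and it assumes Euclidean distance. The function name `knn_predict` and the small dataset are hypothetical. When the vote is tied, the winner is the tied class that appears first among the distance-sorted neighbors, because that is how `Counter.most_common` orders equal counts.

```python
from collections import Counter
import numpy as np

def knn_predict(X_pts, y_pts, x, k=3, task="classification"):
    """Predict the label (or value) of a single point x from its k nearest neighbors."""
    # Euclidean distance from x to every training example
    distances = np.linalg.norm(X_pts - x, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(distances)[:k]
    if task == "classification":
        # Plurality vote: the most common class among the k neighbors
        return Counter(y_pts[nearest]).most_common(1)[0][0]
    # Regression: average of the neighbors' values
    return float(np.mean(y_pts[nearest]))

# Hypothetical 2-D points with class labels and numeric values
X_pts = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [4.1, 3.9], [3.8, 4.0]])
pet_labels = np.array(["cat", "cat", "dog", "dog", "dog"])
pet_values = np.array([10.0, 12.0, 30.0, 32.0, 34.0])

print(knn_predict(X_pts, pet_labels, np.array([1.1, 1.0]), k=1))  # cat
print(knn_predict(X_pts, pet_labels, np.array([3.9, 4.0]), k=3))  # dog
print(knn_predict(X_pts, pet_values, np.array([4.0, 4.0]), k=3, task="regression"))  # 32.0
```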

### 2. <u>Decision Tree:</u> 
A decision tree is a flowchart-like structure in which each internal node represents a “test” on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes).
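The flowchart structure can be sketched with plain Python dictionaries. This is a hand-built illustration, not a learned model. The feature names `height_cm` and `length_m`, the thresholds, and the class labels are all hypothetical. As in scikit-learn's trees, each internal node tests `feature <= threshold` and sends the sample to the left branch when the test is true.

```python
# Hand-built tree (hypothetical): internal nodes test one attribute against a
# threshold, the two branches are the test outcomes, and leaves hold a class label.
TREE = {
    "feature": "height_cm", "threshold": 30.0,
    "left": {"label": "small pet"},          # height_cm <= 30
    "right": {                               # height_cm > 30
        "feature": "length_m", "threshold": 0.5,
        "left": {"label": "medium pet"},     # length_m <= 0.5
        "right": {"label": "large pet"},     # length_m > 0.5
    },
}

def classify(node, sample):
    """Follow test outcomes from the root until a leaf is reached."""
    while "label" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

print(classify(TREE, {"height_cm": 20.0, "length_m": 0.9}))  # small pet
print(classify(TREE, {"height_cm": 45.0, "length_m": 0.3}))  # medium pet
print(classify(TREE, {"height_cm": 45.0, "length_m": 0.8}))  # large pet
```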

# Importing Required Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
%matplotlib inline

In [None]:
!pip install pydotplus

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree

In [None]:
import collections
import pydotplus
import matplotlib.image as mpimg

# Reading Data for Model and Prediction

In [None]:
df = pd.read_csv("../input/hackerearth-ml-challenge-pet-adoption/train.csv")
df.head()

In [None]:
test_data = pd.read_csv("../input/hackerearth-ml-challenge-pet-adoption/test.csv")
test_data.head()

# Cleaning The Data

In [None]:
df = df.dropna()
df.head()
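Called with no arguments, `df.dropna()` removes every row that has a missing value in any column, including columns that are never used as features. The toy example below uses hypothetical values and an illustrative `extra_col`. It contrasts that behaviour with `dropna(subset=...)`, which only checks the listed columns. Whether keeping those extra rows is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values): row 1 is missing only "extra_col",
# row 2 is missing only "X1"
toy = pd.DataFrame({
    "length(m)": [0.8, 0.7, 0.5, 0.9],
    "X1": [13, 13, np.nan, 0],
    "extra_col": [2.0, np.nan, 1.0, 0.0],
})

# No arguments: drop every row containing at least one NaN in any column
all_cols = toy.dropna()
# subset=...: only NaNs in the listed columns cause a row to be dropped
feature_cols_only = toy.dropna(subset=["length(m)", "X1"])

print(len(all_cols), len(feature_cols_only))  # 2 3
```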

In [None]:
df['breed_category'].value_counts()

# Feature and Target Sets

In [None]:
X = df[['length(m)', 'height(cm)',"X1","X2"]]
X[0:5]

In [None]:
y = df["pet_category"]
y[0:5]

# Pre-Processing Data

In [None]:
# Standardize each feature to zero mean and unit variance
X = preprocessing.StandardScaler().fit_transform(X.astype(float))
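`StandardScaler` rescales each feature column to zero mean and unit variance: z = (x - mean) / std. The sketch below uses a hypothetical 4×2 array to check that a manual NumPy computation with the population standard deviation (`ddof=0`) gives the same result. The mean and standard deviation are learned from the data passed to `fit`, so the same fitted statistics must be applied to any new rows before predicting.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: 4 samples, 2 features
X_demo = np.array([[0.2, 10.0], [0.4, 20.0], [0.6, 30.0], [0.8, 40.0]])

# z = (x - mean) / std per column, using the population std (ddof=0)
z_manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)
z_sklearn = StandardScaler().fit_transform(X_demo)

print(np.allclose(z_manual, z_sklearn))        # True
print(np.allclose(z_sklearn.mean(axis=0), 0))  # True
print(np.allclose(z_sklearn.std(axis=0), 1))   # True
```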

# Splitting Data for Testing and Training

In [None]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.3, random_state = 123)

In [None]:
print("Training Set: ",X_train.shape,y_train.shape)
print("Testing Set: ",X_test.shape,y_test.shape)
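`train_test_split` shuffles the rows and holds out a fraction of them for testing. With a float `test_size`, scikit-learn rounds the test count up. As a result, 20 rows with `test_size=0.3` give 6 test rows and 14 training rows. Fixing `random_state` makes the shuffle reproducible. The arrays below are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 rows, 2 features, alternating binary labels
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.array([0, 1] * 10)

# Same split settings as the notebook: 30% test, fixed seed
a_train, a_test, b_train, b_test = train_test_split(X_demo, y_demo, test_size=0.3, random_state=123)
# Repeating the call with the same random_state reproduces the same split
c_train, c_test, _, _ = train_test_split(X_demo, y_demo, test_size=0.3, random_state=123)

print(a_train.shape, a_test.shape)     # (14, 2) (6, 2)
print(np.array_equal(a_test, c_test))  # True
```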

# K-Nearest Neighbors Algorithm

In [None]:
from sklearn.neighbors import KNeighborsClassifier
k = 9
neigh = KNeighborsClassifier(n_neighbors = k).fit(X_train,y_train)
neigh

# Prediction and Evaluation of KNN Model

In [None]:
y_hat = neigh.predict(X_test)
print(y_hat[0:5])  # first few predictions (otherwise only the last expression is displayed)
np.unique(y_hat)

In [None]:
from sklearn import metrics
print("Accuracy Score is : ", metrics.accuracy_score(y_test,y_hat))
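Accuracy is the fraction of predictions that match the true labels. The sketch below uses hypothetical labels. It computes accuracy by hand, compares the result with `metrics.accuracy_score`, and shows `metrics.confusion_matrix`, which reveals which classes are confused with each other. In the confusion matrix, rows are true classes, columns are predicted classes, and the diagonal holds the correct predictions.

```python
import numpy as np
from sklearn import metrics

# Hypothetical true labels and predictions for 8 samples
y_true = np.array([0, 1, 2, 1, 0, 2, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0, 2, 0, 1])

# Accuracy = number of correct predictions / total predictions
manual_acc = np.mean(y_true == y_pred)
print(manual_acc, metrics.accuracy_score(y_true, y_pred))  # 0.75 0.75

# Rows: true class 0, 1, 2; columns: predicted class 0, 1, 2
cm = metrics.confusion_matrix(y_true, y_pred)
print(cm)
```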

In [None]:
ks = 10
mean_acc = np.zeros((ks - 1))
std_acc = np.zeros((ks - 1))

for n in range(1,ks):
    neigh = KNeighborsClassifier(n_neighbors = n).fit(X_train,y_train)
    yhat = neigh.predict(X_test)
    mean_acc[n-1] = metrics.accuracy_score(y_test,yhat)
    std_acc[n-1] = np.std(yhat == y_test)/np.sqrt(yhat.shape[0])
    
mean_acc
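The loop's `std_acc` is the standard error of the accuracy estimate. For a 0/1 vector of correct predictions, the population standard deviation equals sqrt(p(1 - p)), where p is the accuracy. Dividing by sqrt(n) therefore gives the familiar standard error of a proportion. Here is a quick numerical check on a hypothetical correctness vector:

```python
import numpy as np

# Hypothetical per-sample correctness of one model on n = 8 test rows
correct = np.array([True, True, False, True, True, True, False, True])
n = correct.shape[0]
p = correct.mean()  # accuracy = 0.75

# What the loop computes: std of the 0/1 correctness vector divided by sqrt(n)
se_loop = np.std(correct.astype(float)) / np.sqrt(n)
# Standard error of a proportion: sqrt(p * (1 - p) / n)
se_formula = np.sqrt(p * (1 - p) / n)

print(np.isclose(se_loop, se_formula))  # True
```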

In [None]:
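plt.plot(range(1,ks),mean_acc,'g')
# Shaded band: accuracy +/- 1 standard error (alpha < 1 so the line stays visible)
plt.fill_between(range(1,ks),mean_acc - 1*std_acc,mean_acc + 1*std_acc,alpha = 0.10)
plt.legend(["Accuracy", "+/- 1xstd"])
plt.xlabel("Number of Neighbors (k)")
plt.ylabel("Accuracy")
plt.tight_layout()
print("The best accuracy for the model is:",mean_acc.max(),"with k =",mean_acc.argmax()+1)
plt.show()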

# Decision Tree 

In [None]:
PetTree = DecisionTreeClassifier(criterion ="entropy", max_depth = 5)
PetTree.fit(X_train,y_train)
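With `criterion="entropy"`, the tree prefers splits that most reduce label entropy, a quantity known as information gain. Separately, `max_depth=5` caps how many tests can be chained from the root to a leaf. The helper functions below compute entropy and information gain for a single threshold split on hypothetical data. Their names (`entropy`, `information_gain`) are chosen for illustration, and scikit-learn's internal implementation differs in its details.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label array: -sum(p_i * log2(p_i))."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels, threshold):
    """Entropy reduction from splitting on `feature <= threshold`."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    n = len(labels)
    # Children's entropies weighted by the fraction of samples in each
    weighted_child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted_child

# Hypothetical 1-D feature and binary labels
height = np.array([10.0, 12.0, 14.0, 30.0, 32.0, 35.0])
label = np.array([0, 0, 0, 1, 1, 1])

print(entropy(label))                         # 1.0 (perfectly mixed classes)
print(information_gain(height, label, 20.0))  # 1.0 (split separates the classes)
print(information_gain(height, label, 11.0))  # smaller gain: right side stays mixed
```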

# Prediction and Evaluation of Decision Tree

In [None]:
yhat1 = PetTree.predict(X_test) 
print(yhat1[0:5])
print(y_test[0:5])

In [None]:
print("Accuracy Score is : ", metrics.accuracy_score(y_test,yhat1))

# Prediction of test.csv

<p>Since the <u>Decision Tree</u> achieves a higher accuracy than the KNN model on this test split, we choose it as our main model for prediction.</p>

In [None]:
Xtest = test_data[['length(m)', 'height(cm)', 'X1', 'X2']]
Xtest.head()

# Main Prediction using Decision Tree

In [None]:
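scaler = preprocessing.StandardScaler().fit(df[['length(m)', 'height(cm)', 'X1', 'X2']].astype(float))  # same scaling the model was trained on
pred = PetTree.predict(scaler.transform(Xtest.astype(float)))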
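Predictions on new rows are only meaningful if those rows go through the same preprocessing the model was trained with. One common way to guarantee this is to bundle the scaler and the classifier with `sklearn.pipeline.make_pipeline`. This is a swapped-in pattern, not what this notebook does. The sketch uses synthetic random data, and the names `X_tr`, `y_tr`, and `X_new` are hypothetical. It checks that the pipeline produces the same predictions as fitting the scaler and the tree by hand.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical data: 100 training rows and 10 new rows with 4 features each
loc, scale = [0.5, 25.0, 5.0, 10.0], [0.2, 10.0, 3.0, 5.0]
X_tr = rng.normal(loc=loc, scale=scale, size=(100, 4))
y_tr = (X_tr[:, 1] > 25.0).astype(int)
X_new = rng.normal(loc=loc, scale=scale, size=(10, 4))

# The pipeline stores the scaler fitted on X_tr and reapplies it inside predict()
model = make_pipeline(StandardScaler(),
                      DecisionTreeClassifier(criterion="entropy", max_depth=5, random_state=0))
model.fit(X_tr, y_tr)
pipe_pred = model.predict(X_new)

# Equivalent manual steps: fit the scaler on training data, reuse it on new data
scaler = StandardScaler().fit(X_tr)
manual_tree = DecisionTreeClassifier(criterion="entropy", max_depth=5, random_state=0)
manual_tree.fit(scaler.transform(X_tr), y_tr)
manual_pred = manual_tree.predict(scaler.transform(X_new))

print(np.array_equal(pipe_pred, manual_pred))  # True
```

With this pattern, calling `predict` on raw features applies the stored training-set scaling automatically.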

In [None]:
data_feature_names = ['length(m)', 'height(cm)', 'X1', 'X2']

# Visualization of The Decision Tree

In [None]:
# Export the fitted tree to Graphviz DOT format and parse it with pydotplus
dot_data = tree.export_graphviz(PetTree,
                                feature_names=data_feature_names,
                                out_file=None,
                                filled=True,
                                rounded=True)
graph = pydotplus.graph_from_dot_data(dot_data)

# Recolor the children of every split: lower node id (left child) turquoise,
# higher node id (right child) orange
colors = ('turquoise', 'orange')
edges = collections.defaultdict(list)

# Map each parent node id to the ids of its children
for edge in graph.get_edge_list():
    edges[edge.get_source()].append(int(edge.get_destination()))

for edge in edges:
    edges[edge].sort()
    for i in range(2):
        dest = graph.get_node(str(edges[edge][i]))[0]
        dest.set_fillcolor(colors[i])

# Render the graph to PNG and display it
filename = "tree.png"
graph.write_png(filename)
img = mpimg.imread(filename)
plt.figure(figsize=(100,200))
plt.imshow(img,interpolation = 'nearest')
plt.axis('off')
plt.show()
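If Graphviz or `pydotplus` is not available, `sklearn.tree.export_text` gives a plain-text view of the same split structure. Here is a minimal sketch on hypothetical data in which only the second feature separates the classes:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical 2-feature data: the class depends only on the second column
X_demo = np.array([[0.3, 10.0], [0.5, 12.0], [0.4, 35.0], [0.6, 40.0]])
y_demo = np.array([0, 0, 1, 1])

clf = DecisionTreeClassifier(criterion="entropy", max_depth=5, random_state=0).fit(X_demo, y_demo)
# One line per test or leaf, indented by depth
report = export_text(clf, feature_names=["length(m)", "height(cm)"])
print(report)
```

For the notebook's model, the analogous call would be `export_text(PetTree, feature_names=data_feature_names)`.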

# Final Output in a csv file

In [None]:
output = pd.DataFrame({'PetId': test_data.pet_id, 'Pet Category': pred})
output.to_csv('Output.csv', index=False)
print("Your submission was successfully saved!")