## **Introduction**

This notebook contains the steps enumerated below for analyzing characteristics of zoo animals and creating classifications.<br> 
Data is available at: https://www.kaggle.com/uciml/zoo-animal-classification/data <br><br>
1. [Import Data & Python Packages](#1-bullet) <br>
2. [Assess Data Quality & Missing Values](#2-bullet)<br>
3. [Exploratory Data Analysis](#3-bullet) <br>


In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))

# Any results you write to the current directory are saved as output.
from sklearn import preprocessing
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(style="whitegrid", color_codes=True) # whitegrid style for seaborn plots
from sklearn.metrics import accuracy_score

In [None]:
animal=pd.read_csv('../input/zoo.csv')
ani_class=pd.read_csv('../input/class.csv')

**1. Import Data & Python Packages**

In [None]:
animal.head()

In [None]:
animal.tail()

In [None]:
# Check class table for later use.
ani_class

**2. Assess Data Quality & Missing Values**

In [None]:
# Check data type for each variable
animal.info()

In [None]:
animal.describe()

In [None]:
print(animal.legs.unique())

In [None]:
# just curious which animal has 5 legs
animal.loc[animal['legs'] == 5]
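
The outline lists a missing-value check, which can also be run explicitly. The following is a minimal sketch under stated assumptions: `toy` is a small made-up frame standing in for `animal` (its values are invented for illustration), and `binary_cols` is a hypothetical list of the 0/1 feature columns. In the notebook, replace `toy` with `animal`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `animal`; the values are invented for illustration.
toy = pd.DataFrame({
    "animal_name": ["a", "b", "c", "d"],
    "milk": [1, 0, 0, np.nan],
    "legs": [4, 2, 5, 0],
})

# Count missing values per column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Count fully duplicated rows.
n_duplicates = int(toy.duplicated().sum())
print(n_duplicates)

# Flag binary columns that hold anything other than 0/1 (NaN is ignored here).
binary_cols = ["milk"]  # assumption: list of columns expected to be 0/1
non_binary = {c: sorted(set(toy[c].dropna().unique()) - {0, 1}) for c in binary_cols}
print(non_binary)
```

An empty list for a column in `non_binary` means that column contains only 0 and 1 (apart from missing values).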



**3. Exploratory Data Analysis**

In [None]:
# Join animal table and class table to show actual class names
df=pd.merge(animal,ani_class,how='left',left_on='class_type',right_on='Class_Number')
df.head()
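
A left join silently leaves `NaN` in the class columns when a `class_type` has no match in the class table. The sketch below shows one way to surface that, using the `validate` and `indicator` options of `pd.merge`. The frames `toy_animal` and `toy_class` are hypothetical stand-ins with invented values, not the real data.

```python
import pandas as pd

# Hypothetical stand-ins for `animal` and `ani_class`; values are invented.
toy_animal = pd.DataFrame({
    "animal_name": ["a", "b", "c"],
    "class_type": [1, 2, 9],   # 9 has no entry in the class table
})
toy_class = pd.DataFrame({
    "Class_Number": [1, 2],
    "Class_Type": ["Mammal", "Bird"],
})

merged = pd.merge(
    toy_animal, toy_class,
    how="left",
    left_on="class_type", right_on="Class_Number",
    validate="many_to_one",  # raises MergeError if Class_Number is not unique
    indicator=True,          # adds a `_merge` column describing each row's match
)

# Rows that found no matching class.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["animal_name", "class_type"]])
```

If `unmatched` is empty on the real tables, every animal received a class name.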

In [None]:
plt.hist(df.class_type, bins=7)

In [None]:
# See which class most zoo animals belong to
# (factorplot was renamed to catplot in seaborn 0.9)
sns.catplot(x='Class_Type', data=df, kind="count", aspect=2)

In [None]:
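# Heatmap of pairwise correlations between the numeric columns
fig, ax = plt.subplots(figsize=(20,15))
ax.set_title("Correlation Heatmap")
# animal_name is a string column, so restrict to numeric columns before correlating
corr = animal.select_dtypes(include='number').corr()
sns.heatmap(corr, annot=True,
            xticklabels=corr.columns.values,
            yticklabels=corr.columns.values,
            ax=ax)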

In [None]:
# Show variable pairs whose correlation exceeds 0.7 in absolute value (diagonal excluded)
corr[(corr != 1) & (corr.abs() > 0.7)].dropna(how='all', axis=1).dropna(how='all', axis=0)
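
The filtered matrix above repeats every pair twice and is mostly `NaN`. As an alternative view, the sketch below lists each strongly correlated pair once by keeping only the strict upper triangle of the matrix. It assumes a small invented frame `toy` in place of the numeric columns of `animal`; the 0.7 threshold is the one used above.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data; in the notebook, use the numeric columns of `animal`.
toy = pd.DataFrame({
    "milk": [1, 1, 0, 0, 0, 1],
    "eggs": [0, 0, 1, 1, 1, 0],
    "legs": [4, 4, 2, 0, 6, 2],
})
corr = toy.corr()

# Keep only the strict upper triangle so each pair appears once and the
# diagonal is excluded by position rather than by comparing floats to 1.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack()

# Pairs whose absolute correlation exceeds the threshold.
strong = pairs[pairs.abs() > 0.7]
print(strong)
```

The result is a Series indexed by `(variable_1, variable_2)`, which is easier to scan than the sparse matrix when there are many columns.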

In [None]:
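# Mean of each numeric feature per class (name columns are non-numeric and skipped)
df.groupby('Class_Type').mean(numeric_only=True)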

The class means make two rules apparent: if "milk" is 1, the animal is a mammal; if "feathers" is 1, it is a bird.
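
A cross-tabulation is one way to check such a rule rather than reading it off the means. This is a minimal sketch on an invented frame `toy` that stands in for the merged `df`; the helper `classes_with_feature` is hypothetical and exists only for this illustration.

```python
import pandas as pd

# Hypothetical stand-in for the merged `df`; values are invented.
toy = pd.DataFrame({
    "Class_Type": ["Mammal", "Mammal", "Bird", "Bird", "Fish"],
    "milk":       [1, 1, 0, 0, 0],
    "feathers":   [0, 0, 1, 1, 0],
})

# Rows: feature value; columns: class. A feature pins down a class
# when its value-1 row is nonzero in exactly one column.
milk_table = pd.crosstab(toy["milk"], toy["Class_Type"])
feather_table = pd.crosstab(toy["feathers"], toy["Class_Type"])
print(milk_table)
print(feather_table)

def classes_with_feature(table):
    """Return the class names that contain at least one animal with the feature set."""
    row = table.loc[1]
    return list(row[row > 0].index)

print(classes_with_feature(milk_table))
print(classes_with_feature(feather_table))
```

On the real data, the rule holds for a feature when the returned list contains a single class.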

In [None]:
# Distribution of leg counts within each class
g = sns.FacetGrid(df, col="Class_Type")
g.map(plt.hist, "legs")
plt.show()