# Animal Classification

In this notebook, we're going to Predict an animal based on their features , such as fur ,flying ability and etc.







In [1]:
import pandas as pd
import numpy as np

In [2]:
#import visualization library
import plotly.express as px
import seaborn as sns

In [3]:
#import sklearn library
from sklearn.model_selection import train_test_split
import tensorflow as tf

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### dataset: https://www.kaggle.com/datasets/uciml/zoo-animal-classification

This dataset consists of 101 animals from a zoo.
There are 16 variables with various traits to describe the animals.
The 7 Class Types are: Mammal, Bird, Reptile, Fish, Amphibian, Bug and Invertebrate

The purpose for this dataset is to be able to predict the classification of the animals, based upon the variables.
It is the perfect dataset for those who are new to learning Machine Learning.

The data contains the following fields:

1. animal_name: Unique for each instance
2. hair Boolean
3 .feathers Boolean
4. eggs Boolean
5.milk Boolean
6.airborne Boolean
7.aquatic Boolean
8.predator Boolean
9.toothed Boolean
10.backbone Boolean
11.breathes Boolean
12.venomous Boolean
13.fins Boolean
14.legs Numeric (set of values: {0,2,4,5,6,8})
15.tail Boolean
16.domestic Boolean
17.catsize Boolean
18.class_type Numeric (integer values in range [1,7])





In [5]:
data= pd.read_csv("/content/drive/MyDrive/dataset/portofolio/animal_prediction/zoo.csv")

In [6]:
data

Unnamed: 0,animal_name,hair,feathers,eggs,milk,airborne,aquatic,predator,toothed,backbone,breathes,venomous,fins,legs,tail,domestic,catsize,class_type
0,aardvark,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1
1,antelope,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
2,bass,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4
3,bear,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1
4,boar,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,wallaby,1,0,0,1,0,0,0,1,1,1,0,0,2,1,0,1,1
97,wasp,1,0,1,0,1,0,0,0,0,1,1,0,6,0,0,0,6
98,wolf,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
99,worm,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,7


In [7]:
#lets check our data length
print("we have" , str(len(data)) , "rows in our data")

we have 101 rows in our data


In [8]:
data.isna().sum()

animal_name    0
hair           0
feathers       0
eggs           0
milk           0
airborne       0
aquatic        0
predator       0
toothed        0
backbone       0
breathes       0
venomous       0
fins           0
legs           0
tail           0
domestic       0
catsize        0
class_type     0
dtype: int64

**our zoo data is clean** withour missing values.

In [9]:
# lets see class distribution
fig=px.histogram(data,x='class_type',color='class_type',title='Class distribution')
fig.update_layout(bargap=0.1,plot_bgcolor='white',showlegend=False)

fig.show()

we found that class 1 dominated for this dataset

In [32]:
type2 = data.loc[data['class_type']==1]
type2.head(5)

Unnamed: 0,animal_name,hair,feathers,eggs,milk,airborne,aquatic,predator,toothed,backbone,breathes,venomous,fins,legs,tail,domestic,catsize,class_type
0,aardvark,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1
1,antelope,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
3,bear,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1
4,boar,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
5,buffalo,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1


we can assume **class 1** type contains mammals that usually live in the forest.

In [11]:
px.imshow(data.corr())

**class type** was highly correlated to variable **"eggs","aquatic" and "venomous"**

# Splitting data (features and labels)

In [12]:
X = data.iloc[:,:-1]
X.head(4)

Unnamed: 0,animal_name,hair,feathers,eggs,milk,airborne,aquatic,predator,toothed,backbone,breathes,venomous,fins,legs,tail,domestic,catsize
0,aardvark,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1
1,antelope,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1
2,bass,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0
3,bear,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1


In [33]:
y = data.iloc[:,-1:]
y.head(4)

Unnamed: 0,class_type
0,1
1,1
2,4
3,1


In [14]:
print("Feature Data : " ,X.shape)
print("Label Data  :" ,y.shape)

Feature Data :  (101, 17)
Label Data  : (101, 1)


# Split data into Training and Testing

In [15]:
#spliting data into training and testing ,with test size 0.3 
#lets stratify our Y data so we could get the same proportion between testing and training
x_train,x_test,y_train,y_test = train_test_split(X,y,test_size=0.3 ,random_state=64,stratify=y)
print("training data has " , x_train.shape)
print("test data has",x_test.shape)

training data has  (70, 17)
test data has (31, 17)


### Drop animal name column 

In [16]:
#seperate animal name from training data
animal_name_train = x_train['animal_name']
animal_name_test = x_test['animal_name']

x_train = x_train.iloc[:,1:]
x_test = x_test.iloc[:,1:]

print("training data has " , x_train.shape)
print("test data has",x_test.shape)

training data has  (70, 16)
test data has (31, 16)


In [17]:
#some function was replaced in tf2, to fix this ,we'll disable eager execution
tf.compat.v1.disable_eager_execution()

In [18]:
#creating a placeholder for our data 
data_x = tf.compat.v1.placeholder(shape=[None, 16], dtype=tf.float32)
data_y = tf.compat.v1.placeholder(shape=[None, 1], dtype=tf.int32)

## One Hot encoding

In [19]:
y_one_hot = tf.one_hot(data_y,7)

In [20]:
y_one_hot =tf.reshape(y_one_hot,[-1,7])

In [21]:
#create a variable to contain weight and bias randomly
W = tf.Variable(tf.random.normal([16,7],seed=0),name='weight')
b = tf.Variable(tf.random.normal([7],seed=0),name='bias')

In [22]:
'''
    Output = Weight * Input + Bias
    tf.matmul() : for array multiply
    tf.nn.softmax_cross_entropy_with_logits(): for gradient_descent with softmax results(hypothesis).
'''

logits = tf.matmul(data_x ,W) + b

In [23]:
hypotesis = tf.nn.softmax(logits)

In [24]:
# we'll try to reduce the minimum cost
cost_i = tf.nn.softmax_cross_entropy_with_logits(logits=logits,labels=y_one_hot)
cost =tf.reduce_mean(cost_i)

In [None]:
#using tf.compat.v1 ,as some function was changed in tf2
import tensorflow.compat.v1 as tf1
tf1.disable_v2_behavior()

In [26]:
#using GradientDescent with 0..05 learning rate
train  = tf1.train.GradientDescentOptimizer(learning_rate=0.05).minimize(cost)

In [27]:
#argmax will return the greatest value ,so our prediction will filled with best value.
prediction = tf.argmax(hypotesis,1)
correct_prediction = tf.equal(prediction,tf.argmax(y_one_hot,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

In [31]:
#train our data with 5000 step
with tf1.Session() as sess:
    sess.run(tf1.global_variables_initializer())
    for step in range(5001):
        sess.run(train, feed_dict={data_x: x_train, data_y: y_train})
        if step % 1000 == 0:
            loss, acc = sess.run([cost, accuracy], feed_dict={data_x: x_train, data_y: y_train})
            print("Step: {:5}\tLoss: {:.3f}\tAcc: {:.2%}".format(step, loss, acc))
            
    train_acc = sess.run(accuracy, feed_dict={data_x: x_train, data_y: y_train})
    test_acc,test_predict,test_correct = sess.run([accuracy,prediction,correct_prediction], feed_dict={data_x: x_test, data_y: y_test})
    print("Model Prediction =", train_acc)
    print("Test Prediction =", test_acc)



Step:     0	Loss: 3.329	Acc: 28.57%
Step:  1000	Loss: 0.128	Acc: 88.57%
Step:  2000	Loss: 0.065	Acc: 91.43%
Step:  3000	Loss: 0.044	Acc: 91.43%
Step:  4000	Loss: 0.034	Acc: 91.43%
Step:  5000	Loss: 0.028	Acc: 91.43%
Model Prediction = 0.9142857
Test Prediction = 0.87096775


for our prediction , we got **model prediction with 91% accuracy** ,and **test prediction with 87% accuracy**

In [34]:
#create a new dataframe to see our predictions better
sub = pd.DataFrame()
sub['Name'] = animal_name_test #calling back animal name that we seperate earlier.
sub['Predict_Type'] = test_predict 
sub['Origin_Type'] = y_test
sub['Correct'] = test_correct
sub.head(5)

Unnamed: 0,Name,Predict_Type,Origin_Type,Correct
76,seasnake,3,3,True
85,starfish,4,7,False
20,dove,2,2,True
82,sole,4,4,True
33,gull,2,2,True


# Summary

We suceesfully created a classification model using Artificial Neural Network.

we got **model prediction with 91% accuracy** ,and **test prediction with 87% accuracy**

using this model ,we could implement it to other dataset that have classification problems.