# Is the platypus a mammal?

## Summary

* Problem - Determine which animal category the platypus belongs to
* Dataset - [Kaggle](https://www.kaggle.com/uciml/zoo-animal-classification)
* Conclusion - The platypus is a mammal

## Problem


The platypus is known as a unique mammal. It has fluffy fur and a beak, so it sounds like a kind of bird, yet it has four legs and no wings. Is it a mammal, even though most mammals do not have a beak? It also lays eggs and has a cloaca, which is usually found in fish, birds, reptiles and amphibians.

By the current definition, mammals are animals that feed their young with milk. The platypus has milk-producing glands, so it is categorized as a mammal.

Besides that, I would like to know which category the platypus belongs to from a data science point of view.

I used `LogisticRegression` to build a multinomial classification model and evaluated it by accuracy. The model predicts an animal's type from features such as the following.

* Whether they have feathers
* Whether they lay eggs
* Whether they produce milk
* and more

Finally, I fed the platypus's features to the model to see which kind of animal it predicts.
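To make the idea of a multinomial model concrete, here is a minimal sketch of how such a model turns features into class probabilities: each class gets a linear score, and a softmax converts the scores into probabilities. The weights, intercepts and the four-feature vector below are made-up numbers for illustration, not values fitted on the Kaggle data.

```python
import numpy as np

def softmax(scores):
    """Turn one raw score per class into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical feature vector: [hair, feathers, eggs, milk]
features = np.array([1, 0, 1, 1])

# Hypothetical weights (one row per class) and intercepts, NOT fitted values
class_names = ["Mammal", "Bird", "Reptile"]
weights = np.array([
    [ 1.5, -1.0, -0.5,  2.0],   # Mammal
    [-1.0,  2.0,  1.0, -1.5],   # Bird
    [-0.5, -0.5,  1.0, -1.0],   # Reptile
])
intercepts = np.array([0.0, 0.0, 0.0])

scores = weights @ features + intercepts   # one linear score per class
probabilities = softmax(scores)            # scores -> probabilities
predicted = class_names[int(np.argmax(probabilities))]
print(predicted, probabilities.round(3))
```

The predicted class is simply the one with the highest probability; training the model amounts to choosing the weights so that the true class gets a high probability.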

## Dataset

I use a dataset of animals and their features, downloaded from [Kaggle](https://www.kaggle.com/uciml/zoo-animal-classification).

In [1]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
from sklearn.cluster import DBSCAN, KMeans
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
# import dataset of animals

animals = pd.read_csv('../data/animal.csv')
animal_class = pd.read_csv('../data/class.csv')

In [3]:
# animal name and their biological information
animals.head()

Unnamed: 0,animal_name,hair,feathers,eggs,milk,airborne,aquatic,predator,toothed,backbone,breathes,venomous,fins,legs,tail,domestic,catsize,class_type
0,aardvark,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1
1,antelope,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
2,bass,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4
3,bear,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1
4,boar,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1


In [4]:
# contains animal class type for each animal
animal_class

Unnamed: 0,Class_Number,Number_Of_Animal_Species_In_Class,Class_Type,Animal_Names
0,1,41,Mammal,"aardvark, antelope, bear, boar, buffalo, calf,..."
1,2,20,Bird,"chicken, crow, dove, duck, flamingo, gull, haw..."
2,3,5,Reptile,"pitviper, seasnake, slowworm, tortoise, tuatara"
3,4,13,Fish,"bass, carp, catfish, chub, dogfish, haddock, h..."
4,5,4,Amphibian,"frog, frog, newt, toad"
5,6,8,Bug,"flea, gnat, honeybee, housefly, ladybird, moth..."
6,7,10,Invertebrate,"clam, crab, crayfish, lobster, octopus, scorpi..."


`animals` contains the animals and their features. I used these features to train a multinomial `LogisticRegression`. `animal_class` contains the actual category of each animal.

I would like to add features that are distinctive of the platypus: `cloaca`, `webbed` and `beak`.

* `cloaca` - A single opening used for laying eggs and excreting urine and feces. It is usually found in fish, birds, reptiles and amphibians
* `webbed` - Webbed feet. Mostly found among birds, reptiles and amphibians
* `beak` - The feature all birds have

In [5]:
# add the columns

cloaca = '00100001100110001010110101100010011000100100000000001000111111110000000101001011101100110011100100001'
animals['cloaca'] = [int(c) for c in cloaca]  # store flags as integers, not strings

webbing = '00000000000000000000010101100000010000000000000000001000001000100000010000010001000000010011000000000'
animals['webbed'] = [int(w) for w in webbing]  # store flags as integers, not strings

beak = '00000000000100001000110100000000010001000100000000000000111100010000000100000011000100010000000100001'
animals['beak'] = [int(b) for b in beak]  # store flags as integers, not strings
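Hand-typed bit strings like these are easy to get wrong by one character, which would silently misalign a feature with the animals. As a minimal sketch, a small helper could check the length and the characters before the column is assigned. The name `bits_to_column` and the three-row toy frame are hypothetical and only stand in for `animals`.

```python
import pandas as pd

def bits_to_column(bits, n_rows):
    """Convert a string of '0'/'1' characters into a list of ints, one per row."""
    if len(bits) != n_rows:
        raise ValueError(f"expected {n_rows} flags, got {len(bits)}")
    if set(bits) - {"0", "1"}:
        raise ValueError("flags must be '0' or '1'")
    return [int(b) for b in bits]

# Toy frame standing in for `animals` (hypothetical rows)
toy = pd.DataFrame({"animal_name": ["aardvark", "bass", "chicken"]})
toy["beak"] = bits_to_column("001", len(toy))
print(toy)
```

With the real data the call would look like `bits_to_column(beak, len(animals))`, so a string that is too short or too long fails loudly instead of producing a shifted column.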

In [6]:
animals.head()

Unnamed: 0,animal_name,hair,feathers,eggs,milk,airborne,aquatic,predator,toothed,backbone,...,venomous,fins,legs,tail,domestic,catsize,class_type,cloaca,webbed,beak
0,aardvark,1,0,0,1,0,0,1,1,1,...,0,0,4,0,0,1,1,0,0,0
1,antelope,1,0,0,1,0,0,0,1,1,...,0,0,4,1,0,1,1,0,0,0
2,bass,0,0,1,0,0,1,1,1,1,...,0,1,0,1,0,0,4,1,0,0
3,bear,1,0,0,1,0,0,1,1,1,...,0,0,4,0,0,1,1,0,0,0
4,boar,1,0,0,1,0,0,1,1,1,...,0,0,4,1,0,1,1,0,0,0


In [7]:
# the legs column is not binary, so one-hot encode it

animals = pd.get_dummies(data=animals, columns=['legs'])
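To see what this encoding does, here is a small sketch on a made-up three-row frame. Each distinct value of `legs` becomes its own 0/1 column; `dtype=int` is passed only so the flags print as 0/1 regardless of the pandas version.

```python
import pandas as pd

# Hypothetical rows, only to illustrate the encoding
toy = pd.DataFrame({
    "animal_name": ["bass", "chicken", "bear"],
    "legs": [0, 2, 4],
})

# One new column per distinct number of legs
encoded = pd.get_dummies(toy, columns=["legs"], dtype=int)
print(encoded)
```

The original `legs` column disappears and is replaced by `legs_0`, `legs_2` and `legs_4`, which is why the later tables show columns such as `legs_4` instead of `legs`.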

This dataset already contains `platypus`. I removed it and created `platypus_female` and `platypus_male`. These two rows form the test `DataFrame`; I separated them by sex because only the male platypus has a venomous spur.

In [8]:
# get the platypus row as an independent copy (avoids SettingWithCopyWarning)

platypus_female = animals[animals['animal_name'] == 'platypus'].copy()

# rename platypus to platypus female
platypus_female['animal_name'] = 'platypus female'
platypus_female


Unnamed: 0,animal_name,hair,feathers,eggs,milk,airborne,aquatic,predator,toothed,backbone,...,class_type,cloaca,webbed,beak,legs_0,legs_2,legs_4,legs_5,legs_6,legs_8
63,platypus female,1,0,1,1,0,1,1,0,1,...,1,1,0,1,0,0,1,0,0,0


In [9]:
# drop platypus row

animals = animals.drop(platypus_female.index)

In [10]:
# make the male platypus row and modify the venomous flag

platypus_male = platypus_female.copy()

# rename
platypus_male['animal_name'] = 'platypus male'

# the male platypus has a venomous spur
platypus_male['venomous'] = 1
platypus_male

Unnamed: 0,animal_name,hair,feathers,eggs,milk,airborne,aquatic,predator,toothed,backbone,...,class_type,cloaca,webbed,beak,legs_0,legs_2,legs_4,legs_5,legs_6,legs_8
63,platypus male,1,0,1,1,0,1,1,0,1,...,1,1,0,1,0,0,1,0,0,0
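The same steps can also be sketched without any in-place assignment. The toy frame below is made up for illustration; `DataFrame.assign` returns a new frame, so the source rows are never modified and pandas has no reason to warn about writing to a slice.

```python
import pandas as pd

# Hypothetical rows standing in for `animals`
toy = pd.DataFrame({
    "animal_name": ["pitviper", "platypus", "pony"],
    "venomous": [1, 0, 0],
    "milk": [0, 1, 1],
})

is_platypus = toy["animal_name"] == "platypus"

# Build the two test rows from the original platypus row
female = toy.loc[is_platypus].assign(animal_name="platypus female")
male = female.assign(animal_name="platypus male", venomous=1)

# Everything except the platypus stays available for training
training = toy.loc[~is_platypus].reset_index(drop=True)
print(pd.concat([female, male]))
```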


In [11]:
# reset index
animals.reset_index(inplace=True, drop=True)

In [12]:
# predictors and the label

X = animals.drop(['animal_name', 'class_type'], axis=1)

y = animals['class_type']

In [13]:
X.head()

Unnamed: 0,hair,feathers,eggs,milk,airborne,aquatic,predator,toothed,backbone,breathes,...,catsize,cloaca,webbed,beak,legs_0,legs_2,legs_4,legs_5,legs_6,legs_8
0,1,0,0,1,0,0,1,1,1,1,...,1,0,0,0,0,0,1,0,0,0
1,1,0,0,1,0,0,0,1,1,1,...,1,0,0,0,0,0,1,0,0,0
2,0,0,1,0,0,1,1,1,1,0,...,0,1,0,0,1,0,0,0,0,0
3,1,0,0,1,0,0,1,1,1,1,...,1,0,0,0,0,0,1,0,0,0
4,1,0,0,1,0,0,1,1,1,1,...,1,0,0,0,0,0,1,0,0,0


In [14]:
y.head()

0    1
1    1
2    4
3    1
4    1
Name: class_type, dtype: int64

In [15]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

lr = LogisticRegression(multi_class='multinomial', solver='newton-cg')
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
lr.fit(X_train, y_train)

print('Train score: ', lr.score(X_train, y_train))
print('Test score: ', lr.score(X_test, y_test))

Train score:  1.0
Test score:  0.96


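The model `lr` reaches 96% accuracy on unseen data. With the default 25% test split of the 100 remaining animals, that is 24 of the 25 test animals classified correctly, so I consider the model reasonably trustworthy.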
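A single train/test split on roughly a hundred animals gives a fairly noisy accuracy estimate. One way to check how stable the score is would be stratified cross-validation. The sketch below uses synthetic binary features (the per-class probabilities are invented) rather than the Kaggle data, and relies on the default `LogisticRegression` settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the animal table: 3 classes, 6 binary features.
# Each row gives one class's probability of each feature being 1 (made up).
feature_probs = np.array([
    [0.9, 0.1, 0.1, 0.9, 0.5, 0.2],
    [0.1, 0.9, 0.9, 0.1, 0.5, 0.8],
    [0.1, 0.1, 0.9, 0.1, 0.9, 0.5],
])
y_demo = np.repeat([1, 2, 3], 30)
X_demo = (rng.random((90, 6)) < feature_probs[y_demo - 1]).astype(int)

# Stratified folds keep the class proportions similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X_demo, y_demo, cv=cv)

print("Fold accuracies:", scores.round(2))
print("Mean accuracy:", scores.mean().round(2))
```

On the real data the same call would take `X` and `y`. Since the smallest class (Amphibian) has only four animals, a smaller number of folds would probably be the safer choice there.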

### Now let's identify the platypus

Recall the `Class_Number` of each animal type:

In [16]:
animal_class[['Class_Number', 'Class_Type']]

Unnamed: 0,Class_Number,Class_Type
0,1,Mammal
1,2,Bird
2,3,Reptile
3,4,Fish
4,5,Amphibian
5,6,Bug
6,7,Invertebrate
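Since predictions come back as class numbers, a lookup from `Class_Number` to `Class_Type` is handy. As a minimal sketch, the table above can be turned into a `Series` indexed by class number; `animal_class_demo` is a hand-built copy of the two columns shown, used so the example runs on its own.

```python
import pandas as pd

# Hand-built copy of the two columns shown in the table above
animal_class_demo = pd.DataFrame({
    "Class_Number": [1, 2, 3, 4, 5, 6, 7],
    "Class_Type": ["Mammal", "Bird", "Reptile", "Fish",
                   "Amphibian", "Bug", "Invertebrate"],
})

# Series indexed by class number, so a prediction maps straight to a name
class_names = animal_class_demo.set_index("Class_Number")["Class_Type"]
print(class_names.loc[4])
```

Looking names up by label with `.loc` works for every class number, whereas positional tricks on a filtered frame only work for some rows.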


In [17]:
# female platypus is ..
class_type = lr.predict(platypus_female.drop(columns=['animal_name', 'class_type']))
print('Class_Number: ', class_type[0])
print(f"Platypus (female) is {animal_class[animal_class['Class_Number']==class_type[0]]['Class_Type'].iloc[0]}")

Class_Number:  1
Platypus (female) is Mammal


In [18]:
# male platypus is ...
class_type = lr.predict(platypus_male.drop(columns=['animal_name', 'class_type']))
print('Class_Number: ', class_type[0])
print(f"Platypus (male) is {animal_class[animal_class['Class_Number']==class_type[0]]['Class_Type'].iloc[0]}")

Class_Number:  1
Platypus (male) is Mammal
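`predict` only returns the winning class. To see how close the other classes come, `predict_proba` gives one probability per class. The sketch below trains on a tiny made-up table (not the Kaggle data) and queries a hairy, egg-laying, milk-producing animal.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy training table (hypothetical rows, not the Kaggle data)
train = pd.DataFrame({
    "hair":       [1, 1, 0, 0, 0, 0],
    "feathers":   [0, 0, 1, 1, 0, 0],
    "eggs":       [0, 0, 1, 1, 1, 1],
    "milk":       [1, 1, 0, 0, 0, 0],
    "class_type": [1, 1, 2, 2, 3, 3],
})
features = ["hair", "feathers", "eggs", "milk"]
model = LogisticRegression(max_iter=1000).fit(train[features], train["class_type"])

# A hairy animal that lays eggs and produces milk
query = pd.DataFrame([{"hair": 1, "feathers": 0, "eggs": 1, "milk": 1}])[features]

# One probability per class, labelled with the class numbers
proba = pd.Series(model.predict_proba(query)[0], index=model.classes_)
print(proba.round(3))
```

With the fitted `lr`, the equivalent call would be `lr.predict_proba(...)` on the same feature columns that are passed to `lr.predict` above.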


## Conclusion

Both got `1`, which means they are mammals. `cloaca`, `webbed`, `beak` and `venomous` are not characteristics strong enough to change whether the model classifies an animal as a mammal.
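One way to look at how strongly each characteristic pushes towards a class is to inspect the fitted coefficients. The sketch below does this on a small made-up table, so the numbers it prints say nothing about the real dataset; it only shows the mechanics.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy training table (hypothetical rows, not the Kaggle data)
train = pd.DataFrame({
    "hair":       [1, 1, 0, 0, 0, 0],
    "feathers":   [0, 0, 1, 1, 0, 0],
    "eggs":       [0, 0, 1, 1, 1, 1],
    "milk":       [1, 1, 0, 0, 0, 0],
    "beak":       [0, 0, 1, 1, 0, 0],
    "class_type": [1, 1, 2, 2, 3, 3],
})
features = ["hair", "feathers", "eggs", "milk", "beak"]
model = LogisticRegression(max_iter=1000).fit(train[features], train["class_type"])

# One row of weights per class, one column per feature
coefs = pd.DataFrame(model.coef_, index=model.classes_, columns=features)
print(coefs.round(2))
```

A large positive weight means the feature raises the score of that class, and a large negative weight lowers it. Applied to `lr`, `lr.coef_` and the columns of `X` would show how `milk` and `hair` compare with `cloaca`, `beak` and `venomous` for class `1`.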

## Note

From a data science point of view, they are identified as mammals. However, this dataset is not very large, so a bigger dataset could give a different result.