In this notebook we will try to answer the following question:
# Is it possible to build a classifier to identify legendary pokémon?

In [1]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd

pokemon=pd.read_csv("../dataset/pokemon.csv")

We will proceed by building a <b>decision tree</b> on three features: the base stat total, the capture rate, and the primary type.<br>
We will train the classifier on all the pokémon from the first to the sixth generation, plus a few seventh-generation pokémon that must be in the subset for the encoding step to work.<br>
We will then ask the classifier to classify the remaining pokémon.<br>
The label encoders used below can only transform values they saw while fitting, so every value that appears among the remaining pokémon must also appear in the subset. For example, we must add a pokémon with a base stat total of 570, like Tapu Koko, and one with a base stat total of 338, like Crabrawler.
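The need for these extra pokémon can be illustrated with a small sketch. It assumes scikit-learn's `LabelEncoder`, the encoder used later in this notebook, and works on made-up base stat totals rather than the real dataset. An encoder fitted on a set of values raises a `ValueError` when asked to transform a value it has never seen.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up base stat totals, for illustration only
encoder = LabelEncoder()
encoder.fit([300, 405, 600])

# A value seen during fitting is mapped to its rank among the sorted classes
seen = encoder.transform([405])

# A value not seen during fitting cannot be transformed
try:
    encoder.transform([570])
    unseen_ok = True
except ValueError:
    unseen_ok = False

print(seen[0], unseen_ok)
```

This is why the subset must already contain every base stat total, capture rate, and primary type that occurs among the pokémon we want to classify.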

In [20]:
subset=pd.concat([pokemon[pokemon['generation']<=6], 
                  pokemon[pokemon['name']=='Tapu Koko'], 
                  pokemon[pokemon['name']=='Crabrawler'], 
                  pokemon[pokemon['name']=='Crabominable'],
                  pokemon[pokemon['name']=='Oricorio'],
                  pokemon[pokemon['name']=='Wishiwashi'],
                  pokemon[pokemon['name']=='Wimpod'],
                  pokemon[pokemon['name']=='Dhelmise']])

In [21]:
features = ['base_total', 'capture_rate', 'type1']

In [22]:
from sklearn.preprocessing import LabelEncoder

sum_encoder = LabelEncoder()
sum_encoder.fit(subset['base_total'])

rate_encoder = LabelEncoder()
rate_encoder.fit(subset['capture_rate'])

type1_encoder = LabelEncoder()
type1_encoder.fit(subset['type1'])

subset['base_total'] = sum_encoder.transform(subset['base_total'])
subset['capture_rate'] = rate_encoder.transform(subset['capture_rate'])
subset['type1'] = type1_encoder.transform(subset['type1'])

In [23]:
from sklearn import tree

X = subset[features]

In [24]:
Y = subset['is_legendary']

In [25]:
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
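To get a feeling for what a fitted tree has learned, its rules can be printed as text. The following is a minimal sketch on toy data, not on the dataset of this notebook: the values in `X_toy` and `y_toy` are invented, and `random_state=0` is only there to make the run repeatable.

```python
from sklearn import tree

# Toy rows (invented for illustration): [base_total, capture_rate]
X_toy = [[300, 255], [405, 45], [600, 3], [680, 3]]
y_toy = [0, 0, 1, 1]  # 1 = legendary, 0 = non-legendary

toy_clf = tree.DecisionTreeClassifier(random_state=0)
toy_clf.fit(X_toy, y_toy)

# Human-readable view of the learned splits
rules = tree.export_text(toy_clf, feature_names=['base_total', 'capture_rate'])
print(rules)

# Classify a new toy row with a high total and a low capture rate
prediction = toy_clf.predict([[670, 3]])[0]
print(prediction)
```

The same call should work on `clf` with `feature_names=features`, although the printed thresholds would then refer to the encoded values, not to the original ones.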

In [26]:
# Note: this name shadows Python's built-in filter()
def filter(obj):
    # Encode one pokémon (a row of the dataframe) with the fitted encoders.
    # .copy() avoids writing into a view of the original row.
    transformed_obj = obj[features].copy()
    transformed_obj['base_total'] = sum_encoder.transform([transformed_obj['base_total']])[0]
    transformed_obj['capture_rate'] = rate_encoder.transform([transformed_obj['capture_rate']])[0]
    transformed_obj['type1'] = type1_encoder.transform([transformed_obj['type1']])[0]
    return transformed_obj
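The function above encodes one pokémon at a time. As a possible alternative, a whole dataframe can be encoded in one pass. The sketch below uses two invented toy frames, `train` and `new`, and a hypothetical `encoders` dictionary; none of these names are part of the notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Invented toy data standing in for the training subset and the unknown pokémon
train = pd.DataFrame({'base_total': [300, 405, 600],
                      'type1': ['grass', 'fire', 'water']})
new = pd.DataFrame({'base_total': [405, 600],
                    'type1': ['water', 'grass']},
                   index=['A', 'B'])

# One fitted encoder per column (hypothetical helper structure)
encoders = {col: LabelEncoder().fit(train[col]) for col in train.columns}

# Encode every column of the new frame at once, leaving `new` untouched
encoded = new.copy()
for col, enc in encoders.items():
    encoded[col] = enc.transform(new[col])

print(encoded)
```

With an encoded frame like this, a single `predict` call on all rows would replace the row-by-row calls.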

In [33]:
unknown_mons = pokemon.drop(subset.index)
unknown_mons.set_index('name',inplace=True)
unknown_mons

name,abilities,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,...,percentage_male,pokedex_number,sp_attack,sp_defense,speed,type1,type2,weight_kg,generation,is_legendary
Rowlet,"['Overgrow', 'Long Reach']",1.00,1.0,1.0,1.0,1.0,0.50,2.00,2.00,1.0,...,88.1,722,50,50,42,grass,flying,1.5,7,0
Dartrix,"['Overgrow', 'Long Reach']",1.00,1.0,1.0,1.0,1.0,0.50,2.00,2.00,1.0,...,88.1,723,70,70,52,grass,flying,16.0,7,0
Decidueye,"['Overgrow', 'Long Reach']",1.00,2.0,1.0,0.5,1.0,0.00,2.00,2.00,2.0,...,88.1,724,100,100,70,grass,ghost,36.6,7,0
Litten,"['Blaze', 'Intimidate']",0.50,1.0,1.0,1.0,0.5,1.00,0.50,1.00,1.0,...,88.1,725,60,40,70,fire,,4.3,7,0
Torracat,"['Blaze', 'Intimidate']",0.50,1.0,1.0,1.0,0.5,1.00,0.50,1.00,1.0,...,88.1,726,80,50,90,fire,,25.0,7,0
Incineroar,"['Blaze', 'Intimidate']",1.00,0.5,1.0,1.0,1.0,2.00,0.50,1.00,0.5,...,88.1,727,80,90,60,fire,dark,83.0,7,0
Popplio,"['Torrent', 'Liquid Voice']",1.00,1.0,1.0,2.0,1.0,1.00,0.50,1.00,1.0,...,88.1,728,66,56,40,water,,7.5,7,0
Brionne,"['Torrent', 'Liquid Voice']",1.00,1.0,1.0,2.0,1.0,1.00,0.50,1.00,1.0,...,88.1,729,91,81,50,water,,17.5,7,0
Primarina,"['Torrent', 'Liquid Voice']",0.50,0.5,0.0,2.0,1.0,0.50,0.50,1.00,1.0,...,88.1,730,126,116,60,water,fairy,44.0,7,0
Pikipek,"['Keen Eye', 'Skill Link', 'Pickup']",0.50,1.0,1.0,2.0,1.0,1.00,1.00,1.00,0.0,...,50.0,731,30,30,65,normal,flying,1.2,7,0


In [35]:
mon = filter(unknown_mons.loc['Popplio']) #try to change the pokémon name!
arr = clf.predict([mon]) 
if arr[0]:
    print('\n legendary')
else:
    print('\n non-legendary')


 non-legendary


## Classifier analysis
We build a <b>confusion matrix</b> to evaluate the quality of our classifier.

In [56]:
tot = len(unknown_mons)
totp = len(unknown_mons[unknown_mons['is_legendary']==1])
totn = len(unknown_mons[unknown_mons['is_legendary']==0])

tp = 0
fp = 0
tn = 0
fn = 0

fplist=[]
fnlist=[]

for x in unknown_mons.index:
    pred = clf.predict([filter(unknown_mons.loc[x])])
    truth = unknown_mons.loc[x]['is_legendary']
    if pred[0]:
        if truth:
            tp+=1
        else:
            fp+=1
            fplist.append(x)
            
    else:
        if truth==0:
            tn+=1
        else:
            fn+=1
            fnlist.append(x)
totcp=tp+fp
totcn=tn+fn

print("Total mons: " + str(tot) + ", Total legendary: " + str(totp) + ", Total non-legendary: " + str(totn))
print("True positive: " + str(tp) + ", False positive: " + str(fp) + ", Pokémon classified as legendary: " + str(totcp))
print("True negative: " + str(tn) + ", False negative: " + str(fn) + ", Pokémon classified as non-legendary: " + str(totcn))

Total mons: 73, Total legendary: 16, Total non-legendary: 57
True positive: 7, False positive: 2, Pokémon classified as legendary: 9
True negative: 55, False negative: 9, Pokémon classified as non-legendary: 64
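The four counts above can be condensed into the usual summary metrics. This is a small sketch that plugs in the numbers printed by the previous cell; the variable names `accuracy`, `precision`, and `recall` are introduced here for illustration.

```python
# Counts copied from the output above
tp, fp, tn, fn = 7, 2, 55, 9

# Share of all pokémon classified correctly
accuracy = (tp + tn) / (tp + fp + tn + fn)

# Share of predicted legendaries that really are legendary
precision = tp / (tp + fp)

# Share of real legendaries that the classifier found
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.3f}, precision={precision:.3f}, recall={recall:.3f}")
```

Recall is 7/16, so the classifier misses more than half of the legendary pokémon, even though its overall accuracy looks good because most pokémon are non-legendary.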


Classifying 2 non-legendary pokémon as legendary is a tolerable mistake, because it does not make players lose interest in those species.<br>
Let's see which ones they are:

In [57]:
fplist

['Type: Null', 'Silvally']

Type: Null and Silvally are unique pokémon, obtainable only through a specific event in the Sun and Moon (SM) storyline. It is acceptable to classify them as legendary.

In [59]:
from IPython.display import display, HTML
display(HTML("<img src='../img/silvally.png' width='300px' height='300px'>"))

Classifying legendary pokémon as common pokémon is a more serious mistake, because the player could lose interest in particular subquests, post-league plots, and real-world pokémon events.

In [60]:
fnlist

['Cosmog',
 'Cosmoem',
 'Nihilego',
 'Buzzwole',
 'Pheromosa',
 'Xurkitree',
 'Celesteela',
 'Kartana',
 'Guzzlord']

Cosmog and Cosmoem are false negatives because of their low base stat totals. We should refine the classifier so that they are not considered non-legendary pokémon.<br>
The remaining false negatives are all Ultra Beasts (UBs). We should consider adding a boolean is_UB field.
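A possible way to add such a field is sketched below. It assumes the dataset has no UB column yet, takes the list of Ultra Beasts only from the false negatives shown above (so it may be incomplete), and works on an invented toy frame instead of the real `pokemon` dataframe.

```python
import pandas as pd

# Ultra Beasts taken from the false-negative list above (assumed, possibly incomplete)
ultra_beasts = ['Nihilego', 'Buzzwole', 'Pheromosa', 'Xurkitree',
                'Celesteela', 'Kartana', 'Guzzlord']

# Toy frame standing in for the pokémon dataframe
toy = pd.DataFrame({'name': ['Popplio', 'Kartana', 'Cosmog', 'Guzzlord']})

# 1 if the pokémon is an Ultra Beast, 0 otherwise
toy['is_UB'] = toy['name'].isin(ultra_beasts).astype(int)

print(toy)
```

The new column could then be appended to `features` before training. Whether this actually removes the false negatives would have to be checked by rebuilding the confusion matrix.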