**Legendary Pokémon classifier**

The required libraries are imported, both for preprocessing and for training.

In [38]:
import numpy as np
import pandas as pd
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

from sklearn.tree import DecisionTreeClassifier

The data set is loaded.

In [10]:
pokemon = pd.read_csv('Pokemon.csv')
pokemon.head(20)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False


Many Pokémon have no second type, so the `Type 2` column contains many NaN values. There are two options: drop the `Type 2` column, or fill in the missing values.

The `#` column can also be dropped.
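Before choosing between the two options, it can help to count how many values are actually missing. The following is a minimal sketch on a toy frame built from a few rows of the output above; the name `sample` is hypothetical and stands in for the full `pokemon` DataFrame.

```python
import numpy as np
import pandas as pd

# Toy frame copied from rows shown above (a stand-in for Pokemon.csv)
sample = pd.DataFrame({
    'Name': ['Bulbasaur', 'Charmander', 'Charmeleon', 'Charizard', 'Squirtle'],
    'Type 1': ['Grass', 'Fire', 'Fire', 'Fire', 'Water'],
    'Type 2': ['Poison', np.nan, np.nan, 'Flying', np.nan],
})

# Number of missing values in each column
missing_per_column = sample.isna().sum()

# Fraction of rows without a second type
missing_share = sample['Type 2'].isna().mean()

print(missing_per_column)
print(f"Share of rows missing 'Type 2': {missing_share:.2f}")
```

On the real data, the same two calls applied to `pokemon` would show how much of `Type 2` is empty.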

In [11]:
pokemon = pokemon.drop(['#'], axis=1)

# Option 1: fill missing 'Type 2' values with the value of 'Type 1'
# pokemon['Type 2'] = pokemon['Type 2'].fillna(pokemon['Type 1'])

# Option 2: drop the 'Type 2' column
pokemon = pokemon.drop(['Type 2'], axis=1)

In [12]:
pokemon.head(20)

Unnamed: 0,Name,Type 1,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,Bulbasaur,Grass,318,45,49,49,65,65,45,1,False
1,Ivysaur,Grass,405,60,62,63,80,80,60,1,False
2,Venusaur,Grass,525,80,82,83,100,100,80,1,False
3,VenusaurMega Venusaur,Grass,625,80,100,123,122,120,80,1,False
4,Charmander,Fire,309,39,52,43,60,50,65,1,False
5,Charmeleon,Fire,405,58,64,58,80,65,80,1,False
6,Charizard,Fire,534,78,84,78,109,85,100,1,False
7,CharizardMega Charizard X,Fire,634,78,130,111,130,85,100,1,False
8,CharizardMega Charizard Y,Fire,634,78,104,78,159,115,100,1,False
9,Squirtle,Water,314,44,48,65,50,64,43,1,False


Dropping the `Type 2` column is the better choice: filling it with `Type 1` would only duplicate existing data, and dropping it also saves memory.

Next, the features for the classifier are chosen. For simplicity, attack and special attack are combined into a single total-attack column, and defense and special defense into a single total-defense column.

In [13]:
pokemon['Ataque_total'] = pokemon['Attack'] + pokemon['Sp. Atk']
pokemon['Defensa_total'] = pokemon['Defense'] + pokemon['Sp. Def']

In [14]:
pokemon.head()

Unnamed: 0,Name,Type 1,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,Ataque_total,Defensa_total
0,Bulbasaur,Grass,318,45,49,49,65,65,45,1,False,114,114
1,Ivysaur,Grass,405,60,62,63,80,80,60,1,False,142,143
2,Venusaur,Grass,525,80,82,83,100,100,80,1,False,182,183
3,VenusaurMega Venusaur,Grass,625,80,100,123,122,120,80,1,False,222,243
4,Charmander,Fire,309,39,52,43,60,50,65,1,False,112,93
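In the rows shown above, `Total` equals the sum of the six base stats, which means the new `Ataque_total` and `Defensa_total` columns overlap with `Total` rather than adding independent information. A short check on rows copied from the output (the frame `sample` is a hypothetical stand-in for `pokemon`):

```python
import pandas as pd

# Rows copied from the head() output above
sample = pd.DataFrame({
    'Name': ['Bulbasaur', 'Ivysaur', 'Venusaur', 'Charmander'],
    'Total': [318, 405, 525, 309],
    'HP': [45, 60, 80, 39],
    'Attack': [49, 62, 82, 52],
    'Defense': [49, 63, 83, 43],
    'Sp. Atk': [65, 80, 100, 60],
    'Sp. Def': [65, 80, 100, 50],
    'Speed': [45, 60, 80, 65],
})

base_stats = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']

# True when every row's Total equals the sum of its six base stats
total_matches = (sample[base_stats].sum(axis=1) == sample['Total']).all()
print(total_matches)
```

Whether this holds for every row of the file would need to be confirmed by running the same comparison on the full `pokemon` DataFrame.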


Now the stats of the legendary Pokémon are inspected and compared with those of the non-legendary ones.

In [16]:
# Keep only the rows flagged as legendary
Stats_legendario = pokemon.loc[pokemon['Legendary']]

Stats_legendario

Unnamed: 0,Name,Type 1,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,Ataque_total,Defensa_total
156,Articuno,Ice,580,90,85,100,95,125,85,1,True,180,225
157,Zapdos,Electric,580,90,90,85,125,90,100,1,True,215,175
158,Moltres,Fire,580,90,100,90,125,85,90,1,True,225,175
162,Mewtwo,Psychic,680,106,110,90,154,90,130,1,True,264,180
163,MewtwoMega Mewtwo X,Psychic,780,106,190,100,154,100,130,1,True,344,200
...,...,...,...,...,...,...,...,...,...,...,...,...,...
795,Diancie,Rock,600,50,100,150,100,150,50,6,True,200,300
796,DiancieMega Diancie,Rock,700,50,160,110,160,110,110,6,True,320,220
797,HoopaHoopa Confined,Psychic,600,80,110,60,150,130,70,6,True,260,190
798,HoopaHoopa Unbound,Psychic,680,80,160,60,170,130,80,6,True,330,190


Legendary Pokémon generally have better stats than the rest, in both attack and defense, and this is also reflected in the `Total` column.
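One way to make this comparison explicit is to average the chosen stats per class. The sketch below uses six rows copied from the outputs above, so the numbers only illustrate the technique and are not representative of the full data set; `sample` and the labels `normal` and `legendary` are hypothetical names.

```python
import pandas as pd

# Three non-legendary and three legendary rows copied from the outputs above
sample = pd.DataFrame({
    'Name': ['Bulbasaur', 'Ivysaur', 'Charmander', 'Articuno', 'Zapdos', 'Mewtwo'],
    'Total': [318, 405, 309, 580, 580, 680],
    'HP': [45, 60, 39, 90, 90, 106],
    'Ataque_total': [114, 142, 112, 180, 215, 264],
    'Defensa_total': [114, 143, 93, 225, 175, 180],
    'Legendary': [False, False, False, True, True, True],
})

stats = ['Total', 'Ataque_total', 'Defensa_total', 'HP']

# Mean of each stat within each class
means = sample.groupby('Legendary')[stats].mean()
means.index = means.index.map({False: 'normal', True: 'legendary'})

print(means.round(1))
```

Applied to the full `pokemon` DataFrame, the same `groupby` call would give the class averages that back up the observation above.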

In [30]:
# Build a feature matrix with the main stats

caract = pokemon[['Total', 'Ataque_total', 'Defensa_total', 'HP']].to_numpy()


The training and test sets are created.

In [31]:
X_train, X_test, y_train, y_test = train_test_split(caract, pokemon['Legendary'], random_state=0)
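# With no `test_size` given, `train_test_split` holds out 25% of the rows.
# Because legendaries are rare, the share of legendaries in the test set
# can drift from the overall share unless `stratify` is passed. The sketch
# below illustrates this on synthetic labels; the 800 rows and 64 positives
# are assumptions chosen for illustration, not values read from Pokemon.csv.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 800 rows, 64 of them positive (assumed, not the real file)
rng = np.random.default_rng(0)
X = rng.normal(size=(800, 4))
y = np.zeros(800, dtype=bool)
y[:64] = True

# Plain split: class proportions in the test set are left to chance
_, _, _, y_test_plain = train_test_split(X, y, random_state=0)

# Stratified split: class proportions are preserved in both parts
_, _, _, y_test_strat = train_test_split(X, y, stratify=y, random_state=0)

print(f"overall positive share:    {y.mean():.3f}")
print(f"plain test positive share: {y_test_plain.mean():.3f}")
print(f"strat. test positive share: {y_test_strat.mean():.3f}")
```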


The K-NN implementation follows.

In [32]:
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

print('Accuracy of K-NN classifier on training set: {:.2f}'
     .format(knn.score(X_train, y_train)))
print('Accuracy of K-NN classifier on test set: {:.2f}'
     .format(knn.score(X_test, y_test)))

pred = knn.predict(X_test)
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))

Accuracy of K-NN classifier on training set: 0.96
Accuracy of K-NN classifier on test set: 0.95
[[181   1]
 [  9   9]]
              precision    recall  f1-score   support

       False       0.95      0.99      0.97       182
        True       0.90      0.50      0.64        18

    accuracy                           0.95       200
   macro avg       0.93      0.75      0.81       200
weighted avg       0.95      0.95      0.94       200
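The figures in the report for the `True` class can be reproduced by hand from the confusion matrix, which helps explain why an accuracy of 0.95 coexists with a recall of only 0.50. A small worked example using the matrix printed above:

```python
import numpy as np

# Confusion matrix printed above: rows = true class, columns = predicted class
cm = np.array([[181, 1],
               [9, 9]])
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)       # predicted legendaries that really are legendary
recall = tp / (tp + fn)          # real legendaries that the model found
accuracy = (tp + tn) / cm.sum()  # all correct predictions over all samples

print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
# prints: precision=0.90 recall=0.50 accuracy=0.95
```

Half of the 18 legendaries in the test set are missed, but because they make up only 9% of the 200 test rows, accuracy stays high.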



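The decision tree implementation follows. The training and test sets are rebuilt, this time stratified by `Legendary` and with a different random seed, so these results are not measured on the same test set as the K-NN results.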

In [46]:
X_train, X_test, y_train, y_test = train_test_split(caract, pokemon['Legendary'], stratify=pokemon['Legendary'], random_state=42)


tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree.fit(X_train, y_train)

print("Accuracy on training set: {:.3f}".format(tree.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(tree.score(X_test, y_test)))

pred = tree.predict(X_test)
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))

Accuracy on training set: 0.978
Accuracy on test set: 0.960
[[180   4]
 [  4  12]]
              precision    recall  f1-score   support

       False       0.98      0.98      0.98       184
        True       0.75      0.75      0.75        16

    accuracy                           0.96       200
   macro avg       0.86      0.86      0.86       200
weighted avg       0.96      0.96      0.96       200
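Since K-NN and the decision tree were scored on different splits, a fairer comparison would evaluate both on identical folds. The sketch below is one possible way to do that with stratified cross-validation; it is not part of the original notebook. It runs on synthetic, imbalanced data from `make_classification` as a stand-in for the four stat features, and it wraps K-NN in a `StandardScaler` pipeline, which is an added assumption motivated by K-NN being distance-based.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the feature matrix (not the real data)
X, y = make_classification(n_samples=800, n_features=4, n_informative=3,
                           n_redundant=1, weights=[0.92], random_state=0)

# The same folds are reused for every model
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    'knn': make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3)),
    'tree': DecisionTreeClassifier(max_depth=5, random_state=0),
}

# Recall on the minority class, one score per fold
scores = {name: cross_val_score(model, X, y, cv=cv, scoring='recall')
          for name, model in models.items()}

for name, fold_scores in scores.items():
    print(f"{name}: mean recall = {fold_scores.mean():.2f}")
```

With the real data, `X` and `y` would be replaced by `caract` and `pokemon['Legendary']`.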



SVM implementation with a linear kernel, reusing the split from the decision tree cell.

In [50]:
from sklearn import svm

clf = svm.SVC(kernel='linear')

clf.fit(X_train, y_train)

pred = clf.predict(X_test)

print("Accuracy on training set: {:.3f}".format(clf.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(clf.score(X_test, y_test)))

print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))

Accuracy on training set: 0.943
Accuracy on test set: 0.935
[[183   1]
 [ 12   4]]
              precision    recall  f1-score   support

       False       0.94      0.99      0.97       184
        True       0.80      0.25      0.38        16

    accuracy                           0.94       200
   macro avg       0.87      0.62      0.67       200
weighted avg       0.93      0.94      0.92       200
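The SVM finds only 4 of the 16 legendaries in the test set (recall 0.25). Two common adjustments that may help with a rare class are scaling the features and passing `class_weight='balanced'` to `SVC`. Neither is used in the notebook above, and the sketch below runs on synthetic data, so it shows only how the variant is set up, not what it would score on the Pokémon data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the feature matrix (not the real data)
X, y = make_classification(n_samples=800, n_features=4, n_informative=3,
                           n_redundant=1, weights=[0.92], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Baseline: same configuration as the cell above
plain = SVC(kernel='linear').fit(X_train, y_train)

# Variant: standardized features and class weights inversely
# proportional to class frequency
balanced = make_pipeline(
    StandardScaler(),
    SVC(kernel='linear', class_weight='balanced'),
).fit(X_train, y_train)

recall_plain = recall_score(y_test, plain.predict(X_test))
recall_balanced = recall_score(y_test, balanced.predict(X_test))

print(f"recall (plain):    {recall_plain:.2f}")
print(f"recall (balanced): {recall_balanced:.2f}")
```

Class weighting typically trades some precision for recall, so both metrics should be checked before preferring one variant.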

