Libraries Used in the Code

In [68]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV, KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

Importing the Dataset

In [69]:
ds = pd.read_csv('pokemon.csv')

Keeping only the rows where type1 == "water" or type1 == "normal"

In [70]:
ds = ds[ds['type1'].isin(['water', 'normal'])]

Separating the features and the classes. The features go into a matrix X (here with two columns, against_electric and base_total), in the shape the learning algorithm expects, and the classes go into y. We then split the dataset into a training set and a test set, with the test set held back for the final evaluation. I used a seed in random_state so that the results can be reproduced and evaluated, as in EP1.

In [71]:
X = ds[['against_electric', 'base_total']]
y = ds['type1']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
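As a minimal sketch of why the seed matters, the snippet below repeats the same split twice and checks that both calls select the same rows. It uses a small made-up table (`demo`, `X_demo`, `y_demo` are hypothetical names, and the values are random, not taken from pokemon.csv). It also shows the optional `stratify` argument, which the notebook does not use, and which keeps the class proportions equal in the training and test sets.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the filtered data; the values are made up.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "against_electric": rng.choice([1.0, 2.0], size=100),
    "base_total": rng.integers(200, 700, size=100),
    "type1": np.repeat(["water", "normal"], 50),
})
X_demo = demo[["against_electric", "base_total"]]
y_demo = demo["type1"]

# Two calls with the same seed produce the same partition.
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
same_split = split_a[1].index.equals(split_b[1].index)

# stratify=y_demo preserves the 50/50 class balance in the test set.
_, _, _, y_test_strat = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo)
strat_counts = y_test_strat.value_counts().to_dict()

print(same_split)
print(strat_counts)
```

With only a few dozen test samples, stratifying can be worth considering, since an unlucky split may otherwise leave one class under-represented.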

Standardizing the values

In [72]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
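The cell above fits the scaler on the training set only and reuses the same statistics for the test set. The sketch below illustrates what that means on made-up numbers (`train`, `test` and `scaler_demo` are hypothetical names): the scaled training columns end up with mean 0 and standard deviation 1, while the test columns are shifted and scaled by the training mean and standard deviation, so they are only approximately standardized.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data standing in for two numeric feature columns.
rng = np.random.default_rng(0)
train = rng.normal(loc=400.0, scale=100.0, size=(80, 2))
test = rng.normal(loc=400.0, scale=100.0, size=(20, 2))

scaler_demo = StandardScaler()
train_scaled = scaler_demo.fit_transform(train)  # learns mean_ and scale_ from train
test_scaled = scaler_demo.transform(test)        # reuses the training statistics

train_means = train_scaled.mean(axis=0)
train_stds = train_scaled.std(axis=0)

# transform() is equivalent to (x - training mean) / training std.
manual_test = (test - scaler_demo.mean_) / scaler_demo.scale_

print(train_means, train_stds)
print(test_scaled.mean(axis=0), test_scaled.std(axis=0))
```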

Declaring the Classifiers

In [73]:
tree_classifier = DecisionTreeClassifier()
logistic_classifier = LogisticRegression()
svm_classifier = SVC()
forest_classifier = RandomForestClassifier()
kfolds = KFold(n_splits=5, shuffle=True, random_state=42)


Searching for the best hyperparameters for the classifiers above

Decision Tree parameters

In [74]:
tree_parameters = {"max_depth": [None, 1, 2, 4, 6, 8, 10], "min_samples_leaf": [1,4,10]}

grid_search = GridSearchCV(tree_classifier, tree_parameters, cv=kfolds)

grid_search.fit(X_train, y_train)

best_parameters_tree = grid_search.best_params_

print(best_parameters_tree)

{'max_depth': 4, 'min_samples_leaf': 4}
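The search above reports only the winning combination. As a sketch of how the other candidates could be inspected, the snippet below runs a smaller grid on synthetic data from `make_classification` (an assumption: it is not the Pokémon data, and `search` and `results` are hypothetical names) and reads `best_score_` and `cv_results_` from the fitted `GridSearchCV`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-feature binary problem standing in for the real data.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0)

grid = {"max_depth": [None, 2, 4], "min_samples_leaf": [1, 4]}
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# random_state on the tree makes tie-breaking between splits repeatable.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), grid, cv=cv)
search.fit(X_demo, y_demo)

# One row per candidate, sorted so the best-ranked combination comes first.
results = (
    pd.DataFrame(search.cv_results_)
    [["params", "mean_test_score", "std_test_score", "rank_test_score"]]
    .sort_values("rank_test_score")
)

print(search.best_params_, round(search.best_score_, 3))
print(results)
```

Looking at `std_test_score` next to `mean_test_score` helps judge whether the top candidates actually differ or are within fold-to-fold noise.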


SVM parameters

In [75]:
svm_parameters = {'kernel': ['linear', 'rbf', 'poly'], 'C': [0.1, 1, 4], "degree": [1,3,10]}

grid_search = GridSearchCV(svm_classifier, svm_parameters, cv=kfolds)

grid_search.fit(X_train, y_train)

best_parameters_svm = grid_search.best_params_

print(best_parameters_svm)

{'C': 1, 'degree': 1, 'kernel': 'rbf'}
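Note that in `SVC` the `degree` parameter is only used by the `poly` kernel, so the `'degree': 1` reported next to `'rbf'` has no effect on the model. One possible way to avoid fitting redundant combinations is to pass a list of grids, sketched below with `ParameterGrid` to count the candidates (`flat_grid` and `split_grid` are hypothetical names; the values are the ones from the cell above).

```python
from sklearn.model_selection import ParameterGrid

# The grid as written in the notebook: degree is crossed with every kernel.
flat_grid = {
    "kernel": ["linear", "rbf", "poly"],
    "C": [0.1, 1, 4],
    "degree": [1, 3, 10],
}

# Alternative: only the poly kernel gets a degree axis.
split_grid = [
    {"kernel": ["linear", "rbf"], "C": [0.1, 1, 4]},
    {"kernel": ["poly"], "C": [0.1, 1, 4], "degree": [1, 3, 10]},
]

n_flat = len(ParameterGrid(flat_grid))
n_split = len(ParameterGrid(split_grid))
print(n_flat, n_split)  # 27 15
```

`GridSearchCV` accepts the list form directly as its parameter grid, so `split_grid` could replace `svm_parameters` without other changes.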


Logistic Regression parameters

In [76]:
logistic_parameters ={
    'C': [ 0.001, 0.01, 0.1, 1, 10, 100]
}
grid_search = GridSearchCV(logistic_classifier, logistic_parameters, cv=kfolds)

grid_search.fit(X_train, y_train)

best_parameters_logistic = grid_search.best_params_

print(best_parameters_logistic)

{'C': 1}


Random Forest parameters

In [77]:
forest_parameters = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

grid_search = GridSearchCV(forest_classifier, forest_parameters, cv=kfolds)

grid_search.fit(X_train, y_train)

best_parameters_forest = grid_search.best_params_

print(best_parameters_forest)

{'max_depth': 5, 'min_samples_leaf': 2, 'n_estimators': 100}


Building the New Classifiers with the Best Parameters

In [78]:
tree_classifier = DecisionTreeClassifier(**best_parameters_tree)
logistic_classifier = LogisticRegression(**best_parameters_logistic)
svm_classifier = SVC(**best_parameters_svm)
forest_classifier = RandomForestClassifier(**best_parameters_forest)
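Rebuilding each classifier from `best_params_` works, but it is not the only option. With the default `refit=True`, `GridSearchCV` already retrains the best candidate on the full training data and exposes it as `best_estimator_`. The sketch below, on synthetic data (an assumption, with hypothetical names `search`, `best_model` and `rebuilt`), checks that both routes give the same predictions for a deterministic model such as logistic regression.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic two-feature binary problem standing in for the real data.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0)

cv = KFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(LogisticRegression(), {"C": [0.01, 1, 100]}, cv=cv)
search.fit(X_demo, y_demo)

# Route 1: the estimator refit by GridSearchCV on all of X_demo.
best_model = search.best_estimator_

# Route 2: the notebook's approach, rebuilding from best_params_ and fitting again.
rebuilt = LogisticRegression(**search.best_params_).fit(X_demo, y_demo)

same_predictions = bool(
    (best_model.predict(X_demo) == rebuilt.predict(X_demo)).all())
print(search.best_params_, same_predictions)
```

For randomized models such as random forests, the two routes only agree if `random_state` is fixed on the estimator.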

Training the Models

In [79]:
tree_classifier.fit(X_train, y_train)

y_pred = tree_classifier.predict(X_test)

tree_accuracy = accuracy_score(y_test, y_pred)

print(f"The accuracy of the Decision Tree classifier is: {tree_accuracy}")

The accuracy of the Decision Tree classifier is: 0.9090909090909091


In [80]:
logistic_classifier.fit(X_train, y_train)

y_pred = logistic_classifier.predict(X_test)

logistic_accuracy = accuracy_score(y_test, y_pred)

print(f"The accuracy of the Logistic Regression classifier is: {logistic_accuracy}")

The accuracy of the Logistic Regression classifier is: 0.8181818181818182


In [81]:
svm_classifier.fit(X_train, y_train)

y_pred = svm_classifier.predict(X_test)

svm_accuracy = accuracy_score(y_test, y_pred)

print(f"The accuracy of the SVM classifier is: {svm_accuracy}")

The accuracy of the SVM classifier is: 0.8636363636363636


In [85]:
forest_classifier.fit(X_train, y_train)

y_pred = forest_classifier.predict(X_test)

forest_accuracy = accuracy_score(y_test, y_pred)

print(f"The accuracy of the Random Forest classifier is: {forest_accuracy}")

The accuracy of the Random Forest classifier is: 0.9318181818181818
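The four training cells follow the same fit, predict, score pattern, so they could be folded into one loop. The sketch below is one way to do that, assuming synthetic data rather than pokemon.csv and using hypothetical names (`models`, `accuracies`). It also wraps each model in a pipeline with its own `StandardScaler` and fixes `random_state` on the tree-based models, two choices the notebook does not make, so the numbers it prints are not comparable to the results above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-feature binary problem standing in for the real data.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)

# Hyperparameters copied from the grid-search outputs in the notebook.
models = {
    "Decision Tree": DecisionTreeClassifier(
        max_depth=4, min_samples_leaf=4, random_state=0),
    "Logistic Regression": LogisticRegression(C=1),
    "SVM": SVC(C=1, kernel="rbf"),
    "Random Forest": RandomForestClassifier(
        n_estimators=100, max_depth=5, min_samples_leaf=2, random_state=0),
}

accuracies = {}
for name, model in models.items():
    # The scaler is fit only on the training part, inside the pipeline.
    pipeline = make_pipeline(StandardScaler(), model)
    pipeline.fit(X_tr, y_tr)
    accuracies[name] = accuracy_score(y_te, pipeline.predict(X_te))
    print(f"The accuracy of the {name} classifier is: {accuracies[name]:.4f}")
```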
