# **1. PROBLEM AND DATASET DESCRIPTION**

The problem we address is to determine, from a set of statistics for each NBA player (the American basketball league), who was the season's MVP (Most Valuable Player), covering the seasons from 1996/97 through 2019/20. The MVP is an award given to the player with the best performance in the competition.

The dataset, distributed as a .csv file, initially contains 11145 instances described by 22 attributes.

# **2. HOW WAS THE DATA OBTAINED?**

The data was obtained from the Kaggle platform. It was published by Justinas Cirtautas and can be downloaded [here](https://www.kaggle.com/justinas/nba-players-data).

# **3. ABOUT THE DATASET**

The dataset has the following attributes:


*   **Unnamed: 0**: column holding the row order of the instances;
*   **player_name**: player's name;
*   **team_abbreviation**: abbreviated name of the player's team in a given season;
*   **age**: player's age in a given season;
*   **player_height**: player's height;
*   **player_weight**: player's weight;
*   **college**: college the player attended before being drafted;
*   **country**: player's nationality;
*   **draft_year**: year in which the player was drafted;
*   **draft_round**: draft round in which the player was picked;
*   **draft_number**: the player's pick number;
*   **gp**: games played in the season;
*   **pts**: player's average points per game;
*   **reb**: player's average rebounds per game;
*   **ast**: player's average assists per game;
*   **net_rating**: team's point differential per 100 possessions while the player is on the court;
*   **oreb_pct**: percentage of available offensive rebounds the player grabbed while on the court;
*   **dreb_pct**: percentage of available defensive rebounds the player grabbed while on the court;
*   **usg_pct**: percentage of team plays the player was directly involved in while on the court;
*   **ts_pct**: player's shooting efficiency (true shooting percentage);
*   **ast_pct**: percentage of teammates' field goals that the player assisted while on the court;
*   **season**: season in which these statistics were recorded.



# **4. CLEANING AND TRANSFORMING THE DATASET**

In [None]:
# Import libraries and methods

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from google.colab import drive
from sklearn.preprocessing import LabelEncoder

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Load the dataset

df = pd.read_csv("all_seasons.csv")

In [None]:
# Initial look at the dataset: shape

df.shape

In [None]:
# Initial look at the dataset: attribute types

df.dtypes

In [None]:
# Add the target column, initialised to 0 for every row
# (a single column assignment replaces the row-by-row loop, which
# triggered pandas' SettingWithCopyWarning)

df['was_mvp'] = 0


In [None]:
# Mark the row of each season's MVP (row labels of the original file).
# .loc avoids the chained assignment df['was_mvp'][i] = 1
mvp_rows = [381, 1277, 467, 1556, 1907, 2324, 2942, 3134, 3647, 4101, 4474, 5215,
            5430, 6127, 7004, 7428, 6466, 7972, 8497, 9008, 9304, 10003, 10111, 10638]
df.loc[mvp_rows, 'was_mvp'] = 1

In [None]:
# Drop columns that are not useful for the problem

df = df.drop(['Unnamed: 0', 'team_abbreviation', 'age', 'player_height', 'player_weight', 'college', 'country', 'draft_year', 'draft_round', 'draft_number'], axis=1)

In [None]:
# Check for missing values

df.isna().sum()

In [None]:
# Check for duplicated rows

df.duplicated().sum()

In [None]:
# Check for outliers

df_numerical = df.drop(['player_name', 'season'],  axis=1)

for item in df_numerical:
    plt.figure(figsize=(16, 8), dpi=80)
    plt.ylabel(item)
    plt.xlabel('season')
    plt.scatter(df['season'], df[item])
    plt.show()

In [None]:
# Convert categorical variables

set(df['season'])

In [None]:
set(df['player_name'])

In [None]:
label_encoder = LabelEncoder()
season_num = label_encoder.fit_transform(df['season'])
player_num = label_encoder.fit_transform(df['player_name'])
df_season = pd.DataFrame(data=season_num, columns=['season'])
df_player = pd.DataFrame(data=player_num, columns=['player_id'])

In [None]:
df = df.drop(['player_name', 'season'], axis=1)
df = df.join(df_season)
df = df.join(df_player)

In [None]:
# Reorder the dataset columns

df = df[['player_id', 'gp', 'pts', 'reb', 'ast', 'net_rating', 'oreb_pct',
          'dreb_pct', 'usg_pct', 'ts_pct', 'ast_pct', 'season', 'was_mvp']]

In [None]:
df.to_csv("all_seasons_pre_processing.csv", index=False)

The dataset came practically ready to use: it had no missing values and no duplicated rows, and the outliers we detected make sense in the context of the problem. The only processing required was therefore to remove the columns that would not affect the result and to convert the categorical variables to numeric ones.

We also had to create a target column for the problem we want to address. To do so, we created a binary attribute called **'was_mvp'**, where 0 indicates that the player was not the season's MVP and 1 indicates that he was.
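Marking MVP rows by hard-coded row index breaks as soon as the file is re-sorted or filtered. As a minimal sketch, assuming a lookup table from season to MVP name is available, the target can instead be derived by comparing each row with that table. The toy rows, the player names and the `mvp_by_season` dictionary below are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Toy stand-in for the real dataset (names and seasons are made up)
toy = pd.DataFrame({
    "player_name": ["Player A", "Player B", "Player C", "Player A"],
    "season": ["2000-01", "2000-01", "2001-02", "2001-02"],
})

# Hypothetical lookup: season -> name of that season's MVP
mvp_by_season = {"2000-01": "Player B", "2001-02": "Player A"}

# A row is positive when its player is the MVP of its own season
toy["was_mvp"] = (toy["season"].map(mvp_by_season) == toy["player_name"]).astype(int)

print(toy)
```

With this approach every season should end up with exactly one positive row, which is easy to check with a `groupby("season")` sum.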

# **5. INSTANCE REDUCTION**

To reduce the number of instances, we use random sampling.

In [None]:
df_sample = df.sample(frac=0.70)
df_sample.shape

(7801, 13)

In [None]:
df_sample.to_csv("BaseReduzida1.csv", index=False)
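The sample above is drawn without a fixed seed and without regard to the class, so with only 24 MVP rows the number of positives kept can vary from run to run. One possible alternative, sketched here on made-up data, is to sample within each class and fix `random_state`. The toy frame and its sizes are assumptions chosen for illustration.

```python
import pandas as pd

# Toy, heavily imbalanced dataset: 20 positives among 1000 rows (made-up data)
toy = pd.DataFrame({
    "pts": range(1000),
    "was_mvp": [1] * 20 + [0] * 980,
})

# Sample 70% within each class so the share of positives is preserved;
# random_state makes the draw reproducible
sample = toy.groupby("was_mvp", group_keys=False).sample(frac=0.70, random_state=0)

print(sample.shape)
print(sample["was_mvp"].value_counts())
```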

# **6. ATTRIBUTE SELECTION**

In [None]:
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestClassifier

In [None]:
df_sample.head()

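| | player_id | gp | pts | reb | ast | net_rating | oreb_pct | dreb_pct | usg_pct | ts_pct | ast_pct | season | was_mvp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3941 | 852 | 74 | 5.6 | 1.2 | 3.4 | -8.3 | 0.013 | 0.068 | 0.18 | 0.47 | 0.299 | 8 | 0 |
| 917 | 188 | 31 | 6.5 | 5.4 | 1.8 | -0.5 | 0.073 | 0.155 | 0.136 | 0.535 | 0.111 | 2 | 0 |
| 4322 | 925 | 42 | 9.3 | 3.2 | 5.0 | -0.5 | 0.037 | 0.096 | 0.222 | 0.458 | 0.336 | 9 | 0 |
| 6549 | 2009 | 21 | 5.0 | 2.4 | 1.8 | -13.6 | 0.015 | 0.198 | 0.242 | 0.427 | 0.244 | 14 | 0 |
| 6208 | 1431 | 43 | 2.6 | 1.2 | 1.0 | -9.1 | 0.033 | 0.097 | 0.168 | 0.404 | 0.151 | 13 | 0 |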


In [None]:
X = df_sample.drop('was_mvp', axis=1)
y = df_sample['was_mvp']

In [None]:
oneshot_selector = SelectFromModel(RandomForestClassifier(),threshold="median",
                                   max_features=6)
n_df = pd.DataFrame(oneshot_selector.fit_transform(X, y), 
                    columns=X.columns[oneshot_selector.get_support()])

In [None]:
df_y = pd.DataFrame(data=y, columns=['was_mvp'])
df_y = df_y.reset_index()
df_y = df_y.drop(['index'], axis=1)

In [None]:
df_base = n_df.join(df_y, how='left')

In [None]:
df_base.head()

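| | pts | ast | net_rating | usg_pct | ts_pct | ast_pct | was_mvp |
|---|---|---|---|---|---|---|---|
| 0 | 5.6 | 3.4 | -8.3 | 0.18 | 0.47 | 0.299 | 0 |
| 1 | 6.5 | 1.8 | -0.5 | 0.136 | 0.535 | 0.111 | 0 |
| 2 | 9.3 | 5.0 | -0.5 | 0.222 | 0.458 | 0.336 | 0 |
| 3 | 5.0 | 1.8 | -13.6 | 0.242 | 0.427 | 0.244 | 0 |
| 4 | 2.6 | 1.0 | -9.1 | 0.168 | 0.404 | 0.151 | 0 |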


In [None]:
df_base.to_csv("BaseReduzida2.csv", index=False)

# **7. ATTRIBUTE EXTRACTION**

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

In [None]:
X = df_base.drop('was_mvp', axis=1)
y = df_base['was_mvp']

In [None]:
X = StandardScaler().fit_transform(X)

In [None]:
pca = PCA(n_components=6)
X = pca.fit_transform(X)

In [None]:
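# Note: after PCA each column is a principal component, not an original attribute;
# the attribute names below are reused only as column labels
df_x = pd.DataFrame(data = X, 
                     columns = ['pts', 'reb', 'ast', 'net_rating', 'usg_pct',
                                'ast_pct'])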

In [None]:
df_x = df_x.reset_index()

In [None]:
df_pos_pca = pd.concat([df_x, df_y], axis=1)

In [None]:
df_pos_pca = df_pos_pca.drop(['index'], axis=1)
df_pos_pca.head()

| | pts | reb | ast | net_rating | usg_pct | ast_pct | was_mvp |
|---|---|---|---|---|---|---|---|
| 0 | 0.824912 | 1.418159 | -1.159437 | -0.588578 | 0.459201 | 0.129227 | 0 |
| 1 | -0.504409 | -0.465027 | -0.553922 | -0.435319 | -0.316482 | -0.057336 | 0 |
| 2 | 2.200235 | 1.466886 | -1.264041 | 0.052273 | 0.442082 | -0.167962 | 0 |
| 3 | 0.306816 | 1.800097 | 0.160936 | 0.074007 | 0.997667 | 0.187826 | 0 |
| 4 | -1.12276 | 1.082891 | -0.393137 | 0.144204 | 0.251875 | 0.102992 | 0 |


In [None]:
df_pos_pca.to_csv("BaseReduzida3.csv", index=False)
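The PCA step above keeps six components for six input attributes, which rotates the data without reducing its dimensionality. A quick way to decide how many components are worth keeping is to inspect the explained variance ratio. The sketch below uses random synthetic data (not the NBA statistics) and the 95% threshold is an arbitrary assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the six selected attributes (random data, not NBA stats)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 6))
X_toy[:, 1] = X_toy[:, 0] * 0.9 + rng.normal(scale=0.1, size=200)  # correlated pair

X_std = StandardScaler().fit_transform(X_toy)

# Keeping as many components as attributes only rotates the data
pca_full = PCA(n_components=6).fit(X_std)
ratios = pca_full.explained_variance_ratio_
print("variance explained per component:", np.round(ratios, 3))
print("cumulative:", np.round(np.cumsum(ratios), 3))

# A float in (0, 1) asks for the fewest components reaching that share of variance
pca_95 = PCA(n_components=0.95).fit(X_std)
print("components kept for 95% variance:", pca_95.n_components_)
```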

# **8. k-NN**

In [None]:
from sklearn.model_selection import RepeatedKFold
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, f1_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier

In [None]:
df1 = pd.read_csv('all_seasons_pre_processing.csv')
df2 = pd.read_csv('BaseReduzida1.csv')
df3 = pd.read_csv('BaseReduzida2.csv')
df4 = pd.read_csv('BaseReduzida3.csv')

In [None]:
X1 = df1.iloc[:,:-1].values
y1 = df1.iloc[:,-1].values

X2 = df2.iloc[:,:-1].values
y2 = df2.iloc[:,-1].values

X3 = df3.iloc[:,:-1].values
y3 = df3.iloc[:,-1].values

X4 = df4.iloc[:,:-1].values
y4 = df4.iloc[:,-1].values

In [None]:
rkf = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)

In [None]:
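# Note: each iteration overwrites the previous one, so after the loop only the
# last of the 30 train/test splits is kept (same pattern for the other datasets)
for train, test in rkf.split(X1, y1):
    X_train1 = X1[train]
    X_test1 = X1[test]
    y_train1 = y1[train]
    y_test1 = y1[test]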

In [None]:
for train, test in rkf.split(X2, y2):
    X_train2 = X2[train]
    X_test2 = X2[test]
    y_train2 = y2[train]
    y_test2 = y2[test]

In [None]:
for train, test in rkf.split(X3, y3):
    X_train3 = X3[train]
    X_test3 = X3[test]
    y_train3 = y3[train]
    y_test3 = y3[test]

In [None]:
for train, test in rkf.split(X4, y4):
    X_train4 = X4[train]
    X_test4 = X4[test]
    y_train4 = y4[train]
    y_test4 = y4[test]
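The split loops above keep only the final train/test split, so every metric reported afterwards comes from a single fold with very few MVPs in it. A possible alternative is to score the model on all folds and report the mean and standard deviation. The sketch below does this with `cross_validate` and stratified folds on synthetic data; the data, the variable names and the choice of `RepeatedStratifiedKFold` are assumptions, not the procedure used in this notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier

# Synthetic imbalanced data standing in for the NBA table (not real results)
X_toy, y_toy = make_classification(n_samples=600, n_features=6,
                                   weights=[0.9, 0.1], random_state=1)

# Stratified folds keep some positives in every test split
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
model = KNeighborsClassifier(n_neighbors=2, metric="euclidean", weights="distance")

metric_names = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]
scores = cross_validate(model, X_toy, y_toy, cv=cv, scoring=metric_names)

# One value per fold (10 splits x 3 repeats = 30), summarised by mean and std
for name in metric_names:
    fold_values = scores[f"test_{name}"]
    print(f"{name}: mean={fold_values.mean():.3f} std={fold_values.std():.3f}")
```

The standard deviation across folds is also a more meaningful spread measure than the standard deviation of the predicted labels.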

In [None]:
knn = KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')

In [None]:
knn.fit(X_train1, y_train1)
y_pred1 = knn.predict(X_test1)

knn.fit(X_train2, y_train2)
y_pred2 = knn.predict(X_test2)

knn.fit(X_train3, y_train3)
y_pred3 = knn.predict(X_test3)

knn.fit(X_train4, y_train4)
y_pred4 = knn.predict(X_test4)

In [None]:
acc = str(accuracy_score(y_test1, y_pred1))
print("Acurácia: ", acc)

prec = str(precision_score(y_test1, y_pred1, average = 'macro'))
print("Precisão: ", prec)

rec = str(recall_score(y_test1, y_pred1, average = 'macro'))
print("Recall: ", rec)

f1 = str(f1_score(y_test1, y_pred1, average = 'macro'))
print("F1 Score: ", f1)

cm = confusion_matrix(y_test1, y_pred1)
print("Matriz de confusão")
print(cm)

In [None]:
acc = str(accuracy_score(y_test2, y_pred2))
print("Acurácia: ", acc)

prec = str(precision_score(y_test2, y_pred2, average = 'macro'))
print("Precisão: ", prec)

rec = str(recall_score(y_test2, y_pred2, average = 'macro'))
print("Recall: ", rec)

f1 = str(f1_score(y_test2, y_pred2, average = 'macro'))
print("F1 Score: ", f1)

cm = confusion_matrix(y_test2, y_pred2)
print("Matriz de confusão")
print(cm)

Acurácia:  0.9987179487179487
Precisão:  0.9993581514762516
Recall:  0.75
F1 Score:  0.8330122029543995
Matriz de confusão
[[778   0]
 [  1   1]]
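The macro-averaged values printed above can be reproduced by hand from the confusion matrix, which helps when reading them: macro averaging gives the rare MVP class the same weight as the majority class. The helper `per_class` below is a hypothetical name introduced for this illustration.

```python
# Confusion matrix from the output above: rows = true class, columns = predicted class
cm = [[778, 0],
      [1, 1]]

def per_class(cm, k):
    """Return (precision, recall, f1) for class k of a square confusion matrix."""
    tp = cm[k][k]
    predicted = sum(row[k] for row in cm)   # column sum: rows predicted as k
    actual = sum(cm[k])                     # row sum: rows that truly are k
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

stats = [per_class(cm, k) for k in range(len(cm))]

# Macro average: unweighted mean over the classes
macro_precision = sum(s[0] for s in stats) / len(stats)
macro_recall = sum(s[1] for s in stats) / len(stats)
macro_f1 = sum(s[2] for s in stats) / len(stats)

print(macro_precision, macro_recall, macro_f1)
```

Here the recall of 0.75 is the mean of 1.0 for non-MVPs and 0.5 for MVPs: one of the two MVPs in the test fold was missed.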


In [None]:
acc = str(accuracy_score(y_test3, y_pred3))
print("Acurácia: ", acc)

prec = str(precision_score(y_test3, y_pred3, average = 'macro'))
print("Precisão: ", prec)

rec = str(recall_score(y_test3, y_pred3, average = 'macro'))
print("Recall: ", rec)

f1 = str(f1_score(y_test3, y_pred3, average = 'macro'))
print("F1 Score: ", f1)

cm = confusion_matrix(y_test3, y_pred3)
print("Matriz de confusão")
print(cm)

Acurácia:  0.9961538461538462
Precisão:  0.666023166023166
Recall:  0.7487146529562982
F1 Score:  0.6990353697749196
Matriz de confusão
[[776   2]
 [  1   1]]


In [None]:
acc = str(accuracy_score(y_test4, y_pred4))
print("Acurácia: ", acc)

prec = str(precision_score(y_test4, y_pred4, average = 'macro'))
print("Precisão: ", prec)

rec = str(recall_score(y_test4, y_pred4, average = 'macro'))
print("Recall: ", rec)

f1 = str(f1_score(y_test4, y_pred4, average = 'macro'))
print("F1 Score: ", f1)

cm = confusion_matrix(y_test4, y_pred4)
print("Matriz de confusão")
print(cm)

Acurácia:  0.9974358974358974
Precisão:  0.7493573264781491
Recall:  0.7493573264781491
F1 Score:  0.7493573264781491
Matriz de confusão
[[777   1]
 [  1   1]]


In [None]:
# Scale the data
minmax = MinMaxScaler()

X1 = minmax.fit_transform(X1)
X2 = minmax.fit_transform(X2)
X3 = minmax.fit_transform(X3)
X4 = minmax.fit_transform(X4)
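The cell above fits `MinMaxScaler` on each full dataset before splitting, so the test rows influence the minimum and maximum used for scaling. A common alternative, sketched here on made-up data, is to fit the scaler on the training rows only and reuse it on the test rows. The array names and the 80/20 split are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X_toy = rng.normal(loc=10.0, scale=3.0, size=(100, 3))  # made-up data
X_train_toy, X_test_toy = X_toy[:80], X_toy[80:]

scaler = MinMaxScaler().fit(X_train_toy)       # learn min/max from the training rows only
X_train_scaled = scaler.transform(X_train_toy)
X_test_scaled = scaler.transform(X_test_toy)   # reuse the training min/max

# Same result with the explicit formula (x - min_train) / (max_train - min_train)
train_min = X_train_toy.min(axis=0)
train_max = X_train_toy.max(axis=0)
manual = (X_test_toy - train_min) / (train_max - train_min)
print(np.allclose(manual, X_test_scaled))
```

Test values may fall slightly outside [0, 1] with this approach, which is expected because the test rows were not used to set the range.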

In [None]:
for train, test in rkf.split(X1, y1):
    X_train1 = X1[train]
    X_test1 = X1[test]
    y_train1 = y1[train]
    y_test1 = y1[test]

In [None]:
for train, test in rkf.split(X2, y2):
    X_train2 = X2[train]
    X_test2 = X2[test]
    y_train2 = y2[train]
    y_test2 = y2[test]

In [None]:
for train, test in rkf.split(X3, y3):
    X_train3 = X3[train]
    X_test3 = X3[test]
    y_train3 = y3[train]
    y_test3 = y3[test]

In [None]:
for train, test in rkf.split(X4, y4):
    X_train4 = X4[train]
    X_test4 = X4[test]
    y_train4 = y4[train]
    y_test4 = y4[test]

In [None]:
knn.fit(X_train1, y_train1)
y_pred1 = knn.predict(X_test1)

knn.fit(X_train2, y_train2)
y_pred2 = knn.predict(X_test2)

knn.fit(X_train3, y_train3)
y_pred3 = knn.predict(X_test3)

knn.fit(X_train4, y_train4)
y_pred4 = knn.predict(X_test4)

In [None]:
acc = str(accuracy_score(y_test1, y_pred1))
print("Acurácia: ", acc)

prec = str(precision_score(y_test1, y_pred1, average = 'macro'))
print("Precisão: ", prec)

rec = str(recall_score(y_test1, y_pred1, average = 'macro'))
print("Recall: ", rec)

f1 = str(f1_score(y_test1, y_pred1, average = 'macro'))
print("F1 Score: ", f1)

cm = confusion_matrix(y_test1, y_pred1)
print("Matriz de confusão")
print(cm)

Acurácia:  0.9955116696588869
Precisão:  0.6653165316531653
Recall:  0.6240990990990991
F1 Score:  0.6417315237666431
Matriz de confusão
[[1108    2]
 [   3    1]]


In [None]:
acc = str(accuracy_score(y_test2, y_pred2))
print("Acurácia: ", acc)

prec = str(precision_score(y_test2, y_pred2, average = 'macro'))
print("Precisão: ", prec)

rec = str(recall_score(y_test2, y_pred2, average = 'macro'))
print("Recall: ", rec)

f1 = str(f1_score(y_test2, y_pred2, average = 'macro'))
print("F1 Score: ", f1)

cm = confusion_matrix(y_test2, y_pred2)
print("Matriz de confusão")
print(cm)

Acurácia:  0.9974358974358974
Precisão:  0.7493573264781491
Recall:  0.7493573264781491
F1 Score:  0.7493573264781491
Matriz de confusão
[[777   1]
 [  1   1]]


In [None]:
acc = str(accuracy_score(y_test3, y_pred3))
print("Acurácia: ", acc)

prec = str(precision_score(y_test3, y_pred3, average = 'macro'))
print("Precisão: ", prec)

rec = str(recall_score(y_test3, y_pred3, average = 'macro'))
print("Recall: ", rec)

f1 = str(f1_score(y_test3, y_pred3, average = 'macro'))
print("F1 Score: ", f1)

cm = confusion_matrix(y_test3, y_pred3)
print("Matriz de confusão")
print(cm)

Acurácia:  0.9974358974358974
Precisão:  0.7493573264781491
Recall:  0.7493573264781491
F1 Score:  0.7493573264781491
Matriz de confusão
[[777   1]
 [  1   1]]


In [None]:
acc = str(accuracy_score(y_test4, y_pred4))
print("Acurácia: ", acc)

prec = str(precision_score(y_test4, y_pred4, average = 'macro'))
print("Precisão: ", prec)

rec = str(recall_score(y_test4, y_pred4, average = 'macro'))
print("Recall: ", rec)

f1 = str(f1_score(y_test4, y_pred4, average = 'macro'))
print("F1 Score: ", f1)

cm = confusion_matrix(y_test4, y_pred4)
print("Matriz de confusão")
print(cm)

Acurácia:  0.9961538461538462
Precisão:  0.4987163029525032
Recall:  0.4993573264781491
F1 Score:  0.49903660886319845
Matriz de confusão
[[777   1]
 [  2   0]]


# **9. DECISION TREES**

In [None]:
from sklearn.tree import DecisionTreeClassifier

In [None]:
df1 = pd.read_csv('all_seasons_pre_processing.csv')
df2 = pd.read_csv('BaseReduzida1.csv')
df3 = pd.read_csv('BaseReduzida2.csv')
df4 = pd.read_csv('BaseReduzida3.csv')

In [None]:
X1 = df1.iloc[:,:-1].values
y1 = df1.iloc[:,-1].values

X2 = df2.iloc[:,:-1].values
y2 = df2.iloc[:,-1].values

X3 = df3.iloc[:,:-1].values
y3 = df3.iloc[:,-1].values

X4 = df4.iloc[:,:-1].values
y4 = df4.iloc[:,-1].values

In [None]:
for train, test in rkf.split(X1, y1):
    X_train1 = X1[train]
    X_test1 = X1[test]
    y_train1 = y1[train]
    y_test1 = y1[test]

In [None]:
for train, test in rkf.split(X2, y2):
    X_train2 = X2[train]
    X_test2 = X2[test]
    y_train2 = y2[train]
    y_test2 = y2[test]

In [None]:
for train, test in rkf.split(X3, y3):
    X_train3 = X3[train]
    X_test3 = X3[test]
    y_train3 = y3[train]
    y_test3 = y3[test]

In [None]:
for train, test in rkf.split(X4, y4):
    X_train4 = X4[train]
    X_test4 = X4[test]
    y_train4 = y4[train]
    y_test4 = y4[test]

In [None]:
ad = DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)

In [None]:
# Run this cell only once: it resets the lists filled in by the cells below
n_nodes = []
err = []

In [None]:
ad.fit(X_train1, y_train1)
y_pred1 = ad.predict(X_test1)
n_nodes.append(ad.tree_.node_count)

ad.fit(X_train2, y_train2)
y_pred2 = ad.predict(X_test2)
n_nodes.append(ad.tree_.node_count)

ad.fit(X_train3, y_train3)
y_pred3 = ad.predict(X_test3)
n_nodes.append(ad.tree_.node_count)

ad.fit(X_train4, y_train4)
y_pred4 = ad.predict(X_test4)
n_nodes.append(ad.tree_.node_count)

In [None]:
prec = str(precision_score(y_test1, y_pred1, average= 'macro'))
dp = np.std(y_pred1, ddof=1)  # sample standard deviation of the predicted labels
print("Precisão: ", prec)
print("Desvio-padrão: ", dp)

In [None]:
print(y_pred1)

In [None]:
print(n_nodes)

In [None]:
acc = str(accuracy_score(y_test1, y_pred1))
print("Acurácia:", acc)

cm = confusion_matrix(y_test1, y_pred1)
print("Matriz de confusão: ")
print(cm)

err.append((cm[0][1]+cm[1][0])/(cm[0][0]+cm[0][1]+cm[1][0]+cm[1][1]))

print(err)

Acurácia: 0.9955116696588869
Matriz de confusão: 
[[1108    2]
 [   3    1]]
[0.004488330341113106]
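The error rate appended to `err` is the share of off-diagonal entries in the confusion matrix, which is the complement of the accuracy. As a small worked check using the matrix printed above:

```python
# Confusion matrix from the output above: rows = true class, columns = predicted class
cm = [[1108, 2],
      [3, 1]]

total = sum(sum(row) for row in cm)
correct = sum(cm[k][k] for k in range(len(cm)))   # diagonal = correct predictions

error_rate = (total - correct) / total   # (2 + 3) / 1114
accuracy = correct / total

print(error_rate, accuracy)
```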


In [None]:
acc = str(accuracy_score(y_test2, y_pred2))
print("Acurácia:", acc)

cm = confusion_matrix(y_test2, y_pred2)
print("Matriz de confusão: ")
print(cm)

err.append((cm[0][1]+cm[1][0])/(cm[0][0]+cm[0][1]+cm[1][0]+cm[1][1]))

print(err)

Acurácia: 0.9974358974358974
Matriz de confusão: 
[[778   0]
 [  2   0]]
[0.004488330341113106, 0.002564102564102564]


In [None]:
acc = str(accuracy_score(y_test3, y_pred3))
print("Acurácia:", acc)

cm = confusion_matrix(y_test3, y_pred3)
print("Matriz de confusão: ")
print(cm)

err.append((cm[0][1]+cm[1][0])/(cm[0][0]+cm[0][1]+cm[1][0]+cm[1][1]))

print(err)

Acurácia: 0.9974358974358974
Matriz de confusão: 
[[778   0]
 [  2   0]]
[0.004488330341113106, 0.002564102564102564, 0.002564102564102564]


In [None]:
acc = str(accuracy_score(y_test4, y_pred4))
print("Acurácia:", acc)

cm = confusion_matrix(y_test4, y_pred4)
print("Matriz de confusão: ")
print(cm)

err.append((cm[0][1]+cm[1][0])/(cm[0][0]+cm[0][1]+cm[1][0]+cm[1][1]))

print(err)

Acurácia: 0.9974358974358974
Matriz de confusão: 
[[778   0]
 [  2   0]]
[0.004488330341113106, 0.002564102564102564, 0.002564102564102564, 0.002564102564102564]


# **10. NAIVE BAYES**

In [None]:
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import RepeatedKFold
import numpy as np 
import pylab 
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt
from google.colab import drive


In [None]:
df1 = pd.read_csv('all_seasons_pre_processing.csv')
df2 = pd.read_csv('BaseReduzida1.csv')
df3 = pd.read_csv('BaseReduzida2.csv')
df4 = pd.read_csv('BaseReduzida3.csv')

In [None]:
rkf = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)

In [None]:
X1 = df1.iloc[:,:-1].values
y1 = df1.iloc[:,-1].values

X2 = df2.iloc[:,:-1].values
y2 = df2.iloc[:,-1].values

X3 = df3.iloc[:,:-1].values
y3 = df3.iloc[:,-1].values

X4 = df4.iloc[:,:-1].values
y4 = df4.iloc[:,-1].values

In [None]:
for train, test in rkf.split(X1, y1):
    X_train1 = X1[train]
    X_test1 = X1[test]
    y_train1 = y1[train]
    y_test1 = y1[test]

In [None]:
for train, test in rkf.split(X2, y2):
    X_train2 = X2[train]
    X_test2 = X2[test]
    y_train2 = y2[train]
    y_test2 = y2[test]

In [None]:
for train, test in rkf.split(X3, y3):
    X_train3 = X3[train]
    X_test3 = X3[test]
    y_train3 = y3[train]
    y_test3 = y3[test]

In [None]:
for train, test in rkf.split(X4, y4):
    X_train4 = X4[train]
    X_test4 = X4[test]
    y_train4 = y4[train]
    y_test4 = y4[test]

In [None]:
nb = GaussianNB()

In [None]:
nb.fit(X_train1, y_train1)
y_pred1 = nb.predict(X_test1)

nb.fit(X_train2, y_train2)
y_pred2 = nb.predict(X_test2)

nb.fit(X_train3, y_train3)
y_pred3 = nb.predict(X_test3)

nb.fit(X_train4, y_train4)
y_pred4 = nb.predict(X_test4)

In [None]:
prec = str(precision_score(y_test1, y_pred1, average= 'macro'))
dp = np.std(y_pred1, ddof=1)  # sample standard deviation of the predicted labels
print("Precisão: ", prec)
print("Desvio-padrão: ", dp)

Precisão:  0.5588235294117647


In [None]:
acc = str(accuracy_score(y_test1, y_pred1))
print("Acurácia:", acc)

cm = confusion_matrix(y_test1, y_pred1)
print("Matriz de confusão: ")
print(cm)

Acurácia: 0.9730700179533214
Matriz de confusão: 
[[1080   30]
 [   0    4]]


In [None]:
acc = str(accuracy_score(y_test2, y_pred2))
print("Acurácia:", acc)

cm = confusion_matrix(y_test2, y_pred2)
print("Matriz de confusão: ")
print(cm)

Acurácia: 0.9782051282051282
Matriz de confusão: 
[[761  17]
 [  0   2]]


In [None]:
acc = str(accuracy_score(y_test3, y_pred3))
print("Acurácia:", acc)

cm = confusion_matrix(y_test3, y_pred3)
print("Matriz de confusão: ")
print(cm)

Acurácia: 0.9807692307692307
Matriz de confusão: 
[[763  15]
 [  0   2]]


In [None]:
acc = str(accuracy_score(y_test4, y_pred4))
print("Acurácia:", acc)

cm = confusion_matrix(y_test4, y_pred4)
print("Matriz de confusão: ")
print(cm)

Acurácia: 0.9935897435897436
Matriz de confusão: 
[[774   4]
 [  1   1]]


# **11. NEURAL NETWORKS**

In [None]:
from sklearn.neural_network import MLPClassifier

In [None]:
df1 = pd.read_csv('all_seasons_pre_processing.csv')
df2 = pd.read_csv('BaseReduzida1.csv')
df3 = pd.read_csv('BaseReduzida2.csv')
df4 = pd.read_csv('BaseReduzida3.csv')

In [None]:
X1 = df1.iloc[:,:-1].values
y1 = df1.iloc[:,-1].values

X2 = df2.iloc[:,:-1].values
y2 = df2.iloc[:,-1].values

X3 = df3.iloc[:,:-1].values
y3 = df3.iloc[:,-1].values

X4 = df4.iloc[:,:-1].values
y4 = df4.iloc[:,-1].values

In [None]:
for train, test in rkf.split(X1, y1):
    X_train1 = X1[train]
    X_test1 = X1[test]
    y_train1 = y1[train]
    y_test1 = y1[test]

In [None]:
for train, test in rkf.split(X2, y2):
    X_train2 = X2[train]
    X_test2 = X2[test]
    y_train2 = y2[train]
    y_test2 = y2[test]

In [None]:
for train, test in rkf.split(X3, y3):
    X_train3 = X3[train]
    X_test3 = X3[test]
    y_train3 = y3[train]
    y_test3 = y3[test]

In [None]:
for train, test in rkf.split(X4, y4):
    X_train4 = X4[train]
    X_test4 = X4[test]
    y_train4 = y4[train]
    y_test4 = y4[test]

In [None]:
# hidden_layer_sizes is written as a one-element tuple: a single hidden layer with 300 units
mlp = MLPClassifier(solver='sgd', momentum=0.8, hidden_layer_sizes=(300,), learning_rate='constant', learning_rate_init=0.01, max_iter=500, random_state=1)

In [None]:
mlp.fit(X_train1, y_train1)
y_pred1 = mlp.predict(X_test1)

mlp.fit(X_train2, y_train2)
y_pred2 = mlp.predict(X_test2)

mlp.fit(X_train3, y_train3)
y_pred3 = mlp.predict(X_test3)

mlp.fit(X_train4, y_train4)
y_pred4 = mlp.predict(X_test4)



In [None]:
prec = str(precision_score(y_test1, y_pred1, average= 'macro'))
dp = np.std(y_pred1, ddof=1)  # sample standard deviation of the predicted labels
print("Precisão: ", prec)
print("Desvio-padrão: ", dp)

Precisão:  0.4982046678635548




In [None]:
acc = str(accuracy_score(y_test1, y_pred1))
print("Acurácia:", acc)

cm = confusion_matrix(y_test1, y_pred1)
print("Matriz de confusão: ")
print(cm)

Acurácia: 0.9964093357271095
Matriz de confusão: 
[[1110    0]
 [   4    0]]
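In the matrix above the network predicts "not MVP" for every player and still reaches 99.6% accuracy, because only 4 of the 1114 test rows are MVPs. A metric that averages the per-class recall makes this visible. The short calculation below is a worked illustration on that matrix, not an extra experiment.

```python
# Confusion matrix from the output above: every player is predicted as non-MVP
cm = [[1110, 0],
      [4, 0]]

tn, fp = cm[0]
fn, tp = cm[1]

accuracy = (tn + tp) / (tn + fp + fn + tp)
recall_negative = tn / (tn + fp)   # share of non-MVPs recovered
recall_positive = tp / (fn + tp)   # share of MVPs recovered
balanced_accuracy = (recall_negative + recall_positive) / 2

print(accuracy, balanced_accuracy)
```

A balanced accuracy of 0.5 is what a constant classifier obtains on a two-class problem, regardless of how imbalanced the classes are.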


In [None]:
acc = str(accuracy_score(y_test2, y_pred2))
print("Acurácia:", acc)

cm = confusion_matrix(y_test2, y_pred2)
print("Matriz de confusão: ")
print(cm)

Acurácia: 0.9974358974358974
Matriz de confusão: 
[[778   0]
 [  2   0]]


In [None]:
acc = str(accuracy_score(y_test3, y_pred3))
print("Acurácia:", acc)

cm = confusion_matrix(y_test3, y_pred3)
print("Matriz de confusão: ")
print(cm)

Acurácia: 0.9935897435897436
Matriz de confusão: 
[[773   5]
 [  0   2]]


In [None]:
acc = str(accuracy_score(y_test4, y_pred4))
print("Acurácia:", acc)

cm = confusion_matrix(y_test4, y_pred4)
print("Matriz de confusão: ")
print(cm)

Acurácia: 0.9974358974358974
Matriz de confusão: 
[[778   0]
 [  2   0]]


# **12. k-MEANS**

In [None]:
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score
from sklearn.metrics import silhouette_score

In [None]:
db = []
si = []
dp_db = []
dp_si = []
mean_db = []
mean_si = []

In [None]:
# Load the features once; the target is not used for clustering
features = pd.read_csv('all_seasons_pre_processing.csv').drop('was_mvp', axis=1)

for x in range(2, 21):
  km = KMeans(n_clusters=x, init='k-means++', max_iter = 300, n_init=5, random_state = 0)
  labels = km.fit_predict(features)  # a single fit is enough
  centroids = km.cluster_centers_

  # As in the original run, the cluster label is scored as an extra column;
  # pass `features` instead to score on the statistics alone
  df = features.assign(Cluster=labels)

  db_k = davies_bouldin_score(df, labels)
  si_k = silhouette_score(df, labels)
  db.append(db_k)
  si.append(si_k)
  # np.std of a single score is always 0.0 and np.mean returns the score itself
  dp_db.append(np.std(db_k))
  dp_si.append(np.std(si_k))
  mean_db.append(np.mean(db_k))
  mean_si.append(np.mean(si_k))


In [None]:
k_means = [db, si, dp_db, dp_si, mean_db, mean_si]

In [None]:
print(k_means)

[[0.500677060354062, 0.5096348806776979, 0.500677060354062, 0.5096348806776979, 0.5134773410705106, 0.531496987569788, 0.5355760821337144, 0.5424387607374069, 0.5512869236507786, 0.5723842549587991, 0.5856093788500542, 0.5945567488982121, 0.6122187006877642, 0.6312778473934646, 0.6424252166360283, 0.6546885874043039, 0.6720843189528113, 0.6856431271641682, 0.7049493772780531, 0.7172006283154517, 0.7395140283991966], [0.6266815471364442, 0.5851547696511393, 0.6266815471364442, 0.5851547696511393, 0.5645419762910631, 0.539246754909167, 0.5312074318697858, 0.5215396372059196, 0.5132007203710692, 0.4957268647494867, 0.486204994872483, 0.480793153691667, 0.470209085116004, 0.4575059510435769, 0.4506165441105029, 0.44588125218548347, 0.43683268228800615, 0.42848594012608804, 0.41955074909052564, 0.4158827246269716, 0.40199112268243203], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

# **13. AGGLOMERATIVE HIERARCHICAL CLUSTERING**

In [None]:
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import davies_bouldin_score
from sklearn.metrics import silhouette_score

In [None]:
db = []
si = []
dp_db = []
dp_si = []
mean_db = []
mean_si = []

In [None]:
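# Load the features once; the target is not used for clustering
features = pd.read_csv('all_seasons_pre_processing.csv').drop('was_mvp', axis=1)

for x in range(2, 21):
  # Euclidean distance is the default; the `affinity` keyword was renamed
  # `metric` in newer scikit-learn releases, so it is omitted here
  ahc = AgglomerativeClustering(n_clusters=x, linkage='complete')
  labels = ahc.fit_predict(features)  # a single fit is enough

  # As in the original run, the cluster label is scored as an extra column;
  # pass `features` instead to score on the statistics alone
  df = features.assign(Cluster=labels)

  db_k = davies_bouldin_score(df, labels)
  si_k = silhouette_score(df, labels)
  db.append(db_k)
  si.append(si_k)
  # np.std of a single score is always 0.0 and np.mean returns the score itself
  dp_db.append(np.std(db_k))
  dp_si.append(np.std(si_k))
  mean_db.append(np.mean(db_k))
  mean_si.append(np.mean(si_k))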

In [None]:
aglomerative = [db, si, dp_db, dp_si, mean_db, mean_si]

In [None]:
print(aglomerative)

[[0.5006768688432589, 0.5094854202913034, 0.5189099940648586, 0.5403680013834244, 0.5475044939039189, 0.5513715648960146, 0.5641373372973887, 0.5731139092541927, 0.540123030987208, 0.5142688604098352, 0.5378752425525694, 0.5578092957187313, 0.5731142891599573, 0.5828337057063842, 0.5658145864523132, 0.6242797052932993, 0.6413973967447048, 0.6867515914100577, 0.7002614875787653], [0.626549872465282, 0.5298998434509639, 0.5425899775142866, 0.49049173235392085, 0.46458573613701065, 0.4824682696326485, 0.4684299173850125, 0.47681462872011826, 0.47564181042234066, 0.47555409727398923, 0.446321849031763, 0.4410086111783164, 0.4332247483492806, 0.4357888669323669, 0.4354108725949094, 0.42946314267793506, 0.4106932900189419, 0.38845583810488904, 0.3825504768223273], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.5006768688432589, 0.50948542029130

# **14. EXPECTATION MAXIMIZATION (EM)**

In [None]:
from sklearn.mixture import GaussianMixture #EM
from sklearn.metrics import davies_bouldin_score
from sklearn.metrics import silhouette_score

In [None]:
db = []
si = []
dp_db = []
dp_si = []
mean_db = []
mean_si = []

In [None]:
# Load the features once; the target is not used for clustering
features = pd.read_csv('all_seasons_pre_processing.csv').drop('was_mvp', axis=1)

for x in range(2, 21):
  gmm = GaussianMixture(n_components=x, n_init=5, covariance_type='full')
  gmm.fit(features)
  labels = gmm.predict(features)

  # As in the original run, the cluster label is scored as an extra column;
  # pass `features` instead to score on the statistics alone
  df = features.assign(Cluster=labels)

  db_k = davies_bouldin_score(df, labels)
  si_k = silhouette_score(df, labels)
  db.append(db_k)
  si.append(si_k)
  # np.std of a single score is always 0.0 and np.mean returns the score itself
  dp_db.append(np.std(db_k))
  dp_si.append(np.std(si_k))
  mean_db.append(np.mean(db_k))
  mean_si.append(np.mean(si_k))

In [None]:
em = [db, si, dp_db, dp_si, mean_db, mean_si]

In [None]:
print(em)

[[24.23388433762704, 40.199151538280404, 45.095836386634886, 41.65304820880058, 53.42866730573683, 98.15534966194998, 82.88368838571918, 49.88548601834146, 67.88162161227312, 81.18275560929501, 52.97465908207689, 73.0504973626924, 60.3552921704178, 64.88255603606984, 74.45640511558452, 60.68929849192163, 49.83191181072512, 61.193867835881306, 29.617821348126746], [-0.0012670346738484063, -0.008426890057027361, -0.011354648903902061, -0.017858896615094783, -0.02556860871209009, -0.04153506934535949, -0.028277404210741104, -0.04509985013112345, -0.04470309002002829, -0.029664279796381258, -0.06274012240091459, -0.03703241641285682, -0.06763226011257079, -0.06388077966831082, -0.25993774369156936, -0.3338880985620983, -0.34410548721770334, -0.429223412941188, -0.4804974334440304], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [24.2338843376270



---



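- For all three algorithms (k-means, agglomerative hierarchical and EM), the best k was 2. This is what we expected, since our target is binary: a player either is the MVP or is not.

- In each of the three cases, k = 2 gave the lowest Davies-Bouldin (DB) index and the highest silhouette score.

- The three algorithms therefore agree on the same number of clusters.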
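The selection rule behind these conclusions (lowest Davies-Bouldin index, highest silhouette score) can be written out explicitly. The sketch below applies it to three synthetic, well-separated blobs, scoring the features only; the data, the range of k and the variable names are assumptions for illustration and do not reproduce the NBA results.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Three well-separated synthetic blobs (made-up data, not the NBA table)
X_toy, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 10], [-10, 10]],
                      cluster_std=0.5, random_state=0)

results = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=5, random_state=0).fit_predict(X_toy)
    # Score on the features only; the labels are not added as a column
    results[k] = (davies_bouldin_score(X_toy, labels), silhouette_score(X_toy, labels))

best_by_db = min(results, key=lambda k: results[k][0])          # lower is better
best_by_silhouette = max(results, key=lambda k: results[k][1])  # higher is better
print(best_by_db, best_by_silhouette)
```

On data with three clear groups both criteria pick k = 3, which shows that the rule follows the geometry of the features rather than the number of target classes.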

# **15. BOOSTING**

In [None]:
from sklearn.ensemble import BaggingClassifier, StackingClassifier, AdaBoostClassifier
from statistics import stdev

In [None]:
df = pd.read_csv("all_seasons_pre_processing.csv")

In [None]:
X = df.iloc[:,:-1].values
y = df.iloc[:,-1].values

In [None]:
rkf = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)

In [None]:
for train, test in rkf.split(X, y):
    X_train = X[train]
    X_test = X[test]
    y_train = y[train]
    y_test = y[test]



---



In [None]:
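# The base learner is passed positionally: the keyword is `base_estimator` in older
# scikit-learn releases and `estimator` in newer ones
clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10), n_estimators=15, random_state=0)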

In [None]:
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

In [None]:
# Evaluate the boosted model on its own split (y_test, y_pred)
acc = str(accuracy_score(y_test, y_pred))
print("Acurácia:", acc)
cm = confusion_matrix(y_test, y_pred)
print("Matriz de confusão: ")
print(cm)
prec = str(precision_score(y_test, y_pred, average= 'macro'))
dp = stdev(y_pred)
print("Precisão: ", prec)
print("Desvio-padrão: ", dp)

Precisão:  0.4982030548068284
Desvio-padrão:  0.0




---



In [None]:
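# The base learner is passed positionally: the keyword is `base_estimator` in older
# scikit-learn releases and `estimator` in newer ones
clf = AdaBoostClassifier(GaussianNB(), n_estimators=20, random_state=0)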

In [None]:
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

In [None]:
# Score the current split (y_test, y_pred); y_test2 / y_pred2 are not defined in this section.
acc = str(accuracy_score(y_test, y_pred))
print("Accuracy:", acc)
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix: ")
print(cm)
prec = str(precision_score(y_test, y_pred, average= 'macro'))
dp = stdev(y_pred)
print("Precision: ", prec)
print("Standard deviation: ", dp)

Precision:  0.4981965734896303
Standard deviation:  0.0




---



In [None]:
class customMLP(MLPClassifier):
  # Lets AdaBoost pass sample_weight to an MLP by emulating the weights
  # through resampling the training set with replacement.
  def resample_with_replacement(self, X_train, y_train, sample_weight):
    sample_weight = sample_weight / sample_weight.sum(dtype=np.float64)

    # Draw len(X_train) row indices at once, each with probability equal to its weight.
    draws = np.random.choice(len(X_train), size=len(X_train), p=sample_weight)

    X_train_resampled = np.asarray(X_train, dtype=np.float32)[draws]
    # np.int was removed in NumPy 1.24; the built-in int is equivalent here.
    y_train_resampled = np.asarray(y_train, dtype=int)[draws]

    return X_train_resampled, y_train_resampled


  def fit(self, X, y, sample_weight=None):
    if sample_weight is not None:
      X, y = self.resample_with_replacement(X, y, sample_weight)

    # Use the public fit instead of the private _fit helper.
    return super().fit(X, y)
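
The idea behind the wrapper above can be sketched with the standard library alone: rows are drawn with replacement in proportion to their weights, so heavily weighted rows appear more often in the resampled training set. The helper name, data, and weights below are hypothetical.

```python
import random
from collections import Counter


def resample_with_weights(rows, labels, sample_weight, seed=0):
    """Draw len(rows) rows with replacement, proportionally to sample_weight."""
    rng = random.Random(seed)
    drawn = rng.choices(range(len(rows)), weights=sample_weight, k=len(rows))
    return [rows[i] for i in drawn], [labels[i] for i in drawn], drawn


# 100 toy rows; the last one carries half of the total weight.
rows = [[float(i)] for i in range(100)]
labels = [0] * 99 + [1]
sample_weight = [1.0] * 99 + [99.0]

X_resampled, y_resampled, drawn = resample_with_weights(rows, labels, sample_weight)
counts = Counter(drawn)
print("copies of the heavy row:", counts[99])
print("resampled size:", len(X_resampled))
```

The heavy row shows up many times, which is how an estimator without native `sample_weight` support still ends up concentrating on the samples that boosting has up-weighted.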

In [None]:
clf = AdaBoostClassifier(base_estimator=customMLP(solver='sgd', momentum=0.8, hidden_layer_sizes=(300,), learning_rate='constant', learning_rate_init=0.01, max_iter=500, random_state=1), n_estimators=20, random_state=0)

In [None]:
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

In [None]:
# Score the current split (y_test, y_pred); y_test2 / y_pred2 are not defined in this section.
acc = str(accuracy_score(y_test, y_pred))
print("Accuracy:", acc)
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix: ")
print(cm)
prec = str(precision_score(y_test, y_pred, average= 'macro'))
dp = stdev(y_pred)
print("Precision: ", prec)
print("Standard deviation: ", dp)

Precision:  0.4982046678635548
Standard deviation:  0.0






---



In [None]:
class customKNN(KNeighborsClassifier):
  # KNeighborsClassifier.fit takes no sample_weight, so the AdaBoost weights
  # are emulated by resampling the training set with replacement.
  # No constructor override is needed: the inherited one already accepts
  # n_neighbors, metric and weights.
  def resample_with_replacement(self, X_train, y_train, sample_weight):
    sample_weight = sample_weight / sample_weight.sum(dtype=np.float64)

    # Draw len(X_train) row indices at once, each with probability equal to its weight.
    draws = np.random.choice(len(X_train), size=len(X_train), p=sample_weight)

    X_train_resampled = np.asarray(X_train, dtype=np.float32)[draws]
    # np.int was removed in NumPy 1.24; the built-in int is equivalent here.
    y_train_resampled = np.asarray(y_train, dtype=int)[draws]

    return X_train_resampled, y_train_resampled

  def fit(self, X, y, sample_weight=None):
    if sample_weight is not None:
      X, y = self.resample_with_replacement(X, y, sample_weight)

    return super().fit(X, y)

In [None]:
clf = AdaBoostClassifier(base_estimator=customKNN(n_neighbors=2, metric='euclidean', weights='distance'), n_estimators=20, random_state=0)

In [None]:
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

In [None]:
# Score the current split (y_test, y_pred); y_test2 / y_pred2 are not defined in this section.
acc = str(accuracy_score(y_test, y_pred))
print("Accuracy:", acc)
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix: ")
print(cm)
prec = str(precision_score(y_test, y_pred, average= 'macro'))
dp = stdev(y_pred)
print("Precision: ", prec)
print("Standard deviation: ", dp)

Precision:  0.4982014388489209
Standard deviation:  0.0


# **16. BAGGING**

In [None]:
clf = BaggingClassifier(base_estimator=DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10), n_estimators=10, random_state=0)
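
For intuition, bagging can be sketched as: draw bootstrap samples, fit one base learner per sample, and combine them by majority vote. The toy version below swaps the decision tree for a one-threshold "stump" on made-up 1-D data (class 1 is assumed to have the larger values). It is an illustration, not what `BaggingClassifier` does internally in every detail.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Toy base learner: threshold halfway between the two class means."""
    mean0 = mean(x for x, y in zip(xs, ys) if y == 0)
    mean1 = mean(x for x, y in zip(xs, ys) if y == 1)
    return (mean0 + mean1) / 2


def bagging_fit(xs, ys, n_estimators, seed=0):
    """Fit one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    while len(thresholds) < n_estimators:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_y = [ys[i] for i in idx]
        if len(set(sample_y)) < 2:
            continue  # redraw when a class is missing from the bootstrap sample
        thresholds.append(fit_stump([xs[i] for i in idx], sample_y))
    return thresholds


def bagging_predict(thresholds, x):
    """Majority vote over the stumps; each stump votes 1 when x is above its threshold."""
    votes = sum(1 for t in thresholds if x > t)
    return 1 if votes * 2 > len(thresholds) else 0


# Made-up, well-separated data: class 0 in 0..9, class 1 in 20..29.
xs = list(range(10)) + list(range(20, 30))
ys = [0] * 10 + [1] * 10

thresholds = bagging_fit(xs, ys, n_estimators=10)
predictions = [bagging_predict(thresholds, x) for x in (5, 25)]
print(predictions)
```

With a very rare positive class, as in the MVP target, many bootstrap samples contain few or no positives, which is one plausible reason the bagged models below rarely predict the positive class.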

In [None]:
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

In [None]:
# Score the current split (y_test, y_pred); y_test2 / y_pred2 are not defined in this section.
acc = str(accuracy_score(y_test, y_pred))
print("Accuracy:", acc)
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix: ")
print(cm)
prec = str(precision_score(y_test, y_pred, average= 'macro'))
dp = stdev(y_pred)
print("Precision: ", prec)
print("Standard deviation: ", dp)

Precision:  0.4982030548068284
Standard deviation:  0.0




---



In [None]:
clf = BaggingClassifier(base_estimator=GaussianNB(), n_estimators=20, random_state=0)

In [None]:
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

In [None]:
# Score the current split (y_test, y_pred); y_test2 / y_pred2 are not defined in this section.
acc = str(accuracy_score(y_test, y_pred))
print("Accuracy:", acc)
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix: ")
print(cm)
prec = str(precision_score(y_test, y_pred, average= 'macro'))
dp = stdev(y_pred)
print("Precision: ", prec)
print("Standard deviation: ", dp)

Precision:  0.5647590961622763
Standard deviation:  0.0




---



In [None]:
clf = BaggingClassifier(base_estimator=MLPClassifier(solver='sgd', momentum=0.8, hidden_layer_sizes=(300,), learning_rate='constant', learning_rate_init=0.01, max_iter=500, random_state=1), n_estimators=20, random_state=0)

In [None]:
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

In [None]:
# Score the current split (y_test, y_pred); y_test2 / y_pred2 are not defined in this section.
acc = str(accuracy_score(y_test, y_pred))
print("Accuracy:", acc)
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix: ")
print(cm)
prec = str(precision_score(y_test, y_pred, average= 'macro'))
dp = stdev(y_pred)
print("Precision: ", prec)
print("Standard deviation: ", dp)



---



In [None]:
clf = BaggingClassifier(base_estimator=KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance'), n_estimators=20, random_state=0)

In [None]:
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

In [None]:
# Score the current split (y_test, y_pred); y_test2 / y_pred2 are not defined in this section.
acc = str(accuracy_score(y_test, y_pred))
print("Accuracy:", acc)
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix: ")
print(cm)
prec = str(precision_score(y_test, y_pred, average= 'macro'))
dp = stdev(y_pred)
print("Precision: ", prec)
print("Standard deviation: ", dp)

Precision:  0.4982030548068284
Standard deviation:  0.0


# **17. STACKING**

**HOMOGENEOUS STACKING**

In [None]:
estimators = [('ad1', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad2', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad3', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad4', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad5', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad6', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad7', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad8', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad9', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad10', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10))]

In [None]:
estimators = [('ad1', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad2', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad3', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad4', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad5', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad6', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad7', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad8', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad9', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad10', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad11', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad12', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad13', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad14', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad15', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10))]

In [None]:
estimators = [('ad1', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad2', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad3', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad4', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad5', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad6', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad7', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad8', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad9', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad10', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad11', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad12', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad13', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad14', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad15', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad16', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad17', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad18', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad19', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad20', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10))]
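
The three lists above differ only in length (10, 15, and 20 entries), so they could also be generated instead of written out. The sketch below uses a hypothetical helper, `make_estimators`, and a placeholder class so that it runs without scikit-learn. In the notebook the factory would be something like `lambda: DecisionTreeClassifier(max_depth=10, min_samples_split=10, min_samples_leaf=10)`.

```python
def make_estimators(prefix, factory, n):
    """Return [(prefix1, factory()), ..., (prefixN, factory())], one fresh object per entry."""
    return [(f"{prefix}{i}", factory()) for i in range(1, n + 1)]


class DummyEstimator:
    """Placeholder standing in for a scikit-learn estimator in this sketch."""


estimators = make_estimators("ad", DummyEstimator, 20)
names = [name for name, _ in estimators]
print(names[:3], "...", names[-1], len(estimators))
```

Lists for the heterogeneous case can be concatenated the same way, for example `make_estimators('ad', tree_factory, 5) + make_estimators('knn', knn_factory, 5)`, where both factories are assumed names. Note that identical deterministic learners such as `GaussianNB` fitted on the same data should produce the same predictions, so adding more copies is unlikely to change the stacked result.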

In [None]:
sclf = StackingClassifier(estimators=estimators, final_estimator=DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10))

In [None]:
sclf.fit(X_train, y_train)
y_pred = sclf.predict(X_test)

In [None]:
# Score the current split (y_test, y_pred); y_test2 / y_pred2 are not defined in this section.
acc = str(accuracy_score(y_test, y_pred))
print("Accuracy:", acc)
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix: ")
print(cm)
prec = str(precision_score(y_test, y_pred, average= 'macro'))
dp = stdev(y_pred)
print("Precision: ", prec)
print("Standard deviation: ", dp)

Precision:  0.4982046678635548
Standard deviation:  0.0






---



In [None]:
estimators = [('nb1', GaussianNB()), 
              ('nb2', GaussianNB()),
              ('nb3', GaussianNB()),
              ('nb4', GaussianNB()),
              ('nb5', GaussianNB()),
              ('nb6', GaussianNB()),
              ('nb7', GaussianNB()),
              ('nb8', GaussianNB()),
              ('nb9', GaussianNB()),
              ('nb10', GaussianNB())]

In [None]:
estimators = [('nb1', GaussianNB()), 
              ('nb2', GaussianNB()),
              ('nb3', GaussianNB()),
              ('nb4', GaussianNB()),
              ('nb5', GaussianNB()),
              ('nb6', GaussianNB()),
              ('nb7', GaussianNB()),
              ('nb8', GaussianNB()),
              ('nb9', GaussianNB()),
              ('nb10', GaussianNB()),
              ('nb11', GaussianNB()),
              ('nb12', GaussianNB()),
              ('nb13', GaussianNB()),
              ('nb14', GaussianNB()),
              ('nb15', GaussianNB())]

In [None]:
estimators = [('nb1', GaussianNB()), 
              ('nb2', GaussianNB()),
              ('nb3', GaussianNB()),
              ('nb4', GaussianNB()),
              ('nb5', GaussianNB()),
              ('nb6', GaussianNB()),
              ('nb7', GaussianNB()),
              ('nb8', GaussianNB()),
              ('nb9', GaussianNB()),
              ('nb10', GaussianNB()),
              ('nb11', GaussianNB()),
              ('nb12', GaussianNB()),
              ('nb13', GaussianNB()),
              ('nb14', GaussianNB()),
              ('nb15', GaussianNB()),
              ('nb16', GaussianNB()),
              ('nb17', GaussianNB()),
              ('nb18', GaussianNB()),
              ('nb19', GaussianNB()),
              ('nb20', GaussianNB())]

In [None]:
sclf = StackingClassifier(estimators=estimators, final_estimator=GaussianNB())

In [None]:
sclf.fit(X_train, y_train)
y_pred = sclf.predict(X_test)

In [None]:
# Score the current split (y_test, y_pred); y_test2 / y_pred2 are not defined in this section.
acc = str(accuracy_score(y_test, y_pred))
print("Accuracy:", acc)
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix: ")
print(cm)
prec = str(precision_score(y_test, y_pred, average= 'macro'))
dp = stdev(y_pred)
print("Precision: ", prec)
print("Standard deviation: ", dp)

Precision:  0.5571428571428572
Standard deviation:  0.0




---



In [None]:
# I had to drop the parameters: with them there were no errors, but the run went on for more than an hour without finishing

estimators = [('mlp1', MLPClassifier()),
              ('mlp2', MLPClassifier()),
              ('mlp3', MLPClassifier()),
              ('mlp4', MLPClassifier()),
              ('mlp5', MLPClassifier()),
              ('mlp6', MLPClassifier()),
              ('mlp7', MLPClassifier()),
              ('mlp8', MLPClassifier()),
              ('mlp9', MLPClassifier()),
              ('mlp10', MLPClassifier())]

In [None]:
# I had to drop the parameters: with them there were no errors, but the run went on for more than an hour without finishing

estimators = [('mlp1', MLPClassifier()),
              ('mlp2', MLPClassifier()),
              ('mlp3', MLPClassifier()),
              ('mlp4', MLPClassifier()),
              ('mlp5', MLPClassifier()),
              ('mlp6', MLPClassifier()),
              ('mlp7', MLPClassifier()),
              ('mlp8', MLPClassifier()),
              ('mlp9', MLPClassifier()),
              ('mlp10', MLPClassifier()),
              ('mlp11', MLPClassifier()),
              ('mlp12', MLPClassifier()),
              ('mlp13', MLPClassifier()),
              ('mlp14', MLPClassifier()),
              ('mlp15', MLPClassifier())]

In [None]:
# I had to drop the parameters: with them there were no errors, but the run went on for more than an hour without finishing

estimators = [('mlp1', MLPClassifier()),
              ('mlp2', MLPClassifier()),
              ('mlp3', MLPClassifier()),
              ('mlp4', MLPClassifier()),
              ('mlp5', MLPClassifier()),
              ('mlp6', MLPClassifier()),
              ('mlp7', MLPClassifier()),
              ('mlp8', MLPClassifier()),
              ('mlp9', MLPClassifier()),
              ('mlp10', MLPClassifier()),
              ('mlp11', MLPClassifier()),
              ('mlp12', MLPClassifier()),
              ('mlp13', MLPClassifier()),
              ('mlp14', MLPClassifier()),
              ('mlp15', MLPClassifier()),
              ('mlp16', MLPClassifier()),
              ('mlp17', MLPClassifier()),
              ('mlp18', MLPClassifier()),
              ('mlp19', MLPClassifier()),
              ('mlp20', MLPClassifier())]

In [None]:
# Kept only as a backup, in case it is needed
"""estimators = [('mlp1', MLPClassifier(solver='sgd', momentum=0.8, hidden_layer_sizes=(300), learning_rate='constant', learning_rate_init=0.01, max_iter=500, random_state=1))]"""



In [None]:
sclf = StackingClassifier(estimators=estimators, final_estimator=MLPClassifier())

In [None]:
sclf.fit(X_train, y_train)
y_pred = sclf.predict(X_test)

In [None]:
# Score the current split (y_test, y_pred); y_test2 / y_pred2 are not defined in this section.
acc = str(accuracy_score(y_test, y_pred))
print("Accuracy:", acc)
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix: ")
print(cm)
prec = str(precision_score(y_test, y_pred, average= 'macro'))
dp = stdev(y_pred)
print("Precision: ", prec)
print("Standard deviation: ", dp)

Precision:  0.4982046678635548
Standard deviation:  0.0






---



In [None]:
estimators = [('knn1', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')), 
              ('knn2', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn3', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn4', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn5', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn6', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn7', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn8', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn9', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn10', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance'))]

In [None]:
estimators = [('knn1', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')), 
              ('knn2', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn3', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn4', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn5', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn6', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn7', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn8', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn9', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn10', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn11', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn12', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn13', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn14', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn15', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance'))]

In [None]:
estimators = [('knn1', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')), 
              ('knn2', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn3', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn4', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn5', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn6', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn7', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn8', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn9', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn10', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn11', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn12', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn13', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn14', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn15', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn16', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn17', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn18', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn19', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn20', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance'))]

In [None]:
sclf = StackingClassifier(estimators=estimators, final_estimator=KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance'))

In [None]:
sclf.fit(X_train, y_train)
y_pred = sclf.predict(X_test)

In [None]:
# Score the current split (y_test, y_pred); y_test2 / y_pred2 are not defined in this section.
acc = str(accuracy_score(y_test, y_pred))
print("Accuracy:", acc)
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix: ")
print(cm)
prec = str(precision_score(y_test, y_pred, average= 'macro'))
dp = stdev(y_pred)
print("Precision: ", prec)
print("Standard deviation: ", dp)

Precision:  0.4982046678635548
Standard deviation:  0.0




**HETEROGENEOUS STACKING**

**Method A:** Decision Trees

**Method B:** k-NN

**Method C:** Naive Bayes



---
**A:**


In [None]:
estimators = [('ad1', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad2', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad3', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad4', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad5', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('knn1', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')), 
              ('knn2', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn3', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn4', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn5', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance'))]

In [None]:
estimators = [('ad1', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad2', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad3', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad4', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad5', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad6', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad7', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad8', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('knn1', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')), 
              ('knn2', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn3', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn4', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn5', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn6', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn7', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance'))]

In [None]:
estimators = [('ad1', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad2', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad3', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad4', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad5', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad6', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad7', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad8', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad9', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad10', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('knn1', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')), 
              ('knn2', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn3', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn4', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn5', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn6', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn7', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn8', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn9', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn10', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance'))]

In [None]:
sclf = StackingClassifier(estimators=estimators, final_estimator=DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10))

In [None]:
sclf.fit(X_train, y_train)
y_pred = sclf.predict(X_test)

In [None]:
# Score the current split (y_test, y_pred); y_test2 / y_pred2 are not defined in this section.
acc = str(accuracy_score(y_test, y_pred))
print("Accuracy:", acc)
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix: ")
print(cm)
prec = str(precision_score(y_test, y_pred, average= 'macro'))
dp = stdev(y_pred)
print("Precision: ", prec)
print("Standard deviation: ", dp)

Precision:  0.4982046678635548
Standard deviation:  0.0






---
**B:**


In [None]:
estimators = [('ad1', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad2', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad3', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad4', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad5', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('nb1', GaussianNB()), 
              ('nb2', GaussianNB()),
              ('nb3', GaussianNB()),
              ('nb4', GaussianNB()),
              ('nb5', GaussianNB())]

In [None]:
estimators = [('ad1', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad2', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad3', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad4', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad5', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad6', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad7', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('nb1', GaussianNB()), 
              ('nb2', GaussianNB()),
              ('nb3', GaussianNB()),
              ('nb4', GaussianNB()),
              ('nb5', GaussianNB()),
              ('nb6', GaussianNB()),
              ('nb7', GaussianNB()),
              ('nb8', GaussianNB())]

In [None]:
estimators = [('ad1', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad2', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad3', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad4', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad5', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad6', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad7', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad8', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad9', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad10', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('nb1', GaussianNB()), 
              ('nb2', GaussianNB()),
              ('nb3', GaussianNB()),
              ('nb4', GaussianNB()),
              ('nb5', GaussianNB()),
              ('nb6', GaussianNB()),
              ('nb7', GaussianNB()),
              ('nb8', GaussianNB()),
              ('nb9', GaussianNB()),
              ('nb10', GaussianNB())]

In [None]:
sclf = StackingClassifier(estimators=estimators, final_estimator=GaussianNB())

In [None]:
sclf.fit(X_train, y_train)
y_pred = sclf.predict(X_test)

In [None]:
# Score the current split (y_test, y_pred); y_test2 / y_pred2 are not defined in this section.
acc = str(accuracy_score(y_test, y_pred))
print("Accuracy:", acc)
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix: ")
print(cm)
prec = str(precision_score(y_test, y_pred, average= 'macro'))
dp = stdev(y_pred)
print("Precision: ", prec)
print("Standard deviation: ", dp)

Precision:  0.5555555555555556
Standard deviation:  0.0




---
**C:**


In [None]:
estimators = [('knn1', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')), 
              ('knn2', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn3', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn4', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn5', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('nb1', GaussianNB()), 
              ('nb2', GaussianNB()),
              ('nb3', GaussianNB()),
              ('nb4', GaussianNB()),
              ('nb5', GaussianNB())]

In [None]:
estimators = [('knn1', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')), 
              ('knn2', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn3', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn4', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn5', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn6', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn7', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn8', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('nb1', GaussianNB()), 
              ('nb2', GaussianNB()),
              ('nb3', GaussianNB()),
              ('nb4', GaussianNB()),
              ('nb5', GaussianNB()),
              ('nb6', GaussianNB()),
              ('nb7', GaussianNB())]

In [None]:
# Ensemble of 20 base estimators: 10 k-NN + 10 Gaussian Naive Bayes.
estimators = [('knn1', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn2', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn3', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn4', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn5', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn6', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn7', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn8', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn9', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn10', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('nb1', GaussianNB()), 
              ('nb2', GaussianNB()),
              ('nb3', GaussianNB()),
              ('nb4', GaussianNB()),
              ('nb5', GaussianNB()),
              ('nb6', GaussianNB()),
              ('nb7', GaussianNB()),
              ('nb8', GaussianNB()),
              ('nb9', GaussianNB()),
              ('nb10', GaussianNB())]

In [None]:
sclf = StackingClassifier(
    estimators=estimators,
    final_estimator=KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance'),
)

In [None]:
sclf.fit(X_train, y_train)
y_pred = sclf.predict(X_test)

In [None]:
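# Evaluate the stacked model on the held-out test set.
# All metrics use y_test / y_pred, i.e. the predictions produced by sclf above
# (the original mixed in y_test2 / y_pred2 for accuracy and the confusion matrix).
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix: ")
print(cm)
prec = precision_score(y_test, y_pred, average='macro')
dp = stdev(y_pred)  # standard deviation of the predicted labels
print("Precision: ", prec)
print("Standard deviation: ", dp)

Precision:  0.7486510791366907
Standard deviation:  0.0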




---
**D:**


In [None]:
# Ensemble of 10 base estimators: 3 decision trees + 3 k-NN + 4 Gaussian Naive Bayes.
estimators = [('ad1', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad2', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad3', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('knn1', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')), 
              ('knn2', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn3', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('nb1', GaussianNB()), 
              ('nb2', GaussianNB()),
              ('nb3', GaussianNB()),
              ('nb4', GaussianNB())]

In [None]:
# Ensemble of 15 base estimators: 5 decision trees + 5 k-NN + 5 Gaussian Naive Bayes.
estimators = [('ad1', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad2', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad3', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad4', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad5', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('knn1', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')), 
              ('knn2', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn3', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn4', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn5', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('nb1', GaussianNB()), 
              ('nb2', GaussianNB()),
              ('nb3', GaussianNB()),
              ('nb4', GaussianNB()),
              ('nb5', GaussianNB())]

In [None]:
# Ensemble of 20 base estimators: 6 decision trees + 7 k-NN + 7 Gaussian Naive Bayes.
estimators = [('ad1', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad2', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad3', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad4', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad5', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('ad6', DecisionTreeClassifier(max_depth = 10, min_samples_split=10, min_samples_leaf=10)),
              ('knn1', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')), 
              ('knn2', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn3', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn4', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn5', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn6', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('knn7', KNeighborsClassifier(n_neighbors=2, metric='euclidean', weights='distance')),
              ('nb1', GaussianNB()), 
              ('nb2', GaussianNB()),
              ('nb3', GaussianNB()),
              ('nb4', GaussianNB()),
              ('nb5', GaussianNB()),
              ('nb6', GaussianNB()),
              ('nb7', GaussianNB())]

In [None]:
sclf = StackingClassifier(
    estimators=estimators,
    final_estimator=DecisionTreeClassifier(max_depth=10, min_samples_split=10, min_samples_leaf=10),
)

In [None]:
sclf.fit(X_train, y_train)
y_pred = sclf.predict(X_test)

In [None]:
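# Evaluate the stacked model on the held-out test set.
# All metrics use y_test / y_pred, i.e. the predictions produced by sclf above
# (the original mixed in y_test2 / y_pred2 for accuracy and the confusion matrix).
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix: ")
print(cm)
prec = precision_score(y_test, y_pred, average='macro')
dp = stdev(y_pred)  # standard deviation of the predicted labels
print("Precision: ", prec)
print("Standard deviation: ", dp)

Precision:  0.6653165316531653
Standard deviation:  0.0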
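
The stacking experiments above spell out each estimator list by hand, repeating the same three model configurations with different counts. As a minimal sketch, the same lists could be generated by a small helper. `build_estimators` is a hypothetical name that does not appear in the original notebook, and the data here is synthetic (`make_classification`), standing in for the NBA dataset only so that the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def build_estimators(n_ad, n_knn, n_nb):
    """Hypothetical helper: return (name, estimator) pairs named ad1.., knn1.., nb1.. ."""
    makers = [
        ("ad", n_ad, lambda: DecisionTreeClassifier(
            max_depth=10, min_samples_split=10, min_samples_leaf=10)),
        ("knn", n_knn, lambda: KNeighborsClassifier(
            n_neighbors=2, metric="euclidean", weights="distance")),
        ("nb", n_nb, lambda: GaussianNB()),
    ]
    # Each call to make() creates a fresh, unfitted estimator instance.
    return [(f"{prefix}{i}", make())
            for prefix, count, make in makers
            for i in range(1, count + 1)]


# Synthetic stand-in data (assumption; not the NBA dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Same composition as the 10-estimator list in experiment D: 3 trees, 3 k-NN, 4 NB.
estimators = build_estimators(n_ad=3, n_knn=3, n_nb=4)
sclf = StackingClassifier(
    estimators=estimators,
    final_estimator=DecisionTreeClassifier(
        max_depth=10, min_samples_split=10, min_samples_leaf=10),
)
sclf.fit(X_train, y_train)
y_pred = sclf.predict(X_test)

# Every metric is computed from the same y_test / y_pred pair.
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="macro", zero_division=0))
```

Calling `build_estimators(0, 5, 5)` or `build_estimators(6, 7, 7)` would give lists with the same composition as the other sizes used above, which makes it harder for a later cell to silently overwrite an earlier configuration.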


# **18. STATISTICAL TESTS**

These tests were run in R; the code is included in the final report file.