# Classification with TF-IDF

- Version dated **21 March 2024**

- Using the __full text__

The notebook has two main parts: loading the dataset and applying the classifiers.

- Description: a preprocessed *corpus* (including stemming). The text is vectorized with TF-IDF and several classification algorithms are evaluated: 'Multinomial Naive Bayes', 'KNN', 'SVM', 'Random Forest', among others. Evaluation uses k-fold cross-validation with k=5. Sampling **is stratified**.

- Random seed used: 42
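
As a rough sketch of the procedure described above (TF-IDF, a classifier, stratified 5-fold evaluation with seed 42), the same steps can be written compactly with a scikit-learn `Pipeline` and `cross_validate`. The toy corpus and labels below are hypothetical and only stand in for the real dataset; the notebook itself uses an explicit loop rather than this shortcut.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: 3 classes with 5 short documents each
texts = (
    ["radar jammer signal spectrum"] * 5
    + ["tank armor test vehicle"] * 5
    + ["network malware cyber attack"] * 5
)
labels = [1] * 5 + [2] * 5 + [3] * 5

# The pipeline refits TF-IDF on the training part of every fold
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultinomialNB()),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_validate(
    pipeline, texts, labels, cv=cv,
    scoring=["accuracy", "f1_macro", "f1_weighted"],
)

for metric in ("accuracy", "f1_macro", "f1_weighted"):
    per_round = scores[f"test_{metric}"]
    print(metric, per_round.round(3), "mean:", round(per_round.mean(), 3))
```

Wrapping the vectorizer in the pipeline keeps the test fold out of the TF-IDF vocabulary and IDF statistics, which is the same precaution the explicit loop takes by calling `fit_transform` only on the training fold.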

In [2]:
# dataset.csv   or  dataset_pre_processado_1.csv  or  dataset_pre_processado_stem_2.csv
#     CSV1                  CSV2                                   CSV3
# dataset = "dataset_pre_processado_2.csv"
dataset = "dataset_pre_processado_stem_2.csv"

In [3]:
print("Reminder: we are using the dataset: " + dataset)

Reminder: we are using the dataset: dataset_pre_processado_stem_2.csv


Naive Bayes (NB) and Support Vector Machine (SVM)

# Classification task

### Importing libraries:

In [4]:
from sklearn.metrics import classification_report  # validation metrics

In [5]:
import pandas as pd
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_predict, KFold
from sklearn.pipeline import Pipeline
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn import naive_bayes, svm
from sklearn.naive_bayes import ComplementNB
from sklearn.ensemble import AdaBoostClassifier

## Loading the dataset

In [6]:
df = pd.read_csv(dataset)
df.head(2)

Unnamed: 0,id,titulo,autor,url,tipo_documento,rotulo,resumo,texto
0,61,guerra eletrônica e defesa cibernética na amaz...,"cristiano torres do amaral, edilson vasconcelo...",https://www.sige.ita.br/edicoes-anteriores/201...,Artigo de Simpósio,2,regia amazon possu extens are fronteir biodive...,expansa grup lig facco crimin milic avanc dife...
1,31,centro de avaliações do exército finaliza test...,Noticiário do Exército,https://www.eb.mil.br/web/noticias/noticiario-...,Notícia,3,rio jan rj centr avaliaco exercit caex camp pr...,rio jan rj centr avaliaco exercit caex camp pr...


## Vectorization

- Vectorization:

In [7]:
# Vectorization using TF-IDF
# vectorizer = TfidfVectorizer()
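
To make the TF-IDF weighting concrete, here is a minimal pure-Python sketch. It assumes the defaults documented for `TfidfVectorizer` (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2 normalization) and simplifies tokenization to whitespace splitting, so it illustrates the weighting and is not a drop-in replacement. The three stemmed documents are made up.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (smoothed IDF, L2-normalized)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1
           for term, count in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {term: count * idf[term] for term, count in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


# Hypothetical stemmed documents
docs = ["exercit test radar", "exercit guerr cibernet", "exercit radar radar"]
vectors = tfidf(docs)
for doc, vec in zip(docs, vectors):
    print(doc, {term: round(w, 3) for term, w in sorted(vec.items())})
```

A term that occurs in every document (`exercit`) ends up with a lower weight than a term that occurs in only one (`test`), which is the behaviour the classifiers rely on.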

## Classification algorithms

- Classification algorithms:

In [8]:
# Classification algorithms
classifiers = [
    ('Multinomial Naive Bayes', MultinomialNB()),
    ('Complement Naive Bayes Classifier', ComplementNB()),
    ('KNN', KNeighborsClassifier()),  # n_neighbors defaults to 5
    ('SVM', svm.SVC()),
    ('Random Forest', RandomForestClassifier(random_state=42)),
    ('AdaBoost', AdaBoostClassifier(random_state=42)),  # n_estimators defaults to 50
]

- Creating our DataFrames to store the results

In [9]:
lista_classificador_nome = list()
for classifier_name, classifier in classifiers:
    lista_classificador_nome.append(classifier_name)

In [10]:
df_acc = pd.DataFrame(columns=['Classificador','Rodada 1', 'Rodada 2', 'Rodada 3', 'Rodada 4', 'Rodada 5', 'Media'])
df_f1 = pd.DataFrame(columns=['Classificador','Rodada 1', 'Rodada 2', 'Rodada 3', 'Rodada 4', 'Rodada 5', 'Media'])
df_f1_ponderado = pd.DataFrame(columns=['Classificador','Rodada 1', 'Rodada 2', 'Rodada 3', 'Rodada 4', 'Rodada 5', 'Media'])

In [11]:
for classifier_name, classifier in classifiers:
    nova_linha = pd.DataFrame({'Classificador': [classifier_name], 'Rodada 1':[0] , 'Rodada 2':[0], 'Rodada 3':[0], 'Rodada 4':[0], 'Rodada 5':[0], 'Media':[0]})
    df_acc = pd.concat([df_acc, nova_linha], ignore_index=True)
    df_f1 = pd.concat([df_f1, nova_linha], ignore_index=True)
    df_f1_ponderado = pd.concat([df_f1_ponderado, nova_linha], ignore_index=True)

In [12]:
df_f1

Unnamed: 0,Classificador,Rodada 1,Rodada 2,Rodada 3,Rodada 4,Rodada 5,Media
0,Multinomial Naive Bayes,0,0,0,0,0,0
1,Complement Naive Bayes Classifier,0,0,0,0,0,0
2,KNN,0,0,0,0,0,0
3,SVM,0,0,0,0,0,0
4,Random Forest,0,0,0,0,0,0
5,AdaBoost,0,0,0,0,0,0


In [13]:
df_f1_ponderado

Unnamed: 0,Classificador,Rodada 1,Rodada 2,Rodada 3,Rodada 4,Rodada 5,Media
0,Multinomial Naive Bayes,0,0,0,0,0,0
1,Complement Naive Bayes Classifier,0,0,0,0,0,0
2,KNN,0,0,0,0,0,0
3,SVM,0,0,0,0,0,0
4,Random Forest,0,0,0,0,0,0
5,AdaBoost,0,0,0,0,0,0


In [14]:
df_acc

Unnamed: 0,Classificador,Rodada 1,Rodada 2,Rodada 3,Rodada 4,Rodada 5,Media
0,Multinomial Naive Bayes,0,0,0,0,0,0
1,Complement Naive Bayes Classifier,0,0,0,0,0,0
2,KNN,0,0,0,0,0,0
3,SVM,0,0,0,0,0,0
4,Random Forest,0,0,0,0,0,0
5,AdaBoost,0,0,0,0,0,0


* K-fold

Note that **stratified sampling was used**.

In [15]:
# Model evaluation using k-fold
k = 5
# kf = KFold(n_splits=k, shuffle=True, random_state=42)
kf = StratifiedKFold(n_splits=k, shuffle=True, random_state=42)
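
The effect of stratification can be illustrated by counting the classes in each test fold. In the sketch below the labels are synthetic: the class sizes (94, 41, 33) are an assumption derived from the per-fold `support` values printed later in this notebook, and the feature matrix is a placeholder because only the labels matter for splitting.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Synthetic labels with the assumed class sizes
y = np.array([1] * 94 + [2] * 41 + [3] * 33)
X = np.zeros((len(y), 1))  # placeholder features

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

for name, splitter in [("StratifiedKFold", skf), ("KFold", kfold)]:
    print(name)
    for fold, (_, test_idx) in enumerate(splitter.split(X, y), start=1):
        counts = Counter(y[test_idx].tolist())
        print(f"  fold {fold}: {dict(sorted(counts.items()))}")
```

With `StratifiedKFold` every test fold keeps roughly the same class proportions as the full label vector, while plain `KFold` gives no such guarantee. With only 41 and 33 documents in the smaller classes, that stability matters for per-class metrics.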

- Creating an evaluation function:

For the **average** parameter I used **'macro'** for the F1-score and **'weighted'** for the weighted F1-score.

**'weighted'**:

Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.

**'macro'**:

Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

See the [official documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html).
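
The difference between the two averages can be checked by hand. The sketch below takes the confusion matrix that this notebook prints for the first Multinomial Naive Bayes round and computes both averages in plain Python; the helper name `per_class_f1` is introduced here purely for illustration.

```python
# Confusion matrix printed for "Multinomial Naive Bayes Rodada 1"
# (rows = true labels 1..3, columns = predicted labels 1..3)
cm = [
    [19, 0, 0],
    [8, 0, 1],
    [2, 0, 4],
]


def per_class_f1(cm):
    """Return per-class F1 scores and supports from a square confusion matrix."""
    scores, supports = [], []
    for k in range(len(cm)):
        tp = cm[k][k]
        fp = sum(row[k] for row in cm) - tp   # predicted k, true label differs
        fn = sum(cm[k]) - tp                  # true k, predicted something else
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
        supports.append(sum(cm[k]))
    return scores, supports


f1_per_class, support = per_class_f1(cm)

# 'macro': unweighted mean over classes
macro_f1 = sum(f1_per_class) / len(f1_per_class)
# 'weighted': mean weighted by the number of true instances per class
weighted_f1 = sum(f * s for f, s in zip(f1_per_class, support)) / sum(support)

print("per-class F1:", [round(f, 4) for f in f1_per_class])
print("macro F1:", round(macro_f1, 6))
print("weighted F1:", round(weighted_f1, 6))
```

Class 2 is never predicted in that round, so its F1 is 0. The macro average counts that zero as one third of the score, while the weighted average gives it only 9/34 of the weight. The two values agree with the 0.506313 and 0.570744 reported for that round in the result tables.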

In [16]:
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score
from sklearn.metrics import precision_recall_fscore_support as score

def evaluate_model(model, X_test, y_test):
    # Predict the labels
    y_pred = model.predict(X_test)
    print(classification_report(y_test, y_pred))

    # Confusion matrix
    cm = confusion_matrix(y_test, y_pred)

    # Accuracy
    acc = accuracy_score(y_test, y_pred)

    # Macro-averaged F1-score
    f1 = f1_score(y_test, y_pred, average='macro')

    # Weighted F1-score
    f1_poderado = f1_score(y_test, y_pred, average='weighted')

    # Other metrics (support is returned as None whenever average is set)
    precision, recall, f1score, support = score(y_test, y_pred, average='macro')
    return cm, acc, f1, precision, recall, f1score, support, f1_poderado
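
The `support: None` lines in the output further down follow from how `precision_recall_fscore_support` behaves when an averaging mode is requested. A small sketch with made-up labels shows the difference between `average='macro'` and `average=None`:

```python
from sklearn.metrics import precision_recall_fscore_support

# Made-up labels, only to show the shape of the return values
y_true = [1, 1, 1, 2, 2, 3]
y_pred = [1, 1, 2, 2, 1, 3]

# With an averaging mode, the fourth return value (support) is None
_, _, _, support_macro = precision_recall_fscore_support(
    y_true, y_pred, average='macro', zero_division=0)

# With average=None, every metric is reported per class
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, average=None, zero_division=0)

print("support with average='macro':", support_macro)
print("support per class:", support.tolist())
```

If the per-class supports are needed inside `evaluate_model`, a second call with `average=None` is one way to get them.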

In [17]:
classificador = 0
for classifier_name, classifier in classifiers:
    print('---', classifier_name, '---')
    y_true = []
    y_pred = []
    contador = 0
    # An explicit dtype avoids the pandas warning about empty Series
    serie_acc = pd.Series(dtype=float)
    serie_f1 = pd.Series(dtype=float)
    serie_f1_ponderado = pd.Series(dtype=float)
    for train_index, test_index in kf.split(df['texto'], df['rotulo']):
        contador += 1
        X_train, X_test = df.iloc[train_index]['texto'], df.iloc[test_index]['texto']
        y_train, y_test = df.iloc[train_index]['rotulo'], df.iloc[test_index]['rotulo']

        # Vectorize the training data (TF-IDF is fitted on the training fold only)
        vectorizer = TfidfVectorizer()
        X_train_vectorized = vectorizer.fit_transform(X_train)

        # Train the model
        classifier.fit(X_train_vectorized, y_train)

        # Vectorize the test data with the vocabulary learned from the training fold
        X_test_vectorized = vectorizer.transform(X_test)

        # # Predict the labels
        # y_pred.extend(classifier.predict(X_test_vectorized))
        # y_true.extend(y_test)

        cm, acc, f1, precision, recall, f1score, support, f1_poderado = evaluate_model(classifier, X_test_vectorized, y_test)


        print(classifier_name + " Rodada " + str(contador))
        print('Matriz de Confusão:')
        print(cm)
        print('Acurácia:', acc)
        print('F1-Score:', f1)
        print("outras métricas:")
        print('precision:', precision)
        print('recall:', recall)
        print('f1score:', f1score)
        print('support:', support)
        print('-------------------------------------')
        # serie_acc = serie_acc.append(pd.Series([acc]))
        serie_acc = pd.concat([serie_acc, pd.Series([acc])])
        # serie_f1 = serie_f1.append(pd.Series([f1]))
        serie_f1 = pd.concat([serie_f1, pd.Series([f1])])
        serie_f1_ponderado = pd.concat([serie_f1_ponderado, pd.Series([f1_poderado])])


    # Model evaluation: append the mean of the five rounds to each series
    media_acc = serie_acc[:5].mean()
    media_f1 = serie_f1[:5].mean()
    media_f1_ponderado = serie_f1_ponderado[:5].mean()
    # serie_acc = serie_acc.append(pd.Series([media_acc]))
    # serie_f1 = serie_f1.append(pd.Series([media_f1]))
    serie_acc = pd.concat([serie_acc, pd.Series([media_acc])])
    serie_f1 = pd.concat([serie_f1, pd.Series([media_f1])])
    serie_f1_ponderado = pd.concat([serie_f1_ponderado, pd.Series([media_f1_ponderado])])

    # print("Acurácia: " )
    # print(serie_acc)
    # print("F-1: " )
    # print(serie_f1)
    # Write the five rounds plus the mean into this classifier's row
    df_acc.loc[classificador, ['Rodada 1', 'Rodada 2', 'Rodada 3', 'Rodada 4', 'Rodada 5', 'Media']] = serie_acc.values
    df_f1.loc[classificador, ['Rodada 1', 'Rodada 2', 'Rodada 3', 'Rodada 4', 'Rodada 5', 'Media']] = serie_f1.values
    df_f1_ponderado.loc[classificador, ['Rodada 1', 'Rodada 2', 'Rodada 3', 'Rodada 4', 'Rodada 5', 'Media']] = serie_f1_ponderado.values
    classificador += 1
    print("=======================================================================================")
    # cm = confusion_matrix(y_true, y_pred)
    # acc = accuracy_score(y_true, y_pred)

--- Multinomial Naive Bayes ---


              precision    recall  f1-score   support

           1       0.66      1.00      0.79        19
           2       0.00      0.00      0.00         9
           3       0.80      0.67      0.73         6

    accuracy                           0.68        34
   macro avg       0.49      0.56      0.51        34
weighted avg       0.51      0.68      0.57        34

Multinomial Naive Bayes Rodada 1
Matriz de Confusão:
[[19  0  0]
 [ 8  0  1]
 [ 2  0  4]]
Acurácia: 0.6764705882352942
F1-Score: 0.5063131313131313
outras métricas:
precision: 0.4850574712643678
recall: 0.5555555555555555
f1score: 0.5063131313131313
support: None
-------------------------------------


              precision    recall  f1-score   support

           1       0.68      1.00      0.81        19
           2       0.00      0.00      0.00         8
           3       0.67      0.57      0.62         7

    accuracy                           0.68        34
   macro avg       0.45      0.52      0.47        34
weighted avg       0.52      0.68      0.58        34

Multinomial Naive Bayes Rodada 2
Matriz de Confusão:
[[19  0  0]
 [ 6  0  2]
 [ 3  0  4]]
Acurácia: 0.6764705882352942
F1-Score: 0.47463175122749596
outras métricas:
precision: 0.44841269841269843
recall: 0.5238095238095238
f1score: 0.47463175122749596
support: None
-------------------------------------


              precision    recall  f1-score   support

           1       0.73      1.00      0.84        19
           2       0.00      0.00      0.00         8
           3       0.75      0.86      0.80         7

    accuracy                           0.74        34
   macro avg       0.49      0.62      0.55        34
weighted avg       0.56      0.74      0.64        34

Multinomial Naive Bayes Rodada 3
Matriz de Confusão:
[[19  0  0]
 [ 6  0  2]
 [ 1  0  6]]
Acurácia: 0.7352941176470589
F1-Score: 0.5481481481481482
outras métricas:
precision: 0.4935897435897436
recall: 0.6190476190476191
f1score: 0.5481481481481482
support: None
-------------------------------------


              precision    recall  f1-score   support

           1       0.62      1.00      0.77        18
           2       0.00      0.00      0.00         8
           3       1.00      0.57      0.73         7

    accuracy                           0.67        33
   macro avg       0.54      0.52      0.50        33
weighted avg       0.55      0.67      0.57        33

Multinomial Naive Bayes Rodada 4
Matriz de Confusão:
[[18  0  0]
 [ 8  0  0]
 [ 3  0  4]]
Acurácia: 0.6666666666666666
F1-Score: 0.4977433913604126
outras métricas:
precision: 0.5402298850574713
recall: 0.5238095238095238
f1score: 0.4977433913604126
support: None
-------------------------------------


              precision    recall  f1-score   support

           1       0.63      1.00      0.78        19
           2       0.00      0.00      0.00         8
           3       1.00      0.50      0.67         6

    accuracy                           0.67        33
   macro avg       0.54      0.50      0.48        33
weighted avg       0.55      0.67      0.57        33

Multinomial Naive Bayes Rodada 5
Matriz de Confusão:
[[19  0  0]
 [ 8  0  0]
 [ 3  0  3]]
Acurácia: 0.6666666666666666
F1-Score: 0.4807256235827664
outras métricas:
precision: 0.5444444444444444
recall: 0.5
f1score: 0.4807256235827664
support: None
-------------------------------------
--- Complement Naive Bayes Classifier ---


              precision    recall  f1-score   support

           1       0.68      1.00      0.81        19
           2       0.00      0.00      0.00         9
           3       0.83      0.83      0.83         6

    accuracy                           0.71        34
   macro avg       0.50      0.61      0.55        34
weighted avg       0.53      0.71      0.60        34

Complement Naive Bayes Classifier Rodada 1
Matriz de Confusão:
[[19  0  0]
 [ 8  0  1]
 [ 1  0  5]]
Acurácia: 0.7058823529411765
F1-Score: 0.5472813238770686
outras métricas:
precision: 0.503968253968254
recall: 0.6111111111111112
f1score: 0.5472813238770686
support: None
-------------------------------------


              precision    recall  f1-score   support

           1       0.70      1.00      0.83        19
           2       0.00      0.00      0.00         8
           3       0.57      0.57      0.57         7

    accuracy                           0.68        34
   macro avg       0.43      0.52      0.47        34
weighted avg       0.51      0.68      0.58        34

Complement Naive Bayes Classifier Rodada 2
Matriz de Confusão:
[[19  0  0]
 [ 5  0  3]
 [ 3  0  4]]
Acurácia: 0.6764705882352942
F1-Score: 0.4658385093167701
outras métricas:
precision: 0.42504409171075835
recall: 0.5238095238095238
f1score: 0.4658385093167701
support: None
-------------------------------------


              precision    recall  f1-score   support

           1       0.73      1.00      0.84        19
           2       0.00      0.00      0.00         8
           3       0.75      0.86      0.80         7

    accuracy                           0.74        34
   macro avg       0.49      0.62      0.55        34
weighted avg       0.56      0.74      0.64        34

Complement Naive Bayes Classifier Rodada 3
Matriz de Confusão:
[[19  0  0]
 [ 6  0  2]
 [ 1  0  6]]
Acurácia: 0.7352941176470589
F1-Score: 0.5481481481481482
outras métricas:
precision: 0.4935897435897436
recall: 0.6190476190476191
f1score: 0.5481481481481482
support: None
-------------------------------------


              precision    recall  f1-score   support

           1       0.67      1.00      0.80        18
           2       0.00      0.00      0.00         8
           3       1.00      0.86      0.92         7

    accuracy                           0.73        33
   macro avg       0.56      0.62      0.57        33
weighted avg       0.58      0.73      0.63        33

Complement Naive Bayes Classifier Rodada 4
Matriz de Confusão:
[[18  0  0]
 [ 8  0  0]
 [ 1  0  6]]
Acurácia: 0.7272727272727273
F1-Score: 0.5743589743589744
outras métricas:
precision: 0.5555555555555555
recall: 0.6190476190476191
f1score: 0.5743589743589744
support: None
-------------------------------------


              precision    recall  f1-score   support

           1       0.70      1.00      0.83        19
           2       0.00      0.00      0.00         8
           3       1.00      1.00      1.00         6

    accuracy                           0.76        33
   macro avg       0.57      0.67      0.61        33
weighted avg       0.59      0.76      0.66        33

Complement Naive Bayes Classifier Rodada 5
Matriz de Confusão:
[[19  0  0]
 [ 8  0  0]
 [ 0  0  6]]
Acurácia: 0.7575757575757576
F1-Score: 0.6086956521739131
outras métricas:
precision: 0.5679012345679012
recall: 0.6666666666666666
f1score: 0.6086956521739131
support: None
-------------------------------------
--- KNN ---
              precision    recall  f1-score   support

           1       0.69      0.95      0.80        19
           2       0.50      0.11      0.18         9
           3       0.83      0.83      0.83         6

    accuracy                           0.71        34
   macro avg       0.68

              precision    recall  f1-score   support

           1       0.66      1.00      0.79        19
           2       0.00      0.00      0.00         9
           3       0.80      0.67      0.73         6

    accuracy                           0.68        34
   macro avg       0.49      0.56      0.51        34
weighted avg       0.51      0.68      0.57        34

SVM Rodada 1
Matriz de Confusão:
[[19  0  0]
 [ 8  0  1]
 [ 2  0  4]]
Acurácia: 0.6764705882352942
F1-Score: 0.5063131313131313
outras métricas:
precision: 0.4850574712643678
recall: 0.5555555555555555
f1score: 0.5063131313131313
support: None
-------------------------------------


              precision    recall  f1-score   support

           1       0.68      1.00      0.81        19
           2       0.00      0.00      0.00         8
           3       0.67      0.57      0.62         7

    accuracy                           0.68        34
   macro avg       0.45      0.52      0.47        34
weighted avg       0.52      0.68      0.58        34

SVM Rodada 2
Matriz de Confusão:
[[19  0  0]
 [ 6  0  2]
 [ 3  0  4]]
Acurácia: 0.6764705882352942
F1-Score: 0.47463175122749596
outras métricas:
precision: 0.44841269841269843
recall: 0.5238095238095238
f1score: 0.47463175122749596
support: None
-------------------------------------
              precision    recall  f1-score   support

           1       0.76      1.00      0.86        19
           2       1.00      0.12      0.22         8
           3       0.75      0.86      0.80         7

    accuracy                           0.76        34
   macro avg       0.84      0.66      0.63        34
weighted

              precision    recall  f1-score   support

           1       0.68      1.00      0.81        19
           2       0.00      0.00      0.00         9
           3       0.83      0.83      0.83         6

    accuracy                           0.71        34
   macro avg       0.50      0.61      0.55        34
weighted avg       0.53      0.71      0.60        34

Random Forest Rodada 1
Matriz de Confusão:
[[19  0  0]
 [ 8  0  1]
 [ 1  0  5]]
Acurácia: 0.7058823529411765
F1-Score: 0.5472813238770686
outras métricas:
precision: 0.503968253968254
recall: 0.6111111111111112
f1score: 0.5472813238770686
support: None
-------------------------------------


              precision    recall  f1-score   support

           1       0.70      1.00      0.83        19
           2       0.00      0.00      0.00         8
           3       0.57      0.57      0.57         7

    accuracy                           0.68        34
   macro avg       0.43      0.52      0.47        34
weighted avg       0.51      0.68      0.58        34

Random Forest Rodada 2
Matriz de Confusão:
[[19  0  0]
 [ 5  0  3]
 [ 3  0  4]]
Acurácia: 0.6764705882352942
F1-Score: 0.4658385093167701
outras métricas:
precision: 0.42504409171075835
recall: 0.5238095238095238
f1score: 0.4658385093167701
support: None
-------------------------------------
              precision    recall  f1-score   support

           1       0.76      1.00      0.86        19
           2       1.00      0.12      0.22         8
           3       0.75      0.86      0.80         7

    accuracy                           0.76        34
   macro avg       0.84      0.66      0.63        34


              precision    recall  f1-score   support

           1       0.67      1.00      0.80        18
           2       0.00      0.00      0.00         8
           3       1.00      0.86      0.92         7

    accuracy                           0.73        33
   macro avg       0.56      0.62      0.57        33
weighted avg       0.58      0.73      0.63        33

Random Forest Rodada 4
Matriz de Confusão:
[[18  0  0]
 [ 8  0  0]
 [ 1  0  6]]
Acurácia: 0.7272727272727273
F1-Score: 0.5743589743589744
outras métricas:
precision: 0.5555555555555555
recall: 0.6190476190476191
f1score: 0.5743589743589744
support: None
-------------------------------------
              precision    recall  f1-score   support

           1       0.74      0.89      0.81        19
           2       0.50      0.25      0.33         8
           3       1.00      1.00      1.00         6

    accuracy                           0.76        33
   macro avg       0.75      0.71      0.71        33
w

              precision    recall  f1-score   support

           1       0.79      0.79      0.79        19
           2       0.45      0.56      0.50         9
           3       1.00      0.67      0.80         6

    accuracy                           0.71        34
   macro avg       0.75      0.67      0.70        34
weighted avg       0.74      0.71      0.71        34

AdaBoost Rodada 1
Matriz de Confusão:
[[15  4  0]
 [ 4  5  0]
 [ 0  2  4]]
Acurácia: 0.7058823529411765
F1-Score: 0.6964912280701755
outras métricas:
precision: 0.7480063795853269
recall: 0.6705653021442495
f1score: 0.6964912280701755
support: None
-------------------------------------
              precision    recall  f1-score   support

           1       0.80      0.84      0.82        19
           2       0.25      0.25      0.25         8
           3       0.67      0.57      0.62         7

    accuracy                           0.65        34
   macro avg       0.57      0.55      0.56        34
weight

In [18]:
df_acc

Unnamed: 0,Classificador,Rodada 1,Rodada 2,Rodada 3,Rodada 4,Rodada 5,Media
0,Multinomial Naive Bayes,0.676471,0.676471,0.735294,0.666667,0.666667,0.684314
1,Complement Naive Bayes Classifier,0.705882,0.676471,0.735294,0.727273,0.757576,0.720499
2,KNN,0.705882,0.705882,0.764706,0.727273,0.606061,0.701961
3,SVM,0.676471,0.676471,0.764706,0.69697,0.727273,0.708378
4,Random Forest,0.705882,0.676471,0.764706,0.727273,0.757576,0.726381
5,AdaBoost,0.705882,0.647059,0.529412,0.363636,0.363636,0.521925
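
As a quick sanity check, the `Media` column can be recomputed from the per-round values in the table above. The sketch below copies three rows by hand and also reports the sample standard deviation across rounds, which is not part of the original table.

```python
from statistics import mean, stdev

# Per-round accuracies copied from the df_acc table
acc_rounds = {
    "Multinomial Naive Bayes": [0.676471, 0.676471, 0.735294, 0.666667, 0.666667],
    "Random Forest": [0.705882, 0.676471, 0.764706, 0.727273, 0.757576],
    "AdaBoost": [0.705882, 0.647059, 0.529412, 0.363636, 0.363636],
}
# 'Media' values as reported in the same table
reported_mean = {
    "Multinomial Naive Bayes": 0.684314,
    "Random Forest": 0.726381,
    "AdaBoost": 0.521925,
}

for name, rounds in acc_rounds.items():
    recomputed = mean(rounds)
    spread = stdev(rounds)  # sample standard deviation over the five rounds
    print(f"{name}: mean={recomputed:.6f} "
          f"(reported {reported_mean[name]}), std={spread:.3f}")
```

The recomputed means match the reported ones up to rounding. The spread makes the instability of AdaBoost across folds visible, which the mean alone hides.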


In [19]:
df_f1

Unnamed: 0,Classificador,Rodada 1,Rodada 2,Rodada 3,Rodada 4,Rodada 5,Media
0,Multinomial Naive Bayes,0.506313,0.474632,0.548148,0.497743,0.480726,0.501512
1,Complement Naive Bayes Classifier,0.547281,0.465839,0.548148,0.574359,0.608696,0.548865
2,KNN,0.605051,0.587302,0.663895,0.63814,0.627981,0.624474
3,SVM,0.506313,0.474632,0.62862,0.571258,0.657505,0.567666
4,Random Forest,0.547281,0.465839,0.62862,0.574359,0.714286,0.586077
5,AdaBoost,0.696491,0.561966,0.508614,0.417243,0.325042,0.501871


In [20]:
df_f1_ponderado

Unnamed: 0,Classificador,Rodada 1,Rodada 2,Rodada 3,Rodada 4,Rodada 5,Media
0,Multinomial Naive Bayes,0.570744,0.578512,0.636601,0.572065,0.567718,0.585128
1,Complement Naive Bayes Classifier,0.598874,0.579284,0.636601,0.632168,0.657444,0.620874
2,KNN,0.642246,0.67507,0.730648,0.68144,0.612324,0.668345
3,SVM,0.570744,0.578512,0.699614,0.627094,0.681145,0.631422
4,Random Forest,0.598874,0.579284,0.699614,0.632168,0.728716,0.647731
5,AdaBoost,0.714706,0.644042,0.549271,0.413987,0.395549,0.543511
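
To compare the classifiers across the three tables, the `Media` columns can be collected and ranked per metric. The values below are copied by hand from `df_acc`, `df_f1` and `df_f1_ponderado`; the dictionary layout and metric keys are only an illustration.

```python
# Mean scores ('Media') copied from the three result tables
mean_scores = {
    "Multinomial Naive Bayes": {"accuracy": 0.684314, "f1_macro": 0.501512, "f1_weighted": 0.585128},
    "Complement Naive Bayes Classifier": {"accuracy": 0.720499, "f1_macro": 0.548865, "f1_weighted": 0.620874},
    "KNN": {"accuracy": 0.701961, "f1_macro": 0.624474, "f1_weighted": 0.668345},
    "SVM": {"accuracy": 0.708378, "f1_macro": 0.567666, "f1_weighted": 0.631422},
    "Random Forest": {"accuracy": 0.726381, "f1_macro": 0.586077, "f1_weighted": 0.647731},
    "AdaBoost": {"accuracy": 0.521925, "f1_macro": 0.501871, "f1_weighted": 0.543511},
}

best = {}
for metric in ("accuracy", "f1_macro", "f1_weighted"):
    # Sort classifier names from best to worst on this metric
    ranking = sorted(mean_scores, key=lambda name: mean_scores[name][metric], reverse=True)
    best[metric] = ranking[0]
    print(metric, "->", ", ".join(f"{n} ({mean_scores[n][metric]:.3f})" for n in ranking))
```

On these numbers Random Forest has the highest mean accuracy, while KNN leads on both F1 averages. This is consistent with several classifiers never predicting class 2 in the rounds shown above.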
