## Evaluation - Automatic Classification of Image Segments

In this exercise you will evaluate a dataset of 1,500 image segments. In this project, each instance represents a 3x3-pixel segment of an image of one of the following elements:

<img src="segments.png" alt="Images that were segmented">

The task is therefore to classify these 3x3-pixel segments into one of the outdoor image types (cement, window, grass, etc.). Each instance is represented as follows:

<ol>
    <li>region-centroid-col: the column of the center pixel of the region</li>
    <li>region-centroid-row: the row of the center pixel of the region</li>
    <li>region-pixel-count: the number of pixels in the region (3x3 = 9 in this case)</li>
    <li>short-line-density-5: result of a line-extraction algorithm that counts how many lines of length 5 (any orientation) with low contrast, less than or equal to 5, pass through the region</li>
    <li>short-line-density-2: same as short-line-density-5, but counts lines of high contrast, greater than 5</li>
    <li>vedge-mean: measures the contrast of horizontally adjacent pixels in the region. There are 6 such pairs; their mean and standard deviation are given. This attribute is used as a vertical edge detector.</li>
    <li>vegde-sd: standard deviation of the contrast of horizontally adjacent pixels</li>
    <li>hedge-mean: measures the contrast of vertically adjacent pixels. Used for horizontal line detection.</li>
    <li>hedge-sd: standard deviation of the contrast of vertically adjacent pixels</li>
    <li>intensity-mean: the average over the region of (R + G + B) / 3</li>
    <li>rawred-mean: the average over the region of the R value (red)</li>
    <li>rawblue-mean: the average over the region of the B value (blue)</li>
    <li>rawgreen-mean: the average over the region of the G value (green)</li>
    <li>exred-mean: measures the excess red: (2R - (G + B))</li>
    <li>exblue-mean: measures the excess blue: (2B - (G + R))</li>
    <li>exgreen-mean: measures the excess green: (2G - (R + B))</li>
    <li>value-mean: 3-d non-linear transformation of RGB</li>
    <li>saturatoin-mean: mean saturation of the RGB values</li>
    <li>hue-mean: mean hue of the RGB values</li>
    <li style="color: red"><b>y-i: class to be predicted (see the figure above)</b></li>
</ol>
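The intensity and excess-colour attributes are plain linear combinations of R, G and B. The sketch below is only an illustration with made-up pixel values (the helper names `region_means` and `color_features` are hypothetical, not part of the course code). Because the formulas are linear, computing them from the region's mean R, G, B gives the same result as averaging the per-pixel values, and the three excess values always sum to zero.

```python
def region_means(pixels):
    """Mean R, G and B over a region given as a list of (R, G, B) tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def color_features(r, g, b):
    """Linear colour attributes from the list above, computed from mean R, G, B."""
    return {
        "intensity-mean": (r + g + b) / 3,
        "exred-mean": 2 * r - (g + b),
        "exblue-mean": 2 * b - (g + r),
        "exgreen-mean": 2 * g - (r + b),
    }


# Hypothetical 3x3 region (9 pixels) with made-up RGB values
region = [(120, 90, 60)] * 4 + [(100, 110, 70)] * 5
features = color_features(*region_means(region))

# Linearity: averaging the per-pixel features gives the same values
per_pixel = [color_features(*p) for p in region]
for name, value in features.items():
    assert abs(value - sum(f[name] for f in per_pixel) / len(per_pixel)) < 1e-9
```

Non-linear attributes such as value-mean, saturatoin-mean and hue-mean do not have this property, so they cannot be recovered from the mean RGB values alone.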

<a href="https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/segment-challenge.arff">**Reference**</a>

**Activity 7 - Reading the dataset and creating the folds:** Read the dataset [`segment.csv`](segment.csv) and perform 5-fold cross-validation.

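```python
from resultado import Fold
import pandas as pd

dataframe = pd.read_csv('segment.csv')

# 5 outer folds (one repetition) over the class column "y-i"; num_folds_validacao and
# num_repeticoes_validacao configure the inner validation folds, presumably used by the
# hyperparameter search below (Fold is the course's own helper)
folds = Fold.gerar_k_folds(dataframe, val_k=5, col_classe="y-i", num_repeticoes=1, seed=1,
                           num_folds_validacao=3, num_repeticoes_validacao=2)
```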
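`Fold.gerar_k_folds` comes from the course's `resultado` module and its internals are not shown here. As a rough, self-contained illustration of what k-fold cross-validation does (not the course's implementation), the sketch below deals the rows of each class in turn into the k folds, which keeps class proportions roughly equal across folds (stratifying by class is an assumption of this sketch). Each fold then serves once as the test set while the remaining folds form the training set. The function names are hypothetical; in practice `sklearn.model_selection.StratifiedKFold` does the same job.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=5, seed=1):
    """Return k lists of row indices with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    counter = 0  # global round-robin counter, so fold sizes differ by at most 1
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for idx in indices:
            folds[counter % k].append(idx)
            counter += 1
    return folds


def train_test_pairs(folds):
    """Yield (train, test): each fold is the test set once, the others are training."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```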

**Activity 8 - Varying parameters, presenting and analyzing results:** Apply at least the RandomForest and Decision Tree methods to the problem, varying their parameters (at minimum, in the same way as in Part 2). Present the results, analyze them and answer at least: Which classes are the hardest/easiest to predict? Which classes are confused with each other the most? Which classification method is best? What are the best parameters for each machine learning method?

For the per-class analysis, use the predictions from all folds (a single repetition only) and build the confusion matrix. If in doubt, see the lecture on evaluating machine learning methods. The Resultado class implements this matrix.
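The course's `Resultado` class already builds this matrix, but its code is not shown here. As a minimal, independent sketch of the idea, assuming each fold provides its true labels and predictions as two sequences, the per-fold matrices are simply summed, which is equivalent to building one matrix over the concatenated predictions of all folds. The function names are hypothetical, and the orientation (rows = true class, columns = predicted class) is a choice of this sketch that `Resultado` may not share. For a single fold, `sklearn.metrics.confusion_matrix(y_true, y_pred, labels=...)` gives the same matrix.

```python
import numpy as np


def pooled_confusion_matrix(fold_predictions, classes):
    """Sum the per-fold confusion matrices into one matrix.

    fold_predictions: list of (y_true, y_pred) sequences, one pair per fold.
    Rows = true class, columns = predicted class (convention of this sketch).
    """
    index = {c: i for i, c in enumerate(classes)}
    matrix = np.zeros((len(classes), len(classes)), dtype=int)
    for y_true, y_pred in fold_predictions:
        for t, p in zip(y_true, y_pred):
            matrix[index[t], index[p]] += 1
    return matrix


def per_class_recall(matrix):
    """Share of each true class predicted correctly: diagonal / row sum."""
    return np.diag(matrix) / matrix.sum(axis=1)
```

A class with no examples in any fold would produce a zero row sum, so the recall computation assumes every class appears at least once.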

```python
from avaliacao import Experimento, OtimizacaoObjetivoRandomForest, OtimizacaoObjetivoArvoreDecisao
from metodo import ScikitLearnAprendizadoDeMaquina
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
import optuna

# RandomForest: 10 Optuna trials per fold; the TPE sampler samples randomly
# for the first 3 trials, then switches to TPE
clf_random_forest = RandomForestClassifier(random_state=1)
ml_method = ScikitLearnAprendizadoDeMaquina(clf_random_forest)
exp_random_forest = Experimento(folds, ml_method, OtimizacaoObjetivoRandomForest, num_trials=10,
                                sampler=optuna.samplers.TPESampler(seed=1, n_startup_trials=3))
result_random_forest = exp_random_forest.calcula_resultados()

# Decision tree: same folds and the same tuning protocol
clf_decision_tree = DecisionTreeClassifier(random_state=1)
ml_method = ScikitLearnAprendizadoDeMaquina(clf_decision_tree)
exp_decision_tree = Experimento(folds, ml_method, OtimizacaoObjetivoArvoreDecisao, num_trials=10,
                                sampler=optuna.samplers.TPESampler(seed=1, n_startup_trials=3))
result_decision_tree = exp_decision_tree.calcula_resultados()
```

```python
result_random_forest
```

```text
[<resultado.Resultado at 0x7fd582cacf60>,
 <resultado.Resultado at 0x7fd582cac860>,
 <resultado.Resultado at 0x7fd582ce4e10>,
 <resultado.Resultado at 0x7fd582c538d0>,
 <resultado.Resultado at 0x7fd582c424e0>]
```

```python
result_decision_tree
```

```text
[<resultado.Resultado at 0x7fd582c08630>,
 <resultado.Resultado at 0x7fd582c53fd0>,
 <resultado.Resultado at 0x7fd582c705c0>,
 <resultado.Resultado at 0x7fd582bf8da0>,
 <resultado.Resultado at 0x7fd582bf8518>]
```

```python
study_fold_0_random_forest = exp_random_forest.studies_per_fold[0]
study_fold_0_random_forest.trials_dataframe().sort_values("value", ascending=False)
```

| number | value | datetime_start | datetime_complete | duration | params_max_features | params_min_samples_split | params_num_arvores | state |
|---:|---:|---|---|---|---:|---:|---:|---|
| 6 | 0.942922 | 2021-03-14 23:14:17.943976 | 2021-03-14 23:14:18.020754 | 0 days 00:00:00.076778 | 0.498236 | 0.008141 | 3 | COMPLETE |
| 5 | 0.940839 | 2021-03-14 23:14:17.856353 | 2021-03-14 23:14:17.943937 | 0 days 00:00:00.087584 | 0.489083 | 0.015217 | 4 | COMPLETE |
| 7 | 0.93429 | 2021-03-14 23:14:18.020795 | 2021-03-14 23:14:18.094672 | 0 days 00:00:00.073877 | 0.495727 | 0.024054 | 3 | COMPLETE |
| 9 | 0.87873 | 2021-03-14 23:14:18.157523 | 2021-03-14 23:14:18.195865 | 0 days 00:00:00.038342 | 0.392813 | 0.094162 | 1 | COMPLETE |
| 2 | 0.837582 | 2021-03-14 23:14:17.633059 | 2021-03-14 23:14:17.705548 | 0 days 00:00:00.072489 | 0.193955 | 0.19829 | 5 | COMPLETE |
| 1 | 0.776977 | 2021-03-14 23:14:17.598060 | 2021-03-14 23:14:17.633016 | 0 days 00:00:00.034956 | 0.073378 | 0.151166 | 1 | COMPLETE |
| 0 | 0.776347 | 2021-03-14 23:14:17.439827 | 2021-03-14 23:14:17.598025 | 0 days 00:00:00.158198 | 0.360162 | 0.208511 | 2 | COMPLETE |
| 8 | 0.705014 | 2021-03-14 23:14:18.094713 | 2021-03-14 23:14:18.157482 | 0 days 00:00:00.062769 | 0.370631 | 0.33851 | 3 | COMPLETE |
| 4 | 0.499388 | 2021-03-14 23:14:17.779804 | 2021-03-14 23:14:17.856311 | 0 days 00:00:00.076507 | 0.248497 | 0.39841 | 5 | COMPLETE |
| 3 | 0.441788 | 2021-03-14 23:14:17.705610 | 2021-03-14 23:14:17.779762 | 0 days 00:00:00.074152 | 0.138669 | 0.49396 | 5 | COMPLETE |
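The tuned `max_features` and `min_samples_split` values in these tables are fractions between 0 and 1, and `num_arvores` appears to be the number of trees. How the course's `OtimizacaoObjetivoRandomForest` passes them to scikit-learn is not shown; assuming they reach the estimator unchanged as floats, scikit-learn converts them as sketched below. The sample and feature counts are assumptions (about 1,200 outer training rows split into 3 inner folds, and all 19 attributes used as features), and the helper names are hypothetical. Under these assumptions a `min_samples_split` near 0.5 means a node needs roughly 400 samples before it can be split, so the trees stay very shallow, which is consistent with the low scores of those trials.

```python
import math


def effective_min_samples_split(fraction, n_samples):
    # scikit-learn: a float min_samples_split becomes ceil(fraction * n_samples),
    # and the tree code never uses a value below 2
    return max(2, math.ceil(fraction * n_samples))


def effective_max_features(fraction, n_features):
    # scikit-learn: a float max_features becomes max(1, int(fraction * n_features))
    return max(1, int(fraction * n_features))


N_SAMPLES = 800  # assumption: ~1,200 outer training rows split into 3 inner folds
N_FEATURES = 19  # assumption: all 19 attributes listed above are used

# Best and worst RandomForest trials of fold 0 (values copied from the table above)
for label, max_feat, min_split in [("trial 6", 0.498236, 0.008141),
                                   ("trial 3", 0.138669, 0.49396)]:
    print(label,
          "max_features =", effective_max_features(max_feat, N_FEATURES),
          "min_samples_split =", effective_min_samples_split(min_split, N_SAMPLES))
```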


```python
study_fold_1_random_forest = exp_random_forest.studies_per_fold[1]
study_fold_1_random_forest.trials_dataframe().sort_values("value", ascending=False)
```

| number | value | datetime_start | datetime_complete | duration | params_max_features | params_min_samples_split | params_num_arvores | state |
|---:|---:|---|---|---|---:|---:|---:|---|
| 2 | 0.931895 | 2021-03-14 23:14:18.347556 | 2021-03-14 23:14:18.398415 | 0 days 00:00:00.050859 | 0.335234 | 0.013694 | 2 | COMPLETE |
| 8 | 0.920476 | 2021-03-14 23:14:18.659958 | 2021-03-14 23:14:18.710966 | 0 days 00:00:00.051008 | 0.365232 | 0.078967 | 2 | COMPLETE |
| 3 | 0.907754 | 2021-03-14 23:14:18.398454 | 2021-03-14 23:14:18.443710 | 0 days 00:00:00.045256 | 0.48083 | 0.01936 | 1 | COMPLETE |
| 4 | 0.890048 | 2021-03-14 23:14:18.443749 | 2021-03-14 23:14:18.483484 | 0 days 00:00:00.039735 | 0.439881 | 0.022577 | 1 | COMPLETE |
| 6 | 0.817871 | 2021-03-14 23:14:18.538216 | 2021-03-14 23:14:18.586783 | 0 days 00:00:00.048567 | 0.312482 | 0.192539 | 2 | COMPLETE |
| 0 | 0.813989 | 2021-03-14 23:14:18.210437 | 2021-03-14 23:14:18.269913 | 0 days 00:00:00.059476 | 0.269408 | 0.198384 | 3 | COMPLETE |
| 1 | 0.61469 | 2021-03-14 23:14:18.269945 | 2021-03-14 23:14:18.347518 | 0 days 00:00:00.077573 | 0.221726 | 0.262274 | 5 | COMPLETE |
| 5 | 0.560922 | 2021-03-14 23:14:18.483524 | 2021-03-14 23:14:18.538175 | 0 days 00:00:00.054651 | 0.014002 | 0.456304 | 3 | COMPLETE |
| 9 | 0.554181 | 2021-03-14 23:14:18.711024 | 2021-03-14 23:14:18.776814 | 0 days 00:00:00.065790 | 0.145948 | 0.366877 | 4 | COMPLETE |
| 7 | 0.423106 | 2021-03-14 23:14:18.586821 | 2021-03-14 23:14:18.659917 | 0 days 00:00:00.073096 | 0.123942 | 0.482002 | 5 | COMPLETE |


```python
study_fold_2_random_forest = exp_random_forest.studies_per_fold[2]
study_fold_2_random_forest.trials_dataframe().sort_values("value", ascending=False)
```

| number | value | datetime_start | datetime_complete | duration | params_max_features | params_min_samples_split | params_num_arvores | state |
|---:|---:|---|---|---|---:|---:|---:|---|
| 5 | 0.950472 | 2021-03-14 23:14:19.100076 | 2021-03-14 23:14:19.191979 | 0 days 00:00:00.091903 | 0.286631 | 0.008352 | 5 | COMPLETE |
| 4 | 0.949127 | 2021-03-14 23:14:19.011660 | 2021-03-14 23:14:19.100036 | 0 days 00:00:00.088376 | 0.315033 | 0.009035 | 5 | COMPLETE |
| 3 | 0.930991 | 2021-03-14 23:14:18.921961 | 2021-03-14 23:14:19.011619 | 0 days 00:00:00.089658 | 0.494566 | 0.067764 | 4 | COMPLETE |
| 8 | 0.86226 | 2021-03-14 23:14:19.329560 | 2021-03-14 23:14:19.391449 | 0 days 00:00:00.061889 | 0.157225 | 0.097088 | 4 | COMPLETE |
| 6 | 0.840594 | 2021-03-14 23:14:19.192019 | 2021-03-14 23:14:19.270491 | 0 days 00:00:00.078472 | 0.272413 | 0.177396 | 5 | COMPLETE |
| 0 | 0.612636 | 2021-03-14 23:14:18.786531 | 2021-03-14 23:14:18.838605 | 0 days 00:00:00.052074 | 0.469564 | 0.215349 | 2 | COMPLETE |
| 7 | 0.584611 | 2021-03-14 23:14:19.270530 | 2021-03-14 23:14:19.329519 | 0 days 00:00:00.058989 | 0.017719 | 0.347173 | 4 | COMPLETE |
| 9 | 0.565872 | 2021-03-14 23:14:19.391490 | 2021-03-14 23:14:19.454272 | 0 days 00:00:00.062782 | 0.373466 | 0.326631 | 3 | COMPLETE |
| 1 | 0.429516 | 2021-03-14 23:14:18.838637 | 2021-03-14 23:14:18.882987 | 0 days 00:00:00.044350 | 0.156712 | 0.484131 | 2 | COMPLETE |
| 2 | 0.37745 | 2021-03-14 23:14:18.883026 | 2021-03-14 23:14:18.921922 | 0 days 00:00:00.038896 | 0.447303 | 0.438195 | 1 | COMPLETE |


```python
study_fold_3_random_forest = exp_random_forest.studies_per_fold[3]
study_fold_3_random_forest.trials_dataframe().sort_values("value", ascending=False)
```

| number | value | datetime_start | datetime_complete | duration | params_max_features | params_min_samples_split | params_num_arvores | state |
|---:|---:|---|---|---|---:|---:|---:|---|
| 9 | 0.952563 | 2021-03-14 23:14:20.067299 | 2021-03-14 23:14:20.145551 | 0 days 00:00:00.078252 | 0.309274 | 0.007426 | 4 | COMPLETE |
| 7 | 0.951439 | 2021-03-14 23:14:19.926458 | 2021-03-14 23:14:20.018517 | 0 days 00:00:00.092059 | 0.333197 | 0.000217 | 5 | COMPLETE |
| 5 | 0.924012 | 2021-03-14 23:14:19.791266 | 2021-03-14 23:14:19.888087 | 0 days 00:00:00.096821 | 0.465265 | 0.075027 | 5 | COMPLETE |
| 3 | 0.910615 | 2021-03-14 23:14:19.648002 | 2021-03-14 23:14:19.749778 | 0 days 00:00:00.101776 | 0.438629 | 0.093769 | 5 | COMPLETE |
| 4 | 0.908996 | 2021-03-14 23:14:19.749819 | 2021-03-14 23:14:19.791227 | 0 days 00:00:00.041408 | 0.476507 | 0.033337 | 1 | COMPLETE |
| 8 | 0.760547 | 2021-03-14 23:14:20.018557 | 2021-03-14 23:14:20.067258 | 0 days 00:00:00.048701 | 0.343318 | 0.241266 | 2 | COMPLETE |
| 6 | 0.688855 | 2021-03-14 23:14:19.888127 | 2021-03-14 23:14:19.926418 | 0 days 00:00:00.038291 | 0.353128 | 0.169332 | 1 | COMPLETE |
| 2 | 0.684419 | 2021-03-14 23:14:19.585942 | 2021-03-14 23:14:19.647961 | 0 days 00:00:00.062019 | 0.205894 | 0.335827 | 4 | COMPLETE |
| 1 | 0.603046 | 2021-03-14 23:14:19.528593 | 2021-03-14 23:14:19.585903 | 0 days 00:00:00.057310 | 0.049173 | 0.439071 | 4 | COMPLETE |
| 0 | 0.588981 | 2021-03-14 23:14:19.471736 | 2021-03-14 23:14:19.528559 | 0 days 00:00:00.056823 | 0.136525 | 0.414802 | 3 | COMPLETE |


```python
study_fold_4_random_forest = exp_random_forest.studies_per_fold[4]
study_fold_4_random_forest.trials_dataframe().sort_values("value", ascending=False)
```

| number | value | datetime_start | datetime_complete | duration | params_max_features | params_min_samples_split | params_num_arvores | state |
|---:|---:|---|---|---|---:|---:|---:|---|
| 4 | 0.920924 | 2021-03-14 23:14:20.411193 | 2021-03-14 23:14:20.451642 | 0 days 00:00:00.040449 | 0.498985 | 0.007528 | 1 | COMPLETE |
| 3 | 0.920041 | 2021-03-14 23:14:20.367547 | 2021-03-14 23:14:20.411152 | 0 days 00:00:00.043605 | 0.483283 | 0.003559 | 1 | COMPLETE |
| 8 | 0.898507 | 2021-03-14 23:14:20.587012 | 2021-03-14 23:14:20.637461 | 0 days 00:00:00.050449 | 0.367386 | 0.096025 | 2 | COMPLETE |
| 5 | 0.89423 | 2021-03-14 23:14:20.451682 | 2021-03-14 23:14:20.489947 | 0 days 00:00:00.038265 | 0.362174 | 0.014819 | 1 | COMPLETE |
| 6 | 0.834328 | 2021-03-14 23:14:20.489985 | 2021-03-14 23:14:20.537178 | 0 days 00:00:00.047193 | 0.346885 | 0.145569 | 2 | COMPLETE |
| 7 | 0.763572 | 2021-03-14 23:14:20.537217 | 2021-03-14 23:14:20.586972 | 0 days 00:00:00.049755 | 0.054822 | 0.175662 | 3 | COMPLETE |
| 9 | 0.652822 | 2021-03-14 23:14:20.637503 | 2021-03-14 23:14:20.680302 | 0 days 00:00:00.042799 | 0.000314 | 0.256886 | 2 | COMPLETE |
| 2 | 0.649207 | 2021-03-14 23:14:20.286499 | 2021-03-14 23:14:20.367511 | 0 days 00:00:00.081012 | 0.494431 | 0.375072 | 5 | COMPLETE |
| 1 | 0.631336 | 2021-03-14 23:14:20.227964 | 2021-03-14 23:14:20.286461 | 0 days 00:00:00.058497 | 0.206269 | 0.391657 | 4 | COMPLETE |
| 0 | 0.5435 | 2021-03-14 23:14:20.159304 | 2021-03-14 23:14:20.227932 | 0 days 00:00:00.068628 | 0.157758 | 0.345939 | 5 | COMPLETE |


```python
study_fold_0_decision_tree = exp_decision_tree.studies_per_fold[0]
study_fold_0_decision_tree.trials_dataframe().sort_values("value", ascending=False)
```

| number | value | datetime_start | datetime_complete | duration | params_min_samples_split | state |
|---:|---:|---|---|---|---:|---|
| 3 | 0.935144 | 2021-03-14 23:14:20.831707 | 2021-03-14 23:14:20.884232 | 0 days 00:00:00.052525 | 0.009897 | COMPLETE |
| 2 | 0.933262 | 2021-03-14 23:14:20.778017 | 2021-03-14 23:14:20.831668 | 0 days 00:00:00.053651 | 5.7e-05 | COMPLETE |
| 4 | 0.920042 | 2021-03-14 23:14:20.884272 | 2021-03-14 23:14:20.936868 | 0 days 00:00:00.052596 | 0.034286 | COMPLETE |
| 7 | 0.917701 | 2021-03-14 23:14:21.025771 | 2021-03-14 23:14:21.076043 | 0 days 00:00:00.050272 | 0.092008 | COMPLETE |
| 9 | 0.91721 | 2021-03-14 23:14:21.119810 | 2021-03-14 23:14:21.168769 | 0 days 00:00:00.048959 | 0.108072 | COMPLETE |
| 5 | 0.877954 | 2021-03-14 23:14:20.936906 | 2021-03-14 23:14:20.983278 | 0 days 00:00:00.046372 | 0.172393 | COMPLETE |
| 0 | 0.854085 | 2021-03-14 23:14:20.688859 | 2021-03-14 23:14:20.731796 | 0 days 00:00:00.042937 | 0.208511 | COMPLETE |
| 1 | 0.722224 | 2021-03-14 23:14:20.731830 | 2021-03-14 23:14:20.777979 | 0 days 00:00:00.046149 | 0.360162 | COMPLETE |
| 8 | 0.722224 | 2021-03-14 23:14:21.076082 | 2021-03-14 23:14:21.119766 | 0 days 00:00:00.043684 | 0.328414 | COMPLETE |
| 6 | 0.576178 | 2021-03-14 23:14:20.983318 | 2021-03-14 23:14:21.025732 | 0 days 00:00:00.042414 | 0.475424 | COMPLETE |


```python
study_fold_1_decision_tree = exp_decision_tree.studies_per_fold[1]
study_fold_1_decision_tree.trials_dataframe().sort_values("value", ascending=False)
```

| number | value | datetime_start | datetime_complete | duration | params_min_samples_split | state |
|---:|---:|---|---|---|---:|---|
| 5 | 0.938405 | 2021-03-14 23:14:21.405063 | 2021-03-14 23:14:21.456388 | 0 days 00:00:00.051325 | 0.008212 | COMPLETE |
| 7 | 0.933644 | 2021-03-14 23:14:21.498838 | 2021-03-14 23:14:21.553290 | 0 days 00:00:00.054452 | 0.002846 | COMPLETE |
| 2 | 0.922535 | 2021-03-14 23:14:21.274893 | 2021-03-14 23:14:21.322901 | 0 days 00:00:00.048008 | 0.046169 | COMPLETE |
| 1 | 0.92216 | 2021-03-14 23:14:21.226359 | 2021-03-14 23:14:21.274851 | 0 days 00:00:00.048492 | 0.073378 | COMPLETE |
| 0 | 0.898642 | 2021-03-14 23:14:21.180944 | 2021-03-14 23:14:21.226324 | 0 days 00:00:00.045380 | 0.151166 | COMPLETE |
| 8 | 0.845011 | 2021-03-14 23:14:21.553343 | 2021-03-14 23:14:21.594591 | 0 days 00:00:00.041248 | 0.216948 | COMPLETE |
| 6 | 0.75653 | 2021-03-14 23:14:21.456427 | 2021-03-14 23:14:21.498774 | 0 days 00:00:00.042347 | 0.259127 | COMPLETE |
| 9 | 0.633698 | 2021-03-14 23:14:21.594649 | 2021-03-14 23:14:21.636945 | 0 days 00:00:00.042296 | 0.357812 | COMPLETE |
| 4 | 0.60781 | 2021-03-14 23:14:21.363266 | 2021-03-14 23:14:21.405023 | 0 days 00:00:00.041757 | 0.379185 | COMPLETE |
| 3 | 0.490679 | 2021-03-14 23:14:21.322936 | 2021-03-14 23:14:21.363226 | 0 days 00:00:00.040290 | 0.480754 | COMPLETE |


```python
study_fold_2_decision_tree = exp_decision_tree.studies_per_fold[2]
study_fold_2_decision_tree.trials_dataframe().sort_values("value", ascending=False)
```

| number | value | datetime_start | datetime_complete | duration | params_min_samples_split | state |
|---:|---:|---|---|---|---:|---|
| 5 | 0.939874 | 2021-03-14 23:14:21.868149 | 2021-03-14 23:14:21.918281 | 0 days 00:00:00.050132 | 0.0047 | COMPLETE |
| 7 | 0.932206 | 2021-03-14 23:14:21.958299 | 2021-03-14 23:14:22.008742 | 0 days 00:00:00.050443 | 0.013009 | COMPLETE |
| 0 | 0.922393 | 2021-03-14 23:14:21.648009 | 2021-03-14 23:14:21.694672 | 0 days 00:00:00.046663 | 0.09313 | COMPLETE |
| 3 | 0.921827 | 2021-03-14 23:14:21.777059 | 2021-03-14 23:14:21.827443 | 0 days 00:00:00.050384 | 0.069464 | COMPLETE |
| 1 | 0.864919 | 2021-03-14 23:14:21.694705 | 2021-03-14 23:14:21.736507 | 0 days 00:00:00.041802 | 0.17278 | COMPLETE |
| 2 | 0.844837 | 2021-03-14 23:14:21.736545 | 2021-03-14 23:14:21.777021 | 0 days 00:00:00.040476 | 0.198384 | COMPLETE |
| 9 | 0.752699 | 2021-03-14 23:14:22.052018 | 2021-03-14 23:14:22.094980 | 0 days 00:00:00.042962 | 0.27295 | COMPLETE |
| 8 | 0.574032 | 2021-03-14 23:14:22.008781 | 2021-03-14 23:14:22.051973 | 0 days 00:00:00.043192 | 0.323107 | COMPLETE |
| 4 | 0.429315 | 2021-03-14 23:14:21.827483 | 2021-03-14 23:14:21.868093 | 0 days 00:00:00.040610 | 0.42468 | COMPLETE |
| 6 | 0.429315 | 2021-03-14 23:14:21.918321 | 2021-03-14 23:14:21.958257 | 0 days 00:00:00.039936 | 0.426185 | COMPLETE |


```python
study_fold_3_decision_tree = exp_decision_tree.studies_per_fold[3]
study_fold_3_decision_tree.trials_dataframe().sort_values("value", ascending=False)
```

| number | value | datetime_start | datetime_complete | duration | params_min_samples_split | state |
|---:|---:|---|---|---|---:|---|
| 4 | 0.941921 | 2021-03-14 23:14:22.283561 | 2021-03-14 23:14:22.333457 | 0 days 00:00:00.049896 | 0.002186 | COMPLETE |
| 8 | 0.938871 | 2021-03-14 23:14:22.465915 | 2021-03-14 23:14:22.515335 | 0 days 00:00:00.049420 | 0.004549 | COMPLETE |
| 3 | 0.933966 | 2021-03-14 23:14:22.230639 | 2021-03-14 23:14:22.283520 | 0 days 00:00:00.052881 | 0.023278 | COMPLETE |
| 5 | 0.928976 | 2021-03-14 23:14:22.333496 | 2021-03-14 23:14:22.381977 | 0 days 00:00:00.048481 | 0.038418 | COMPLETE |
| 9 | 0.923096 | 2021-03-14 23:14:22.515373 | 2021-03-14 23:14:22.560623 | 0 days 00:00:00.045250 | 0.120605 | COMPLETE |
| 7 | 0.919669 | 2021-03-14 23:14:22.421605 | 2021-03-14 23:14:22.465875 | 0 days 00:00:00.044270 | 0.131142 | COMPLETE |
| 1 | 0.85969 | 2021-03-14 23:14:22.147229 | 2021-03-14 23:14:22.188114 | 0 days 00:00:00.040885 | 0.209597 | COMPLETE |
| 0 | 0.754054 | 2021-03-14 23:14:22.105802 | 2021-03-14 23:14:22.147197 | 0 days 00:00:00.041395 | 0.269408 | COMPLETE |
| 2 | 0.604133 | 2021-03-14 23:14:22.188152 | 2021-03-14 23:14:22.230600 | 0 days 00:00:00.042448 | 0.34261 | COMPLETE |
| 6 | 0.466823 | 2021-03-14 23:14:22.382013 | 2021-03-14 23:14:22.421564 | 0 days 00:00:00.039551 | 0.45149 | COMPLETE |


```python
study_fold_4_decision_tree = exp_decision_tree.studies_per_fold[4]
study_fold_4_decision_tree.trials_dataframe().sort_values("value", ascending=False)
```

| number | value | datetime_start | datetime_complete | duration | params_min_samples_split | state |
|---:|---:|---|---|---|---:|---|
| 2 | 0.936884 | 2021-03-14 23:14:22.661463 | 2021-03-14 23:14:22.713508 | 0 days 00:00:00.052045 | 0.013694 | COMPLETE |
| 3 | 0.936662 | 2021-03-14 23:14:22.713548 | 2021-03-14 23:14:22.765047 | 0 days 00:00:00.051499 | 0.007736 | COMPLETE |
| 6 | 0.936662 | 2021-03-14 23:14:22.850783 | 2021-03-14 23:14:22.902476 | 0 days 00:00:00.051693 | 0.008007 | COMPLETE |
| 0 | 0.910385 | 2021-03-14 23:14:22.570810 | 2021-03-14 23:14:22.620624 | 0 days 00:00:00.049814 | 0.102226 | COMPLETE |
| 8 | 0.889441 | 2021-03-14 23:14:22.945216 | 2021-03-14 23:14:22.989761 | 0 days 00:00:00.044545 | 0.158797 | COMPLETE |
| 4 | 0.824158 | 2021-03-14 23:14:22.765105 | 2021-03-14 23:14:22.806680 | 0 days 00:00:00.041575 | 0.213167 | COMPLETE |
| 7 | 0.661737 | 2021-03-14 23:14:22.902516 | 2021-03-14 23:14:22.945177 | 0 days 00:00:00.042661 | 0.306127 | COMPLETE |
| 9 | 0.661737 | 2021-03-14 23:14:22.989801 | 2021-03-14 23:14:23.032364 | 0 days 00:00:00.042563 | 0.322728 | COMPLETE |
| 1 | 0.522602 | 2021-03-14 23:14:22.620658 | 2021-03-14 23:14:22.661424 | 0 days 00:00:00.040766 | 0.439059 | COMPLETE |
| 5 | 0.522602 | 2021-03-14 23:14:22.806717 | 2021-03-14 23:14:22.850741 | 0 days 00:00:00.044024 | 0.465631 | COMPLETE |


```python
[result.mat_confusao for result in result_random_forest]
```

```text
[array([[35.,  0.,  0.,  1.,  0.,  0.,  0.],
        [ 0., 53.,  0.,  0.,  0.,  0.,  0.],
        [ 1.,  0., 31.,  0.,  4.,  0.,  0.],
        [ 1.,  0.,  1., 47.,  3.,  0.,  0.],
        [ 1.,  0.,  1.,  0., 35.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0., 53.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0., 33.]]),
 array([[41.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0., 47.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0., 36.,  3.,  6.,  0.,  0.],
        [ 1.,  0.,  0., 27.,  1.,  0.,  0.],
        [ 2.,  0.,  8.,  8., 34.,  0.,  0.],
        [ 0.,  0.,  0.,  3.,  0., 46.,  0.],
        [ 0.,  0.,  0.,  1.,  0.,  0., 36.]]),
 array([[35.,  0.,  0.,  2.,  0.,  0.,  0.],
        [ 0., 35.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0., 38.,  0.,  2.,  0.,  0.],
        [ 2.,  0.,  0., 45.,  5.,  1.,  0.],
        [ 1.,  0.,  1.,  0., 36.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0., 45.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0., 52.]]),
 array([[47.,  0.,  0.,  0.,  1.,  0.,  0.],
```

```python
[result.mat_confusao for result in result_decision_tree]
```

```text
[array([[36.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0., 53.,  0.,  0.,  0.,  0.,  0.],
        [ 2.,  0., 32.,  0.,  2.,  0.,  0.],
        [ 0.,  0.,  1., 50.,  1.,  0.,  0.],
        [ 0.,  0.,  1.,  1., 35.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0., 53.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0., 33.]]),
 array([[40.,  0.,  0.,  0.,  1.,  0.,  0.],
        [ 0., 47.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0., 43.,  1.,  1.,  0.,  0.],
        [ 0.,  0.,  0., 27.,  2.,  0.,  0.],
        [ 0.,  0.,  7.,  5., 40.,  0.,  0.],
        [ 0.,  0.,  2.,  1.,  0., 46.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0., 37.]]),
 array([[36.,  0.,  1.,  0.,  0.,  0.,  0.],
        [ 0., 35.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0., 34.,  0.,  6.,  0.,  0.],
        [ 2.,  0.,  0., 45.,  3.,  3.,  0.],
        [ 0.,  0.,  2.,  0., 36.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0., 45.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0., 52.]]),
 array([[45.,  0.,  2.,  0.,  1.,  0.,  0.],
```

```python
# Accuracy with the decision tree
[result.acuracia for result in result_decision_tree]
```

```text
[0.9733333333333334,
 0.9333333333333333,
 0.9433333333333334,
 0.9333333333333333,
 0.9133333333333333]
```

```python
# Accuracy with RandomForest
[result.acuracia for result in result_random_forest]
```

```text
[0.9566666666666667,
 0.89,
 0.9533333333333334,
 0.9566666666666667,
 0.9266666666666666]
```
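To compare the two methods with numbers rather than by eye, the short sketch below summarizes the per-fold accuracies printed above (copied by hand from the two outputs). It reports the mean and the population standard deviation over the 5 folds and counts in how many folds RandomForest beats the decision tree. With only 5 folds these figures are indicative, not a statistical test.

```python
from statistics import mean, pstdev

# Per-fold accuracies copied from the two outputs above
acc_decision_tree = [0.9733333333333334, 0.9333333333333333, 0.9433333333333334,
                     0.9333333333333333, 0.9133333333333333]
acc_random_forest = [0.9566666666666667, 0.89, 0.9533333333333334,
                     0.9566666666666667, 0.9266666666666666]

for name, accs in [("DecisionTree", acc_decision_tree), ("RandomForest", acc_random_forest)]:
    # pstdev: population standard deviation over the 5 folds
    print(f"{name}: mean = {mean(accs):.4f}, std = {pstdev(accs):.4f}")

# Number of folds in which RandomForest is strictly more accurate
rf_wins = sum(rf > dt for rf, dt in zip(acc_random_forest, acc_decision_tree))
print(f"RandomForest better in {rf_wins} of {len(acc_random_forest)} folds")
```

The decision tree comes out at about 0.939 ± 0.020 and RandomForest at about 0.937 ± 0.026, with RandomForest ahead in 3 of the 5 folds.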

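No single method achieved the best accuracy in every fold: RandomForest was more accurate in three of the five folds and the decision tree in the other two. The decision tree, however, produced more consistent results, with a slightly higher mean accuracy (about 0.939 vs. 0.937) and a smaller spread across folds (population standard deviation of about 0.020 vs. 0.026).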

In the confusion matrices above (classes numbered by their 1-based row/column position), classes 2, 6 and 7 were the easiest to predict, with few or no errors, while classes 3 and 5 were the hardest. Classes 3 and 5 were also the pair most often confused with each other, with several predictions swapped between them; class 4 is frequently confused with class 5 as well.
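The most confused pair can also be found programmatically: add each off-diagonal cell to its mirror cell (mix-ups in either direction), ignore the diagonal, and take the largest entry. The sketch below is a hypothetical helper, not part of the course code. It works with 0-based indices, so 1 is added when printing to match the class numbering used in the text. The example matrix is the second RandomForest matrix printed above.

```python
import numpy as np


def most_confused_pair(matrix):
    """Return ((i, j), count) for the 0-based class pair i < j with the largest
    number of mix-ups in either direction, i.e. matrix[i, j] + matrix[j, i]."""
    m = np.asarray(matrix)
    symmetric = m + m.T
    np.fill_diagonal(symmetric, 0)  # ignore correct predictions
    i, j = np.unravel_index(np.argmax(symmetric), symmetric.shape)
    return (int(min(i, j)), int(max(i, j))), int(symmetric[i, j])


# Second RandomForest confusion matrix printed above
rf_fold_1 = [[41, 0, 0, 0, 0, 0, 0],
             [0, 47, 0, 0, 0, 0, 0],
             [0, 0, 36, 3, 6, 0, 0],
             [1, 0, 0, 27, 1, 0, 0],
             [2, 0, 8, 8, 34, 0, 0],
             [0, 0, 0, 3, 0, 46, 0],
             [0, 0, 0, 1, 0, 0, 36]]
pair, count = most_confused_pair(rf_fold_1)
print(f"classes {pair[0] + 1} and {pair[1] + 1}: {count} mix-ups")
```

Summing the matrices of all folds before calling the helper gives the same analysis over the whole cross-validation.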