## Practical Project 4

**Universidade do Estado do Amazonas**  
**Escola Superior de Tecnologia**  
**Professor:** Elloá B. Guedes  
**Students:** Juliany Raiol, Raí Soledade, Richardson Souza  
**Course:** Artificial Neural Networks

## Machine Learning for a Classification Task Applied to the Wheat Varieties Dataset

### Introduction

Three wheat varieties (Kama, Rosa, and Canadian) have seeds that are very similar
yet distinct. A group of Polish researchers collected 70 samples of each variety
and, using a particular X-ray technique, measured geometric properties of these
seeds, namely: area, perimeter, compactness, length, width, asymmetry coefficient,
and seed groove length.


In [2]:
# Modules used in the project

import pandas as pd
import numpy as np
import random
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
sns.set()
import warnings
warnings.filterwarnings('ignore')

Reading the dataset

In [3]:
names = ["Area", "Perimeter", "Compactness", "Length", "Width", "Asymmetry", "Groove", "Seed"]

# The file is whitespace-delimited and has no header row
df = pd.read_csv('../../data/seeds_dataset.txt', sep=r'\s+', names=names)

### Training

X = predictor attributes, y = target attribute

In [4]:
X = df.drop('Seed', axis=1)
y = df['Seed']

Definition of the learning rates, the number of neurons in the input and output layers, the activation functions, and the alpha values of the geometric pyramid rule used to compute the number of neurons in the hidden layers.

In [14]:
rate  = [0.01, 0.05]
alpha = [0.5, 2, 3]

neuron_out = 2
neuron_ini = 7
activation_functions = ['identity', 'logistic', 'tanh', 'relu']

Computing the number of neurons in the hidden layers using the geometric pyramid rule.

\begin{align}
N_{h} & = \alpha \cdot \sqrt{N_{i} \cdot N_{o}}
\end{align}

<strong>Nh</strong> is the number of hidden neurons (to be distributed across one or two hidden layers)

<strong>Ni</strong> is the number of neurons in the input layer

<strong>No</strong> is the number of neurons in the output layer.

In [6]:
n = []
for a in alpha:
    # Geometric pyramid rule, truncated to an integer
    n.append(int(a * np.sqrt(neuron_ini * neuron_out)))
print("Number of neurons in the hidden layers to be tested, respectively: ", n)

Number of neurons in the hidden layers to be tested, respectively:  [1, 7, 11]
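As a check on the list printed above, the arithmetic can be written out by hand. This is only a worked sketch using the values set in the code ($N_{i} = 7$, $N_{o} = 2$), with the floor brackets standing in for the truncation done by `int()`:

\begin{align}
\sqrt{N_{i} \cdot N_{o}} & = \sqrt{7 \cdot 2} = \sqrt{14} \approx 3.742 \\
\alpha = 0.5: \quad N_{h} & = \lfloor 0.5 \cdot \sqrt{14} \rfloor = \lfloor 1.871 \rfloor = 1 \\
\alpha = 2: \quad N_{h} & = \lfloor 2 \cdot \sqrt{14} \rfloor = \lfloor 7.483 \rfloor = 7 \\
\alpha = 3: \quad N_{h} & = \lfloor 3 \cdot \sqrt{14} \rfloor = \lfloor 11.225 \rfloor = 11
\end{align}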


Parameter defining a set of combinations of neurons distributed across 1 or 2 layers, according to the neuron counts computed above

In [7]:
hidden_layer = [(1,), (7,),(1,6),(2,5),(3,4), (11,),(1,10),(2,9),(3,8),(4,7),(5,6)]
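The list above appears to follow a simple pattern: for each neuron count, one single-layer option plus every two-layer split whose first layer is no larger than the second. A minimal sketch of that pattern follows; the helper name `layer_splits` is hypothetical and not part of the original notebook.

```python
def layer_splits(total):
    """Return (total,) plus every two-layer split (a, total - a) with a <= total - a."""
    splits = [(total,)]
    for first in range(1, total // 2 + 1):
        splits.append((first, total - first))
    return splits


# Neuron counts obtained from the geometric pyramid rule
neuron_counts = [1, 7, 11]

hidden_layer = []
for count in neuron_counts:
    hidden_layer.extend(layer_splits(count))

print(hidden_layer)
```

Generating the tuples this way avoids typing them by hand if the alpha values change.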

Definition of the parameters used to initialize the models

In [8]:
parameters = dict([
                ('hidden_layer_sizes', hidden_layer),
                ('learning_rate_init', rate),
                ('activation', activation_functions)
            ])
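To get a feel for the size of this search space, the grid can be enumerated with the standard library alone. The sketch below assumes the combinations are ordered with parameter names sorted alphabetically and the last one varying fastest; that ordering is an assumption, chosen because it matches the row indices in the results shown in this notebook (row 3 and row 68).

```python
from itertools import product

hidden_layer = [(1,), (7,), (1, 6), (2, 5), (3, 4), (11,),
                (1, 10), (2, 9), (3, 8), (4, 7), (5, 6)]
rate = [0.01, 0.05]
activation_functions = ['identity', 'logistic', 'tanh', 'relu']

parameters = {
    'hidden_layer_sizes': hidden_layer,
    'learning_rate_init': rate,
    'activation': activation_functions,
}

# Assumed ordering: keys sorted alphabetically, last key varying fastest
keys = sorted(parameters)
combos = [dict(zip(keys, values))
          for values in product(*(parameters[k] for k in keys))]

n_folds = 3  # cv=3 in the grid search
print("Combinations:", len(combos))
print("Cross-validation fits:", len(combos) * n_folds)
print("Row 3: ", combos[3])
print("Row 68:", combos[68])
```

With 11 layer layouts, 2 learning rates and 4 activation functions there are 88 combinations, so 3-fold cross-validation trains 264 networks before the final refit.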

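To train the neural networks, the chosen solver was L-BFGS (`solver='lbfgs'`), since it tends to perform better on small datasets. Note that scikit-learn only uses `learning_rate_init` with the `'sgd'` and `'adam'` solvers, so under L-BFGS the two learning rates in the grid do not change the optimization itself; differences between such runs come from the random weight initialization.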

In [9]:
clf = GridSearchCV(MLPClassifier(solver='lbfgs'), parameters, iid=True, cv = 3, return_train_score=True)
clf.fit(X, y)

GridSearchCV(cv=3, error_score='raise',
       estimator=MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='lbfgs', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'hidden_layer_sizes': [(1,), (7,), (1, 6), (2, 5), (3, 4), (11,), (1, 10), (2, 9), (3, 8), (4, 7), (5, 6)], 'learning_rate_init': [0.01, 0.05], 'activation': ['identity', 'logistic', 'tanh', 'relu']},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring=None, verbose=0)

Listing of all neural networks generated by GridSearchCV, using 3-fold cross-validation.

In [10]:
results = pd.DataFrame.from_dict(clf.cv_results_)
results

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_activation,param_hidden_layer_sizes,param_learning_rate_init,params,split0_test_score,split1_test_score,split2_test_score,mean_test_score,std_test_score,rank_test_score,split0_train_score,split1_train_score,split2_train_score,mean_train_score,std_train_score
0,0.135941,0.141183,0.000357,5.463353e-05,identity,"(1,)",0.01,"{'activation': 'identity', 'hidden_layer_sizes...",0.875000,0.913043,0.753623,0.847619,0.067576,33,0.847826,0.851064,0.914894,0.871261,0.030881
1,0.036299,0.005218,0.000309,9.381164e-06,identity,"(1,)",0.05,"{'activation': 'identity', 'hidden_layer_sizes...",0.861111,0.913043,0.797101,0.857143,0.047081,29,0.891304,0.851064,0.921986,0.888118,0.029041
2,0.047110,0.022264,0.000334,1.857014e-06,identity,"(7,)",0.01,"{'activation': 'identity', 'hidden_layer_sizes...",0.972222,0.956522,0.826087,0.919048,0.065347,12,0.978261,0.971631,1.000000,0.983297,0.012117
3,0.043567,0.003085,0.000365,7.439279e-05,identity,"(7,)",0.05,"{'activation': 'identity', 'hidden_layer_sizes...",0.972222,0.985507,0.840580,0.933333,0.065113,1,0.985507,0.964539,1.000000,0.983349,0.014557
4,0.054199,0.002430,0.000350,1.644954e-05,identity,"(1, 6)",0.01,"{'activation': 'identity', 'hidden_layer_sizes...",0.875000,0.913043,0.753623,0.847619,0.067576,33,0.891304,0.851064,0.914894,0.885754,0.026352
5,0.054986,0.000695,0.000330,1.191351e-05,identity,"(1, 6)",0.05,"{'activation': 'identity', 'hidden_layer_sizes...",0.875000,0.942029,0.753623,0.857143,0.077447,29,0.855072,0.851064,0.914894,0.873677,0.029191
6,0.055007,0.000889,0.000353,2.355965e-05,identity,"(2, 5)",0.01,"{'activation': 'identity', 'hidden_layer_sizes...",0.958333,0.956522,0.710145,0.876190,0.116159,25,0.985507,0.914894,0.957447,0.952616,0.029030
7,0.050661,0.002991,0.000312,5.388940e-06,identity,"(2, 5)",0.05,"{'activation': 'identity', 'hidden_layer_sizes...",0.972222,0.971014,0.826087,0.923810,0.068363,5,0.978261,0.971631,1.000000,0.983297,0.012117
8,0.052154,0.000413,0.000348,4.086497e-05,identity,"(3, 4)",0.01,"{'activation': 'identity', 'hidden_layer_sizes...",0.958333,0.971014,0.840580,0.923810,0.058454,5,0.978261,0.957447,1.000000,0.978569,0.017374
9,0.052812,0.000774,0.000324,2.219329e-05,identity,"(3, 4)",0.05,"{'activation': 'identity', 'hidden_layer_sizes...",0.972222,0.985507,0.739130,0.900000,0.112667,18,0.978261,0.957447,0.992908,0.976205,0.014550
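The `mean_test_score` column can be checked by hand. With `iid=True`, older scikit-learn versions weight each fold's score by the number of test samples in that fold rather than taking a plain average. The sketch below reproduces row 0 of the table; the fold sizes (72, 69, 69) are an assumption inferred from splitting 210 samples into 3 stratified folds, not values reported by the notebook.

```python
# Per-fold test accuracies copied from row 0 of the results table
split_test_scores = [0.875000, 0.913043, 0.753623]

# Assumed test-fold sizes (inferred: 210 samples in 3 stratified folds)
fold_sizes = [72, 69, 69]

# Plain average of the three fold scores
unweighted = sum(split_test_scores) / len(split_test_scores)

# Average weighted by the number of test samples in each fold
weighted = (sum(score * size for score, size in zip(split_test_scores, fold_sizes))
            / sum(fold_sizes))

print("Unweighted mean:", round(unweighted, 6))
print("Weighted mean:  ", round(weighted, 6))
```

Under these assumptions the weighted mean agrees with the 0.847619 reported for row 0, while the unweighted mean is slightly lower.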


Metrics of the model that achieved the best mean training accuracy across folds

In [11]:
print("Best mean training accuracy across folds = " + str(max(results['mean_train_score'])))

Best mean training accuracy across folds = 0.9880768835440437


In [12]:
results.loc[results['mean_train_score']==max(results['mean_train_score'])]

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_activation,param_hidden_layer_sizes,param_learning_rate_init,params,split0_test_score,split1_test_score,split2_test_score,mean_test_score,std_test_score,rank_test_score,split0_train_score,split1_train_score,split2_train_score,mean_train_score,std_train_score
68,0.074731,0.016695,0.000474,4.7e-05,relu,"(7,)",0.01,"{'activation': 'relu', 'hidden_layer_sizes': (...",0.944444,0.956522,0.811594,0.904762,0.065362,15,0.985507,0.978723,1.0,0.988077,0.008874


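Characteristics of the best model for the task. `best_estimator_` is selected by `mean_test_score` (rank 1 in `rank_test_score`), not by `mean_train_score`, which is why it can differ from the model with the highest training accuracy.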

In [13]:
clf.best_estimator_

MLPClassifier(activation='identity', alpha=0.0001, batch_size='auto',
       beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(7,), learning_rate='constant',
       learning_rate_init=0.05, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='lbfgs', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)
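The estimator returned here (identity activation) is not the row with the highest `mean_train_score` (relu activation). A minimal sketch of why: ranking by training accuracy and ranking by test accuracy can pick different rows. The three rows below are copied from the results shown in this notebook; the list-of-dicts layout is only for illustration.

```python
# A few rows copied from the cross-validation results
rows = [
    {"index": 2, "activation": "identity", "hidden_layer_sizes": (7,),
     "learning_rate_init": 0.01,
     "mean_test_score": 0.919048, "mean_train_score": 0.983297},
    {"index": 3, "activation": "identity", "hidden_layer_sizes": (7,),
     "learning_rate_init": 0.05,
     "mean_test_score": 0.933333, "mean_train_score": 0.983349},
    {"index": 68, "activation": "relu", "hidden_layer_sizes": (7,),
     "learning_rate_init": 0.01,
     "mean_test_score": 0.904762, "mean_train_score": 0.988077},
]

# Selecting on held-out folds versus selecting on the training folds
best_by_test = max(rows, key=lambda r: r["mean_test_score"])
best_by_train = max(rows, key=lambda r: r["mean_train_score"])

print("Best by test score: ", best_by_test["index"], best_by_test["activation"])
print("Best by train score:", best_by_train["index"], best_by_train["activation"])
```

On the full `results` DataFrame, the same comparison could be made with `results['mean_test_score'].idxmax()` and `results['mean_train_score'].idxmax()`, or by reading `clf.best_index_`. Test accuracy is usually the safer criterion, since training accuracy rewards models that memorize the training folds.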

Since we are working with little data, we chose the 'lbfgs' solver, which converges faster and uses little memory.