# Project 1 - Classification

- Moisés Botarro Ferraz Silva, 8504135
- Thales de Lima Kobosighawa,  9897884
- Victor Rozzatti Tornisiello, 9806867

# Implementation of a Multilayer Perceptron

To implement the Multilayer Perceptron, we started from the Layer and MLP classes implemented for Lab 2. The following changes were made:

### Layer class
- Added the attributes *d_weights_current*, *d_weights_old*, *d_bias_current* and *d_bias_old* to the Layer class in order to apply the delta rule with a momentum parameter when updating the network weights
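The delta rule with momentum can be illustrated in its textbook form, where each step is the learning rate times the current gradient term plus the momentum coefficient times the previous step. This is a minimal sketch under that assumption, not a copy of `Layer.update`; the function name `momentum_step` and the variable `grad_term` are hypothetical.

```python
import numpy as np

def momentum_step(weights, grad_term, prev_step, eta=0.5, alpha=0.5):
    """Return (new_weights, step) for one delta-rule update with momentum.

    grad_term stands for delta * input for each weight (hypothetical name);
    prev_step is the full step applied at the previous update.
    """
    step = eta * grad_term + alpha * prev_step
    # Return a copy so the stored previous step cannot be changed by later in-place writes
    return weights + step, step.copy()

w = np.zeros(2)
prev = np.zeros(2)
g = np.array([1.0, -2.0])

w, prev = momentum_step(w, g, prev)  # first step: only eta * g, there is no history yet
w, prev = momentum_step(w, g, prev)  # second step: eta * g plus alpha * previous step
print(w)
```

With a constant gradient term, the second step is larger than the first, which is the acceleration effect the momentum parameter is meant to provide.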

### MLP class
- Created the constructors MLPClassifier and MLPRegressor for the MLP class, which initialize it for classification or regression, respectively
- Added the methods *\__get_class_mapping*, *\__convert_class_labels_to_vectors* and *\__convert_class_vectors_to_labels* to the MLP class to convert class labels into binary vectors. For example, we can pass the MLP a training set with the classes (classA, classB, classC). These methods turn those classes into the vectors (100, 010 and 001), so the user does not have to do the conversion every time. Note that this mapping is performed only for classification
- Within each epoch during training, the examples are shuffled with the *shuffle* function to avoid saturating the neuron outputs
- Added a convergence test based on the Kramer and Sangiovanni-Vincentelli criterion, which stops training when the norm of the weight gradient is smaller than a desired tolerance. In the code, the test is applied to the mean squared error: training stops when the error accumulated before and after the weight updates of an epoch differs by at most the tolerance
- Possibility of passing a maximum number of training epochs, to stop training before the convergence test is satisfied. This is useful when the training set is very large and not all examples can be classified correctly.
- Mapping of the network's binary output vectors back to class labels in the *predict* method during classification
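The label-to-vector conversion described above can be sketched outside the class as follows. This is a minimal illustration that mirrors the idea of the mapping methods; the function name `build_class_mapping` and the three labels are hypothetical.

```python
import numpy as np

def build_class_mapping(labels):
    """Build label -> one-hot vector and one-hot tuple -> label dictionaries."""
    classes = np.unique(labels)  # sorted unique labels
    mapping, unmapping = {}, {}
    for i, label in enumerate(classes):
        vec = np.zeros(len(classes))
        vec[i] = 1
        mapping[label] = vec
        # Arrays are not hashable, so the reverse lookup uses a tuple as key
        unmapping[tuple(vec)] = label
    return mapping, unmapping

mapping, unmapping = build_class_mapping(["classA", "classB", "classC", "classA"])

encoded = [mapping[c] for c in ["classB", "classA"]]
decoded = [unmapping[tuple(v)] for v in encoded]
print(encoded)
print(decoded)
```

Because `np.unique` sorts the labels, the position of the 1 in each vector follows the sorted label order, not the order in which labels first appear.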

In [1]:
import numpy as np
import random
import math
from IPython.display import display, clear_output
from sklearn.utils import shuffle as shuffle_data

# Layer represents a MLP Layer
# It has two main properties:
#      - a weight matrix containing the weights of the layer's neurons. Each line represents a neuron and 
#        the columns represent its corresponding weights
#      - a bias vector, containing the neurons' biases
# Since during the backpropagation we need to compute the weight variations using the old weights, the 
# d_weights_current and d_bias_current properties store the variations until the update method is called
class Layer:
    # Create a new Layer with 'size' neurons, each one linked to 'inputs_size' inputs
    def __init__(self, size, inputs_size):
        self.size = size
        self.inputs_size = inputs_size
        self.weights = np.array([[random.uniform(-0.1, 0.1) for j in range(inputs_size)] for i in range(size)])
        self.bias = np.array([random.uniform(-0.1,0.1) for i in range(size)])
        
        self.d_weights_current = np.zeros((size, inputs_size))
        self.d_bias_current = np.zeros(size)
        self.d_weights_old = np.zeros((size, inputs_size))
        self.d_bias_old = np.zeros(size)
    
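    # update applies the stored variations to the weights and bias: eta scales the current
    # variation and alpha (momentum) scales the previous one
    def update(self, eta, alpha):        
        self.weights = self.weights + eta*self.d_weights_current + alpha*self.d_weights_old 
        self.bias = self.bias + eta*self.d_bias_current + alpha*self.d_bias_old
        
        # Remember the current variations for the next update. These are plain reference assignments,
        # so later in-place writes to d_*_current are also visible through d_*_old
        self.d_weights_old = self.d_weights_current
        self.d_bias_old = self.d_bias_current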
        
    # description prints a layer description
    def description(self):
        print("Layer Info")
        print("Weights: \n", self.weights)
        print("Bias: \n ", self.bias)

def logistic(x):
    return 1.0/(1.0+ math.exp(-x))

logistic_vec = np.vectorize(logistic)

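# Derivative of the logistic function written in terms of its output: x is expected to be
# logistic(z), not z itself
def logistic_derivate(x):
    return x*(1.0-x)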
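The derivative used in the backpropagation takes the neuron output as its argument, which relies on the identity $\sigma'(z) = \sigma(z)(1-\sigma(z))$. The sketch below checks that identity numerically with a central difference; the name `logistic_derivative_from_output` and the test point `z = 0.3` are assumptions made for this illustration.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_derivative_from_output(y):
    # y must already be logistic(z)
    return y * (1.0 - y)

z = 0.3
h = 1e-6

# Central finite difference approximation of the derivative at z
numeric = (logistic(z + h) - logistic(z - h)) / (2 * h)
# Closed form evaluated on the output of the logistic function
analytic = logistic_derivative_from_output(logistic(z))

print(numeric, analytic)
```

Passing `z` itself instead of `logistic(z)` to the derivative would give a different, incorrect value, so the argument convention matters.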

In [2]:
class MLP:
    # MLP creation. One might pass the MLP layers as parameters or add them later using the add_layer method.
    # The classifier parameter defines if the MLP will be used for a classification or regression problem
    def __init__(self, *layers, classifier=True):
        self.classifier = classifier
        if classifier:
            # Map each class label to a vector with a single 1
            # Ex: Class 0 -> [1,0]
            #     Class 1 -> [0,1]
            self.class_mapping = dict()  
            # Unmap each class vector to the corresponding class label
            # Ex: [1,0] -> Class 0 
            #     [0,1] -> Class 1
            self.class_unmapping = dict()
            
        self.layers = list()
        for layer in layers:
            self.add_layer(layer)
    
    # Shortcut to create a classifier MLP
    @classmethod
    def MLPClassifier(cls, *layers):
        return cls(*layers, classifier=True)
    
    # Shortcut to create a regressor MLP
    @classmethod
    def MLPRegressor(cls, *layers):
        return cls(*layers, classifier=False)
    
    # add_layer adds a new layer on the MLP. It verifies whether or not the new layer is compatible with the MLP
    def add_layer(self, layer):
        # If there's already a layer in the MLP, verify if the new layer is compatible
        if len(self.layers) > 0:
            if layer.inputs_size != self.layers[-1].size:
                print("The new layer is incompatible with the MLP")
                print("Please, use a layer where each neuron has the same number of inputs as the number "
                      "of neurons in the MLP last layer")
        
        self.layers.append(layer)
    
    # description prints the info about the MLP layers
    def description(self):
        print("MLP Classifier?: ", self.classifier)
        print("-------------------------")
        print("MLP Info:")
        for i, layer in enumerate(self.layers):
            print("--- Layer: %d ---" % i)
            layer.description()
            
    # __get_class_mapping gets the class labels in the classes list and builds the mapping dictionaries
    # class_mapping and class_unmapping
    def __get_class_mapping(self, classes):
        class_labels = np.unique(classes)
        
        for c in range(len(class_labels)):
            class_label = class_labels[c]
            class_vector = np.zeros(len(class_labels))
            class_vector[c] = 1
    
            self.class_mapping[class_label] = class_vector
            
            # We can't use a list as a hash key. So transform it into a tuple
            self.class_unmapping[tuple(class_vector)] = class_label
        
    # __convert_class_labels_to_vectors converts a list with class labels to a list with 
    # vectors that maps each class label
    def __convert_class_labels_to_vectors(self, class_labels):
        return [self.class_mapping[c] for c in class_labels]
    
    # __convert_class_vectors_to_labels converts a list with class vectors to a list with 
    # the corresponding class labels
    def __convert_class_vectors_to_labels(self, class_vectors):
        return [self.class_unmapping[tuple(class_vector)] for class_vector in class_vectors]
        
        
    # fast_forward computes the output for a given input vector
    def fast_forward(self,input_v):
        # We need to store each layer input in order to perform the backpropagation
        self.inputs = list()
    
        # The input is applied in a layer weights matrix and the bias is added in the result
        # Then, the logistic function is applied to each layer neuron result
        # For a layer, we have a final output vector where each component i represents the output
        # of the neuron i
        for layer in self.layers:
            self.inputs.append(input_v)
            output = logistic_vec(layer.weights @ input_v + layer.bias)
            
            # The output of the current layer is the input of the next one
            input_v = output
        
        return output
    
    # train trains the MLP using the examples passed in the samples parameter
    # The expected output for each example must be passed in the classes parameter;
    # eta represents the MLP learning rate;
    # alpha represents the momentum coefficient;
    # tol represents the tolerance. An epoch's error is accumulated before and after each weight update,
    #     and training stops once the two accumulated values differ by at most tol;
    # epoch_max is the maximum number of epochs;
    # print_status prints the output for each example during the training phase;
    # shuffle reorders the examples at the start of every epoch
    def train(self, samples, classes, eta=0.5, alpha=0, tol=1e-2, epoch_max=2000, 
              print_status=False, shuffle=True):
        # Map the class labels to output vectors if it's a classification problem
        if self.classifier:
            self.__get_class_mapping(classes)
            classes = self.__convert_class_labels_to_vectors(classes)
                
        error = tol
        new_error = 3*tol
        epoch = 0
        
        # The training stops when the max number of epochs is reached or the convergence test is met.
        # Instead of evaluating the gradient norm as in the Kramer and Sangiovanni-Vincentelli criterion,
        # the loop considers that the BP converged when the mean squared error accumulated before and
        # after the weight updates of an epoch differs by at most the given tolerance
        while (abs(new_error - error) > tol and epoch < epoch_max):
            epoch += 1
            error = 0
            new_error = 0
            
            # Shuffles samples to avoid saturation when training with samples belonging to the same class
            # one after another
            if shuffle:
                samples, classes = shuffle_data(samples, classes)
            
            for input_v, t in zip(samples, classes):  
                # ---- Compute the output for the given input vector ----
                output = self.fast_forward(input_v)
                
                # Compute the mean squared error before the backpropagation
                error_sample = pow((np.array(t)-np.array(output)),2)
                # We need to sum the error of each component when the output is a vector
                error += sum(error_sample)/len(samples)
                
                if (print_status == True):
                    print("\ttraining example: %s from class %s" % (input_v, t), end = " ")
                    print("y = ", output)
     
                # ---- Backpropagation ----
                # Compute the weight variations of each layer
                # Remark: the variations are stored as layer properties (d_weights_current and
                # d_bias_current) and the layers are updated only once the backpropagation is finished
                # It's necessary to do so in order to compute the delta value for the inner layers. We need 
                # to use the weights that caused the error to compute the delta instead of the updated weights
                for l in reversed(range(len(self.layers))): # Traverse the layers in reversed order
                    layer = self.layers[l]
             
                    deltas = list()
                    # Compute the delta for each layer neuron n
                    for n in range(len(layer.weights)):
                        # Last Layer
                        if l == (len(self.layers)-1):
                            delta = (t[n]-output[n])*logistic_derivate(output[n])
                            
                        # Inner Layer
                        else:
                            # output of the current layer is the input of the next one
                            neuron_output = self.inputs[l+1][n]
                            # weights of each neuron output
                            errors_weights = self.layers[l+1].weights[:,n]
                            
                            delta = np.dot(delta_next_layer,errors_weights)*logistic_derivate(neuron_output)
                              
                        # Computes the weights and bias variation for the neuron n
                        for w in range(len(layer.weights[n])):
                            layer.d_weights_current[n][w] = delta*self.inputs[l][w]
                        layer.d_bias_current[n] = delta*1 # bias input = 1
                        
                        #for w in range(len(layer.weights[n])):
                        #    layer.updated_weights[n][w] = layer.weights[n][w] + eta*delta*self.inputs[l][w]
                        #layer.updated_bias[n] = layer.bias[n] + (eta*delta*1) # bias input = 1

                        # Store the neuron delta
                        deltas.append(delta)
                    
                    # The neurons' delta of the current layer will be used to compute the deltas of the 
                    # next inner layer
                    delta_next_layer = np.array(deltas)
                     
                # Once the backpropagation is finished for the current example, update all the weights and biases
                for layer in self.layers:
                    layer.update(eta, alpha)
                    
                # Compute the mean squared error again, after the weight update
                output = self.fast_forward(input_v)
                error_sample = pow((np.array(t)-np.array(output)),2)
                #print("error sample: ",error_sample)
                new_error += sum(error_sample)/len(samples)
            
            # End of an epoch
            if epoch%1 == 0: # epoch%1 is always 0, so the status is printed after every epoch
                clear_output(wait=True)
                display("End of epoch " + str(epoch) + ". Total Error = " + str(new_error))
        
        # End of training         
        clear_output(wait=True)
        display("End of epoch " + str(epoch) + ". Total Error = " + str(new_error))
        
    # predict gets a list of input samples and returns a list with the predicted outputs
    def predict(self, samples):
        outputs = list()
        for input_v in samples:
            probs = self.fast_forward(input_v)
            
            if self.classifier:
                class_pos = np.argmax(probs)
                output = np.zeros(len(probs))
                output[class_pos] = 1
            
                #outputs.append(self.class_unmapping[tuple(output)])
                outputs.append(output)
                
            else:
                outputs.append(probs)
    
        if self.classifier:
            return self.__convert_class_vectors_to_labels(outputs)
        
        else:
            return outputs

# Data Preprocessing

To normalize the data and avoid saturating the neurons, we scale the data with the following transformation:

$$ x' = \frac{x-x_{min}}{x_{max}-x_{min}}$$

where $x_{max}$ and $x_{min}$ are, respectively, the largest and smallest values of a given attribute.

This guarantees that all data fed to the network lies in the interval [0,1].

The normalize_data function transforms the data so that each attribute has mean 0 and variance 1. However, since that does not guarantee values in the interval [0,1], we will not use it. We ran some tests and, in some cases, the neurons saturate when this transformation is used.
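The difference between the two transformations can be seen on a tiny, made-up matrix. This is a minimal vectorized sketch that assumes no column is constant (a constant column would cause a division by zero in both transformations); the array values are purely illustrative.

```python
import numpy as np

# Hypothetical data: 3 examples, 2 attributes
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [4.0, 40.0]])

# Min-max scaling, column by column: every value ends up in [0, 1]
col_min = data.min(axis=0)
col_max = data.max(axis=0)
scaled = (data - col_min) / (col_max - col_min)

# Standardization: mean 0 and variance 1 per column, but values may fall outside [0, 1]
standardized = (data - data.mean(axis=0)) / data.std(axis=0)

print(scaled)
print(standardized)
```

The standardized columns contain negative values, which is why that transformation gives no guarantee about the [0,1] interval.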

In [4]:
# normalize_data transforms the data so that every column has mean 0 and variance 1
def normalize_data(data):
    normalized_columns = list()
    for c in range(len(data[0])):
        col = data[:,c]
        normalized_columns.append((col - np.mean(col))/np.std(col))

    return np.array(normalized_columns).T

In [5]:
# scale_data transforms the data so that every column lies in the interval [0,1]
def scale_data(data):
    normalized_columns = list()
    for c in range(len(data[0])):
        col = data[:,c]
        normalized_columns.append((col-np.min(col))/(np.max(col)-np.min(col)))

    return np.array(normalized_columns).T

# Model Evaluation

To evaluate the model, we compute its accuracy with the *evaluate* function below. It computes the overall accuracy by comparing a vector of true classes with the classes predicted by the model. In addition, to evaluate the accuracy for each class separately, the confusion matrix is built and the model's performance on each class is derived from it.
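The relationship between the confusion matrix and the two accuracy figures can be sketched with a small, made-up matrix (rows are true classes, columns are predicted classes). The numbers below are illustrative only and are not results from this project.

```python
import numpy as np

# Hypothetical confusion matrix for 3 classes (rows: true, columns: predicted)
cm = np.array([[2, 0, 8],
               [0, 5, 5],
               [0, 1, 79]])

# Overall accuracy: correctly classified examples over all examples
overall = np.trace(cm) / cm.sum()

# Per-class accuracy: diagonal entry over the row total of each true class
per_class = np.diag(cm) / cm.sum(axis=1)

# Unweighted average of the per-class values
average_per_class = per_class.mean()

print(overall, per_class, average_per_class)
```

In this example the overall accuracy is dominated by the largest class, while the average per-class accuracy exposes the weak performance on the two smaller ones.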

In [6]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
import pandas as pd

def evaluate(real_classes, predicted_classes, display=True):
    acc = accuracy_score(real_classes, predicted_classes)
    
    class_labels = np.sort(np.unique(real_classes))
    cm = confusion_matrix(real_classes, predicted_classes, labels=class_labels)
    df = pd.DataFrame(cm)
    df.columns = class_labels
    df.index = class_labels
    
    # Accuracy per class: correctly classified examples of the class divided by its row total (recall)
    accs = list()
    for c in range(len(cm)):
        accs.append(cm[c,c]/sum(cm[c,:]))
    df["Accuracy"] = accs
    
    avg_acc = np.average(accs)
    
    if display == True:
        print("Accuracy: %.2f%%" % (acc*100))
        
        print("Confusion Matrix and Accuracy per class:")
        print(df)

        print("Average accuracy per class: %.2f%%" % (avg_acc*100))
        
    return acc, avg_acc

# Study of the Meta-Parameters

For the classification problem, we take the provided data set and normalize it so that every feature value lies in the interval [0,1]. This is done to avoid saturating the neuron outputs and to improve the convergence of the learning algorithm.

Next, we take a training set consisting of 70% of the original data. We evaluate the impact of the network architecture, as well as of varying the learning parameters, on the accuracy obtained when classifying the test set.

We compute the overall accuracy and the per-class accuracy. Since we are not imposing different penalties for errors made on particular classes, we consider the best architecture to be the one with the highest overall accuracy, even though this may not translate into a high per-class accuracy. Later, we will address class balancing so that the network learns all classes adequately.

## Original Data Set

In [14]:
import pandas as pd

df = pd.read_csv('winequality-red.csv')
df.head(5)

| Unnamed: 0.1 | Unnamed: 0 | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | Mid |
| 1 | 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | Mid |
| 2 | 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | Mid |
| 3 | 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | Mid |
| 4 | 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | Mid |


In [15]:
# split data in inputs and classes
inputs = df[df.columns[1:-1]].values
classes = df[df.columns[-1]].values

print(inputs[0:2,:])
print(classes)

[[ 7.4     0.7     0.      1.9     0.076  11.     34.      0.9978  3.51
   0.56    9.4   ]
 [ 7.8     0.88    0.      2.6     0.098  25.     67.      0.9968  3.2
   0.68    9.8   ]]
['Mid' 'Mid' 'Mid' ... 'Mid' 'Mid' 'Mid']


In [16]:
# How many examples exist for each class?
unique, counts = np.unique(classes, return_counts=True)
print(unique)
print(counts)

['Bad' 'Good' 'Mid']
[  63  217 1319]


As we can see, the classes in the data set are extremely imbalanced. Later, we will propose methods to work around this problem, using under- and oversampling.
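To put the imbalance in perspective, the class counts printed above are enough to compute what a trivial model that always predicts the most frequent class would score on the full data set. This is a small arithmetic sketch, not part of the original experiments; the variable names are hypothetical.

```python
# Class counts taken from the output above
counts = {"Bad": 63, "Good": 217, "Mid": 1319}

total = sum(counts.values())
majority = max(counts, key=counts.get)

# Overall accuracy of always predicting the majority class
baseline_accuracy = counts[majority] / total

# Average per-class accuracy of the same trivial model: 100% on one class, 0% on the others
baseline_avg_per_class = 1 / len(counts)

print(majority, total)
print("Baseline accuracy: %.2f%%" % (baseline_accuracy * 100))
print("Baseline average accuracy per class: %.2f%%" % (baseline_avg_per_class * 100))
```

Any overall accuracy should therefore be read against this baseline, and the average per-class accuracy is the more informative figure for the minority classes.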

In [17]:
scaled_inputs = scale_data(inputs)

## Split into Training and Test Sets

So that the training and test sets keep the same relative proportion of examples from each class, we perform a stratified split.
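The idea behind a stratified split can be sketched with NumPy alone: split the indices of each class separately and then join the parts. This is only an illustration of the principle, not how `train_test_split` is implemented; the function name `stratified_split_indices` and the synthetic labels are hypothetical.

```python
import numpy as np

def stratified_split_indices(labels, test_size=0.3, seed=42):
    """Return (train_idx, test_idx) keeping the class proportions in both parts."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for label in np.unique(labels):
        idx = np.flatnonzero(labels == label)  # positions of this class
        rng.shuffle(idx)
        n_test = int(round(test_size * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Synthetic, imbalanced labels: 10% Bad, 20% Good, 70% Mid
labels = np.array(["Bad"] * 10 + ["Good"] * 20 + ["Mid"] * 70)
train_idx, test_idx = stratified_split_indices(labels)

for part_name, part in (("train", train_idx), ("test", test_idx)):
    unique, counts = np.unique(labels[part], return_counts=True)
    print(part_name, dict(zip(unique, counts / len(part))))
```

Both parts keep the 10/20/70 proportions of the synthetic labels, which is the property the stratified split is used for here.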

In [20]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(scaled_inputs, classes, test_size=0.3, 
                                                    stratify=classes,random_state=42)

## Testing different Layer Architectures

We test different layer architectures. First, we consider a single hidden layer, varying the number of neurons among:
- Half the number of inputs: 5
- The number of inputs: 11
- Twice the number of inputs: 22

Then we consider architectures with two hidden layers, varying each one in the same way as described above.
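The set of architectures described above can be enumerated programmatically. This is a small sketch that only lists the hidden-layer sizes; the variable names are hypothetical and the experiments themselves are run one by one in the cells that follow.

```python
from itertools import product

n_inputs = 11

# Half, equal to, and twice the number of inputs
sizes = [n_inputs // 2, n_inputs, 2 * n_inputs]

# One hidden layer: one entry per size
one_layer = [(n,) for n in sizes]

# Two hidden layers: every ordered pair of sizes
two_layers = list(product(sizes, repeat=2))

architectures = one_layer + two_layers
for arch in architectures:
    print(" - ".join("%d neurons" % n for n in arch))
```

This gives 3 single-layer and 9 two-layer configurations, 12 in total.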

### 1 Layer - 5 Neurons

In [21]:
# Training
N1 = 5
N = len(X_train[0])
n_classes = len(np.unique(y_train))

random.seed(0)
mlp = MLP.MLPClassifier(Layer(N1, N), Layer(n_classes, N1))
mlp.train(X_train, y_train, eta=0.5, alpha=0.5, tol=1e-4, epoch_max=400, print_status=False, shuffle=True)

'End of epoch 400. Total Error = 0.17097283090389398'

In [22]:
predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

Accuracy: 82.92%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     0     0   19  0.000000
Good    0    15   50  0.230769
Mid     0    13  383  0.967172
Average accuracy per class: 39.93%


(0.8291666666666667, 0.3993136493136493)

### 1 Layer - 11 Neurons

In [23]:
N1 = 11
N = len(X_train[0])
n_classes = len(np.unique(y_train))

random.seed(0)
mlp = MLP.MLPClassifier(Layer(N1, N), Layer(n_classes, N1))
mlp.train(X_train, y_train, eta=0.5, alpha=0.5, tol=1e-4, epoch_max=400, print_status=False, shuffle=True)

'End of epoch 400. Total Error = 0.15555351711307336'

In [24]:
predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

Accuracy: 83.54%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     0     0   19  0.000000
Good    0    25   40  0.384615
Mid     5    15  376  0.949495
Average accuracy per class: 44.47%


(0.8354166666666667, 0.44470344470344475)

### 1 Layer - 22 Neurons

In [25]:
N1 = 22
N = len(X_train[0])
n_classes = len(np.unique(y_train))

random.seed(0)
mlp = MLP.MLPClassifier(Layer(N1, N), Layer(n_classes, N1))
mlp.train(X_train, y_train, eta=0.5, alpha=0.5, tol=1e-4, epoch_max=400, print_status=False, shuffle=True)

'End of epoch 400. Total Error = 0.15584908720846483'

In [26]:
predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

Accuracy: 82.29%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     1     0   18  0.052632
Good    0    28   37  0.430769
Mid     5    25  366  0.924242
Average accuracy per class: 46.92%


(0.8229166666666666, 0.4692144113196745)

### 2 layers: 5 Neurons - 5 Neurons

In [27]:
N1 = 5
N2 = 5
N = len(X_train[0])
n_classes = len(np.unique(y_train))

random.seed(0)
mlp = MLP.MLPClassifier(Layer(N1, N), Layer(N2, N1), Layer(n_classes, N2))
mlp.train(X_train, y_train, eta=0.5, alpha=0.5, tol=1e-4, epoch_max=400, print_status=False, shuffle=True)

'End of epoch 400. Total Error = 0.17256073450057421'

In [28]:
predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

Accuracy: 83.96%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     1     1   17  0.052632
Good    0    24   41  0.369231
Mid     2    16  378  0.954545
Average accuracy per class: 45.88%


(0.8395833333333333, 0.45880260090786407)

### 2 layers: 5 Neurons - 11 Neurons

In [29]:
N1 = 5
N2 = 11
N = len(X_train[0])
n_classes = len(np.unique(y_train))

random.seed(0)
mlp = MLP.MLPClassifier(Layer(N1, N), Layer(N2, N1), Layer(n_classes, N2))
mlp.train(X_train, y_train, eta=0.5, alpha=0.5, tol=1e-4, epoch_max=400, print_status=False, shuffle=True)

'End of epoch 400. Total Error = 0.17131903178171723'

In [30]:
predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

Accuracy: 84.58%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     0     0   19  0.000000
Good    0    28   37  0.430769
Mid     1    17  378  0.954545
Average accuracy per class: 46.18%


(0.8458333333333333, 0.4617715617715618)

### 2 layers: 5 Neurons - 22 Neurons

In [31]:
N1 = 5
N2 = 22
N = len(X_train[0])
n_classes = len(np.unique(y_train))

random.seed(0)
mlp = MLP.MLPClassifier(Layer(N1, N), Layer(N2, N1), Layer(n_classes, N2))
mlp.train(X_train, y_train, eta=0.5, alpha=0.5, tol=1e-4, epoch_max=400, print_status=False, shuffle=True)

'End of epoch 400. Total Error = 0.18312974578027885'

In [32]:
predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

Accuracy: 85.21%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     0     1   18  0.000000
Good    0    37   28  0.569231
Mid     0    24  372  0.939394
Average accuracy per class: 50.29%


(0.8520833333333333, 0.5028749028749029)

### 2 layers: 11 Neurons - 5 Neurons

In [33]:
N1 = 11
N2 = 5
N = len(X_train[0])
n_classes = len(np.unique(y_train))

random.seed(0)
mlp = MLP.MLPClassifier(Layer(N1, N), Layer(N2, N1), Layer(n_classes, N2))
mlp.train(X_train, y_train, eta=0.5, alpha=0.5, tol=1e-4, epoch_max=400, print_status=False, shuffle=True)

'End of epoch 400. Total Error = 0.157005940957767'

In [34]:
predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

Accuracy: 84.58%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     2     0   17  0.105263
Good    0    24   41  0.369231
Mid     3    13  380  0.959596
Average accuracy per class: 47.80%


(0.8458333333333333, 0.4780299622404886)

### 2 layers: 11 Neurons - 11 Neurons

In [45]:
N1 = 11
N2 = 11
N = len(X_train[0])
n_classes = len(np.unique(y_train))

random.seed(0)
mlp = MLP.MLPClassifier(Layer(N1, N), Layer(N2, N1), Layer(n_classes, N2))
mlp.train(X_train, y_train, eta=0.5, alpha=0.5, tol=1e-4, epoch_max=400, print_status=False, shuffle=True)

'End of epoch 400. Total Error = 0.15924146409084203'

In [46]:
predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

Accuracy: 85.00%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     1     0   18  0.052632
Good    0    34   31  0.523077
Mid     5    18  373  0.941919
Average accuracy per class: 50.59%


(0.85, 0.5058758979811612)

### 2 layers: 11 Neurons - 22 Neurons

In [37]:
N1 = 11
N2 = 22
N = len(X_train[0])
n_classes = len(np.unique(y_train))

random.seed(0)
mlp = MLP.MLPClassifier(Layer(N1, N), Layer(N2, N1), Layer(n_classes, N2))
mlp.train(X_train, y_train, eta=0.5, alpha=0.5, tol=1e-4, epoch_max=400, print_status=False, shuffle=True)

'End of epoch 400. Total Error = 0.17482560382191964'

In [38]:
predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

Accuracy: 83.75%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     1     0   18  0.052632
Good    0    24   41  0.369231
Mid     5    14  377  0.952020
Average accuracy per class: 45.80%


(0.8375, 0.4579608500661132)

### 2 layers: 22 Neurons - 5 Neurons

In [39]:
N1 = 22
N2 = 5
N = len(X_train[0])
n_classes = len(np.unique(y_train))

random.seed(0)
mlp = MLP.MLPClassifier(Layer(N1, N), Layer(N2, N1), Layer(n_classes, N2))
mlp.train(X_train, y_train, eta=0.5, alpha=0.5, tol=1e-4, epoch_max=400, print_status=False, shuffle=True)

'End of epoch 400. Total Error = 0.15610677816108867'

In [40]:
predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

Accuracy: 83.12%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     1     1   17  0.052632
Good    0    25   40  0.384615
Mid     5    18  373  0.941919
Average accuracy per class: 45.97%


(0.83125, 0.459722051827315)

### 2 layers: 22 Neurons - 11 Neurons

In [41]:
N1 = 22
N2 = 11
N = len(X_train[0])
n_classes = len(np.unique(y_train))

random.seed(0)
mlp = MLP.MLPClassifier(Layer(N1, N), Layer(N2, N1), Layer(n_classes, N2))
mlp.train(X_train, y_train, eta=0.5, alpha=0.5, tol=1e-4, epoch_max=400, print_status=False, shuffle=True)

'End of epoch 400. Total Error = 0.1499731420710275'

In [42]:
predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

Accuracy: 82.71%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     3     0   16  0.157895
Good    0    24   41  0.369231
Mid     8    18  370  0.934343
Average accuracy per class: 48.72%


(0.8270833333333333, 0.4871563134721029)

### 2 layers: 22 Neurons - 22 Neurons

In [43]:
N1 = 22
N2 = 22
N = len(X_train[0])
n_classes = len(np.unique(y_train))

random.seed(0)
mlp = MLP.MLPClassifier(Layer(N1, N), Layer(N2, N1), Layer(n_classes, N2))
mlp.train(X_train, y_train, eta=0.5, alpha=0.5, tol=1e-4, epoch_max=400, print_status=False, shuffle=True)

'End of epoch 400. Total Error = 0.15165697320669774'

In [44]:
predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

Accuracy: 85.21%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     1     0   18  0.052632
Good    0    32   33  0.492308
Mid     2    18  376  0.949495
Average accuracy per class: 49.81%


(0.8520833333333333, 0.49814474025000344)

## Number of epochs used during Training

Let us analyze how the number of training epochs affects the accuracy obtained on the test set. To do so, we use the network architecture with the best performance. The two-layer architecture with 5 and 22 neurons reached the same overall accuracy (85.21%) as the one with 22 neurons in both layers. However, since the average per-class accuracy is better in the first case (50.29% vs. 49.81%) and the network is smaller, which makes training easier, we use this model in the following steps.

In [47]:
accs = dict()
best_N1 = 5
best_N2 = 21  # the runs below use 21 neurons here; the architecture study above used 22

### 200 epochs

In [57]:
epochs = 200

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=0.5, alpha=0.5, tol=1e-4, epoch_max=epochs, print_status=False, shuffle=True)

predicted_train = mlp.predict(X_train)
predicted_test = mlp.predict(X_test)

'End of epoch 200. Total Error = 0.19357673538234624'

In [58]:
print("=== TRAINING SET ===")
evaluate(y_train, predicted_train)
print("=== TEST SET ===")
accs[epochs] = evaluate(y_test, predicted_test)

=== TRAINING SET ===
Accuracy: 85.61%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     5     1   38  0.113636
Good    0    64   88  0.421053
Mid     2    32  889  0.963164
Average accuracy per class: 49.93%
=== TEST SET ===
Accuracy: 83.33%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     0     0   19  0.000000
Good    0    21   44  0.323077
Mid     1    16  379  0.957071
Average accuracy per class: 42.67%


### 400 epochs

In [59]:
epochs = 400

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=0.5, alpha=0.5, tol=1e-4, epoch_max=epochs, print_status=False, shuffle=True)

predicted_train = mlp.predict(X_train)
predicted_test = mlp.predict(X_test)

'End of epoch 400. Total Error = 0.1723300286170253'

In [60]:
print("=== TRAINING SET ===")
evaluate(y_train, predicted_train)
print("=== TEST SET ===")
accs[epochs] = evaluate(y_test, predicted_test)

=== TRAINING SET ===
Accuracy: 87.58%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad    10     1   33  0.227273
Good    0    81   71  0.532895
Mid     7    27  889  0.963164
Average accuracy per class: 57.44%
=== TEST SET ===
Accuracy: 84.79%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     3     0   16  0.157895
Good    0    28   37  0.430769
Mid     4    16  376  0.949495
Average accuracy per class: 51.27%


### 800 epochs

In [61]:
epochs = 800

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=0.5, alpha=0.5, tol=1e-4, epoch_max=epochs, print_status=False, shuffle=True)

predicted_train = mlp.predict(X_train)
predicted_test = mlp.predict(X_test)

'End of epoch 800. Total Error = 0.15944656209494504'

In [62]:
print("=== TRAINING SET ===")
evaluate(y_train, predicted_train)
print("=== TEST SET ===")
accs[epochs] = evaluate(y_test, predicted_test)

=== TRAINING SET ===
Accuracy: 88.03%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad    13     1   30  0.295455
Good    0   109   43  0.717105
Mid     9    51  863  0.934995
Average accuracy per class: 64.92%
=== TEST SET ===
Accuracy: 82.92%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     2     1   16  0.105263
Good    0    39   26  0.600000
Mid     6    33  357  0.901515
Average accuracy per class: 53.56%


### 1000 epochs

In [63]:
epochs = 1000

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=0.5, alpha=0.5, tol=1e-4, epoch_max=epochs, print_status=False, shuffle=True)

predicted_train = mlp.predict(X_train)
predicted_test = mlp.predict(X_test)

'End of epoch 1000. Total Error = 0.15061523549796377'

In [64]:
print("=== TRAINING SET ===")
evaluate(y_train, predicted_train)
print("=== TEST SET ===")
accs[epochs] = evaluate(y_test, predicted_test)

=== TRAINING SET ===
Accuracy: 88.47%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad    11     1   32  0.250000
Good    0   116   36  0.763158
Mid     9    51  863  0.934995
Average accuracy per class: 64.94%
=== TEST SET ===
Accuracy: 84.58%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     2     1   16  0.105263
Good    0    41   24  0.630769
Mid     3    30  363  0.916667
Average accuracy per class: 55.09%


### 2000 epochs

In [65]:
epochs = 2000

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=0.5, alpha=0.5, tol=1e-4, epoch_max=epochs, print_status=False, shuffle=True)

predicted_train = mlp.predict(X_train)
predicted_test = mlp.predict(X_test)

'End of epoch 2000. Total Error = 0.14620087345741456'

In [66]:
print("=== TRAINING SET ===")
evaluate(y_train, predicted_train)
print("=== TEST SET ===")
accs[epochs] = evaluate(y_test, predicted_test)

=== TRAINING SET ===
Accuracy: 85.97%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad    12     2   30  0.272727
Good    1    96   55  0.631579
Mid    23    46  854  0.925244
Average accuracy per class: 60.98%
=== TEST SET ===
Accuracy: 81.67%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     3     1   15  0.157895
Good    1    31   33  0.476923
Mid     8    30  358  0.904040
Average accuracy per class: 51.30%


In [67]:
print(accs)

{200: (0.8333333333333334, 0.42671587671587674), 400: (0.8479166666666667, 0.5127196390354286), 800: (0.8291666666666667, 0.5355927698032961), 1000: (0.8458333333333333, 0.5508996851102114), 2000: (0.8166666666666667, 0.5129527392685288)}


In [68]:
epochs_accs = pd.DataFrame(accs)
epochs_accs.index = ["Total accuracy", "Accuracy per class"]
print(epochs_accs)

                        200       400       800       1000      2000
Total accuracy      0.833333  0.847917  0.829167  0.845833  0.816667
Accuracy per class  0.426716  0.512720  0.535593  0.550900  0.512953


Note that although the loss keeps decreasing at 800, 1000 and 2000 epochs, test-set accuracy does not improve: total accuracy falls to 82.92% at 800 epochs and to 81.67% at 2000 epochs, and the 84.58% reached at 1000 epochs is still below the 84.79% obtained with 400 epochs. This suggests that the model is starting to overfit. In the next steps we therefore take 400 epochs as the maximum number of training iterations, since it keeps total accuracy high while raising the per-class accuracy relative to 200 epochs.

In [70]:
best_epochs = 400
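
The comparison behind this choice can be reproduced from the `accs` dictionary printed earlier. The sketch below is only illustrative: it copies the reported test-set values (rounded) into a literal dictionary and uses a hypothetical helper, `best_by`, to find the epoch budget that maximizes each metric.

```python
# Test-set results reported above: epochs -> (total accuracy, average per-class accuracy).
# Values are copied (rounded) from the printed `accs` dictionary.
accs = {
    200: (0.833333, 0.426716),
    400: (0.847917, 0.512720),
    800: (0.829167, 0.535593),
    1000: (0.845833, 0.550900),
    2000: (0.816667, 0.512953),
}


def best_by(results, metric_index):
    """Hypothetical helper: key whose tuple has the largest value at metric_index."""
    return max(results, key=lambda key: results[key][metric_index])


best_total = best_by(accs, 0)      # epoch budget with the highest total accuracy
best_per_class = best_by(accs, 1)  # epoch budget with the highest per-class average

print(best_total, best_per_class)
```

The two metrics peak at different epoch budgets, so the choice is a trade-off between them rather than a single optimum.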

# Learning Rate and Momentum

With the number of epochs fixed at 400 and using the best-performing network architecture (2 hidden layers, with 11 neurons in the first and 11 in the second), we now vary the learning rate (eta) and the momentum (alpha) and observe how they affect learning.
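
As a reminder of what the two parameters control, here is a minimal sketch of the delta rule with momentum for a single scalar weight. It is not the code of the `MLP` class: `momentum_step`, `minimise_quadratic` and the quadratic error used to exercise them are assumptions made for illustration. Each update is `-eta * gradient + alpha * previous_update`.

```python
def momentum_step(weight, gradient, previous_delta, eta, alpha):
    """One delta-rule update with momentum (illustrative, not the MLP class's code).

    delta(t) = -eta * gradient + alpha * delta(t-1)
    """
    delta = -eta * gradient + alpha * previous_delta
    return weight + delta, delta


def minimise_quadratic(eta, alpha, steps=50, start=1.0):
    """Runs the update on E(w) = 0.5 * w**2, whose gradient is simply w."""
    weight, delta = start, 0.0
    for _ in range(steps):
        weight, delta = momentum_step(weight, weight, delta, eta, alpha)
    return weight


# Distance from the minimum (w = 0) after 50 steps for a few settings.
for eta, alpha in [(0.3, 0.3), (0.5, 0.5), (0.8, 0.8)]:
    print(eta, alpha, abs(minimise_quadratic(eta, alpha)))
```

In this toy setting, `eta` scales the step taken along the current gradient, while `alpha` controls how much of the previous step is carried forward; with `alpha = 1` past updates are never damped.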

### Learning Rate = 0.3 and Momentum = 0.3

In [71]:
accs = dict()

In [72]:
eta = 0.3
alpha = 0.3

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=eta, alpha=alpha, tol=1e-4, epoch_max=best_epochs, 
          print_status=False, shuffle=True)

predicted = mlp.predict(X_test)
accs[(eta,alpha)] = evaluate(y_test, predicted)

'End of epoch 400. Total Error = 0.17852369217449351'

Accuracy: 83.12%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     1     0   18  0.052632
Good    0    13   52  0.200000
Mid     2     9  385  0.972222
Average accuracy per class: 40.83%


### Learning Rate = 0.3 and Momentum = 0.5

In [73]:
eta = 0.3
alpha = 0.5

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=eta, alpha=alpha, tol=1e-4, epoch_max=best_epochs, 
          print_status=False, shuffle=True)

predicted = mlp.predict(X_test)
accs[(eta,alpha)] = evaluate(y_test, predicted)

'End of epoch 400. Total Error = 0.17005450714137374'

Accuracy: 82.92%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     2     1   16  0.105263
Good    0    40   25  0.615385
Mid     3    37  356  0.898990
Average accuracy per class: 53.99%


### Learning Rate = 0.3 and Momentum = 0.8

In [74]:
eta = 0.3
alpha = 0.8

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=eta, alpha=alpha, tol=1e-4, epoch_max=best_epochs, 
          print_status=False, shuffle=True)

predicted = mlp.predict(X_test)
accs[(eta,alpha)] = evaluate(y_test, predicted)

'End of epoch 400. Total Error = 0.16849678551862712'

Accuracy: 84.58%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     3     1   15  0.157895
Good    0    34   31  0.523077
Mid     4    23  369  0.931818
Average accuracy per class: 53.76%


### Learning Rate = 0.5 and Momentum = 0.3

In [75]:
eta = 0.5
alpha = 0.3

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=eta, alpha=alpha, tol=1e-4, epoch_max=best_epochs, 
          print_status=False, shuffle=True)

predicted = mlp.predict(X_test)
accs[(eta,alpha)] = evaluate(y_test, predicted)

'End of epoch 400. Total Error = 0.17178673034167954'

Accuracy: 84.17%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     1     0   18  0.052632
Good    0    26   39  0.400000
Mid     3    16  377  0.952020
Average accuracy per class: 46.82%


### Learning Rate = 0.5 and Momentum = 0.5

In [76]:
eta = 0.5
alpha = 0.5

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=eta, alpha=alpha, tol=1e-4, epoch_max=best_epochs, 
          print_status=False, shuffle=True)

predicted = mlp.predict(X_test)
accs[(eta,alpha)] = evaluate(y_test, predicted)

'End of epoch 400. Total Error = 0.1660632096305141'

Accuracy: 84.79%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     2     0   17  0.105263
Good    0    36   29  0.553846
Mid     3    24  369  0.931818
Average accuracy per class: 53.03%


### Learning Rate = 0.5 and Momentum = 0.8

In [77]:
eta = 0.5
alpha = 0.8

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=eta, alpha=alpha, tol=1e-4, epoch_max=best_epochs, 
          print_status=False, shuffle=True)

predicted = mlp.predict(X_test)
accs[(eta,alpha)] = evaluate(y_test, predicted)

'End of epoch 400. Total Error = 0.16087797552456454'

Accuracy: 84.79%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     1     1   17  0.052632
Good    0    21   44  0.323077
Mid     1    10  385  0.972222
Average accuracy per class: 44.93%


### Learning Rate = 0.8 and Momentum = 0.3

In [78]:
eta = 0.8
alpha = 0.3

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=eta, alpha=alpha, tol=1e-4, epoch_max=best_epochs, 
          print_status=False, shuffle=True)

predicted = mlp.predict(X_test)
accs[(eta,alpha)] = evaluate(y_test, predicted)

'End of epoch 400. Total Error = 0.16368478056386876'

Accuracy: 84.58%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     0     0   19  0.000000
Good    0    27   38  0.415385
Mid     1    16  379  0.957071
Average accuracy per class: 45.75%


### Learning Rate = 0.8 and Momentum = 0.5

In [79]:
eta = 0.8
alpha = 0.5

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=eta, alpha=alpha, tol=1e-4, epoch_max=best_epochs, 
          print_status=False, shuffle=True)

predicted = mlp.predict(X_test)
accs[(eta,alpha)] = evaluate(y_test, predicted)

'End of epoch 400. Total Error = 0.1656719841097967'

Accuracy: 84.58%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     2     1   16  0.105263
Good    0    33   32  0.507692
Mid     1    24  371  0.936869
Average accuracy per class: 51.66%


### Learning Rate = 0.8 and Momentum = 0.8

In [80]:
eta = 0.8
alpha = 0.8

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=eta, alpha=alpha, tol=1e-4, epoch_max=best_epochs, 
          print_status=False, shuffle=True)

predicted = mlp.predict(X_test)
accs[(eta,alpha)] = evaluate(y_test, predicted)

'End of epoch 400. Total Error = 0.16077643807005182'

Accuracy: 85.42%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     2     1   16  0.105263
Good    0    28   37  0.430769
Mid     1    15  380  0.959596
Average accuracy per class: 49.85%


### Learning Rate = 1 and Momentum = 1

In [81]:
eta = 1
alpha = 1

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=eta, alpha=alpha, tol=1e-4, epoch_max=best_epochs, 
          print_status=False, shuffle=True)

predicted = mlp.predict(X_test)
accs[(eta,alpha)] = evaluate(y_test, predicted)

'End of epoch 400. Total Error = 0.16825300651265834'

Accuracy: 84.38%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     0     1   18  0.000000
Good    0    27   38  0.415385
Mid     0    18  378  0.954545
Average accuracy per class: 45.66%


In [82]:
accs

{(0.3, 0.3): (0.83125, 0.40828460038986353),
 (0.3, 0.5): (0.8291666666666667, 0.5398792240897504),
 (0.3, 0.8): (0.8458333333333333, 0.5375966139124033),
 (0.5, 0.3): (0.8416666666666667, 0.4682172603225235),
 (0.5, 0.5): (0.8479166666666667, 0.5303091645196908),
 (0.5, 0.8): (0.8479166666666667, 0.4493102414155046),
 (0.8, 0.3): (0.8458333333333333, 0.45748510748510746),
 (0.8, 0.5): (0.8458333333333333, 0.5166080508185771),
 (0.8, 0.8): (0.8541666666666666, 0.49854278275330904),
 (1, 1): (0.84375, 0.4566433566433567)}

In [87]:
total_acc = pd.DataFrame()
avg_acc = pd.DataFrame()

for run, acc in accs.items():
    total_acc.at[str(run[0]), str(run[1])] = acc[0]
    avg_acc.at[str(run[0]), str(run[1])] = acc[1]
    
print(total_acc)

          0.3       0.5       0.8        1
0.3  0.831250  0.829167  0.845833      NaN
0.5  0.841667  0.847917  0.847917      NaN
0.8  0.845833  0.845833  0.854167      NaN
1         NaN       NaN       NaN  0.84375


In [88]:
print(avg_acc)

          0.3       0.5       0.8         1
0.3  0.408285  0.539879  0.537597       NaN
0.5  0.468217  0.530309  0.449310       NaN
0.8  0.457485  0.516608  0.498543       NaN
1         NaN       NaN       NaN  0.456643


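For this problem, increasing the learning rate and the momentum tends to raise the overall accuracy at the expense of the average per-class accuracy: the highest overall accuracy (85.42%) occurs with both set to 0.8, whereas the highest per-class average (53.99%) occurs with a learning rate of 0.3 and a momentum of 0.5. Setting both values to 1, however, yields a lower total accuracy (84.38%) than setting both to 0.8.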

In [90]:
best_eta   = 0.8
best_alpha = 0.8

## Varying the training and test set sizes

<div style="text-align: justify"> We now vary the sizes of the training and test sets, using the best architecture found above and the best learning rate and momentum values. We start with 70% of the data for training and gradually increase that share up to 95%.</div>

### 70% for training, 30% for test

In [91]:
X_train, X_test, y_train, y_test = train_test_split(scaled_inputs, classes, test_size=0.3, 
                                                    stratify=classes,random_state=42)

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=best_eta, alpha=best_alpha, tol=1e-4, epoch_max=best_epochs, 
          print_status=False, shuffle=True)

predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

'End of epoch 400. Total Error = 0.1633866935427087'

Accuracy: 84.17%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     2     1   16  0.105263
Good    0    37   28  0.569231
Mid     1    30  365  0.921717
Average accuracy per class: 53.21%


(0.8416666666666667, 0.5320703662808927)

### 75% for training, 25% for test

In [92]:
X_train, X_test, y_train, y_test = train_test_split(scaled_inputs, classes, test_size=0.25, 
                                                    stratify=classes,random_state=42)

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=best_eta, alpha=best_alpha, tol=1e-4, epoch_max=best_epochs, 
          print_status=False, shuffle=True)

predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

'End of epoch 400. Total Error = 0.17691930674123232'

Accuracy: 84.00%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     0     0   16  0.000000
Good    0    16   38  0.296296
Mid     0    10  320  0.969697
Average accuracy per class: 42.20%


(0.84, 0.4219977553310887)

### 80% for training, 20% for test

In [93]:
X_train, X_test, y_train, y_test = train_test_split(scaled_inputs, classes, test_size=0.2, 
                                                    stratify=classes,random_state=42)

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=best_eta, alpha=best_alpha, tol=1e-4, epoch_max=best_epochs, 
          print_status=False, shuffle=True)

predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

'End of epoch 400. Total Error = 0.1710581769015962'

Accuracy: 80.62%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     0     0   13  0.000000
Good    0    15   28  0.348837
Mid     0    21  243  0.920455
Average accuracy per class: 42.31%


(0.80625, 0.42309725158562367)

### 85% for training, 15% for test

In [94]:
X_train, X_test, y_train, y_test = train_test_split(scaled_inputs, classes, test_size=0.15, 
                                                    stratify=classes,random_state=42)

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=best_eta, alpha=best_alpha, tol=1e-4, epoch_max=best_epochs, 
          print_status=False, shuffle=True)

predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

'End of epoch 400. Total Error = 0.1778925043554022'

Accuracy: 85.00%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     0     0    9  0.000000
Good    0    14   19  0.424242
Mid     0     8  190  0.959596
Average accuracy per class: 46.13%


(0.85, 0.4612794612794613)

### 90% for training, 10% for test

In [95]:
X_train, X_test, y_train, y_test = train_test_split(scaled_inputs, classes, test_size=0.1, 
                                                    stratify=classes,random_state=42)

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=best_eta, alpha=best_alpha, tol=1e-4, epoch_max=best_epochs, 
          print_status=False, shuffle=True)

predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

'End of epoch 400. Total Error = 0.171274089212619'

Accuracy: 86.25%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     0     0    6  0.000000
Good    0    10   12  0.454545
Mid     0     4  128  0.969697
Average accuracy per class: 47.47%


(0.8625, 0.47474747474747475)

### 95% for training, 5% for test

In [96]:
X_train, X_test, y_train, y_test = train_test_split(scaled_inputs, classes, test_size=0.05, 
                                                    stratify=classes,random_state=42)

random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=best_eta, alpha=best_alpha, tol=1e-4, epoch_max=best_epochs, 
          print_status=False, shuffle=True)

predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

'End of epoch 400. Total Error = 0.1943626528283032'

Accuracy: 83.75%
Confusion Matrix and Accuracy per class:
      Bad  Good  Mid  Accuracy
Bad     0     0    3  0.000000
Good    0     4    7  0.363636
Mid     0     3   63  0.954545
Average accuracy per class: 43.94%


(0.8375, 0.43939393939393945)

From these runs, using 70% of the data for training and 30% for testing gave the best average per-class accuracy (53.21%), while using 90% for training and 10% for testing gave the best overall accuracy (86.25%).

However, since training on 90% of the data leaves a test set with only 6 examples of the 'Bad' class, we consider the 70%/30% split to be the one with the best results and use it for all subsequent tests.
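
A quick way to see how small the minority class becomes in each test set is to scale the class totals by the test fraction. The class totals below are inferred from the confusion matrices above (63 'Bad', 217 'Good', 1319 'Mid'), simple rounding is used as an approximation of what a stratified split allocates, and `expected_test_counts` is a hypothetical helper.

```python
# Class totals inferred from the confusion matrices above (training + test rows).
CLASS_TOTALS = {"Bad": 63, "Good": 217, "Mid": 1319}


def expected_test_counts(test_size, totals=CLASS_TOTALS):
    """Approximate per-class test-set counts for a stratified split (rounded proportions)."""
    return {label: round(count * test_size) for label, count in totals.items()}


for test_size in (0.3, 0.25, 0.2, 0.15, 0.1, 0.05):
    print(f"test_size={test_size}: {expected_test_counts(test_size)}")
```

In the smallest test sets the 'Bad' class shrinks to a handful of examples, so its per-class accuracy can only take a few coarse values.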

In [None]:
best_test_size = 0.3

# Revisiting Data Preprocessing

As the classifications above show, the dataset contains few examples of the 'Bad' class. Because an error in any class carries the same weight, the network ends up assigning most examples to the class with the most examples ('Mid').
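
The gap between overall and per-class accuracy can be checked directly from a confusion matrix. The sketch below recomputes both metrics for the 400-epoch test-set matrix reported earlier; `summarize` is a hypothetical stand-in and not necessarily how the notebook's `evaluate` function is written.

```python
# Rows are true classes, columns are predicted classes, in the order Bad, Good, Mid.
# This is the test-set confusion matrix reported for the 400-epoch run.
confusion = [
    [3, 0, 16],
    [0, 28, 37],
    [4, 16, 376],
]


def summarize(matrix):
    """Returns (overall accuracy, per-class accuracies, their unweighted mean)."""
    correct = [matrix[i][i] for i in range(len(matrix))]
    per_class = [hits / sum(row) for hits, row in zip(correct, matrix)]
    overall = sum(correct) / sum(sum(row) for row in matrix)
    return overall, per_class, sum(per_class) / len(per_class)


overall, per_class, average = summarize(confusion)
print(f"overall={overall:.4f}")
print(f"per_class={[round(a, 4) for a in per_class]}")
print(f"average={average:.4f}")
```

Because 396 of the 480 test examples are 'Mid', the overall figure is dominated by that class, while the unweighted mean exposes the weak 'Bad' and 'Good' results.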

Next, we consider a dataset with a balanced number of examples per class. To build it, we take the size of the smallest class and randomly choose that many examples from each of the other classes.

We also consider a dataset with artificial examples, created to bring the number of examples of the smallest class up to that of the intermediate-sized class. We do not match the class of largest cardinality, since in the real world it is normal to have more items of intermediate quality than items of bad or good quality.

To compare the datasets more fairly, without being influenced by the randomness with which the training and test sets are chosen, we evaluate the models using stratified cross-validation with 10 folds.
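
The idea behind stratified folds can be sketched without any library: the indices of each class are dealt round-robin across the folds, so every fold keeps roughly the original class proportions. This is a simplified illustration, not scikit-learn's `StratifiedKFold` implementation; `stratified_folds` is a hypothetical helper, and the class totals are the ones inferred from the confusion matrices above.

```python
from collections import Counter


def stratified_folds(labels, n_folds):
    """Assigns each example index to a fold, dealing every class round-robin."""
    folds = [[] for _ in range(n_folds)]
    seen_per_class = Counter()
    for index, label in enumerate(labels):
        folds[seen_per_class[label] % n_folds].append(index)
        seen_per_class[label] += 1
    return folds


# Toy label vector with the same class totals as the dataset used here.
labels = ["Bad"] * 63 + ["Good"] * 217 + ["Mid"] * 1319
folds = stratified_folds(labels, n_folds=10)

# Class counts inside the first three folds.
for fold in folds[:3]:
    print(Counter(labels[i] for i in fold))
```

Each fold then serves once as the test set while the remaining nine are used for training, and the reported figure is the mean over the ten runs.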

In [None]:
df = pd.read_csv('winequality-red.csv')
inputs = df[df.columns[1:-1]].values
classes = df[df.columns[-1]].values

unique, counts = np.unique(classes, return_counts=True)
print(unique)
print(counts)

## Undersampling

In [None]:
# gets the smallest number of examples among all classes in the dataset
examples = min(counts)

In [None]:
# randomly chooses `examples` distinct indices from each class
# (replace=False samples without replacement, so no example is picked twice)
bad_indices = np.random.choice(np.where(classes == 'Bad')[0], examples, replace=False)
good_indices = np.random.choice(np.where(classes == 'Good')[0], examples, replace=False)
mid_indices = np.random.choice(np.where(classes == 'Mid')[0], examples, replace=False)

# stores the chosen examples from 'Bad' class, as well as the same amount of labels from it
under_sampled_examples = scaled_inputs[bad_indices]
under_sampled_classes = classes[bad_indices]

print(under_sampled_classes)

# stores the chosen examples from 'Good' class, as well as the same amount of labels from it
under_sampled_examples = np.append(under_sampled_examples,scaled_inputs[good_indices], axis=0)
under_sampled_classes = np.append(under_sampled_classes, classes[good_indices], axis=0)

# stores the chosen examples from 'Mid' class, as well as the same amount of labels from it 
under_sampled_examples = np.append(under_sampled_examples,scaled_inputs[mid_indices], axis=0)
under_sampled_classes = np.append(under_sampled_classes, classes[mid_indices], axis=0)

In [None]:
# splits the new dataset in a training set and a test set, using 70% to 30% proportion
X_train, X_test, y_train, y_test = train_test_split(under_sampled_examples, under_sampled_classes, 
                                                    test_size=best_test_size, 
                                                    stratify=under_sampled_classes,random_state=42)

### Without Cross Validation metrics

In [None]:
# uses the best architecture, number of epochs and alpha and eta values, all previously obtained,
# for classifying the new examples, without using Stratified K-Fold metrics
random.seed(0)
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=best_eta, alpha=best_alpha, tol=1e-4, epoch_max=best_epochs, 
          print_status=False, shuffle=True)

predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

<div style="text-align: justify"> As can be seen, the accuracy obtained with undersampling is about 10% worse than the one obtained on the original dataset. However, the average per-class accuracy improved considerably, by about 20%, compared with the best per-class accuracy obtained previously, of approximately 53%. This is because every class now has the same number of examples, so the classifier is no longer biased toward the majority class at the expense of the others.</div>

### With Cross Validation metrics

In [None]:
# uses the best architecture, number of epochs and alpha and eta values, all previously obtained,
# for classifying the new examples, using Stratified K-Fold metrics, for K = 10.
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=10)
accuracies = list()

# this section evaluates the undersampled dataset built above
X = under_sampled_examples
y = under_sampled_classes

for train_index, test_index in skf.split(X, y):
    print(train_index, test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # builds a fresh network for every fold so that weights trained on earlier folds do not carry over
    random.seed(0)
    best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
    mlp = MLP.MLPClassifier(*best_layers)
    mlp.train(X_train, y_train, eta=best_eta, alpha=best_alpha, tol=1e-4, epoch_max=best_epochs, 
              print_status=False, shuffle=True)

    predicted  = mlp.predict(X_test)
    acc = evaluate(y_test, predicted, display=False)
    accuracies.append(acc)
    
mean_acc = np.array([])
mean_classes_acc = np.array([])

for i in range(len(accuracies)):
    mean_acc = np.append(mean_acc, accuracies[i][0])
    mean_classes_acc = np.append(mean_classes_acc, accuracies[i][1])

print("Average accuracy: " + '{:.2f}'.format(np.mean(mean_acc)*100) + "%")
print("Average accuracy per class: " + '{:.2f}'.format(np.mean(mean_classes_acc)*100) + "%")

### Testing on the complete dataset, trained with the undersampled dataset

In [None]:
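# evaluates on a 30% stratified split of the original (not resampled) dataset
X_train, X_test, y_train, y_test = train_test_split(scaled_inputs, classes, test_size=0.3, stratify=classes, random_state=42)
predicted  = mlp.predict(X_test)
acc = evaluate(y_test, predicted)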

## Over Sampling

To oversample the data, we use the SMOTE (Synthetic Minority Over-sampling Technique) method from the imbalanced-learn library (https://imbalanced-learn.readthedocs.io/en/stable/generated/imblearn.over_sampling.SMOTE.html#imblearn.over_sampling.SMOTE).

In this method, artificial examples are created as follows:

- Choose an example from the minority class
- Take its k nearest neighbors from the same class
- Compute the vector between the example and one of those neighbors
- Create a new example by multiplying that vector by a random number between 0 and 1 and adding the result to the chosen example
    
Besides this technique, we could duplicate, triplicate, and so on, examples of the minority classes by random sampling with replacement within those classes. However, since that would leave duplicated examples in the minority classes, we chose to create artificial examples with SMOTE instead.
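
The interpolation at the heart of SMOTE can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not imbalanced-learn's implementation: `smote_sample` is a hypothetical helper, Euclidean distance is assumed, and the 2-D points are made up only to exercise the function.

```python
import math
import random


def smote_sample(minority, k, rng):
    """Creates one synthetic point by interpolating towards a same-class neighbour."""
    base = rng.choice(minority)
    # k nearest neighbours of `base` among the other minority examples
    others = [point for point in minority if point is not base]
    neighbours = sorted(others, key=lambda point: math.dist(base, point))[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # random number in [0, 1)
    # base + gap * (neighbour - base): a point on the segment between the two
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]


rng = random.Random(0)
# Made-up 2-D minority-class points, used only to exercise the function.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
synthetic = [smote_sample(minority, k=3, rng=rng) for _ in range(5)]
print(synthetic)
```

Every synthetic point lies on a segment joining two real minority examples, so no exact duplicates of existing examples are needed to enlarge the class.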

In [None]:
from imblearn.over_sampling import SMOTE

# specify the class targeted by the resampling. The number of samples in the different classes will be equalized
# 'not majority': resample all classes but the majority class
sm = SMOTE(sampling_strategy='not majority')

# resample the dataset, using parameters: matrix containing the data which have to be sampled and corresponding 
# label for each sample in matrix
# 'over_sampled_dfX': The array containing the resampled data
# 'over_sampled_dfY': The corresponding label of over_sampled_dfX
over_sampled_dfX, over_sampled_dfY = sm.fit_resample(df.drop('category', axis=1), df['category'])

# new df containing the resampled data
over_sampled_df = pd.concat([pd.DataFrame(over_sampled_dfX), pd.DataFrame(over_sampled_dfY)], axis=1)
over_sampled_df.columns = df.columns

#over_sampled_train

In [None]:
# separates the input features from the class labels
over_sampled_inputs = over_sampled_df[over_sampled_df.columns[1:-1]].values
over_sampled_classes = over_sampled_df[over_sampled_df.columns[-1]].values

print(over_sampled_inputs[0:2,:])
print(over_sampled_classes)

In [None]:
over_sampled_scaled_inputs = scale_data(over_sampled_inputs)
over_sampled_scaled_inputs

In [None]:
# splits the new dataset in a training set and a test set, using 70% to 30% proportion
X_train, X_test, y_train, y_test = train_test_split(over_sampled_scaled_inputs, over_sampled_classes, 
                                                    test_size=best_test_size, 
                                                    stratify=over_sampled_classes,random_state=42)

### Without Cross Validation metrics

In [None]:
# uses the best architecture, number of epochs and alpha and eta values, all previously obtained,
# for classifying the new examples, without using Stratified K-Fold metrics
random.seed(0)
N = len(X_train[0])
n_classes = len(np.unique(y_train))
best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
mlp = MLP.MLPClassifier(*best_layers)
mlp.train(X_train, y_train, eta=best_eta, alpha=best_alpha, tol=1e-4, epoch_max=best_epochs, 
          print_status=False, shuffle=True)

predicted = mlp.predict(X_test)
evaluate(y_test, predicted)

### With Cross Validation metrics

In [None]:
# uses the best architecture, number of epochs and alpha and eta values, all previously obtained,
# for classifying the new examples, using Stratified K-Fold metrics, for K = 10.
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=10)
accuracies = list()

X = over_sampled_scaled_inputs
y = over_sampled_classes

for train_index, test_index in skf.split(X, y):
    print(train_index, test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # builds a fresh network for every fold so that weights trained on earlier folds do not carry over
    random.seed(0)
    best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
    mlp = MLP.MLPClassifier(*best_layers)
    mlp.train(X_train, y_train, eta=best_eta, alpha=best_alpha, tol=1e-4, epoch_max=best_epochs, 
              print_status=False, shuffle=True)

    predicted  = mlp.predict(X_test)
    acc = evaluate(y_test, predicted, display=False)
    accuracies.append(acc)
    
mean_acc = np.array([])
mean_classes_acc = np.array([])

for i in range(len(accuracies)):
    mean_acc = np.append(mean_acc, accuracies[i][0])
    mean_classes_acc = np.append(mean_classes_acc, accuracies[i][1])

print("Average accuracy: " + '{:.2f}'.format(np.mean(mean_acc)*100) + "%")
print("Average accuracy per class: " + '{:.2f}'.format(np.mean(mean_classes_acc)*100) + "%")

### Testing on the complete dataset, trained with the oversampled dataset

In [None]:
X_train, X_test, y_train, y_test = train_test_split(scaled_inputs, classes, test_size=0.3, 
                                                    stratify=classes,random_state=42)

predicted  = mlp.predict(X_test)
acc = evaluate(y_test, predicted)

## Complete Dataset

Here we train and test on the complete dataset, without undersampling or oversampling, using Stratified K-Fold cross-validation.

In [None]:
# uses the best architecture, number of epochs and alpha and eta values, all previously obtained,
# for classifying the new examples, using Stratified K-Fold metrics, for K = 10.
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=10)
accuracies = list()

X = scaled_inputs
y = classes

for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # builds a fresh network for every fold so that weights trained on earlier folds do not carry over
    random.seed(0)
    best_layers = [Layer(best_N1, N), Layer(best_N2, best_N1), Layer(n_classes, best_N2)]
    mlp = MLP.MLPClassifier(*best_layers)
    mlp.train(X_train, y_train, eta=best_eta, alpha=best_alpha, tol=1e-4, epoch_max=best_epochs, 
              print_status=False, shuffle=True)

    predicted  = mlp.predict(X_test)
    acc = evaluate(y_test, predicted, display=False)
    accuracies.append(acc)
    
mean_acc = np.array([])
mean_classes_acc = np.array([])

for i in range(len(accuracies)):
    mean_acc = np.append(mean_acc, accuracies[i][0])
    mean_classes_acc = np.append(mean_classes_acc, accuracies[i][1])

print("Average accuracy: " + '{:.2f}'.format(np.mean(mean_acc)*100) + "%")
print("Average accuracy per class: " + '{:.2f}'.format(np.mean(mean_classes_acc)*100) + "%")

# Regression