# Multilayer Perceptron on the MNIST Problem



### Course: Noções de Inteligência Artificial - 2/2024
### Students: Felipe Lopes Gibin Duarte and Matheus das Neves Fernandes

## Introduction

This work implements a Multilayer Perceptron (MLP) to classify handwritten digits from the MNIST dataset. We explore different architectures and training techniques and evaluate the performance of each approach.


## 1. Environment Setup
In this section, we import the required libraries, load the MNIST dataset, and prepare the data loader.


### 1.1. Importing Libraries

In [9]:
import tensorflow as tf
from d2l import tensorflow as d2l
d2l.use_svg_display()
import pdb
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

### 1.2. Loading and Preprocessing the Data

In [10]:
class MNIST(d2l.DataModule):  # @save
    """The MNIST dataset."""

    def __init__(self, batch_size=64):
        super().__init__()
        self.save_hyperparameters()
        self.train, self.val = tf.keras.datasets.mnist.load_data()

    def get_dataloader(self, train):
        data = self.train if train else self.val
        # Add a channel axis, scale pixels to [0, 1], and cast labels to int32
        process = lambda X, y: (tf.expand_dims(X, axis=3) / 255,
                                tf.cast(y, dtype='int32'))
        shuffle_buf = len(data[0]) if train else 1
        # Note: shuffle runs after batch, so it reorders whole minibatches
        return (tf.data.Dataset.from_tensor_slices(process(*data))
                .batch(self.batch_size)
                .shuffle(shuffle_buf))

# Instantiate the dataset
data = MNIST()

In [11]:
# Check the number of examples and the shape of the data
print(len(data.train[0]), len(data.val[0]))
print(data.train[0].shape)

60000 10000
(60000, 28, 28)


In [12]:
# Inspect one minibatch
X, y = next(iter(data.train_dataloader()))
print(X.shape, X.dtype, y.shape, y.dtype)

(64, 28, 28, 1) <dtype: 'float32'> (64,) <dtype: 'int32'>
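
The minibatch shape above also determines how many steps make up one epoch. As a rough sketch, assuming the final partial batch is kept (the default behavior of `tf.data`'s `batch`), the count can be checked with plain arithmetic. The helper name `steps_per_epoch` is ours and is not part of the notebook.

```python
import math

def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of minibatches per epoch when the last partial batch is kept."""
    return math.ceil(num_examples / batch_size)

train_steps = steps_per_epoch(60000, 64)
# Size of the final, smaller minibatch
last_batch = 60000 - (train_steps - 1) * 64
print(train_steps, last_batch)
```

Under this assumption the result is consistent with the `938/938` step counter shown in the training logs below.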


## 2. Models and Architectures
In this section, we implement different architectures and training techniques for the MLP and evaluate their impact on performance.


### 2a. Perceptron with One Hidden Layer, Logistic Function, SSE Cost, and Gradient Descent


In [13]:
# Convert the labels to one-hot vectors
def preprocess_labels(dataset):
    return dataset.map(lambda X, y: (X, tf.one_hot(y, depth=10)))

# Prepare the training and validation data with the processed labels
train_dataloader = preprocess_labels(data.get_dataloader(train=True))
val_dataloader = preprocess_labels(data.get_dataloader(train=False))

In [14]:
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),  # Flatten the 28x28 input into a 784-vector
    tf.keras.layers.Dense(128, activation='sigmoid'),  # Hidden layer
    tf.keras.layers.Dense(10, activation='sigmoid')  # Output layer
])

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),  # Gradient descent (minibatch SGD)
    loss=tf.keras.losses.MeanSquaredError(),  # Mean squared error (SSE averaged over the outputs)
    metrics=['accuracy']
)

model.fit(
    train_dataloader,               # Training data
    epochs=10,                      # Number of epochs
    validation_data=val_dataloader  # Validation data
)

Epoch 1/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.2348 - loss: 0.0994 - val_accuracy: 0.4401 - val_loss: 0.0839
Epoch 2/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.4623 - loss: 0.0820 - val_accuracy: 0.5819 - val_loss: 0.0750
Epoch 3/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.6047 - loss: 0.0727 - val_accuracy: 0.6725 - val_loss: 0.0653
Epoch 4/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.6779 - loss: 0.0635 - val_accuracy: 0.7341 - val_loss: 0.0570
Epoch 5/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7275 - loss: 0.0557 - val_accuracy: 0.7690 - val_loss: 0.0506
Epoch 6/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7607 - loss: 0.0501 - val_accuracy: 0.7985 - val_loss: 0.0459
Epoch 7/10
938/938

<keras.src.callbacks.history.History at 0x7b78159f2640>
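
To make the cost in this variant concrete, here is a small worked example in plain Python. It is only a sketch: the prediction vector is hypothetical, and the helpers `one_hot`, `sse`, and `mse` are illustrative names, not part of the notebook. It shows how the sum of squared errors (SSE) relates to the mean squared error that `MeanSquaredError` reports, which averages over the 10 outputs.

```python
def one_hot(label, depth=10):
    """One-hot encode an integer label as a list of floats."""
    return [1.0 if i == label else 0.0 for i in range(depth)]

def sse(y_true, y_pred):
    """Sum of squared errors over the output units."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))

def mse(y_true, y_pred):
    """Mean squared error: SSE divided by the number of output units."""
    return sse(y_true, y_pred) / len(y_true)

target = one_hot(3)
pred = [0.1] * 10   # hypothetical sigmoid outputs, not taken from the model
pred[3] = 0.8       # the correct class gets the highest activation

print(sse(target, pred), mse(target, pred))
```

Because the mean divides by the number of outputs, the two costs differ only by a constant factor of 10 per example, which is one reason the logged loss values are small in absolute terms.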

### 2b. Softmax Output, "Cross-Entropy" Cost

In [15]:
# Prepare the training data without one-hot conversion
train_dataloader = data.get_dataloader(train=True)
val_dataloader = data.get_dataloader(train=False)

In [16]:
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),  # Flatten the 28x28 input into a 784-vector
    tf.keras.layers.Dense(128, activation='sigmoid'),  # Hidden layer
    tf.keras.layers.Dense(10, activation='softmax')  # Output layer
])

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),  # Gradient descent (minibatch SGD)
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),  # Cross-entropy on integer labels
    metrics=['accuracy']
)

model.fit(
    train_dataloader,               # Training data
    epochs=10,                      # Number of epochs
    validation_data=val_dataloader  # Validation data
)

Epoch 1/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.7261 - loss: 1.1275 - val_accuracy: 0.8982 - val_loss: 0.3778
Epoch 2/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.8962 - loss: 0.3719 - val_accuracy: 0.9124 - val_loss: 0.3133
Epoch 3/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.9064 - loss: 0.3208 - val_accuracy: 0.9190 - val_loss: 0.2869
Epoch 4/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.9162 - loss: 0.2887 - val_accuracy: 0.9253 - val_loss: 0.2624
Epoch 5/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.9240 - loss: 0.2685 - val_accuracy: 0.9293 - val_loss: 0.2461
Epoch 6/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.9261 - loss: 0.2540 - val_accuracy: 0.9353 - val_loss: 0.2330
Epoch 7/10
938/938

<keras.src.callbacks.history.History at 0x7b781bda41f0>
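
For intuition about the softmax output and the cross-entropy cost used here, the following is a minimal plain-Python sketch of both, under the assumption of a single example with an integer label (the "sparse" form). The function names are illustrative and do not reproduce the Keras implementation.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_cross_entropy(probs, label):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[label])

# A model with no preference among the 10 digits predicts 1/10 for each class
uniform = softmax([0.0] * 10)
print(sparse_cross_entropy(uniform, 3), math.log(10))
```

A uniform prediction over 10 classes costs ln(10), roughly 2.30, which gives a useful baseline for reading the logged loss: values well below it mean the network is assigning most of the probability to the correct digit.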

### 2c. Adam Optimizer

In [18]:
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),  # Flatten the 28x28 input into a 784-vector
    tf.keras.layers.Dense(128, activation='sigmoid'),  # Hidden layer
    tf.keras.layers.Dense(10, activation='softmax')  # Output layer
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(),  # Adam optimizer
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),  # Cross-entropy on integer labels
    metrics=['accuracy']
)

model.fit(
    train_dataloader,               # Training data
    epochs=10,                      # Number of epochs
    validation_data=val_dataloader  # Validation data
)

Epoch 1/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.8029 - loss: 0.8123 - val_accuracy: 0.9274 - val_loss: 0.2637
Epoch 2/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.9296 - loss: 0.2508 - val_accuracy: 0.9418 - val_loss: 0.1994
Epoch 3/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.9468 - loss: 0.1894 - val_accuracy: 0.9525 - val_loss: 0.1621
Epoch 4/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.9549 - loss: 0.1537 - val_accuracy: 0.9593 - val_loss: 0.1401
Epoch 5/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.9640 - loss: 0.1260 - val_accuracy: 0.9636 - val_loss: 0.1216
Epoch 6/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.9721 - loss: 0.1027 - val_accuracy: 0.9680 - val_loss: 0.1096
Epoch 7/10
938/938

<keras.src.callbacks.history.History at 0x7b781c3f3550>
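
As a rough illustration of what Adam does differently from plain gradient descent, the sketch below applies the standard Adam update rule to a single scalar parameter. It is a toy under stated assumptions: `adam_step` is a hypothetical helper, the quadratic objective is ours, and the learning rate of 0.01 is chosen only for the toy problem (the notebook uses the `Adam()` defaults).

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy objective (an assumption for illustration): f(w) = (w - 3)^2
w, m, v = 0.0, 0.0, 0.0
first_w = None
for t in range(1, 2001):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
    if t == 1:
        first_w = w  # after one step the move is about lr, whatever the gradient size
print(first_w, w)
```

The first step has magnitude close to the learning rate regardless of how large the gradient is, which reflects the per-parameter scaling that distinguishes Adam from SGD with a fixed step size.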

### 2d. The ReLU Function

In [19]:
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),  # Flatten the 28x28 input into a 784-vector
    tf.keras.layers.Dense(128, activation='relu'),  # Hidden layer
    tf.keras.layers.Dense(10, activation='softmax')  # Output layer
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(),  # Adam optimizer
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),  # Cross-entropy on integer labels
    metrics=['accuracy']
)

model.fit(
    train_dataloader,               # Training data
    epochs=10,                      # Number of epochs
    validation_data=val_dataloader  # Validation data
)

Epoch 1/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.8519 - loss: 0.5351 - val_accuracy: 0.9493 - val_loss: 0.1753
Epoch 2/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.9534 - loss: 0.1597 - val_accuracy: 0.9640 - val_loss: 0.1212
Epoch 3/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.9694 - loss: 0.1070 - val_accuracy: 0.9720 - val_loss: 0.0980
Epoch 4/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.9786 - loss: 0.0754 - val_accuracy: 0.9746 - val_loss: 0.0815
Epoch 5/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.9821 - loss: 0.0597 - val_accuracy: 0.9746 - val_loss: 0.0827
Epoch 6/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.9866 - loss: 0.0467 - val_accuracy: 0.9758 - val_loss: 0.0760
Epoch 7/10
938/938

<keras.src.callbacks.history.History at 0x7b781c249850>
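
One commonly cited reason ReLU may train faster than the logistic function is the size of its derivative: the sigmoid derivative never exceeds 0.25 and shrinks toward zero for large inputs, while the ReLU derivative is exactly 1 for any positive input. The sketch below compares the two at a few hand-picked points. It is an illustration only, not an analysis of the runs above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the logistic function: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    """Derivative of ReLU (taken as 0 at z = 0 by convention)."""
    return 1.0 if z > 0 else 0.0

for z in (-5.0, 0.0, 0.5, 5.0):
    print(z, sigmoid_grad(z), relu_grad(z))
```

Smaller derivatives in the hidden layer scale down the gradient that reaches the first-layer weights, which may help explain why the ReLU model reaches a higher validation accuracy after the first epoch (0.9493 versus 0.9274 with the sigmoid hidden layer and the same optimizer).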

### 2e. Larger Networks

#### Hidden layer with 256 neurons

In [20]:
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),  # Flatten the 28x28 input into a 784-vector
    tf.keras.layers.Dense(256, activation='relu'),  # Hidden layer
    tf.keras.layers.Dense(10, activation='softmax')  # Output layer
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(),  # Adam optimizer
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),  # Cross-entropy on integer labels
    metrics=['accuracy']
)

model.fit(
    train_dataloader,               # Training data
    epochs=10,                      # Number of epochs
    validation_data=val_dataloader  # Validation data
)

Epoch 1/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 7s 7ms/step - accuracy: 0.8774 - loss: 0.4430 - val_accuracy: 0.9581 - val_loss: 0.1355
Epoch 2/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 7s 7ms/step - accuracy: 0.9644 - loss: 0.1206 - val_accuracy: 0.9707 - val_loss: 0.0932
Epoch 3/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 6s 6ms/step - accuracy: 0.9772 - loss: 0.0762 - val_accuracy: 0.9760 - val_loss: 0.0789
Epoch 4/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 6s 6ms/step - accuracy: 0.9844 - loss: 0.0528 - val_accuracy: 0.9766 - val_loss: 0.0793
Epoch 5/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 6s 7ms/step - accuracy: 0.9884 - loss: 0.0410 - val_accuracy: 0.9773 - val_loss: 0.0747
Epoch 6/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 6s 7ms/step - accuracy: 0.9919 - loss: 0.0280 - val_accuracy: 0.9786 - val_loss: 0.0681
Epoch 7/10
938/938

<keras.src.callbacks.history.History at 0x7b781c0c5490>

#### Two hidden layers with 256 neurons each

In [21]:
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),  # Flatten the 28x28 input into a 784-vector
    tf.keras.layers.Dense(256, activation='relu'),  # Hidden layer 1
    tf.keras.layers.Dense(256, activation='relu'),  # Hidden layer 2
    tf.keras.layers.Dense(10, activation='softmax')  # Output layer
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(),  # Adam optimizer
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),  # Cross-entropy on integer labels
    metrics=['accuracy']
)

model.fit(
    train_dataloader,               # Training data
    epochs=10,                      # Number of epochs
    validation_data=val_dataloader  # Validation data
)

Epoch 1/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 9s 8ms/step - accuracy: 0.8808 - loss: 0.4047 - val_accuracy: 0.9633 - val_loss: 0.1178
Epoch 2/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 8ms/step - accuracy: 0.9719 - loss: 0.0904 - val_accuracy: 0.9750 - val_loss: 0.0784
Epoch 3/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 8ms/step - accuracy: 0.9810 - loss: 0.0608 - val_accuracy: 0.9738 - val_loss: 0.0852
Epoch 4/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 9ms/step - accuracy: 0.9867 - loss: 0.0411 - val_accuracy: 0.9788 - val_loss: 0.0661
Epoch 5/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 8ms/step - accuracy: 0.9898 - loss: 0.0322 - val_accuracy: 0.9780 - val_loss: 0.0748
Epoch 6/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 8ms/step - accuracy: 0.9924 - loss: 0.0234 - val_accuracy: 0.9753 - val_loss: 0.0860
Epoch 7/10
938/938

<keras.src.callbacks.history.History at 0x7b781bd1f9d0>
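
To put "larger" in numbers, the sketch below counts the trainable parameters of the three fully connected architectures used so far: each `Dense` layer contributes one weight per input-output pair plus one bias per output. The helper names are ours, and the count assumes bias terms are enabled, which is the Keras default.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def mlp_params(layer_sizes):
    """Total parameters of an MLP given sizes [input, hidden..., output]."""
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

architectures = {
    "128 hidden": [784, 128, 10],
    "256 hidden": [784, 256, 10],
    "256 + 256 hidden": [784, 256, 256, 10],
}
counts = {name: mlp_params(sizes) for name, sizes in architectures.items()}
for name, total in counts.items():
    print(name, total)
```

Doubling the hidden layer roughly doubles the parameter count, whereas adding a second 256-unit layer adds comparatively few parameters, because the 784-input first layer dominates the total.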