# Perceptron Networks - PyTorch & Scikit-learn

### Defining a perceptron network in Scikit-learn

In [1]:
from sklearn.neural_network import MLPClassifier

mlp_classifier_model = MLPClassifier(hidden_layer_sizes=(100, ),
                                     activation='relu',
                                     solver='adam',
                                     alpha=0.0001,
                                     batch_size='auto',
                                     learning_rate='constant',
                                     learning_rate_init=0.001,
                                     power_t=0.5,
                                     max_iter=200,
                                     shuffle=True,
                                     random_state=None,
                                     tol=0.0001,
                                     momentum=0.9,
                                     early_stopping=False,
                                     validation_fraction=0.1,
                                     n_iter_no_change=10)

Parameters:
- *hidden_layer_sizes* (tuple, default=(100, )): A tuple with "n" elements; the i-th element is the number of neurons in the i-th hidden layer.

*hidden_layer_sizes=(neurons_layer1, neurons_layer2, neurons_layer3, ...)*. Default: a network with one hidden layer of 100 neurons.
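
As an illustration of how this tuple determines the architecture, the sketch below lists the weight-matrix shapes implied by a given configuration. The helper `mlp_weight_shapes` is hypothetical (it is not part of scikit-learn), and the 784 inputs and 10 outputs assume MNIST-sized data; on a fitted `MLPClassifier` the same shapes should be visible in the `coefs_` attribute.

```python
def mlp_weight_shapes(n_features, hidden_layer_sizes, n_outputs):
    """Return the (n_in, n_out) shape of each weight matrix in the MLP."""
    # Input layer, then every hidden layer, then the output layer.
    sizes = [n_features, *hidden_layer_sizes, n_outputs]
    return list(zip(sizes[:-1], sizes[1:]))


# Default configuration: one hidden layer with 100 neurons.
default_shapes = mlp_weight_shapes(784, (100,), 10)
print(default_shapes)  # [(784, 100), (100, 10)]

# Three hidden layers with 64, 32 and 16 neurons.
deep_shapes = mlp_weight_shapes(784, (64, 32, 16), 10)
print(deep_shapes)  # [(784, 64), (64, 32), (32, 16), (16, 10)]
```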

- *activation* ({‘identity’, ‘logistic’, ‘tanh’, ‘relu’}, default=‘relu’): the activation function of the hidden layers.
  - ‘identity’: 𝑓(𝑥) = 𝑥
  - ‘logistic’: 𝑓(𝑥) = 1 / (1 + e^(−𝑥))
  - ‘tanh’: 𝑓(𝑥) = 𝑡𝑎𝑛ℎ(𝑥)
  - ‘relu’: 𝑓(𝑥) = 𝑚𝑎𝑥(0, 𝑥)
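
The four formulas can be checked numerically with a few lines of plain Python. This is only a scalar sketch for building intuition; the function names are chosen here for illustration, and the libraries apply the same functions element-wise to whole arrays.

```python
import math


def identity(x):
    return x


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    return math.tanh(x)


def relu(x):
    return max(0.0, x)


# Evaluate each activation on a negative, a zero and a positive input.
for name, f in [("identity", identity), ("logistic", logistic), ("tanh", tanh), ("relu", relu)]:
    print(name, [round(f(x), 4) for x in (-2.0, 0.0, 2.0)])
```

Note how logistic squashes its input into (0, 1), tanh into (-1, 1), and relu simply zeroes out negative values.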
  
- *solver* ({‘lbfgs’, ‘sgd’, ‘adam’}, default=‘adam’): the learning (weight update) rule.
  - ‘sgd’: stochastic gradient descent (the only one we will use).

- *alpha* (float, default=0.0001): the strength of the L2 regularization term.

- *batch_size* (int, default=‘auto’): the mini-batch size used during training.
  - ‘auto’: the batch size is min(200, n_samples).

- *learning_rate* ({‘constant’, ‘invscaling’, ‘adaptive’}, default=‘constant’): the learning-rate schedule (used only when solver=‘sgd’).
  - ‘constant’: the learning rate stays constant and is given by the learning_rate_init parameter.
  - ‘invscaling’: the learning rate is decreased at every step t, following the formula: new_learning_rate = learning_rate_init / pow(t, power_t)
  - ‘adaptive’: keeps the learning rate constant as long as the error keeps decreasing. If the error does not decrease by at least tol (compared to the previous epoch), or if the score on the validation set (only when early_stopping=True) does not increase by at least tol (compared to the previous epoch), the current learning rate is divided by 5.
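
The two non-constant schedules can be sketched in plain Python. The helper names `invscaling_lr` and `adaptive_decay` are hypothetical and only mirror the formulas stated above; they are not scikit-learn functions.

```python
def invscaling_lr(learning_rate_init, t, power_t=0.5):
    """Learning rate at step t (t >= 1) under the 'invscaling' schedule."""
    return learning_rate_init / pow(t, power_t)


def adaptive_decay(current_lr):
    """One 'adaptive' decay step: the current learning rate is divided by 5."""
    return current_lr / 5


# With power_t=0.5 the rate shrinks with the square root of the step index.
rates = [invscaling_lr(0.001, t) for t in (1, 4, 100)]
print(rates)

# One decay step applied to the default initial rate.
decayed = adaptive_decay(0.001)
print(decayed)
```

With the defaults, the invscaling rate is halved by step 4 and reduced tenfold by step 100, which shows how quickly this schedule slows learning down.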

- *learning_rate_init* (float, default=0.001): the initial learning rate.
- *power_t* (float, default=0.5): the exponent used when learning_rate=‘invscaling’.
- *max_iter* (int, default=200): the maximum number of training epochs.
- *shuffle* (bool, default=True): whether to shuffle the data at every epoch.
- *tol* (float, default=1e-4):
  - If the error or the score does not improve by at least tol for n_iter_no_change consecutive epochs (and learning_rate != ‘adaptive’), training stops.
- *momentum* (float, default=0.9): the momentum value used by gradient descent with momentum (solver=‘sgd’). Must be between 0 and 1.
- *early_stopping* (bool, default=False):
  - If set to True, training stops when the score on the validation set does not improve by at least tol for n_iter_no_change consecutive epochs.
- *validation_fraction* (float, default=0.1):
  - The fraction of the training set held out for validation (only when early_stopping=True). Must be between 0 and 1.
- *n_iter_no_change* (int, default=10, added in scikit-learn 0.20):
  - The maximum number of epochs without improvement (error or score).
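
How *tol* and *n_iter_no_change* interact can be sketched as a small stopping rule over a list of per-epoch losses. This is a simplified illustration of the description above, not scikit-learn's actual implementation (the library's exact counting may differ by one epoch); `epochs_until_stop` is a hypothetical helper.

```python
def epochs_until_stop(losses, tol=1e-4, n_iter_no_change=10):
    """Return the 1-based epoch at which training would stop, or None."""
    best_loss = float("inf")
    no_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        # An epoch counts as an improvement only if it beats the best loss by at least tol.
        if loss > best_loss - tol:
            no_improvement += 1
        else:
            no_improvement = 0
        best_loss = min(best_loss, loss)
        if no_improvement >= n_iter_no_change:
            return epoch
    return None


# The loss keeps dropping by more than tol, so the rule never triggers.
improving = [1.0 / (e + 1) for e in range(20)]
print(epochs_until_stop(improving))  # None

# The loss plateaus at 0.3 from epoch 3 on; three stale epochs later training stops.
plateau = [0.5, 0.4, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
print(epochs_until_stop(plateau, n_iter_no_change=3))  # 6
```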

For the rest of the lab we will focus on implementing neural networks with the PyTorch library.

### Installing PyTorch


Go to https://pytorch.org and, in the "Install PyTorch" section, select the options that match your machine. More precisely, if the machine has a dedicated GPU, leave the selection unchanged; otherwise select CPU in the "Compute Platform" field.


To check that the installation succeeded, you can run the following block of code:


In [2]:
import torch
x = torch.rand(5, 3)
print(x)

tensor([[0.0262, 0.8300, 0.1864],
        [0.0230, 0.8618, 0.2052],
        [0.8117, 0.2919, 0.4342],
        [0.9858, 0.7974, 0.7526],
        [0.0235, 0.2041, 0.4412]])


To check whether PyTorch can access the GPU, you can run the following code. If everything is in order, the last line should return True.

In [3]:
import torch
torch.cuda.is_available()

True

In [4]:
!nvidia-smi

Fri Apr 11 00:37:05 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   43C    P8              9W /   70W |       2MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

### Defining the neural network

To create a model in PyTorch we need to extend the **nn.Module** class; in the constructor we define the layers of the network, which are then used in the implementation of the **forward** function. Below is an example of a Multilayer Perceptron with two hidden layers.

- the **Flatten** layer reshapes each input example into a 1-dimensional vector (the batch dimension is kept).
- the **Linear** layer applies a linear transformation: xW<sup>T</sup>+b. For this layer we must specify the dimensions of the matrix W, which correspond to the sizes of the input and output tensors.
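
To make the arithmetic behind these two layers visible, here is a minimal sketch in plain Python, deliberately without PyTorch. The helpers `flatten` and `linear` are hypothetical stand-ins for a single example; the weight matrix is laid out with one row per output neuron, which is also how `nn.Linear` stores its `weight` (shape `(out_features, in_features)`).

```python
def flatten(image):
    """Turn a 2D grid (list of rows) into a 1D list of values."""
    return [value for row in image for value in row]


def linear(x, W, b):
    """Compute x W^T + b for one input vector x; W has one row per output."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


image = [[1.0, 2.0],
         [3.0, 4.0]]          # a tiny 2x2 "image"
x = flatten(image)             # 4 input features
W = [[1.0, 0.0, 0.0, 0.0],     # 3 outputs, 4 inputs
     [0.5, 0.5, 0.5, 0.5],
     [0.0, 0.0, 0.0, -1.0]]
b = [0.0, 1.0, 0.5]
y = linear(x, W, b)

print(x)  # [1.0, 2.0, 3.0, 4.0]
print(y)  # [1.0, 6.0, -3.5]
```

For MNIST the same idea applies with 28 * 28 = 784 input features instead of 4.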

In [5]:
# Variant 1: more explicit and longer, but flexible when extra logic (control flow) is needed between layers
import torch.nn as nn
import torch.nn.functional as F

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.first_layer = nn.Linear(28 * 28, 512) # FC1
        # self.dropout1 = nn.Dropout(p=0.2)
        self.second_layer = nn.Linear(512, 512) # FC2
        # self.dropout2 = nn.Dropout(p=0.2)
        self.output_layer = nn.Linear(512, 10) # FC3

    def forward(self, x):
        x = self.flatten(x)
        x = F.relu(self.first_layer(x))
        # x = self.dropout1(x)
        x = F.relu(self.second_layer(x))
        # x = self.dropout2(x)
        x = self.output_layer(x)
        return x

In [None]:
# Variant 2: more compact and elegant for simple networks (but more rigid when things change)
import torch.nn as nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            # nn.Dropout(p=0.2),
            nn.Linear(512, 512),
            nn.ReLU(),
            # nn.Dropout(p=0.2),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        return self.model(x)

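# `device` must be defined before the model is moved to it (torch was imported in an earlier cell)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = NeuralNetwork().to(device)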

We can also choose explicitly the device we train on, with:


```
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```

or

```
# Pick one of the following:
device = torch.device("cpu")     # force the CPU
device = torch.device("cuda")    # use the default GPU
device = torch.device("cuda:1")  # use the second GPU, when several are available
```

after which we call:

```
model.to(device)
```

A forward pass of a batch of examples through the network above can be run as follows:

In [None]:
model = NeuralNetwork()
model(torch.rand(5, 1, 28, 28))

tensor([[-0.0025, -0.0220,  0.0092,  0.0460,  0.0248, -0.1026,  0.0115, -0.0041,
          0.0397,  0.0023],
        [ 0.0270, -0.0269,  0.0483,  0.0569, -0.0073, -0.1117,  0.1061,  0.0082,
          0.0087,  0.0214],
        [ 0.0053, -0.0357,  0.0217,  0.0450, -0.0064, -0.1106,  0.0631,  0.0425,
          0.0024,  0.0560],
        [-0.0077, -0.0647,  0.0059,  0.0206,  0.0425, -0.1246,  0.0336,  0.0064,
          0.0097,  0.0556],
        [ 0.0439, -0.0590, -0.0068, -0.0175,  0.0175, -0.1118,  0.0483,  0.0035,
          0.0362,  0.0694]], grad_fn=<AddmmBackward0>)

### Training the network

To train the network we need training data, an optimization algorithm, and a loss function to minimize on the training set.

We will use MNIST to illustrate a training procedure in PyTorch. The code below uses Adam as the optimization algorithm (torch.optim.SGD can be substituted for plain stochastic gradient descent), and the loss function is cross entropy.
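
To show what the cross-entropy loss measures, here is a minimal plain-Python sketch that takes raw scores (logits) and integer class labels, as `nn.CrossEntropyLoss` does, and averages over the batch. The function `cross_entropy` is a hypothetical re-implementation for illustration, not the PyTorch code.

```python
import math


def cross_entropy(logits, targets):
    """Mean over the batch of -log(softmax(logits)[target])."""
    total = 0.0
    for row, target in zip(logits, targets):
        # log-sum-exp with the maximum subtracted for numerical stability
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[target]
    return total / len(targets)


# A network that knows nothing gives equal scores to all 10 digits: loss = ln(10).
uniform_loss = cross_entropy([[0.0] * 10], [3])
print(round(uniform_loss, 4))  # 2.3026

# A confident and correct prediction gives a loss close to 0.
confident_loss = cross_entropy([[10.0, 0.0, 0.0]], [0])
print(confident_loss)
```

The value ln(10) ≈ 2.30 explains why the first loss printed by an untrained 10-class model is usually around 2.3.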


Creating the datasets and the dataloaders that help us iterate over batches during an epoch:

In [6]:
# Variant 1: more explicit

from torchvision import datasets
from torchvision.transforms import ToTensor
from torch.utils.data import DataLoader

train_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(train_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

In [None]:
# Variant 2: more elegant and more robust to changes/transforms

from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# transform = transforms.Compose([
#     transforms.ToTensor(),                                    # -> Converts the image to a tensor with values scaled to [0, 1]
#     transforms.Normalize((0.1307,), (0.3081,)),               # -> Normalizes the data with (mean, std)
#     transforms.RandomRotation(15),                            # -> Rotates the image randomly by an angle between -15 and 15 degrees
#     transforms.RandomHorizontalFlip(),                        # -> Flips the image horizontally with a default probability of 0.5
#     transforms.RandomVerticalFlip(),                          # -> Same as above, but flips the image vertically
#     transforms.RandomCrop(24),                                # -> Randomly crops a 24x24 (size x size) region
#     transforms.Resize((28, 28)),                              # -> Resizes the image to 28x28
#     transforms.ColorJitter(brightness=0.2, contrast=0.2),     # -> Randomly changes brightness and contrast (saturation and hue are also available)
#     transforms.Grayscale(num_output_channels=1),              # -> Converts the image from RGB to grayscale
#     transforms.CenterCrop(24),                                # -> Crops the central region of the image
# ])

transform = transforms.ToTensor()

train_data = datasets.MNIST(root="data", train=True, download=True, transform=transform)
test_data = datasets.MNIST(root="data", train=False, download=True, transform=transform)

train_dataloader = DataLoader(train_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64)

In [None]:
train_dataloader.__dict__

{'dataset': Dataset MNIST
     Number of datapoints: 60000
     Root location: data
     Split: Train
     StandardTransform
 Transform: ToTensor(),
 'num_workers': 0,
 'prefetch_factor': None,
 'pin_memory': False,
 'pin_memory_device': '',
 'timeout': 0,
 'worker_init_fn': None,
 '_DataLoader__multiprocessing_context': None,
 'in_order': True,
 '_dataset_kind': 0,
 'batch_size': 64,
 'drop_last': False,
 'sampler': <torch.utils.data.sampler.SequentialSampler at 0x7d1fff297390>,
 'batch_sampler': <torch.utils.data.sampler.BatchSampler at 0x7d1fff424e90>,
 'generator': None,
 'collate_fn': <function torch.utils.data._utils.collate.default_collate(batch)>,
 'persistent_workers': False,
 '_DataLoader__initialized': True,
 '_IterableDataset_len_called': None,
 '_iterator': None}
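
The `batch_size` and `drop_last` fields above determine how many batches one epoch contains. A minimal sketch, assuming the standard MNIST split sizes (60000 training and 10000 test images); `batch_sizes` is a hypothetical helper, not a PyTorch function.

```python
def batch_sizes(n_samples, batch_size, drop_last=False):
    """Sizes of the batches produced in one pass over the dataset."""
    full, remainder = divmod(n_samples, batch_size)
    sizes = [batch_size] * full
    # With drop_last=False the final, smaller batch is kept.
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes


train_batches = batch_sizes(60000, 64)
test_batches = batch_sizes(10000, 64)

print(len(train_batches), train_batches[-1])  # 938 32
print(len(test_batches), test_batches[-1])    # 157 16
```

These counts are what `len(dataloader)` returns, while `len(dataloader.dataset)` returns the number of individual examples.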

Creating the model and defining the optimization algorithm:

In [7]:
import torch
from torch import nn

model = NeuralNetwork()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Training function for the network

def train_loop(dataloader, model, loss_fn, optimizer):
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        pred = model(X)
        loss = loss_fn(pred, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            print(f"Loss: {loss.item():>7f}  [{batch * len(X):>5d}/{len(dataloader.dataset):>5d}]")

# Function that evaluates the performance of the network:

def test_loop(dataloader, model, loss_fn):
    model.eval()
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= len(dataloader)
    correct /= len(dataloader.dataset)
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")


num_epochs = 10

for t in range(num_epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
Loss: 2.313639  [    0/60000]
Loss: 0.289800  [ 6400/60000]
Loss: 0.183688  [12800/60000]
Loss: 0.245247  [19200/60000]
Loss: 0.107182  [25600/60000]
Loss: 0.333738  [32000/60000]
Loss: 0.141004  [38400/60000]
Loss: 0.240234  [44800/60000]
Loss: 0.304911  [51200/60000]
Loss: 0.173267  [57600/60000]
Test Error: 
 Accuracy: 95.6%, Avg loss: 0.138787 

Epoch 2
-------------------------------
Loss: 0.093351  [    0/60000]
Loss: 0.109368  [ 6400/60000]
Loss: 0.099886  [12800/60000]
Loss: 0.098969  [19200/60000]
Loss: 0.037061  [25600/60000]
Loss: 0.141523  [32000/60000]
Loss: 0.072462  [38400/60000]
Loss: 0.125026  [44800/60000]
Loss: 0.154892  [51200/60000]
Loss: 0.106008  [57600/60000]
Test Error: 
 Accuracy: 96.7%, Avg loss: 0.109013 

Epoch 3
-------------------------------
Loss: 0.039883  [    0/60000]
Loss: 0.036797  [ 6400/60000]
Loss: 0.043843  [12800/60000]
Loss: 0.061818  [19200/60000]
Loss: 0.044012  [25600/60000]
Loss: 0.068877  [32000/600

### Exercises

In [None]:
import torch.nn as nn
import torch.nn.functional as F

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.first_layer = nn.Linear(28 * 28, 512) # FC1
        # self.dropout1 = nn.Dropout(p=0.2)
        self.second_layer = nn.Linear(512, 512) # FC2
        # self.dropout2 = nn.Dropout(p=0.2)
        self.output_layer = nn.Linear(512, 10) # FC3

    def forward(self, x):
        x = self.flatten(x)
        x = F.relu(self.first_layer(x))
        # x = self.dropout1(x)
        x = F.relu(self.second_layer(x))
        # x = self.dropout2(x)
        x = self.output_layer(x)
        return x
    



1. Train a perceptron network that classifies the handwritten MNIST digits. The data must be normalized by subtracting the mean and dividing by the standard deviation. Train for 5 epochs and test the following network configurations:

a. Define a network with a single hidden layer containing a single neuron and use tanh as the activation function. Use a learning rate of 1e-2 for the optimizer.

b. Define a network with a single hidden layer of 10 neurons and use tanh as the activation function. Use a learning rate of 1e-2 for the optimizer.

c. Define a network with a single hidden layer of 10 neurons and use tanh as the activation function. Use a learning rate of 1e-5 for the optimizer.

d. Define a network with a single hidden layer of 10 neurons and use tanh as the activation function. Use a learning rate of 10 for the optimizer.

e. Define a network with 2 hidden layers of 10 neurons each and use tanh as the activation function. Use a learning rate of 1e-2 for the optimizer.

f. Define a network with 2 hidden layers of 10 neurons each and use relu as the activation function. Use a learning rate of 1e-2 for the optimizer.

g. Define a network with 2 hidden layers of 100 neurons each and use relu as the activation function. Use a learning rate of 1e-2 for the optimizer.

h. Define a network with 2 hidden layers of 100 neurons each and use relu as the activation function. Use a learning rate of 1e-2 and momentum=0.9 for the optimizer.
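
The normalization requested in exercise 1 can be sketched on a toy list of values. The important detail is that the mean and standard deviation are computed on the training data only and then reused for the test data. The helper `standardize` and the toy numbers are hypothetical; the population standard deviation is used here, which is also what `StandardScaler` computes.

```python
import statistics


def standardize(values, mean, std):
    """Subtract the mean and divide by the standard deviation."""
    return [(v - mean) / std for v in values]


train_values = [0.0, 2.0, 4.0, 6.0]
test_values = [3.0, 8.0]

# Statistics come from the training set only.
train_mean = statistics.fmean(train_values)
train_std = statistics.pstdev(train_values)

scaled_train = standardize(train_values, train_mean, train_std)
scaled_test = standardize(test_values, train_mean, train_std)

print(train_mean)  # 3.0
print(scaled_train)
print(scaled_test)
```

After scaling, the training values have mean 0 and standard deviation 1; the test values generally do not, because they are scaled with the training statistics.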

In [None]:
from sklearn.datasets import fetch_openml
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
import numpy as np

X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)

# Preprocessing: scikit-learn works directly on NumPy arrays, so no torch tensors are needed
X = X.astype(np.float32)
y = y.astype(int)

# Splitting
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Normalization
scaler = StandardScaler()
scaler.fit(x_train)

# print(scaler.mean_) # the mean
# print(scaler.scale_) # the standard deviation

scaled_x_train = scaler.transform(x_train)

scaled_x_test = scaler.transform(x_test)

EPOCHS = 5

mlp_classifier_model = MLPClassifier(hidden_layer_sizes=(100, 100, ),
                                     activation='relu',
                                     solver='adam',
                                     alpha=0.0001,
                                     batch_size='auto',
                                     learning_rate='constant',
                                     learning_rate_init=1e-2,
                                     power_t=0.5,
                                     max_iter=EPOCHS,
                                     shuffle=True,
                                     random_state=None,
                                     tol=0.0001,
                                     momentum=0.9,
                                     early_stopping=False,
                                     validation_fraction=0.1,
                                     n_iter_no_change=10)

mlp_classifier_model.fit(scaled_x_train, y_train)

predicted_labels = mlp_classifier_model.predict(scaled_x_test)

print(accuracy_score(y_test, predicted_labels))

0.9502857142857143


