# MLP Weights Initialization

In this notebook, we train a simple Multi-Layer Perceptron (MLP) on the Iris dataset and save its weights locally.

The dataset can be downloaded from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/53/iris) or from other sources. It consists of:

- **Features**: Four flower characteristics:
    - *sepal_length*, *sepal_width*, *petal_length*, *petal_width*
    
- **Label**: The target variable, which can take one of the following values:
    - *setosa*, *versicolor*, *virginica*
    
- **Dimensions**: The dataset consists of 150 rows, each containing the features and the target label.

The MLP is kept as simple as possible because its **forward pass** will later be synthesized using Vitis. The architecture includes:

- **2 hidden layers** of 10 neurons each, both using the **ReLU** activation function
- **1 output layer** with 3 neurons (corresponding to the three species)

Keeping the network this small makes the forward pass cheap to compute and straightforward to implement for hardware synthesis.

In [1]:
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
import numpy as np

# Set random seed for reproducibility
torch.manual_seed(4242)

df = pd.read_csv('../datasets/iris_dataset/iris_dataset.csv')

print("Shape of the dataset:", df.shape, "\n----------------------------------------------------------------")
print("Some rows of the dataset:\n\n", df.head())

# Extract the features and labels
X = df.iloc[:, :-1].values  # all columns except the last one are features
y = df.iloc[:, -1].values   # the last column is the label

# Encode labels (e.g. 'setosa' -> 0)
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)

# Save the dataset with encoded labels to a txt file
np.savetxt('../datasets/iris_dataset/iris_dataset_encoded.txt', np.c_[X, y], fmt='%f', header='', comments='')


# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)  # Use 30% for test

# Convert to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)
y_test = torch.tensor(y_test, dtype=torch.long)

# Create DataLoader for batching
train_data = TensorDataset(X_train, y_train)
test_data = TensorDataset(X_test, y_test)

train_loader = DataLoader(train_data, batch_size=16, shuffle=True)
test_loader = DataLoader(test_data, batch_size=16, shuffle=False)

Shape of the dataset: (150, 5) 
----------------------------------------------------------------
Some rows of the dataset:

    sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
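
The label-encoding step in the cell above can be illustrated without scikit-learn. `LabelEncoder` assigns integer codes in sorted order of the distinct labels, so the three species map to 0, 1 and 2 alphabetically. The sketch below reproduces that mapping in plain Python; the short `species` list is a hypothetical stand-in for the real column, and `class_to_index` is an illustrative name.

```python
# Hypothetical stand-in for the species column; the real labels come from the CSV.
species = ["setosa", "virginica", "versicolor", "setosa", "virginica"]

# Integer codes follow the sorted order of the distinct labels.
classes = sorted(set(species))
class_to_index = {name: idx for idx, name in enumerate(classes)}

# Replace every label with its integer code.
encoded = [class_to_index[name] for name in species]

print(class_to_index)
print(encoded)
```

Knowing this mapping matters later, because the index of the largest output neuron has to be translated back to a species name.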


We will define the MLP as a class, inheriting from PyTorch's `nn.Module`. The structure will follow the architecture discussed above.

In [2]:
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 10)
        self.fc2 = nn.Linear(10, 10)
        self.fc3 = nn.Linear(10, 3)
    
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x
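
To get a feel for how small this network is, the sketch below counts its trainable parameters from the layer sizes used in the class above (4 inputs, two hidden layers of 10, 3 outputs). The helper `count_parameters` is an illustrative name, not part of PyTorch.

```python
# Layer sizes taken from the MLP class: 4 inputs -> 10 -> 10 -> 3 outputs.
layer_sizes = [4, 10, 10, 3]


def count_parameters(sizes):
    """Count the weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out  # entries of the weight matrix
        total += n_out         # entries of the bias vector
    return total


n_params = count_parameters(layer_sizes)
print(n_params)  # 50 + 110 + 33 = 193
```

With only 193 values to store, the whole model fits comfortably in on-chip memory for most hardware targets.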

Next, we train the model. For this task, we will use:

- `CrossEntropyLoss` as the **Loss Function**, which is suitable for multi-class classification problems
- The `Adam` **Optimizer**, a widely used optimization algorithm

We train for $100$ epochs with a learning rate of $0.01$.
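
The `forward` method returns raw scores (logits) with no softmax because `CrossEntropyLoss` applies a log-softmax internally before taking the negative log-likelihood of the correct class. As a rough illustration, the per-sample computation can be sketched in plain Python; the function name `cross_entropy` and the example logits are assumptions made for this sketch, and PyTorch additionally averages the result over the batch by default.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax of the target class, for a single sample."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Hypothetical logits for one flower; class 0 has the highest score.
logits = [2.0, 0.5, -1.0]
loss_correct = cross_entropy(logits, 0)  # true class is the top-scoring one
loss_wrong = cross_entropy(logits, 2)    # true class is the lowest-scoring one

print(loss_correct, loss_wrong)
```

The loss is small when the true class has the largest logit and grows as the network becomes more confident in a wrong class.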

In [3]:
model = MLP()

# Check if GPU is available and move the model to GPU if possible
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

criterion = nn.CrossEntropyLoss() # Cross-entropy loss for classification
optimizer = optim.Adam(model.parameters(), lr=0.01) # Adam optimizer with learning rate 0.01

# Training loop
NUM_EPOCHS = 100
for epoch in range(NUM_EPOCHS):
    model.train()
    running_loss = 0.0
    correct_train = 0
    total_train = 0

    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

        # Calculate training accuracy (predicted class = index of the largest logit)
        predicted_train = outputs.argmax(dim=1)
        total_train += labels.size(0)
        correct_train += (predicted_train == labels).sum().item()

    train_accuracy = correct_train / total_train  # Training accuracy

    # Evaluate the model after each epoch on test set
    model.eval()
    correct_test = 0
    total_test = 0
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            predicted_test = outputs.argmax(dim=1)
            total_test += labels.size(0)
            correct_test += (predicted_test == labels).sum().item()

    test_accuracy = correct_test / total_test  # Test accuracy

    # Print loss and accuracy every 10 epochs
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{NUM_EPOCHS}], Loss: {running_loss/len(train_loader):.4f}, '
              f'Train Accuracy: {train_accuracy * 100:.2f}%, Test Accuracy: {test_accuracy * 100:.2f}%')

Epoch [10/100], Loss: 0.3258, Train Accuracy: 90.48%, Test Accuracy: 88.89%
Epoch [20/100], Loss: 0.1060, Train Accuracy: 97.14%, Test Accuracy: 97.78%
Epoch [30/100], Loss: 0.1312, Train Accuracy: 95.24%, Test Accuracy: 97.78%
Epoch [40/100], Loss: 0.0853, Train Accuracy: 97.14%, Test Accuracy: 100.00%
Epoch [50/100], Loss: 0.0675, Train Accuracy: 98.10%, Test Accuracy: 100.00%
Epoch [60/100], Loss: 0.0971, Train Accuracy: 96.19%, Test Accuracy: 97.78%
Epoch [70/100], Loss: 0.0952, Train Accuracy: 96.19%, Test Accuracy: 97.78%
Epoch [80/100], Loss: 0.1020, Train Accuracy: 96.19%, Test Accuracy: 97.78%
Epoch [90/100], Loss: 0.0929, Train Accuracy: 96.19%, Test Accuracy: 97.78%
Epoch [100/100], Loss: 0.0788, Train Accuracy: 94.29%, Test Accuracy: 97.78%


With training complete, we extract the weights from the model:

In [4]:
weights = {}
for name, param in model.named_parameters():
    # Move each tensor to the CPU first; .numpy() fails on CUDA tensors
    weights[name] = param.detach().cpu().numpy()

# Print their shapes to verify the network architecture
for name, weight in weights.items():
    print(f"{name}: {weight.shape}")

fc1.weight: (10, 4)
fc1.bias: (10,)
fc2.weight: (10, 10)
fc2.bias: (10,)
fc3.weight: (3, 10)
fc3.bias: (3,)
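
The shapes above reflect how `nn.Linear` stores its weight: as `(out_features, in_features)`, so each row holds the incoming weights of one neuron and a layer computes `W x + b`. The following is a minimal pure-Python sketch of the forward pass under that layout, of the kind one might port to hardware. The function names and the tiny `toy` parameter set (a 2-2-2-2 network) are hypothetical and only serve to make the arithmetic easy to check by hand; they are not the trained weights.

```python
def linear(W, b, x):
    """Compute W @ x + b with W stored as (out_features, in_features)."""
    return [sum(w * xi for w, xi in zip(row, x)) + bias
            for row, bias in zip(W, b)]


def relu(v):
    """Element-wise max(0, a)."""
    return [max(0.0, a) for a in v]


def forward(params, x):
    """Two ReLU hidden layers followed by a linear output layer."""
    h = relu(linear(params["fc1.weight"], params["fc1.bias"], x))
    h = relu(linear(params["fc2.weight"], params["fc2.bias"], h))
    return linear(params["fc3.weight"], params["fc3.bias"], h)


def predict(params, x):
    """Return the index of the largest logit."""
    logits = forward(params, x)
    return max(range(len(logits)), key=logits.__getitem__)


# Hypothetical toy parameters, keyed like the `weights` dictionary above.
toy = {
    "fc1.weight": [[1.0, 0.0], [0.0, -1.0]], "fc1.bias": [0.0, 0.0],
    "fc2.weight": [[1.0, 1.0], [0.0, 1.0]],  "fc2.bias": [0.5, 0.0],
    "fc3.weight": [[1.0, 0.0], [2.0, 0.0]],  "fc3.bias": [0.0, -1.0],
}

print(forward(toy, [1.0, 2.0]))  # [1.5, 2.0]
print(predict(toy, [1.0, 2.0]))  # 1
```

Because the functions only iterate over rows, they should also accept the NumPy arrays held in `weights`, although that has not been checked against the trained model here.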


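Finally, save them to a text file. Each parameter is written as its name on one line, followed by its values with six decimal places (`fmt='%f'`):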

In [5]:
import numpy as np

with open('./weights.txt', 'w') as f:
    for name, weight in weights.items():
        f.write(f"{name}\n")
        np.savetxt(f, weight, fmt='%f')
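
To consume this file elsewhere, a reader has to undo the layout produced above: a name line, then one line per matrix row, with 1-D arrays such as biases written one value per line. The sketch below is one possible parser in plain Python. The function `load_weights` is a hypothetical helper, and the sample contents are made up for the demonstration rather than taken from the trained model. It assumes that bias names end in `.bias`, as they do for this network.

```python
import os
import tempfile


def load_weights(path):
    """Parse 'name' lines followed by rows of space-separated floats."""
    params = {}
    current = None
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            try:
                row = [float(v) for v in fields]
            except ValueError:
                # Not numeric, so this line starts a new parameter.
                current = line.strip()
                params[current] = []
                continue
            params[current].append(row)
    # 1-D arrays are stored one value per line; flatten the bias rows.
    for name, rows in params.items():
        if name.endswith(".bias"):
            params[name] = [r[0] for r in rows]
    return params


# Made-up file contents in the same layout as weights.txt.
sample = (
    "fc1.weight\n0.100000 -0.200000\n0.300000 0.400000\n"
    "fc1.bias\n0.500000\n-0.600000\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.txt")
    with open(path, "w") as f:
        f.write(sample)
    loaded = load_weights(path)

print(loaded)
```

Since the values were saved with six decimal places, the reloaded weights match the trained ones only to that precision.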