## Training a Multi-Layer Perceptron (MLP) with Random Data

### Introduction
In this notebook, we will use a randomly generated dataset to train a Multi-Layer Perceptron (MLP) using PyTorch. This will help us understand the workflow of training a neural network, without relying on pre-existing datasets like MNIST.

### Mathematical Formulation

We will generate random data $\mathbf{X} \in \mathbb{R}^{N \times D}$ (where $N$ is the number of samples and $D$ is the number of features) and corresponding labels $\mathbf{y} \in \{0, 1\}^{N}$, where each target is a random binary label (0 or 1).

The neural network will have:
1. An input layer with $D$ input features.
2. Two hidden layers with ReLU activations.
3. An output layer producing a probability (output between 0 and 1) using the sigmoid function.
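
The architecture listed above can be written out as follows. This is a sketch: the symbols $W_k$, $\mathbf{b}_k$, $\mathbf{w}_3$, $b_3$ and the hidden sizes $H_1$, $H_2$ are notation introduced here for illustration, not taken from the notebook.

$$
\begin{aligned}
\mathbf{h}_1 &= \mathrm{ReLU}(W_1 \mathbf{x} + \mathbf{b}_1), & W_1 &\in \mathbb{R}^{H_1 \times D} \\
\mathbf{h}_2 &= \mathrm{ReLU}(W_2 \mathbf{h}_1 + \mathbf{b}_2), & W_2 &\in \mathbb{R}^{H_2 \times H_1} \\
z &= \mathbf{w}_3^\top \mathbf{h}_2 + b_3, & \mathbf{w}_3 &\in \mathbb{R}^{H_2} \\
\hat{y} &= \sigma(z) = \frac{1}{1 + e^{-z}}
\end{aligned}
$$

For labels $y_i \in \{0, 1\}$, the binary cross-entropy averaged over the $N$ samples is

$$
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right].
$$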

---

### 1. Install PyTorch and Dependencies

```python
# First, let's install PyTorch (if not already installed)
!pip install torch 
```

### 2. Import Libraries


In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
import numpy as np



In [2]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [3]:
torch.zeros(10)

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [4]:
torch.ones(10)

tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [6]:
torch.from_numpy(np.zeros(10))

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=torch.float64)

In [7]:
torch.from_numpy(np.zeros(10)).numpy()

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [10]:
torch.from_numpy(np.zeros(10)).cuda()

AssertionError: Torch not compiled with CUDA enabled

In [12]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.from_numpy(np.zeros(10)).to(device)

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=torch.float64)

In [13]:
torch.from_numpy(np.zeros(10)).detach()

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=torch.float64)

In [16]:
torch.from_numpy(np.zeros(10)).to(device) + 2*torch.from_numpy(np.ones(10)).to(device)

tensor([2., 2., 2., 2., 2., 2., 2., 2., 2., 2.], dtype=torch.float64)
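
The outputs above all carry `dtype=torch.float64` because `np.zeros` creates float64 arrays and `torch.from_numpy` keeps that dtype. The sketch below shows one way to convert to float32, which is the default dtype of `nn.Linear` parameters; the variable names are chosen here for illustration only.

```python
import numpy as np
import torch
import torch.nn as nn

a = np.zeros(10)              # NumPy defaults to float64
t64 = torch.from_numpy(a)     # keeps float64 and shares memory with `a`
t32 = t64.float()             # float32 copy, matching nn.Linear's default parameter dtype

layer = nn.Linear(10, 1)      # weights and bias are float32 by default
out = layer(t32)              # input and parameters now have the same dtype

print(t64.dtype, t32.dtype, out.shape)
```

Passing `t64` directly to `layer` would raise a dtype mismatch error, so converting once, right after `from_numpy`, is usually the simplest option.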

### 3. Generate Random Data

Here, we will generate synthetic data with $N = 1000$ samples and $D = 20$ features. The labels will be binary (0 or 1) and randomly assigned.

- $X$ is a tensor of size $N \times D$, where each row represents an input sample with $D$ features.
- $y$ is a tensor of size $N$, where each value is either 0 or 1, representing the label for each sample.


In [17]:
# Generate random data (N samples, D features)
N = 1000  # Number of samples
D = 20    # Number of features

# Random input data (features)
X = torch.randn(N, D)

# Random binary labels (target output)
y = torch.randint(0, 2, (N,))  # Random binary labels (0 or 1)

In [19]:
X.shape, y.shape

(torch.Size([1000, 20]), torch.Size([1000]))
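
Because the data is random, every run of the notebook produces different tensors. A minimal sketch of making the generation reproducible and checking the dtypes and class balance, assuming a seed of 0 (an arbitrary choice, not part of the original notebook):

```python
import torch

torch.manual_seed(0)                  # fix the seed so the same data is generated on every run

N, D = 1000, 20
X = torch.randn(N, D)                 # standard normal features
y = torch.randint(0, 2, (N,))         # integer labels in {0, 1}

print(X.dtype, y.dtype)               # torch.float32 torch.int64
print(y.float().mean().item())        # fraction of positive labels, expected to be near 0.5
```

The integer dtype of `y` is why the later cells call `y.float()` before passing the labels to the loss.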

In [20]:
X=X.to(device)
y=y.to(device)

The goal is to learn a function $f$ that maps an input $x$ to its label $y$: $f(x) \rightarrow y$.


### 4. Define the MLP Model

We will define a simple MLP with:
1. An input layer of size $D$,
2. Two hidden layers (with 64 and 32 neurons respectively),
3. An output layer with 1 neuron, applying a sigmoid activation to output a probability.

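The number of output neurons depends on how the labels are encoded:
- 1 output neuron: binary label 0/1.
- 2 output neurons: binary label with one-hot encoding, 0 → [1, 0], 1 → [0, 1].
- 3 output neurons: multiclass classification, 0 → [1, 0, 0], 1 → [0, 1, 0], 2 → [0, 0, 1].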
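
The cells that follow use a single `nn.Linear` layer as `f`. For reference, the two-hidden-layer architecture described at the top of this section could be sketched with `nn.Sequential` as below. The name `mlp` is hypothetical, and the final sigmoid is deliberately left out on the assumption that the model is paired with `nn.BCEWithLogitsLoss`.

```python
import torch
import torch.nn as nn

D = 20  # number of input features, as in the data section

# Two hidden layers (64 and 32 units) with ReLU, then a single output logit.
# No sigmoid here: apply torch.sigmoid explicitly only when probabilities are needed.
mlp = nn.Sequential(
    nn.Linear(D, 64),
    nn.ReLU(),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
)

x = torch.randn(5, D)            # a small batch of 5 samples
logits = mlp(x)                  # shape (5, 1), unbounded real values
probs = torch.sigmoid(logits)    # shape (5, 1), values between 0 and 1
print(logits.shape)
```

Such a model can be dropped in wherever `f` is used, since it accepts the same `(N, D)` input and returns the same `(N, 1)` output.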

In [28]:
f=nn.Linear(in_features=20, out_features=1, bias=True).to(device)

In [29]:
X.shape

torch.Size([1000, 20])

In [33]:
f(X) 

tensor([[-2.9409e-01],
        [ 1.6678e-01],
        [-4.5586e-02],
        [ 6.4463e-01],
        [ 3.0867e-01],
        [-5.4459e-01],
        [ 8.7769e-01],
        [ 1.1898e-01],
        [-2.3899e-01],
        [-4.2924e-01],
        [-1.5609e-01],
        [ 8.8163e-01],
        [-1.1226e-01],
        [-2.5062e-01],
        [-3.8742e-01],
        [ 2.2449e-01],
        [ 3.4761e-01],
        [ 9.9565e-02],
        [-4.2858e-01],
        [-8.2571e-01],
        [ 5.3350e-02],
        [-7.2007e-02],
        [-5.5918e-01],
        [ 8.6131e-01],
        [ 3.6692e-01],
        [-6.0089e-01],
        [ 1.5981e-03],
        [-9.0100e-01],
        [ 9.4393e-02],
        [ 4.4586e-03],
        [ 1.9622e-01],
        [ 4.9404e-01],
        [ 2.3906e-01],
        [-1.4982e-01],
        [ 3.4067e-01],
        [-4.1321e-01],
        [-7.6999e-01],
        [ 3.4492e-01],
        [-4.8311e-02],
        [ 6.3808e-01],
        [-5.3933e-01],
        [ 1.9665e-01],
        [-4.1618e-01],
        [-2

In [32]:
torch.sigmoid(f(X) )

tensor([[0.4270],
        [0.5416],
        [0.4886],
        [0.6558],
        [0.5766],
        [0.3671],
        [0.7063],
        [0.5297],
        [0.4405],
        [0.3943],
        [0.4611],
        [0.7072],
        [0.4720],
        [0.4377],
        [0.4043],
        [0.5559],
        [0.5860],
        [0.5249],
        [0.3945],
        [0.3046],
        [0.5133],
        [0.4820],
        [0.3637],
        [0.7029],
        [0.5907],
        [0.3541],
        [0.5004],
        [0.2888],
        [0.5236],
        [0.5011],
        [0.5489],
        [0.6211],
        [0.5595],
        [0.4626],
        [0.5844],
        [0.3981],
        [0.3165],
        [0.5854],
        [0.4879],
        [0.6543],
        [0.3683],
        [0.5490],
        [0.3974],
        [0.4378],
        [0.5824],
        [0.7119],
        [0.4330],
        [0.2877],
        [0.6579],
        [0.2952],
        [0.6359],
        [0.6079],
        [0.4136],
        [0.6599],
        [0.4085],
        [0

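There are two equivalent ways to compute the loss $L(f(X), y)$:
- `f(x)` → sigmoid → `nn.BCELoss`
- `f(x)` → `nn.BCEWithLogitsLoss` (the sigmoid is applied inside the loss)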
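
The two routes to the loss noted above should give the same value up to floating-point error. A small sketch to check this on random logits (the tensors here are stand-ins for `f(X)` and `y`, not the notebook's own data):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(8)                        # raw model outputs, before any sigmoid
targets = torch.randint(0, 2, (8,)).float()    # both losses expect float targets

loss_two_step = nn.BCELoss()(torch.sigmoid(logits), targets)   # sigmoid, then BCE
loss_fused = nn.BCEWithLogitsLoss()(logits, targets)           # sigmoid folded into the loss

print(loss_two_step.item(), loss_fused.item())
```

The fused version is generally preferred because it is more numerically stable for logits of large magnitude.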

In [37]:
criterion = nn.BCEWithLogitsLoss()
loss=criterion(f(X).reshape(-1), y.float())
loss.backward()

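- Labels encoded as a single 0/1 value: binary cross-entropy (BCE).
- Labels one-hot encoded (0 → [1, 0], 1 → [0, 1]): cross-entropy (CE).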
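
The one-hot variant noted above corresponds to a model with two output neurons trained with cross-entropy. A minimal sketch, assuming a hypothetical two-output layer called `head`; `nn.CrossEntropyLoss` takes class indices directly, and the manual computation is included only to show the link with one-hot vectors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(6, 20)
y = torch.randint(0, 2, (6,))            # class indices 0 or 1 (int64)

head = nn.Linear(20, 2)                  # two output neurons, one logit per class
logits = head(X)                         # shape (6, 2)

# Built-in loss: applies log-softmax internally and takes class indices.
loss = nn.CrossEntropyLoss()(logits, y)

# Same quantity written out with explicit one-hot targets.
one_hot = F.one_hot(y, num_classes=2).float()
manual = -(one_hot * torch.log_softmax(logits, dim=1)).sum(dim=1).mean()

pred = logits.argmax(dim=1)              # predicted class per sample
print(loss.item(), manual.item())
```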

### 5. Define Loss and Optimizer

For this binary classification task, we will use **Binary Cross-Entropy Loss** (via `nn.BCEWithLogitsLoss`, which takes raw logits) and the **Adam** optimizer.


<generator object Module.parameters at 0x129d23680>

In [39]:
optimizer = optim.Adam(f.parameters())

### 7. Train the Model

Now, we will define the training loop and train the model for several epochs.


In [40]:
f=nn.Linear(in_features=20, out_features=1, bias=True).to(device)
optimizer = optim.Adam(f.parameters())
criterion = nn.BCEWithLogitsLoss()

In [42]:
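for it in range(10):
    optimizer.zero_grad()                            # reset gradients from the previous step
    output = f(X)                                    # forward pass: raw logits, shape (N, 1)
    loss = criterion(output.reshape(-1), y.float())  # BCE on logits against float targets
    loss.backward()                                  # backpropagate
    optimizer.step()                                 # update the parameters
    print(loss.item())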

0.7427418828010559
0.7421107292175293
0.7414838671684265
0.7408615946769714
0.7402435541152954
0.7396301031112671
0.7390214204788208
0.7384174466133118
0.7378182411193848
0.7372239232063293
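
The losses printed above decrease slowly and stay above roughly 0.693. A useful reference point: since the labels are independent of the features, a model that always outputs probability 0.5 has a binary cross-entropy of $\ln 2 \approx 0.693$, whatever the labels are. The sketch below checks this numerically on freshly generated labels (not the notebook's `y`).

```python
import math
import torch
import torch.nn as nn

y = torch.randint(0, 2, (1000,)).float()   # random binary labels
zero_logits = torch.zeros(1000)            # logit 0 corresponds to probability 0.5

baseline = nn.BCEWithLogitsLoss()(zero_logits, y)
print(baseline.item(), math.log(2))
```

A training loss that drops well below this value on random labels indicates that the model is memorizing the training set rather than learning a real pattern.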


### 8. Evaluate the Model

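After training, we will evaluate the model. Note that there is no separate validation set here: accuracy is computed on the same data used for training. Since the labels are random, roughly chance-level accuracy (about 0.5) is expected.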

In [43]:
predictions = torch.sigmoid(f(X)).detach().cpu().numpy().reshape(-1)

In [44]:
predictions.shape

(1000,)

In [49]:
# Move y to the CPU before converting to NumPy (required when device is CUDA)
np.mean(((predictions>=0.5)*1)==y.cpu().numpy())

0.496
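
The accuracy above is measured on the training data. A sketch of how a held-out evaluation might look, assuming a hypothetical 80/20 split and regenerated data; the names `train_idx`, `val_idx`, and `model` are introduced here and do not appear in the original cells.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
N, D = 1000, 20
X = torch.randn(N, D)
y = torch.randint(0, 2, (N,))

# Hypothetical 80/20 split via a random permutation of the sample indices.
perm = torch.randperm(N)
train_idx, val_idx = perm[:800], perm[800:]

model = nn.Linear(D, 1)
optimizer = optim.Adam(model.parameters())
criterion = nn.BCEWithLogitsLoss()

for _ in range(10):
    optimizer.zero_grad()
    loss = criterion(model(X[train_idx]).reshape(-1), y[train_idx].float())
    loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():                          # no gradient tracking during evaluation
    val_logits = model(X[val_idx]).reshape(-1)
    val_pred = (val_logits >= 0).long()        # logit >= 0 is equivalent to probability >= 0.5
    val_acc = (val_pred == y[val_idx]).float().mean().item()

print(val_acc)
```

With random labels the validation accuracy should stay near 0.5 however long the model trains, which makes this dataset a handy sanity check for the evaluation code itself.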