# Problem - Session 2: a hidden variable

In this problem, we work with a dataset of 50,000 inputs $x_i$ and targets $y_i$. The inputs are vectors of size 10 (torch tensors), and the targets are scalars generated by five different functions ($f_0$, ..., $f_4$):

$$ \forall i, \exists k\in [\![0 \;;4]\!]  \:\: \text{such that} \: f_k(x_i) = y_i $$

These functions are unknown, and so is the index $k$. However, we know that the first 1000 targets were all generated with the same index $k$, as were the next 1000, and so on.

The goal is to group together the blocks of targets that were generated with the same index $k$ (that is, with the same function).

In [2]:
# Example of sampling from the dataset
import torch
from torch.utils.data import DataLoader

! git clone https://github.com/medz1966/exam_2025_session2.git
! cp exam_2025_session2/utils/utils.py .
from utils import Problem1Dataset

dataset = Problem1Dataset()
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

for batch in dataloader:
    x_batch, y_batch, k_batch, idx_batch = batch
    print("Batch input shape:", x_batch.shape)
    print("Batch target shape:", y_batch.shape)
    print("Batch k shape:", k_batch.shape) # index k (must not be used for training)
    print("Batch indices shape:", idx_batch.shape)
    break

Cloning into 'exam_2025_session2'...
remote: Enumerating objects: 81, done.
remote: Counting objects: 100% (81/81), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 81 (delta 23), reused 8 (delta 1), pack-reused 0 (from 0)
Receiving objects: 100% (81/81), 312.73 KiB | 15.64 MiB/s, done.
Resolving deltas: 100% (23/23), done.
Batch input shape: torch.Size([32, 10])
Batch target shape: torch.Size([32, 1])
Batch k shape: torch.Size([32])
Batch indices shape: torch.Size([32])


**Instructions:**
- Train the architecture proposed in the next cell.
- Show that the 2D vectors in self.theta make it possible to solve
  the stated problem.
- Describe the role of self.theta, of the noise vector,
  and the reason for the division by 1000 (**indices // 1000** in the code).

In [5]:
import torch
import torch.nn as nn


class DeepMLP(nn.Module):
    def __init__(self, input_dim, output_dim, hidden_dim=256):
        super().__init__()
        # One learnable 2D vector per group of 1000 samples (50 groups in total)
        self.theta = nn.Parameter(torch.randn(50, 2))
        # The first layer receives the input concatenated with the 2D group vector
        self.fc1 = nn.Linear(input_dim + 2, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.fc3 = nn.Linear(hidden_dim, hidden_dim)
        self.fc4 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x, indices):
        # Map each sample index to its group and fetch that group's vector
        theta_batch = self.theta[indices // 1000, :]
        # Standard Gaussian noise (mean 0, std 1), resampled on every call
        noise = torch.randn_like(theta_batch)
        x = torch.cat([x, theta_batch + noise], dim=1)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        x = self.fc4(x)
        # Return the prediction and the noise-free group vectors
        return x, theta_batch

In [12]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from utils import Problem1Dataset

# Check if GPU is available and set the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Load the dataset
dataset = Problem1Dataset()
dataloader = DataLoader(dataset, batch_size=128, shuffle=True)  # Increased batch size for faster training

# Initialize the model
input_dim = 10  # Each input is a vector of size 10
output_dim = 1   # Each target is a scalar
model = DeepMLP(input_dim, output_dim).to(device)  # Move model to the selected device

# Define the loss function and optimizer
loss_fn = nn.MSELoss()  # Mean Squared Error for regression
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adam optimizer

# Training loop
num_epochs = 2000
for epoch in range(num_epochs):
    for batch in dataloader:
        x_batch, y_batch, _, idx_batch = batch

        # Move data to the selected device
        x_batch = x_batch.to(device)
        y_batch = y_batch.to(device)
        idx_batch = idx_batch.to(device)

        # Zero the gradients
        optimizer.zero_grad()

        # Forward pass
        y_pred, theta_batch = model(x_batch, idx_batch)

        # Calculate the loss
        loss = loss_fn(y_pred, y_batch)

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

    # Print the loss of the last batch every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}")

# After training, inspect the learned theta vectors
print("Learned theta vectors:")
print(model.theta)

Using device: cuda
Epoch [1/50], Loss: 210.4705
Epoch [2/50], Loss: 133.2904
Epoch [3/50], Loss: 157.7950
Epoch [4/50], Loss: 139.4067
Epoch [5/50], Loss: 125.4037
Epoch [6/50], Loss: 92.8253
Epoch [7/50], Loss: 122.1277
Epoch [8/50], Loss: 44.3012
Epoch [9/50], Loss: 78.7368
Epoch [10/50], Loss: 72.6585
Epoch [11/50], Loss: 58.5326
Epoch [12/50], Loss: 30.5738
Epoch [13/50], Loss: 26.8217
Epoch [14/50], Loss: 36.0434
Epoch [15/50], Loss: 17.9923
Epoch [16/50], Loss: 22.3653
Epoch [17/50], Loss: 20.9041
Epoch [18/50], Loss: 11.0733
Epoch [19/50], Loss: 21.9626
Epoch [20/50], Loss: 24.8648
Epoch [21/50], Loss: 17.1325
Epoch [22/50], Loss: 10.5360
Epoch [23/50], Loss: 6.5659
Epoch [24/50], Loss: 27.4344
Epoch [25/50], Loss: 7.4797
Epoch [26/50], Loss: 5.3714
Epoch [27/50], Loss: 13.5706
Epoch [28/50], Loss: 4.3513
Epoch [29/50], Loss: 4.6908
Epoch [30/50], Loss: 3.2247
Epoch [31/50], Loss: 3.0353
Epoch [32/50], Loss: 17.9652
Epoch [33/50], Loss: 6.2755
Epoch [34/50], Loss: 5.4336
Epoch [

3. Describe the Role of self.theta, noise, and Division by 1000
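self.theta:

This is a learnable table of 50 two-dimensional vectors, one for each group of 1000 samples (50,000 / 1000 = 50 groups).

Each vector acts as a "signature" of its group: it is the only group-specific input the network receives, so it is what lets the network tell which function $f_k$ generated the group's targets.

During training, these vectors are adjusted by gradient descent together with the network weights, so groups generated by the same function are expected to end up with similar vectors and groups generated by different functions with distinct ones.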
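One way to check that the learned vectors answer the problem is to cluster them and see which groups land together. The sketch below is only an illustration, not the notebook's method: it uses a small hand-written k-means in plain Python with a deterministic farthest-point initialisation, and the `toy_theta` values are made up to stand in for trained vectors (six groups and three clusters instead of 50 and 5, to keep it short).

```python
import math


def farthest_point_init(points, k):
    """Pick k starting centers: the first point, then repeatedly the point
    that is farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(farthest)
    return centers


def kmeans(points, k, n_iter=20):
    """Plain k-means (Lloyd's algorithm) returning one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


# Hypothetical 2D vectors for six groups (made-up values, not real results)
toy_theta = [(4.0, 0.1), (-3.9, 0.2), (4.2, -0.1), (0.0, 5.0), (-4.1, -0.2), (0.1, 4.8)]
labels = kmeans(toy_theta, k=3)
print(labels)  # groups with the same label are assumed to share a function
```

In the notebook, one would presumably pass `model.theta.detach().cpu().tolist()` with `k=5`, then compare the resulting labels with the true index k of each group, which the dataset returns for checking purposes only.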

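noise:

Gaussian noise with zero mean and unit standard deviation is added to the theta vectors on every forward pass (the code does not check self.training, so this also happens at evaluation time).

Because the network only ever sees a noisy copy of each vector, it cannot rely on tiny differences between them. The likely purpose is to force the vectors of groups generated by different functions to move apart by more than the noise scale, while leaving no incentive to separate groups that share a function.

Without noise, nothing would stop the model from keeping 50 arbitrary, unrelated vectors (one per group), which could fit the data equally well but would not reveal which groups share a function.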
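The effect of the noise scale can be illustrated with a small calculation. Assuming isotropic Gaussian noise and two group vectors at Euclidean distance d, a noisy copy of the first vector lands closer to the second one with probability Φ(−d / (2σ)), where Φ is the standard normal CDF. The helper name `confusion_probability` is hypothetical and this is only a back-of-the-envelope sketch, not something computed in the notebook.

```python
import math


def confusion_probability(distance, noise_std=1.0):
    """Probability that theta_a + noise is closer to theta_b than to theta_a.

    Only the noise component along the line joining the two vectors matters,
    so the result is Phi(-distance / (2 * noise_std)).
    """
    z = distance / (2.0 * noise_std)
    # Phi(-z) written with the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for d in (0.5, 2.0, 6.0):
    print(f"distance {d}: confusion probability {confusion_probability(d):.4f}")
```

With unit noise, vectors that are much closer than one unit are confused almost half of the time, whereas vectors several units apart are almost never confused, which is consistent with well-separated clusters being needed for a low loss.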

Division by 1000:

The dataset is divided into consecutive groups of 1000 samples, and each group uses a single function $f_k$ to generate its targets.

The integer division indices // 1000 maps each sample index to its group number, which selects the matching row of self.theta.

For example:

Samples 0 to 999 → group 0 → use theta[0].

Samples 1000 to 1999 → group 1 → use theta[1].

And so on.
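The mapping above can be checked with a few lines of plain Python. The names `GROUP_SIZE`, `N_SAMPLES` and `group_of` are introduced here for illustration only. In the model, the same `//` operator is applied elementwise to the tensor of indices.

```python
GROUP_SIZE = 1000    # samples per group, as stated in the problem
N_SAMPLES = 50_000   # total number of samples in the dataset


def group_of(index, group_size=GROUP_SIZE):
    """Return the group number (row of theta) for a given sample index."""
    return index // group_size


n_groups = N_SAMPLES // GROUP_SIZE  # number of rows needed in theta
examples = {i: group_of(i) for i in (0, 999, 1000, 1999, 49_999)}
print(n_groups)   # 50
print(examples)   # {0: 0, 999: 0, 1000: 1, 1999: 1, 49999: 49}
```

This is why self.theta has exactly 50 rows: the largest index, 49999, maps to group 49.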

