# Solution 1 IMPROVED: CNN with Stratified Split

**Key Improvements**:
- ⭐ **Stratified train/val split** by label, so both splits keep the same class proportions
- ⭐ **Larger stride (150)** to reduce temporal leakage between overlapping windows (50% overlap vs 83%)
- ⭐ **User-level splitting**: all windows from one `sample_index` stay in the same split, so validation never contains a user seen during training
- Strong regularization (dropout=0.4, weight_decay=1e-3)
- Early stopping on validation loss
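
The overlap figures in the list follow from the window and stride alone: consecutive windows share `1 - stride / window` of their timesteps. A minimal sketch of that arithmetic, assuming the window size of 300 used later in this notebook; `overlap_fraction` is a helper introduced here for illustration only.

```python
def overlap_fraction(window: int, stride: int) -> float:
    """Fraction of each window that is shared with the next window."""
    if not 0 < stride <= window:
        raise ValueError("expected 0 < stride <= window")
    return 1 - stride / window

WINDOW_SIZE = 300
for stride in (50, 150, 300):
    # stride == window means back-to-back windows with no shared timesteps
    print(f"stride={stride:3d} -> overlap={overlap_fraction(WINDOW_SIZE, stride):.0%}")
```

At this window size, the 83% figure corresponds to a stride of 50 and the 50% figure to a stride of 150.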

In [1]:
# Set seed for reproducibility
SEED = 42

import os
os.environ['PYTHONHASHSEED'] = str(SEED)
os.environ['MPLCONFIGDIR'] = os.getcwd() + '/configs/'

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=Warning)

import random
import numpy as np
np.random.seed(SEED)
random.seed(SEED)

import torch
torch.manual_seed(SEED)
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

if torch.cuda.is_available():
    device = torch.device("cuda")
    torch.cuda.manual_seed_all(SEED)
else:
    device = torch.device("cpu")

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, classification_report, confusion_matrix

print(f"PyTorch version: {torch.__version__}")
print(f"Device: {device}")

PyTorch version: 2.9.0
Device: cpu


In [2]:
# Load and preprocess data
df = pd.read_csv("pirate_pain_train.csv")
target = pd.read_csv("pirate_pain_train_labels.csv")

# Encode categorical features
number_cols = ['n_legs', 'n_hands', 'n_eyes']
for col in number_cols:
    df[col] = df[col].astype('category').cat.codes

# Normalize joint columns
joint_cols = ["joint_" + str(i).zfill(2) for i in range(31)]
for col in joint_cols:
    df[col] = df[col].astype(np.float32)

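# NOTE: the scaler is fit on every row, including samples that later land in
# the validation split, so validation statistics leak into preprocessing.
# Fitting on training rows only (after the split) avoids this.
minmax_scaler = MinMaxScaler()
df[joint_cols] = minmax_scaler.fit_transform(df[joint_cols])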

data_cols = number_cols + joint_cols

# Map labels
label_mapping = {'no_pain': 0, 'low_pain': 1, 'high_pain': 2}
target['label'] = target['label'].map(label_mapping)

print("Data loaded and preprocessed")
print(f"Features: {len(data_cols)}")
print(f"Samples: {len(df['sample_index'].unique())}")
print(f"\nClass distribution:")
print(target['label'].value_counts().sort_index())

Data loaded and preprocessed
Features: 34
Samples: 661

Class distribution:
label
0    511
1     94
2     56
Name: count, dtype: int64


In [3]:
# ⭐ IMPROVED: Stratified split at USER level
# Create user-label mapping (one label per user)
user_labels = target.groupby('sample_index')['label'].first()

print("\nUser-level class distribution:")
print(user_labels.value_counts().sort_index())

# Stratified split to ensure both train and val have similar class distributions
train_users, val_users = train_test_split(
    user_labels.index, 
    test_size=0.2, 
    random_state=SEED,
    stratify=user_labels.values  # ← KEY IMPROVEMENT
)

print(f"\nTrain users: {len(train_users)}")
print(f"Val users: {len(val_users)}")
print("\nTrain user class distribution:")
print(user_labels[train_users].value_counts().sort_index())
print("\nVal user class distribution:")
print(user_labels[val_users].value_counts().sort_index())


User-level class distribution:
label
0    511
1     94
2     56
Name: count, dtype: int64

Train users: 528
Val users: 133

Train user class distribution:
label
0    408
1     75
2     45
Name: count, dtype: int64

Val user class distribution:
label
0    103
1     19
2     11
Name: count, dtype: int64
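
Cell 2 fits `MinMaxScaler` on all rows before this split exists, so the scaling range also reflects validation samples. One way to avoid that is to compute the scaling statistics from training rows only and apply them to both splits. The sketch below shows the idea with plain NumPy on synthetic data; `values` and `is_train` are hypothetical stand-ins, not the notebook's variables. With scikit-learn the same effect comes from calling `fit` on the training rows and `transform` on both splits.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-ins: 10 rows x 3 features, the first 8 rows are "train"
values = rng.normal(size=(10, 3)).astype(np.float32)
is_train = np.arange(10) < 8

# Scaling statistics come from training rows only
col_min = values[is_train].min(axis=0)
col_max = values[is_train].max(axis=0)
scale = np.where(col_max > col_min, col_max - col_min, 1.0)  # guard constant columns

scaled = (values - col_min) / scale
train_scaled, val_scaled = scaled[is_train], scaled[~is_train]

print("train range:", train_scaled.min(), train_scaled.max())
print("val range:  ", val_scaled.min(), val_scaled.max())
```

Validation values can fall slightly outside [0, 1] under this scheme, which is expected: it mirrors what happens with unseen data at inference time.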


In [4]:
# Build sequences with LARGER STRIDE to reduce temporal leakage
WINDOW_SIZE = 300
STRIDE = 150  # ⭐ IMPROVED: 50% overlap instead of 83%

def build_sequences(df, target_df, window=300, stride=150):
    dataset = []
    labels = []
    
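    for sample_id in df['sample_index'].unique():  # avoid shadowing the built-in id()
        temp = df[df['sample_index'] == sample_id][data_cols].values
        label = target_df[target_df['sample_index'] == sample_id]['label'].values[0]

        # Zero-pad up to the next multiple of `window`; the outer modulo avoids
        # appending a whole all-zero window when the length already fits.
        padding_len = (window - len(temp) % window) % window
        padding = np.zeros((padding_len, len(data_cols)), dtype='float32')
        temp = np.concatenate((temp, padding))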
        
        idx = 0
        while idx + window <= len(temp):
            dataset.append(temp[idx:idx + window])
            labels.append(label)
            idx += stride
    
    return np.array(dataset), np.array(labels)

df_train = df[df['sample_index'].isin(train_users)]
df_val = df[df['sample_index'].isin(val_users)]

X_train, y_train = build_sequences(df_train, target, WINDOW_SIZE, STRIDE)
X_val, y_val = build_sequences(df_val, target, WINDOW_SIZE, STRIDE)

print(f"\nTraining sequences: {X_train.shape}")
print(f"Validation sequences: {X_val.shape}")

train_counts = np.bincount(y_train.astype(int))
val_counts = np.bincount(y_val.astype(int))

print(f"\nTrain sequence distribution:")
for cls, count in enumerate(train_counts):
    print(f"  Class {cls}: {count} ({count/len(y_train)*100:.1f}%)")
    
print(f"\nVal sequence distribution:")
for cls, count in enumerate(val_counts):
    print(f"  Class {cls}: {count} ({count/len(y_val)*100:.1f}%)")


Training sequences: (528, 300, 34)
Validation sequences: (133, 300, 34)

Train sequence distribution:
  Class 0: 408 (77.3%)
  Class 1: 75 (14.2%)
  Class 2: 45 (8.5%)

Val sequence distribution:
  Class 0: 103 (77.4%)
  Class 1: 19 (14.3%)
  Class 2: 11 (8.3%)
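
The shapes above show exactly one window per sample (528 and 133 windows for 528 and 133 users). That is what happens whenever a series is shorter than the window: it is padded to a single window, and the stride never takes effect. The sketch below counts windows for a given length and compares two padding rules: padding only up to the next multiple of the window, and a variant `window - length % window` that appends a whole extra window of zeros when the length is already a multiple. `n_windows` is a helper introduced here, and the series lengths are hypothetical.

```python
def n_windows(length: int, window: int, stride: int, pad_when_aligned: bool = False) -> int:
    """Count sliding windows after zero-padding `length` to a multiple of `window`."""
    if pad_when_aligned:
        pad = window - length % window   # adds a full window when length % window == 0
    else:
        pad = (-length) % window         # adds nothing when length % window == 0
    padded = length + pad
    return (padded - window) // stride + 1

# 160 is a hypothetical series length shorter than the window
print(n_windows(160, 300, 150))   # one window regardless of stride
print(n_windows(160, 300, 50))

# 600 is a hypothetical length that is already a multiple of the window
print(n_windows(600, 300, 150))
print(n_windows(600, 300, 150, pad_when_aligned=True))
```

If every series here is shorter than 300 steps, as the one-window-per-sample counts suggest, changing the stride from 50 to 150 leaves the training set unchanged.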


In [5]:
# Direct oversampling to balance dataset
target_count = train_counts[0]
duplication_factors = np.ceil(target_count / train_counts).astype(int)

X_train_balanced = []
y_train_balanced = []

for cls in range(len(train_counts)):
    cls_indices = np.where(y_train == cls)[0]
    for _ in range(duplication_factors[cls]):
        X_train_balanced.append(X_train[cls_indices])
        y_train_balanced.append(y_train[cls_indices])

X_train_balanced = np.concatenate(X_train_balanced, axis=0)
y_train_balanced = np.concatenate(y_train_balanced, axis=0)

# Shuffle balanced dataset
shuffle_idx = np.random.permutation(len(X_train_balanced))
X_train_balanced = X_train_balanced[shuffle_idx]
y_train_balanced = y_train_balanced[shuffle_idx]

balanced_counts = np.bincount(y_train_balanced.astype(int))
print("Balanced distribution:", balanced_counts)
print(f"Total training samples: {len(y_train_balanced)}")

Balanced distribution: [408 450 450]
Total training samples: 1308
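
The balanced counts are not exactly equal because each class is copied a whole number of times: `ceil(408 / 75) = 6` and `ceil(408 / 45) = 10`, giving 450 windows for each minority class. The sketch below reproduces that arithmetic from the per-class counts printed in cell 4. It also computes inverse-frequency class weights, which are an alternative to duplication and are not used in this notebook; treat that part as an assumption about what one could pass to a weighted loss.

```python
import numpy as np

train_counts = np.array([408, 75, 45])  # per-class window counts from cell 4

# Whole-copy duplication: each class is repeated ceil(majority / count) times
factors = np.ceil(train_counts.max() / train_counts).astype(int)
balanced_counts = train_counts * factors
print("duplication factors:", factors)
print("balanced counts:    ", balanced_counts, "total:", balanced_counts.sum())

# Alternative (not used above): inverse-frequency class weights,
# n_samples / (n_classes * count), suitable for a weighted loss
class_weights = train_counts.sum() / (len(train_counts) * train_counts)
print("class weights:      ", np.round(class_weights, 4))
```

Duplication shows the model identical minority windows several times per epoch, while class weights keep the data unchanged and rescale the loss instead.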


In [6]:
# 1D CNN Model with STRONGER regularization
class CNN1DClassifier(nn.Module):
    def __init__(self, input_size, num_classes, dropout=0.4):
        super().__init__()
        
        # Convolutional layers
        self.conv1 = nn.Conv1d(input_size, 64, kernel_size=5, padding=2)
        self.bn1 = nn.BatchNorm1d(64)
        self.conv2 = nn.Conv1d(64, 128, kernel_size=5, padding=2)
        self.bn2 = nn.BatchNorm1d(128)
        self.conv3 = nn.Conv1d(128, 256, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm1d(256)
        
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool1d(2)
        self.dropout = nn.Dropout(dropout)  # ⭐ IMPROVED: 0.4 instead of 0.3
        
        # Global Average Pooling
        self.global_pool = nn.AdaptiveAvgPool1d(1)
        
        # Classifier
        self.fc = nn.Sequential(
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(128, num_classes)
        )
    
    def forward(self, x):
        # x shape: (batch, seq_len, features)
        x = x.transpose(1, 2)  # -> (batch, features, seq_len)
        
        # Conv blocks
        x = self.pool(self.relu(self.bn1(self.conv1(x))))
        x = self.dropout(x)
        
        x = self.pool(self.relu(self.bn2(self.conv2(x))))
        x = self.dropout(x)
        
        x = self.relu(self.bn3(self.conv3(x)))
        
        # Global pooling
        x = self.global_pool(x).squeeze(-1)
        
        # Classifier
        x = self.fc(x)
        return x

model = CNN1DClassifier(input_size=X_train.shape[-1], num_classes=3, dropout=0.4).to(device)
print("CNN Model created with stronger regularization")
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")

CNN Model created with stronger regularization
Parameters: 184,771
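
The reported parameter count can be checked by hand from the layer definitions. The sketch below adds up weights and biases for each layer, assuming 34 input features as printed in cell 2, and tracks how the two `MaxPool1d(2)` layers shorten a 300-step window. The helper functions are introduced here for illustration.

```python
def conv1d_params(in_ch: int, out_ch: int, kernel: int) -> int:
    return in_ch * out_ch * kernel + out_ch   # weights + biases

def batchnorm_params(channels: int) -> int:
    return 2 * channels                       # learnable scale and shift only

def linear_params(in_f: int, out_f: int) -> int:
    return in_f * out_f + out_f               # weights + biases

total = (
    conv1d_params(34, 64, 5) + batchnorm_params(64)
    + conv1d_params(64, 128, 5) + batchnorm_params(128)
    + conv1d_params(128, 256, 3) + batchnorm_params(256)
    + linear_params(256, 128) + linear_params(128, 3)
)
print(f"{total:,}")  # 184,771

# Each MaxPool1d(2) halves the temporal length; padding keeps conv output length fixed
seq_len = 300
for _ in range(2):
    seq_len //= 2
print(seq_len)  # 75 steps enter the global average pool
```

BatchNorm running means and variances are buffers rather than parameters, so they do not appear in the count.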


In [7]:
# Create dataloaders
BATCH_SIZE = 32

train_ds = TensorDataset(torch.from_numpy(X_train_balanced).float(), torch.from_numpy(y_train_balanced).long())
val_ds = TensorDataset(torch.from_numpy(X_val).float(), torch.from_numpy(y_val).long())

train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True, num_workers=0)
val_loader = DataLoader(val_ds, batch_size=BATCH_SIZE, shuffle=False, num_workers=0)

print(f"Train batches: {len(train_loader)}")
print(f"Val batches: {len(val_loader)}")

Train batches: 41
Val batches: 5
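
The batch counts above are the ceiling of dataset size over batch size, since `DataLoader` keeps the final partial batch by default. A small sketch of that arithmetic, using the sample counts printed earlier (1308 balanced training windows, 133 validation windows); `batch_plan` is a helper introduced here.

```python
import math

def batch_plan(n_samples: int, batch_size: int) -> tuple[int, int]:
    """Return (number of batches, size of the final batch) when no batch is dropped."""
    n_batches = math.ceil(n_samples / batch_size)
    last = n_samples - (n_batches - 1) * batch_size
    return n_batches, last

print(batch_plan(1308, 32))  # training loader
print(batch_plan(133, 32))   # validation loader
```

The final training batch is smaller than the others, which matters mainly for layers such as BatchNorm that compute statistics per batch.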


In [8]:
# Training setup with STRONGER regularization
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(
    model.parameters(), 
    lr=1e-3, 
    weight_decay=1e-3  # ⭐ IMPROVED: 1e-3 instead of 1e-4
)
scaler = torch.amp.GradScaler(enabled=(device.type == 'cuda'))

def train_epoch(model, loader, criterion, optimizer, scaler, device):
    model.train()
    total_loss = 0
    all_preds, all_labels = [], []
    
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        
        with torch.amp.autocast(device_type=device.type, enabled=(device.type == 'cuda')):
            outputs = model(inputs)
            loss = criterion(outputs, labels)
        
        scaler.scale(loss).backward()
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(optimizer)
        scaler.update()
        
        total_loss += loss.item() * inputs.size(0)
        all_preds.extend(outputs.argmax(1).cpu().numpy())
        all_labels.extend(labels.cpu().numpy())
    
    avg_loss = total_loss / len(loader.dataset)
    f1 = f1_score(all_labels, all_preds, average='weighted', zero_division=0)
    return avg_loss, f1

def val_epoch(model, loader, criterion, device):
    model.eval()
    total_loss = 0
    all_preds, all_labels = [], []
    
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            
            with torch.amp.autocast(device_type=device.type, enabled=(device.type == 'cuda')):
                outputs = model(inputs)
                loss = criterion(outputs, labels)
            
            total_loss += loss.item() * inputs.size(0)
            all_preds.extend(outputs.argmax(1).cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
    
    avg_loss = total_loss / len(loader.dataset)
    f1 = f1_score(all_labels, all_preds, average='weighted', zero_division=0)
    return avg_loss, f1, all_preds, all_labels

print("Training functions ready")

Training functions ready


In [9]:
# Train with early stopping
print("=" * 80)
print("Training IMPROVED CNN Model with Stratified Split")
print("=" * 80)

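EPOCHS = 50
best_val_loss = float('inf')
best_f1 = 0  # validation F1 at the best-loss epoch, not the peak F1 seen
patience = 10
patience_counter = 0
os.makedirs('models', exist_ok=True)  # torch.save does not create missing directories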

for epoch in range(1, EPOCHS + 1):
    train_loss, train_f1 = train_epoch(model, train_loader, criterion, optimizer, scaler, device)
    val_loss, val_f1, val_preds, val_labels = val_epoch(model, val_loader, criterion, device)
    
    # Save best model based on validation LOSS (not F1) to prevent overfitting
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_f1 = val_f1
        patience_counter = 0
        torch.save(model.state_dict(), 'models/cnn_improved_best.pt')
        print(f"Epoch {epoch:3d}/{EPOCHS} | Train: Loss={train_loss:.4f}, F1={train_f1:.4f} | Val: Loss={val_loss:.4f}, F1={val_f1:.4f} ⭐ BEST")
    else:
        patience_counter += 1
        if epoch % 5 == 0 or epoch == 1:
            print(f"Epoch {epoch:3d}/{EPOCHS} | Train: Loss={train_loss:.4f}, F1={train_f1:.4f} | Val: Loss={val_loss:.4f}, F1={val_f1:.4f}")
    
    # Early stopping
    if patience_counter >= patience:
        print(f"\nEarly stopping at epoch {epoch} (no improvement for {patience} epochs)")
        break

print("\n" + "=" * 80)
print(f"Best validation loss: {best_val_loss:.4f}")
print(f"Best validation F1: {best_f1:.4f}")
print("=" * 80)

Training IMPROVED CNN Model with Stratified Split
Epoch   1/50 | Train: Loss=0.8488, F1=0.6092 | Val: Loss=0.6192, F1=0.7488 ⭐ BEST
Epoch   4/50 | Train: Loss=0.3804, F1=0.8562 | Val: Loss=0.6031, F1=0.7974 ⭐ BEST
Epoch   5/50 | Train: Loss=0.3468, F1=0.8646 | Val: Loss=0.3876, F1=0.8856 ⭐ BEST
Epoch   6/50 | Train: Loss=0.3098, F1=0.8822 | Val: Loss=0.3541, F1=0.8794 ⭐ BEST
Epoch  10/50 | Train: Loss=0.2386, F1=0.9197 | Val: Loss=0.9396, F1=0.7392
Epoch  15/50 | Train: Loss=0.1919, F1=0.9258 | Val: Loss=0.4943, F1=0.8508

Early stopping at epoch 16 (no improvement for 10 epochs)

Best validation loss: 0.3541
Best validation F1: 0.8794
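
The log shows the last improvement at epoch 6 and the stop at epoch 16, which is the patience of 10 counted from the best epoch. The sketch below isolates that rule as a function over a list of validation losses. Only the values for epochs 1, 4, 5 and 6 come from the log above; the values for epochs 2 and 3 and the flat 0.5 tail are hypothetical placeholders, because the run did not print those epochs.

```python
def early_stop(val_losses, patience):
    """Return (best_epoch, stop_epoch); stop_epoch is None if patience never runs out."""
    best_loss = float("inf")
    best_epoch = None
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, None

# Epochs 1, 4, 5, 6 are taken from the log; every other value is a placeholder
losses = [0.6192, 0.70, 0.65, 0.6031, 0.3876, 0.3541] + [0.5] * 20
print(early_stop(losses, patience=10))
```

Because the checkpoint is chosen by loss, the saved model is the epoch 6 one (F1 0.8794) even though epoch 5 reached a higher F1 (0.8856).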


In [10]:
# Final evaluation
# map_location lets a checkpoint saved on GPU load on a CPU-only machine
model.load_state_dict(torch.load('models/cnn_improved_best.pt', map_location=device))
_, val_f1, val_preds, val_labels = val_epoch(model, val_loader, criterion, device)

print("\n" + "=" * 80)
print("📊 FINAL RESULTS (IMPROVED CNN with Stratified Split)")
print("=" * 80)
print(f"Validation F1: {val_f1:.4f}")

print("\n📋 Per-class metrics:")
print(classification_report(val_labels, val_preds, target_names=['no_pain', 'low_pain', 'high_pain'], digits=4))

print("\n📈 Confusion Matrix:")
cm = confusion_matrix(val_labels, val_preds)
print(cm)

unique_preds, counts = np.unique(val_preds, return_counts=True)
print("\n🎯 Prediction distribution:")
for cls, count in zip(unique_preds, counts):
    print(f"  Class {cls}: {count} predictions ({count/len(val_preds)*100:.1f}%)")

if len(unique_preds) >= 3:
    print("\n✅ SUCCESS: Model predicts ALL 3 classes!")
    print("\n💡 Key improvements:")
    print("   - Stratified split ensures fair validation")
    print("   - Larger stride reduces temporal leakage")
    print("   - Stronger regularization prevents overfitting")
elif len(unique_preds) == 2:
    print("\n⚠️  PARTIAL: Predicts 2 out of 3 classes")
else:
    print("\n❌ FAILED: Stuck on 1 class")


📊 FINAL RESULTS (IMPROVED CNN with Stratified Split)
Validation F1: 0.8794

📋 Per-class metrics:
              precision    recall  f1-score   support

     no_pain     0.9238    0.9417    0.9327       103
    low_pain     0.7083    0.8947    0.7907        19
   high_pain     1.0000    0.3636    0.5333        11

    accuracy                         0.8872       133
   macro avg     0.8774    0.7334    0.7522       133
weighted avg     0.8993    0.8872    0.8794       133


📈 Confusion Matrix:
[[97  6  0]
 [ 2 17  0]
 [ 6  1  4]]

🎯 Prediction distribution:
  Class 0: 105 predictions (78.9%)
  Class 1: 24 predictions (18.0%)
  Class 2: 4 predictions (3.0%)

✅ SUCCESS: Model predicts ALL 3 classes!

💡 Key improvements:
   - Stratified split ensures fair validation
   - Larger stride reduces temporal leakage
   - Stronger regularization prevents overfitting
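
The headline F1 of 0.8794 is a support-weighted average, so it is dominated by `no_pain` (103 of 133 validation samples). The sketch below recomputes precision, recall and F1 directly from the confusion matrix printed above, and compares the weighted average with the macro average, which treats the three classes equally. It uses only NumPy and assumes the scikit-learn convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

cm = np.array([[97,  6, 0],
               [ 2, 17, 0],
               [ 6,  1, 4]])   # rows = true class, columns = predicted class

tp = np.diag(cm)
support = cm.sum(axis=1)       # true samples per class
predicted = cm.sum(axis=0)     # predictions per class

precision = tp / predicted
recall = tp / support
f1 = 2 * precision * recall / (precision + recall)

macro_f1 = f1.mean()
weighted_f1 = np.average(f1, weights=support)

for name, p, r, f in zip(["no_pain", "low_pain", "high_pain"], precision, recall, f1):
    print(f"{name:>9}: precision={p:.4f} recall={r:.4f} f1={f:.4f}")
print(f"macro F1={macro_f1:.4f}  weighted F1={weighted_f1:.4f}")
```

The macro F1 of about 0.75 reflects that only 4 of 11 `high_pain` samples are recovered (recall 0.36), which the weighted figure largely hides.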
