# 🍍 LibriBrain Competition: Advanced (Phoneme Classification)
Welcome! This Colab is the starting point for the second track of the [LibriBrain competition](https://libribrain.com/) hosted by the [PNPL](https://pnpl.robots.ox.ac.uk/) at NeurIPS 2025. For a more basic introduction, you might prefer to take a look at the [Speech Detection variant](https://neural-processing-lab.github.io/2025-libribrain-competition/links/speech-colab) first, which includes a more comprehensive introduction and a slightly simpler task.

The following notebook will walk you through:
1. setting up all necessary dependencies,
2. downloading the training and validation data, and
3. training a minimal model.

It is fully functional in the Colab Free Tier, though training will of course be faster with more GPU horsepower. With default settings on a `T4` instance, the main training run should take no more than 45 minutes. If you want to speed up model training, make sure you are on a GPU runtime by clicking Runtime -> Change runtime type. TPU acceleration is currently not supported.

In case of any questions or problems, please get in touch through [our Discord server](https://neural-processing-lab.github.io/2025-libribrain-competition/links/discord).

⚠️ **Note**: We have only comprehensively validated the notebook to work on Colab and Unix. Your experience in other environments (e.g., Windows) may vary.

## Setting up dependencies
Run the code below *as is*. It will download all required dependencies, including our own [PNPL](https://pypi.org/project/pnpl/) package. On Windows, you might have to restart your kernel after the installation has finished.

In [None]:
# Install additional dependencies
%pip install -q mne_bids lightning torchmetrics scikit-learn plotly ipywidgets matplotlib pnpl

# Set up base path for dataset and related files (base_path is assumed to be set in the cells below!)
base_path = "./libribrain"
try:
    import google.colab  # This module is only available in Colab.
    in_colab = True
    base_path = "/content"  # This is the folder displayed in the Colab sidebar
except ImportError:
    in_colab = False

## Preparing the dataset
The code below automatically downloads the training and validation data.

In [None]:
from pnpl.datasets import LibriBrainPhoneme
from torch.utils.data import DataLoader


train_dataset = LibriBrainPhoneme(
    data_path=f"{base_path}/data/",
    include_run_keys=[("0", str(i), "Sherlock1", "1") for i in range(1, 11)],
    tmin=0.0,
    tmax=0.5,
    preload_files=True,
)

# Per-channel statistics computed on the training data
channel_means = train_dataset.channel_means
channel_stds = train_dataset.channel_stds


val_dataset = LibriBrainPhoneme(
    data_path=f"{base_path}/data/",
    include_run_keys=[("0", "11", "Sherlock1", "2"), ("0", "12", "Sherlock1", "2")],
    standardize=True,
    # Standardize the validation data with the training statistics
    channel_means=channel_means,
    channel_stds=channel_stds,
    tmin=0.0,
    tmax=0.5,
    preload_files=True,
)
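Standardization matters because MEG channels can differ widely in offset and scale. The sketch below shows per-channel z-scoring in plain NumPy. The statistics are estimated on the training samples only and then reused for the validation samples, so no information from the validation set leaks into preprocessing. We assume this is roughly what `standardize=True` does inside `LibriBrainPhoneme`, which exposes the training statistics as `channel_means` and `channel_stds`. The helpers `fit_channel_stats` and `standardize` and the toy data are hypothetical and not part of the PNPL API.

```python
import numpy as np


def fit_channel_stats(train_samples):
    """Hypothetical helper: per-channel mean/std over all training samples and time points.

    train_samples has shape (n_samples, n_channels, n_times).
    """
    means = train_samples.mean(axis=(0, 2))
    stds = train_samples.std(axis=(0, 2))
    return means, stds


def standardize(samples, means, stds, eps=1e-8):
    """Hypothetical helper: z-score each channel with precomputed statistics."""
    return (samples - means[None, :, None]) / (stds[None, :, None] + eps)


rng = np.random.default_rng(0)
n_channels, n_times = 306, 125
# Toy data (an assumption, not real MEG): each channel has its own offset and scale
offsets = rng.normal(0.0, 5.0, size=n_channels)
scales = rng.uniform(0.5, 3.0, size=n_channels)
train = offsets[None, :, None] + scales[None, :, None] * rng.normal(size=(100, n_channels, n_times))
val = offsets[None, :, None] + scales[None, :, None] * rng.normal(size=(50, n_channels, n_times))

# Estimate statistics on the training data only ...
means, stds = fit_channel_stats(train)
# ... and apply the same statistics to both splits
train_z = standardize(train, means, stds)
val_z = standardize(val, means, stds)

print(np.abs(val_z.mean(axis=(0, 2))).max(), val_z.std(axis=(0, 2)).mean())
```

The validation channels end up close to, but not exactly at, zero mean and unit variance. This is expected when the statistics come from a different split.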


## Signal averaging the training data
While we could train our model on the datasets above directly, the signal-to-noise ratio of a single data point turns out to be very low. Therefore, we average the signals of multiple instances of the same phoneme in the training data to filter out some of the noise. This is called [signal averaging](https://en.wikipedia.org/wiki/Signal_averaging). Here, each averaged training example combines 100 samples (`grouped_samples=100`). The validation data is not averaged.

In [None]:
from pnpl.datasets import GroupedDataset

averaged_train_dataset = GroupedDataset(train_dataset, grouped_samples=100)
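Averaging works under the assumption that the phoneme-evoked response is roughly the same across repetitions while the noise is independent. Averaging $N$ samples then leaves the response unchanged and shrinks the noise standard deviation by about $\sqrt{N}$. With 100 samples per group, that is roughly a tenfold improvement. The NumPy sketch below illustrates this on toy data. `average_by_label` is a hypothetical helper written for illustration. It is not how `GroupedDataset` is implemented internally, which may differ, for example in how it handles leftover samples.

```python
import numpy as np


def average_by_label(samples, labels, group_size, rng):
    """Hypothetical helper: label-wise signal averaging.

    For each class, shuffle its samples, split them into groups of `group_size`
    and average every full group into a single sample. Samples that do not fill
    a complete group are dropped in this sketch.
    """
    averaged, averaged_labels = [], []
    for label in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == label))
        n_groups = len(idx) // group_size
        for g in range(n_groups):
            group = idx[g * group_size:(g + 1) * group_size]
            averaged.append(samples[group].mean(axis=0))
            averaged_labels.append(label)
    return np.stack(averaged), np.array(averaged_labels)


rng = np.random.default_rng(0)
n_per_class, n_channels, n_times = 1000, 4, 125
# Toy data: each of 2 classes has a fixed response buried in unit-variance noise
templates = rng.normal(0.0, 0.2, size=(2, n_channels, n_times))
labels = np.repeat([0, 1], n_per_class)
samples = templates[labels] + rng.normal(size=(2 * n_per_class, n_channels, n_times))

avg_samples, avg_labels = average_by_label(samples, labels, group_size=100, rng=rng)

# Noise left after subtracting the true response, before and after averaging
noise_single = (samples - templates[labels]).std()
noise_avg = (avg_samples - templates[avg_labels]).std()
print(avg_samples.shape, noise_single / noise_avg)  # ratio should be close to sqrt(100) = 10
```

The cost is a much smaller training set. In this sketch, every 100 samples of a phoneme become a single averaged example, and rare phonemes contribute only a few groups.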

## Defining the model
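This is the model architecture we'll use. A convolution with kernel size 1 mixes the 306 MEG sensor channels into 128 features at each time step. A ReLU adds a nonlinearity, and a linear layer maps the flattened 128 × 125 = 16,000 features of each 0.5 s window to the 39 phoneme classes.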

In [None]:
import torch
import lightning as L
from torch import nn
from torchmetrics import F1Score

# Basic LightningModule
class PhonemeClassificationModel(L.LightningModule):
    def __init__(self):
        super().__init__()
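        self.model = nn.Sequential(
            nn.Conv1d(306, 128, 1),  # kernel size 1: mix the 306 sensor channels at each time step
            nn.ReLU(),
            nn.Flatten(),            # (batch, 128, 125) -> (batch, 16000)
            nn.Linear(16000, 39)     # one logit per phoneme class
        )
        self.criterion = nn.CrossEntropyLoss()  # expects raw logits and integer class labels
        self.f1_macro = F1Score(num_classes=39, average='macro', task="multiclass")

    def forward(self, x):
        return self.model(x)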

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = self.criterion(y_hat, y)
        f1_macro = self.f1_macro(y_hat, y)
        self.log('train_loss', loss, prog_bar=True)
        self.log('train_f1_macro', f1_macro)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = self.criterion(y_hat, y)
        f1_macro = self.f1_macro(y_hat, y)
        self.log('val_loss', loss)
        self.log('val_f1_macro', f1_macro, prog_bar=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.model.parameters(), lr=0.0005)
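The `16000` in the final `nn.Linear` comes from 128 feature channels × 125 time steps. The NumPy sketch below traces those shapes through the same three stages. It also checks that a kernel-size-1 `Conv1d` is just one 128 × 306 matrix applied independently at every time step. The random weights and the batch size of 8 are arbitrary illustration values. The input shape `(batch, 306, 125)` is inferred from the layer sizes rather than read from the dataset.

```python
import numpy as np

batch, in_channels, out_channels, n_times, n_classes = 8, 306, 128, 125, 39
rng = np.random.default_rng(0)
# PyTorch's Conv1d expects inputs shaped (batch, channels, time)
x = rng.normal(size=(batch, in_channels, n_times))

# Conv1d(306, 128, kernel_size=1): one (128 x 306) weight matrix plus a bias,
# shared across all time steps
conv_w = rng.normal(size=(out_channels, in_channels)) * 0.05
conv_b = rng.normal(size=out_channels) * 0.05
pre = np.einsum("oc,bct->bot", conv_w, x) + conv_b[None, :, None]  # (8, 128, 125)

h = np.maximum(pre, 0.0)       # ReLU
flat = h.reshape(batch, -1)    # Flatten: (8, 128 * 125) = (8, 16000)

# Linear(16000, 39)
lin_w = rng.normal(size=(n_classes, flat.shape[1])) * 0.01
lin_b = np.zeros(n_classes)
logits = flat @ lin_w.T + lin_b  # (8, 39): one score per phoneme class

# At any single time step t, the convolution is a plain matrix product
t = 17
same_at_t = np.allclose(pre[:, :, t], x[:, :, t] @ conv_w.T + conv_b)
print(pre.shape, flat.shape, logits.shape, same_at_t)
```

Because the convolution has no temporal extent, all temporal modeling happens in the final linear layer. That layer sees every (feature, time step) pair as a separate input.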

## Actually training
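The code below trains the model for 15 epochs on the averaged training data, evaluating it on the validation set after every epoch, and then saves a checkpoint.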

In [None]:
import os
from torch.utils.data import DataLoader
import lightning as L
from lightning.pytorch.loggers import TensorBoardLogger, CSVLogger

# Setup paths for logs and checkpoints
LOG_DIR = f"{base_path}/lightning_logs"
CHECKPOINT_PATH = f"{base_path}/models/phoneme_model.ckpt"

# Minimal logging setup
logger = CSVLogger(
    save_dir=LOG_DIR,
    name="",
    version=None,
)
if in_colab:  # In Colab, we use the built-in TensorBoard setup
    logger = TensorBoardLogger(
        save_dir=LOG_DIR,
        name="",
        version=None,
        default_hp_metric=True
    )
    os.makedirs(LOG_DIR, exist_ok=True)
    %load_ext tensorboard
    %tensorboard --logdir $LOG_DIR

# Set a fixed seed for reproducibility
L.seed_everything(42)

# Conditionally set num_workers to avoid multiprocessing issues (try increasing if performance is problematic)
num_workers = 2 if in_colab else 0

# Configure data loaders
train_dataloader = DataLoader(averaged_train_dataset, batch_size=16, shuffle=True, num_workers=num_workers)
val_dataloader = DataLoader(val_dataset, batch_size=16, shuffle=False, num_workers=num_workers)

# Initialize the PhonemeClassificationModel model
model = PhonemeClassificationModel()

# Log hyperparameters (empty by default, since the model does not call save_hyperparameters())
logger.log_hyperparams(model.hparams)

# Initialize trainer
trainer = L.Trainer(
    devices="auto",
    max_epochs=15,
    logger=logger,
    enable_checkpointing=True,
)

# Actually train the model
trainer.fit(model, train_dataloader, val_dataloader)

# Save the trained model
trainer.save_checkpoint(CHECKPOINT_PATH)

## Validating our results
Let's look at how our model performs on the validation set compared with a baseline that predicts phonemes uniformly at random.

In [None]:
from torchmetrics import F1Score

def validate(val_loader, module, labels):
    disp_labels = labels
    module.eval()
    predicted_batches = []
    true_batches = []

    with torch.no_grad():
        for x, y in val_loader:
            x = x.to(module.device)
            y = y.to(module.device)
            outputs = module(x)
            # The predicted class is the one with the highest logit
            predicted_batches.append(torch.argmax(outputs, dim=1))
            true_batches.append(y)

    # Concatenate the per-batch results into single 1-D tensors
    true_phonemes = torch.cat(true_batches)
    predicted_phonemes = torch.cat(predicted_batches)

    f1_metric = F1Score(task="multiclass", average="macro",
                        num_classes=len(disp_labels)).to(module.device)

    # Chance baseline: uniformly random class predictions
    random_preds = torch.randint(
        0, len(disp_labels), (len(true_phonemes),), device=module.device)

    random_f1_macro = f1_metric(random_preds, true_phonemes)
    f1_macro = f1_metric(predicted_phonemes, true_phonemes)


    binary_f1 = F1Score(task="binary").to(module.device)

    classes = torch.arange(len(disp_labels))
    f1_by_class = []
    random_f1_by_class = []
    for c in classes:
        class_preds = predicted_phonemes == c
        class_targets = true_phonemes == c
        class_f1 = binary_f1(class_preds, class_targets)
        class_random_preds = random_preds == c
        class_random_f1 = binary_f1(class_random_preds, class_targets)

        f1_by_class.append(class_f1)
        random_f1_by_class.append(class_random_f1)

    # We want to return tensors not lists
    f1_by_class = torch.stack(f1_by_class)
    random_f1_by_class = torch.stack(random_f1_by_class)

    return f1_macro, random_f1_macro, f1_by_class, random_f1_by_class
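`validate` reports macro F1. It computes a one-vs-rest F1 score for each phoneme and takes their unweighted mean, so rare phonemes count as much as frequent ones. For class $c$, $F1_c = 2TP_c / (2TP_c + FP_c + FN_c)$. The NumPy sketch below spells this out on a tiny, hand-checkable example. `macro_f1` is a hypothetical helper, not the torchmetrics implementation. Its handling of edge cases, such as classes that never occur, may differ from `F1Score`.

```python
import numpy as np


def macro_f1(y_true, y_pred, num_classes):
    """Hypothetical helper: unweighted mean of one-vs-rest F1 over classes.

    Classes with no true and no predicted samples are skipped in this sketch.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = {}
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        if denom > 0:
            per_class[c] = 2 * tp / denom
    return float(np.mean(list(per_class.values()))), per_class


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score, per_class = macro_f1(y_true, y_pred, num_classes=3)
# Per class: F1_0 = 2/4, F1_1 = 4/5, F1_2 = 2/3, so macro F1 = 59/90
print(per_class, score)
```

Micro-averaged F1 would instead pool TP, FP and FN over all classes. For single-label multiclass predictions, that equals plain accuracy, which frequent phonemes dominate.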

In [None]:
f1_macro, random_f1_macro, f1_by_class, random_f1_by_class = validate(val_dataloader, model, val_dataset.labels_sorted)
print(f"F1 Macro for random predictions: {random_f1_macro.item():.4f}")
print(f"F1 Macro for model predictions: {f1_macro.item():.4f}")
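The random baseline is useful because chance-level macro F1 is not simply 1/39 when phoneme frequencies are imbalanced. Suppose a uniform random guesser predicts each of the $K$ classes with probability $1/K$, and class $c$ makes up a fraction $p_c$ of the targets. Its expected precision for $c$ is then $p_c$ and its expected recall is $1/K$. Plugging these into $F1 = 2PR/(P+R)$ gives $F1_c \approx 2p_c/(Kp_c + 1)$. The sketch below evaluates this approximation. `expected_random_macro_f1` is a hypothetical helper. Because it plugs expected precision and recall into F1 rather than computing the exact expected F1, it only approximates where the sampled random baseline above should land.

```python
import numpy as np


def expected_random_macro_f1(class_counts):
    """Hypothetical helper: approximate macro F1 of uniform random guessing.

    class_counts[c] is the number of targets of class c. Uses expected precision
    p_c and expected recall 1/K per class (an approximation, see text).
    """
    counts = np.asarray(class_counts, dtype=float)
    k = len(counts)
    p = counts / counts.sum()
    f1_per_class = 2 * p / (k * p + 1)
    return float(f1_per_class.mean())


balanced = expected_random_macro_f1([100] * 39)            # equals 1/K for balanced classes
imbalanced = expected_random_macro_f1([2000] + [50] * 38)  # toy counts, not LibriBrain's
print(balanced, 1 / 39, imbalanced)
```

The function $2p/(Kp+1)$ is concave in $p$. By Jensen's inequality, any class imbalance therefore pushes this approximate chance level below $1/K$.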

In [None]:
from matplotlib import pyplot as plt

plt.bar(x=(0,1), height=(f1_macro.item(), random_f1_macro.item()), tick_label=("Model", "Random"), color=("salmon", "skyblue"))
plt.title("F1 Macro")
plt.show()

Great! We were able to achieve a better-than-chance macro F1 score! Let's look at the results for individual classes and see which phonemes the model classifies well.

In [None]:
import numpy as np

def plot_class_specific_scores(scores, random_scores, metric_name, labels, sort=True):

    num_classes = len(labels)


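    # Move the scores to the CPU so matplotlib can plot them (they may live on the GPU)
    scores = torch.as_tensor(scores).detach().cpu()
    random_scores = torch.as_tensor(random_scores).detach().cpu()

    # If sorting is requested, order the classes by descending model score
    if sort:
        order = torch.argsort(scores).flip(dims=[0])
    else:
        order = torch.arange(len(scores))

    # Reorder scores and labels consistently along the (only) class dimension
    scores = scores[order].numpy()
    random_scores = random_scores[order].numpy()
    labels = [labels[i] for i in order.tolist()]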
    # Positions of the groups on the x-axis
    x = np.arange(num_classes)

    # Width of each bar
    width = 0.35

    # Create a figure and axis
    fig, ax = plt.subplots(figsize=(25, 12))

    # Plot Random scores bars
    bars1 = ax.bar(x - width/2, random_scores, width,
                label='Random', capsize=5, color='skyblue', edgecolor='black')

    # Plot Actual score bars
    bars2 = ax.bar(x + width/2, scores, width,
                label='Model', capsize=5, color='salmon', edgecolor='black')

    # Add labels and title
    ax.set_xlabel('Phonemes', fontsize=16)
    ax.set_ylabel(metric_name, fontsize=16)
    ax.set_title(metric_name + " for each Phoneme", fontsize=20)

    # Set x-axis tick labels
    ax.set_xticks(x)
    ax.set_xticklabels(labels, rotation=90, fontsize=16)

    # Add legend
    ax.legend(fontsize=14)

    # Add grid for better readability
    ax.yaxis.grid(True, linestyle='--', which='major', color='grey', alpha=0.7)

    # Adjust layout to prevent clipping of tick-labels
    plt.tight_layout()

    # Display the plot
    plt.show()

In [None]:
plot_class_specific_scores(scores=f1_by_class, random_scores=random_f1_by_class, metric_name="F1", labels=val_dataset.labels_sorted)

## That's it! 🥳
You've successfully trained a model that outperforms random guessing in phoneme classification from MEG data. Congrats! Thanks for taking the time to look at and/or participate in our competition. If you have any open questions, get in touch on [our Discord server](https://neural-processing-lab.github.io/2025-libribrain-competition/links/discord)!
Once you're ready to get your score on the leaderboard, take a look at the [submission tutorial](https://neural-processing-lab.github.io/2025-libribrain-competition/links/submission-colab). You might also want to take another look at the [competition website](https://neural-processing-lab.github.io/2025-libribrain-competition).