# Modular Forward Modeling

This notebook contains experiments that model the forward process of speech data (from the clean signal to the recorded signal) with modular components.

## Hyperparameters

| Hyperparameter | Value |
|----------------|-------|
| Learning Rate | 0.001 |
| Batch Size | 128 |
| Episodes | 15 - 20 |

## Results (Task 1 - All Levels)

| Model | Test Loss (10% held-out files) |
|-------|--------------------------------|
| Identity | 0.0004266352625563741 |
| Lineator (n=5) | 0.00005307496394379996 |
| Resonator (n_fft=128) | 0.000027490743377711624 |
| Resonator (n_fft=256) | 0.000023232365492731333 |
| Lineator + Resonator | 0.000023333996068686247 |
| Resonator + Lineator | 0.000024605096768937074 |
| Lin b4 ResNet + Res | 0.00002246881922474131 |
| Conv (kernel=100) | 0.00008288798562716693 |

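The losses are easier to compare when expressed relative to the Identity baseline. The snippet below is a small sketch that copies the values from the table above (rounded to four significant digits) and prints how many times smaller each loss is. The dictionary and variable names are illustrative and not part of the notebook's code.

```python
# Test losses copied from the "Task 1 - All Levels" table, rounded to 4 significant digits.
losses = {
    "Identity": 4.266e-4,
    "Lineator (n=5)": 5.307e-5,
    "Resonator (n_fft=128)": 2.749e-5,
    "Resonator (n_fft=256)": 2.323e-5,
    "Lineator + Resonator": 2.333e-5,
    "Resonator + Lineator": 2.461e-5,
    "Lin b4 ResNet + Res": 2.247e-5,
    "Conv (kernel=100)": 8.289e-5,
}

baseline = losses["Identity"]
# Factor by which each model reduces the loss relative to the identity baseline.
improvement = {name: baseline / loss for name, loss in losses.items()}

for name, factor in sorted(improvement.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} {factor:5.1f}x lower than Identity")
```
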
## Results (Task 1 - Level 1)

| Model | Test Loss (10% held-out files) |
|-------|--------------------------------|
| Identity | 0.00041192656 |
| Resonator (n_fft=256) | 0.00005838978358951863 |
| Resonator (n_fft=512) | 0.000052855208195978776 |
| Three Lin + Res layers | 0.000413885893067345 |


## Findings

- A convolution is equivalent to a Resonator restricted to the diagonal of its matrix.

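One way to read this finding is through the convolution theorem: circular convolution with a kernel is a pointwise multiplication in the DFT domain, that is, multiplication by a diagonal matrix. The NumPy sketch below checks this numerically for a single frame. It assumes that the Resonator applies a learned matrix to the frequency bins of each frame, which is an interpretation and is not confirmed by the code shown in this notebook; all names in the snippet are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16                              # frame length (stands in for n_fft)
x = rng.standard_normal(n)          # one signal frame
h = np.zeros(n)
h[:4] = rng.standard_normal(4)      # short kernel, zero-padded to the frame length

# Time domain: circular convolution computed directly from its definition.
y_time = np.array([sum(h[k] * x[(t - k) % n] for k in range(n)) for t in range(n)])

# Frequency domain: multiply the spectrum by a diagonal matrix built from the kernel's DFT.
D = np.diag(np.fft.fft(h))
y_freq = np.fft.ifft(D @ np.fft.fft(x)).real

print(np.allclose(y_time, y_freq))
```

A full (non-diagonal) matrix can additionally mix energy between frequency bins, which a single convolution cannot do. For framed audio the match with a linear convolution is only approximate, since framing and windowing introduce edge effects.
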
In [26]:
import os
from typing import List, Tuple

import torch
from torch import nn

from src.data.dataloader import get_loader
from src.data.dataset import SpeechData
from src.data.files import create_dir, write_model
from src.data.filesampler import sample_filepaths
from src.networks.modular import Model
from src.util.consts import LOG_INTERVAL
from src.util.device import set_device
from src.util.logger import Logger
from src.util.signals import load_file_pair

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [27]:
# Restrict PyTorch to GPU 3; this must be set before CUDA is first initialized.
os.environ['CUDA_VISIBLE_DEVICES'] = "3"
device = set_device()

Using device: CPU


In [28]:
tasks = ["Task_1_Level_1"]
hyperparameters = {
    "lr": 0.001,
    "num_episodes": 150,
    "batch_size": 128,
    "n_fft": 512,
    "n_lin_functions": 5,
    "kernel_size": 100,
    "num_matrices": 0,
}

val_paths = sample_filepaths(tasks=tasks, sample_rate=0.1, split_into=1)
val_signals = [load_file_pair(path) for path in val_paths]

In [29]:
dataset = SpeechData(
    tasks=tasks, ignore_paths=val_paths, return_chunks=False
)
train_loader = get_loader(dataset, batch_size=hyperparameters["batch_size"], device=device)

Initialized dataset with 540 files from 1 tasks


In [30]:
model = Model(
    device=device,
    n_lin_functions=hyperparameters["n_lin_functions"],
    n_fft=hyperparameters["n_fft"],
    num_matrices=hyperparameters["num_matrices"],
)
# model.load_state_dict(torch.load("../forward_model.pt", map_location=device, weights_only=True))

optimizer = torch.optim.Adam(model.parameters(), lr=hyperparameters["lr"])
reconstruction_loss = torch.nn.MSELoss()

USE_COMPILE = False  # torch.compile is currently disabled; set to True to enable it on GPU
if USE_COMPILE and torch.cuda.is_available():
    model = torch.compile(model)

In [31]:
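from typing import Optional

@torch.no_grad()
def test(model: Optional[nn.Module], val_chunks: List[Tuple[torch.Tensor, torch.Tensor]]) -> torch.Tensor:
    """Mean reconstruction loss over the validation pairs; `model=None` scores the identity baseline."""
    loss = 0
    for x, y in val_chunks:
        x, y = x.unsqueeze(0), y.unsqueeze(0)
        # With no model (as in the final cell), compare the clean input directly to the recording.
        y_pred = x if model is None else model(x)
        loss += reconstruction_loss(y_pred, y)
    return loss / len(val_chunks)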

In [None]:
logger = Logger(
    tasks=tasks,
    hyperparameters=hyperparameters,
    tags=["modular", "lineator", "resonator"],
)

for episode in range(hyperparameters["num_episodes"]):
    for i, (clean_files, recorded_files, _) in enumerate(train_loader):
        clean_files = clean_files.to(device, non_blocking=True).requires_grad_(True)
        recorded_files = recorded_files.to(device, non_blocking=True)

        optimizer.zero_grad(set_to_none=True)
        output_file = model(clean_files)

        recon_loss = reconstruction_loss(output_file, recorded_files)
        recon_loss.backward()
        optimizer.step()

        # Periodically evaluate on the held-out validation files.
        if i % LOG_INTERVAL == 0:
            val_loss = test(model, val_signals)

            logger.log_metrics(
                file_recon_loss=val_loss,
                episode=episode,
                iteration=i,
                lr=hyperparameters["lr"],
            )

logger.finish()

Starting run 2024-12-03_18-18-20
Weights and Biases logging is disabled!
----------------------------------------------
episode: 0
iteration: 0
lr: 0.001


KeyboardInterrupt: 

In [None]:
create_dir(directory=logger.run_name)
write_model(model=model, run_name=logger.run_name, model_name="forward_model")

In [None]:
print(test(None, val_signals).numpy())

0.00041192656
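
The value printed above matches the Identity row of the Level 1 results table, which suggests that passing `None` as the model scores the clean signal directly against the recording. Below is a minimal NumPy sketch of such a baseline, assuming each validation item is a (clean, recorded) pair of equal-length arrays. `identity_baseline` and the toy data are hypothetical and not part of `src`.

```python
import numpy as np

def identity_baseline(pairs):
    """Mean over pairs of the per-pair MSE between the clean and the recorded signal."""
    per_pair = [np.mean((clean - recorded) ** 2) for clean, recorded in pairs]
    return float(np.mean(per_pair))

# Toy data: two "recordings" that are slightly attenuated, noisy copies of the clean signal.
rng = np.random.default_rng(0)
pairs = []
for _ in range(2):
    clean = rng.standard_normal(1000)
    recorded = 0.9 * clean + 0.01 * rng.standard_normal(1000)
    pairs.append((clean, recorded))

print(identity_baseline(pairs))
```

Any trained model should score below this number on the same validation files; a model that scores above it is doing worse than leaving the signal unchanged.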
