# SemanticHearing Incremental Training

This notebook shows how to incrementally train (fine-tune) the pre-trained SemanticHearing model for binaural target sound extraction on additional data.

## Overview
- Load the pre-trained model from the original paper
- Prepare your additional training data
- Fine-tune the model on the new data
- Evaluate the fine-tuned model

**Original Paper**: [Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables](https://dl.acm.org/doi/10.1145/3586183.3606779)

**Repository**: https://github.com/sarahv03/SemanticHearing


## 1. Environment Setup

First, let's install the required dependencies and clone the repository.


In [None]:
# Install required packages
!pip install torch torchaudio torchvision --index-url https://download.pytorch.org/whl/cu118
!pip install librosa soundfile scipy matplotlib tqdm numpy pandas
!pip install torchmetrics==0.10.0 seaborn ipykernel scaper
!pip install transformers openl3 youtube_dl bs4 pyroomacoustics
!pip install onnx onnxruntime torch_tb_profiler ffmpegio noisereduce
!pip install tensorflow tensorflow-probability

# Install additional packages for audio processing
# (scaper is already installed above)
!pip install thop==0.1.1.post2209072238
!pip install python-sofa==0.2.0

print("✅ Install commands finished. Check the output above for errors.")


In [None]:
# Clone the repository
!git clone https://github.com/sarahv03/SemanticHearing.git
%cd SemanticHearing

print("✅ Repository cloned successfully!")


## 2. Download Pre-trained Model and Dataset

Download the pre-trained model checkpoint and the original dataset.


In [None]:
# Create necessary directories
!mkdir -p experiments/dc_waveformer
!mkdir -p data

# Download pre-trained model checkpoint
!wget -P experiments/dc_waveformer https://semantichearing.cs.washington.edu/39.pt

print("✅ Pre-trained model downloaded!")
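
`wget` can leave behind an error page or a partial file, so it is worth confirming the checkpoint before training. The sketch below is a minimal check, assuming the file was saved to `experiments/dc_waveformer/39.pt`. `describe_checkpoint` is a hypothetical helper, and the SHA-256 digest is printed only so you can compare it across sessions; no reference hash is published in this notebook.

```python
import hashlib
from pathlib import Path


def describe_checkpoint(path):
    """Return (size_in_bytes, sha256_hex) for a file, reading it in chunks."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {path}")
    size = path.stat().st_size
    if size == 0:
        raise ValueError(f"Checkpoint is empty: {path}")
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read 1 MiB at a time so large checkpoints do not fill memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return size, digest.hexdigest()


ckpt = Path("experiments/dc_waveformer/39.pt")
if ckpt.exists():
    size, sha = describe_checkpoint(ckpt)
    print(f"{ckpt}: {size / 1e6:.1f} MB, sha256={sha[:16]}...")
else:
    print(f"{ckpt} is missing; re-run the download cell.")
```

A file of only a few kilobytes usually means the download returned an error page instead of the model weights.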


In [None]:
# Mount Google Drive to access your pre-downloaded dataset
from google.colab import drive
drive.mount('/content/drive')

# Link the dataset from your Google Drive into the working directory
# Update this path to match where you stored the dataset in your Drive
dataset_path = "/content/drive/MyDrive/BinauralCuratedDataset"  # Adjust this path as needed

# Check if the dataset exists in Drive
import os
if os.path.exists(dataset_path):
    print(f"✅ Found dataset at: {dataset_path}")
    # Create a symlink to avoid copying large files (-sfn replaces an existing link on re-runs)
    !ln -sfn "$dataset_path" data/BinauralCuratedDataset
    print("✅ Dataset linked from Google Drive!")
else:
    print(f"⚠️  Dataset not found at: {dataset_path}")
    print("Please update the dataset_path variable to point to your dataset location in Google Drive")
    print("You can also manually copy the dataset to the data/ directory")


In [None]:
# Helper function to find your dataset in Google Drive
def find_dataset_in_drive():
    """Search for the BinauralCuratedDataset in your Google Drive"""
    import os

    # Common locations where the dataset might be stored
    search_paths = [
        "/content/drive/MyDrive",
        "/content/drive/MyDrive/datasets",
        "/content/drive/MyDrive/data",
        "/content/drive/MyDrive/SemanticHearing",
        "/content/drive/MyDrive/ML_datasets"
    ]

    found_paths = []

    for base_path in search_paths:
        if os.path.exists(base_path):
            for root, dirs, files in os.walk(base_path):
                if "BinauralCuratedDataset" in dirs:
                    full_path = os.path.join(root, "BinauralCuratedDataset")
                    # Do not descend into the dataset itself; walking it over Drive is slow
                    dirs.remove("BinauralCuratedDataset")
                    # The search paths overlap, so report each location only once
                    if full_path not in found_paths:
                        found_paths.append(full_path)
                        print(f"✅ Found dataset at: {full_path}")
    
    if not found_paths:
        print("❌ BinauralCuratedDataset not found in common locations")
        print("Please check your Google Drive and update the dataset_path variable manually")
    else:
        print(f"\n📋 Found {len(found_paths)} dataset location(s)")
        print("Update the dataset_path variable in the previous cell with one of these paths")
    
    return found_paths

# Run the search
found_datasets = find_dataset_in_drive()


## 3. Prepare Your Additional Data

This section walks through preparing your additional training data. You'll need to:
1. Upload your audio files
2. Create the required directory structure
3. Generate labels for your sounds

### Data Structure Requirements
Your additional data should follow this structure:
```
your_data/
├── train/
│   ├── mixture/          # Mixed audio files
│   ├── target/           # Target sound files
│   └── labels/           # Label files (.jams format)
├── val/
│   ├── mixture/
│   ├── target/
│   └── labels/
└── test/
    ├── mixture/
    ├── target/
    └── labels/
```
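
Once files are in place, a quick consistency check can catch missing folders or unpaired files before training starts. The sketch below is one way to do that, assuming the layout shown above and assuming that a mixture, its target, and its label share the same file stem (for example `clip01.wav` and `clip01.jams`). `check_layout` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")
PARTS = ("mixture", "target", "labels")


def check_layout(root):
    """Return a list of human-readable problems found under `root`."""
    root = Path(root)
    problems = []
    for split in SPLITS:
        stems = {}
        for part in PARTS:
            folder = root / split / part
            if not folder.is_dir():
                problems.append(f"missing directory: {folder}")
                stems[part] = set()
                continue
            stems[part] = {p.stem for p in folder.iterdir() if p.is_file()}
        # Every mixture should have a target and a label with the same stem.
        for stem in sorted(stems["mixture"]):
            for part in ("target", "labels"):
                if stem not in stems[part]:
                    problems.append(f"{split}: '{stem}' has no file in {part}/")
    return problems


for problem in check_layout("data/your_additional_data"):
    print("⚠️ ", problem)
```

An empty result means every split has all three folders and every mixture has a matching target and label.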


In [None]:
# Create directory for your additional data
!mkdir -p data/your_additional_data/{train,val,test}/{mixture,target,labels}

print("📁 Created directory structure for your additional data.")
print("\n📋 Next steps:")
print("1. Upload your audio files to the appropriate directories")
print("2. Create label files in .jams format")
print("3. Run the data preparation script below")


In [None]:
# Data preparation script for your additional data
import shutil
from pathlib import Path

import jams  # Installed as a dependency of scaper
import librosa

def create_jams_label(audio_file, target_sound_class, start_time=0.0, end_time=None):
    """
    Create a JAMS label file for your audio data.
    
    Args:
        audio_file: Path to audio file
        target_sound_class: Class of the target sound (e.g., 'speech', 'music', 'bird')
        start_time: Start time of target sound in seconds
        end_time: End time of target sound in seconds (None for full duration)
    """
    # Load audio to get duration
    y, sr = librosa.load(audio_file, sr=None)
    duration = len(y) / sr
    
    if end_time is None:
        end_time = duration
    
    # Create JAMS annotation
    jam = jams.JAMS()
    jam.file_metadata.duration = duration
    
    # Create annotation for target sound
    ann = jams.Annotation(namespace='tag_open')
    ann.append(time=start_time, duration=end_time-start_time, value=target_sound_class, confidence=1.0)
    
    jam.annotations.append(ann)
    
    return jam

def prepare_your_data(data_dir, target_class):
    """
    Prepare your data by creating JAMS labels and organizing files.
    
    Args:
        data_dir: Directory containing your audio files
        target_class: Class name for your target sounds
    """
    data_path = Path(data_dir)
    
    # Process each split
    for split in ['train', 'val', 'test']:
        split_dir = data_path / split
        if not split_dir.exists():
            continue
            
        print(f"Processing {split} split...")
        
        # Create directories if they don't exist
        (split_dir / 'mixture').mkdir(exist_ok=True)
        (split_dir / 'target').mkdir(exist_ok=True)
        (split_dir / 'labels').mkdir(exist_ok=True)
        
        # Process audio files
        audio_files = list(split_dir.glob('*.wav')) + list(split_dir.glob('*.mp3')) + list(split_dir.glob('*.flac'))
        
        for audio_file in audio_files:
            # Move to mixture directory
            mixture_file = split_dir / 'mixture' / audio_file.name
            if not mixture_file.exists():
                audio_file.rename(mixture_file)
            
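            # Placeholder: the target is a plain copy of the mixture.
            # Replace it with the isolated target sound before training;
            # otherwise the model only learns to reproduce its input.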
            target_file = split_dir / 'target' / audio_file.name
            if not target_file.exists():
                import shutil
                shutil.copy2(mixture_file, target_file)
            
            # Create JAMS label
            label_file = split_dir / 'labels' / (audio_file.stem + '.jams')
            if not label_file.exists():
                jam = create_jams_label(mixture_file, target_class)
                jam.save(str(label_file))
        
        print(f"✅ Processed {len(audio_files)} files in {split} split")

# Example usage - modify the paths and target class as needed
print("📝 Data preparation script ready!")
print("\nTo use this script:")
print("1. Upload your audio files to data/your_additional_data/train/, data/your_additional_data/val/, etc.")
print("2. Run: prepare_your_data('data/your_additional_data', 'your_target_class')")
print("3. Replace 'your_target_class' with the actual class name (e.g., 'speech', 'music', 'bird')")
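
The configuration in Section 4 sets `sr` to 44100 and leaves `resample_rate` unset, so it seems prudent to confirm that new recordings are already at 44.1 kHz. The sketch below uses only the standard-library `wave` module, which reads uncompressed PCM WAV files; MP3 and FLAC files are not covered. `find_sample_rate_mismatches` is a hypothetical helper, and whether the dataset class tolerates other rates is not confirmed here.

```python
import wave
from pathlib import Path

EXPECTED_SR = 44100  # Value of "sr" in the Section 4 configuration


def find_sample_rate_mismatches(folder, expected_sr=EXPECTED_SR):
    """Return {filename: sample_rate} for WAV files whose rate differs."""
    mismatches = {}
    for wav_path in sorted(Path(folder).glob("*.wav")):
        try:
            with wave.open(str(wav_path), "rb") as wf:
                rate = wf.getframerate()
        except (wave.Error, EOFError):
            # The stdlib reader rejects compressed or truncated files.
            print(f"Skipping {wav_path.name}: not a readable PCM WAV file")
            continue
        if rate != expected_sr:
            mismatches[wav_path.name] = rate
    return mismatches


for split in ("train", "val", "test"):
    bad = find_sample_rate_mismatches(f"data/your_additional_data/{split}/mixture")
    for name, rate in bad.items():
        print(f"{split}/{name}: {rate} Hz (expected {EXPECTED_SR} Hz)")
```

Files reported here can be resampled offline, for example with `librosa.load(path, sr=44100)` followed by `soundfile.write`.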


## 4. Create Incremental Training Configuration

Create a new configuration for incremental training that loads the pre-trained model.


In [None]:
import json
import os

# Create incremental training configuration
# Notes on paths:
# - "jams_dir" points at the labels/ folders created in Section 3.
# - "fg_dir" (scaper_fmt/<split>) is not created by Section 3; it must exist before training.
incremental_config = {
    "model": "src.training.dcc_tf_binaural",
    "base_metric": "scale_invariant_signal_noise_ratio",
    "fix_lr_epochs": 10,  # Reduced for fine-tuning
    "epochs": 30,  # Reduced for fine-tuning
    "batch_size": 8,  # Smaller batch size for fine-tuning
    "eval_batch_size": 32,
    "n_workers": 4,
    "model_params": {
        "L": 32,
        "label_len": 20,
        "model_dim": 256,
        "num_enc_layers": 10,
        "num_dec_layers": 1,
        "dec_buf_len": 13,
        "dec_chunk_size": 13,
        "use_pos_enc": True,
        "conditioning": "mult",
        "out_buf_len": 4,
        # Pre-trained checkpoint; verify that the model code in the repository accepts this key
        "pretrained_path": "experiments/dc_waveformer/39.pt"
    },
    "train_dataset": "src.training.datasets.curated_binaural_augrir.CuratedBinauralAugRIRDataset",
    "train_data_args": {
        "fg_dir": "data/your_additional_data/scaper_fmt/train",
        "bg_dir": "data/BinauralCuratedDataset/TAU-acoustic-sounds/TAU-urban-acoustic-scenes-2019-development",
        "bg_scaper_dir": "data/BinauralCuratedDataset/bg_scaper_fmt/train",
        "jams_dir": "data/your_additional_data/train/labels",
        "hrtf_dir": "data/BinauralCuratedDataset/hrtf",
        "dset": "train",
        "sr": 44100,
        "resample_rate": None,
        "reverb": True
    },
    "val_dataset": "src.training.datasets.curated_binaural_augrir.CuratedBinauralAugRIRDataset",
    "val_data_args": {
        "fg_dir": "data/your_additional_data/scaper_fmt/val",
        "bg_dir": "data/BinauralCuratedDataset/TAU-acoustic-sounds/TAU-urban-acoustic-scenes-2019-development",
        "bg_scaper_dir": "data/BinauralCuratedDataset/bg_scaper_fmt/val",
        "jams_dir": "data/your_additional_data/val/labels",
        "hrtf_dir": "data/BinauralCuratedDataset/hrtf",
        "dset": "val",
        "sr": 44100,
        "resample_rate": None,
        "reverb": True
    },
    "test_dataset": "src.training.datasets.curated_binaural_augrir.CuratedBinauralAugRIRDataset",
    "test_data_args": {
        "fg_dir": "data/your_additional_data/scaper_fmt/test",
        "bg_dir": "data/BinauralCuratedDataset/TAU-acoustic-sounds/TAU-urban-acoustic-scenes-2019-evaluation",
        "bg_scaper_dir": "data/BinauralCuratedDataset/bg_scaper_fmt/test",
        "jams_dir": "data/your_additional_data/test/labels",
        "hrtf_dir": "data/BinauralCuratedDataset/hrtf",
        "dset": "test",
        "sr": 44100,
        "resample_rate": None,
        "reverb": True
    },
    "optim": {
        "lr": 0.0001,  # Lower learning rate for fine-tuning
        "weight_decay": 1e-5  # Add weight decay for regularization
    },
    "lr_sched": {
        "mode": "max",
        "factor": 0.5,
        "patience": 3,  # More aggressive scheduling for fine-tuning
        "min_lr": 1e-6,
        "threshold": 0.01,
        "threshold_mode": "abs"
    },
    "commit_hash": "incremental_training_v1"
}

# Save the configuration
os.makedirs('experiments/incremental_training', exist_ok=True)
with open('experiments/incremental_training/config.json', 'w') as f:
    json.dump(incremental_config, f, indent=4)

print("✅ Incremental training configuration created!")
print("📁 Saved to: experiments/incremental_training/config.json")
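
Because the configuration references many directories, a missing path typically surfaces only after training has started loading data. The sketch below reads the saved `config.json` and reports entries that do not exist on disk. It assumes that every key ending in `_dir` under the three `*_data_args` sections, plus `model_params.pretrained_path`, is a filesystem path; `missing_config_paths` is a hypothetical helper.

```python
import json
from pathlib import Path


def missing_config_paths(config):
    """Return (key, path) pairs from the config whose path does not exist."""
    missing = []
    pretrained = config.get("model_params", {}).get("pretrained_path")
    if pretrained and not Path(pretrained).exists():
        missing.append(("model_params.pretrained_path", pretrained))
    for section in ("train_data_args", "val_data_args", "test_data_args"):
        for key, value in config.get(section, {}).items():
            # Only keys ending in "_dir" are treated as filesystem paths.
            if key.endswith("_dir") and not Path(value).exists():
                missing.append((f"{section}.{key}", value))
    return missing


config_file = Path("experiments/incremental_training/config.json")
if config_file.exists():
    with config_file.open() as f:
        for key, path in missing_config_paths(json.load(f)):
            print(f"⚠️  {key} -> {path} does not exist")
```

Resolving every reported path before launching training avoids losing time to a failure partway through dataset setup.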


## 5. Run Incremental Training

Now let's run the incremental training with your additional data.
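
Before committing to a long run, it can help to see how the `lr_sched` settings from Section 4 would behave. The key names match the arguments of `torch.optim.lr_scheduler.ReduceLROnPlateau`, so the sketch below assumes that scheduler is what the training script uses; this is not confirmed here. It is a simplified pure-Python re-implementation (no cooldown, and `fix_lr_epochs` is not modeled), and the metric values are made up for illustration.

```python
def simulate_plateau_schedule(metrics, lr=1e-4, factor=0.5, patience=3,
                              min_lr=1e-6, threshold=0.01):
    """Return the learning rate in effect after each epoch's validation metric.

    Mirrors "max" mode with an absolute threshold: an epoch counts as an
    improvement only if the metric exceeds the best value so far by more
    than `threshold`.
    """
    best = float("-inf")
    bad_epochs = 0
    history = []
    for metric in metrics:
        if metric > best + threshold:
            best = metric
            bad_epochs = 0
        else:
            bad_epochs += 1
        # Reduce the rate once more than `patience` epochs pass without improvement.
        if bad_epochs > patience:
            lr = max(lr * factor, min_lr)
            bad_epochs = 0
        history.append(lr)
    return history


# Hypothetical validation SI-SNR values (dB): two improvements, then a plateau.
example_metrics = [5.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0]
for epoch, lr in enumerate(simulate_plateau_schedule(example_metrics)):
    print(f"epoch {epoch}: lr = {lr:g}")
```

With `patience` set to 3, the rate is halved on the fourth consecutive epoch without an improvement larger than 0.01, and it never drops below `min_lr`.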


In [None]:
# Check GPU availability
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
    print(f"CUDA memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("⚠️  No GPU available. Training will be slow on CPU.")


In [None]:
# Run incremental training
print("🚀 Starting incremental training...")
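print("\n📋 Training configuration:")
print("- Model: Pre-trained SemanticHearing model")
# Read the values from the config defined in Section 4 so the summary cannot drift
print(f"- Epochs: {incremental_config['epochs']}")
print(f"- Learning rate: {incremental_config['optim']['lr']}")
print(f"- Batch size: {incremental_config['batch_size']}")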
print("\n⏳ This may take several hours depending on your data size...")

# Run training
!python -m src.training.train experiments/incremental_training --use_cuda --start_epoch 0

print("\n✅ Training command finished. Check the log above for errors.")


## 6. Evaluate and Test Your Fine-tuned Model

Evaluate your fine-tuned model, then save it to Google Drive.


In [None]:
# Evaluate the fine-tuned model
print("📊 Evaluating fine-tuned model...")

!python -m src.training.eval experiments/incremental_training --use_cuda

print("\n✅ Evaluation completed!")
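
The configuration names `scale_invariant_signal_noise_ratio` as the base metric. To make the reported numbers easier to interpret, the sketch below computes SI-SNR from its standard definition in pure Python. It is an illustration only, not the repository's or torchmetrics' implementation; `si_snr_db` and the synthetic signals are hypothetical.

```python
import math


def si_snr_db(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB for two equal-length sequences."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    # Remove the mean so that a constant offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    tgt_mean = sum(target) / len(target)
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]
    # Project the estimate onto the target to get the scaled target component.
    scale = sum(e * t for e, t in zip(est, tgt)) / (sum(t * t for t in tgt) + eps)
    projection = [scale * t for t in tgt]
    noise = [e - p for e, p in zip(est, projection)]
    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(n * n for n in noise)
    return 10 * math.log10((signal_energy + eps) / (noise_energy + eps))


target = [math.sin(0.1 * n) for n in range(1000)]
noisy = [t + 0.1 * math.cos(0.37 * n) for n, t in enumerate(target)]
louder = [3.0 * x for x in noisy]  # Same estimate, different gain
print(f"SI-SNR: {si_snr_db(noisy, target):.2f} dB")
print(f"SI-SNR after rescaling: {si_snr_db(louder, target):.2f} dB")
```

The two printed values agree because the projection step removes any overall gain difference, which is why the metric is described as scale-invariant. Higher values indicate a cleaner extraction.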


In [None]:
# Save your fine-tuned model to Google Drive
import os
import shutil
from datetime import datetime

def save_to_drive(experiment_dir='experiments/incremental_training'):
    """Copy the fine-tuned model and results to Google Drive.

    Returns the destination directory, or None if there is nothing to copy.
    """
    if not os.path.exists(experiment_dir):
        print(f"⚠️  Nothing to save: {experiment_dir} does not exist")
        return None

    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

    # Create results directory in Drive
    results_dir = f"/content/drive/MyDrive/SemanticHearing_Results_{timestamp}"
    os.makedirs(results_dir, exist_ok=True)

    # Copy model checkpoints
    shutil.copytree(experiment_dir, f"{results_dir}/incremental_training")
    print(f"✅ Model saved to: {results_dir}")

    return results_dir

# Save results
results_path = save_to_drive()
if results_path:
    print("\n📦 Your fine-tuned model has been saved to Google Drive!")
    print(f"📁 Location: {results_path}")
    print("\n💡 You can now download or use this model for inference!")
