# SemanticHearing Incremental Training

This notebook demonstrates how to perform incremental training on the SemanticHearing model for binaural target sound extraction.

## Overview
- Load pre-trained model from the original paper
- Prepare your additional training data
- Fine-tune the model with new data
- Evaluate performance improvements

**Original Paper**: [Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables](https://dl.acm.org/doi/10.1145/3586183.3606779)

**Repository**: https://github.com/sarahv03/SemanticHearing


## 1. Environment Setup

First, let's install the required dependencies and clone the repository.


In [None]:
# Install required packages
!pip install torch torchaudio torchvision --index-url https://download.pytorch.org/whl/cu118
!pip install librosa soundfile scipy matplotlib tqdm numpy pandas
!pip install torchmetrics==0.10.0 seaborn ipykernel scaper
!pip install transformers openl3 youtube_dl bs4 pyroomacoustics
!pip install onnx onnxruntime torch_tb_profiler ffmpegio noisereduce
!pip install tensorflow tensorflow-probability

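# Install pinned extras: thop (model profiling) and python-sofa (SOFA/HRTF files).
# scaper is already installed above, so it is not repeated here.
!pip install thop==0.1.1.post2209072238
!pip install python-sofa==0.2.0

# This line prints even if a pip command above failed, so check the pip output for errors
print("✅ Dependency installation commands finished!")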
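
Because the `pip` commands above do not stop the cell on failure, a quick import check can help confirm the environment before moving on. The following is a minimal sketch using only the standard library; the module names in `REQUIRED_MODULES` are assumptions (a pip package name does not always match the module it installs), so adjust the list to your setup.

```python
import importlib.util

# Assumed import names; e.g. the python-sofa package is assumed to install "sofa".
REQUIRED_MODULES = ["torch", "torchaudio", "librosa", "soundfile", "scaper", "jams", "sofa"]

def find_missing_modules(module_names):
    """Return the top-level module names that cannot be located in this environment."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing

missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print(f"Missing modules: {', '.join(missing)}")
else:
    print("All listed modules were found.")
```

If anything is reported as missing, re-run the matching `pip install` line and read its output for the underlying error.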


In [None]:
# Clone the repository
!git clone https://github.com/sarahv03/SemanticHearing.git
%cd SemanticHearing

print("✅ Repository cloned successfully!")


## 2. Download Pre-trained Model and Dataset

Download the pre-trained model checkpoint and the original dataset.


In [None]:
# Create necessary directories
!mkdir -p experiments/dc_waveformer
!mkdir -p data

# Download pre-trained model checkpoint
!wget -P experiments/dc_waveformer https://semantichearing.cs.washington.edu/39.pt

print("✅ Pre-trained model downloaded!")


In [None]:
# Mount Google Drive and set up Google Cloud Storage access
from google.colab import drive, auth
from google.cloud import storage
import os

# Mount Google Drive
drive.mount('/content/drive')

# Authenticate with Google Cloud
auth.authenticate_user()

# Set up Google Cloud Storage client
client = storage.Client()

# Your bucket and dataset paths
BUCKET_NAME = "misophones_training_dataset"
DATASET_PATH = "FOAMS_dataset/FOAMS_processed_audio"
BINAURAL_DATASET_PATH = "BinauralCuratedDataset"  # Original dataset path in your bucket

print(f"✅ Google Cloud Storage client initialized")
print(f"📦 Bucket: {BUCKET_NAME}")
print(f"📁 Additional data path: {DATASET_PATH}")
print(f"📁 Original dataset path: {BINAURAL_DATASET_PATH}")


In [None]:
# Download datasets from Google Cloud Storage
def download_from_gcs(bucket_name, source_path, local_path):
    """Download files from Google Cloud Storage to local directory"""
    bucket = client.bucket(bucket_name)
    # Match on "prefix/" so sibling prefixes that share the same stem are excluded
    prefix = source_path.rstrip('/') + '/'
    blobs = bucket.list_blobs(prefix=prefix)
    
    os.makedirs(local_path, exist_ok=True)
    downloaded_files = 0
    
    for blob in blobs:
        # Skip zero-byte "directory" placeholder objects
        if blob.name.endswith('/'):
            continue
            
        # Strip only the leading prefix; str.replace() would also alter later occurrences
        relative_name = blob.name[len(prefix):]
        local_file_path = os.path.join(local_path, relative_name)
        os.makedirs(os.path.dirname(local_file_path), exist_ok=True)
        
        # Download file
        blob.download_to_filename(local_file_path)
        downloaded_files += 1
        
        if downloaded_files % 100 == 0:
            print(f"Downloaded {downloaded_files} files...")
    
    print(f"✅ Downloaded {downloaded_files} files from {source_path}")
    return downloaded_files

# Download the original BinauralCuratedDataset
print("📥 Downloading original BinauralCuratedDataset from Google Cloud Storage...")
download_from_gcs(BUCKET_NAME, BINAURAL_DATASET_PATH, "data/BinauralCuratedDataset")

# Download your additional FOAMS dataset
print("📥 Downloading FOAMS dataset from Google Cloud Storage...")
download_from_gcs(BUCKET_NAME, DATASET_PATH, "data/your_additional_data")

print("✅ All datasets downloaded successfully!")
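
The mapping from an object name in the bucket to a local file path is the part of the download step most likely to go wrong, so it can be worth testing on its own without touching the network. Below is a small sketch of that mapping as a pure function; `blob_to_local_path` is a hypothetical helper name, not something provided by the `google-cloud-storage` library.

```python
import os

def blob_to_local_path(blob_name, source_prefix, local_root):
    """Map a GCS object name to a local path, or return None if it should be skipped."""
    prefix = source_prefix.rstrip("/") + "/"
    # Skip objects outside the prefix and zero-byte "directory" placeholders
    if not blob_name.startswith(prefix) or blob_name.endswith("/"):
        return None
    relative = blob_name[len(prefix):]
    # GCS names always use "/", so split on it and join with the local separator
    return os.path.join(local_root, *relative.split("/"))

print(blob_to_local_path(
    "FOAMS_dataset/FOAMS_processed_audio/clips/a.wav",
    "FOAMS_dataset/FOAMS_processed_audio",
    "data/your_additional_data",
))
```

Checking for `prefix + "/"` rather than the bare prefix means a sibling folder such as `BinauralCuratedDataset2/` is not picked up by accident.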


## 3. Prepare Your Additional Data

This section prepares your additional training data in three steps:
1. Upload your audio files
2. Create the directory structure shown below
3. Generate a JAMS label file for each sound

### Data Structure Requirements
Your additional data should follow this structure:
```
your_data/
├── train/
│   ├── mixture/          # Mixed audio files
│   ├── target/           # Target sound files
│   └── labels/           # Label files (.jams format)
├── val/
│   ├── mixture/
│   ├── target/
│   └── labels/
└── test/
    ├── mixture/
    ├── target/
    └── labels/
```
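
Before training, it can help to confirm that a data folder actually matches the layout above. This is a minimal sketch with `pathlib`; `missing_layout_dirs` is a hypothetical helper, and it checks only that the directories exist, not that their contents are valid.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")
SUBDIRS = ("mixture", "target", "labels")

def missing_layout_dirs(root):
    """Return the split/subdirectory pairs that are absent under root."""
    root = Path(root)
    return [
        str(Path(split) / sub)
        for split in SPLITS
        for sub in SUBDIRS
        if not (root / split / sub).is_dir()
    ]

# An empty list means all nine directories are present
print(missing_layout_dirs("data/your_additional_data"))
```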


In [None]:
# Explore your FOAMS dataset structure
def explore_dataset_structure(dataset_path):
    """Explore the structure of your FOAMS dataset"""
    import os
    
    print(f"🔍 Exploring dataset structure at: {dataset_path}")
    
    if not os.path.exists(dataset_path):
        print(f"❌ Dataset path not found: {dataset_path}")
        return
    
    # Walk through the directory structure
    for root, dirs, files in os.walk(dataset_path):
        level = root.replace(dataset_path, '').count(os.sep)
        indent = ' ' * 2 * level
        print(f"{indent}{os.path.basename(root)}/")
        
        # Show some files in each directory
        subindent = ' ' * 2 * (level + 1)
        for file in files[:5]:  # Show first 5 files
            print(f"{subindent}{file}")
        if len(files) > 5:
            print(f"{subindent}... and {len(files) - 5} more files")

# Explore the downloaded dataset
explore_dataset_structure("data/your_additional_data")

print("\n📋 Next steps:")
print("1. Review the dataset structure above")
print("2. Run the data preparation script to organize it for training")
print("3. The script will create the proper train/val/test splits")


In [None]:
# Create directory for your additional data
!mkdir -p data/your_additional_data/{train,val,test}/{mixture,target,labels}

print("📁 Created directory structure for your additional data.")
print("\n📋 Next steps:")
print("1. Upload your audio files to the appropriate directories")
print("2. Create label files in .jams format")
print("3. Run the data preparation script below")


In [None]:
# Updated data preparation function for FOAMS dataset
def prepare_foams_data(data_dir, target_class="speech"):
    """
    Prepare your FOAMS dataset for training by creating proper train/val/test splits.
    
    Args:
        data_dir: Directory containing your FOAMS audio files
        target_class: Class name for your target sounds (default: "speech")
    """
    import random
    import shutil
    from pathlib import Path
    
    data_path = Path(data_dir)
    
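    # Find all audio files recursively, skipping anything already inside a
    # train/val/test folder so that re-running this cell does not re-ingest copies
    audio_extensions = ['*.wav', '*.mp3', '*.flac', '*.m4a']
    split_names = {'train', 'val', 'test'}
    all_audio_files = []
    
    for ext in audio_extensions:
        for path in data_path.rglob(ext):
            if path.relative_to(data_path).parts[0] not in split_names:
                all_audio_files.append(path)
    
    print(f"Found {len(all_audio_files)} audio files")
    
    # Sort, then shuffle with a fixed seed (0 is an arbitrary choice) so the split is reproducible
    all_audio_files.sort()
    random.Random(0).shuffle(all_audio_files)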
    
    # Split: 80% train, 10% val, 10% test
    train_size = int(0.8 * len(all_audio_files))
    val_size = int(0.1 * len(all_audio_files))
    
    train_files = all_audio_files[:train_size]
    val_files = all_audio_files[train_size:train_size + val_size]
    test_files = all_audio_files[train_size + val_size:]
    
    print(f"Split: {len(train_files)} train, {len(val_files)} val, {len(test_files)} test")
    
    # Process each split
    for split_name, files in [('train', train_files), ('val', val_files), ('test', test_files)]:
        print(f"\nProcessing {split_name} split...")
        
        # Create directories
        split_dir = data_path / split_name
        (split_dir / 'mixture').mkdir(parents=True, exist_ok=True)
        (split_dir / 'target').mkdir(parents=True, exist_ok=True)
        (split_dir / 'labels').mkdir(parents=True, exist_ok=True)
        
        for i, audio_file in enumerate(files):
            # Create new filename with index
            new_name = f"{i:06d}{audio_file.suffix}"
            
            # Copy to mixture directory
            mixture_file = split_dir / 'mixture' / new_name
            shutil.copy2(audio_file, mixture_file)
            
            # Copy to target directory (same file for now)
            target_file = split_dir / 'target' / new_name
            shutil.copy2(audio_file, target_file)
            
            # Create JAMS label
            label_file = split_dir / 'labels' / f"{i:06d}.jams"
            jam = create_jams_label(mixture_file, target_class)
            jam.save(str(label_file))
        
        print(f"✅ Processed {len(files)} files in {split_name} split")
    
    print("\n🎉 FOAMS dataset preparation completed!")
    print("Your data is now organized in the format expected by the training pipeline.")

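# Run the data preparation.
# prepare_foams_data() calls create_jams_label(), which is defined in a later cell
# ("Data preparation script for your additional data"). Run that cell first,
# otherwise this call fails with a NameError.
print("🚀 Preparing FOAMS dataset for training...")
prepare_foams_data("data/your_additional_data", target_class="speech")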
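
The 80/10/10 split used above can be isolated into a small function, which makes it easy to check the split sizes and to get the same split on every run. The sketch below is an illustration, not the notebook's own implementation; `split_items` and its `seed` parameter are hypothetical names.

```python
import random

def split_items(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Split items into train/val/test deterministically for a given seed."""
    # Sorting first makes the result independent of the input order
    ordered = sorted(items)
    random.Random(seed).shuffle(ordered)
    n_train = int(train_frac * len(ordered))
    n_val = int(val_frac * len(ordered))
    return {
        "train": ordered[:n_train],
        "val": ordered[n_train:n_train + n_val],
        "test": ordered[n_train + n_val:],
    }

files = [f"{i:03d}.wav" for i in range(25)]
splits = split_items(files)
print({name: len(part) for name, part in splits.items()})
```

Because the sizes are truncated with `int()`, any remainder lands in the test split.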


In [None]:
# Convert spreadsheet labels to JAMS format for incremental training
import os
import shutil

import jams
import librosa
import numpy as np  # needed below when duplicating the mono channel to stereo
import pandas as pd
import soundfile as sf

def convert_spreadsheet_to_jams(spreadsheet_path, audio_dir, output_dir, target_class="speech"):
    """
    Convert spreadsheet with labels to JAMS format for SemanticHearing training.
    
    Required spreadsheet columns:
    - filename: Name of the audio file
    - start_time: Start time of the target sound (seconds)
    - end_time: End time of the target sound (seconds)
    - label: Target sound class (must be in predefined list)
    
    Predefined sound classes:
    "alarm_clock", "baby_cry", "birds_chirping", "cat", "car_horn", 
    "cock_a_doodle_doo", "cricket", "computer_typing", 
    "dog", "glass_breaking", "gunshot", "hammer", "music", 
    "ocean", "door_knock", "singing", "siren", "speech", 
    "thunderstorm", "toilet_flush"
    
    Args:
        spreadsheet_path: Path to your CSV/Excel file
        audio_dir: Directory containing the audio files
        output_dir: Directory to save organized data
        target_class: Default class name if not specified in spreadsheet
    """
    
    # Read the spreadsheet
    if spreadsheet_path.endswith('.csv'):
        df = pd.read_csv(spreadsheet_path)
    elif spreadsheet_path.endswith(('.xlsx', '.xls')):
        df = pd.read_excel(spreadsheet_path)
    else:
        raise ValueError("Unsupported file format. Use CSV or Excel files.")
    
    print(f"📊 Loaded {len(df)} entries from spreadsheet")
    print(f"📁 Audio directory: {audio_dir}")
    print(f"📁 Output directory: {output_dir}")
    
    # Create output directories
    for split in ['train', 'val', 'test']:
        for subdir in ['mixture', 'target', 'labels']:
            os.makedirs(os.path.join(output_dir, split, subdir), exist_ok=True)
    
    # Split data: 80% train, 10% val, 10% test
    df = df.sample(frac=1).reset_index(drop=True)  # Shuffle
    train_size = int(0.8 * len(df))
    val_size = int(0.1 * len(df))
    
    splits = {
        'train': df[:train_size],
        'val': df[train_size:train_size + val_size],
        'test': df[train_size + val_size:]
    }
    
    print(f"📊 Data split: {len(splits['train'])} train, {len(splits['val'])} val, {len(splits['test'])} test")
    
    # Process each split
    for split_name, split_df in splits.items():
        print(f"\n🔄 Processing {split_name} split...")
        
        for idx, row in split_df.iterrows():
            try:
                # Get file info
                filename = row['filename']
                start_time = float(row['start_time'])
                end_time = float(row['end_time'])
                label = row.get('label', target_class)
                
                # Validate label is in predefined list
                predefined_labels = [
                    "alarm_clock", "baby_cry", "birds_chirping", "cat", "car_horn", 
                    "cock_a_doodle_doo", "cricket", "computer_typing", 
                    "dog", "glass_breaking", "gunshot", "hammer", "music", 
                    "ocean", "door_knock", "singing", "siren", "speech", 
                    "thunderstorm", "toilet_flush"
                ]
                
                if label not in predefined_labels:
                    print(f"⚠️  Warning: '{label}' not in predefined list. Using '{target_class}' instead.")
                    label = target_class
                
                # Find the audio file
                audio_file = None
                for ext in ['.wav', '.mp3', '.flac', '.m4a']:
                    potential_file = os.path.join(audio_dir, filename + ext)
                    if os.path.exists(potential_file):
                        audio_file = potential_file
                        break
                
                if not audio_file:
                    print(f"⚠️  Audio file not found: {filename}")
                    continue
                
                # Load audio to get duration
                y, sr = librosa.load(audio_file, sr=None)
                duration = len(y) / sr
                
                # Create new filename with index
                new_filename = f"{idx:06d}{os.path.splitext(audio_file)[1]}"
                
                # Copy to mixture directory
                mixture_file = os.path.join(output_dir, split_name, 'mixture', new_filename)
                import shutil
                shutil.copy2(audio_file, mixture_file)
                
                # Extract target segment and save to target directory
                target_file = os.path.join(output_dir, split_name, 'target', new_filename)
                start_sample = int(start_time * sr)
                end_sample = int(end_time * sr)
                target_audio = y[start_sample:end_sample]
                
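                # Ensure stereo. librosa.load() downmixes to mono by default,
                # so the first branch is the one that normally runs.
                if target_audio.ndim == 1:
                    target_audio = np.stack([target_audio, target_audio], axis=0)
                elif target_audio.shape[0] == 1:
                    target_audio = np.repeat(target_audio, 2, axis=0)
                
                # Note: libsndfile cannot write .m4a, so sf.write() raises for such
                # files and the row is reported by the except block below.
                import soundfile as sf
                sf.write(target_file, target_audio.T, sr)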
                
                # Create JAMS label file
                jam = jams.JAMS()
                jam.file_metadata.duration = duration
                
                # Create annotation for the target sound
                target_ann = jams.Annotation(namespace='tag_open')
                target_ann.append(
                    time=start_time,
                    duration=end_time - start_time,
                    value=label,
                    confidence=1.0
                )
                jam.annotations.append(target_ann)
                
                # Save JAMS file
                jams_file = os.path.join(output_dir, split_name, 'labels', f"{idx:06d}.jams")
                jam.save(jams_file)
                
            except Exception as e:
                # Report the row index: 'filename' is unbound if reading the row itself failed
                print(f"❌ Error processing row {idx}: {e}")
                continue
        
        print(f"✅ Finished {split_name} split ({len(split_df)} spreadsheet rows)")
    
    print(f"\n🎉 Conversion completed!")
    print(f"📁 Organized data saved to: {output_dir}")
    print(f"📊 Ready for incremental training!")

# Example usage - update these paths for your data
print("📝 Spreadsheet to JAMS conversion script ready!")
print("\nTo use this script:")
print("1. Prepare your spreadsheet with columns: filename, start_time, end_time, label")
print("2. Update the paths below:")
print("3. Run: convert_spreadsheet_to_jams('your_labels.csv', 'audio_directory', 'output_directory')")
print("\nExample:")
print("convert_spreadsheet_to_jams('labels.csv', 'data/your_additional_data', 'data/foams_organized', 'speech')")


## Spreadsheet Format Requirements

The conversion script above expects a spreadsheet in the following format:

### Required Columns:
| Column Name | Description | Example |
|-------------|-------------|---------|
| `filename` | Audio file name (without extension) | `audio_001` |
| `start_time` | Start time of target sound (seconds) | `2.3` |
| `end_time` | End time of target sound (seconds) | `5.4` |
| `label` | Target sound class (must be in predefined list) | `speech` |

### Background vs Foreground:
- **Foreground (target)**: Sounds you want to extract (must be one of the 20 predefined classes)
- **Background (non-target)**: Everything else that should be filtered out
- **The spreadsheet has no separate foreground/background column**: every labeled row describes a foreground target, identified by its `label`

### Predefined Sound Classes (from the codebase):
```
"alarm_clock", "baby_cry", "birds_chirping", "cat", "car_horn", 
"cock_a_doodle_doo", "cricket", "computer_typing", 
"dog", "glass_breaking", "gunshot", "hammer", "music", 
"ocean", "door_knock", "singing", "siren", "speech", 
"thunderstorm", "toilet_flush"
```

### Example Spreadsheet:
```csv
filename,start_time,end_time,label
audio_001,2.3,5.4,speech
audio_002,0.5,3.2,music
audio_003,1.1,4.7,speech
audio_004,2.0,4.0,computer_typing
```
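
A spreadsheet can be checked against these requirements before running the conversion. The following is a rough sketch using the standard `csv` module; `validate_label_rows` is a hypothetical helper, and the rules it enforces (required columns, `start_time < end_time`, label in the predefined list) are taken from the requirements listed in this section.

```python
import csv
import io

PREDEFINED_LABELS = {
    "alarm_clock", "baby_cry", "birds_chirping", "cat", "car_horn",
    "cock_a_doodle_doo", "cricket", "computer_typing",
    "dog", "glass_breaking", "gunshot", "hammer", "music",
    "ocean", "door_knock", "singing", "siren", "speech",
    "thunderstorm", "toilet_flush",
}
REQUIRED_COLUMNS = ("filename", "start_time", "end_time", "label")

def validate_label_rows(csv_text):
    """Return a list of (row_number, problem) tuples; an empty list means no problems."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [(0, f"missing columns: {', '.join(missing)}")]
    problems = []
    for row_number, row in enumerate(reader, start=1):
        try:
            start, end = float(row["start_time"]), float(row["end_time"])
        except (TypeError, ValueError):
            problems.append((row_number, "start_time/end_time is not a number"))
            continue
        if start < 0 or end <= start:
            problems.append((row_number, "expected 0 <= start_time < end_time"))
        if row["label"] not in PREDEFINED_LABELS:
            problems.append((row_number, f"unknown label: {row['label']}"))
    return problems

sample = "filename,start_time,end_time,label\naudio_001,2.3,5.4,speech\naudio_002,3.0,1.0,chewing\n"
for problem in validate_label_rows(sample):
    print(problem)
```

Row numbers count data rows starting at 1; row 0 is used for header problems.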

### For Misophonia (Chewing Sounds):
Since "chewing" is not in the predefined list, you have two options:
1. **Use "speech"** as a stand-in category (chewing and speech then share one label, so the model cannot be asked for one without the other)
2. **Add "chewing" to the model's label list** (requires code modification)
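
To make the second option concrete, the sketch below shows what extending the label list does to a label vector. It assumes the model is conditioned on a vector with one slot per class, which matches `"label_len": 20` in the training configurations in this notebook, but the exact encoding used by the codebase is an assumption here, and `encode_labels` is a hypothetical helper.

```python
BASE_LABELS = [
    "alarm_clock", "baby_cry", "birds_chirping", "cat", "car_horn",
    "cock_a_doodle_doo", "cricket", "computer_typing",
    "dog", "glass_breaking", "gunshot", "hammer", "music",
    "ocean", "door_knock", "singing", "siren", "speech",
    "thunderstorm", "toilet_flush",
]

def encode_labels(active, label_list):
    """Return a vector with 1.0 at the position of each active class (assumed encoding)."""
    unknown = set(active) - set(label_list)
    if unknown:
        raise ValueError(f"labels not in list: {sorted(unknown)}")
    return [1.0 if name in active else 0.0 for name in label_list]

extended_labels = BASE_LABELS + ["chewing"]  # hypothetical 21st class

print(len(encode_labels(["speech"], BASE_LABELS)))       # vector length with the original list
print(len(encode_labels(["chewing"], extended_labels)))  # vector length after adding a class
```

Under this assumption, adding a class changes the vector length from 20 to 21, so `label_len` would need to change too, and any pre-trained layer sized for 20 labels would likely no longer match the checkpoint.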

### Supported File Formats:
- **CSV files** (`.csv`)
- **Excel files** (`.xlsx`, `.xls`)

### What the Script Does:
1. **Reads your spreadsheet** with timing information
2. **Finds corresponding audio files** (supports .wav, .mp3, .flac, .m4a)
3. **Extracts target segments** based on start/end times
4. **Creates proper directory structure** for training
5. **Generates JAMS label files** in the required format
6. **Splits data** into train/val/test (80/10/10)


In [None]:
# Create an example spreadsheet in the format expected by convert_spreadsheet_to_jams()
import pandas as pd

def create_example_spreadsheet():
    """Create an example spreadsheet showing the correct format for SemanticHearing"""
    
    # Example data using predefined sound classes
    data = {
        'filename': ['audio_001', 'audio_002', 'audio_003', 'audio_004', 'audio_005'],
        'start_time': [2.3, 0.5, 1.1, 3.2, 0.8],
        'end_time': [5.4, 3.2, 4.7, 6.1, 2.9],
        'label': ['speech', 'speech', 'speech', 'computer_typing', 'speech']
    }
    
    df = pd.DataFrame(data)
    
    # Save as CSV
    df.to_csv('example_labels.csv', index=False)
    
    print("📊 Example spreadsheet created: example_labels.csv")
    print("\n📋 This shows the expected columns (filename, start_time, end_time, label):")
    print(df.to_string(index=False))
    
    return df

# Create the example
example_df = create_example_spreadsheet()

print("\n💡 Key points about the correct format:")
print("• Use only predefined sound classes from the SemanticHearing model")
print("• Foreground vs background is determined by the label type")
print("• For misophonia (chewing), use 'speech' as the closest category")
print("• The model will learn to extract target sounds and filter everything else")
print("\n🎯 This format helps the model:")
print("• Learn to distinguish between target and non-target sounds")
print("• Focus on the specific sounds you want to extract")
print("• Work with the existing model architecture")


## Why Include Background Noise Information?

### **SemanticHearing Model Architecture:**
The SemanticHearing model is specifically designed to:
- **Extract target sounds** (foreground) from mixed audio
- **Suppress background noise** while preserving spatial cues
- **Learn the distinction** between what to keep vs. what to filter

### **Benefits of Background Noise Labels:**

1. **Better Target Extraction** 🎯
   - Model learns to focus on specific sounds (chewing)
   - Reduces false positives from background noise
   - Improves precision in noisy environments

2. **Improved Generalization** 🌍
   - Trains on diverse acoustic environments
   - Learns to handle different background types
   - Better performance in real-world scenarios

3. **Spatial Audio Preservation** 🎧
   - Maintains binaural cues for target sounds
   - Filters background while preserving directionality
   - Essential for misophonia applications

4. **Training Efficiency** ⚡
   - Model learns faster with explicit background labels
   - Better convergence during incremental training
   - More robust to different noise levels

### **Background Noise Types to Include:**
- **Traffic** (cars, trucks, motorcycles)
- **Restaurant** (chatter, dishes, ambient noise)
- **Office** (keyboard typing, air conditioning, conversations)
- **Music** (background music, radio)
- **Nature** (wind, rain, birds)
- **Home** (appliances, TV, family sounds)

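### **Intensity Guidelines:**
The values below are an informal 0-1 scale for describing background loudness. The conversion script above reads only `filename`, `start_time`, `end_time`, and `label`, so it does not consume these values.
- **0.0-0.3**: Quiet background (library, quiet room)
- **0.3-0.6**: Moderate background (cafe, office)
- **0.6-0.8**: Loud background (restaurant, traffic)
- **0.8-1.0**: Very loud background (construction, loud music)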


In [None]:
# Example: Convert your spreadsheet labels to JAMS format
# Update these paths to match your data

# Path to your spreadsheet (upload to Colab or place in Google Drive)
spreadsheet_path = "labels.csv"  # or "labels.xlsx"

# Path to your audio files (downloaded from Google Cloud Storage)
audio_directory = "data/your_additional_data"

# Output directory for organized data
output_directory = "data/foams_organized"

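# Fallback class for rows whose label is missing or not in the predefined list.
# "chewing" is not one of the 20 predefined classes, so keep it only if you have
# added it to the model's label list; otherwise use a predefined class such as "speech".
target_class = "chewing"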

print("📋 Ready to convert your spreadsheet labels!")
print(f"📊 Spreadsheet: {spreadsheet_path}")
print(f"🎵 Audio directory: {audio_directory}")
print(f"📁 Output directory: {output_directory}")
print(f"🎯 Target class: {target_class}")

# Uncomment the line below to run the conversion
# convert_spreadsheet_to_jams(spreadsheet_path, audio_directory, output_directory, target_class)

print("\n💡 To run the conversion:")
print("1. Upload your spreadsheet to Colab")
print("2. Update the paths above")
print("3. Uncomment the convert_spreadsheet_to_jams() line")
print("4. Run this cell")


In [None]:
# Data preparation script for your additional data
import os
import json
import librosa
import soundfile as sf
import numpy as np
from pathlib import Path
import jams

def create_jams_label(audio_file, target_sound_class, start_time=0.0, end_time=None):
    """
    Create a JAMS label file for your audio data.
    
    Args:
        audio_file: Path to audio file
        target_sound_class: Class of the target sound (e.g., 'speech', 'music', 'birds_chirping')
        start_time: Start time of target sound in seconds
        end_time: End time of target sound in seconds (None for full duration)
    """
    # Load audio to get duration
    y, sr = librosa.load(audio_file, sr=None)
    duration = len(y) / sr
    
    if end_time is None:
        end_time = duration
    
    # Create JAMS annotation
    jam = jams.JAMS()
    jam.file_metadata.duration = duration
    
    # Create annotation for target sound
    ann = jams.Annotation(namespace='tag_open')
    ann.append(time=start_time, duration=end_time-start_time, value=target_sound_class, confidence=1.0)
    
    jam.annotations.append(ann)
    
    return jam

def prepare_your_data(data_dir, target_class):
    """
    Prepare your data by creating JAMS labels and organizing files.
    
    Args:
        data_dir: Directory containing your audio files
        target_class: Class name for your target sounds
    """
    data_path = Path(data_dir)
    
    # Process each split
    for split in ['train', 'val', 'test']:
        split_dir = data_path / split
        if not split_dir.exists():
            continue
            
        print(f"Processing {split} split...")
        
        # Create directories if they don't exist
        (split_dir / 'mixture').mkdir(exist_ok=True)
        (split_dir / 'target').mkdir(exist_ok=True)
        (split_dir / 'labels').mkdir(exist_ok=True)
        
        # Process audio files
        audio_files = list(split_dir.glob('*.wav')) + list(split_dir.glob('*.mp3')) + list(split_dir.glob('*.flac'))
        
        for audio_file in audio_files:
            # Move to mixture directory
            mixture_file = split_dir / 'mixture' / audio_file.name
            if not mixture_file.exists():
                audio_file.rename(mixture_file)
            
            # Create target file (copy for now - you may want to extract specific parts)
            target_file = split_dir / 'target' / audio_file.name
            if not target_file.exists():
                import shutil
                shutil.copy2(mixture_file, target_file)
            
            # Create JAMS label
            label_file = split_dir / 'labels' / (audio_file.stem + '.jams')
            if not label_file.exists():
                jam = create_jams_label(mixture_file, target_class)
                jam.save(str(label_file))
        
        print(f"✅ Processed {len(audio_files)} files in {split} split")

# Example usage - modify the paths and target class as needed
print("📝 Data preparation script ready!")
print("\nTo use this script:")
print("1. Upload your audio files to data/your_additional_data/train/, data/your_additional_data/val/, etc.")
print("2. Run: prepare_your_data('data/your_additional_data', 'your_target_class')")
print("3. Replace 'your_target_class' with the actual class name (e.g., 'speech', 'music', 'birds_chirping')")
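
Annotated times that run past the end of a file produce labels with a duration longer than the audio. One way to guard against this is to clamp the interval to the file duration before building the annotation. This is a minimal sketch; `clip_interval` is a hypothetical helper and is not part of the `jams` library.

```python
def clip_interval(start_time, end_time, duration):
    """Clamp an annotated interval to [0, duration] and return (start, length)."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    start = min(max(0.0, start_time), duration)
    # end_time=None means "until the end of the file", as in create_jams_label()
    end = duration if end_time is None else min(max(0.0, end_time), duration)
    if end <= start:
        raise ValueError("interval is empty after clipping")
    return start, end - start

# An annotation ending 2 s past a 10 s file is shortened to fit
print(clip_interval(8.0, 12.0, 10.0))
```

The returned pair can be passed as the `time` and `duration` arguments when appending to the annotation.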


In [None]:
# Create training configuration for organized spreadsheet data
import json
import os

# Configuration for data organized from spreadsheet labels
organized_config = {
    "model": "src.training.dcc_tf_binaural",
    "base_metric": "scale_invariant_signal_noise_ratio",
    "fix_lr_epochs": 10,
    "epochs": 30,
    "batch_size": 8,
    "eval_batch_size": 32,
    "n_workers": 4,
    "model_params": {
        "L": 32,
        "label_len": 20,
        "model_dim": 256,
        "num_enc_layers": 10,
        "num_dec_layers": 1,
        "dec_buf_len": 13,
        "dec_chunk_size": 13,
        "use_pos_enc": True,
        "conditioning": "mult",
        "out_buf_len": 4,
        "pretrained_path": "experiments/dc_waveformer/39.pt"
    },
    "train_dataset": "src.training.datasets.curated_binaural_augrir.CuratedBinauralAugRIRDataset",
    "train_data_args": {
        "fg_dir": "data/foams_organized/train",
        "bg_dir": "data/BinauralCuratedDataset/TAU-acoustic-sounds/TAU-urban-acoustic-scenes-2019-development",
        "bg_scaper_dir": "data/BinauralCuratedDataset/bg_scaper_fmt/train",
        "jams_dir": "data/foams_organized/train/labels",
        "hrtf_dir": "data/BinauralCuratedDataset/hrtf",
        "dset": "train",
        "sr": 44100,
        "resample_rate": None,
        "reverb": True
    },
    "val_dataset": "src.training.datasets.curated_binaural_augrir.CuratedBinauralAugRIRDataset",
    "val_data_args": {
        "fg_dir": "data/foams_organized/val",
        "bg_dir": "data/BinauralCuratedDataset/TAU-acoustic-sounds/TAU-urban-acoustic-scenes-2019-development",
        "bg_scaper_dir": "data/BinauralCuratedDataset/bg_scaper_fmt/val",
        "jams_dir": "data/foams_organized/val/labels",
        "hrtf_dir": "data/BinauralCuratedDataset/hrtf",
        "dset": "val",
        "sr": 44100,
        "resample_rate": None,
        "reverb": True
    },
    "test_dataset": "src.training.datasets.curated_binaural_augrir.CuratedBinauralAugRIRDataset",
    "test_data_args": {
        "fg_dir": "data/foams_organized/test",
        "bg_dir": "data/BinauralCuratedDataset/TAU-acoustic-sounds/TAU-urban-acoustic-scenes-2019-evaluation",
        "bg_scaper_dir": "data/BinauralCuratedDataset/bg_scaper_fmt/test",
        "jams_dir": "data/foams_organized/test/labels",
        "hrtf_dir": "data/BinauralCuratedDataset/hrtf",
        "dset": "test",
        "sr": 44100,
        "resample_rate": None,
        "reverb": True
    },
    "optim": {
        "lr": 0.0001,
        "weight_decay": 1e-5
    },
    "lr_sched": {
        "mode": "max",
        "factor": 0.5,
        "patience": 3,
        "min_lr": 1e-6,
        "threshold": 0.01,
        "threshold_mode": "abs"
    },
    "commit_hash": "organized_spreadsheet_training_v1"
}

# Save the configuration
os.makedirs('experiments/organized_training', exist_ok=True)
with open('experiments/organized_training/config.json', 'w') as f:
    json.dump(organized_config, f, indent=4)

print("✅ Configuration for organized spreadsheet data created!")
print("📁 Saved to: experiments/organized_training/config.json")
print("\n🎯 This configuration uses data organized from your spreadsheet labels")
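
A configuration like the one above references many directories, and a missing one usually surfaces only once training starts. The sketch below lists the directory entries that do not exist on disk. `missing_data_dirs` is a hypothetical helper; it assumes that data sections end in `_data_args` and that directory keys end in `_dir`, as they do in this configuration.

```python
import os

def missing_data_dirs(config):
    """Return (section, key, path) for every *_dir entry that is not an existing directory."""
    missing = []
    for section, args in config.items():
        if not section.endswith("_data_args") or not isinstance(args, dict):
            continue
        for key, value in args.items():
            if key.endswith("_dir") and isinstance(value, str) and not os.path.isdir(value):
                missing.append((section, key, value))
    return missing

example = {
    "train_data_args": {"fg_dir": "data/foams_organized/train", "sr": 44100},
    "optim": {"lr": 0.0001},
}
for section, key, path in missing_data_dirs(example):
    print(f"{section}.{key}: {path} not found")
```

Passing the full configuration dictionary from the cell above instead of `example` checks every split at once.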


## 4. Create Incremental Training Configuration

Create a new configuration for incremental training that loads the pre-trained model.


In [None]:
# Create updated configuration for FOAMS dataset
import json
import os

# Create incremental training configuration optimized for FOAMS dataset
foams_incremental_config = {
    "model": "src.training.dcc_tf_binaural",
    "base_metric": "scale_invariant_signal_noise_ratio",
    "fix_lr_epochs": 10,  # Reduced for fine-tuning
    "epochs": 30,  # Reduced for fine-tuning
    "batch_size": 8,  # Smaller batch size for fine-tuning
    "eval_batch_size": 32,
    "n_workers": 4,
    "model_params": {
        "L": 32,
        "label_len": 20,
        "model_dim": 256,
        "num_enc_layers": 10,
        "num_dec_layers": 1,
        "dec_buf_len": 13,
        "dec_chunk_size": 13,
        "use_pos_enc": True,
        "conditioning": "mult",
        "out_buf_len": 4,
        "pretrained_path": "experiments/dc_waveformer/39.pt"  # Load pre-trained model
    },
    "train_dataset": "src.training.datasets.curated_binaural_augrir.CuratedBinauralAugRIRDataset",
    "train_data_args": {
        "fg_dir": "data/your_additional_data/train",
        "bg_dir": "data/BinauralCuratedDataset/TAU-acoustic-sounds/TAU-urban-acoustic-scenes-2019-development",
        "bg_scaper_dir": "data/BinauralCuratedDataset/bg_scaper_fmt/train",
        "jams_dir": "data/your_additional_data/train/labels",
        "hrtf_dir": "data/BinauralCuratedDataset/hrtf",
        "dset": "train",
        "sr": 44100,
        "resample_rate": None,
        "reverb": True
    },
    "val_dataset": "src.training.datasets.curated_binaural_augrir.CuratedBinauralAugRIRDataset",
    "val_data_args": {
        "fg_dir": "data/your_additional_data/val",
        "bg_dir": "data/BinauralCuratedDataset/TAU-acoustic-sounds/TAU-urban-acoustic-scenes-2019-development",
        "bg_scaper_dir": "data/BinauralCuratedDataset/bg_scaper_fmt/val",
        "jams_dir": "data/your_additional_data/val/labels",
        "hrtf_dir": "data/BinauralCuratedDataset/hrtf",
        "dset": "val",
        "sr": 44100,
        "resample_rate": None,
        "reverb": True
    },
    "test_dataset": "src.training.datasets.curated_binaural_augrir.CuratedBinauralAugRIRDataset",
    "test_data_args": {
        "fg_dir": "data/your_additional_data/test",
        "bg_dir": "data/BinauralCuratedDataset/TAU-acoustic-sounds/TAU-urban-acoustic-scenes-2019-evaluation",
        "bg_scaper_dir": "data/BinauralCuratedDataset/bg_scaper_fmt/test",
        "jams_dir": "data/your_additional_data/test/labels",
        "hrtf_dir": "data/BinauralCuratedDataset/hrtf",
        "dset": "test",
        "sr": 44100,
        "resample_rate": None,
        "reverb": True
    },
    "optim": {
        "lr": 0.0001,  # Lower learning rate for fine-tuning
        "weight_decay": 1e-5  # Add weight decay for regularization
    },
    "lr_sched": {
        "mode": "max",
        "factor": 0.5,
        "patience": 3,  # More aggressive scheduling for fine-tuning
        "min_lr": 1e-6,
        "threshold": 0.01,
        "threshold_mode": "abs"
    },
    "commit_hash": "foams_incremental_training_v1"
}

# Save the configuration
os.makedirs('experiments/foams_incremental_training', exist_ok=True)
with open('experiments/foams_incremental_training/config.json', 'w') as f:
    json.dump(foams_incremental_config, f, indent=4)

print("✅ FOAMS incremental training configuration created!")
print("📁 Saved to: experiments/foams_incremental_training/config.json")
print("\n🎯 This configuration targets your FOAMS dataset from Google Cloud Storage")


In [None]:
import json
import os

# Create incremental training configuration
incremental_config = {
    "model": "src.training.dcc_tf_binaural",
    "base_metric": "scale_invariant_signal_noise_ratio",
    "fix_lr_epochs": 10,  # Reduced for fine-tuning
    "epochs": 30,  # Reduced for fine-tuning
    "batch_size": 8,  # Smaller batch size for fine-tuning
    "eval_batch_size": 32,
    "n_workers": 4,
    "model_params": {
        "L": 32,
        "label_len": 20,
        "model_dim": 256,
        "num_enc_layers": 10,
        "num_dec_layers": 1,
        "dec_buf_len": 13,
        "dec_chunk_size": 13,
        "use_pos_enc": True,
        "conditioning": "mult",
        "out_buf_len": 4,
        "pretrained_path": "experiments/dc_waveformer/39.pt"  # Load pre-trained model
    },
    "train_dataset": "src.training.datasets.curated_binaural_augrir.CuratedBinauralAugRIRDataset",
    "train_data_args": {
        "fg_dir": "data/your_additional_data/scaper_fmt/train",
        "bg_dir": "data/BinauralCuratedDataset/TAU-acoustic-sounds/TAU-urban-acoustic-scenes-2019-development",
        "bg_scaper_dir": "data/BinauralCuratedDataset/bg_scaper_fmt/train",
        "jams_dir": "data/your_additional_data/labels/train",
        "hrtf_dir": "data/BinauralCuratedDataset/hrtf",
        "dset": "train",
        "sr": 44100,
        "resample_rate": None,
        "reverb": True
    },
    "val_dataset": "src.training.datasets.curated_binaural_augrir.CuratedBinauralAugRIRDataset",
    "val_data_args": {
        "fg_dir": "data/your_additional_data/scaper_fmt/val",
        "bg_dir": "data/BinauralCuratedDataset/TAU-acoustic-sounds/TAU-urban-acoustic-scenes-2019-development",
        "bg_scaper_dir": "data/BinauralCuratedDataset/bg_scaper_fmt/val",
        "jams_dir": "data/your_additional_data/labels/val",
        "hrtf_dir": "data/BinauralCuratedDataset/hrtf",
        "dset": "val",
        "sr": 44100,
        "resample_rate": None,
        "reverb": True
    },
    "test_dataset": "src.training.datasets.curated_binaural_augrir.CuratedBinauralAugRIRDataset",
    "test_data_args": {
        "fg_dir": "data/your_additional_data/scaper_fmt/test",
        "bg_dir": "data/BinauralCuratedDataset/TAU-acoustic-sounds/TAU-urban-acoustic-scenes-2019-evaluation",
        "bg_scaper_dir": "data/BinauralCuratedDataset/bg_scaper_fmt/test",
        "jams_dir": "data/your_additional_data/labels/test",
        "hrtf_dir": "data/BinauralCuratedDataset/hrtf",
        "dset": "test",
        "sr": 44100,
        "resample_rate": None,
        "reverb": True
    },
    "optim": {
        "lr": 0.0001,  # Lower learning rate for fine-tuning
        "weight_decay": 1e-5  # Add weight decay for regularization
    },
    "lr_sched": {
        "mode": "max",
        "factor": 0.5,
        "patience": 3,  # More aggressive scheduling for fine-tuning
        "min_lr": 1e-6,
        "threshold": 0.01,
        "threshold_mode": "abs"
    },
    "commit_hash": "incremental_training_v1"
}

# Save the configuration
os.makedirs('experiments/incremental_training', exist_ok=True)
with open('experiments/incremental_training/config.json', 'w') as f:
    json.dump(incremental_config, f, indent=4)

print("✅ Incremental training configuration created!")
print("📁 Saved to: experiments/incremental_training/config.json")


## 5. Run Incremental Training

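Now let's run the incremental training with your additional data. Each training cell below launches one experiment directory (spreadsheet-organized data, FOAMS, or the generic configuration), so run only the cell that matches the configuration you created.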
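
Before starting a run that may take several hours, it can help to confirm that every path named in the configuration exists. The sketch below is a hypothetical helper (`find_missing_paths` and `check_experiment` are not part of the SemanticHearing repository). It assumes the config layout used above: directory paths live under keys ending in `_dir` inside the `*_data_args` blocks, and the checkpoint path is `model_params.pretrained_path`.

```python
import json
import os


def find_missing_paths(config):
    """Return (key, path) pairs from the config whose path is not on disk.

    Hypothetical helper, not part of the SemanticHearing repository.
    Assumes directory entries use keys ending in "_dir" inside the
    "*_data_args" blocks, as in the configurations above.
    """
    missing = []
    for section, args in config.items():
        if not section.endswith("_data_args"):
            continue
        for key, value in args.items():
            if key.endswith("_dir") and not os.path.isdir(value):
                missing.append((f"{section}.{key}", value))

    # The pre-trained checkpoint is a file, not a directory.
    pretrained = config.get("model_params", {}).get("pretrained_path")
    if pretrained and not os.path.isfile(pretrained):
        missing.append(("model_params.pretrained_path", pretrained))
    return missing


def check_experiment(experiment_dir):
    """Load <experiment_dir>/config.json and report any missing paths."""
    with open(os.path.join(experiment_dir, "config.json")) as f:
        config = json.load(f)
    missing = find_missing_paths(config)
    for key, path in missing:
        print(f"⚠️  {key}: {path} not found")
    if not missing:
        print("✅ All configured paths exist.")
    return missing
```

Calling `check_experiment('experiments/incremental_training')` from the repository root prints one warning per missing path. Relative paths are resolved against the current working directory, which is why the call should be made from the same directory the training command runs in.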


In [None]:
# Run incremental training with organized spreadsheet data
print("🚀 Starting incremental training with spreadsheet labels...")
print("\n📋 Training configuration:")
print("- Model: Pre-trained SemanticHearing model")
print("- Data: Organized from your spreadsheet labels")
print("- Target: Chewing sound extraction")
print("- Epochs: 30")
print("- Learning rate: 0.0001")
print("- Batch size: 8")
print("\n⏳ This may take several hours depending on your data size...")

# Run training with organized data configuration
!python -m src.training.train experiments/organized_training --use_cuda --start_epoch 0

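# Note: a `!` shell command does not raise on failure, so this line runs even if training crashed.
print("\n🏁 Training command with spreadsheet labels finished. Check the output above for errors.")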


In [None]:
# Check GPU availability (run this before any training cell, since they all pass --use_cuda)
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
    print(f"CUDA memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("⚠️  No GPU available. Training will be slow on CPU.")


In [None]:
# Run incremental training with FOAMS dataset
print("🚀 Starting FOAMS incremental training...")
print("\n📋 Training configuration:")
print("- Model: Pre-trained SemanticHearing model")
print("- Dataset: FOAMS dataset from Google Cloud Storage")
print("- Epochs: 30")
print("- Learning rate: 0.0001")
print("- Batch size: 8")
print("\n⏳ This may take several hours depending on your data size...")

# Run training with FOAMS configuration
!python -m src.training.train experiments/foams_incremental_training --use_cuda --start_epoch 0

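# Note: a `!` shell command does not raise on failure, so this line runs even if training crashed.
print("\n🏁 FOAMS training command finished. Check the output above for errors.")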


In [None]:
# Run incremental training
print("🚀 Starting incremental training...")
print("\n📋 Training configuration:")
print("- Model: Pre-trained SemanticHearing model")
print("- Epochs: 30")
print("- Learning rate: 0.0001")
print("- Batch size: 8")
print("\n⏳ This may take several hours depending on your data size...")

# Run training
!python -m src.training.train experiments/incremental_training --use_cuda --start_epoch 0

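# Note: a `!` shell command does not raise on failure, so this line runs even if training crashed.
print("\n🏁 Training command finished. Check the output above for errors.")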


## 6. Evaluate and Test Your Fine-tuned Model

Evaluate your fine-tuned model, then save the experiment directory (checkpoints and results) to Google Drive.
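
The configurations above set `base_metric` to `scale_invariant_signal_noise_ratio`, so it helps to know what that number measures. The sketch below is a dependency-free reference implementation of the usual SI-SNR definition for one channel. It is meant for intuition only and is not the code the evaluation script runs; the function name `si_snr_db` is hypothetical.

```python
import math


def si_snr_db(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB for one audio channel.

    Illustrative sketch: both inputs are equal-length sequences of samples.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)

    # Remove the mean so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference; the remainder counts as noise.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))
```

Higher values are better, and rescaling the estimate leaves the score unchanged. For binaural output, one option is to compute the value per channel and average; how the repository aggregates channels is not shown here, so treat that as an assumption.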


In [None]:
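# Evaluate the fine-tuned model.
# Point this at the experiment directory you actually trained
# (e.g. experiments/foams_incremental_training for the FOAMS run).
print("📊 Evaluating fine-tuned model...")

!python -m src.training.eval experiments/incremental_training --use_cuda

print("\n🏁 Evaluation command finished. Check the output above for errors.")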


In [None]:
# Save your fine-tuned model to Google Drive
import os
import shutil
from datetime import datetime

def save_to_drive(experiment_dir='experiments/incremental_training'):
    """Copy the experiment directory (checkpoints and results) to Google Drive.

    Returns the destination directory, or None if there was nothing to copy.
    """
    # Nothing to save if training never produced this directory
    if not os.path.isdir(experiment_dir):
        print(f"⚠️  {experiment_dir} not found; nothing was saved.")
        return None

    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

    # Create a timestamped results directory in Drive
    results_dir = f"/content/drive/MyDrive/SemanticHearing_Results_{timestamp}"
    os.makedirs(results_dir, exist_ok=True)

    # Copy model checkpoints and config
    destination = os.path.join(results_dir, os.path.basename(experiment_dir))
    shutil.copytree(experiment_dir, destination)
    print(f"✅ Model saved to: {results_dir}")

    return results_dir

# Save results and report the location only if the copy happened
results_path = save_to_drive()
if results_path:
    print("\n📦 Your fine-tuned model has been saved to Google Drive!")
    print(f"📁 Location: {results_path}")
    print("\n💡 You can now download or use this model for inference!")
