# RNNoise PyTorch - Vietnamese Speech Enhancement

**Project:** Speech-Enhancement/ai/rnnoise-pytorch

**Goal:** Train RNNoise model for Vietnamese speech denoising (ESP32 deployment)

---

## Setup Steps:
1. Mount Google Drive
2. Build C tools (dump_features)
3. Generate training data (.f32)
4. Train PyTorch model
5. Evaluate quality
6. Export to C for ESP32

**Note:** Prepare dataset first (VIVOS + DNS noise) and upload to Drive!

## 1. Mount Drive & Setup Paths

In [None]:
from google.colab import drive
import os

# Mount Drive
drive.mount('/content/drive')

# Project paths (edit to match your own folder)
PROJECT_ROOT = '/content/drive/MyDrive/Speech-Enhancement'
AI_DIR = os.path.join(PROJECT_ROOT, 'ai')

# Key directories
RNNOISE_DIR = os.path.join(AI_DIR, 'rnnoise-pytorch')
REFERENCE_DIR = os.path.join(AI_DIR, 'references', 'rnnoise')
DATA_DIR = os.path.join(AI_DIR, 'data')
LOGS_DIR = os.path.join(AI_DIR, 'logs')
PLOTS_DIR = os.path.join(AI_DIR, 'plots')

# Create directories
for d in [DATA_DIR, LOGS_DIR, PLOTS_DIR]:
    os.makedirs(d, exist_ok=True)

print(f"✅ Project root: {PROJECT_ROOT}")
print(f"✅ RNNoise dir: {RNNOISE_DIR}")

# List the attached GPU (a `!` shell escape cannot be used inside an f-string)
!nvidia-smi -L

## 2. Install Dependencies

In [None]:
%cd {RNNOISE_DIR}

# Install Python packages (quote the version specifier so the shell
# does not treat ">=2.0.0" as an output redirection)
!pip install -q "torch>=2.0.0" numpy tqdm pyyaml
!pip install -q pesq pystoi scipy soundfile librosa matplotlib

# Install build tools for C compilation
!apt-get update -qq && apt-get install -y -qq autoconf automake libtool build-essential sox

print("✅ Dependencies installed!")

## 3. Build C Tools (dump_features)

In [None]:
%cd {REFERENCE_DIR}

# Build from source
!chmod +x autogen.sh configure
!./autogen.sh
!./configure
!make clean
!make

# Verify
if os.path.exists('dump_features'):
    print("✅ dump_features built successfully!")
    !ls -lh dump_features
else:
    print("❌ Build failed! Check errors above.")

## 4. Verify Dataset

**Requirements:**
- `clean_speech.pcm` - Vietnamese speech (2+ GB)
- `background_noise.pcm` - DNS noise (20+ GB)
- `foreground_noise.pcm` - Transient sounds (100+ MB)

**Format:** 48kHz, mono, 16-bit PCM

In [None]:
# Check if dataset files exist
speech_pcm = os.path.join(DATA_DIR, 'clean_speech.pcm')
bg_noise_pcm = os.path.join(DATA_DIR, 'background_noise.pcm')
fg_noise_pcm = os.path.join(DATA_DIR, 'foreground_noise.pcm')

for name, path in [("Clean speech", speech_pcm), 
                   ("Background noise", bg_noise_pcm),
                   ("Foreground noise", fg_noise_pcm)]:
    if os.path.exists(path):
        size_mb = os.path.getsize(path) / (1024**2)
        print(f"✅ {name}: {size_mb:.1f} MB")
    else:
        print(f"❌ {name}: NOT FOUND! Upload to {DATA_DIR}/")

# Optional: Test playback (first 1 second)
# !play -r 48000 -c 1 -b 16 -e signed-integer "{speech_pcm}" trim 0 1
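
The size requirements above can be turned into audio durations, since raw PCM has no header and the format (48 kHz, mono, 16-bit) fixes the byte rate at 96,000 bytes per second. The sketch below shows that arithmetic; the helper name `pcm_duration_seconds` is hypothetical and not part of this project.

```python
# Format stated in this notebook: 48 kHz, mono, 16-bit signed PCM.
SAMPLE_RATE_HZ = 48_000
BYTES_PER_SAMPLE = 2
CHANNELS = 1


def pcm_duration_seconds(num_bytes,
                         sample_rate=SAMPLE_RATE_HZ,
                         bytes_per_sample=BYTES_PER_SAMPLE,
                         channels=CHANNELS):
    """Duration of headerless PCM audio, derived from its size in bytes."""
    return num_bytes / (sample_rate * bytes_per_sample * channels)


# Minimum sizes listed in the requirements, converted to hours of audio.
for label, size_bytes in [("clean speech (2 GiB)", 2 * 1024**3),
                          ("background noise (20 GiB)", 20 * 1024**3),
                          ("foreground noise (100 MiB)", 100 * 1024**2)]:
    hours = pcm_duration_seconds(size_bytes) / 3600
    print(f"{label}: {hours:.2f} h")
```

For a real file, pass `os.path.getsize(speech_pcm)` as `num_bytes`. A duration far from what you expect usually means the file was written at a different sample rate or channel count.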

## 5. Generate Training Features

**This will take ~30-60 minutes for 30K sequences**

In [None]:
%cd {REFERENCE_DIR}

# Output file
features_f32 = 'features_vn_30k.f32'
num_sequences = 30000

# Generate features
!./dump_features \
    "{speech_pcm}" \
    "{bg_noise_pcm}" \
    "{fg_noise_pcm}" \
    "{features_f32}" \
    {num_sequences}

# Verify output
if os.path.exists(features_f32):
    size_gb = os.path.getsize(features_f32) / (1024**3)
    print(f"\n✅ Generated: {features_f32}")
    print(f"   Size: {size_gb:.2f} GB")
    print(f"   Sequences: {num_sequences}")
else:
    print("❌ Feature generation failed!")
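
Beyond checking that the `.f32` file exists, it can help to confirm that its size is a whole number of frames and sequences. The sketch below assumes the file is a flat array of 32-bit floats with a fixed number of floats per frame and a fixed number of frames per sequence. The values `frame_dim=98` and `sequence_length=10` in the demo are placeholders, not confirmed against `dump_features` or `scripts/train.py`; check both sources for the real values. The helper `describe_features` is hypothetical.

```python
import array
import os
import tempfile

FLOAT32_BYTES = 4


def describe_features(path, frame_dim, sequence_length):
    """Count floats, frames and whole sequences in a flat float32 file."""
    size = os.path.getsize(path)
    if size % FLOAT32_BYTES:
        raise ValueError(f"{path}: size {size} is not a multiple of 4 bytes")
    num_floats = size // FLOAT32_BYTES
    num_frames, leftover_floats = divmod(num_floats, frame_dim)
    return {
        "floats": num_floats,
        "frames": num_frames,
        "leftover_floats": leftover_floats,  # non-zero suggests a wrong frame_dim
        "sequences": num_frames // sequence_length,
    }


# Demo on a small synthetic file (20 frames of 98 zeros), not real features.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.f32")
    with open(demo_path, "wb") as fh:
        array.array("f", [0.0] * (98 * 20)).tofile(fh)
    info = describe_features(demo_path, frame_dim=98, sequence_length=10)

print(info)
```

If `leftover_floats` is non-zero on the real file, either the frame dimension is wrong or the file was truncated, for example by a Colab disconnect during generation.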

## 6. Train RNNoise Model

**Parameters:**
- GRU size: 384 (vs default 256)
- Sparsification: 50% (for ESP32)
- Epochs: 150
- Batch size: 128

**Expected time:** ~6-8 hours on T4 GPU

In [None]:
%cd {RNNOISE_DIR}

# Training command
!python scripts/train.py \
    ../references/rnnoise/features_vn_30k.f32 \
    ./output_colab \
    --sparse \
    --epochs 150 \
    --gru-size 384 \
    --batch-size 128 \
    --lr 1e-3 \
    --lr-decay 5e-5 \
    --gamma 0.25 \
    --log-dir ../logs \
    --experiment-name rnnoise_vn_colab

print("\n✅ Training complete!")
print("   Checkpoints: output_colab/checkpoints/")
print("   Logs: ../logs/")
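
To get a feel for what the training parameters imply, the sketch below counts optimizer steps and estimates the learning rate at the end of training. It rests on two assumptions that are not confirmed for `scripts/train.py`: that the data loader drops the last partial batch, and that `--lr-decay` applies per-step inverse-time decay, `lr = base_lr / (1 + lr_decay * step)`. The `--gamma` option is not modeled here.

```python
import math


def steps_per_epoch(num_sequences, batch_size, drop_last=True):
    """Optimizer steps in one pass over the data."""
    if drop_last:
        return num_sequences // batch_size
    return math.ceil(num_sequences / batch_size)


def inverse_time_lr(base_lr, lr_decay, step):
    """Assumed schedule: learning rate after `step` optimizer steps."""
    return base_lr / (1.0 + lr_decay * step)


# Values taken from the feature-generation and training cells.
steps = steps_per_epoch(num_sequences=30_000, batch_size=128)
total_steps = steps * 150
final_lr = inverse_time_lr(base_lr=1e-3, lr_decay=5e-5, step=total_steps)

print(f"steps/epoch: {steps}")
print(f"total steps: {total_steps}")
print(f"final lr (under the assumed schedule): {final_lr:.2e}")
```

Under these assumptions the learning rate falls to a bit more than a third of its initial value by epoch 150. If the actual schedule differs, only `inverse_time_lr` needs to change.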

## 7. Plot Training Results

In [None]:
%cd {RNNOISE_DIR}

# Find latest metrics file
import glob
metrics_files = glob.glob(f"{LOGS_DIR}/rnnoise_vn_colab_*_metrics.json")

if metrics_files:
    latest_metrics = sorted(metrics_files)[-1]
    print(f"Using: {latest_metrics}")
    
    # Generate plots
    !python scripts/plot_training.py \
        "{latest_metrics}" \
        "{PLOTS_DIR}"
    
    # Display
    from IPython.display import Image, display
    display(Image(f"{PLOTS_DIR}/loss_curves.png"))
    display(Image(f"{PLOTS_DIR}/convergence.png"))
else:
    print("❌ No metrics files found!")

## 8. Evaluate Model Quality

**Requires test dataset (noisy/ and clean/ folders)**

In [None]:
%cd {RNNOISE_DIR}

import glob
import json
import re

# Path to test data
test_dir = os.path.join(AI_DIR, 'test_data')
eval_dir = os.path.join(AI_DIR, 'evaluation')

def checkpoint_epoch(path):
    """Extract the epoch number so checkpoints sort numerically, not as strings."""
    match = re.search(r'rnnoise_(\d+)\.pth$', path)
    return int(match.group(1)) if match else -1

# Find best checkpoint
checkpoints = sorted(glob.glob('output_colab/checkpoints/rnnoise_*.pth'),
                     key=checkpoint_epoch)
if checkpoints:
    # Use checkpoint 145 (usually best), otherwise fall back to the latest epoch
    preferred = 'output_colab/checkpoints/rnnoise_145.pth'
    best_ckpt = preferred if os.path.exists(preferred) else checkpoints[-1]

    print(f"Evaluating: {best_ckpt}")

    # Run evaluation
    !python scripts/evaluate.py \
        "{best_ckpt}" \
        "{test_dir}" \
        "{eval_dir}" \
        --num-samples 20

    # Show results
    results_file = os.path.join(eval_dir, 'evaluation_results.json')
    if os.path.exists(results_file):
        with open(results_file) as f:
            results = json.load(f)
        print("\n📊 Quality Metrics:")
        for metric, values in results['average_metrics'].items():
            print(f"  {metric.upper()}: {values['mean']:.3f} ± {values['std']:.3f}")
else:
    print("❌ No checkpoints found!")
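
The results cell reads a summary with a `mean` and `std` per metric under `average_metrics`. The sketch below shows how a summary in that shape could be built from per-file scores. Whether `scripts/evaluate.py` computes it this way (for instance population versus sample standard deviation) is an assumption. The function `summarize_scores` is hypothetical and the input numbers are made up for illustration, not measured results.

```python
import statistics


def summarize_scores(per_file_scores):
    """Reduce {metric: [score per file]} to {metric: {"mean", "std"}}."""
    summary = {}
    for metric, scores in per_file_scores.items():
        if not scores:
            continue  # skip metrics with no scored files
        summary[metric] = {
            "mean": statistics.fmean(scores),
            "std": statistics.pstdev(scores),  # population std (assumed)
        }
    return {"average_metrics": summary}


# Made-up scores for two files, for illustration only.
demo_results = summarize_scores({"pesq": [2.0, 3.0], "stoi": [0.8, 0.9]})

for metric, values in demo_results["average_metrics"].items():
    print(f"{metric.upper()}: {values['mean']:.3f} ± {values['std']:.3f}")
```

With only 20 evaluation samples the standard deviation is a rough estimate, so small differences between checkpoints should be read with caution.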

## 9. Export to C for ESP32

In [None]:
%cd {RNNOISE_DIR}

# Export best checkpoint
export_dir = './exported_c'

!python scripts/export_to_c.py \
    --quantize \
    "{best_ckpt}" \
    "{export_dir}"

# Verify output
c_file = os.path.join(export_dir, 'rnnoise_data.c')
h_file = os.path.join(export_dir, 'rnnoise_data.h')

if os.path.exists(c_file) and os.path.exists(h_file):
    c_size = os.path.getsize(c_file) / 1024
    print("\n✅ Export complete!")
    print(f"   C file: {c_size:.0f} KB")
    print(f"   Location: {export_dir}/")
else:
    print("❌ Export failed!")

## 10. Download Results

In [None]:
from google.colab import files
import zipfile

# Zip export files
zip_path = '/content/rnnoise_esp32.zip'
with zipfile.ZipFile(zip_path, 'w') as zipf:
    zipf.write(os.path.join(export_dir, 'rnnoise_data.c'), 'rnnoise_data.c')
    zipf.write(os.path.join(export_dir, 'rnnoise_data.h'), 'rnnoise_data.h')

# Zip plots
plots_zip = '/content/training_plots.zip'
with zipfile.ZipFile(plots_zip, 'w') as zipf:
    for plot in glob.glob(f"{PLOTS_DIR}/*.png"):
        zipf.write(plot, os.path.basename(plot))

# Download
print("Downloading files...")
files.download(zip_path)
files.download(plots_zip)

print("\n✅ All done! Files downloaded.")

---

## Summary

**Outputs:**
- ✅ Trained model: `output_colab/checkpoints/rnnoise_*.pth`
- ✅ C files: `exported_c/rnnoise_data.{c,h}`
- ✅ Training plots: `plots/*.png`
- ✅ Quality metrics: `evaluation/evaluation_results.json`

**Next steps:**
1. Integrate C files into ESP32 project
2. Test on actual hardware
3. Fine-tune if needed

**Model specs:**
- Size: ~850 KB (sparse + quantized)
- Latency: ~5-10 ms per frame
- RAM: ~800 KB
- Real-time: ✅ Yes (10ms frames)
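
The real-time claim can be checked with simple arithmetic: processing is real-time when the per-frame latency does not exceed the frame duration. The sketch below applies this to the figures listed above (10 ms frames at 48 kHz, 5-10 ms latency). The helper names are hypothetical, and the latency figures are the estimates from this summary, not measurements.

```python
SAMPLE_RATE_HZ = 48_000
FRAME_MS = 10.0


def samples_per_frame(sample_rate_hz, frame_ms):
    """Number of audio samples in one frame."""
    return int(sample_rate_hz * frame_ms / 1000)


def real_time_factor(latency_ms, frame_ms):
    """Processing time divided by audio time; at or below 1.0 keeps up."""
    return latency_ms / frame_ms


print(f"samples per frame: {samples_per_frame(SAMPLE_RATE_HZ, FRAME_MS)}")

# Lower and upper latency estimates from the model specs.
for latency_ms in (5.0, 10.0):
    rtf = real_time_factor(latency_ms, FRAME_MS)
    headroom_ms = FRAME_MS - latency_ms
    print(f"latency {latency_ms:.0f} ms -> RTF {rtf:.2f}, headroom {headroom_ms:.0f} ms")
```

At the upper estimate the real-time factor reaches 1.0, which leaves no headroom for audio I/O or other tasks on the ESP32, so the latency should be measured on the target hardware.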

---

**Documentation:** `ai/docs/context/rnnoise-pytorch-complete.md`