# Vietnamese ASR Evaluation - Custom Analysis Example

This notebook demonstrates how to use individual modules for custom analysis and evaluation.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/quangnt03/vietnamese-asr-benchmark/blob/main/custom_analysis_example.ipynb)

**Note**: This notebook is compatible with Google Colab. The setup cells below clone the repository and install its dependencies automatically.

## [CONFIG] Google Colab Setup

Run these cells if you're using Google Colab. They will:
1. Detect if running on Colab
2. Clone the repository
3. Install all required dependencies
4. Set up the environment

In [1]:
# Check if running on Google Colab
try:
    import google.colab
    IN_COLAB = True
    print("[OK] Running on Google Colab")
except ImportError:
    IN_COLAB = False
    print("[OK] Running locally")

[OK] Running locally


In [2]:
# Clone repository and install dependencies (Colab only)
if IN_COLAB:
    print("Setting up environment for Google Colab...\n")
    
    # Clone the repository
    print("Cloning repository...")
    !git clone https://github.com/quangnt03/vietnamese-asr-benchmark.git
    
    # Change to repository directory
    import os
    os.chdir('vietnamese-asr-benchmark')
    print(f"[OK] Changed directory to: {os.getcwd()}")
    
    # Install dependencies
    print("\n[PACKAGE] Installing dependencies...")
    !pip install -q -r requirements.txt
    
    print("\n[OK] Setup complete! You can now run the notebook.")
else:
    print("Skipping Colab setup (running locally)")

Skipping Colab setup (running locally)


In [None]:
# Verify installation
if IN_COLAB:
    import sys
    from pathlib import Path
    
    print("Verifying installation...")
    print(f"Python version: {sys.version.split()[0]}")
    print(f"Working directory: {Path.cwd()}")
    
    # Check if key files exist in new structure
    key_files = ['src/metrics.py', 'src/dataset_loader.py', 
                 'src/model_evaluator.py', 'src/visualization.py']
    for file in key_files:
        if Path(file).exists():
            print(f"  [OK] {file}")
        else:
            print(f"  [FAILED] {file} - NOT FOUND")
    
    print("\n[OK] Verification complete")

## 1. Setup and Imports

In [12]:
import sys
import pandas as pd
import numpy as np
from pathlib import Path

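# Add the parent of the working directory to the import path (for local runs
# started from a subdirectory of the repository) so that `src` can be imported
sys.path.insert(0, str(Path.cwd().parent))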

# Import custom modules
from src.metrics import ASRMetrics, format_metrics_report
from src.dataset_loader import DatasetManager
from src.model_evaluator import ModelEvaluator, ModelFactory
from src.visualization import ASRVisualizer

## 2. Using the Metrics Module

Calculate ASR metrics for individual transcriptions.

In [4]:
# Initialize metrics calculator
calculator = ASRMetrics()

# Example Vietnamese text
reference = "xin chào tôi là người việt nam"
hypothesis = "xin chào tôi là người việt"

# Calculate all metrics
metrics = calculator.calculate_all_metrics(reference, hypothesis)

print(format_metrics_report(metrics, "Example Metrics"))


                      Example Metrics                       

Word Error Rate (WER):           0.1429 (14.29%)
Character Error Rate (CER):      0.1333 (13.33%)
Match Error Rate (MER):          0.1429 (14.29%)
Word Information Lost (WIL):     0.1429 (14.29%)
Word Information Preserved (WIP): 0.8571 (85.71%)
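
To see where the WER and CER figures above come from, here is a minimal pure-Python sketch. It is not the implementation in `src.metrics`; `edit_distance` and `error_rate` are hypothetical helper names. It assumes the error rate is the Levenshtein distance divided by the reference length, and that spaces count as characters for CER, which is consistent with the values in the report.

```python
import unicodedata


def edit_distance(ref_tokens, hyp_tokens):
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    previous = list(range(len(hyp_tokens) + 1))
    for i, ref_tok in enumerate(ref_tokens, start=1):
        current = [i]
        for j, hyp_tok in enumerate(hyp_tokens, start=1):
            substitution = previous[j - 1] + (ref_tok != hyp_tok)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalised by the number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Normalise to NFC so each accented Vietnamese letter is a single code point
reference = unicodedata.normalize("NFC", "xin chào tôi là người việt nam")
hypothesis = unicodedata.normalize("NFC", "xin chào tôi là người việt")

wer = error_rate(reference.split(), hypothesis.split())  # word level
cer = error_rate(list(reference), list(hypothesis))      # character level

print(f"WER: {wer:.4f}")
print(f"CER: {cer:.4f}")
```

One deleted word out of seven reference words gives 1/7, and the four deleted characters (the space plus "nam") out of thirty give 4/30.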




In [5]:
# Batch evaluation
references = [
    "xin chào tôi là người việt nam",
    "hôm nay thời tiết đẹp",
    "tôi yêu tiếng việt"
]

hypotheses = [
    "xin chào tôi là người việt",
    "hôm nay thời tiết đẹp quá",
    "tôi yêu tiếng việt"
]

batch_metrics = calculator.calculate_batch_metrics(references, hypotheses)

print(format_metrics_report(batch_metrics, "Batch Metrics"))


                       Batch Metrics                        

Word Error Rate (WER):           0.1143 (11.43%)
Character Error Rate (CER):      0.1079 (10.79%)
Match Error Rate (MER):          0.1032 (10.32%)
Word Information Lost (WIL):     0.1032 (10.32%)
Word Information Preserved (WIP): 0.8968 (89.68%)
Sentence Error Rate (SER):       0.6667 (66.67%)

                      Error Breakdown                       
------------------------------------------------------------
Total Words:                     16
Hits:                            15
Substitutions:                   0
Deletions:                       1
Insertions:                      1
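
The batch WER of 0.1143 equals the mean of the three per-utterance WER values, not the pooled error count divided by the total word count (2/16 = 0.125). The sketch below contrasts the two aggregation schemes using the error counts from the breakdown above. That `calculate_batch_metrics` averages per utterance is an inference from the printed numbers, not something confirmed from its source.

```python
# (errors, reference word count) for each utterance in the batch example:
# one deletion in 7 words, one insertion against 5 words, no errors in 4 words
per_utterance = [(1, 7), (1, 5), (0, 4)]

# Macro average: compute WER per utterance, then take the mean
macro_wer = sum(errors / words for errors, words in per_utterance) / len(per_utterance)

# Micro average: pool all errors and all reference words first
total_errors = sum(errors for errors, _ in per_utterance)
total_words = sum(words for _, words in per_utterance)
micro_wer = total_errors / total_words

print(f"macro-averaged WER: {macro_wer:.4f}")
print(f"micro-averaged WER: {micro_wer:.4f}")
```

Macro averaging weights every utterance equally, so short utterances with a single error move the score more than they would under pooling. This matters when comparing against numbers from tools that report the pooled value.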




## 3. Loading and Analyzing Datasets

In [7]:
# Initialize dataset manager
manager = DatasetManager(base_data_dir="./data")

# Load datasets (will use synthetic data if real data not available)
datasets = manager.load_all_datasets()

# Get statistics
stats_df = manager.get_dataset_statistics()
print("\nDataset Statistics:")
display(stats_df)


Loading local datasets

Loading ViMD dataset...
Creating synthetic example for demonstration...
Creating 100 synthetic samples for vimd...
Loading BUD500 dataset...
Creating 50 synthetic samples for bud500...
Loading LSVSC dataset...
Creating 100 synthetic samples for lsvsc...
Loading VLSP 2020 dataset...
Creating 80 synthetic samples for vlsp2020...
Loading VietMed dataset...
Creating 60 synthetic samples for vietmed...

Dataset Statistics:


| | Dataset | Num Samples | Total Duration (hours) | Avg Duration (seconds) | Num Speakers | Num Dialects |
|---|---|---|---|---|---|---|
| 0 | ViMD | 100 | 0.111167 | 4.002 | 10 | 3 |
| 1 | BUD500 | 50 | 0.053376 | 3.843106 | 10 | 3 |
| 2 | LSVSC | 100 | 0.109559 | 3.944106 | 10 | 3 |
| 3 | VLSP2020 | 80 | 0.088696 | 3.991313 | 10 | 3 |
| 4 | VietMed | 60 | 0.067053 | 4.023204 | 10 | 3 |


In [None]:
# Prepare train/test splits
splits = manager.prepare_train_test_splits(
    train_ratio=0.7,
    val_ratio=0.15,
    test_ratio=0.15
)

# Examine a specific dataset
print("\nViMD Test Set:")
test_samples = splits['ViMD']['test']
print(f"Number of samples: {len(test_samples)}")
print(f"First sample: {test_samples[0].transcription if test_samples else 'N/A'}")
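
As a rough illustration of what a 70/15/15 split means for a list of samples, the sketch below shuffles with a fixed seed and slices by ratio. `split_samples` is a hypothetical helper, not the logic inside `DatasetManager.prepare_train_test_splits`, which may shuffle differently or stratify by speaker or dialect.

```python
import random


def split_samples(samples, train_ratio=0.7, val_ratio=0.15, test_ratio=0.15, seed=42):
    """Shuffle a copy of `samples` and cut it into train/val/test lists."""
    if abs(train_ratio + val_ratio + test_ratio - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1.0")

    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits

    n_train = round(len(shuffled) * train_ratio)
    n_val = round(len(shuffled) * val_ratio)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # remainder goes to the test set
    }


demo_splits = split_samples(range(100))
print({name: len(items) for name, items in demo_splits.items()})
```

Giving the remainder to the test set guarantees that every sample lands in exactly one split even when the ratios do not divide the sample count evenly.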

## 4. Working with ASR Models

In [8]:
# List available models
print("Available models:")
for model_key in ModelFactory.get_available_models():
    config = ModelFactory.MODEL_CONFIGS[model_key]
    print(f"  {model_key}: {config.name}")

Available models:
  phowhisper-tiny: PhoWhisper-tiny
  phowhisper-base: PhoWhisper-base
  phowhisper-small: PhoWhisper-small
  phowhisper-medium: PhoWhisper-medium
  phowhisper-large: PhoWhisper-large
  whisper-small: Whisper-small
  whisper-medium: Whisper-medium
  whisper-large: Whisper-large-v3
  wav2vec2-xlsr-vietnamese: Wav2Vec2-XLSR-Vietnamese
  wav2vec2-base-vietnamese: Wav2Vec2-Base-Vietnamese
  wav2vn: Wav2Vn


In [None]:
# Load specific models
evaluator = ModelEvaluator(
    models_to_evaluate=['phowhisper-small', 'whisper-small']
)

evaluator.load_models()
models = evaluator.get_loaded_models()

In [None]:
# Transcribe a sample audio file (example with mock data)
if models:
    model_name = list(models.keys())[0]
    model = models[model_name]
    
    # Mock transcription
    transcription = model.transcribe("sample_audio.wav")
    print(f"{model_name} transcription: {transcription}")

## 5. Custom Evaluation Loop

In [None]:
# Example: Evaluate a specific model on a specific dataset
from tqdm.notebook import tqdm

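def evaluate_model_on_dataset(model, samples, max_samples=10):
    """
    Transcribe up to `max_samples` samples with `model` and score the output.

    Each sample is expected to expose `audio_path` and `transcription`.
    Returns a tuple of (metrics, references, hypotheses).
    """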
    references = []
    hypotheses = []
    
    for sample in tqdm(samples[:max_samples], desc="Transcribing"):
        hypothesis = model.transcribe(sample.audio_path)
        references.append(sample.transcription)
        hypotheses.append(hypothesis)
    
    # Calculate metrics
    calculator = ASRMetrics()
    metrics = calculator.calculate_batch_metrics(references, hypotheses)
    
    return metrics, references, hypotheses

# Run evaluation
if models and splits:
    model_name = list(models.keys())[0]
    dataset_name = list(splits.keys())[0]
    
    print(f"\nEvaluating {model_name} on {dataset_name}...")
    metrics, refs, hyps = evaluate_model_on_dataset(
        models[model_name],
        splits[dataset_name]['test'],
        max_samples=5
    )
    
    print(format_metrics_report(metrics, f"{model_name} on {dataset_name}"))
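
The loop above only relies on two things: samples with `audio_path` and `transcription` attributes, and a model with a `transcribe` method. A dry run without audio files or model downloads can therefore use stand-ins like the ones below. `Sample` and `StubModel` are hypothetical names for illustration and are not part of the repository. The sentence error rate is computed here as the fraction of utterances that do not match exactly.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical stand-in for a dataset sample."""
    audio_path: str
    transcription: str


class StubModel:
    """Hypothetical model that returns canned transcripts keyed by path."""

    def __init__(self, canned):
        self.canned = canned

    def transcribe(self, audio_path):
        return self.canned.get(audio_path, "")


samples = [
    Sample("a.wav", "xin chào"),
    Sample("b.wav", "tôi yêu tiếng việt"),
]
model = StubModel({"a.wav": "xin chào", "b.wav": "tôi yêu tiếng"})

references = [s.transcription for s in samples]
hypotheses = [model.transcribe(s.audio_path) for s in samples]

# Sentence error rate: share of utterances with at least one error
ser = sum(ref != hyp for ref, hyp in zip(references, hypotheses)) / len(references)
print(f"SER: {ser:.2f}")
```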

## 6. Analyzing Results

In [None]:
# Load results from CSV (if available)
import glob

# Sorting makes the choice of file deterministic; glob order is not guaranteed
results_pattern = str(Path("./results") / "evaluation_results_*.csv")
csv_files = sorted(glob.glob(results_pattern))

if csv_files:
    results_df = pd.read_csv(csv_files[0])
    print("\nEvaluation Results:")
    display(results_df[['Model', 'Dataset', 'wer', 'cer', 'mer', 'ser']])
else:
    print("No results file found. Run the main evaluation first.")

In [None]:
# Custom analysis: Best model per dataset
if csv_files and 'results_df' in locals():
    best_models = results_df.loc[results_df.groupby('Dataset')['wer'].idxmin()]
    print("\nBest Model per Dataset (by WER):")
    display(best_models[['Dataset', 'Model', 'wer', 'cer']])
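
The `groupby(...).idxmin()` pattern used above can be checked on a small table. The numbers below are made up for illustration and are not benchmark results; the model and dataset names are placeholders.

```python
import pandas as pd

# Illustrative values only, not measured results
demo_df = pd.DataFrame({
    "Model": ["model_a", "model_b", "model_a", "model_b"],
    "Dataset": ["set_1", "set_1", "set_2", "set_2"],
    "wer": [0.20, 0.10, 0.15, 0.30],
})

# idxmin returns, for each dataset, the index label of the row with the lowest WER
best_rows = demo_df.groupby("Dataset")["wer"].idxmin()

# .loc then selects those full rows from the original frame
best_demo = demo_df.loc[best_rows]
print(best_demo)
```

Because `idxmin` returns index labels, the selection assumes the frame has a unique index. After concatenating several result files, calling `reset_index(drop=True)` first avoids picking up duplicate rows.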

## 7. Creating Custom Visualizations

In [None]:
# Initialize visualizer
visualizer = ASRVisualizer(output_dir="./custom_plots")

# Create synthetic data for demonstration
if not csv_files:
    # Generate sample data with a seeded generator so the demo is reproducible.
    # Separate names keep the `models` and `datasets` objects loaded earlier intact.
    rng = np.random.default_rng(42)
    model_names = ['PhoWhisper-small', 'Whisper-small']
    dataset_names = ['ViMD', 'VLSP2020']
    
    results_data = []
    for model_label in model_names:
        for dataset_label in dataset_names:
            results_data.append({
                'Model': model_label,
                'Dataset': dataset_label,
                'wer': rng.uniform(0.10, 0.20),
                'cer': rng.uniform(0.05, 0.12),
                'mer': rng.uniform(0.08, 0.15),
                'wil': rng.uniform(0.12, 0.25),
                'ser': rng.uniform(0.20, 0.35),
                'rtf_mean': rng.uniform(0.2, 0.4),
                'total_insertions': rng.integers(5, 20),
                'total_deletions': rng.integers(5, 20),
                'total_substitutions': rng.integers(10, 30)
            })
    
    results_df = pd.DataFrame(results_data)

In [None]:
# Create individual plots
visualizer.plot_metric_comparison(results_df, metric='wer')
visualizer.plot_all_metrics_heatmap(results_df)
visualizer.plot_model_performance_radar(results_df)

print("\nPlots saved to: ./custom_plots/")

## 8. Export Results

In [None]:
# Export to different formats
if 'results_df' in locals():
    # CSV
    results_df.to_csv('./custom_results.csv', index=False)
    
    # Excel (requires openpyxl)
    # results_df.to_excel('./custom_results.xlsx', index=False)
    
    # JSON
    results_df.to_json('./custom_results.json', orient='records', indent=2)
    
    print("Results exported successfully!")

## Summary

This notebook demonstrated:

1. [OK] Using the metrics module for ASR evaluation
2. [OK] Loading and analyzing Vietnamese datasets
3. [OK] Working with multiple ASR models
4. [OK] Custom evaluation workflows
5. [OK] Analyzing and visualizing results
6. [OK] Exporting results in various formats

For automated evaluation, use `main_evaluation.py` instead!