# 04: Accent & Pitch Evaluation

This notebook evaluates how well the trained model preserves **accent and pitch consistency**. The model itself is not loaded here: `src/reconstruct_audio.py` has already run sample files through the full reconstruction pipeline (which is expected to use the trained model in `models/encoder_decoder.pth`). We take one original/reconstructed pair and use helper functions from `src/accent_pitch_analysis.py` and `src/feature_extraction.py` to compare the two versions.

We will:
1.  Load a sample *processed* audio file from `data/processed/`.
2.  Load the corresponding *reconstructed* audio file from `data/reconstructed/` (generated by `src/reconstruct_audio.py`).
3.  Extract features (pitch contour and Mel spectrogram) from **both** audio files.
4.  Use functions from `src/accent_pitch_analysis.py` to plot and score the comparison.

```python
import sys
import warnings
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display
import seaborn as sns

# --- Configuration ---
sns.set_style("whitegrid")
warnings.filterwarnings('ignore')

PROCESSED_DIR = Path("../data/processed")
RECONSTRUCTED_DIR = Path("../data/reconstructed")
REPORTS_PLOTS_DIR = Path("../reports/plots")
REPORTS_PLOTS_DIR.mkdir(parents=True, exist_ok=True)

SR = 16000        # Assuming 16 kHz from your preprocessing
HOP_LENGTH = 256  # Assumed hop length; must match the one used in src/feature_extraction.py

# Add src to the path so the helper modules can be imported
sys.path.append('../src')

try:
    # Import feature extraction helpers (names assumed from src/feature_extraction.py)
    from feature_extraction import extract_pitch, extract_melspectrogram
except ImportError:
    print("Could not import from src/feature_extraction.py. Using librosa defaults.")

    def extract_pitch(y, sr):
        # pYIN F0 track; unvoiced frames come back as NaN.
        # sr and hop_length are passed explicitly so the frames line up with HOP_LENGTH.
        f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz('C2'),
                                fmax=librosa.note_to_hz('C7'),
                                sr=sr, hop_length=HOP_LENGTH)
        return f0

    def extract_melspectrogram(y, sr):
        # Power Mel spectrogram on the same hop grid used by the plots below
        return librosa.feature.melspectrogram(y=y, sr=sr, n_mels=80, hop_length=HOP_LENGTH)

try:
    # Import analysis helpers from src/accent_pitch_analysis.py
    from accent_pitch_analysis import plot_pitch_comparison, calculate_embedding_similarity
except ImportError:
    print("Could not import from src/accent_pitch_analysis.py. Defining placeholder functions.")

    def plot_pitch_comparison(f0_orig, f0_recon, times):
        # Work on float copies so the caller's arrays are not modified
        f0_orig = np.asarray(f0_orig, dtype=float).copy()
        f0_recon = np.asarray(f0_recon, dtype=float).copy()
        # Hide unvoiced frames if the extractor marks them with 0 instead of NaN
        f0_orig[f0_orig == 0] = np.nan
        f0_recon[f0_recon == 0] = np.nan
        fig = plt.figure(figsize=(15, 4))
        plt.plot(times, f0_orig, 'o', markersize=2, label='Original F0')
        plt.plot(times, f0_recon, 'x', markersize=2, label='Reconstructed F0')
        plt.title('Pitch Contour Comparison')
        plt.xlabel('Time (s)')
        plt.ylabel('Frequency (Hz)')
        plt.legend()
        # No plt.show() here: the caller saves the figure first, then shows it
        return fig

    def calculate_embedding_similarity(mel_orig, mel_recon):
        # Placeholder: cosine similarity between the time-averaged Mel spectra
        emb_orig = mel_orig.mean(axis=1)
        emb_recon = mel_recon.mean(axis=1)
        denom = np.linalg.norm(emb_orig) * np.linalg.norm(emb_recon) + 1e-10
        return float(np.dot(emb_orig, emb_recon) / denom)
```
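Every frame-based feature in this notebook sits on a grid of `HOP_LENGTH` samples, so the time axis is only correct if the extractor and the plotting code use the same hop. The arithmetic is sketched below for the 16 kHz / 256-sample configuration; the 512-sample value is librosa's default hop for `librosa.pyin` when `hop_length` is not passed (`frame_length // 4` with the default `frame_length=2048`). The helper names are illustrative only.

```python
# Frame timing for a given sample rate and hop length (values mirror the config cell)
SR = 16000
HOP_LENGTH = 256
PYIN_DEFAULT_HOP = 2048 // 4  # librosa.pyin default: frame_length // 4


def frame_period_s(sr: int, hop_length: int) -> float:
    """Seconds between consecutive analysis frames."""
    return hop_length / sr


def n_frames_centered(n_samples: int, hop_length: int) -> int:
    """Frame count for a centered analysis (librosa's default center=True, even frame length)."""
    return 1 + n_samples // hop_length


period = frame_period_s(SR, HOP_LENGTH)
period_default = frame_period_s(SR, PYIN_DEFAULT_HOP)

# If F0 were computed with hop 512 but plotted with hop 256, every timestamp
# would be scaled by this factor: the contour would be squeezed into the
# first half of the time axis.
stretch = period / period_default

print(f"frame period: {period * 1000:.0f} ms")
print(f"frames in one second of audio: {n_frames_centered(SR, HOP_LENGTH)}")
print(f"time-axis scale factor on hop mismatch: {stretch}")
```

If a helper from `src/feature_extraction.py` uses a different hop, passing that value to `librosa.times_like` and `librosa.display.specshow` is what keeps the plots honest.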

## 1. Load Original and Reconstructed Audio

We need to find a matching pair of audio files: one from `data/processed/` and its corresponding output from `data/reconstructed/`.

```python
def find_matching_pair(processed_dir, reconstructed_dir):
    """Return the first (processed, reconstructed) pair that shares a relative path."""
    recon_files = sorted(reconstructed_dir.rglob("*.wav"))
    if not recon_files:
        print(f"Error: No reconstructed files found in {reconstructed_dir}.")
        print("Please run src/reconstruct_audio.py first.")
        return None, None

    # The original is expected at the same relative path under processed_dir
    for recon_file in recon_files:
        processed_file = processed_dir / recon_file.relative_to(reconstructed_dir)
        if processed_file.exists():
            return processed_file, recon_file

    print(f"Error: none of the {len(recon_files)} reconstructed files has a matching original in {processed_dir}")
    return None, None

# Load the audio files, resampling both to SR
y_orig, y_recon = None, None
orig_path, recon_path = find_matching_pair(PROCESSED_DIR, RECONSTRUCTED_DIR)

if orig_path and recon_path:
    print(f"Loading Original: {orig_path}")
    y_orig, _ = librosa.load(orig_path, sr=SR)

    print(f"Loading Reconstructed: {recon_path}")
    y_recon, _ = librosa.load(recon_path, sr=SR)

    # Trim both signals to the shorter length so their frame grids line up
    # (this assumes both start on the same sample)
    min_len = min(len(y_orig), len(y_recon))
    y_orig, y_recon = y_orig[:min_len], y_recon[:min_len]
```
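Trimming to the shorter length assumes the two signals start on the same sample. If the reconstruction pipeline introduces a constant delay (an assumption worth checking, not something stated here), every frame-by-frame comparison below is offset. A minimal sketch of estimating that lag with plain NumPy cross-correlation; `estimate_lag` is a hypothetical helper, not part of `src/`, and the demo uses synthetic noise rather than real audio.

```python
import numpy as np


def estimate_lag(reference: np.ndarray, delayed: np.ndarray, max_lag: int) -> int:
    """Return the shift (in samples) by which `delayed` lags `reference`.

    Searches lags in [-max_lag, max_lag] by direct cross-correlation, which costs
    O(N * max_lag); fine for short clips or a small search window.
    """
    n = min(len(reference), len(delayed))
    ref, dly = reference[:n], delayed[:n]
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            score = np.dot(ref[: n - lag], dly[lag:])
        else:
            score = np.dot(ref[-lag:], dly[: n + lag])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic check: y is x delayed by 40 samples
rng = np.random.default_rng(0)
x = rng.standard_normal(16000 // 4)
true_lag = 40
y = np.concatenate([np.zeros(true_lag), x])[: len(x)]

lag = estimate_lag(x, y, max_lag=200)
# Align by dropping the leading `lag` samples of the delayed signal
x_aligned, y_aligned = x[: len(x) - lag], y[lag:]
print(f"estimated lag: {lag} samples")
```

In the notebook this could be called as `estimate_lag(y_orig, y_recon, max_lag=SR // 10)` before trimming; a positive result means the reconstruction lags the original. For long files an FFT-based correlation such as `scipy.signal.correlate(..., method='fft')` is much faster.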

## 2. Pitch (F0) Consistency Analysis

We will now extract the pitch contour from both the original and reconstructed audio and plot them on the same graph. This is the most direct way to visualize pitch preservation.

```python
if y_orig is not None and y_recon is not None:
    print("Extracting pitch contours...")
    # Extract F0 with the helper from src/feature_extraction.py (or the pYIN placeholder)
    f0_orig = extract_pitch(y_orig, SR)
    f0_recon = extract_pitch(y_recon, SR)

    # Time axis for the F0 frames; assumes the extractor used HOP_LENGTH
    times = librosa.times_like(f0_orig, sr=SR, hop_length=HOP_LENGTH)

    print("Plotting pitch comparison...")
    # Use the plotting helper from src/accent_pitch_analysis.py (or the placeholder).
    # Save before showing: after plt.show() the figure is released in inline backends,
    # so if the helper calls plt.show() itself the saved image will be blank.
    plot_pitch_comparison(f0_orig, f0_recon, times)
    plt.savefig(REPORTS_PLOTS_DIR / "04_pitch_comparison.png")
    plt.show()
else:
    print("Audio files not loaded. Skipping pitch analysis.")
```
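The contour plot is qualitative. One common way to put a number on it is to compare F0 only on frames that both tracks mark as voiced, express the error in cents (1200 · log2 of the frequency ratio), and report separately how often the two tracks agree on voicing. A minimal sketch, assuming unvoiced frames are NaN (as `librosa.pyin` returns) or 0; `pitch_error_summary` is a hypothetical helper, not part of `src/accent_pitch_analysis.py`.

```python
import numpy as np


def pitch_error_summary(f0_ref, f0_est):
    """Compare two frame-aligned F0 tracks in Hz. Unvoiced frames may be NaN or 0."""
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_est = np.asarray(f0_est, dtype=float)
    n = min(len(f0_ref), len(f0_est))
    f0_ref, f0_est = f0_ref[:n], f0_est[:n]

    voiced_ref = np.isfinite(f0_ref) & (f0_ref > 0)
    voiced_est = np.isfinite(f0_est) & (f0_est > 0)
    both = voiced_ref & voiced_est

    # Pitch error in cents on frames voiced in both tracks
    cents = 1200.0 * np.log2(f0_est[both] / f0_ref[both])
    has_frames = bool(both.any())
    return {
        "voicing_agreement": float(np.mean(voiced_ref == voiced_est)),
        "frames_compared": int(both.sum()),
        "rmse_cents": float(np.sqrt(np.mean(cents ** 2))) if has_frames else float("nan"),
        # Share of compared frames within 50 cents (a quarter tone) of the reference
        "within_50_cents": float(np.mean(np.abs(cents) <= 50)) if has_frames else float("nan"),
    }


# Toy tracks: one frame is 100 cents sharp, one voiced frame was dropped
ref = np.array([np.nan, 100.0, 200.0, 200.0, 0.0])
est = np.array([np.nan, 100.0, 200.0 * 2 ** (100 / 1200), np.nan, 0.0])
summary = pitch_error_summary(ref, est)
print(summary)
```

In the notebook this would be `pitch_error_summary(f0_orig, f0_recon)`. The 50-cent threshold is a convention (melody-extraction evaluations commonly use it) rather than a requirement; oversmoothing tends to show up in the RMSE, while dropouts show up in the voicing agreement.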

## 3. Accent (Spectral) Consistency Analysis

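Accent is carried by both spectral cues (e.g., vowel quality and formant structure) and temporal, prosodic cues (rhythm, stress, intonation). As a rough proxy for 'accent similarity', we compare the Mel spectrograms: averaging each one over time gives a fixed-length 'embedding' (MFCCs can be used the same way), and the cosine similarity between the two embeddings measures how alike their average spectral shapes are. Because the average discards frame order, this score reflects spectral balance only, not prosody; the pitch contour above covers part of the prosodic side.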

We will use the `calculate_embedding_similarity` function from `src/accent_pitch_analysis.py`.
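The time-averaged embedding described above can be sketched in a few lines of NumPy. This illustrates the general technique and is not necessarily what `calculate_embedding_similarity` in `src/accent_pitch_analysis.py` does; `mel_embedding` and `cosine_similarity` are hypothetical helpers, and the spectrograms are random stand-ins. It also demonstrates the limitation: reversing the frame order leaves the score unchanged.

```python
import numpy as np


def mel_embedding(mel: np.ndarray) -> np.ndarray:
    """Average a (n_mels, n_frames) Mel spectrogram over time into one vector."""
    return mel.mean(axis=1)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
mel = rng.random((80, 200))                              # stand-in power Mel spectrogram
mel_reversed = mel[:, ::-1]                              # same frames, opposite order
mel_tilted = mel * np.linspace(1.0, 0.1, 80)[:, None]    # upper Mel bands attenuated

sim_same_order = cosine_similarity(mel_embedding(mel), mel_embedding(mel_reversed))
sim_tilted = cosine_similarity(mel_embedding(mel), mel_embedding(mel_tilted))
print(f"reversed frames: {sim_same_order:.4f}")  # 1.0000: frame order is ignored
print(f"spectral tilt:   {sim_tilted:.4f}")      # lower: spectral balance changed
```

Note that even attenuating the top Mel band tenfold only lowers the score to about 0.9, so differences in the second or third decimal place near 1.0 can still be meaningful.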

```python
if y_orig is not None and y_recon is not None:
    print("Extracting Mel Spectrograms for accent analysis...")
    mel_orig = extract_melspectrogram(y_orig, SR)
    mel_recon = extract_melspectrogram(y_recon, SR)

    # Use the analysis function from src/accent_pitch_analysis.py (or the placeholder)
    similarity_score = calculate_embedding_similarity(mel_orig, mel_recon)

    print(f"\nAccent Embedding Cosine Similarity: {similarity_score:.4f}")
    print("(A score closer to 1.0 means the spectral characteristics are more similar.)")

    # --- Visualize the difference ---
    print("Plotting original vs. reconstructed Mel Spectrograms...")

    # Convert to dB against one shared reference so the two levels are directly comparable
    # (assumes power Mel spectrograms, i.e. not already in dB)
    ref_power = max(mel_orig.max(), mel_recon.max())
    S_db_orig = librosa.power_to_db(mel_orig, ref=ref_power)
    S_db_recon = librosa.power_to_db(mel_recon, ref=ref_power)
    spec_kwargs = dict(sr=SR, hop_length=HOP_LENGTH, x_axis='time', y_axis='mel')

    fig, ax = plt.subplots(nrows=3, ncols=1, sharex=True, figsize=(15, 10), constrained_layout=True)

    # Original and reconstructed share one colour scale, so a single colorbar covers both
    librosa.display.specshow(S_db_orig, vmin=-80, vmax=0, ax=ax[0], **spec_kwargs)
    ax[0].set_title("Original Mel Spectrogram")
    ax[0].set_ylabel("Frequency (Hz)")

    img = librosa.display.specshow(S_db_recon, vmin=-80, vmax=0, ax=ax[1], **spec_kwargs)
    ax[1].set_title("Reconstructed Mel Spectrogram")
    ax[1].set_ylabel("Frequency (Hz)")

    # Difference on a symmetric diverging scale, so 0 dB maps to the neutral colour
    diff = S_db_orig - S_db_recon
    max_abs = float(np.abs(diff).max()) or 1.0
    img_diff = librosa.display.specshow(diff, cmap='coolwarm', vmin=-max_abs, vmax=max_abs,
                                        ax=ax[2], **spec_kwargs)
    ax[2].set_title("Difference (Original - Reconstructed)")
    ax[2].set_xlabel("Time (s)")
    ax[2].set_ylabel("Frequency (Hz)")

    # constrained_layout (set above) handles colorbars spanning several axes
    fig.colorbar(img, ax=ax[0:2], format='%+2.0f dB')
    fig.colorbar(img_diff, ax=ax[2], format='%+2.0f dB')
    plt.savefig(REPORTS_PLOTS_DIR / "04_mel_comparison.png")
    plt.show()

else:
    print("Audio files not loaded. Skipping accent analysis.")
```
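To support the findings with numbers rather than a visual impression, the dB difference can be summarised per frequency region. A minimal sketch, assuming `diff` is an `(n_mels, n_frames)` array of dB differences as computed above; splitting the Mel bins into equal thirds is an arbitrary choice, `band_error_summary` is a hypothetical helper not found in `src/`, and the demo uses a synthetic difference map.

```python
import numpy as np


def band_error_summary(diff_db: np.ndarray, n_bands: int = 3) -> list[float]:
    """Mean absolute dB difference in n_bands contiguous groups of Mel bins (low to high)."""
    bands = np.array_split(np.abs(diff_db), n_bands, axis=0)
    return [float(band.mean()) for band in bands]


# Synthetic difference map: 80 Mel bins x 100 frames, error growing with frequency
n_mels, n_frames = 80, 100
diff_demo = np.repeat(np.linspace(0.0, 12.0, n_mels)[:, None], n_frames, axis=1)

for name, err in zip(["low", "mid", "high"], band_error_summary(diff_demo)):
    print(f"{name:>4} Mel bins: {err:.2f} dB mean |difference|")
```

In the notebook this would be `band_error_summary(diff)`. Because the Mel scale compresses high frequencies, equal groups of Mel bins are not equal Hz ranges: the "high" third spans a much wider range in Hz than the "low" third.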

## 4. Initial Findings

* **Pitch:** The pitch contour plot shows that the reconstructed F0 `[closely follows/deviates from]` the original. `[Note any specific artifacts, like oversmoothing or dropouts]`.

* **Accent/Spectral:** The cosine similarity for the spectral embedding is `[Score]`. The Mel Spectrogram comparison shows `[e.g., a good match in formant structure, loss of high-frequency detail, audible artifacts]`. The difference plot highlights `[e.g., most errors are in the high-frequency range]`.

Overall, the model `[is/is not]` effectively preserving the key accent and pitch characteristics of the original speaker.