# Batch Audio Analysis

This notebook demonstrates how to perform batch analysis on multiple audio files using the CTC-SpeechRefinement package. We'll explore techniques for analyzing a collection of audio files, comparing their characteristics, and generating summary reports.

## Setup

First, let's import the necessary libraries and set up the environment.

In [None]:
# Add the project root to the Python path
import sys
import os
sys.path.append(os.path.abspath('..'))

# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display
import pandas as pd
import seaborn as sns
from IPython.display import Audio, display, HTML
import glob
from pathlib import Path
from tqdm.notebook import tqdm

# Import from the project
from ctc_speech_refinement.core.preprocessing.audio import preprocess_audio
from ctc_speech_refinement.core.eda.descriptive_stats import analyze_descriptive_stats, batch_analyze_descriptive_stats
from ctc_speech_refinement.core.eda.time_domain import analyze_time_domain, batch_analyze_time_domain
from ctc_speech_refinement.core.eda.frequency_domain import analyze_frequency_domain, batch_analyze_frequency_domain
from ctc_speech_refinement.core.eda.pitch_timbre import analyze_pitch_timbre, batch_analyze_pitch_timbre
from ctc_speech_refinement.core.eda.anomaly_detection import analyze_anomalies, batch_analyze_anomalies
from ctc_speech_refinement.core.utils.file_utils import get_audio_files

# Set up plotting
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['figure.dpi'] = 100

## 1. Load Multiple Audio Files

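Let's load multiple audio files from a directory. Passing `sr=None` to `librosa.load` keeps each file's native sample rate instead of resampling to librosa's default of 22,050 Hz.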
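The behavior of `get_audio_files` is not shown in this notebook. As a rough sketch of what such a helper might do, the hypothetical `find_audio_files` below collects files by extension with `pathlib`. The function name and the extension list are assumptions for illustration, not the package's actual implementation.

```python
from pathlib import Path

# Hypothetical set of extensions; adjust to the formats in your dataset.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg"}


def find_audio_files(directory, recursive=False):
    """Return a sorted list of audio file paths (as strings) under `directory`."""
    root = Path(directory)
    if not root.is_dir():
        return []
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        str(path)
        for path in candidates
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    )
```

Sorting the result keeps the file order, and therefore the order of rows in the later comparison tables, stable between runs.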

In [None]:
# Define the path to the directory containing audio files
audio_dir = "../data/test1"  # Update this path to your audio directory

# Get list of audio files
audio_files = get_audio_files(audio_dir)

# Print the list of audio files
print(f"Found {len(audio_files)} audio files:")
for file in audio_files:
    print(f"- {file}")

In [None]:
# Load audio files
audio_data_dict = {}
for file in tqdm(audio_files, desc="Loading audio files"):
    try:
        audio_data, sample_rate = librosa.load(file, sr=None)
        audio_data_dict[file] = (audio_data, sample_rate)
    except Exception as e:
        print(f"Error loading {file}: {str(e)}")

# Print basic information about each audio file
print("\nAudio file information:")
for file, (audio_data, sample_rate) in audio_data_dict.items():
    duration = len(audio_data) / sample_rate
    print(f"- {os.path.basename(file)}: {sample_rate} Hz, {duration:.2f} seconds, {len(audio_data)} samples")

## 2. Basic Comparison of Audio Files

Let's compare basic characteristics of the audio files: duration, sample rate, amplitude statistics, RMS level, and zero-crossing count.
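To make these quantities concrete, here is a minimal NumPy-only sketch that computes a few of them for a synthetic sine wave. The helper `basic_signal_stats` is hypothetical, and the crest factor is taken as peak divided by RMS, which is an assumed definition that may differ from the one the package uses.

```python
import numpy as np


def basic_signal_stats(signal):
    """Compute a few amplitude statistics for a 1-D audio signal."""
    signal = np.asarray(signal, dtype=float)
    rms = np.sqrt(np.mean(np.square(signal)))
    peak = np.max(np.abs(signal))
    # Count sign changes between consecutive samples. librosa.zero_crossings
    # applies its own conventions, so its count may differ slightly.
    sign_changes = np.signbit(signal[:-1]) != np.signbit(signal[1:])
    return {
        "rms": rms,
        "peak": peak,
        # Crest factor taken here as peak / RMS (assumed definition).
        "crest_factor": peak / rms if rms > 0 else float("nan"),
        "zero_crossings": int(np.sum(sign_changes)),
    }


# Synthetic check: one second of a 440 Hz sine sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
sine_stats = basic_signal_stats(np.sin(2 * np.pi * 440 * t))
print(sine_stats)
```

For a full-scale sine, the RMS should come out close to 1/√2 and the zero-crossing count close to twice the frequency per second, which gives a quick sanity check before trusting the same statistics on real recordings.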

In [None]:
# Compute basic statistics for each audio file
stats_dict = {}
for file, (audio_data, sample_rate) in audio_data_dict.items():
    file_name = os.path.basename(file)
    stats_dict[file_name] = {
        'Duration (s)': len(audio_data) / sample_rate,
        'Sample Rate (Hz)': sample_rate,
        'Mean Amplitude': np.mean(audio_data),
        'Std Dev Amplitude': np.std(audio_data),
        'Min Amplitude': np.min(audio_data),
        'Max Amplitude': np.max(audio_data),
        'RMS': np.sqrt(np.mean(np.square(audio_data))),
        'Zero Crossings': np.sum(librosa.zero_crossings(audio_data))
    }

# Create a DataFrame for comparison
stats_df = pd.DataFrame.from_dict(stats_dict, orient='index')

# Display the statistics
stats_df

In [None]:
# Plot comparison of durations
plt.figure(figsize=(14, 5))
sns.barplot(x=stats_df.index, y='Duration (s)', data=stats_df)
plt.title('Comparison of Audio Durations')
plt.xlabel('Audio File')
plt.ylabel('Duration (seconds)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
# Plot comparison of RMS values
plt.figure(figsize=(14, 5))
sns.barplot(x=stats_df.index, y='RMS', data=stats_df)
plt.title('Comparison of RMS Values')
plt.xlabel('Audio File')
plt.ylabel('RMS')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
# Plot comparison of amplitude statistics
plt.figure(figsize=(14, 8))

# Melt the DataFrame to long format for seaborn
amplitude_stats = ['Mean Amplitude', 'Std Dev Amplitude', 'Min Amplitude', 'Max Amplitude']
amplitude_df = stats_df[amplitude_stats].reset_index().melt(id_vars='index', value_vars=amplitude_stats, var_name='Statistic', value_name='Value')

# Plot
sns.barplot(x='index', y='Value', hue='Statistic', data=amplitude_df)
plt.title('Comparison of Amplitude Statistics')
plt.xlabel('Audio File')
plt.ylabel('Value')
plt.xticks(rotation=45)
plt.legend(title='Statistic')
plt.tight_layout()
plt.show()

## 3. Waveform Comparison

Let's compare the waveforms of the audio files.
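When recordings differ a lot in loudness, quiet files can look almost flat next to loud ones. One optional approach, sketched here with a hypothetical `peak_normalize` helper that is not part of the package, is to scale each signal to a peak of 1 before plotting so that waveform shapes are easier to compare.

```python
import numpy as np


def peak_normalize(signal, eps=1e-12):
    """Scale a signal so its largest absolute sample is 1 (silence is left unchanged)."""
    signal = np.asarray(signal, dtype=float)
    peak = np.max(np.abs(signal)) if signal.size else 0.0
    return signal / peak if peak > eps else signal


# A quiet synthetic signal with a peak of about 0.05.
quiet = 0.05 * np.sin(np.linspace(0, 20 * np.pi, 1000))
normalized = peak_normalize(quiet)
print(np.max(np.abs(quiet)), np.max(np.abs(normalized)))
```

In the plotting loop, this would amount to passing `peak_normalize(audio_data)` to `librosa.display.waveshow` instead of `audio_data`. Normalized plots hide absolute level differences, so keep the unnormalized view if loudness itself is what you want to compare.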

In [None]:
# Plot waveforms of all audio files
n_files = len(audio_data_dict)
fig, axes = plt.subplots(n_files, 1, figsize=(14, 3 * n_files), squeeze=False)
axes = axes.ravel()  # squeeze=False always returns a 2-D array, so a single file needs no special case

for i, (file, (audio_data, sample_rate)) in enumerate(audio_data_dict.items()):
    file_name = os.path.basename(file)
    librosa.display.waveshow(audio_data, sr=sample_rate, ax=axes[i])
    axes[i].set_title(f'Waveform: {file_name}')
    axes[i].set_xlabel('Time (s)')
    axes[i].set_ylabel('Amplitude')

plt.tight_layout()
plt.show()

## 4. Spectral Comparison

Let's compare the spectral content of the audio files using log-frequency spectrograms and spectral centroid statistics.
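The spectral centroid is the magnitude-weighted mean frequency of a spectrum, and `librosa.feature.spectral_centroid` computes it for each STFT frame. The sketch below illustrates the idea for a single frame with plain NumPy. The helper `frame_spectral_centroid` is hypothetical, and its windowing details are assumptions that will not match librosa's output exactly.

```python
import numpy as np


def frame_spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of a single Hann-windowed frame."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    magnitudes = np.abs(np.fft.rfft(windowed))
    frequencies = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = np.sum(magnitudes)
    return float(np.sum(frequencies * magnitudes) / total) if total > 0 else 0.0


# Synthetic check: a pure 1 kHz tone should have a centroid near 1 kHz.
sr = 16000
n_fft = 2048
t = np.arange(n_fft) / sr
tone_centroid = frame_spectral_centroid(np.sin(2 * np.pi * 1000 * t), sr)
print(f"{tone_centroid:.1f} Hz")
```

A higher centroid indicates that more of the energy sits at high frequencies, which is why it is often read as a rough measure of brightness.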

In [None]:
# Plot spectrograms of all audio files
n_files = len(audio_data_dict)
fig, axes = plt.subplots(n_files, 1, figsize=(14, 4 * n_files), squeeze=False)
axes = axes.ravel()  # squeeze=False always returns a 2-D array, so a single file needs no special case

for i, (file, (audio_data, sample_rate)) in enumerate(audio_data_dict.items()):
    file_name = os.path.basename(file)
    
    # Compute spectrogram
    n_fft = 2048
    hop_length = 512
    stft = librosa.stft(audio_data, n_fft=n_fft, hop_length=hop_length)
    stft_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)
    
    # Plot spectrogram
    img = librosa.display.specshow(stft_db, sr=sample_rate, x_axis='time', y_axis='log', hop_length=hop_length, ax=axes[i])
    fig.colorbar(img, ax=axes[i], format='%+2.0f dB')
    axes[i].set_title(f'Spectrogram: {file_name}')

plt.tight_layout()
plt.show()

In [None]:
# Compute and compare spectral centroids
spectral_centroids = {}
for file, (audio_data, sample_rate) in audio_data_dict.items():
    file_name = os.path.basename(file)
    centroid = librosa.feature.spectral_centroid(y=audio_data, sr=sample_rate, n_fft=2048, hop_length=512)[0]
    spectral_centroids[file_name] = {
        'Mean': np.mean(centroid),
        'Std Dev': np.std(centroid),
        'Min': np.min(centroid),
        'Max': np.max(centroid),
        'Median': np.median(centroid)
    }

# Create a DataFrame for comparison
centroid_df = pd.DataFrame.from_dict(spectral_centroids, orient='index')

# Display the statistics
centroid_df

In [None]:
# Plot comparison of mean spectral centroids
plt.figure(figsize=(14, 5))
sns.barplot(x=centroid_df.index, y='Mean', data=centroid_df)
plt.title('Comparison of Mean Spectral Centroids')
plt.xlabel('Audio File')
plt.ylabel('Frequency (Hz)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

## 5. Batch Analysis Using the Package

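Let's use the package's built-in batch analysis functions. Each `batch_analyze_*` function takes the `audio_data_dict` built in Section 1, which maps file paths to `(audio_data, sample_rate)` tuples, and the results are indexed by the same file paths.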
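The internals of the `batch_analyze_*` functions are not shown here. As a generic illustration of the batch pattern only, the hypothetical `batch_apply` helper below runs a per-file analysis function over a dictionary shaped like `audio_data_dict` and records failures instead of stopping. The package's functions may work quite differently.

```python
import numpy as np


def batch_apply(analyze_fn, audio_data_dict):
    """Run `analyze_fn(audio, sample_rate)` on every entry; collect results and errors."""
    results, errors = {}, {}
    for file, (audio_data, sample_rate) in audio_data_dict.items():
        try:
            results[file] = analyze_fn(audio_data, sample_rate)
        except Exception as exc:  # keep going so one bad file does not stop the batch
            errors[file] = str(exc)
    return results, errors


def duration_and_rms(audio_data, sample_rate):
    """Example per-file analysis: duration in seconds and RMS level."""
    return {
        "duration_s": len(audio_data) / sample_rate,
        "rms": float(np.sqrt(np.mean(np.square(audio_data)))),
    }


# Synthetic stand-in for audio_data_dict.
demo = {
    "a.wav": (np.ones(8000), 8000),
    "broken.wav": (np.ones(10), 0),  # zero sample rate triggers an error
}
demo_results, demo_errors = batch_apply(duration_and_rms, demo)
print(demo_results, demo_errors)
```

Returning errors separately makes it easy to see which files were skipped without losing the results for the rest of the batch.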

In [None]:
# Perform batch descriptive statistics analysis
descriptive_stats_results = batch_analyze_descriptive_stats(audio_data_dict)

# Create a DataFrame for comparison
desc_stats_df = pd.DataFrame()
for file, results in descriptive_stats_results.items():
    file_name = os.path.basename(file)
    desc_stats_df[file_name] = pd.Series(results['stats'])

# Display the statistics
desc_stats_df.T

In [None]:
# Plot comparison of key statistics
key_stats = ['mean', 'std', 'rms', 'crest_factor']
plt.figure(figsize=(14, 8))

# Melt the DataFrame to long format for seaborn
key_stats_df = desc_stats_df.T[key_stats].reset_index().melt(id_vars='index', value_vars=key_stats, var_name='Statistic', value_name='Value')

# Plot
sns.barplot(x='index', y='Value', hue='Statistic', data=key_stats_df)
plt.title('Comparison of Key Audio Statistics')
plt.xlabel('Audio File')
plt.ylabel('Value')
plt.xticks(rotation=45)
plt.legend(title='Statistic')
plt.tight_layout()
plt.show()

In [None]:
# Perform batch time domain analysis
time_domain_results = batch_analyze_time_domain(audio_data_dict)

# Perform batch frequency domain analysis
frequency_domain_results = batch_analyze_frequency_domain(audio_data_dict)

# Perform batch pitch and timbre analysis
pitch_timbre_results = batch_analyze_pitch_timbre(audio_data_dict)

# Perform batch anomaly detection
anomaly_results = batch_analyze_anomalies(audio_data_dict)

## 6. Generate Summary Report

Let's generate a summary report of the batch analysis: a summary table, a heatmap, and an HTML report.
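The cells in this section assemble the HTML report by hand, which gives full control over layout. For a quick report, a shorter alternative is pandas' `DataFrame.to_html`. The sketch below uses made-up values in place of `summary_df`, and `summary_to_html` is a hypothetical helper, not part of the package.

```python
import pandas as pd

# Hypothetical summary values standing in for `summary_df`.
demo_summary = pd.DataFrame(
    {"Duration (s)": [1.5, 2.25], "RMS": [0.0312, 0.0457]},
    index=["a.wav", "b.wav"],
)


def summary_to_html(df, title="Audio Batch Analysis Summary Report"):
    """Render a summary DataFrame as a minimal standalone HTML page."""
    # float_format controls how floating-point cells are printed.
    table = df.to_html(float_format=lambda value: f"{value:.4f}", border=0)
    return (
        f"<html><head><title>{title}</title></head>"
        f"<body><h1>{title}</h1>{table}</body></html>"
    )


report = summary_to_html(demo_summary)
print(report[:80])
```

In the notebook this could be shown with `display(HTML(summary_to_html(summary_df)))`. The trade-off is less control over per-column formatting than the hand-built tables offer.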

In [None]:
# Create a summary DataFrame
# Collect one row per file in a dict, then build the DataFrame in a single step
# instead of growing it cell by cell.
summary_rows = {}

for file, (audio_data, sample_rate) in audio_data_dict.items():
    file_name = os.path.basename(file)
    desc_stats = descriptive_stats_results[file]['stats']
    freq_stats = frequency_domain_results[file]['spectral_stats']
    anomalies = anomaly_results[file]

    summary_rows[file_name] = {
        # Basic information
        'Duration (s)': len(audio_data) / sample_rate,
        'Sample Rate (Hz)': sample_rate,
        # Descriptive statistics
        'Mean Amplitude': desc_stats['mean'],
        'Std Dev Amplitude': desc_stats['std'],
        'RMS': desc_stats['rms'],
        'Crest Factor': desc_stats['crest_factor'],
        # Frequency domain statistics
        'Mean Spectral Centroid (Hz)': freq_stats['centroid_mean'],
        'Mean Spectral Bandwidth (Hz)': freq_stats['bandwidth_mean'],
        'Mean Spectral Rolloff (Hz)': freq_stats['rolloff_mean'],
        # Anomaly detection
        'Amplitude Anomalies (%)': anomalies['amplitude_anomaly_percentage'],
        'Spectral Anomalies (%)': anomalies['spectral_anomaly_percentage'],
    }

summary_df = pd.DataFrame.from_dict(summary_rows, orient='index')

# Display the summary
summary_df

In [None]:
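# Generate a heatmap of the summary.
# The columns span very different scales (sample rate vs. mean amplitude), so
# color each column by its min-max normalized values and annotate with the raw ones.
summary_numeric = summary_df.astype(float)
column_range = (summary_numeric.max() - summary_numeric.min()).replace(0, 1)
summary_scaled = (summary_numeric - summary_numeric.min()) / column_range
plt.figure(figsize=(14, 10))
sns.heatmap(summary_scaled, annot=summary_numeric, cmap='viridis', fmt='.2f')
plt.title('Audio Files Summary Heatmap (colors normalized per column)')
plt.tight_layout()
plt.show()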

In [None]:
# Generate an HTML summary report
html_report = """
<html>
<head>
    <title>Audio Batch Analysis Summary Report</title>
    <style>
        body { font-family: Arial, sans-serif; margin: 20px; }
        h1 { color: #2c3e50; }
        h2 { color: #3498db; }
        table { border-collapse: collapse; width: 100%; margin-bottom: 20px; }
        th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
        th { background-color: #f2f2f2; }
        tr:nth-child(even) { background-color: #f9f9f9; }
    </style>
</head>
<body>
    <h1>Audio Batch Analysis Summary Report</h1>
    <p>This report summarizes the analysis of multiple audio files.</p>
    
    <h2>Basic Information</h2>
    <table>
        <tr>
            <th>File</th>
            <th>Duration (s)</th>
            <th>Sample Rate (Hz)</th>
        </tr>
"""

for file_name, row in summary_df.iterrows():
    html_report += f"""
        <tr>
            <td>{file_name}</td>
            <td>{row['Duration (s)']:.2f}</td>
            <td>{row['Sample Rate (Hz)']:.0f}</td>
        </tr>
    """

html_report += """
    </table>
    
    <h2>Amplitude Statistics</h2>
    <table>
        <tr>
            <th>File</th>
            <th>Mean Amplitude</th>
            <th>Std Dev Amplitude</th>
            <th>RMS</th>
            <th>Crest Factor</th>
        </tr>
"""

for file_name, row in summary_df.iterrows():
    html_report += f"""
        <tr>
            <td>{file_name}</td>
            <td>{row['Mean Amplitude']:.6f}</td>
            <td>{row['Std Dev Amplitude']:.6f}</td>
            <td>{row['RMS']:.6f}</td>
            <td>{row['Crest Factor']:.2f}</td>
        </tr>
    """

html_report += """
    </table>
    
    <h2>Spectral Statistics</h2>
    <table>
        <tr>
            <th>File</th>
            <th>Mean Spectral Centroid (Hz)</th>
            <th>Mean Spectral Bandwidth (Hz)</th>
            <th>Mean Spectral Rolloff (Hz)</th>
        </tr>
"""

for file_name, row in summary_df.iterrows():
    html_report += f"""
        <tr>
            <td>{file_name}</td>
            <td>{row['Mean Spectral Centroid (Hz)']:.2f}</td>
            <td>{row['Mean Spectral Bandwidth (Hz)']:.2f}</td>
            <td>{row['Mean Spectral Rolloff (Hz)']:.2f}</td>
        </tr>
    """

html_report += """
    </table>
    
    <h2>Anomaly Detection</h2>
    <table>
        <tr>
            <th>File</th>
            <th>Amplitude Anomalies (%)</th>
            <th>Spectral Anomalies (%)</th>
        </tr>
"""

for file_name, row in summary_df.iterrows():
    html_report += f"""
        <tr>
            <td>{file_name}</td>
            <td>{row['Amplitude Anomalies (%)']:.4f}</td>
            <td>{row['Spectral Anomalies (%)']:.4f}</td>
        </tr>
    """

html_report += """
    </table>
</body>
</html>
"""

# Display the HTML report
display(HTML(html_report))

## Conclusion

In this notebook, we performed a batch analysis of multiple audio files. We compared their basic statistics, waveforms, and spectral content, ran the package's batch analysis functions, and compiled the results into a summary table, a heatmap, and an HTML report.

Batch analysis is an important step in audio data exploration, as it allows us to compare multiple audio files, identify patterns and outliers, and gain insights into the overall characteristics of the audio dataset. This information can be valuable for preprocessing, feature extraction, and model training in speech recognition tasks.