## Introduction

This notebook focuses on **audio preprocessing and augmentation techniques** used to make ASR systems robust to noise, variation, and real-world conditions.

The goal is to simulate real-world speech variations and observe how preprocessing affects recognition.

## What this notebook contains

* Loading and visualizing audio

* Helper functions for plotting

* Audio augmentation techniques

* Saving augmented audio

* Comparing ASR performance

## 1. Load an Audio File

In [None]:
import librosa, librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Path to the audio file (in Colab, upload your own file to /content or change this path)
file_path = "/content/jewellery_audio.wav"
y, sr = librosa.load(file_path, sr=16000)  # resample to 16 kHz mono for ASR

# Report the file that was actually loaded, along with its duration in seconds
print(f"Original audio: {file_path}, Duration: {len(y)/sr:.2f}s")

## 2. Helper Function for Visualization

In [None]:
def plot_waveform(y, sr, title="Waveform"):
    plt.figure(figsize=(12, 3))
    librosa.display.waveshow(y, sr=sr)
    plt.title(title)
    plt.show()

## 3. Apply Augmentations

In [None]:
# (a) Add Noise
noise = 0.005 * np.random.randn(len(y))
y_noise = y + noise

# (b) Pitch Shift (up by 2 semitones)
y_pitch = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)

# (c) Time Stretch (rate=0.8 slows playback, so the clip becomes 1.25x longer)
y_stretch = librosa.effects.time_stretch(y, rate=0.8)

# Plot all versions
plot_waveform(y, sr, "Original")
plot_waveform(y_noise, sr, "With Noise")
plot_waveform(y_pitch, sr, "Pitch Shifted (+2 semitones)")
plot_waveform(y_stretch, sr, "Time Stretched (Slower)")
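
The fixed noise amplitude of 0.005 used above does not account for how loud the recording is, so the same value can be barely audible on one clip and overwhelming on another. One common alternative is to scale the noise to a target signal-to-noise ratio (SNR). The following is a minimal sketch under that assumption; `add_noise_at_snr`, `snr_db`, and the synthetic test tone are illustrative names that are not part of the original notebook.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng=None):
    """Return `signal` plus white Gaussian noise scaled to a target SNR in dB.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# A synthetic 1-second, 440 Hz tone at 16 kHz stands in for the loaded audio `y`
sr_demo = 16000
t = np.arange(sr_demo) / sr_demo
tone = 0.5 * np.sin(2 * np.pi * 440 * t)

noisy = add_noise_at_snr(tone, snr_db=10, rng=np.random.default_rng(0))

# Measure the SNR that was actually achieved on this finite sample
measured = 10 * np.log10(np.mean(tone ** 2) / np.mean((noisy - tone) ** 2))
print(f"Measured SNR: {measured:.1f} dB")
```

In the notebook itself, the same call could be applied to `y` in place of the synthetic tone, for example `add_noise_at_snr(y, snr_db=10)`. Lower SNR values produce harsher conditions for the recognizer.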

## 4. Save Augmented Audio

In [None]:
import soundfile as sf

sf.write("original.wav", y, sr)
sf.write("noise.wav", y_noise, sr)
sf.write("pitch.wav", y_pitch, sr)
sf.write("stretch.wav", y_stretch, sr)

print("Audio files saved: original.wav, noise.wav, pitch.wav, stretch.wav")
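
One caveat when saving: adding noise can push samples outside the [-1, 1] range. Such values cannot be represented in 16-bit PCM, the default WAV subtype in `soundfile`, and are typically clipped on write. A possible guard, sketched here with a hypothetical helper `peak_normalize` and an assumed peak limit of 0.99, is to rescale only those signals whose peak exceeds the limit:

```python
import numpy as np

def peak_normalize(signal, peak=0.99):
    """Scale `signal` down so its largest absolute sample is at most `peak`.

    Signals already within the limit are returned unchanged (as float32).
    Hypothetical helper for illustration; not part of the original notebook.
    """
    signal = np.asarray(signal, dtype=np.float32)
    max_abs = float(np.max(np.abs(signal))) if signal.size else 0.0
    if max_abs <= peak:
        return signal
    return (signal * (peak / max_abs)).astype(np.float32)

# Toy example: one sample exceeds the valid range and would be clipped on save
too_loud = np.array([0.5, -1.5, 1.0])
safe = peak_normalize(too_loud)
print(np.max(np.abs(safe)))
```

Applied to the notebook, this would mean writing `peak_normalize(y_noise)` instead of `y_noise`. Rescaling changes the overall loudness slightly, so it is a trade-off against clipping rather than a free fix.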

## 5. Compare with ASR (Whisper API)

In [None]:
from google.colab import userdata
from openai import OpenAI

# Read credentials from Colab secrets rather than hard-coding them
OPENAI_API_KEY = userdata.get("OPENAI_API_KEY")
OPENAI_BASE_URL = userdata.get("OPENAI_BASE_URL")

client = OpenAI(
    api_key=OPENAI_API_KEY,
    base_url=OPENAI_BASE_URL
)


In [None]:
def transcribe_audio(file_path):
    """Send an audio file to the Whisper API and return the transcript text."""
    with open(file_path, "rb") as f:
        response = client.audio.transcriptions.create(
            model="whisper-1",
            file=f
        )
    return response.text

# Run ASR on original vs augmented audio
print("Original:", transcribe_audio("original.wav"))
print("With Noise:", transcribe_audio("noise.wav"))
print("Pitch Shifted:", transcribe_audio("pitch.wav"))
print("Time Stretched:", transcribe_audio("stretch.wav"))
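
Reading the four transcripts side by side gives only a qualitative impression. A standard way to quantify the difference is word error rate (WER): the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. Below is a minimal sketch, assuming the transcript of the original audio serves as the reference. The function name `word_error_rate` and the example sentences are hypothetical, and the sketch only lowercases and splits on whitespace, so punctuation differences count as errors.

```python
def word_error_rate(reference, hypothesis):
    """Compute WER as word-level edit distance divided by reference length.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)

# Made-up transcripts standing in for real Whisper output
reference_text = "the gold necklace is on sale"
noisy_text = "the gold necklace on sail"
wer = word_error_rate(reference_text, noisy_text)
print(f"WER: {wer:.2f}")
```

In the notebook, the reference could be `transcribe_audio("original.wav")` and each augmented transcript a hypothesis. Note that this measures drift relative to the clean transcript, not accuracy against a human-verified ground truth.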

## Observation – Audio Preprocessing & Augmentation

This notebook illustrates how audio preprocessing and augmentation can be used to probe and improve the robustness of speech systems in real-world conditions. It applies noise addition, pitch shifting, and time stretching to a single recording, compares the original and augmented waveforms visually, and transcribes each version with Whisper to check whether the essential linguistic content survives the modification. Resampling every clip to a consistent 16 kHz rate keeps the ASR input uniform, while augmentation increases data diversity, which helps speech recognition systems generalize across varying environmental conditions. Overall, preprocessing and augmentation are important steps in building reliable, production-ready Voice AI systems.