## Introduction

This notebook introduces the fundamental audio processing concepts needed before building **Automatic Speech Recognition (ASR)** or **Voice AI** systems.

The goal is to understand how raw audio is represented, visualized, and transformed into features such as spectrograms and MFCCs. MFCCs are a staple of traditional ASR pipelines, and log-mel spectrograms are a common input to modern neural ASR models.

## What this notebook contains

This notebook covers the following concepts step by step:

* Loading an audio file

* Visualizing the raw audio waveform

* Converting audio from time domain to frequency domain

* Generating spectrograms and mel-spectrograms

* Extracting MFCC features

Together, these steps form the feature-extraction front end used in many speech systems.

In [None]:
!pip install librosa matplotlib soundfile -q

## 1. Load an Audio File

In [None]:
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Load audio file
file_path = "/content/jewellery_audio.wav"  # path to a local audio file (not a librosa built-in sample)
y, sr = librosa.load(file_path, sr=None)  # y = waveform, sr = sample rate

print(f"Audio loaded: {file_path}")
print(f"Duration: {len(y)/sr:.2f} seconds, Sample Rate: {sr}")
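
If the WAV file above is not available, a synthetic tone can stand in for it. The sketch below is an assumption-laden helper, not part of the original notebook: `write_test_tone` and its parameters are hypothetical names, and it uses only the standard library to write a mono 16-bit PCM sine wave.

```python
import math
import struct
import wave


def write_test_tone(path, freq_hz=440.0, duration_s=1.0,
                    sample_rate=16000, amplitude=0.5):
    """Write a mono 16-bit PCM sine tone to `path`; return the sample count.

    Hypothetical helper for illustration; all parameter names are assumptions.
    """
    n_samples = int(duration_s * sample_rate)
    frames = bytearray()
    for n in range(n_samples):
        # One sine sample in [-amplitude, amplitude]
        sample = amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
        # Scale to the signed 16-bit range and pack as little-endian
        frames += struct.pack("<h", int(round(sample * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))
    return n_samples
```

Calling `write_test_tone("tone.wav")` and then setting `file_path = "tone.wav"` should let the rest of the notebook run unchanged, although a pure tone will look far simpler than speech in every plot.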

## 2. Visualize Raw Waveform

In [None]:
plt.figure(figsize=(14, 4))
librosa.display.waveshow(y, sr=sr)
plt.title("Raw Waveform")
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.show()

## 3. Generate a Spectrogram

In [None]:
# Short-Time Fourier Transform (STFT)
D = np.abs(librosa.stft(y))

plt.figure(figsize=(14, 6))
librosa.display.specshow(librosa.amplitude_to_db(D, ref=np.max),
                         sr=sr, x_axis='time', y_axis='log')
plt.colorbar(format="%+2.0f dB")
plt.title("Spectrogram (Log-Frequency Scale)")
plt.show()
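
To make the time-to-frequency step less of a black box, the sketch below computes what an STFT does for a single frame: apply a window, take a DFT, and keep the magnitudes. It is a deliberately slow, standard-library illustration rather than how `librosa.stft` is implemented. The helper names are hypothetical, and `stft_shape` assumes librosa's defaults of `n_fft=2048`, `hop_length=512`, and centered frames.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window of length n_fft."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft) for n in range(n_fft)]


def frame_magnitudes(frame):
    """Magnitudes of the DFT of one Hann-windowed frame (bins 0..n_fft/2)."""
    n_fft = len(frame)
    windowed = [x * w for x, w in zip(frame, hann(n_fft))]
    mags = []
    for k in range(n_fft // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n, x in enumerate(windowed))
        mags.append(abs(acc))
    return mags


def stft_shape(n_samples, n_fft=2048, hop_length=512):
    """Expected (frequency bins, frames) for a centered STFT."""
    return (1 + n_fft // 2, 1 + n_samples // hop_length)


# A 500 Hz tone sampled at 8 kHz, analysed with a 256-sample frame
sample_rate = 8000
n_fft = 256
tone = [math.sin(2.0 * math.pi * 500.0 * n / sample_rate) for n in range(n_fft)]
mags = frame_magnitudes(tone)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * sample_rate / n_fft  # bin spacing is sample_rate / n_fft
print(peak_bin, peak_hz)  # 16 500.0
```

Each column of `D` in the cell above is one such frame, so under the stated defaults `D.shape` should equal `stft_shape(len(y))`.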

## 4. Generate a Mel-Spectrogram

In [None]:
# Mel-scaled spectrogram
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128,
                                   fmax=8000)
S_dB = librosa.power_to_db(S, ref=np.max)

plt.figure(figsize=(14, 6))
librosa.display.specshow(S_dB, sr=sr, x_axis='time',
                         y_axis='mel', fmax=8000)
plt.colorbar(format="%+2.0f dB")
plt.title("Mel-Spectrogram")
plt.show()
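
The mel scale compresses high frequencies so that equal steps in mel correspond roughly to equal perceived steps in pitch. The sketch below uses the HTK-style formula, which is an assumption made for simplicity: librosa's default (`htk=False`) uses the Slaney variant, so its band edges will differ somewhat. The function names are hypothetical.

```python
import math


def hz_to_mel_htk(freq_hz):
    """HTK-style conversion from Hz to mel."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(n_points, fmin=0.0, fmax=8000.0):
    """Frequencies in Hz that are evenly spaced on the mel scale."""
    lo, hi = hz_to_mel_htk(fmin), hz_to_mel_htk(fmax)
    step = (hi - lo) / (n_points - 1)
    return [mel_to_hz_htk(lo + i * step) for i in range(n_points)]
```

Printing `mel_spaced_frequencies(10)` shows gaps that widen as frequency rises, which is why the mel-spectrogram devotes more of its vertical axis to low frequencies than the linear spectrogram does.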

## 5. Extract MFCC Features

In [None]:
# Compute MFCCs
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

plt.figure(figsize=(14, 6))
# Pass sr so the time axis matches the file's native sample rate
librosa.display.specshow(mfccs, sr=sr, x_axis='time')
plt.colorbar()
plt.title("MFCC (Mel-Frequency Cepstral Coefficients)")
plt.ylabel("MFCC coefficient index")
plt.xlabel("Time (s)")
plt.show()

print("MFCC shape:", mfccs.shape)
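
MFCCs are commonly obtained by applying a discrete cosine transform (DCT) to each column of the log-mel spectrogram and keeping the first few coefficients. The sketch below shows that step for a single frame with an orthonormal DCT-II written in plain Python. It is an illustration under that assumption, not librosa's implementation, and the function names are hypothetical.

```python
import math


def dct2_ortho(values):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        total = sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                    for i, v in enumerate(values))
        # Scaling that makes the transform orthonormal
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def mfcc_from_log_mel(log_mel_frame, n_mfcc=13):
    """Keep the first n_mfcc DCT coefficients of one log-mel frame (in dB)."""
    return dct2_ortho(log_mel_frame)[:n_mfcc]
```

Coefficient 0 tracks the overall level of the frame, while higher coefficients describe progressively finer detail in the spectral envelope. That is why a flat log-mel frame yields zeros everywhere except coefficient 0.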

## Observation: Audio Basics & Visualization

This notebook demonstrates how raw speech audio is transformed into the visual and numerical representations used in speech processing. It moves from the time-domain waveform to time-frequency representations (the spectrogram and mel-spectrogram) and then to MFCCs, a compact summary computed from the mel-spectrogram, with each representation capturing different characteristics of speech. The plots show that the raw waveform mainly reveals loudness over time and is hard to read for phonetic content, whereas spectral and mel-scaled features make frequency structure, and therefore phonetic and perceptual information, much easier to see. Overall, the notebook establishes the foundational audio concepts required for both legacy and modern speech recognition systems.