## Module 4: Frequency-Domain Analysis

In this module, we’ll analyze audio signals in the frequency domain and extract perceptually meaningful features from them.

### Key Concepts
- **Magnitude vs. Phase Spectra**  
  How the amplitude and phase components of the Fourier transform each contribute to sound reconstruction.  
- **Spectral Centroid, Bandwidth & Roll-off**  
  Measures of “brightness,” spectral spread, and the high-frequency energy cutoff.  
- **Harmonic vs. Inharmonic Content**  
  Identifying tonal (harmonic) structure versus noise-like (inharmonic) elements.

---

### 📓 Notebook Demos

1. **Phase vs. Magnitude Reconstruction**  
   - Keep only the phase spectrum (set every magnitude to one), invert back to the time domain, and listen  
   - Keep only the magnitude spectrum (set every phase to zero), invert back to the time domain, and listen  
   - Compare what each component contributes to timbre  

2. **Spectral Centroid with Low-/High-Pass Filtering**  
   - Take a musical recording and apply a simple low- or high-pass filter with an adjustable cutoff frequency  
   - Compute and plot the spectral centroid of the original and filtered versions  
   - Hear and see how filtering shifts the centroid (“brightness”) of the sound  

3. **Harmonic vs. Inharmonic Content**  
   - Use librosa’s harmonic-percussive source separation (HPSS) to split a recording into its tonal (harmonic) parts and its noise-like (percussive/inharmonic) parts  
   - Plot their spectra side by side and compute a simple harmonic-to-inharmonic energy ratio  
   - Provide two audio players to A/B listen to the harmonic vs. inharmonic streams  

---

### 🛠 Exercise: Spectral Feature Clustering
- **Task:** Extract spectral features (centroid, bandwidth, roll-off, harmonicity) from example recordings of different instruments.  
- **Analysis:** Use clustering (e.g. K-means) to group similar timbres.  
- **Visualization:** Plot feature scatter-plots and show cluster assignments.  


### Key Concepts

- **Magnitude vs. Phase Spectra**  
  How the amplitude (magnitude) and timing (phase) components of the Fourier transform each contribute to reconstructing the original sound:  
  - The **magnitude spectrum** determines “how loud” each frequency component is.  
  - The **phase spectrum** governs “when” each sinusoid starts, shaping the precise waveform and transient details.  
  Both are required for an accurate time-domain reconstruction.
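
A minimal numerical check of that last point, assuming a synthetic two-tone test signal (the signal, sample rate, and variable names are illustrative and not part of the course files). It sketches that magnitude and phase together recover the waveform, while magnitude alone yields a different waveform with the same magnitude spectrum:

```python
import numpy as np

sr = 8000                                  # assumed sample rate (Hz)
t = np.arange(sr) / sr                     # one second of samples
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t + 1.0)

X = np.fft.rfft(x)
mag, phase = np.abs(X), np.angle(X)

# Magnitude and phase together give back the original signal (up to rounding).
x_full = np.fft.irfft(mag * np.exp(1j * phase), n=len(x))

# Magnitude alone (every phase forced to zero) gives a different waveform...
x_zero_phase = np.fft.irfft(mag, n=len(x))

print(np.allclose(x, x_full))              # True
print(np.allclose(x, x_zero_phase))        # False
# ...even though its magnitude spectrum is unchanged.
print(np.allclose(np.abs(np.fft.rfft(x_zero_phase)), mag))  # True
```

The two signals are indistinguishable to any analysis that looks only at `mag`, which is why both components are needed to describe a waveform fully.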


## Demo 1: Phase vs. Magnitude Reconstruction

In this demo, you will explore how the **magnitude** and **phase** components of the Fourier transform each contribute to reconstructing a sound.

**What to do:**  
1. Edit the `FILENAME` in the “USER SETTINGS” section at the top of the code cell to point to your audio file in `sounds/`.  
2. Run the cell to compute three versions of the signal:
   - **Original**  
   - **Magnitude-Only Reconstruction** (uses the original magnitudes, but sets all phases to zero)  
   - **Phase-Only Reconstruction** (uses the original phases, but sets all magnitudes to one)  
3. Use the built-in audio players to listen to each version in turn.

**What to observe:**  
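- **Magnitude-Only:** The overall spectral balance of the clip is preserved, but all timing information is lost. With every phase set to zero, the energy of the whole clip piles up around the start (and, by periodicity, the end) of the signal, so you will typically hear a click-like burst rather than the original notes.  
- **Phase-Only:** Timing and transient cues (attacks, rhythm) survive, so the clip is usually still recognizable. Because every frequency now has the same magnitude, the spectral envelope is lost and the result tends to sound thin, harsh, or noisy.  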
- Compare each reconstruction to the original and think about how magnitude shapes *what* you hear, while phase shapes *when* you hear it.


In [None]:
# ── USER SETTINGS ────────────────────────────────────────────────────────────────
FILENAME = 'pitch3.mp3'   # ← place your audio file in the `sounds/` folder
# ────────────────────────────────────────────────────────────────────────────────

import numpy as np
import librosa
from IPython.display import Audio, display
from pathlib import Path

# ── CONFIG (don’t edit below here) ───────────────────────────────────────────────
SOUNDS_DIR = Path('sounds')

# 1) Load your clip
audio_path = SOUNDS_DIR / FILENAME
y, sr = librosa.load(str(audio_path), sr=None)

# 2) Compute (real-valued) FFT via rfft
Y     = np.fft.rfft(y)
mag   = np.abs(Y)
phase = np.angle(Y)

# 3a) Magnitude-only reconstruction (zero phase)
Y_mag = mag              # implicit phase = 0
y_mag = np.fft.irfft(Y_mag, n=len(y))

# 3b) Phase-only reconstruction (unit magnitude)
Y_ph  = np.exp(1j * phase)
y_ph  = np.fft.irfft(Y_ph, n=len(y))

# 4) Play original and reconstructions
print("▶️ Original Audio")
display(Audio(data=y,      rate=sr, autoplay=False))
print("▶️ Magnitude-Only Reconstruction")
display(Audio(data=y_mag,  rate=sr, autoplay=False))
print("▶️ Phase-Only Reconstruction")
display(Audio(data=y_ph,   rate=sr, autoplay=False))


## Key Concepts: Spectral Centroid, Bandwidth & Roll-Off

In frequency-domain analysis we often want simple numbers that capture the “shape” of a spectrum. Three of the most common are:

- **Spectral Centroid**  
  The “center of mass” of the spectrum, computed as the weighted mean of frequencies by their magnitudes:  
  \[
    \text{centroid} = \frac{\sum_{k} f_k \, |X[k]|}{\sum_{k} |X[k]|}
  \]  
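  Here \(f_k\) is the center frequency of bin \(k\) and \(|X[k]|\) is the magnitude in that bin. A higher centroid corresponds to a “brighter” sound with more high-frequency content.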

- **Spectral Bandwidth**  
  A measure of how spread out the spectrum is around its centroid. Often defined as the root-mean-square deviation of frequencies from the centroid, weighted by magnitude. A wider bandwidth indicates a more “noisy” or spectrally diverse timbre.

- **Spectral Roll-Off**  
  The frequency below which a fixed percentage (e.g. 85% or 95%) of the total spectral energy is contained. It provides a simple cutoff measure for the “upper edge” of the spectrum—useful for distinguishing dark vs. bright sounds.

Together, these features quantify perceptual qualities like brightness, sharpness, and noisiness, and form the basis for many audio analysis and classification tasks.  
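
The three definitions above can be sketched directly in NumPy for a single frame. This is an illustration under stated assumptions rather than librosa's implementation: the helper `spectral_features`, the Hann window, and the two synthetic test tones are all hypothetical choices, and the roll-off here accumulates magnitude (square `mag` first if you want an energy-based cutoff).

```python
import numpy as np

def spectral_features(x, sr, roll_percent=0.85):
    """Centroid, bandwidth and roll-off (all in Hz) of one frame (hypothetical helper)."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1 / sr)
    weights = mag / mag.sum()                        # normalise so weights sum to 1
    centroid = np.sum(freqs * weights)               # magnitude-weighted mean frequency
    bandwidth = np.sqrt(np.sum(weights * (freqs - centroid) ** 2))  # weighted RMS spread
    # Roll-off: first bin where the cumulative magnitude reaches the target share.
    cumulative = np.cumsum(mag)
    rolloff = freqs[np.searchsorted(cumulative, roll_percent * cumulative[-1])]
    return centroid, bandwidth, rolloff

sr = 8000                                            # assumed sample rate (Hz)
t = np.arange(2048) / sr
dark = np.sin(2 * np.pi * 200 * t)                   # one low tone
bright = dark + np.sin(2 * np.pi * 3000 * t)         # same tone plus a high partial

c_dark, bw_dark, ro_dark = spectral_features(dark, sr)
c_bright, bw_bright, ro_bright = spectral_features(bright, sr)
print(f"dark:   centroid={c_dark:.0f} Hz  bandwidth={bw_dark:.0f} Hz  roll-off={ro_dark:.0f} Hz")
print(f"bright: centroid={c_bright:.0f} Hz  bandwidth={bw_bright:.0f} Hz  roll-off={ro_bright:.0f} Hz")
```

Adding the high partial should raise all three values, which matches the perceptual reading of a brighter and more spread-out spectrum.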


## Demo 2: Spectral Centroid with Low-/High-Pass Filtering

In this demo you’ll see how simple low-pass and high-pass filters shift the “brightness” of a sound, as measured by its spectral centroid.

---

### How to use:

1. **Edit the inputs at the top of the code cell**  
   - `FILENAME`: name of your audio file in the `sounds/` folder  
   - `CUTOFF_FREQ`: cutoff frequency in Hz (must satisfy `0 < CUTOFF_FREQ < sr/2`)  
   - `FRAME_LENGTH`: window size (in samples) for computing the spectral centroid  
   - `HOP_LENGTH`: hop size between frames (typically `FRAME_LENGTH/4` or similar)  

2. **Run the cell**  
   - The script will load your clip, design 4th-order Butterworth low- and high-pass filters, and apply them with zero-phase filtering.  
   - It computes three audio streams—original, low-pass, and high-pass—and three corresponding spectral-centroid curves.

---

### Listen & compare

- Use the built-in players to A/B listen to the **Original** vs. **Low-Pass** vs. **High-Pass** versions.  
- Note how the low-pass version sounds **“duller”** (high frequencies removed) and the high-pass version sounds **“thinner”** or **“brighter”** (low frequencies removed).

---

### Observe the plots

- **Low-Pass plot** shows the original centroid (blue) vs. the low-pass centroid (orange). You should see the centroid **drop** whenever high frequencies are attenuated.  
- **High-Pass plot** shows the original centroid (blue) vs. the high-pass centroid (green). Here the centroid **rises**, indicating dominance of higher-frequency content.

---

### What to look for

- **Filter effect on centroid:** Low-pass → lower centroid; High-pass → higher centroid.  
- **Time-varying behavior:** How the “brightness” of the signal evolves over time under each filter.  
- **Audio confirmation:** Does what you hear (dull vs. bright) match the centroid curves?  
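
If you want to check the expected direction of the shift without an audio file, the following sketch applies the same kind of Butterworth filters to white noise. The noise signal, the sample rate, and the whole-signal helper `centroid_hz` are assumptions made for illustration; the demo cell itself works frame by frame on your recording.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def centroid_hz(x, sr):
    """Magnitude-weighted mean frequency of the whole signal (hypothetical helper)."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1 / sr)
    return np.sum(freqs * mag) / np.sum(mag)

sr = 16000                                   # assumed sample rate (Hz)
rng = np.random.default_rng(0)
x = rng.standard_normal(sr)                  # 1 s of white noise: content at all frequencies

cutoff = 2000                                # Hz, plays the role of CUTOFF_FREQ
b_lo, a_lo = butter(4, cutoff / (sr / 2), btype="lowpass")
b_hi, a_hi = butter(4, cutoff / (sr / 2), btype="highpass")

c_orig = centroid_hz(x, sr)
c_low = centroid_hz(filtfilt(b_lo, a_lo, x), sr)
c_high = centroid_hz(filtfilt(b_hi, a_hi, x), sr)

print(f"original:  {c_orig:.0f} Hz")
print(f"low-pass:  {c_low:.0f} Hz")
print(f"high-pass: {c_high:.0f} Hz")
```

The low-pass centroid should land below the original and the high-pass centroid above it.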


In [None]:
# ── USER SETTINGS ────────────────────────────────────────────────────────────────
FILENAME     = 'pitch2.mp3'    # ← place your audio file in the `sounds/` folder
CUTOFF_FREQ  = 2000            # ← cutoff frequency in Hz (0 < CUTOFF_FREQ < sr/2)
FRAME_LENGTH = 2048            # ← frame length for spectral centroid
HOP_LENGTH   = FRAME_LENGTH // 4  # ← hop length for spectral centroid
# ────────────────────────────────────────────────────────────────────────────────

import numpy as np
import matplotlib.pyplot as plt
import librosa
from scipy.signal import butter, filtfilt
from IPython.display import Audio, display
from pathlib import Path

# ── CONFIG (don’t edit below here) ───────────────────────────────────────────────
SOUNDS_DIR = Path('sounds')

# 1) Load audio
y, sr = librosa.load(str(SOUNDS_DIR / FILENAME), sr=None)

# 2) Design Butterworth low-pass and high-pass filters
nyq = sr / 2
b_low,  a_low  = butter(N=4, Wn=CUTOFF_FREQ/nyq, btype='lowpass')
b_high, a_high = butter(N=4, Wn=CUTOFF_FREQ/nyq, btype='highpass')

# 3) Apply zero-phase filtering
y_low  = filtfilt(b_low,  a_low,  y)
y_high = filtfilt(b_high, a_high, y)

# 4) Compute spectral centroids
cent_orig = librosa.feature.spectral_centroid(
    y=y, sr=sr,
    n_fft=FRAME_LENGTH,
    hop_length=HOP_LENGTH
)[0]
cent_low = librosa.feature.spectral_centroid(
    y=y_low, sr=sr,
    n_fft=FRAME_LENGTH,
    hop_length=HOP_LENGTH
)[0]
cent_high = librosa.feature.spectral_centroid(
    y=y_high, sr=sr,
    n_fft=FRAME_LENGTH,
    hop_length=HOP_LENGTH
)[0]
times = librosa.frames_to_time(np.arange(len(cent_orig)),
                               sr=sr, hop_length=HOP_LENGTH)

# 5) Play original, low-pass & high-pass audio
print("▶️ Original Audio")
display(Audio(data=y,      rate=sr, autoplay=False))
print(f"▶️ Low-Pass @ {CUTOFF_FREQ} Hz")
display(Audio(data=y_low,  rate=sr, autoplay=False))
print(f"▶️ High-Pass @ {CUTOFF_FREQ} Hz")
display(Audio(data=y_high, rate=sr, autoplay=False))

# 6a) Plot spectral centroid: Low-Pass vs. Original
plt.figure(figsize=(10, 3))
plt.plot(times, cent_orig,  label='Original', linewidth=1.5)
plt.plot(times, cent_low,   label=f'Low-Pass @ {CUTOFF_FREQ} Hz', linewidth=1.5)
plt.title('Spectral Centroid — Low-Pass Filter')
plt.xlabel('Time (s)')
plt.ylabel('Centroid (Hz)')
plt.legend()
plt.grid(True)
plt.show()

# 6b) Plot spectral centroid: High-Pass vs. Original
plt.figure(figsize=(10, 3))
plt.plot(times, cent_orig,   label='Original', linewidth=1.5)
plt.plot(times, cent_high,   label=f'High-Pass @ {CUTOFF_FREQ} Hz', linewidth=1.5, color='C2')  # green, as described above
plt.title('Spectral Centroid — High-Pass Filter')
plt.xlabel('Time (s)')
plt.ylabel('Centroid (Hz)')
plt.legend()
plt.grid(True)
plt.show()


## Key Concepts: Harmonic vs. Inharmonic Content

In this section you’ll learn how to separate and compare the **tonal (harmonic)** parts of a signal—those with clear, integer‐multiple frequency relationships—from the **noise‐like (inharmonic)** parts that lack periodic structure.

- **Harmonic content**  
  - Consists of sinusoidal components whose frequencies are integer multiples of a fundamental.  
  - Carries pitch and tonal information (e.g. sustained notes, vocals, strings).  

- **Inharmonic content**  
  - Composed of broad, non-periodic energy (e.g. drum hits, cymbals, breath noise).  
  - Lacks clear harmonic relationships and sounds more “noisy” or percussive.
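
To make the distinction concrete, here is a small sketch that compares a synthetic harmonic tone with white noise. The fundamental of 220 Hz, the three partials, and the helper `top3_share` are illustrative assumptions, not part of the course material.

```python
import numpy as np

sr = 8000                                    # assumed sample rate (Hz)
t = np.arange(sr) / sr                       # 1 s, so FFT bins are 1 Hz apart
f0 = 220                                     # assumed fundamental (Hz)

# Harmonic tone: partials at 1x, 2x and 3x the fundamental, with decreasing amplitude.
harmonic = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in (1, 2, 3))

# Noise-like signal: white noise has no periodic structure.
noise = np.random.default_rng(0).standard_normal(sr)

freqs = np.fft.rfftfreq(sr, d=1 / sr)
mag_h = np.abs(np.fft.rfft(harmonic))
mag_n = np.abs(np.fft.rfft(noise))

# The three largest bins of the harmonic tone sit at integer multiples of f0.
top3 = np.sort(freqs[np.argsort(mag_h)[-3:]])
print(np.round(top3 / f0, 3))

def top3_share(mag):
    """Fraction of total magnitude held by the three largest bins (hypothetical helper)."""
    return np.sort(mag)[-3:].sum() / mag.sum()

print(f"harmonic: {top3_share(mag_h):.3f}   noise: {top3_share(mag_n):.3f}")
```

The harmonic tone concentrates nearly all of its magnitude in a few bins, while the noise spreads it across the whole spectrum.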

**Why it matters:**  
- Many audio-processing tasks (timbre analysis, source separation, synthesis) rely on distinguishing tonal vs. noise elements.  
- Being able to isolate each component can improve pitch tracking, denoising, and creative effects.

> In the next demo we’ll use librosa’s **Harmonic–Percussive Source Separation (HPSS)** to split a recording into its harmonic and inharmonic streams, listen to each, and compare their spectra and energy ratios.  


## Demo 3: Harmonic vs. Inharmonic Separation

In this demo you’ll use **Harmonic–Percussive Source Separation (HPSS)** to split a recording into its tonal (harmonic) and noise-like (inharmonic/percussive) components, listen to each, and compare their spectra.

**How to use:**

1. **Edit the USER SETTINGS** at the top of the code cell:
   - `FILENAME` – name of your audio file in the `sounds/` folder  
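   - `MARGIN_HARM` – harmonic margin (must be ≥ 1, e.g. 1.0–10.0; larger values give a stricter separation)  
   - `MARGIN_PERC` – percussive margin (must be ≥ 1, e.g. 1.0–10.0; larger values give a stricter separation)  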
   - `STFT_N_FFT` – window length for the STFT (power of 2, e.g. 512, 1024, 2048)  
   - `STFT_HOP_LEN` – hop length between STFT frames (typically `STFT_N_FFT//4`)

2. **Run the cell**. The script will:
   - Load your clip
   - Perform HPSS to obtain `y_harm` and `y_perc`
   - Display two audio players to listen to each component
   - Compute and plot the average magnitude spectrum of each
   - Print out the harmonic-to-inharmonic energy ratio

**What to observe:**

- **Audio**  
  - The *harmonic* stream retains sustained, pitched elements (notes, vocals)  
  - The *percussive* stream captures transients and noise-like sounds (drums, cymbals)

- **Spectra**  
  - The harmonic spectrum will show sharp peaks at integer multiples of the fundamental  
  - The inharmonic/percussive spectrum will be broader and noisier  

- **Energy Ratio**  
  - A value >1 indicates more energy in harmonic content; <1 means percussive dominates

Feel free to tweak the `MARGIN_HARM` and `MARGIN_PERC` to adjust the separation aggressiveness, or change the STFT parameters to refine the spectral view.


In [None]:
# ── USER SETTINGS ────────────────────────────────────────────────────────────────
FILENAME      = 'pitch3.mp3'   # ← place your file in the `sounds/` folder
MARGIN_HARM   = 1.0                     # ← harmonic margin (must be ≥ 1, e.g. 1.0–10.0)
MARGIN_PERC   = 1.0                     # ← percussive margin (must be ≥ 1, e.g. 1.0–10.0)
STFT_N_FFT    = 2048                    # ← window size for STFT (power of 2)
STFT_HOP_LEN  = STFT_N_FFT // 4         # ← hop length between frames
# ────────────────────────────────────────────────────────────────────────────────

import numpy as np
import librosa
import matplotlib.pyplot as plt
from pathlib import Path
from IPython.display import Audio, display

# ── CONFIG (don’t edit below here) ───────────────────────────────────────────────
SOUNDS_DIR = Path('sounds')
audio_path = SOUNDS_DIR / FILENAME

# 1) Load audio
y, sr = librosa.load(str(audio_path), sr=None)

# 2) Harmonic–Percussive Source Separation
y_harm, y_perc = librosa.effects.hpss(y,
    margin=(MARGIN_HARM, MARGIN_PERC)
)

# 3) Listen to each component
print("▶️ Harmonic Component")
display(Audio(data=y_harm, rate=sr, autoplay=False))
print("▶️ Inharmonic (Percussive) Component")
display(Audio(data=y_perc, rate=sr, autoplay=False))

# 4) Compute average magnitude spectra
D_h = np.abs(librosa.stft(y_harm, n_fft=STFT_N_FFT, hop_length=STFT_HOP_LEN))
D_p = np.abs(librosa.stft(y_perc, n_fft=STFT_N_FFT, hop_length=STFT_HOP_LEN))
mag_h = D_h.mean(axis=1)
mag_p = D_p.mean(axis=1)
freqs = np.linspace(0, sr/2, len(mag_h))

# 5) Plot spectra side-by-side
fig, ax = plt.subplots(1, 2, figsize=(14, 4), sharey=True)

ax[0].plot(freqs, 20*np.log10(mag_h + 1e-8))
ax[0].set_title('Harmonic Spectrum')
ax[0].set_xlabel('Frequency (Hz)')
ax[0].set_ylabel('Magnitude (dB)')
ax[0].grid(True)

ax[1].plot(freqs, 20*np.log10(mag_p + 1e-8), color='C1')
ax[1].set_title('Inharmonic (Percussive) Spectrum')
ax[1].set_xlabel('Frequency (Hz)')
ax[1].grid(True)

fig.suptitle('Harmonic vs. Inharmonic Spectra', fontsize=16)
plt.tight_layout(rect=[0,0,1,0.95])
plt.show()

# 6) Compute and display energy ratio
energy_h = np.sum(y_harm**2)
energy_p = np.sum(y_perc**2)
# Small constant guards against division by zero if the percussive stream is silent
print(f"🔢 Harmonic-to-Inharmonic Energy Ratio: {energy_h / (energy_p + 1e-12):.2f}")


### 🛠 Exercise: Spectral Feature Clustering

**Objective:**  
Apply unsupervised learning to group audio recordings by timbre based on their spectral characteristics.

**Steps:**

1. **Gather Data**  
   - Place several short audio clips (e.g., 2–5 s) of different instruments—piano, violin, flute, trumpet, drums, etc.—in your `sounds/` folder.

2. **Extract Features**  
   For each clip, compute a feature vector of spectral descriptors:
   - **Spectral Centroid**  
   - **Spectral Bandwidth**  
   - **Spectral Roll-off** (e.g., 85 % energy cutoff)  
   - **Harmonic-to-Noise Ratio** (separate the clip with `librosa.effects.hpss` and compute the harmonic-to-percussive energy ratio)  

3. **Build a Feature Matrix**  
   - Assemble an \(M\times4\) matrix, where \(M\) is the number of clips and each column is one of the four features.

4. **Cluster with K-Means**  
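   - Standardize each feature column first (e.g. with scikit-learn’s `StandardScaler`): centroid, bandwidth, and roll-off are in Hz while the energy ratio is dimensionless, so unscaled distances would be dominated by the largest-valued features.
   - Use scikit-learn’s `KMeans` to partition the feature matrix into \(K\) clusters (choose \(K\) equal to the number of instrument types you have).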
   - Examine the cluster labels and see which instruments are grouped together.

5. **Visualize**  
   - Create 2D scatter plots of feature pairs (e.g., centroid vs. bandwidth, roll-off vs. H/N ratio), coloring points by their cluster assignment.
   - Optionally, reduce to 2D with PCA and plot the clusters in PCA space.

6. **Interpret**  
   - Do the clusters make sense perceptually?  
   - Which features were most discriminative?  
   - How would you refine your feature set or clustering parameters?
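
As a possible starting point, the sketch below runs steps 2–4 on synthetic stand-in clips rather than real recordings. Everything in it is an assumption for illustration: the helper `centroid_bandwidth`, the three pure tones, and the three noise bursts. It uses only two of the four features; in the actual exercise you would load your own clips and could compute the features with `librosa.feature.spectral_centroid`, `spectral_bandwidth`, and `spectral_rolloff` instead.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def centroid_bandwidth(x, sr):
    """Whole-clip spectral centroid and bandwidth in Hz (hypothetical helper)."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1 / sr)
    w = mag / mag.sum()
    centroid = np.sum(freqs * w)
    bandwidth = np.sqrt(np.sum(w * (freqs - centroid) ** 2))
    return centroid, bandwidth

sr = 16000                                   # assumed sample rate (Hz)
t = np.arange(sr) / sr
rng = np.random.default_rng(0)

# Stand-in "clips": three dark pure tones followed by three bright noise bursts.
clips = [np.sin(2 * np.pi * f * t) for f in (110, 220, 330)]
clips += [rng.standard_normal(sr) for _ in range(3)]

# Feature matrix: one row per clip, one column per feature.
X = np.array([centroid_bandwidth(c, sr) for c in clips])    # shape (6, 2)

# Standardize so that no single feature dominates the distances.
X_scaled = StandardScaler().fit_transform(X)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
print(labels)
```

With such well-separated stand-ins the tones and the noise bursts should fall into different clusters; real instrument recordings will overlap far more, which is what the interpretation questions are meant to probe.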

> **Bonus:** Try another clustering algorithm (e.g., DBSCAN or Agglomerative Clustering) and compare the results!
