## Module 6: Time–Frequency Representations

In this module, we’ll explore how to analyze audio signals simultaneously in time and frequency using a variety of transforms and visualizations.

### Key Concepts
- **Short–Time Fourier Transform (STFT) & Spectrograms**  
  How sliding-window FFTs reveal the time-varying spectral content of a signal.  
- **Constant-Q Transform (CQT) vs. Mel-Spectrogram**  
  Log-frequency analysis tailored to musical pitches (CQT) or perceptual frequency bands (mel).  
- **Wavelet Transforms (Continuous & Discrete)**  
  Multi-scale decomposition using localized time–frequency “atoms.”

---

### 📓 Notebook Demos

1. **STFT Window Visualization**  
   - Step through an audio clip with a slider that highlights the current analysis window on the waveform  
   - Display the magnitude spectrum of the selected frame  

2. **Linear vs. Mel-Scale Spectrograms**  
   - Compute both representations for the same audio  
   - Visualize both while listening to the clip to appreciate perceptual frequency scaling

3. **Constant-Q Transform (CQT) Analysis**  
   - Compute the CQT spectrogram of your audio (log-spaced bins)  
   - Visualize how bin widths grow with frequency  
   - Listen to the original clip and compare to the CQT-reconstructed audio     

4. **Interactive Continuous Wavelet Transform**  
   - Choose a mother wavelet (Morlet, Mexican hat, etc.)  
   - See its time–frequency tiling and corresponding CWT coefficients  


---

### 🛠 Exercise: CQT-Based Octave Filter Bank
- **Task:** Design and implement a filter bank using the Constant-Q Transform to isolate individual musical octaves.  
- **Steps:**  
  1. Compute the CQT of an audio signal.  
  2. Group CQT bins into octave bands.  
  3. Reconstruct and play back each octave band separately.  
- **Deliverables:**  
  - Plots of the CQT spectrogram with octave overlays.  
  - Reconstructed audio players for each isolated octave.  


### Key Concept: Short–Time Fourier Transform (STFT) & Spectrograms

The **Short–Time Fourier Transform (STFT)** slices a signal into short, overlapping frames and computes an FFT on each one, turning a 1D time-series into a 2D time–frequency map. By windowing each frame (e.g. Hann, Hamming) and sliding it forward by a fixed hop size, you capture how the spectral content evolves over time.

A **spectrogram** is simply the magnitude (or log-magnitude) of the STFT plotted with:

- **X-axis:** Time  
- **Y-axis:** Frequency  
- **Color/Intensity:** Signal energy at each time–frequency point  
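
To make the framing concrete, here is a minimal NumPy-only sketch of the idea: cut the signal into overlapping frames, apply a Hann window, and take an FFT per frame. It is an illustration under simplifying assumptions (no centering or padding, a symmetric `np.hanning` window), not how `librosa.stft` is implemented. The name `stft_sketch` and the synthetic 1 kHz test tone are made up for this example.

```python
import numpy as np

def stft_sketch(y, frame_size=1024, hop_length=256):
    """Frame the signal, apply a Hann window, and take the FFT of each frame.

    Returns a complex array of shape (frame_size // 2 + 1, n_frames).
    No centering or padding is applied, so frame i starts at sample i * hop_length.
    """
    window = np.hanning(frame_size)
    n_frames = 1 + (len(y) - frame_size) // hop_length
    frames = np.stack([
        y[i * hop_length : i * hop_length + frame_size] * window
        for i in range(n_frames)
    ])
    return np.fft.rfft(frames, axis=1).T

# Synthetic test signal: a 1 kHz sine sampled at 16 kHz for one second
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 1000 * t)

D = stft_sketch(y)
spectrogram_db = 20 * np.log10(np.abs(D) + 1e-10)   # log-magnitude spectrogram

# Frequency of the strongest bin in the first frame
freqs = np.fft.rfftfreq(1024, d=1 / sr)
peak_hz = freqs[np.abs(D[:, 0]).argmax()]
print(D.shape, peak_hz)
```

Each column of `D` is one frame's spectrum, so plotting `spectrogram_db` with time on the x-axis and `freqs` on the y-axis gives the spectrogram described above.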

With a spectrogram you can clearly see events such as:

- Harmonic stacks moving with a melody  
- Transient “bursts” from percussive hits  
- Formant shifts in speech  

In the next demo, you’ll listen to an audio clip and then step through it with a slider: the current frame is highlighted on the waveform and its magnitude spectrum is plotted below, so you can directly connect “what you hear” to “what you see.”  


## Demo 1: STFT Window Visualization

In this demo, you’ll see exactly how a sliding-window FFT “peels off” the time-varying spectrum of an audio signal—frame by frame.

### What the code does
1. **Loads** your audio file and computes its Short-Time Fourier Transform (STFT) using a Hann window of length `FRAME_SIZE` and hop size `HOP_LENGTH`.  
2. **Displays** an audio player so you can listen to the clip.  
3. **Provides** a slider (`Frame:`) that selects which STFT frame to inspect.  
4. For the selected frame:  
   - **Highlights** the corresponding time-window on the waveform plot.  
   - **Plots** the magnitude spectrum (frequency vs. amplitude) of that window.

### Inputs (edit in the code cell)
- `FILENAME`: Name of your audio file in the `sounds/` folder (WAV or MP3).  
- `FRAME_SIZE`: FFT window length in samples (power of 2, e.g. 512, 1024, 2048).  
- `HOP_LENGTH`: Hop size in samples between successive windows (commonly `FRAME_SIZE//4`).  
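
Before picking values, it can help to see what they imply for the time-frequency grid. The helper below is a small sketch (the name `stft_grid` is hypothetical) that turns the settings into bin spacing, window duration, and overlap. The frame count assumes the centered framing that `librosa.stft` uses by default (`center=True`).

```python
def stft_grid(sr, n_samples, frame_size=2048, hop_length=512):
    """Summarize the time-frequency grid implied by the STFT settings."""
    return {
        "bin_spacing_hz": sr / frame_size,        # distance between FFT bins
        "window_duration_s": frame_size / sr,     # length of one analysis window
        "hop_duration_s": hop_length / sr,        # time step between frames
        "overlap_fraction": 1 - hop_length / frame_size,
        "n_freq_bins": frame_size // 2 + 1,
        # Frame count assuming centered framing (center=True)
        "n_frames": 1 + n_samples // hop_length,
    }

# Example: three seconds of audio at 44.1 kHz
grid = stft_grid(sr=44100, n_samples=44100 * 3)
for name, value in grid.items():
    print(f"{name:>18}: {value}")
```

Doubling `FRAME_SIZE` halves the bin spacing but doubles the window duration, which is the basic time-frequency trade-off of the STFT.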

### How to run
1. Update the **USER SETTINGS** at the top of the code cell with your file name and desired parameters.  
2. Run the cell to compute the STFT, render the audio player, and show the frame slider.  
3. Drag the **Frame** slider to move through time-frames.  

### What to observe
- **Waveform view**: The shaded region shows which slice of the signal the FFT is analyzing.  
- **Spectrum view**: Peaks reveal the dominant frequencies in that time-window.  
- **Transients** (e.g., drum hits) appear as broad, flat spectra; **tonal** sections (e.g., vocals or sustained notes) show clear spectral lines.  
- By stepping frame by frame, you can link time-domain events to their frequency-domain signatures.
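
The contrast between transient and tonal frames can be checked on synthetic signals. The sketch below compares a single-sample click with a pure sine using spectral flatness (geometric mean over arithmetic mean of the power spectrum). The signals and the helper `spectral_flatness` are illustrative; no window is applied, and the sine is placed exactly on an FFT bin to keep the example clean.

```python
import numpy as np

n = 1024

# Transient: a single-sample click in the middle of the frame
click = np.zeros(n)
click[n // 2] = 1.0

# Tonal: a sine that completes exactly 64 cycles in the frame (FFT bin 64)
tone = np.sin(2 * np.pi * 64 * np.arange(n) / n)

def spectral_flatness(x):
    """Geometric mean / arithmetic mean of the power spectrum.

    Values near 1 mean a flat spectrum; values near 0 mean a peaky one.
    """
    power = np.abs(np.fft.rfft(x)) ** 2 + 1e-12
    return np.exp(np.mean(np.log(power))) / np.mean(power)

print(f"click flatness: {spectral_flatness(click):.3f}")
print(f"tone  flatness: {spectral_flatness(tone):.3g}")
```

Real drum hits and sung notes fall between these two extremes, but the same tendency shows up in the spectrum view as you move the slider.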


In [None]:
# ── USER SETTINGS ────────────────────────────────────────────────────────────────
FILENAME     = 'cymbal.mp3'   # ← place your audio file in the `sounds/` folder
FRAME_SIZE   = 2048               # ← STFT window size (power of 2)
HOP_LENGTH   = FRAME_SIZE // 4    # ← hop size between frames
# ────────────────────────────────────────────────────────────────────────────────

import numpy as np
import librosa
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import Audio, display, clear_output
from pathlib import Path

# ── CONFIG (don’t edit below here) ───────────────────────────────────────────────
SOUNDS_DIR = Path('sounds')
audio_path = SOUNDS_DIR / FILENAME

# 1) Load audio and compute STFT
y, sr = librosa.load(str(audio_path), sr=None)
D     = librosa.stft(y, n_fft=FRAME_SIZE, hop_length=HOP_LENGTH, window='hann')
mag   = np.abs(D)

# 2) Precompute time & frequency axes
times = librosa.frames_to_time(np.arange(mag.shape[1]), sr=sr, hop_length=HOP_LENGTH)
freqs = librosa.fft_frequencies(sr=sr, n_fft=FRAME_SIZE)

# 3) Display audio player
print("▶️ Play Audio")
display(Audio(data=y, rate=sr, autoplay=False))

# 4) Create frame slider
frame_slider = widgets.IntSlider(
    value=0,
    min=0,
    max=mag.shape[1] - 1,
    step=1,
    description='Frame:',
    continuous_update=False
)

# 5) Output area for plots
output = widgets.Output()

# 6) Update function
def update(frame):
    with output:
        clear_output(wait=True)
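        # librosa.stft centers each frame by default (center=True), so
        # times[frame] is the middle of the analysis window, not its start.
        t_center = times[frame]
        t0 = max(t_center - FRAME_SIZE / (2 * sr), 0.0)
        t1 = min(t_center + FRAME_SIZE / (2 * sr), len(y) / sr)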
        
        # a) Waveform + window highlight
        plt.figure(figsize=(10, 3))
        t_axis = np.arange(len(y)) / sr
        plt.plot(t_axis, y, alpha=0.7)
        plt.axvspan(t0, t1, color='C1', alpha=0.3)
        plt.title(f'Waveform (window: {t0:.3f}–{t1:.3f} s)')
        plt.xlabel('Time (s)')
        plt.ylabel('Amplitude')
        plt.tight_layout()
        plt.show()
        
        # b) Spectrum slice at this frame
        plt.figure(figsize=(8, 3))
        plt.plot(freqs, mag[:, frame], alpha=0.7)
        plt.title(f'Spectrum at {times[frame]:.3f} s (frame {frame})')
        plt.xlabel('Frequency (Hz)')
        plt.ylabel('Magnitude')
        plt.tight_layout()
        plt.show()

# 7) Wire slider to update
frame_slider.observe(lambda change: update(change['new']), names='value')

# 8) Initial draw & display UI
update(0)
display(widgets.VBox([frame_slider, output]))


### Key Concept: Constant-Q Transform vs. Mel-Spectrogram

Both CQT and mel-spectrograms provide “time–frequency” views of audio, but they differ in how they space their frequency bins:

- **Constant-Q Transform (CQT)**  
  - **Logarithmic frequency axis:** Bin centers follow musical pitch ratios (e.g. semitones), so each octave is divided into the same number of bins.  
  - **Constant Q-factor:** The ratio of center frequency to bandwidth is constant, giving higher frequency resolution at low frequencies and finer temporal resolution at high frequencies.  
  - **Ideal for music analysis:** Notes, chords, and harmonics align naturally on CQT bins.

- **Mel-Spectrogram**  
  - **Perceptual scale:** Frequencies are warped according to the mel scale, which approximates human pitch perception (more resolution at low frequencies, compressed at high frequencies).  
  - **Linear windowing:** Uses a uniform FFT followed by triangular mel-filterbanks.  
  - **Common in speech / audio ML:** Mel features approximate the frequency resolution of human hearing and are a standard input to many audio-processing models.

**When to choose which?**  
- Use **CQT** when your application is explicitly musical (pitch estimation, chord recognition, transcription).  
- Use **mel-spectrograms** when modeling or visualizing general audio in a perceptually meaningful way (speech recognition, audio embeddings, feature extraction).
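
The constant-Q property can be written down in a few lines. The sketch below uses the textbook relations (center frequency `fmin * 2**(k / B)` and `Q = 1 / (2**(1/B) - 1)` for `B` bins per octave); the function name is hypothetical, and `librosa.cqt` applies its own filter design on top of this spacing, so its actual bandwidths may differ.

```python
import numpy as np

def cqt_bin_geometry(fmin=32.70, n_bins=84, bins_per_octave=12):
    """Center frequencies, Q factor, and nominal bandwidths of a constant-Q bank."""
    k = np.arange(n_bins)
    centers = fmin * 2.0 ** (k / bins_per_octave)       # geometric spacing
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)    # identical for every bin
    bandwidths = centers / q                            # grows with frequency
    return centers, q, bandwidths

centers, q, bandwidths = cqt_bin_geometry()
print(f"Q = {q:.2f}")
print(f"lowest bin : {centers[0]:8.2f} Hz, bandwidth {bandwidths[0]:6.2f} Hz")
print(f"highest bin: {centers[-1]:8.2f} Hz, bandwidth {bandwidths[-1]:6.2f} Hz")
```

Because the bandwidth is proportional to the center frequency, low bins need long analysis windows (fine frequency resolution) while high bins can use short ones (fine time resolution).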


## Demo 2: Linear-Frequency vs. Mel-Scale Spectrograms

In this demo you’ll compute and compare two common time–frequency representations for the same audio:

1. **Linear-Frequency Spectrogram**  
   - Uses a standard STFT to show energy at evenly spaced frequency bins.  
2. **Mel-Scale Spectrogram**  
   - Aggregates energy into perceptually-scaled bands (more resolution at low frequencies, less at high).
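
To see what "perceptually-scaled bands" means numerically, the sketch below places band edges at equal steps on the mel axis and converts them back to Hz. It assumes the HTK-style formula `mel = 2595 * log10(1 + f / 700)`; librosa's default mel scale is a different (Slaney-style) variant, and real mel filterbanks use overlapping triangular filters rather than the hard edges shown here. All function names are illustrative.

```python
import numpy as np

def hz_to_mel(f_hz):
    """HTK-style Hz -> mel conversion (an assumption; other variants exist)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)

def mel_band_edges(n_mels=8, fmin=0.0, fmax=8000.0):
    """Edges (in Hz) of n_mels bands that are equally wide on the mel axis."""
    mel_points = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 1)
    return mel_to_hz(mel_points)

edges = mel_band_edges()
widths = np.diff(edges)
for lo, hi in zip(edges[:-1], edges[1:]):
    print(f"{lo:7.1f} to {hi:7.1f} Hz  (width {hi - lo:6.1f} Hz)")
```

The printed widths grow from band to band: equal steps in mel correspond to narrow bands at low frequencies and wide bands at high frequencies.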

---

### What the code does

1. **Loads** your chosen audio file (`FILENAME`) at its native sampling rate.  
2. **Computes**:  
   - A linear-frequency spectrogram via `librosa.stft`.  
   - A mel-spectrogram via `librosa.feature.melspectrogram`.  
3. **Converts** both to dB (`amplitude_to_db` / `power_to_db`) for display.  
4. **Plays back** the original audio so you can listen.  
5. **Plots** the two spectrograms side by side with one colorbar, taken from the mel plot (each spectrogram is in dB relative to its own peak).
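
The reason two different dB functions appear in step 3 is that the STFT magnitude is an amplitude while the mel-spectrogram is a power. The sketch below mimics the idea of a peak reference and an 80 dB floor with plain NumPy; it is not librosa's code, and the `_sketch` function names are made up for this example.

```python
import numpy as np

def amplitude_to_db_sketch(mag, top_db=80.0):
    """20*log10 of magnitude relative to the peak, floored at -top_db."""
    db = 20.0 * np.log10(np.maximum(mag, 1e-10) / mag.max())
    return np.maximum(db, -top_db)

def power_to_db_sketch(power, top_db=80.0):
    """10*log10 of power relative to the peak, floored at -top_db."""
    db = 10.0 * np.log10(np.maximum(power, 1e-20) / power.max())
    return np.maximum(db, -top_db)

mag = np.array([1.0, 0.5, 0.1, 1e-6])
print(amplitude_to_db_sketch(mag))
print(power_to_db_sketch(mag ** 2))   # 10*log10(x**2) equals 20*log10(x)
```

Using the matching conversion for each representation keeps both plots on the same dB footing.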

---

### Inputs you can edit at the top of the code cell

- `FILENAME`  
  - Name of your WAV/MP3 file in the `sounds/` folder.  
- `N_FFT`  
  - FFT window size (power of 2, e.g. 512, 1024, 2048).  
- `HOP_LENGTH`  
  - Hop size between frames in samples (commonly `N_FFT // 4`).  
- `N_MELS`  
  - Number of mel bands (e.g. 64, 128, 256).

---

### What to observe

- **Frequency axis**  
  - Linear spectrogram is uniform in Hz, mel spectrogram is warped to match human perception.  
- **Spectral detail**  
  - The mel scale devotes more bands to low frequencies and merges high-frequency FFT bins into coarser bands.  
- **Perceptual relevance**  
  - While listening to the clip, compare which plot makes the timbral changes you hear in speech or music easier to see.


In [None]:
# ── USER SETTINGS ────────────────────────────────────────────────────────────────
FILENAME     = 'cymbal.mp3'   # ← place your file in `sounds/` (WAV/MP3)
N_FFT        = 2048           # ← FFT window size (power of 2)
HOP_LENGTH   = N_FFT // 4     # ← hop length between frames
N_MELS       = 128            # ← number of mel bands
# ────────────────────────────────────────────────────────────────────────────────

import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display
from IPython.display import Audio, display
from pathlib import Path

# ── CONFIG (don’t edit below here) ───────────────────────────────────────────────
SOUNDS_DIR = Path('sounds')
audio_path = SOUNDS_DIR / FILENAME

# 1) Load audio
y, sr = librosa.load(str(audio_path), sr=None)

# 2) Compute linear-frequency spectrogram (magnitude)
S_lin    = np.abs(librosa.stft(y, n_fft=N_FFT, hop_length=HOP_LENGTH))
S_lin_db = librosa.amplitude_to_db(S_lin, ref=np.max)

# 3) Compute mel-spectrogram
S_mel    = librosa.feature.melspectrogram(
    y=y,
    sr=sr,
    n_fft=N_FFT,
    hop_length=HOP_LENGTH,
    n_mels=N_MELS
)
S_mel_db = librosa.power_to_db(S_mel, ref=np.max)

# 4) Play original audio
print("▶️ Original Audio")
display(Audio(data=y, rate=sr, autoplay=False))

# 5) Plot both spectrograms side by side with constrained_layout
fig, axes = plt.subplots(1, 2, figsize=(14, 5), constrained_layout=True)

# Linear-frequency spectrogram
im1 = librosa.display.specshow(
    S_lin_db,
    sr=sr,
    hop_length=HOP_LENGTH,
    x_axis='time',
    y_axis='linear',
    ax=axes[0]
)
axes[0].set_title('Linear-Frequency Spectrogram')
axes[0].set_xlabel('Time (s)')
axes[0].set_ylabel('Frequency (Hz)')

# Mel-scale spectrogram
im2 = librosa.display.specshow(
    S_mel_db,
    sr=sr,
    hop_length=HOP_LENGTH,
    x_axis='time',
    y_axis='mel',
    fmax=sr/2,
    ax=axes[1]
)
axes[1].set_title(f'Mel-Spectrogram ({N_MELS} bands)')
axes[1].set_xlabel('Time (s)')

# 6) Add a single colorbar for the last image, anchored to the right
fig.colorbar(im2, ax=axes, format='%+2.0f dB', location='right', pad=0.02)

plt.show()


# 🎶 Demo 3: Constant-Q Transform (CQT) Analysis

In this demo, you'll explore a **logarithmic time–frequency representation** that’s tuned to **musical pitches**. You'll compute a **CQT spectrogram** with **geometrically spaced bins**, examine how the **bin widths grow with frequency**, and even **reconstruct the audio** from the CQT to hear what information is preserved.

---

## 🛠️ How to Use

Edit the inputs at the top of the code cell:

- **`FILENAME`**  
  Name of your audio file in the `sounds/` folder (`.wav` or `.mp3`).

- **`N_BINS`**  
  Total number of CQT bins (e.g. `84` for 7 octaves × 12 bins).

- **`BINS_PER_OCTAVE`**  
  Bins per octave (commonly `12` for semitones).

- **`HOP_LENGTH`**  
  Hop size between frames (in samples).

- **`FMIN`**  
  Minimum frequency in Hz (e.g. `C1 = 32.70 Hz`).
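
These settings interact: `FMIN`, `N_BINS`, and `BINS_PER_OCTAVE` together determine the highest bin, which has to stay below the Nyquist frequency of your file. The helper below is a rough sanity check (the name `cqt_range` is hypothetical); librosa's own check may be stricter because each filter also has a nonzero bandwidth.

```python
def cqt_range(fmin, n_bins, bins_per_octave, sr):
    """Return (n_octaves, top_center_hz, fits) for a proposed CQT configuration."""
    n_octaves = n_bins / bins_per_octave
    # Center frequency of the last bin (index n_bins - 1)
    top_center_hz = fmin * 2.0 ** ((n_bins - 1) / bins_per_octave)
    fits = top_center_hz < sr / 2          # rough check against Nyquist
    return n_octaves, top_center_hz, fits

n_octaves, top_hz, fits = cqt_range(fmin=32.70, n_bins=84, bins_per_octave=12, sr=22050)
print(f"{n_octaves:.1f} octaves, top bin at {top_hz:.1f} Hz, below Nyquist: {fits}")
```

If the check fails, lower `N_BINS`, lower `FMIN`, or load the audio at a higher sampling rate.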

---

## ▶️ What Happens When You Run the Cell

1. **Load your clip**
2. **Compute the CQT**  
   `librosa.cqt(y, sr=sr, hop_length=HOP_LENGTH, fmin=FMIN, n_bins=N_BINS, bins_per_octave=BINS_PER_OCTAVE)`
3. **Display the CQT spectrogram** on a log-frequency axis
4. **Plot bin width growth** with respect to center frequency
5. **Reconstruct audio** from the CQT and provide an interactive player

---

## 👀 What to Observe

### 🎼 CQT Spectrogram
- **Logarithmic frequency scaling**: Equal-width spacing in **octaves**, not linear Hz
- **Higher bins** cover **wider frequency bands** — this is ideal for musical analysis

### 📈 Bin-Width Growth Plot
- Plots the **difference between successive center frequencies**
- Confirms that **higher-frequency bins are broader**

### 🎧 Audio Comparison
- Compare **original** vs. **CQT-reconstructed** audio
- Listen for how well the CQT preserves **tonal content** vs. **transient detail**
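
Listening can be complemented with a number. The sketch below computes a simple signal-to-error ratio between an original and a reconstructed signal; `reconstruction_snr_db` is a hypothetical helper, and it assumes the two signals are already time-aligned (it only trims them to a common length). The example uses a synthetic stand-in for the reconstruction rather than real CQT output.

```python
import numpy as np

def reconstruction_snr_db(y, y_rec):
    """Signal-to-error ratio in dB over the samples both signals share."""
    n = min(len(y), len(y_rec))            # inverse transforms may change the length
    error = y[:n] - y_rec[:n]
    signal_power = np.sum(y[:n] ** 2)
    error_power = np.sum(error ** 2) + 1e-20   # avoid division by zero
    return 10.0 * np.log10(signal_power / error_power)

# Stand-in example: the "reconstruction" is the original scaled by 0.9
sr = 8000
y_demo = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
y_demo_rec = 0.9 * y_demo
print(f"SNR: {reconstruction_snr_db(y_demo, y_demo_rec):.1f} dB")
```

With real data you would pass the loaded signal and the output of the inverse CQT; a low value on percussive material would be consistent with lost transient detail.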

---

🧠 *Tip*: Use this analysis when working with **melodic**, **harmonic**, or **pitch-based** audio data.


In [None]:
# ── USER SETTINGS ────────────────────────────────────────────────────────────────
FILENAME          = 'cymbal.mp3'  # ← place your file in `sounds/` (WAV/MP3)
N_BINS            = 84                     # ← total CQT bins (e.g. 84 for 7 octaves × 12 bins)
BINS_PER_OCTAVE   = 12                     # ← bins per octave (commonly 12)
HOP_LENGTH        = 512                    # ← hop size between frames
FMIN              = 32.70                  # ← minimum frequency in Hz (e.g. C1 = 32.70 Hz)
# ────────────────────────────────────────────────────────────────────────────────

import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display
from IPython.display import Audio, display
from pathlib import Path

# Load audio
SOUNDS_DIR = Path('sounds')
y, sr = librosa.load(str(SOUNDS_DIR / FILENAME), sr=None)

# Compute CQT
C = librosa.cqt(
    y,
    sr=sr,
    hop_length=HOP_LENGTH,
    fmin=FMIN,
    n_bins=N_BINS,
    bins_per_octave=BINS_PER_OCTAVE
)
C_mag = np.abs(C)
C_db  = librosa.amplitude_to_db(C_mag, ref=np.max)

# Get CQT bin center frequencies
freqs = librosa.cqt_frequencies(
    n_bins=N_BINS,
    fmin=FMIN,
    bins_per_octave=BINS_PER_OCTAVE
)

# 1) Play original audio
print("▶️ Original Audio")
display(Audio(data=y, rate=sr, autoplay=False))

# 2) Plot CQT spectrogram (log-frequency axis)
plt.figure(figsize=(10, 4))
librosa.display.specshow(
    C_db,
    sr=sr,
    hop_length=HOP_LENGTH,
    x_axis='time',
    y_axis='cqt_hz',
    fmin=FMIN,
    bins_per_octave=BINS_PER_OCTAVE
)
plt.colorbar(format='%+2.0f dB')
plt.title('Constant-Q Transform (CQT) Spectrogram')
plt.tight_layout()
plt.show()

# 3) Visualize bin-width growth
bin_widths = np.diff(freqs)
plt.figure(figsize=(8, 4))
plt.plot(freqs[:-1], bin_widths, marker='o')
plt.title('CQT Bin Width vs. Center Frequency')
plt.xlabel('Center Frequency (Hz)')
plt.ylabel('Bin Width (Hz)')
plt.grid(True)
plt.tight_layout()
plt.show()

# 4) Reconstruct audio from CQT and play
y_rec = librosa.icqt(
    C,
    sr=sr,
    hop_length=HOP_LENGTH,
    fmin=FMIN,
    bins_per_octave=BINS_PER_OCTAVE
)
print("▶️ CQT-Reconstructed Audio")
display(Audio(data=y_rec, rate=sr, autoplay=False))


## 🔑 Key Concept: Wavelet Transforms (Continuous & Discrete)

Wavelet transforms perform a **multi-scale decomposition** of a signal using **localized time–frequency “atoms”**, allowing you to **zoom in on both fast transients and slow oscillations simultaneously**.

### 🌊 Continuous Wavelet Transform (CWT)
- Provides a **dense**, highly **redundant** representation.
- Ideal for **detailed time–frequency analysis**.
- Useful for examining signals with **non-stationary features**.
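
As a rough illustration of what a CWT computes, the sketch below correlates a signal with stretched copies of one real Morlet-style wavelet, one row per scale. This is a teaching sketch in plain NumPy, not the algorithm used by PyWavelets. The wavelet definition, its center frequency of `5 / (2π)` cycles per unit time, and the names `morlet` and `cwt_sketch` are assumptions made for this example.

```python
import numpy as np

def morlet(t):
    """Real Morlet-style wavelet: a cosine at 5 rad per unit time under a Gaussian."""
    return np.exp(-t ** 2 / 2.0) * np.cos(5.0 * t)

def cwt_sketch(x, scales):
    """Correlate x with the wavelet stretched to each scale (scale in samples)."""
    rows = []
    for s in scales:
        half = int(np.ceil(4 * s))                  # support of +/- 4 Gaussian widths
        t = np.arange(-half, half + 1) / s
        psi = morlet(t) / np.sqrt(s)                # energy normalization
        # Correlation = convolution with the time-reversed kernel
        rows.append(np.convolve(x, psi[::-1], mode="same"))
    return np.array(rows)

# Test signal: a 50 Hz sine sampled at 1 kHz (longer than the largest kernel)
sr = 1000
x = np.sin(2 * np.pi * 50 * np.arange(2 * sr) / sr)

scales = np.arange(1, 65)
coefs = cwt_sketch(x, scales)

# The scale with the most energy maps back to a pseudo-frequency near 50 Hz
center_freq = 5.0 / (2 * np.pi)                     # cycles per unit time
best_scale = scales[np.sum(coefs ** 2, axis=1).argmax()]
best_hz = center_freq * sr / best_scale
print(best_scale, round(best_hz, 1))
```

Small scales compress the wavelet and respond to fast changes; large scales stretch it and respond to slow oscillations, which is why the representation is dense and redundant.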

### 📦 Discrete Wavelet Transform (DWT)
- Produces a **compact**, **hierarchical set of coefficients**.
- Commonly used for **data compression** and **denoising**.
- Efficient for **storage and reconstruction**.
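
The hierarchical structure is easiest to see with the simplest wavelet. The sketch below implements the Haar DWT by hand: each level splits the current approximation into local averages and local differences. It is an illustration only; sign and boundary conventions in PyWavelets may differ, and the function names are made up for this example.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT for an even-length signal."""
    pairs = np.asarray(x, dtype=float).reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)   # local averages (low-pass)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)   # local differences (high-pass)
    return approx, detail

def haar_idwt(approx, detail):
    """Invert one level of the Haar DWT."""
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

def haar_wavedec(x, level):
    """Multilevel decomposition, returned as [cA_n, cD_n, ..., cD_1]."""
    details = []
    approx = np.asarray(x, dtype=float)
    for _ in range(level):
        approx, d = haar_dwt(approx)
        details.append(d)
    return [approx] + details[::-1]

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
coeffs = haar_wavedec(x, level=2)
for c in coeffs:
    print(np.round(c, 3))
```

The total number of coefficients equals the signal length and their energy equals the signal energy, which is what makes the representation compact and convenient for compression or denoising.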

---

✅ **Summary**:  
Wavelet transforms offer a powerful way to analyze signals across **multiple time and frequency scales**, making them essential for **non-stationary signal processing** such as in **audio, EEG, image compression**, and more.


# 🎛️ Demo 4: Continuous & Discrete Wavelet Transforms

In this final demo, you’ll explore **two complementary wavelet-based analyses**:

---

## 🌊 Continuous Wavelet Transform (CWT)
- Computes a **dense, redundant time–frequency representation** using a chosen *“mother” wavelet*.
- Enables you to **zoom in on transients at fine scales** and **sustained oscillations at coarse scales**.

## 📦 Discrete Wavelet Transform (DWT)
- Decomposes the signal into a **compact, hierarchical set of coefficients** (approximation + detail bands).
- Commonly used for **denoising** or **compression**.

---

## 🛠️ What to Do

Edit the `USER SETTINGS` at the top of the code cell:

- **`FILENAME`**: Your audio file (`.wav` or `.mp3`) in the `sounds/` folder.
- **`CWT_WAVELET`**: Name of the continuous wavelet (e.g. `'morl'`, `'mexh'`).
- **`CWT_SCALES`**: List of CWT scales (integers ≥ 1) to analyze different resolutions.
- **`DWT_WAVELET`**: Name of the discrete wavelet (e.g. `'db4'`, `'sym5'`).
- **`DWT_LEVEL`**: Number of decomposition levels (integer ≥ 1).

Then **run the cell**.

---

## ▶️ What It Does

1. **Loads your audio clip** and plays the **original audio**.

### 📉 CWT
- Computes and displays a **scalogram**:  
  - \|coefficients\| vs. time & pseudo-frequency on a **log scale**.

### 📊 DWT
- Performs a **multilevel wavelet decomposition**.
- Plots the **energy** in each band (approximation + details).
- **Reconstructs** the final-level approximation and plays it back.

---

## 👀 What to Observe

### 🎼 CWT Scalogram
- Observe how **high-scale (coarse)** coefficients capture **slow, low-frequency content**.
- Notice how **low-scale (fine)** coefficients capture **fast transients**.

### ⚡ DWT Energy Plot
- Identify **which levels carry the most energy**:
  - e.g., index 0 = coarse approximation  
  - Higher indices = progressively finer detail bands (index 1 is the coarsest detail band, the last index the finest)

### 🎧 DWT Reconstruction
- Compare the **approximation-only audio** to the **original**.
- Consider: *What details are lost when you exclude all detail bands?*
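
One way to reason about the question above is to look at the nominal frequency range of each sub-band. Each DWT level halves the bandwidth, so the level-`n` approximation covers roughly 0 Hz to `sr / 2**(n + 1)`. The helper below (hypothetical name `dwt_band_edges`) lists these idealized ranges; real wavelet filters overlap, so treat the edges as approximate.

```python
def dwt_band_edges(sr, level):
    """Nominal (name, low_hz, high_hz) per sub-band, ordered [A_n, D_n, ..., D_1]."""
    bands = [(f"A{level}", 0.0, sr / 2 ** (level + 1))]
    for j in range(level, 0, -1):
        bands.append((f"D{j}", sr / 2 ** (j + 1), sr / 2 ** j))
    return bands

for name, low_hz, high_hz in dwt_band_edges(sr=44100, level=4):
    print(f"{name}: {low_hz:9.2f} to {high_hz:9.2f} Hz")
```

For a 44.1 kHz file and four levels, the approximation keeps only content below roughly 1.4 kHz, which is why the approximation-only audio sounds muffled.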

---

🧠 **Insight**: This hands-on comparison highlights how **wavelets provide flexible, multi-scale views** of audio — offering richer insights than fixed-resolution FFT methods.


In [None]:
# ── USER SETTINGS ────────────────────────────────────────────────────────────────
FILENAME      = 'cymbal.mp3'         # ← place your file in `sounds/` (WAV/MP3)
CWT_WAVELET   = 'morl'               # ← continuous wavelet name (e.g. 'morl','mexh')
CWT_SCALES    = list(range(1, 129))  # ← list of scales for CWT (integers ≥1)
DWT_WAVELET   = 'db4'                # ← discrete wavelet name (e.g. 'db1','db4','sym5')
DWT_LEVEL     = 4                    # ← decomposition level (integer ≥1)
# ────────────────────────────────────────────────────────────────────────────────

import numpy as np
import matplotlib.pyplot as plt
import librosa
import pywt
from IPython.display import Audio, display
from pathlib import Path

# ── CONFIG (don’t edit below here) ───────────────────────────────────────────────
SOUNDS_DIR = Path('sounds')
audio_path = SOUNDS_DIR / FILENAME

# 1) Load audio
y, sr = librosa.load(str(audio_path), sr=None)

# 2) Play original
print("▶️ Original Audio")
display(Audio(data=y, rate=sr, autoplay=False))

# ── Continuous Wavelet Transform (CWT) ────────────────────────────────────────────
coeffs, freqs = pywt.cwt(
    y,
    scales=CWT_SCALES,
    wavelet=CWT_WAVELET,
    sampling_period=1/sr
)

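# Use pcolormesh so every row sits at its true pseudo-frequency: imshow assumes
# linearly spaced rows, but CWT pseudo-frequencies scale as 1/scale.
step = max(1, len(y) // 2000)            # thin the time axis for plotting only
t_cwt = np.arange(len(y))[::step] / sr
plt.figure(figsize=(10, 4))
plt.pcolormesh(
    t_cwt,
    freqs,
    np.abs(coeffs)[:, ::step],
    shading='auto',
    cmap='viridis'
)
plt.yscale('log')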
plt.colorbar(label='Magnitude')
plt.title(f'Continuous Wavelet Transform (wavelet={CWT_WAVELET})')
plt.xlabel('Time (s)')
plt.ylabel('Pseudo-frequency (Hz)')
plt.tight_layout()
plt.show()

# ── Discrete Wavelet Transform (DWT) ─────────────────────────────────────────────
coeff_list = pywt.wavedec(y, wavelet=DWT_WAVELET, level=DWT_LEVEL)

# energies of each sub-band
levels = list(range(len(coeff_list)))
energies = [np.sum(c**2) for c in coeff_list]

plt.figure(figsize=(8, 4))
plt.stem(levels, energies, basefmt=" ")
plt.title(f'DWT Coefficients Energy (wavelet={DWT_WAVELET})')
plt.xlabel('Sub-band index (0 = approximation, last = finest detail)')
plt.ylabel('Coefficient Energy')
plt.grid(True)
plt.tight_layout()
plt.show()

# 3) Reconstruct & play final-level approximation
print(f"▶️ DWT Approximation (Level = {DWT_LEVEL})")
# rebuild only the approximation band at index 0 plus zeros for details
y_dwt_approx = pywt.waverec(
    [coeff_list[0]] + [None]*DWT_LEVEL,
    wavelet=DWT_WAVELET
)
# trim to the original length (the reconstruction can be a few samples longer)
y_dwt_approx = y_dwt_approx[:len(y)]
display(Audio(data=y_dwt_approx, rate=sr, autoplay=False))


# 🎼 Exercise: CQT-Based Octave Filter Bank

## 🎯 Task
Design and implement a **filter bank** using the **Constant-Q Transform (CQT)** to **isolate individual musical octaves** from a recording.

---

## 📝 Steps

1. **Compute the CQT** of an audio signal using

   ```python
   librosa.cqt(...)
   ```

   Use a suitable number of **bins per octave** *(e.g., `12` for semitone resolution).*

2. **Group the CQT bins into octave bands.** For example, with 12 bins per octave:

   - **Bins 0–11** → Octave 1
   - **Bins 12–23** → Octave 2
   - **Bins 24–35** → Octave 3
   - *(...and so on)*

3. **For each octave band:**

   1. **Zero out all CQT bins** that do not belong to that octave.
   2. **Invert the modified CQT** back to the time domain using

      ```python
      librosa.icqt(...)
      ```

4. **Play back** or **export** the **isolated octave audio** for analysis and comparison.
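
The grouping and zeroing steps can be sketched with plain NumPy before any audio is involved. The example below uses a random complex array as a stand-in for a CQT matrix (bins × frames); `octave_slices` and `isolate_octave` are hypothetical helper names, and the inversion back to audio is deliberately left for you to add.

```python
import numpy as np

def octave_slices(n_bins, bins_per_octave):
    """One slice of bin indices per octave band, lowest octave first."""
    return [slice(start, min(start + bins_per_octave, n_bins))
            for start in range(0, n_bins, bins_per_octave)]

def isolate_octave(C, band):
    """Copy of C with every bin outside `band` set to zero."""
    C_band = np.zeros_like(C)
    C_band[band, :] = C[band, :]
    return C_band

# Stand-in for a CQT matrix: 84 bins x 10 frames of complex values
rng = np.random.default_rng(0)
C = rng.standard_normal((84, 10)) + 1j * rng.standard_normal((84, 10))

bands = octave_slices(n_bins=84, bins_per_octave=12)
parts = [isolate_octave(C, band) for band in bands]

# The isolated bands add back up to the full matrix
print(len(parts), np.allclose(sum(parts), C))
```

Because the masked matrices sum to the original, the reconstructed octave signals should approximately sum to the full reconstruction as well, which is a useful check on your filter bank.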

---

## 📦 Deliverables

- 📊 **Plot** of the **full CQT spectrogram** with **colored overlays or annotations** marking each **octave band**.

- 🔊 A **set of audio players** (or exported `.wav` files) to **listen to each reconstructed octave band** in isolation.

---

### 🧠 Tip

Use the following command to help **visualize note/octave positions** on the spectrogram:

```python
librosa.display.specshow(..., y_axis='cqt_note')
```
