## Module 9: Statistical & Texture Features

In this module we’ll explore higher‐level statistical representations and “texture” descriptors that capture repeating patterns and timbral characteristics beyond frame-based spectra.

### 🔑 Key Concepts
- **Bag-of-Aural-Words & Chroma Features**  
  Represent audio as histograms of “codewords” (e.g. spectral clusters) or as 12-bin pitch-class profiles for harmonic analysis.  
- **Recurrence Plots & Self-Similarity Matrices**  
  Visualize how audio frames repeat or evolve over time, revealing structure (e.g. motifs, rhythm).  
- **Texture Descriptors**  
  Compute local patterns (e.g. Local Binary Patterns) on spectrogram “images” to characterize textures like rain, applause, etc.

---

### 📓 Notebook Demos

1. **Interactive Chroma Plot & Key Changes**  
   - Compute a chromagram for a song  
   - Highlight detected chord/key changes with vertical markers  
   - Listen to playback while stepping through harmonic transitions  

2. **Spectrogram Texture & LBP Histograms**  
   - Load two “texture” sounds (e.g. rain vs. applause)  
   - Compute spectrogram and apply Local Binary Patterns (LBP) to each time–frequency patch  
   - Plot LBP histograms side by side  
   - Play both clips to connect timbral texture with their statistical signatures  

---

### 🛠 Exercise: Texture-Based SVM Classification
- **Task:** Extract texture descriptors (e.g. LBP histograms, recurrence-based features) from a set of environmental sounds.  
- **Analysis:** Use scikit-learn to train an **SVM classifier** to distinguish classes (rain, applause, traffic, etc.).  
- **Deliverables:**  
  - Feature extraction code + normalized feature matrix  
  - SVM training script + cross-validation accuracy report  
  - Confusion matrix and sample audio misclassifications  


## 🔑 Key Concepts

### 📦 Bag-of-Aural-Words & Chroma Features
- **Bag-of-Aural-Words**  
  - Treat short audio segments as “words” by clustering spectral or cepstral descriptors.  
  - Represent each clip by a histogram of these codewords, analogous to text retrieval.  
  - Useful for audio classification and retrieval tasks where timbral textures matter.

- **Chroma Features**  
  - Collapse the full spectrum into 12 pitch classes (C, C♯, D, …, B), regardless of octave.  
  - Capture the harmonic/pitch content of a signal—ideal for key detection, chord recognition, and music similarity.  
  - Represented as a 12-dimensional vector per frame, giving the energy in each pitch class.
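
The bag-of-aural-words idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper names (`learn_codebook`, `quantize`, `bag_of_words`) are hypothetical, the tiny k-means loop stands in for whatever clustering you prefer, and the random frames are a synthetic stand-in for real per-frame descriptors such as MFCCs.

```python
import numpy as np

def quantize(frames, codebook):
    """Index of the nearest codeword (Euclidean distance) for every frame."""
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def learn_codebook(frames, k, n_iter=10, seed=0):
    """Tiny k-means (Lloyd's algorithm); returns k codewords of shape (k, d)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(frames), size=k, replace=False)
    codebook = frames[start].astype(float)
    for _ in range(n_iter):
        labels = quantize(frames, codebook)
        for j in range(k):
            members = frames[labels == j]
            if len(members):  # keep the old codeword if a cluster empties
                codebook[j] = members.mean(axis=0)
    return codebook

def bag_of_words(frames, codebook):
    """Normalized histogram of codeword counts for one clip."""
    counts = np.bincount(quantize(frames, codebook), minlength=len(codebook))
    return counts / counts.sum()

# Synthetic stand-in for per-frame descriptors (e.g. 13 MFCCs per frame)
rng = np.random.default_rng(0)
train_frames = rng.normal(size=(500, 13))
codebook = learn_codebook(train_frames, k=8)

# One "clip" of 120 frames becomes a single 8-bin histogram
clip_hist = bag_of_words(rng.normal(size=(120, 13)), codebook)
print(clip_hist)
```

Every clip, whatever its length, ends up as a vector with one entry per codeword, which is what makes the representation convenient for classifiers and retrieval.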

### 🔄 Recurrence Plots & Self-Similarity Matrices
- **Recurrence Plot**  
  - A binary or continuous map showing when a feature vector (e.g. a spectral frame) recurs at later times.  
  - Constructed by comparing each frame to every other frame and thresholding similarity.

- **Self-Similarity Matrix**  
  - A continuous-valued matrix of pairwise similarities (e.g. cosine or Gaussian kernel) between feature vectors over time.  
  - Highlights repeating patterns, sections, and structural motifs: blocks along the main diagonal indicate homogeneous (sustained) segments, while stripes parallel to the main diagonal indicate repeated sequences.

> These visualizations help you spot rhythmic patterns, song structure (verse–chorus), or repeated textures in environmental sounds.
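
As a rough sketch of the two definitions above, the snippet below builds a cosine self-similarity matrix and thresholds it into a binary recurrence plot. The function names and the `0.9` threshold are illustrative assumptions, and the four-frame "A B A B" sequence is a toy stand-in for real feature frames.

```python
import numpy as np

def self_similarity(features):
    """Cosine self-similarity matrix; `features` has shape (d, n_frames)."""
    norms = np.linalg.norm(features, axis=0, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against silent frames
    return unit.T @ unit

def recurrence_plot(ssm, threshold=0.9):
    """Binary map: True where two frames are at least `threshold` similar."""
    return ssm >= threshold

# Toy sequence with the pattern A B A B (one column per frame)
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
X = np.stack([a, b, a, b], axis=1)

ssm = self_similarity(X)
rec = recurrence_plot(ssm)
print(ssm)
print(rec.astype(int))
```

Frames 0 and 2 (both "A") light up off the main diagonal, which is how repetition shows up in either representation.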

### Texture Descriptors
  - Treat a spectrogram as an image and extract local texture features (e.g. Local Binary Patterns, Gabor filters).  
  - **Local Binary Patterns (LBP):** Compare each pixel’s intensity to its neighbors to encode micro-texture “codes.”  
  - Capture characteristics of noise-like textures (e.g. rainfall, applause, engine noise) that are not well described by pitch or harmony.  
  - Useful for environmental sound classification and timbral texture analysis.
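
To make the LBP idea concrete, here is a simplified 3×3 version in plain NumPy. It is a sketch only: `lbp_3x3` is a hypothetical helper, the clockwise bit order is an arbitrary choice, and library implementations (such as scikit-image's `local_binary_pattern`) use circular sampling and may order the bits differently.

```python
import numpy as np

def lbp_3x3(img):
    """Basic 8-neighbour LBP codes for the interior pixels of a 2-D array."""
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbour offsets (row, col), clockwise starting at the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dr, dc) in enumerate(offsets):
        neighbour = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        # Set this bit wherever the neighbour is at least as bright as the centre
        codes |= (neighbour >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes

# One 3x3 patch: the centre pixel (6) is compared with its 8 neighbours
patch = np.array([[5, 9, 1],
                  [4, 6, 7],
                  [2, 3, 8]])
codes = lbp_3x3(patch)
print(codes)  # neighbours 9, 7 and 8 are >= 6, so bits 1, 3 and 4 are set
```

A histogram of these codes over a whole spectrogram is the texture descriptor; the spatial layout is discarded and only the statistics of local patterns remain.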



# 🎼 Demo: Interactive Chroma Plot & Key/Chord Change Detection

In this demo, you’ll compute a **time–pitch representation** (the **chromagram**) of your audio and automatically detect **candidate key or chord changes** by finding peaks in an **onset “novelty” envelope**.

---

## ⚙️ Inputs  
*(Edit only in the `USER SETTINGS` block at the top of the cell)*

- **`FILENAME`**  
  Name of your audio file in the `sounds/` folder (`.wav` or `.mp3`)

- **`N_FFT`**  
  FFT window size for STFT (a power of two such as `1024`, `2048`, or `4096` keeps the FFT fast; it is a convention, not a hard requirement)

- **`HOP_LENGTH`**  
  Hop size between STFT frames (samples)  
  *(typically `N_FFT // 4`)*

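- **`DETECT_HOP`**  
  Hop size assumed when picking peaks and converting detected frames to seconds  
  *(keep equal to `HOP_LENGTH`: the novelty curve is computed from the chromagram's frames, so a different value would misplace the markers)*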

---

## 🔍 What the Code Does

1. **Loads your audio clip** via `librosa.load`
2. Computes a **chromagram `C`** using `librosa.feature.chroma_stft`  
   → A 12-row matrix of **pitch-class intensities** over time
3. Derives an **onset-strength envelope `nov`** directly from the chromagram  
   → Highlights **rapid harmonic changes**
4. Detects **peak frames** in `nov` using `librosa.onset.onset_detect`  
   → These are **candidate key/chord transitions**
5. Converts those frame indices into **times (`change_times`)**
6. Plays back the **original audio**
7. Plots the **chromagram** as a **time–pitch heatmap**, overlaid with **vertical lines** at `change_times`
8. Prints the **numeric timestamps** of each detected change

---

## 📤 Outputs & Interpretation

### 🎧 Audio Player
- Listen to the original clip to get familiar with its **harmonic rhythm**

### 📈 Chroma Plot
- The **12 rows** correspond to the **12 pitch classes**:  
  `C, C♯, D, D♯, E, F, F♯, G, G♯, A, A♯, B`
- With the `coolwarm` colormap used here, **warmer (red) colors** = stronger pitch-class energy at that moment; **cooler (blue) colors** = weaker
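
To see where the 12 rows come from, here is a minimal sketch that folds spectral magnitudes into pitch classes by rounding each frequency to its nearest MIDI note. It is a simplification under stated assumptions: `fold_to_chroma` is a hypothetical helper, tuning is fixed at A4 = 440 Hz, and each bin is assigned to exactly one pitch class, whereas `librosa.feature.chroma_stft` uses a smoother filter bank.

```python
import numpy as np

PITCH_CLASSES = ['C', 'C♯', 'D', 'D♯', 'E', 'F', 'F♯', 'G', 'G♯', 'A', 'A♯', 'B']

def fold_to_chroma(magnitudes, freqs_hz, a4_hz=440.0):
    """Sum spectral magnitudes into 12 pitch-class bins (C = row 0)."""
    chroma = np.zeros(12)
    valid = freqs_hz > 0  # skip the DC bin, where log2 is undefined
    midi = 69 + 12 * np.log2(freqs_hz[valid] / a4_hz)
    pitch_class = np.round(midi).astype(int) % 12
    np.add.at(chroma, pitch_class, magnitudes[valid])  # accumulate repeats
    return chroma

# Toy spectrum: A3, C4, A4 and A5, plus a DC component that is ignored
freqs = np.array([0.0, 220.0, 261.63, 440.0, 880.0])
mags = np.array([1.0, 0.5, 0.2, 1.0, 0.25])

chroma = fold_to_chroma(mags, freqs)
print(dict(zip(PITCH_CLASSES, np.round(chroma, 3))))
```

The three A's in different octaves all land in the same row, which is the octave invariance that makes chroma useful for harmony.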

### 🟡 Change Markers
- **Yellow dashed vertical lines** = peaks in the chroma onset envelope  
  → These are **candidate key or chord changes**; melodic note changes can produce peaks too

---

## 🧠 How to Interpret

- Do the **vertical markers** align with **audible harmonic changes**?
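- If a chord change is audible **but not marked**:
  - Try lowering `HOP_LENGTH` (smaller = finer time resolution), keeping `DETECT_HOP` equal to it
  - Or lower the peak-picking threshold (e.g. a smaller `delta` passed to `onset_detect`)

- If markers land slightly **after** the audible change:
  - Pass `backtrack=True` to `onset_detect` to move each marker back to the preceding minimum of the novelty curve

- If you see **too many detections**:
  - Try **smoothing** the novelty curve
  - Or **raise the threshold** via arguments to `onset_detect` (e.g. a larger `delta`, or a longer `wait` between peaks)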
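
The effect of smoothing and thresholding can be tried on a toy envelope before touching real audio. The sketch below is a simplified stand-in for librosa's peak picking, not a copy of it: `smooth` and `pick_peaks` are hypothetical helpers, and the envelope values are made up. In the demo itself, the closest equivalents are, as far as I know, the `delta` and `wait` keyword arguments that `onset_detect` forwards to `librosa.util.peak_pick`.

```python
import numpy as np

def smooth(envelope, width=3):
    """Moving-average smoothing with an odd window `width` (same length out)."""
    kernel = np.ones(width) / width
    return np.convolve(envelope, kernel, mode='same')

def pick_peaks(envelope, threshold):
    """Indices of strict local maxima whose value exceeds `threshold`."""
    e = np.asarray(envelope, dtype=float)
    is_peak = (e[1:-1] > e[:-2]) & (e[1:-1] > e[2:]) & (e[1:-1] > threshold)
    return np.flatnonzero(is_peak) + 1  # +1 because the ends were trimmed

# Toy novelty curve with a "double peak" around one real change
nov_demo = np.array([0.0, 0.2, 1.0, 0.6, 0.9, 0.2, 0.0, 0.0, 0.0, 0.0])

raw_peaks = pick_peaks(nov_demo, threshold=0.5)
strict_peaks = pick_peaks(nov_demo, threshold=0.95)
smooth_peaks = pick_peaks(smooth(nov_demo), threshold=0.5)

print("raw:", raw_peaks)
print("higher threshold:", strict_peaks)
print("after smoothing:", smooth_peaks)
```

Both remedies reduce two detections to one, but they differ in what survives: the higher threshold keeps the tallest raw peak, while smoothing reports a single peak between the two.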

---

💡 *This approach uses chroma dynamics to infer harmonic shifts — useful for chord recognition, segmentation, or musical structure analysis.*


In [None]:
# ── USER SETTINGS ────────────────────────────────────────────────────────────────
FILENAME    = 'pianos-by-jtwayne-7-174717.mp3'  # ← your audio file in `sounds/` (WAV or MP3)
N_FFT       = 4096                             # ← FFT window size for chroma (power of 2)
HOP_LENGTH  = 512                              # ← hop length for chroma (samples)
DETECT_HOP  = HOP_LENGTH                       # ← hop length for onset detection (keep equal to HOP_LENGTH)
# ────────────────────────────────────────────────────────────────────────────────

import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt
from IPython.display import Audio, display
from pathlib import Path

# ── CONFIG (don’t edit below here) ───────────────────────────────────────────────
SOUNDS_DIR = Path('sounds')
audio_path = SOUNDS_DIR / FILENAME

# 1) Load audio
y, sr = librosa.load(str(audio_path), sr=None)

# 2) Compute a STFT‐based chromagram
C = librosa.feature.chroma_stft(
    y=y,
    sr=sr,
    n_fft=N_FFT,
    hop_length=HOP_LENGTH,
    norm=2
)

# 3) Compute a “novelty” onset envelope from the chromagram
nov = librosa.onset.onset_strength(
    S=C,
    sr=sr,
    hop_length=DETECT_HOP
)

# 4) Detect frames where the chroma‐envelope has peaks → candidate key/chord changes
change_frames = librosa.onset.onset_detect(
    onset_envelope=nov,
    sr=sr,
    hop_length=DETECT_HOP,
    backtrack=False,
    units='frames'
)
change_times = librosa.frames_to_time(
    change_frames,
    sr=sr,
    hop_length=DETECT_HOP
)

# 5) Audio playback
print("▶️ Original Audio")
display(Audio(data=y, rate=sr, autoplay=False))

# 6) Plot chromagram and overlay change markers
plt.figure(figsize=(12, 4))
librosa.display.specshow(
    C,
    sr=sr,
    hop_length=HOP_LENGTH,
    x_axis='time',
    y_axis='chroma',
    cmap='coolwarm'
)
for t in change_times:
    plt.axvline(t, color='yellow', linestyle='--', linewidth=1.5, alpha=0.8)
plt.title('Chroma STFT + Detected Key/Chord Changes')
plt.xlabel('Time (s)')
plt.colorbar(label='Chroma intensity')
plt.tight_layout()
plt.show()

# 7) Print change time stamps for reference
print("Detected change times (s):")
print(np.round(change_times, 3))


## Demo: Spectrogram Texture & LBP Histograms

In this demo, you’ll compare two “texture” sounds (e.g. rain vs. applause) by extracting **Local Binary Pattern (LBP)** features from their spectrograms.

**What the code does:**
1. **Loads** two audio clips (`FILENAME1` and `FILENAME2`) from the `sounds/` folder.  
2. Computes their **STFT** magnitudes (`N_FFT`, `HOP_LENGTH`) and converts to dB.  
3. **Normalizes** each spectrogram to 8-bit range and computes **LBP codes** with parameters:
   - `LBP_P`: number of neighbor points  
   - `LBP_R`: radius (in pixels)  
   - `LBP_METHOD`: LBP variant (`'default'`, `'ror'`, `'uniform'`, etc.)  
4. Builds **normalized histograms** of the LBP codes for each texture.  
5. **Plays back** both original audio clips.  
6. **Plots** side-by-side:
   - Top row: log-frequency spectrograms of each clip  
   - Bottom row: corresponding LBP code histograms  

**Inputs (edit at top of code):**
- `FILENAME1`, `FILENAME2`: filenames in `sounds/` (WAV/MP3)  
- `N_FFT`: FFT window size (power of 2, e.g. 512, 1024, 2048)  
- `HOP_LENGTH`: hop size (≤ `N_FFT`, e.g. `N_FFT//4`)  
- `LBP_P`: integer ≥ 1, number of sampling points on the circle  
- `LBP_R`: integer ≥ 1, radius of the circle (in spectrogram “pixels”)  
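- `LBP_METHOD`: one of scikit-image's options `'default'`, `'ror'`, `'uniform'`, `'nri_uniform'`, or `'var'` (`'var'` returns continuous values, so the integer-bin histogram in this demo suits the other four)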

**Outputs to observe:**
- **Audio players**: listen and compare the two textures.  
- **Spectrograms**: visually inspect time–frequency energy distributions.  
- **LBP histograms**: see how local texture patterns differ statistically.

**How to interpret:**
- **Spectrogram** differences (e.g., sustained broadband vs. pulsed patterns) map to different **LBP histograms**.  
- A texture with more fine-grained, random fluctuations (e.g. rain) tends to show a broader LBP distribution than one with more regular or pulsed structure (e.g. sparse applause), though the size of the difference depends on the recordings and the LBP settings.  
- Use these histograms as compact “texture fingerprints” for classification or visualization tasks.
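
One simple way to use such fingerprints is to measure how far apart two histograms are. The sketch below uses the chi-square distance, a common choice for comparing normalized histograms; `chi_square_distance` is a hypothetical helper and the three four-bin histograms are made-up examples, not output from the demo.

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-12):
    """Chi-square distance between two normalized histograms (0 = identical)."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    # eps avoids division by zero in bins that are empty in both histograms
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

hist_a = np.array([0.5, 0.5, 0.0, 0.0])   # made-up texture A
hist_b = np.array([0.0, 0.0, 0.5, 0.5])   # no overlap with A
hist_c = np.array([0.4, 0.6, 0.0, 0.0])   # close to A

d_ab = chi_square_distance(hist_a, hist_b)
d_ac = chi_square_distance(hist_a, hist_c)
print(f"A vs B: {d_ab:.3f}   A vs C: {d_ac:.3f}")
```

With normalized histograms the distance ranges from 0 (identical) to about 1 (no shared bins), so it can feed a nearest-neighbour classifier or a quick sanity check on your features.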


In [None]:
# ── USER SETTINGS ────────────────────────────────────────────────────────────────
FILENAME1    = 'calming-rain-257596.mp3'       # ← place your first texture clip in `sounds/` (WAV/MP3)
FILENAME2    = 'applause-180037.mp3'   # ← place your second texture clip in `sounds/`
N_FFT        = 2048            # ← STFT window size (power of 2)
HOP_LENGTH   = 512             # ← hop length between frames
LBP_P        = 8               # ← number of circularly symmetric neighbor set points
LBP_R        = 1               # ← radius of circle (in pixels) for LBP
LBP_METHOD   = 'uniform'       # ← LBP method: 'default', 'ror', 'uniform', etc.
# ────────────────────────────────────────────────────────────────────────────────

import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display
from skimage.feature import local_binary_pattern
from IPython.display import Audio, display
from pathlib import Path

# ── CONFIG (don’t edit below here) ───────────────────────────────────────────────
SOUNDS_DIR = Path('sounds')

def load_audio(fname):
    path = SOUNDS_DIR / fname
    y, sr = librosa.load(str(path), sr=None)
    return y, sr

# 1) Load both texture clips
y1, sr1 = load_audio(FILENAME1)
y2, sr2 = load_audio(FILENAME2)

# 2) Compute log-magnitude spectrograms
D1 = np.abs(librosa.stft(y1, n_fft=N_FFT, hop_length=HOP_LENGTH))
S1 = librosa.amplitude_to_db(D1, ref=np.max)
D2 = np.abs(librosa.stft(y2, n_fft=N_FFT, hop_length=HOP_LENGTH))
S2 = librosa.amplitude_to_db(D2, ref=np.max)

# 3) Compute LBP codes over the spectrograms
#    Normalize to [0, 255] and cast to uint8 for LBP compatibility
img1 = ((S1 - S1.min()) / (S1.max() - S1.min()) * 255).astype(np.uint8)
img2 = ((S2 - S2.min()) / (S2.max() - S2.min()) * 255).astype(np.uint8)

lbp1 = local_binary_pattern(img1, P=LBP_P, R=LBP_R, method=LBP_METHOD)
lbp2 = local_binary_pattern(img2, P=LBP_P, R=LBP_R, method=LBP_METHOD)

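# 4) Build histograms of LBP codes over a shared set of bins
#    (use the larger maximum so no code from either clip is dropped)
n_bins = int(max(lbp1.max(), lbp2.max()) + 1)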
hist1, _ = np.histogram(lbp1.ravel(), bins=n_bins, range=(0, n_bins), density=True)
hist2, _ = np.histogram(lbp2.ravel(), bins=n_bins, range=(0, n_bins), density=True)
bins = np.arange(n_bins)

# 5) Playback the two texture sounds
print("▶️ Texture 1:", FILENAME1)
display(Audio(data=y1, rate=sr1, autoplay=False))
print("▶️ Texture 2:", FILENAME2)
display(Audio(data=y2, rate=sr2, autoplay=False))

# 6) Plot spectrograms and LBP histograms
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Spectrogram of Texture 1
librosa.display.specshow(S1, sr=sr1, hop_length=HOP_LENGTH,
                         x_axis='time', y_axis='log', ax=axes[0,0])
axes[0,0].set_title(f'Spectrogram: {FILENAME1}')

# Spectrogram of Texture 2
librosa.display.specshow(S2, sr=sr2, hop_length=HOP_LENGTH,
                         x_axis='time', y_axis='log', ax=axes[0,1])
axes[0,1].set_title(f'Spectrogram: {FILENAME2}')

# LBP histogram for Texture 1
axes[1,0].bar(bins, hist1, width=0.8, color='C0', alpha=0.7)
axes[1,0].set_title('LBP Histogram: Texture 1')
axes[1,0].set_xlabel('LBP Code')
axes[1,0].set_ylabel('Normalized Frequency')

# LBP histogram for Texture 2
axes[1,1].bar(bins, hist2, width=0.8, color='C1', alpha=0.7)
axes[1,1].set_title('LBP Histogram: Texture 2')
axes[1,1].set_xlabel('LBP Code')
axes[1,1].set_ylabel('Normalized Frequency')

plt.tight_layout()
plt.show()


### 🛠 Exercise: Texture-Based SVM Classification

- **Task:**  
  Extract texture descriptors from a collection of environmental sounds. For example:  
  - **LBP histograms** computed on log-magnitude spectrogram patches  
  - **Recurrence-based features** such as spectral self-similarity statistics  

- **Analysis:**  
  1. Assemble a feature matrix where each row represents one audio clip and columns are your chosen texture features.  
  2. Normalize or standardize each feature.  
  3. Use `scikit-learn` to train an **SVM classifier** (e.g. `sklearn.svm.SVC`) to distinguish sound classes (rain, applause, traffic, etc.).  
  4. Evaluate with cross-validation to report classification accuracy.

- **Deliverables:**  
  - Python code for feature extraction and normalization  
  - SVM training script with cross-validation results (e.g. accuracy, precision/recall)  
  - A **confusion matrix** plot, and a few example audio clips that were misclassified (with their true vs. predicted labels)
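
As a starting point for the analysis steps, here is a hedged sketch of the scikit-learn side only. The feature matrix is synthetic (three well-separated Gaussian blobs standing in for real texture features), the class names are placeholders, and the hyperparameters are defaults rather than tuned values; swap in your own `X` and `y` from the feature-extraction step.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a real feature matrix: 3 classes x 30 clips, 10 features
rng = np.random.default_rng(0)
class_names = ['rain', 'applause', 'traffic']
n_per_class, n_features = 30, 10
X = np.vstack([
    rng.normal(loc=3.0 * k, scale=1.0, size=(n_per_class, n_features))
    for k in range(len(class_names))
])
y = np.repeat(np.arange(len(class_names)), n_per_class)

# Scaling lives inside the pipeline so it is re-fit on each training fold
clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0, gamma='scale'))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_pred = cross_val_predict(clf, X, y, cv=cv)

acc = accuracy_score(y, y_pred)
cm = confusion_matrix(y, y_pred)
misclassified = np.flatnonzero(y_pred != y)  # row indices of clips to listen to

print(f"Cross-validated accuracy: {acc:.3f}")
print(cm)
print("Misclassified clip indices:", misclassified)
```

Keeping the scaler inside the pipeline avoids leaking test-fold statistics into training, and the `misclassified` indices map back to the clips you should audition for the last deliverable.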
