# EE519 - Lecture 9 (Linear Prediction / LPC) - Notebook 9.4
## Where LPC works/fails + optional mini ML demo

**Theme:** Compare vowel vs fricative vs silence; optionally train a tiny classifier.

---
### 🧭 In-class workflow
1. Read the short explanation above each code cell
2. Predict what you expect to see
3. Run
4. Save at least one key figure

### 🧯 Debugging quick panel (“If you see X, do Y”)
- **Module import error** → run the “Environment & imports” cell again; restart kernel if needed.
- **Audio playback is silent** → re-record closer to mic; ensure waveform peak is not near zero.
- **`frame_selections` missing** → go back to Notebook 9.0 and define time ranges / frames, then save to manifest.
- **LPC envelope looks too wiggly** → reduce order `p` (try 10–16).
- **LPC envelope looks too flat** → increase order `p` slightly or pick a steadier vowel region.
- **FFT vs LPC don’t “overlay”** → use the provided “normalize-to-peak” plot (shape comparison) cell.


### 🎯 Learning goals
- Run end-to-end without manual clip/frame prompts (assumes Notebook 9.0 selections saved)
- Save key plots to the project folder


## 0. Environment & imports (run this first)

This notebook uses:
- `numpy`, `matplotlib`
- `scipy` (signal + linalg)
- optional: `sounddevice` (recording)
- optional: `sklearn` (mini ML demo only)

If any import fails, the cell prints what to do next.


In [1]:
import numpy as np
import matplotlib.pyplot as plt

# Core scipy imports (required)
try:
    import scipy.signal as sig
    import scipy.linalg as la
    import scipy.io.wavfile as wavfile
    SCIPY_OK = True
    print("scipy imports: ✅")
except Exception as e:
    SCIPY_OK = False
    print("scipy imports: ❌")
    print("Error:", e)
    print("Next step: install scipy (e.g. `pip install scipy`), restart the kernel, and re-run this cell.")

# Optional recording
try:
    import sounddevice as sd
    HAS_SD = True
    print("sounddevice: ✅ (recording enabled)")
except Exception as e:
    HAS_SD = False
    print("sounddevice: ❌ (recording disabled)")
    print("Error:", e)

from pathlib import Path
import json, os, time
from IPython.display import Audio, display


scipy imports: ✅
sounddevice: ✅ (recording enabled)


## 1. Project + manifest workflow (same spirit as Lectures 7/8)

We will use one project folder:
```
EE519_L9_Project/
  recordings/
  figures/
  features/
  cache/
  manifest.json
```

✅ You can re-run this cell any time safely.


In [2]:
PROJECT_DIR = Path("EE519_L9_Project")
REC_DIR = PROJECT_DIR / "recordings"
FIG_DIR = PROJECT_DIR / "figures"
FEAT_DIR = PROJECT_DIR / "features"
CACHE_DIR = PROJECT_DIR / "cache"

for d in [PROJECT_DIR, REC_DIR, FIG_DIR, FEAT_DIR, CACHE_DIR]:
    d.mkdir(parents=True, exist_ok=True)

MANIFEST_PATH = PROJECT_DIR / "manifest.json"

def load_manifest():
    if MANIFEST_PATH.exists():
        return json.loads(MANIFEST_PATH.read_text())
    return {"clips": [], "meta": {"created": time.time(), "course":"EE519", "lecture":9}}

def save_manifest(m):
    MANIFEST_PATH.write_text(json.dumps(m, indent=2))

manifest = load_manifest()
print("Manifest clips:", len(manifest["clips"]))
print("Project dir:", PROJECT_DIR.resolve())


Manifest clips: 9
Project dir: C:\Users\K\Documents\usc\ee519\ee519-lecture\lecture10\EE519_L9_Project
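For orientation, a manifest entry that the later cells can consume might look like the sketch below. The key names (`filename`, `label`, `frame_selections`, `win_ms`, `hop_ms`, `vowel_frames`, `fricative_frames`, `silence_frames`) are the ones this notebook reads; the concrete values, the file name `example_vowel.wav`, and the temporary folder are illustrative assumptions, not the real output of Notebook 9.0.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical clip entry; every value here is made up for illustration.
example_clip = {
    "filename": "example_vowel.wav",
    "label": "vowel",
    "frame_selections": {
        "win_ms": 25,
        "hop_ms": 10,
        "vowel_frames": [12, 13, 14],
        "fricative_frames": [],
        "silence_frames": [0, 1],
    },
}
example_manifest = {"clips": [example_clip], "meta": {"course": "EE519", "lecture": 9}}

# Round-trip through JSON in a throwaway folder (not the real project directory).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "manifest.json"
    path.write_text(json.dumps(example_manifest, indent=2))
    loaded = json.loads(path.read_text())

# Same membership test the clip picker uses: does the entry carry frame_selections?
has_selections = [c["filename"] for c in loaded["clips"] if "frame_selections" in c]
print("Clips with selections:", has_selections)
```

If your own `manifest.json` lacks the `frame_selections` block for every clip, the clip-loading cell later in this notebook will raise its `RuntimeError`.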


## 2. Utilities (audio I/O, framing, STFT, saving figures)

These helpers are used throughout Lecture 9 notebooks.


In [3]:
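def read_wav(path):
    fs, x = wavfile.read(path)
    # Scale integer PCM by its dtype range; float WAVs are assumed to be in [-1, 1] already.
    if x.dtype == np.uint8:
        x = (x.astype(np.float32) - 128.0) / 128.0
    elif np.issubdtype(x.dtype, np.integer):
        x = x.astype(np.float32) / float(np.iinfo(x.dtype).max + 1)
    else:
        x = x.astype(np.float32)
    if x.ndim > 1:
        x = x.mean(axis=1)  # downmix to mono
    return fs, x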

def peak_normalize(x, target=0.95):
    m = np.max(np.abs(x)) + 1e-12
    return x * (target / m)

def play_audio(x, fs, label=""):
    print(label, f"(fs={fs}, length={len(x)/fs:.2f}s)")
    display(Audio(x, rate=fs))

def savefig(name):
    out = FIG_DIR / name
    plt.savefig(out, dpi=180, bbox_inches="tight")
    print("Saved:", out)

def hann(N):
    return np.hanning(N).astype(np.float32)

def frame_signal(x, N, H):
    if len(x) < N:
        raise ValueError("Signal shorter than frame length N.")
    num = 1 + (len(x) - N) // H
    frames = np.stack([x[i*H:i*H+N] for i in range(num)], axis=0)
    return frames

def stft_scipy(x, fs, win_ms=25, hop_ms=10, nfft=None, window="hann"):
    N = int(win_ms * 1e-3 * fs)
    H = int(hop_ms * 1e-3 * fs)
    if nfft is None:
        nfft = 1 << int(np.ceil(np.log2(N)))
    f, t, Z = sig.stft(x, fs=fs, window=window, nperseg=N, noverlap=N-H, nfft=nfft, boundary=None, padded=False)
    return f, t, Z, N, H

def plot_spectrogram(Z, fs, title, fmax=8000):
    S = 20*np.log10(np.abs(Z)+1e-12)
    plt.figure(figsize=(10,4))
    plt.imshow(S, origin="lower", aspect="auto",
               extent=[0, Z.shape[1], 0, fs/2])
    plt.ylim([0, min(fmax, fs/2)])  # never extend the axis past Nyquist
    plt.colorbar(label="dB")
    plt.title(title)
    plt.xlabel("Frame index")
    plt.ylabel("Frequency (Hz)")
    plt.show()
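The frame count in `frame_signal` comes from `1 + (len(x) - N) // H`. A quick worked example makes the arithmetic concrete; the 16 kHz sampling rate and the 1-second zero signal are assumptions chosen only to keep the numbers round.

```python
import numpy as np

fs = 16000                      # assumed sampling rate for this example
win_ms, hop_ms = 25, 10         # same defaults as stft_scipy
N = int(win_ms * 1e-3 * fs)     # 400 samples per frame
H = int(hop_ms * 1e-3 * fs)     # 160 samples between frame starts

x = np.zeros(fs, dtype=np.float32)          # 1 second of placeholder audio
num = 1 + (len(x) - N) // H                 # frames that fit fully inside x
frames = np.stack([x[i * H:i * H + N] for i in range(num)], axis=0)

print(N, H, frames.shape)                   # 400 160 (98, 400)

# Samples after the last full frame are dropped, not zero-padded.
last_start = (num - 1) * H
leftover = len(x) - (last_start + N)
print("Samples left over after the last frame:", leftover)
```

Because trailing samples are dropped, frame indices saved in the manifest are only valid for the same `win_ms` / `hop_ms` pair that produced them.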


## LPC core functions (used in Notebooks 9.1–9.4)

### Important fix vs earlier versions
- `toeplitz` is in `scipy.linalg`, not `scipy.signal`.
- We therefore use `la.toeplitz` to avoid errors.

### Autocorrelation convention
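We use the **biased** autocorrelation estimate, written here without its $1/N$ scale factor:

$$
r[k] = \sum_{n=0}^{N-1-k} x[n]\,x[n+k], \qquad k = 0, 1, \dots, p
$$

A constant scale on $r[k]$ cancels in the normal equations, so the LPC coefficients are effectively the same with or without the $1/N$. This convention is common in demonstrations of the LPC autocorrelation method.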
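As a sanity check on this convention, the sketch below compares the direct sum with `np.correlate` and confirms that dividing by $N$ (the textbook biased estimate) leaves the LPC solution unchanged. The random frame, the order `p = 4`, and the helper name `solve_normal_equations` are assumptions for illustration; the helper builds the Toeplitz matrix by hand so the block needs only `numpy`.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(400)          # stand-in for one windowed frame
p = 4                                 # small order, just for the check
N = len(x)

# Direct sum, as in the formula: r[k] = sum_n x[n] x[n+k]
r_sum = np.array([np.sum(x[:N - k] * x[k:]) for k in range(p + 1)])

# Same values from np.correlate: lag 0 sits at index N-1 of the "full" output
r_corr = np.correlate(x, x, mode="full")[N - 1:N + p]

def solve_normal_equations(r):
    """Solve R a = -r[1:], with R the p x p Toeplitz matrix built from r[0..p-1]."""
    order = len(r) - 1
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, -r[1:])

a_unscaled = solve_normal_equations(r_sum)
a_scaled = solve_normal_equations(r_sum / N)   # with the 1/N factor

print("max |r_sum - r_corr|:", np.max(np.abs(r_sum - r_corr)))
print("max |a difference|  :", np.max(np.abs(a_unscaled - a_scaled)))
```

Both differences should be at floating-point noise level, which is why the notebook can skip the normalization.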


In [4]:
def autocorr_biased(x, p):
    x = np.asarray(x, dtype=np.float64)
    r = np.zeros(p+1, dtype=np.float64)
    for k in range(p+1):
        r[k] = np.sum(x[:len(x)-k] * x[k:])
    return r

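def lpc_autocorr_method(x, p):
    # Solve the normal equations R a = -r[1:], so that A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p.
    r = autocorr_biased(x, p)
    R = la.toeplitz(r[:-1])  # p x p Toeplitz matrix built from r[0..p-1]
    rhs = -r[1:]
    # Tiny diagonal loading keeps the solve defined on all-zero (silent) frames.
    a = np.linalg.solve(R + 1e-12*np.eye(p), rhs)
    return a, r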

def lpc_residual(x, a):
    A = np.concatenate([[1.0], a])
    e = sig.lfilter(A, [1.0], x)
    return e

def lpc_envelope_db(a, fs, nfft=4096):
    A = np.concatenate([[1.0], a])
    # Use freqz (stable, consistent)
    w, h = sig.freqz([1.0], A, worN=nfft, fs=fs)
    env_db = 20*np.log10(np.abs(h)+1e-12)
    return w, env_db

def fft_mag_db(x, fs, nfft=4096):
    X = np.fft.rfft(x, n=nfft)
    f = np.fft.rfftfreq(nfft, 1/fs)
    mag_db = 20*np.log10(np.abs(X)+1e-12)
    return f, mag_db

def normalize_to_peak(y_db):
    return y_db - np.max(y_db)
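One way to build trust in these functions is to feed them a signal whose all-pole model is known. The sketch below synthesizes noise through a single resonance and checks that order-2 LPC recovers it. It is a numpy-only stand-in that mirrors the sign convention of `lpc_autocorr_method` rather than calling it; the 8 kHz rate, the 1000 Hz resonance, and the pole radius 0.97 are assumed values.

```python
import numpy as np

fs = 8000.0
f0, radius = 1000.0, 0.97                     # assumed resonance and pole radius
theta = 2 * np.pi * f0 / fs
# A(z) = 1 + a1 z^-1 + a2 z^-2 with a conjugate pole pair at radius * exp(+-j*theta)
a_true = np.array([-2 * radius * np.cos(theta), radius ** 2])

# Drive 1/A(z) with white noise: x[n] = e[n] - a1 x[n-1] - a2 x[n-2]
rng = np.random.default_rng(0)
e = rng.standard_normal(4000)
x = np.zeros_like(e)
for n in range(len(e)):
    x[n] = e[n]
    if n >= 1:
        x[n] -= a_true[0] * x[n - 1]
    if n >= 2:
        x[n] -= a_true[1] * x[n - 2]

# Autocorrelation-method LPC of order 2 (normal equations R a = -r[1:])
p = 2
r = np.array([np.sum(x[:len(x) - k] * x[k:]) for k in range(p + 1)])
R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
a_est = np.linalg.solve(R, -r[1:])

# Envelope 1/|A(e^{jw})| evaluated on an FFT grid
nfft = 4096
A = np.fft.rfft(np.concatenate([[1.0], a_est]), n=nfft)
freqs = np.fft.rfftfreq(nfft, 1 / fs)
env_db = -20 * np.log10(np.abs(A) + 1e-12)
peak_hz = freqs[np.argmax(env_db)]

print("true a:", np.round(a_true, 3), "estimated a:", np.round(a_est, 3))
print("envelope peak near (Hz):", round(float(peak_hz), 1))
```

This is the favorable case for LPC: the signal really is all-pole, so a low order suffices. Real vowels need a higher order because they contain several formants plus source and radiation effects.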


## Load a clip that already has `frame_selections`

✅ If this errors, go back to **Notebook 9.0**, select time ranges, and save to manifest.


In [5]:
def pick_first_clip_with_selections(prefer_label="vowel"):
    m = load_manifest()
    # first try preferred label
    for i,c in enumerate(m["clips"]):
        if c.get("label")==prefer_label and "frame_selections" in c and len(c["frame_selections"].get("vowel_frames",[]))>0:
            return i,c,m
    # otherwise any with selections
    for i,c in enumerate(m["clips"]):
        if "frame_selections" in c:
            return i,c,m
    raise RuntimeError("No clip has frame_selections. Run Notebook 9.0 to select and save frames.")

CLIP_IDX, clip, manifest = pick_first_clip_with_selections("vowel")
print("Using clip:", CLIP_IDX, clip["filename"], "| label:", clip.get("label"))

fs, x = read_wav(REC_DIR / clip["filename"])
x = peak_normalize(x)

sel = clip["frame_selections"]
WIN_MS = sel.get("win_ms", 25)
HOP_MS = sel.get("hop_ms", 10)
N = int(WIN_MS*1e-3*fs)
H = int(HOP_MS*1e-3*fs)

frames = frame_signal(x, N, H) * hann(N)[None,:]
vowel_frames = sel.get("vowel_frames", [])
fric_frames = sel.get("fricative_frames", [])
sil_frames = sel.get("silence_frames", [])

print("Counts | vowel:", len(vowel_frames), "fric:", len(fric_frames), "sil:", len(sil_frames))


Using clip: 6 F01_fric_s.wav | label: fricative
Counts | vowel: 30 fric: 10 sil: 4


### 🔧 Practical audio conditioning (recommended)
Two small steps often make LPC envelopes cleaner:

1) **DC removal**: removes microphone offset  
2) **Pre-emphasis**: boosts high frequencies so the envelope is more balanced

You can toggle this on/off to see the effect.


In [6]:
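USE_PREEMPH = True
PREEMPH_ALPHA = 0.97

# Reload the clip so re-running this cell never applies pre-emphasis twice.
fs, x = read_wav(REC_DIR / clip["filename"])
x = peak_normalize(x)

if USE_PREEMPH:
    x = x - np.mean(x)                             # DC removal
    x = sig.lfilter([1, -PREEMPH_ALPHA], [1], x)   # y[n] = x[n] - alpha*x[n-1]
    x = peak_normalize(x)
    print("Applied DC removal + pre-emphasis.")
else:
    print("Skipped DC removal + pre-emphasis.")

# Re-frame so later cells see the conditioned signal (frames were built before this cell).
frames = frame_signal(x, N, H) * hann(N)[None, :]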


Applied DC removal + pre-emphasis.
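To see what "boosts high frequencies" means numerically, the sketch below evaluates the pre-emphasis filter $H(z) = 1 - \alpha z^{-1}$ at a few frequencies and applies it to a pure DC offset. The 16 kHz sampling rate and the probe frequencies are assumptions; `alpha` matches `PREEMPH_ALPHA`.

```python
import numpy as np

alpha = 0.97                          # same value as PREEMPH_ALPHA
fs = 16000.0                          # assumed sampling rate for the frequency axis

# H(e^{jw}) = 1 - alpha * e^{-jw}, evaluated at a few frequencies
freqs_hz = np.array([0.0, 100.0, 1000.0, 4000.0, fs / 2])
w = 2 * np.pi * freqs_hz / fs
gain_db = 20 * np.log10(np.abs(1 - alpha * np.exp(-1j * w)))

for f, g in zip(freqs_hz, gain_db):
    print(f"{f:7.0f} Hz: {g:6.1f} dB")

# The same filter in the time domain: y[n] = x[n] - alpha * x[n-1]
x = np.ones(8)                        # a pure DC offset
y = np.concatenate([[x[0]], x[1:] - alpha * x[:-1]])
print("DC input after pre-emphasis:", np.round(y, 2))
```

The gain runs from roughly -30 dB at DC to roughly +6 dB at Nyquist, so the filter also suppresses most of a constant offset on its own after the first sample.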


## Mini ML demo (optional): vowel vs fricative classification

We build a tiny dataset from your selected frames.


In [7]:
try:
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
    SKLEARN_OK = True
    print("sklearn: ✅")
except Exception as e:
    SKLEARN_OK = False
    print("sklearn: ❌ (skip this section)")
    print("Error:", e)


sklearn: ✅


In [8]:
if SKLEARN_OK:
    def lpc_feature(xf, p=12):
        a,_ = lpc_autocorr_method(xf, p)
        e = lpc_residual(xf, a)
        feat = np.concatenate([a, [np.log(np.mean(e**2)+1e-12)]])
        return feat

    X_feat = []
    y = []
    for idx in vowel_frames:
        X_feat.append(lpc_feature(frames[idx], p=12)); y.append(1)
    for idx in fric_frames:
        X_feat.append(lpc_feature(frames[idx], p=12)); y.append(0)

    X_feat = np.stack(X_feat, axis=0)
    y = np.array(y)
    print("Dataset:", X_feat.shape, "labels:", y.shape)

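    # test_size=0.9 holds out 90% of the frames, so with the 40 frames here only 4 train the model;
    # a smaller value (e.g. 0.3) is the more usual choice for a train/test split.
    X_train, X_test, y_train, y_test = train_test_split(X_feat, y, test_size=0.9, random_state=0, stratify=y)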
    clf = LogisticRegression(max_iter=3000)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    print("Accuracy:", accuracy_score(y_test, pred))
    print("Confusion matrix:\n", confusion_matrix(y_test, pred))
    print(classification_report(y_test, pred))


Dataset: (40, 13) labels: (40,)
Accuracy: 0.75
Confusion matrix:
 [[ 0  9]
 [ 0 27]]
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         9
           1       0.75      1.00      0.86        27

    accuracy                           0.75        36
   macro avg       0.38      0.50      0.43        36
weighted avg       0.56      0.75      0.64        36



(scikit-learn also emitted three `UndefinedMetricWarning` messages here: precision for class 0 is undefined because no test frame was predicted as fricative.)


### 🎧 Listen to the residual (very important intuition)
- For a **vowel**, the residual should sound like a buzzy excitation (pitch pulses).
- For a **fricative**, the residual sounds like noise: the excitation itself is turbulent noise, and the all-pole model cannot represent the zeros in a fricative spectrum.
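Listening is the real test, but a number can back up the intuition. The sketch below uses synthetic stand-ins, not your recordings: one all-pole filter driven by a 100 Hz pulse train (voiced-like) and the same filter driven by white noise (noise-like). It then inverse-filters each with its own LPC estimate and compares residual kurtosis, which is high for sparse pulses and about 3 for Gaussian noise. The helper names, the 700 Hz resonance, and the order `p = 4` are all assumptions.

```python
import numpy as np

def all_pole_filter(e, a):
    """x[n] = e[n] - sum_k a[k] x[n-1-k], i.e. filtering by 1/A(z) with A = [1, a]."""
    x = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for k in range(min(len(a), n)):
            acc -= a[k] * x[n - 1 - k]
        x[n] = acc
    return x

def lpc(x, p):
    """Autocorrelation-method LPC with the notebook's sign convention (R a = -r[1:])."""
    r = np.array([np.sum(x[:len(x) - k] * x[k:]) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R + 1e-12 * np.eye(p), -r[1:])

def kurtosis(e):
    """Fourth moment over squared second moment; about 3 for Gaussian noise."""
    e = e - np.mean(e)
    return np.mean(e ** 4) / (np.mean(e ** 2) ** 2)

fs, n_samples = 8000, 2000
theta = 2 * np.pi * 700 / fs                      # one assumed resonance at 700 Hz
a_true = np.array([-2 * 0.95 * np.cos(theta), 0.95 ** 2])

pulses = np.zeros(n_samples)
pulses[::80] = 1.0                                # 100 Hz pulse train at fs = 8000
noise = np.random.default_rng(0).standard_normal(n_samples)

results = {}
for name, excitation in [("pulse-excited", pulses), ("noise-excited", noise)]:
    x = all_pole_filter(excitation, a_true)
    a = lpc(x, p=4)
    # Inverse filter with A(z) = 1 + sum a[k] z^-(k+1) to get the residual
    residual = np.convolve(x, np.concatenate([[1.0], a]))[:n_samples]
    results[name] = kurtosis(residual)
    print(f"{name}: residual kurtosis = {results[name]:.1f}")
```

A peaky, high-kurtosis residual is what you hear as a buzz; a residual near the Gaussian value is what you hear as hiss.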


---
## ✅ What you learned (Notebook 9.4)
- You ran the LPC pipeline without fighting imports/toeplitz errors.
- You compared FFT vs LPC envelope using a **normalized-to-peak** plot (shape match).
- You saved figures into the project folder for later slides/reports.


---
## 🧠 Reflection (Notebook 9.4)

### What you learned
- When LPC works well (steady voiced segments) and when it fails (fricatives/silence).
- How the LPC residual and envelope behave across vowel vs fricative vs silence.
- How LPC-based features can support a tiny classifier (concept demo).

### Common mistakes to notice (and fix next time)
- Expecting an all-pole model to fit fricative spectra (noise-like + zeros).
- Training a classifier on too few samples and over-interpreting accuracy.
- Mixing labels because frame selections included multiple phonetic events.
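A quick way to avoid over-interpreting accuracy is to compare it with a trivial baseline. The sketch below rebuilds the test-set composition from the confusion matrix reported earlier in this notebook (9 fricative frames, 27 vowel frames) and scores a "classifier" that always answers vowel. It uses plain `numpy` rather than scikit-learn so the arithmetic stays visible.

```python
import numpy as np

# Test-set composition from the reported confusion matrix: 9 fricative (0), 27 vowel (1)
y_test = np.array([0] * 9 + [1] * 27)

# A "classifier" that ignores its input and always answers "vowel"
pred = np.ones_like(y_test)

accuracy = np.mean(pred == y_test)

# Per-class recall, then their average (balanced accuracy)
recall_fric = np.mean(pred[y_test == 0] == 0)
recall_vowel = np.mean(pred[y_test == 1] == 1)
balanced_accuracy = 0.5 * (recall_fric + recall_vowel)

print("accuracy          :", accuracy)            # 0.75
print("recall (fricative):", recall_fric)         # 0.0
print("recall (vowel)    :", recall_vowel)        # 1.0
print("balanced accuracy :", balanced_accuracy)   # 0.5
```

The baseline reproduces the 0.75 accuracy exactly, so that figure reflects class imbalance rather than anything the model learned from the LPC features.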

### Reflective questions
1. Describe one clear visual difference between vowel and fricative LPC envelopes.
2. Why does an all-pole model struggle on fricatives (conceptually)?
3. If you wanted to classify more reliably, what data/feature improvements would you make?
4. What is one takeaway rule you will remember about using LPC in practice?

### Quick self-check
- [ ] I can state one scenario where LPC is appropriate and one where it is not.
- [ ] I can explain why tiny ML demos are useful but not definitive.


### Answers

1. Vowel LPC envelopes are smooth with clear peaks at the formants, while fricative envelopes are flatter.
2. Fricatives are excited by turbulent noise rather than periodic glottal pulses, and their spectra contain zeros (anti-resonances). An all-pole model has no zeros, so it fits them poorly.
3. We could use different frames or other features, such as cepstral coefficients. I will be using MFCCs for my project.
4. Do not pick too high an order; use just enough to capture the formants.