### Can DTW handle real‑time guitar alignment?
Yes. DTW, or one of its streaming variants, is still the workhorse for real-time audio-to-score and audio-to-audio synchronisation. Classic "offline" DTW costs \(O(nm)\) time and memory for sequences of length \(n\) and \(m\), but **online / windowed DTW** keeps only a sliding band of the cost matrix. Per-frame work and memory are then bounded by the band width, and total computation grows linearly with the length of the performance, which makes it practical for live tracking.
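
For reference, the banded recursion can be written as below. The symbols are chosen here for illustration: \(d(i,j)\) is the local distance between reference frame \(i\) and live frame \(j\), \(D(i,j)\) is the accumulated cost, and \(w\) is the band half-width in frames. The three-way step pattern is the textbook one; real implementations often use other step patterns or weights.

\[
D(i,j) = d(i,j) + \min\bigl\{\, D(i-1,j-1),\; D(i-1,j),\; D(i,j-1) \,\bigr\}, \qquad |i - j| \le w
\]

Each new live frame \(j\) touches at most \(2w + 1\) cells, so the work per frame is \(O(w)\), and only the previous column of \(D\) has to be kept in memory.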

---

## A practical recipe

| Stage | What to do | Tips for guitar |
|-------|------------|-----------------|
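| **1. Pre-compute reference features** | • Offline, convert the reference recording to a frame-level chroma matrix (12 × T), computed from an STFT or a constant-Q transform at the same hop as the live features; optionally also keep a *beat-synchronous* version for a coarse first pass. <br>• Optionally scale each chroma bin to zero mean and unit variance. | Chroma is fairly robust to timbre changes and small tuning drift. If you expect capo use, also store the circular shifts of the matrix along the pitch-class axis, one per semitone transposition. |
| **2. Live feature extraction** | • Stream the mic/DI input, frame length ≈ 2048 samples (≈ 46 ms at 44.1 kHz), hop ≈ 512 samples (≈ 12 ms). <br>• Compute chroma or HPCP on each frame. <br>• Smooth the chroma sequence over time (low-pass or median filter) to tame fret buzz. | A GPU- or Numba-accelerated CQT helps keep per-frame compute time below 10 ms, inside the ≈ 12 ms hop budget. The 46 ms analysis window, not the compute time, sets the floor on end-to-end latency. |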
| **3. Incremental DTW update** | • Keep a *Sakoe–Chiba* band of width \(w\) frames (e.g., ±15 % of the diagonal). <br>• For each new live frame, append a new column in the local‑distance band and update accumulated costs only within the band. <br>• Discard columns left of the band to cap memory. | \(w\) controls the tolerated tempo drift: start with \(w≈40\) (≈ 0.5 s) and tune. |
| **4. Look-ahead buffer** | Hold \(L\) frames (e.g., 0.4–0.8 s) before emitting the alignment. This gives DTW a little future context and avoids flip-flopping. Added latency ≈ hop × \(L\) (e.g., 43 frames × 11.6 ms ≈ 0.5 s), on top of the analysis window and compute time. |  |
| **5. Extract mis-alignment signal** | • At each step, read the latest best-path coordinate \((i,j)\) (reference frame, live frame). <br>• *Lag* \(= t_{\text{live}} - t_{\text{ref}}\), with both clocks starting at the same instant. Positive lag ⇒ performer is behind the reference. <br>• *Tempo ratio* \(= \frac{\Delta j}{\Delta i}\) over the last few path steps. A ratio > 1 ⇒ the performer is playing slower than the reference. | Smooth lag/tempo with a short Kalman or exponential filter to reduce jitter. |
| **6. Act** | • Display a running lag meter, flash when \|lag\| > 100 ms. <br>• Or time‑stretch the reference audio to follow the player for accompaniment (common in MIR “score‑following”). |  |
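
Stage 3 of the recipe can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the class name `BandedOnlineDTW`, the helper `cosine_distance`, and the choice to report the cheapest cell of the newest column as the current position are all assumptions made for the example. It also omits the look-ahead buffer from stage 4 and uses the unweighted textbook step pattern.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity, or 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return max(0.0, 1.0 - dot / (na * nb))


class BandedOnlineDTW:
    """Column-by-column DTW restricted to the band |i - j| <= w."""

    def __init__(self, reference, w):
        self.reference = reference  # list of feature vectors, e.g. 12-bin chroma
        self.w = w                  # band half-width in frames
        self.j = -1                 # index of the last live frame consumed
        self.prev = {}              # accumulated costs of the previous column

    def step(self, live_frame):
        """Consume one live frame and return the estimated reference index."""
        self.j += 1
        j, n = self.j, len(self.reference)
        lo, hi = max(0, j - self.w), min(n - 1, j + self.w)
        if lo > hi:
            # The live input has run more than w frames past the reference end.
            return n - 1
        cur = {}
        for i in range(lo, hi + 1):
            d = cosine_distance(self.reference[i], live_frame)
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    self.prev.get(i - 1, math.inf),  # both sequences advance
                    self.prev.get(i, math.inf),      # live advances, reference holds
                    cur.get(i - 1, math.inf),        # reference advances, live holds
                )
            cur[i] = d + best
        # Only the newest column is kept, so memory stays O(w).
        self.prev = cur
        return min(cur, key=cur.get)
```

To use it, build the tracker once with the pre-computed reference features and call `step()` on every incoming chroma frame. Taking the minimum of the newest column is a simplification; a look-ahead buffer with backtracking, as in stage 4, gives steadier estimates.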

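Stage 5 (lag and tempo ratio) with the exponential smoothing suggested in the tips column might look like the sketch below. The class name `LagTempoEstimator` and the defaults `history=8` and `alpha=0.3` are hypothetical starting values. It assumes the reference and live features share the same hop and that both timelines start at zero.

```python
from collections import deque


class LagTempoEstimator:
    """Turn (reference frame, live frame) path points into lag and tempo ratio."""

    def __init__(self, hop_seconds, history=8, alpha=0.3):
        self.hop = hop_seconds
        self.path = deque(maxlen=history)  # most recent path coordinates
        self.alpha = alpha                 # weight given to the newest lag value
        self.lag = None                    # smoothed lag in seconds

    def update(self, ref_idx, live_idx):
        """Return (smoothed lag in seconds, tempo ratio or None)."""
        self.path.append((ref_idx, live_idx))

        # Positive lag: the live clock is ahead of the matched reference
        # position, i.e. the performer is behind the reference.
        raw_lag = (live_idx - ref_idx) * self.hop
        if self.lag is None:
            self.lag = raw_lag
        else:
            self.lag = self.alpha * raw_lag + (1.0 - self.alpha) * self.lag

        # Tempo ratio over the stored history: live frames per reference frame.
        (i0, j0), (i1, j1) = self.path[0], self.path[-1]
        di, dj = i1 - i0, j1 - j0
        tempo_ratio = dj / di if di > 0 else None
        return self.lag, tempo_ratio
```

The tempo ratio is `None` until the reference index has advanced at least once within the history window, which avoids a division by zero while the player holds a note.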



---

## Tricks for “getting the most”

1. **Multi‑resolution fusion** First align beat‑synchronous chroma (very cheap) to lock the coarse position, then refine with frame‑level DTW in a ±1‑beat window.  
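2. **Distance choice** Cosine distance on \(\ell_2\)-normalised chroma is popular. Soft-DTW, a differentiable relaxation of DTW, can be used offline to learn step weights; at run time the banded recursion costs about the same, so any added robustness comes with virtually no extra latency if you keep the band narrow.  
3. **Robustness to noise** Mix an onset-strength envelope or log-spectral flux into the local-distance metric. On guitars with heavy distortion, sudden pick attacks spread energy across many chroma bins and can dominate chroma similarity, so an explicit onset term helps keep the match anchored.  
4. **Hardware** One CPU core easily handles chroma + banded DP for a single stream. Offloading feature extraction to a GPU or the Apple Neural Engine frees enough headroom to drop the hop size to 256 samples, i.e. an update roughly every 5.8 ms at 44.1 kHz. The analysis window length still bounds the end-to-end latency.  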
5. **Datasets & evaluation** Try GuitarSet and the newer GAPS dataset for unit tests and hyper‑parameter sweeps.
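
Tricks 2 and 3 can be combined in a single local-distance function. The sketch below is one possible way to do it, not a standard formula: the name `blended_distance`, the linear mix, and the default `onset_weight=0.3` are assumptions for illustration, and the onset-strength values are assumed to be pre-normalised to the range 0 to 1.

```python
import math


def blended_distance(chroma_a, chroma_b, onset_a, onset_b, onset_weight=0.3):
    """Mix cosine distance on chroma with an absolute onset-strength difference."""
    dot = sum(x * y for x, y in zip(chroma_a, chroma_b))
    na = math.sqrt(sum(x * x for x in chroma_a))
    nb = math.sqrt(sum(y * y for y in chroma_b))
    if na == 0.0 or nb == 0.0:
        d_chroma = 1.0  # treat silence as maximally dissimilar
    else:
        d_chroma = min(1.0, max(0.0, 1.0 - dot / (na * nb)))

    # Onset strengths are assumed to lie in [0, 1], so this term does too.
    d_onset = abs(onset_a - onset_b)

    return (1.0 - onset_weight) * d_chroma + onset_weight * d_onset
```

Because both terms lie between 0 and 1 for non-negative chroma, the blended value does as well, which keeps `onset_weight` easy to interpret when sweeping it on a dataset.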

---




### Alternatives when DTW struggles

| Technique | Pros | Cons |
|-----------|------|------|
| Hidden Markov Models / Viterbi | Naturally model tempo variation & skips | Needs training, slightly higher latency |
| Neural sequence‑to‑sequence aligners (Transformer “score followers”) | Learn complex timbral invariances | Heavy model; requires GPU |
| Beat‑synchronous correlation (simple CQT × CQT) | Latency ≤ 1 frame; trivial code | Breaks if tempo drifts > ±5 % |

**For most real-time guitar tracking, windowed or online DTW with chroma features remains the sweet spot: transparent, easy to tune, and robust to moderate tempo fluctuations. Start with a 0.5-second look-ahead, a 15 % Sakoe–Chiba band, cosine-distance chroma, and add smoothing. That should give mis-alignment estimates stable enough for a lag meter or an accompaniment follower, reported about half a second after the fact because of the look-ahead.**
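
The starting values above translate into frame counts as in this sketch. The class `TrackerConfig` and its field names are hypothetical. Two assumptions are made: the 15 % band is read as a fraction of the reference length in frames, and the latency figure is only a rough lower bound that counts the full analysis window plus the look-ahead and ignores compute time and audio I/O buffering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackerConfig:
    sample_rate: int = 44_100
    frame_length: int = 2048        # analysis window in samples
    hop_length: int = 512           # hop in samples
    lookahead_seconds: float = 0.5
    band_fraction: float = 0.15     # assumed: fraction of the reference length

    @property
    def hop_seconds(self):
        return self.hop_length / self.sample_rate

    @property
    def lookahead_frames(self):
        return round(self.lookahead_seconds / self.hop_seconds)

    def band_half_width(self, reference_frames):
        """Band half-width in frames for a reference of the given length."""
        return max(1, round(self.band_fraction * reference_frames))

    @property
    def min_latency_seconds(self):
        """Analysis window plus look-ahead; excludes compute and I/O buffers."""
        window = self.frame_length / self.sample_rate
        return window + self.lookahead_frames * self.hop_seconds


cfg = TrackerConfig()
print(f"hop: {cfg.hop_seconds * 1000:.1f} ms")
print(f"look-ahead: {cfg.lookahead_frames} frames")
print(f"latency floor: {cfg.min_latency_seconds * 1000:.0f} ms")
```

Keeping the derived values as properties means a change to the hop size or sample rate propagates to the look-ahead and latency figures automatically.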