[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neosatrapahereje/music_alignment_tutorial/blob/main/Symbolic_Music_Alignment.ipynb)



# Symbolic Music Alignment

In [None]:
try:
    import google.colab

    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    # Install partitura
    ! pip install partitura
    ! pip install fastdtw

    # To be able to access helper modules in the repo for this tutorial
    # (not necessary if the jupyter notebook is run locally instead of google colab)
    !git clone https://github.com/cpjku/vienna4x22.git
    !git clone https://github.com/neosatrapahereje/music_alignment_tutorial
    import sys
    sys.path.insert(0, "./music_alignment_tutorial/")

In [None]:
# Import the libraries used throughout this notebook
import os
import glob
import warnings

import numpy as np
import matplotlib.pyplot as plt
import partitura as pt

from alignment import fast_dynamic_time_warping, greedy_note_alignment

from typing import List

warnings.filterwarnings("ignore")
%config InlineBackend.figure_format = "retina"

if IN_COLAB:
    V4X22_DATASET_DIR = "./vienna4x22"
else:
    # Path to the Vienna 4x22 dataset
    from load_data import init_dataset

    V4X22_DATASET_DIR = init_dataset()

MUSICXML_DIR = os.path.join(V4X22_DATASET_DIR, "musicxml")
MIDI_DIR = os.path.join(V4X22_DATASET_DIR, "midi")
MATCH_DIR = os.path.join(V4X22_DATASET_DIR, "match")

## Feature Representations

To make musical data comparable for alignment algorithms, the first step is to extract features that capture relevant aspects while suppressing irrelevant details.

In this lecture we focus on two common feature representations:

1. Piano Rolls
2. Pitch Class Distributions

### Piano Rolls

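A piano roll is a 2D representation of (MIDI) pitch against time: each row corresponds to a pitch, each column to a time frame, and an entry is nonzero while a note of that pitch is sounding. We can extract piano rolls from symbolic music files with Partitura!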
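To make the definition concrete, here is a minimal hand-rolled sketch (not Partitura's implementation) that builds a binary piano roll from a hypothetical list of `(pitch, onset, duration)` tuples, with each time unit split into `time_div` frames. The function name `toy_pianoroll` and the input format are illustrative assumptions.

```python
import numpy as np


def toy_pianoroll(notes, time_div=4, n_pitches=128):
    """Build a binary piano roll (pitch x frames) from (pitch, onset, duration) tuples.

    Onsets and durations are in an arbitrary time unit (e.g. beats); each unit
    is split into `time_div` frames. Illustrative only, not partitura's code.
    """
    end = max(onset + duration for _, onset, duration in notes)
    n_frames = int(np.ceil(end * time_div))
    roll = np.zeros((n_pitches, n_frames), dtype=np.int8)
    for pitch, onset, duration in notes:
        start = int(round(onset * time_div))
        stop = int(round((onset + duration) * time_div))
        roll[pitch, start:stop] = 1  # the note is sounding in these frames
    return roll


# A C major triad (C4, E4, G4) lasting one beat, followed by a D4 lasting one beat
notes = [(60, 0, 1), (64, 0, 1), (67, 0, 1), (62, 1, 1)]
roll = toy_pianoroll(notes, time_div=4)
print(roll.shape)  # (128, 8)
print(roll[[60, 62, 64, 67]])  # rows for the four pitches that are used
```

Partitura's `compute_pianoroll` returns a sparse matrix (hence the `.todense()` calls when plotting below), which saves memory because most of the 128 rows are empty.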

In [None]:
# Let's load a score and a performance of the score

# Paths to the score (MusicXML) and the performance (MIDI)
score_fn = os.path.join(MUSICXML_DIR, "Chopin_op10_no3.musicxml")
performance_fn = os.path.join(MIDI_DIR, "Chopin_op10_no3_p01.mid")

score = pt.load_score(score_fn)
performance = pt.load_performance(performance_fn)

In [None]:
# Compute piano roll
use_piano_range = False
score_pr = pt.utils.music.compute_pianoroll(
    note_info=score,
    piano_range=use_piano_range,
)

performance_pr = pt.utils.music.compute_pianoroll(
    note_info=performance,
    piano_range=use_piano_range,
)

In [None]:
%matplotlib inline
fig, axes = plt.subplots(2, figsize=(10, 7))
axes[0].imshow(
    score_pr.todense(),
    aspect="auto",
    origin="lower",
    cmap="gray",
    interpolation="nearest",
)
axes[1].imshow(
    performance_pr.todense(),
    aspect="auto",
    origin="lower",
    cmap="gray",
    interpolation="nearest",
)
y_label = "Piano key" if use_piano_range else "MIDI pitch"
axes[0].set_ylabel(y_label)
axes[1].set_ylabel(y_label)
axes[0].set_title("Score")
axes[1].set_title("Performance")
axes[1].set_xlabel("Time")
plt.show()

For more information, see the documentation of [`compute_pianoroll`](https://partitura.readthedocs.io/en/latest/modules/partitura.utils.html#partitura.utils.compute_pianoroll).

### Pitch Class Distributions

These features are the symbolic equivalent of *chroma* features in audio. This representation is essentially a piano roll folded into a single octave: notes an octave apart map to the same pitch class, so there are 12 rows (C, C#, ..., B) instead of 128.
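The folding can be sketched in a few lines of NumPy: every MIDI pitch $p$ contributes to pitch class $p \bmod 12$, so pitch class $k$ is the sum of rows $k, k+12, k+24, \dots$ of the piano roll. Normalizing each frame to sum to one is an assumption made for this sketch; Partitura's `normalize=True` option may use a different scheme. The function name `fold_to_pitch_classes` is hypothetical.

```python
import numpy as np


def fold_to_pitch_classes(roll, normalize=True):
    """Fold a (128 x n_frames) MIDI piano roll into a (12 x n_frames) pitch class roll."""
    n_frames = roll.shape[1]
    pc_roll = np.zeros((12, n_frames), dtype=float)
    for pc in range(12):
        # Rows pc, pc + 12, pc + 24, ... all belong to pitch class pc (0 = C, ..., 11 = B)
        pc_roll[pc] = roll[pc::12].sum(axis=0)
    if normalize:
        # Scale each frame to sum to 1; silent frames stay all-zero
        totals = pc_roll.sum(axis=0, keepdims=True)
        pc_roll = np.divide(pc_roll, totals, out=np.zeros_like(pc_roll), where=totals > 0)
    return pc_roll


# Frame 0: C4 (60), E4 (64) and C5 (72); frame 1: D4 (62)
roll = np.zeros((128, 2), dtype=np.int8)
roll[[60, 64, 72], 0] = 1
roll[62, 1] = 1
pc = fold_to_pitch_classes(roll)
print(pc[[0, 2, 4]])  # C gets 2/3 in frame 0 (two octaves of C), E gets 1/3, D gets 1.0 in frame 1
```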

In [None]:
score_pc_pr = pt.utils.music.compute_pitch_class_pianoroll(
    score,
    normalize=True,
    time_unit="beat",
    time_div=4,
)

Let's plot this feature and compare it to a piano roll of the same score!

In [None]:
score_pr = pt.utils.music.compute_pianoroll(
    note_info=score,
    time_unit="beat",
    time_div=4,
    piano_range=False,
)

fig, axes = plt.subplots(2, figsize=(10, 5), sharex=True)

axes[0].imshow(
    score_pc_pr,
    aspect="auto",
    origin="lower",
    cmap="gray",
    interpolation="nearest",
)
axes[0].set_title("Pitch Class Distribution")
axes[0].set_ylabel("Pitch classes")
axes[1].imshow(
    score_pr.todense(),
    aspect="auto",
    origin="lower",
    cmap="gray",
    interpolation="nearest",
)
axes[1].set_title("Piano roll")
axes[1].set_ylabel("MIDI pitch")

plt.show()

## Alignment Methods

We now turn to methods for aligning the features of one version of a piece of music with those of another. Common methods include dynamic programming approaches such as dynamic time warping (DTW) and probabilistic approaches such as hidden Markov models (HMMs).

### Alignments with Dynamic Time Warping

* DTW is a dynamic programming algorithm for finding the **optimal** (minimum-cost) alignment between two time-dependent sequences.
* Unlike the Euclidean distance, which requires a point-to-point correspondence between two sequences (and therefore sequences of equal length), DTW allows elastic transformations of the time axis, so it can find an optimal match between two sequences that vary in timing or speed.
* The DTW algorithm finds the alignment between two sequences $\mathbf{X}$ and $\mathbf{Y}$ in three steps:

    1. Compute the pairwise distance (local cost) between every element $x_i$ of $\mathbf{X}$ and every element $y_j$ of $\mathbf{Y}$.
    2. Compute the accumulated cost matrix $\mathbf{D}$. The element $D_{ij}$ is the minimal total cost ("effort") of aligning the prefixes $x_1, \dots, x_i$ and $y_1, \dots, y_j$ such that $x_i$ is aligned with $y_j$.
    3. Find the optimal alignment (the warping path) by backtracking through $\mathbf{D}$ from the last pair of elements to the first.
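Step 2 is usually written as a recursion. Assuming the standard step set (each step advances $i$, $j$, or both by one) and a local cost $c(x_i, y_j)$ such as the cityblock distance, for sequences of lengths $N$ and $M$:

$$
D_{ij} = c(x_i, y_j) + \min\left\{ D_{i-1,\,j},\; D_{i,\,j-1},\; D_{i-1,\,j-1} \right\},
\qquad D_{11} = c(x_1, y_1),
$$

with $D_{i0} = D_{0j} = \infty$ for $i, j \geq 1$, so that the first row and column of $\mathbf{D}$ can only be reached along the border. The total alignment cost is $D_{NM}$, and backtracking from $(N, M)$ to $(1, 1)$ recovers the warping path.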

We will explore these steps with a simple example. 

In [None]:
from slideshow_helper import dtw_example

dtw_example(interactive=True)
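The helper `dtw_example` hides the details, so here is a plain-NumPy sketch of the three steps (pairwise cityblock cost, accumulated cost matrix, backtracking). It is written for clarity rather than speed and needs $O(NM)$ time and memory. The function name `dtw` is ours; the tutorial's `fast_dynamic_time_warping` helper is a separate implementation (presumably built on the `fastdtw` package installed at the top of the notebook, which approximates DTW more cheaply).

```python
import numpy as np


def dtw(X, Y):
    """Classic DTW with steps (1, 0), (0, 1), (1, 1) and cityblock (L1) local cost.

    X: (N, d) array, Y: (M, d) array. Returns (total_cost, warping_path), where
    warping_path is an (L, 2) array of 0-based index pairs (i, j).
    """
    N, M = len(X), len(Y)
    # Step 1: pairwise cityblock distances between all frames of X and Y
    C = np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=-1)
    # Step 2: accumulated cost matrix with an infinite border (row 0 / column 0)
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            D[i, j] = C[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Step 3: backtrack from (N, M) to (1, 1), always moving to the cheapest predecessor
    i, j = N, M
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]  # diagonal wins ties
        i, j = min(candidates, key=lambda ij: D[ij])
        path.append((i - 1, j - 1))
    return D[N, M], np.array(path[::-1])


# Y is X played "slower": some elements are repeated
X = np.array([[0], [1], [2], [3]], dtype=float)
Y = np.array([[0], [0], [1], [2], [2], [3]], dtype=float)
cost, path = dtw(X, Y)
print(cost)  # 0.0, since every element of Y has an identical partner in X
print(path)
```

The warping path always starts at `(0, 0)`, ends at `(N - 1, M - 1)`, and never moves backwards in either sequence, which is exactly what lets DTW absorb tempo changes.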

## Music Alignment with DTW

1. Compute features from the score and the performance
2. Compute the alignment between the sequences of features using DTW
3. Use greedy note matching to turn the frame-wise warping path into a note-wise alignment
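DTW aligns feature *frames*, not notes, so step 3 is needed. The sketch below shows one plausible greedy strategy, under stated assumptions: each score note's onset frame is mapped through the warping path to an expected performance frame, and the nearest still-unmatched performance note with the same pitch is taken as its partner. Everything here (`toy_greedy_note_alignment`, the `(note_id, midi_pitch, onset_frame)` input format, `max_frame_dist`) is hypothetical and is not the tutorial's `greedy_note_alignment`, which works on Partitura's index arrays and may differ. The output uses Partitura's alignment format (a list of dicts labeled `"match"`, `"deletion"` or `"insertion"`).

```python
import numpy as np


def toy_greedy_note_alignment(warping_path, score_notes, perf_notes, max_frame_dist=4):
    """Hypothetical simplified greedy note matcher (not the tutorial's helper).

    warping_path: (L, 2) array of (score_frame, performance_frame) pairs.
    score_notes / perf_notes: lists of (note_id, midi_pitch, onset_frame).
    Returns a list of partitura-style alignment dicts.
    """
    path = np.asarray(warping_path)
    unmatched_perf = list(perf_notes)
    alignment = []
    # Process score notes in onset order so earlier notes claim partners first
    for s_id, s_pitch, s_onset in sorted(score_notes, key=lambda n: n[2]):
        # Performance frames that DTW aligned with this score frame; use their mean
        frames = path[path[:, 0] == s_onset, 1]
        candidates = []
        if len(frames) > 0:
            expected = frames.mean()
            # Unmatched performance notes with the same pitch that are close enough
            candidates = [
                n for n in unmatched_perf
                if n[1] == s_pitch and abs(n[2] - expected) <= max_frame_dist
            ]
        if not candidates:
            alignment.append({"label": "deletion", "score_id": s_id})
            continue
        best = min(candidates, key=lambda n: abs(n[2] - expected))
        unmatched_perf.remove(best)
        alignment.append({"label": "match", "score_id": s_id, "performance_id": best[0]})
    # Performance notes left over were not matched to any score note
    for p_id, _, _ in unmatched_perf:
        alignment.append({"label": "insertion", "performance_id": p_id})
    return alignment


path = np.array([(0, 0), (1, 1), (1, 2), (2, 3), (3, 4)])
score_notes = [("s0", 60, 0), ("s1", 64, 1), ("s2", 67, 2), ("s3", 72, 3)]
perf_notes = [("p0", 60, 0), ("p1", 64, 2), ("p2", 67, 3), ("p3", 65, 4)]  # p3 is a wrong note
alignment = toy_greedy_note_alignment(path, score_notes, perf_notes)
for a in alignment:
    print(a)
```

Here `s3` (pitch 72) has no same-pitch partner, so it becomes a deletion, and the extra `p3` becomes an insertion.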

Let's compare alignments computed with piano rolls and with pitch class distributions as features.

In [None]:
# This file contains the ground truth alignment
gt_alignment_fn = os.path.join(MATCH_DIR, "Chopin_op10_no3_p01.match")

# Load the alignment and the performance
performance, gt_alignment = pt.load_match(
    gt_alignment_fn, pedal_threshold=127, first_note_at_zero=True
)
pnote_array = performance.note_array()

# Load the score
score_fn = os.path.join(MUSICXML_DIR, "Chopin_op10_no3.musicxml")
score = pt.load_score(score_fn)
snote_array = score.note_array()

Compute the alignment using pitch class distributions as features.

In [None]:
# Compute the features
score_pcr, sidx = pt.utils.music.compute_pitch_class_pianoroll(
    note_info=score,
    time_unit="beat",
    time_div=8,
    return_idxs=True,
    binary=True,
    note_separation=True,
)

performance_pcr, pidx = pt.utils.music.compute_pitch_class_pianoroll(
    note_info=performance,
    time_unit="sec",
    time_div=8,
    return_idxs=True,
    binary=True,
    note_separation=True,
)

reference_features = score_pcr.T
performance_features = performance_pcr.T

In [None]:
# DTW
dtw_pcr_warping_path = fast_dynamic_time_warping(
    X=reference_features,
    Y=performance_features,
    metric="cityblock",
)

dtw_pcr_alignment = greedy_note_alignment(
    warping_path=dtw_pcr_warping_path,
    idx1=sidx,
    note_array1=snote_array,
    idx2=pidx,
    note_array2=pnote_array,
)

And now we compute the alignments using piano rolls.

In [None]:
# Compute the features
score_pr, sidx = pt.utils.music.compute_pianoroll(
    note_info=score,
    time_unit="beat",
    time_div=8,
    return_idxs=True,
    piano_range=True,
    binary=True,
    note_separation=True,
)

performance_pr, pidx = pt.utils.music.compute_pianoroll(
    note_info=performance,
    time_unit="sec",
    time_div=8,
    return_idxs=True,
    piano_range=True,
    binary=True,
    note_separation=True,
)

reference_features = score_pr.toarray().T
performance_features = performance_pr.toarray().T

# DTW
dtw_pr_warping_path = fast_dynamic_time_warping(
    X=reference_features,
    Y=performance_features,
    metric="cityblock",
)

dtw_pr_alignment = greedy_note_alignment(
    warping_path=dtw_pr_warping_path,
    idx1=sidx,
    note_array1=snote_array,
    idx2=pidx,
    note_array2=pnote_array,
)

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].plot(
    dtw_pr_warping_path[:, 0],
    dtw_pr_warping_path[:, 1],
    label="DTW (piano roll)",
)
axes[0].plot(
    dtw_pcr_warping_path[:, 0],
    dtw_pcr_warping_path[:, 1],
    label="DTW (pitch class)",
)
axes[1].plot(
    dtw_pr_warping_path[:, 0],
    dtw_pr_warping_path[:, 1],
    label="DTW (piano roll)",
)
axes[1].plot(
    dtw_pcr_warping_path[:, 0],
    dtw_pcr_warping_path[:, 1],
    label="DTW (pitch class)",
)
axes[0].set_xlabel("Index in score")
axes[1].set_xlabel("Index in score")
axes[0].set_ylabel("Index in performance")
axes[1].set_xlim((200, 300))
axes[1].set_ylim((450, 550))
plt.legend()
plt.show()

We can now compare the accuracy of both alignments against the ground truth, using note-wise precision, recall and F-score:

In [None]:
from helper import evaluate_alignment_notewise

print("Method\tF-score\tPrecision\tRecall")

methods = [
    (dtw_pr_alignment, "DTW (piano roll)"),
    (dtw_pcr_alignment, "DTW (pitch class)"),
]

for align, method in methods:
    precision, recall, fscore = evaluate_alignment_notewise(
        prediction=align,
        ground_truth=gt_alignment,
    )
    print(f"{method}\t{fscore:.4f}\t{precision:.4f}\t{recall:.4f}")
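`evaluate_alignment_notewise` lives in the tutorial's `helper` module and its definition is not shown here. A common way to score note alignments, sketched below as an assumption, is to treat each `"match"` entry as a `(score_id, performance_id)` pair and compare the predicted pairs with the ground-truth pairs: precision is the fraction of predicted matches that are correct, recall is the fraction of ground-truth matches that were found, and the F-score is their harmonic mean. The function name `notewise_prf` is hypothetical.

```python
def notewise_prf(prediction, ground_truth):
    """Hypothetical note-wise precision, recall and F-score over matched note pairs."""
    pred = {(a["score_id"], a["performance_id"]) for a in prediction if a["label"] == "match"}
    gt = {(a["score_id"], a["performance_id"]) for a in ground_truth if a["label"] == "match"}
    tp = len(pred & gt)  # correctly predicted matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    fscore = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, fscore


ground_truth = [
    {"label": "match", "score_id": "s0", "performance_id": "p0"},
    {"label": "match", "score_id": "s1", "performance_id": "p1"},
    {"label": "match", "score_id": "s2", "performance_id": "p2"},
    {"label": "match", "score_id": "s3", "performance_id": "p3"},
    {"label": "insertion", "performance_id": "p4"},
]
prediction = [
    {"label": "match", "score_id": "s0", "performance_id": "p0"},
    {"label": "match", "score_id": "s1", "performance_id": "p1"},
    {"label": "match", "score_id": "s2", "performance_id": "p3"},  # wrong partner
    {"label": "deletion", "score_id": "s3"},
    {"label": "insertion", "performance_id": "p2"},
]
precision, recall, fscore = notewise_prf(prediction, ground_truth)
print(f"{fscore:.4f}\t{precision:.4f}\t{recall:.4f}")
```

With 2 of 3 predicted matches correct and 2 of 4 true matches found, precision is 2/3, recall is 1/2, and the F-score is 4/7.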

## Alignment Applications: Comparing Expressive Performances

In this example, we use the note alignments stored in the match files to compare the tempo curves of different performances of the same piece.

In [None]:
from helper import compute_tempo_curve

# get all match files
piece = "Chopin_op10_no3"
matchfiles = glob.glob(os.path.join(MATCH_DIR, f"{piece}_p*.match"))
matchfiles.sort()

# Load the score
score_fn = os.path.join(MUSICXML_DIR, f"{piece}.musicxml")
score = pt.load_score(score_fn)
snote_array = score.note_array()

tempo_curves = []
for matchfile in matchfiles:
    # load alignment
    perf, alignment = pt.load_match(matchfile)
    # Compute tempo curves
    tempo_curve = compute_tempo_curve(
        perf=perf,
        score=snote_array,
        alignment=alignment,
    )
    tempo_curves.append(tempo_curve)
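`compute_tempo_curve` comes from the tutorial's `helper` module and its exact method is not shown. The sketch below illustrates one simple way, assumed here, to derive a local tempo from an alignment: for every matched note take its score onset in beats (e.g. the `onset_beat` field of `snote_array`) and its performed onset in seconds (e.g. `onset_sec` of the performance note array), average the performed onsets of notes sharing a score onset (chords), and compute $60 \cdot \Delta\text{beats} / \Delta\text{seconds}$ between consecutive onsets. The function name `toy_tempo_curve` is hypothetical.

```python
import numpy as np


def toy_tempo_curve(score_onsets_beats, perf_onsets_sec):
    """Hypothetical local tempo (bpm) from the onsets of matched note pairs.

    Both arrays hold onsets of matched notes in the same order. Returns an
    (n - 1, 2) array of (score time in beats, tempo in bpm), with n the
    number of distinct score onsets.
    """
    score_onsets_beats = np.asarray(score_onsets_beats, dtype=float)
    perf_onsets_sec = np.asarray(perf_onsets_sec, dtype=float)
    unique_beats = np.unique(score_onsets_beats)  # sorted distinct score onsets
    # Average performed time of all notes that start at the same score onset
    perf_times = np.array(
        [perf_onsets_sec[score_onsets_beats == b].mean() for b in unique_beats]
    )
    # Beats elapsed per minute between consecutive score onsets
    tempo = 60.0 * np.diff(unique_beats) / np.diff(perf_times)
    return np.column_stack([unique_beats[:-1], tempo])


# A two-note chord at beat 0 (slightly spread), then one note per beat, slowing down
score_onsets = [0, 0, 1, 2, 3]
perf_onsets = [0.0, 0.02, 0.5, 1.0, 1.6]
tc = toy_tempo_curve(score_onsets, perf_onsets)
print(tc)  # roughly 122, 120 and 100 bpm
```

The column layout (score time, tempo) mirrors how the plotting cell below reads `tempo_info[:, 0]` and `tempo_info[:, 1]`.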

In [None]:
fig, ax = plt.subplots(1, figsize=(15, 8))
color = plt.cm.rainbow(np.linspace(0, 1, len(tempo_curves)))
for i, tempo_info in enumerate(tempo_curves):
    score_time = tempo_info[:, 0]
    tempo_curve = tempo_info[:, 1]
    ax.plot(
        score_time,
        tempo_curve,
        label=f"pianist {i + 1:02d}",
        alpha=0.4,
        c=color[i],
    )

# Plot the average tempo curve (assumes all curves share the same score-time grid)
ax.plot(
    score_time,
    np.mean([tc[:, 1] for tc in tempo_curves], axis=0),
    label="average",
    c="black",
    linewidth=2,
)

# get starting time of each measure in the score
measure_times = score[0].beat_map(
    [measure.start.t for measure in score[0].iter_all(pt.score.Measure)]
)
# do not include pickup measure
measure_times = measure_times[measure_times >= 0]
ax.set_title(piece)
ax.set_xlabel("Score time (beats)")
ax.set_ylabel("Tempo (bpm)")
ax.set_xticks(measure_times)
plt.legend(frameon=False, bbox_to_anchor=(1.15, 0.9))
plt.grid(axis="x")
plt.show()