# Demo: Using the Sync Toolbox for an Experiment on High-Resolution Music Alignment


Music synchronization aims to automatically align multiple music representations such as audio recordings, MIDI files, and sheet music. For this task, we recently published the **Sync Toolbox** [1], an open-source Python package for efficient, robust, and accurate music synchronization. In this work, we combine spectral flux, used as an onset feature, with conventional chroma features to increase alignment accuracy. We conduct experiments within the **Sync Toolbox** framework to show that this approach retains the accuracy of another high-resolution approach, based on DLNCO features [4], while being computationally simpler.

In [None]:
# Loading some modules and defining some constants used later
import IPython.display as ipd
import matplotlib.pyplot as plt
import librosa
import numpy as np
import os
import pandas as pd
import scipy.interpolate

from synctoolbox.dtw.mrmsdtw import sync_via_mrmsdtw
from synctoolbox.dtw.utils import compute_optimal_chroma_shift, shift_chroma_vectors, make_path_strictly_monotonic
from synctoolbox.feature.chroma import pitch_to_chroma, quantize_chroma, quantized_chroma_to_CENS
from synctoolbox.feature.dlnco import pitch_onset_features_to_DLNCO
from synctoolbox.feature.novelty import spectral_flux
from synctoolbox.feature.pitch import audio_to_pitch_features
from synctoolbox.feature.pitch_onset import audio_to_pitch_onset_features
from synctoolbox.feature.utils import estimate_tuning
%matplotlib inline

Fs = 22050
FEATURE_RATE = 50
STEP_WEIGHTS = np.array([1.5, 1.5, 2.0])
THRESHOLD_REC = 10 ** 6
FIG_SIZE = (9, 3)
GAMMA = 10.0
AUDIO_DIR = 'winterreise/01_RawData/audio_wav/'
MEASURE_ANN_DIR = 'winterreise/02_Annotations/ann_audio_measure/'
FILENAME_PREFIX = 'Schubert_D911-'
FIGURE_DIR = 'figs'

# Create a directory for the figures if it does not exist yet
os.makedirs(FIGURE_DIR, exist_ok=True)

### Download the Schubert Winterreise Dataset (SWD), Version 2.0
The **Schubert Winterreise Dataset (SWD)** [2] comprises several representations of the song cycle *Winterreise* D911 (Op. 89), which consists of 24 songs for solo voice with piano accompaniment. For our experiments, we focus on the recordings by the baritones Gerhard Hüsch and Randall Scarlata (file suffixes `HU33` and `SC06`) and the corresponding measure annotations. These two versions are publicly available, so all our experiments can be reproduced with open-source code and open data.


In [None]:
!apt-get install -y unzip wget
!wget "https://zenodo.org/record/5139893/files/Schubert_Winterreise_Dataset_v2-0.zip?download=1" -O winterreise.zip
!unzip winterreise.zip -d winterreise
!rm winterreise.zip
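The alignment and evaluation loops below derive all file names from a zero-padded song number (`01` to `24`) and a version suffix (`HU33` for Hüsch, `SC06` for Scarlata). As a small sketch of this naming scheme, the hypothetical helper `swd_pair_paths` (not part of the Sync Toolbox or of SWD) builds the audio and measure-annotation paths for one song pair. It repeats the directory constants from the first cell so that it runs on its own, and it does not check whether the files exist.

```python
import os

# Same constants as in the first cell of this notebook
AUDIO_DIR = 'winterreise/01_RawData/audio_wav/'
MEASURE_ANN_DIR = 'winterreise/02_Annotations/ann_audio_measure/'
FILENAME_PREFIX = 'Schubert_D911-'


def swd_pair_paths(song_idx, version_1='HU33', version_2='SC06'):
    """Hypothetical helper: return {version: (audio_path, annotation_path)}
    for one Winterreise song (1-24) in two versions."""
    if not 1 <= song_idx <= 24:
        raise ValueError('Winterreise has 24 songs (1-24).')
    song_id = f'{song_idx:02d}'  # zero-padded, e.g. '01'
    paths = {}
    for version in (version_1, version_2):
        name = f'{FILENAME_PREFIX}{song_id}_{version}'  # e.g. 'Schubert_D911-01_HU33'
        paths[version] = (os.path.join(AUDIO_DIR, name + '.wav'),
                          os.path.join(MEASURE_ANN_DIR, name + '.csv'))
    return paths
```

For example, `swd_pair_paths(1)['HU33'][0]` points to the WAV file `Schubert_D911-01_HU33.wav` inside `AUDIO_DIR`.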

In [None]:
def get_chroma_features_from_audio(audio,
                                   tuning_offset,
                                   Fs=Fs,
                                   feature_rate=FEATURE_RATE,
                                   verbose=False):
    
    f_pitch = audio_to_pitch_features(f_audio=audio, 
                                      Fs=Fs, 
                                      tuning_offset=tuning_offset, 
                                      feature_rate=feature_rate, 
                                      verbose=verbose)
    f_chroma = pitch_to_chroma(f_pitch=f_pitch)
    f_chroma_quantized = quantize_chroma(f_chroma=f_chroma)
    
    return f_chroma_quantized

def get_spectral_flux_from_audio(audio,
                                 feature_sequence_length,
                                 gamma=GAMMA,  # log compression parameter
                                 Fs=Fs,
                                 feature_rate=FEATURE_RATE):
    f_novelty = spectral_flux(audio, Fs=Fs, feature_rate=feature_rate, gamma=gamma)

    if f_novelty.size < feature_sequence_length:
        # The novelty curve can be shorter than the chroma feature sequence because of
        # the padding in the STFT used for the chroma features and the differentiation
        # in the spectral flux computation. Zero-pad it on both sides; if the length
        # difference is odd, the extra frame goes to the right so that both lengths match.
        diff = feature_sequence_length - f_novelty.size
        pad_left = diff // 2
        pad_right = diff - pad_left
        f_novelty = np.concatenate((np.zeros(pad_left), f_novelty, np.zeros(pad_right)))

    return f_novelty.reshape(1, -1)


def get_DLNCO_features_from_audio(audio,
                                  tuning_offset,
                                  feature_sequence_length,
                                  Fs=Fs,
                                  feature_rate=FEATURE_RATE,
                                  verbose=False):
    f_pitch_onset = audio_to_pitch_onset_features(f_audio=audio, 
                                                  Fs=Fs, 
                                                  tuning_offset=tuning_offset, 
                                                  verbose=verbose)
    
    f_DLNCO = pitch_onset_features_to_DLNCO(f_peaks=f_pitch_onset, 
                                            feature_rate=feature_rate, 
                                            feature_sequence_length=feature_sequence_length, 
                                            visualize=verbose)
    
    return f_DLNCO
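`get_spectral_flux_from_audio` delegates the actual novelty computation to the toolbox's `spectral_flux`. For readers unfamiliar with the feature, the following is a minimal sketch of a common log-compressed spectral flux formulation, assuming a precomputed magnitude spectrogram: compress with $\log(1 + \gamma |X|)$, take the first-order difference along time, keep only increases, and sum over frequency. It is not guaranteed to match the toolbox's internals, which may, for example, add normalization or local-average subtraction. The hypothetical helper `center_pad` mirrors the padding step above, placing the extra frame of an odd length difference on the right; truncating an over-long curve at the end is our own choice.

```python
import numpy as np


def log_spectral_flux(mag_spec, gamma=10.0):
    """Sketch of a log-compressed spectral flux novelty curve.

    mag_spec: magnitude spectrogram of shape (num_bins, num_frames).
    Returns a 1-D array of length num_frames - 1.
    """
    Y = np.log1p(gamma * mag_spec)            # logarithmic compression log(1 + gamma * |X|)
    diff = np.diff(Y, axis=1)                 # frame-to-frame change per frequency bin
    return np.maximum(diff, 0.0).sum(axis=0)  # keep increases only, then sum over bins


def center_pad(x, target_length):
    """Zero-pad x on both sides to target_length (odd remainder on the right).
    If x is already too long, truncate it at the end (an arbitrary choice)."""
    diff = target_length - x.size
    if diff <= 0:
        return x[:target_length]
    left = diff // 2
    return np.concatenate((np.zeros(left), x, np.zeros(diff - left)))
```

With `gamma=1.0` and the spectrogram `[[0, 1, 1, 0], [0, 0, 2, 2]]`, the novelty curve is `[log 2, log 3, 0]`: the decrease in the last frame of the first bin is discarded by the half-wave rectification.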

## Alignment with MrMsDTW

We now align each pair of recordings with memory-restricted multiscale dynamic time warping (MrMsDTW) [3], using three input feature settings:
* Chroma
* Chroma & DLNCO [4]
* Chroma & Spectral Flux [5]
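MrMsDTW computes the alignment on several temporal resolutions under a memory constraint; its building block is classic dynamic time warping (DTW). The sketch below is plain, full-resolution DTW on a precomputed cost matrix and only illustrates the role of `STEP_WEIGHTS`. We assume that the three weights apply to the step sizes (1, 0), (0, 1), and (1, 1) in that order, which we believe matches the toolbox's defaults; please check the synctoolbox documentation before relying on this. The function `weighted_dtw` is hypothetical, and its nested Python loops are only practical for toy-sized inputs, which is the kind of cost the multiscale, memory-restricted approach is designed to avoid.

```python
import numpy as np


def weighted_dtw(C, step_weights=(1.5, 1.5, 2.0)):
    """Plain DTW on a cost matrix C of shape (N, M).

    Step sizes (1, 0), (0, 1), (1, 1) are weighted by step_weights in that
    order (assumed convention). Returns the accumulated cost matrix D and the
    optimal warping path as a (2, L) array of index pairs.
    """
    N, M = C.shape
    steps = [(1, 0), (0, 1), (1, 1)]
    D = np.full((N, M), np.inf)
    D[0, 0] = C[0, 0]
    for n in range(N):
        for m in range(M):
            if n == 0 and m == 0:
                continue
            for (dn, dm), w in zip(steps, step_weights):
                if n - dn >= 0 and m - dm >= 0:
                    D[n, m] = min(D[n, m], D[n - dn, m - dm] + w * C[n, m])

    # Backtrack from (N-1, M-1) to (0, 0), always choosing the predecessor
    # that realizes the accumulated cost D[n, m]
    n, m = N - 1, M - 1
    path = [(n, m)]
    while (n, m) != (0, 0):
        candidates = []
        for (dn, dm), w in zip(steps, step_weights):
            if n - dn >= 0 and m - dm >= 0:
                candidates.append((D[n - dn, m - dm] + w * C[n, m], (n - dn, m - dm)))
        _, (n, m) = min(candidates)
        path.append((n, m))
    return D, np.array(path[::-1]).T
```

For instance, with `x = [0, 1, 2]`, `y = [0, 0, 1, 2]`, and the absolute-difference cost matrix `np.abs(x[:, None] - y[None, :])`, the returned path is `[[0, 0, 1, 2], [0, 1, 2, 3]]` with zero total cost.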

In [None]:
wp_dict = dict()

for song_idx in range(1, 25):
    song_id = f'{song_idx:02d}'  # zero-padded song ID, e.g. '01'

    filename_1 = FILENAME_PREFIX + song_id + '_HU33'
    filename_2 = FILENAME_PREFIX + song_id + '_SC06'

    print(f"\nProcessing song ID {song_id} of SWD.")
    
    # read audio
    audio_1, _ = librosa.load(os.path.join(AUDIO_DIR, filename_1 + '.wav'), sr=Fs)
    audio_2, _ = librosa.load(os.path.join(AUDIO_DIR, filename_2 + '.wav'), sr=Fs)
    
    # estimate tuning
    tuning_offset_1 = estimate_tuning(audio_1, Fs)
    tuning_offset_2 = estimate_tuning(audio_2, Fs)

    # generate chroma features
    f_chroma_quantized_1 = get_chroma_features_from_audio(audio=audio_1,
                                                          tuning_offset=tuning_offset_1)
    
    f_chroma_quantized_2 = get_chroma_features_from_audio(audio=audio_2,
                                                          tuning_offset=tuning_offset_2)
    
    # generate novelty features (i.e. spectral flux)
    f_sf_1 = get_spectral_flux_from_audio(audio=audio_1, 
                                          feature_sequence_length=f_chroma_quantized_1.shape[1])
    
    f_sf_2 = get_spectral_flux_from_audio(audio=audio_2, 
                                          feature_sequence_length=f_chroma_quantized_2.shape[1])
    
 
    # generate DLNCO features
    f_DLNCO_1 = get_DLNCO_features_from_audio(audio=audio_1, 
                                              tuning_offset=tuning_offset_1,
                                              feature_sequence_length=f_chroma_quantized_1.shape[1])
    
    f_DLNCO_2 = get_DLNCO_features_from_audio(audio=audio_2, 
                                              tuning_offset=tuning_offset_2,
                                              feature_sequence_length=f_chroma_quantized_2.shape[1])
    
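    # Compute the optimal chroma shift (global transposition) from smoothed CENS features
    # (smoothing window of 201 frames, downsampling factor 50, i.e., 1 Hz at FEATURE_RATE = 50),
    # then shift the chroma-based features of the second recording accordingly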
    opt_chroma_shift = compute_optimal_chroma_shift(quantized_chroma_to_CENS(f_chroma_quantized_1, 
                                                                             201, 50, 
                                                                             FEATURE_RATE)[0], 
                                                    quantized_chroma_to_CENS(f_chroma_quantized_2, 
                                                                             201, 50, 
                                                                             FEATURE_RATE)[0])
    
    f_chroma_quantized_2 = shift_chroma_vectors(f_chroma_quantized_2, opt_chroma_shift)
    f_DLNCO_2 = shift_chroma_vectors(f_DLNCO_2, opt_chroma_shift)

    
    # run MrMsDTW for chroma
    wp_chroma = sync_via_mrmsdtw(f_chroma1=f_chroma_quantized_1, 
                                 f_chroma2=f_chroma_quantized_2, 
                                 input_feature_rate=FEATURE_RATE, 
                                 step_weights=STEP_WEIGHTS, 
                                 threshold_rec=THRESHOLD_REC, 
                                 verbose=False)
    
    
    # run MrMsDTW for chroma & DLNCO
    wp_chroma_dlnco = sync_via_mrmsdtw(f_chroma1=f_chroma_quantized_1, 
                                       f_onset1=f_DLNCO_1, 
                                       f_chroma2=f_chroma_quantized_2, 
                                       f_onset2=f_DLNCO_2, 
                                       input_feature_rate=FEATURE_RATE, 
                                       step_weights=STEP_WEIGHTS, 
                                       threshold_rec=THRESHOLD_REC, 
                                       verbose=False)
    
    
    # run MrMsDTW for chroma & spectral flux
    wp_chroma_sf = sync_via_mrmsdtw(f_chroma1=f_chroma_quantized_1, 
                                    f_onset1=f_sf_1, 
                                    f_chroma2=f_chroma_quantized_2, 
                                    f_onset2=f_sf_2, 
                                    input_feature_rate=FEATURE_RATE, 
                                    step_weights=STEP_WEIGHTS, 
                                    threshold_rec=THRESHOLD_REC, 
                                    verbose=False)
        
        
    wp_dict[song_id] = dict()
    wp_dict[song_id]['wp_chroma'] = wp_chroma
    wp_dict[song_id]['wp_chroma_dlnco'] = wp_chroma_dlnco
    wp_dict[song_id]['wp_chroma_sf'] = wp_chroma_sf
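The loop above estimates a global transposition between the two versions with `compute_optimal_chroma_shift` before running MrMsDTW. To illustrate what such a shift estimate involves, here is a cheaper heuristic, swapped in for illustration and not the toolbox's method: compare the time-averaged chroma vectors of both versions under all 12 cyclic shifts and keep the shift with the highest cosine similarity. The function `estimate_chroma_shift_global` is hypothetical, and its sign convention (apply `np.roll(f_chroma_2, s, axis=0)` to version 2) is ours and may differ from that of `shift_chroma_vectors`.

```python
import numpy as np


def estimate_chroma_shift_global(f_chroma_1, f_chroma_2):
    """Heuristic sketch: return the cyclic shift s in 0..11 such that
    np.roll(f_chroma_2, s, axis=0) best matches f_chroma_1, judged by the
    cosine similarity of the time-averaged chroma vectors.

    f_chroma_1, f_chroma_2: arrays of shape (12, num_frames).
    """
    v1 = f_chroma_1.mean(axis=1)  # global chroma profile of version 1
    v2 = f_chroma_2.mean(axis=1)  # global chroma profile of version 2
    scores = []
    for s in range(12):
        v2_shifted = np.roll(v2, s)
        denom = np.linalg.norm(v1) * np.linalg.norm(v2_shifted) + 1e-12  # avoid division by zero
        scores.append(float(v1 @ v2_shifted) / denom)
    return int(np.argmax(scores))
```

Averaging over time discards all temporal structure, so this heuristic can fail for harmonically ambiguous material; a DTW-based comparison per shift is more robust but also more expensive.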

## Evaluation

For the evaluation, we use the pairwise alignment error $\epsilon_{P}$ from [6]. Given two versions of the same piece of music with continuous time axes $[0, T_{1}]$ and $[0, T_{2}]$, a monotonic alignment can be modeled as a function
	
$$\mathcal{A}: [0, T_{1}] \rightarrow [0, T_{2}].$$
	
The pairwise alignment error $\epsilon_{P}$ for a given alignment of two recordings is defined as the mean over the values

$$\epsilon_{P}(g_{1}):=|\mathcal{A}(g_{1}) - g_{2}|,$$

where $(g_{1}, g_{2}) \in [0, T_{1}] \times [0, T_{2}]$ ranges over the ground-truth pairs of corresponding measure positions in the two recordings.
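To make the definition concrete, the snippet below evaluates $\mathcal{A}(g_1)$ and $\epsilon_P$ on a toy example. The warping path and annotations are made up; as in the evaluation code, the path is given in frames at 50 Hz and converted to seconds, and $\mathcal{A}$ is obtained by linear interpolation along the path. We use `np.interp` for brevity; unlike the `scipy.interpolate.interp1d` call in `get_stats`, it clamps values outside the path's range instead of raising an error.

```python
import numpy as np

feature_rate = 50  # frames per second, as FEATURE_RATE in this notebook

# Toy warping path (row 0: frames of version 1, row 1: frames of version 2)
wp = np.array([[0, 50, 100, 150, 200],
               [0, 100, 150, 200, 300]])

# Made-up ground-truth measure positions (seconds) in both versions
g1 = np.array([0.5, 1.5, 3.0])
g2 = np.array([1.0, 2.9, 4.2])

# A(g1): map version-1 times to version-2 times by linear interpolation along the path
A_g1 = np.interp(g1, wp[0] / feature_rate, wp[1] / feature_rate)

errors = np.abs(A_g1 - g2)  # epsilon_P(g1) for each measure, in seconds
eps_P = errors.mean()       # pairwise alignment error: mean over all measures
```

Here `A_g1` is `[1.0, 2.5, 4.0]`, the per-measure errors are `[0.0, 0.4, 0.2]` seconds, and $\epsilon_P = 0.2$ seconds.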

In [None]:
def get_stats(wp,
              measure_ann_filepath_1,
              measure_ann_filepath_2,
              feature_rate=FEATURE_RATE,
              tolerances=(30, 50, 100, 150, 200, 300, 500, 1000)):  # tolerances in milliseconds
    """Evaluate a warping path against measure annotations.

    Returns the mean and standard deviation of the absolute alignment error (in ms),
    the misalignment rate for each tolerance (as a fraction in [0, 1]), and the
    absolute errors at the annotated measure positions (in seconds).
    """
    # Make the path strictly monotonic so that it defines an invertible mapping for interpolation
    wp = make_path_strictly_monotonic(wp)

    measure_ann_1 = pd.read_csv(filepath_or_buffer=measure_ann_filepath_1, delimiter=';')['start'].to_numpy()
    measure_ann_2 = pd.read_csv(filepath_or_buffer=measure_ann_filepath_2, delimiter=';')['start'].to_numpy()

    # Transfer the measure positions of version 1 to the time axis of version 2 via the warping path
    measure_positions_1_transferred_to_2 = scipy.interpolate.interp1d(wp[0] / feature_rate,
                                                                      wp[1] / feature_rate,
                                                                      kind='linear')(measure_ann_1)

    absolute_errors_at_measures = np.abs(measure_positions_1_transferred_to_2 - measure_ann_2)

    misalignments = np.zeros(len(tolerances))
    for idx, tolerance in enumerate(tolerances):  # in milliseconds
        misalignments[idx] = np.mean(absolute_errors_at_measures > tolerance / 1000.0)

    mean = np.mean(absolute_errors_at_measures) * 1000.0
    std = np.std(absolute_errors_at_measures) * 1000.0

    return mean, std, misalignments, absolute_errors_at_measures
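`get_stats` first calls `make_path_strictly_monotonic` because a DTW warping path contains horizontal and vertical steps: one frame of the first version may be matched to several frames of the second version, so the raw path is not a function that can be interpolated directly. One simple way to obtain a strictly monotonic path is sketched below. `strictly_monotonic_subpath` is a hypothetical helper, and the toolbox's function may handle this differently (for example, by interpolating rather than dropping points).

```python
import numpy as np


def strictly_monotonic_subpath(wp):
    """Sketch: keep only the points of a (2, L) warping path at which both
    indices strictly increase relative to the previously kept point."""
    keep = [0]  # always keep the first point
    for k in range(1, wp.shape[1]):
        prev = wp[:, keep[-1]]
        if wp[0, k] > prev[0] and wp[1, k] > prev[1]:
            keep.append(k)
    return wp[:, keep]
```

Note that this sketch can also drop the final path point when it is not strictly larger in both coordinates than the last kept point, which slightly shortens the range available for interpolation.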

In [None]:
stats_dict = dict()
TOLERANCES = [30, 50, 100, 150, 200, 300, 500, 1000]  # in milliseconds

# Map the warping-path keys of wp_dict to the feature keys used in stats_dict
FEATURE_KEYS = {'wp_chroma': 'chroma',
                'wp_chroma_dlnco': 'chroma_dlnco',
                'wp_chroma_sf': 'chroma_sf'}

for song_id in wp_dict:
    filename_1 = FILENAME_PREFIX + song_id + '_HU33'
    filename_2 = FILENAME_PREFIX + song_id + '_SC06'
    measure_ann_filepath_1 = os.path.join(MEASURE_ANN_DIR, filename_1 + '.csv')
    measure_ann_filepath_2 = os.path.join(MEASURE_ANN_DIR, filename_2 + '.csv')

    stats_dict[song_id] = dict()
    for wp_key, feature_key in FEATURE_KEYS.items():
        mean, std, misalignments, err = get_stats(wp=wp_dict[song_id][wp_key],
                                                  measure_ann_filepath_1=measure_ann_filepath_1,
                                                  measure_ann_filepath_2=measure_ann_filepath_2,
                                                  tolerances=TOLERANCES)
        stats_dict[song_id][feature_key] = {'mean': mean,
                                            'std': std,
                                            'misalignments': misalignments,
                                            'absolute_errors': err}

## Table of Misalignment Rates per Song in SWD
In addition to the pairwise alignment error, one may also consider the misalignment rate from [6], which gives the percentage of measure positions whose alignment error exceeds a given threshold $\tau$.
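In terms of the absolute errors returned by `get_stats`, the misalignment rate for a threshold $\tau$ is the fraction of measures with $|\mathcal{A}(g_1) - g_2| > \tau$. The loop over tolerances in `get_stats` can equivalently be written with NumPy broadcasting, as sketched here on made-up errors (in seconds) with the notebook's thresholds (in milliseconds), reported in percent as in the table.

```python
import numpy as np

# Made-up absolute errors (seconds) at six measure positions
abs_errors = np.array([0.01, 0.04, 0.06, 0.12, 0.35, 1.20])
tolerances_ms = np.array([30, 50, 100, 150, 200, 300, 500, 1000])

# Compare every error with every threshold (6 x 8 boolean matrix),
# then average over measures to get one rate per threshold, in percent
misalignment_rates = (abs_errors[:, None] > tolerances_ms[None, :] / 1000.0).mean(axis=0) * 100
```

For these errors, five of six measures exceed 30 ms (about 83.3 %), while only one exceeds 500 ms (about 16.7 %).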

In [None]:
rows = pd.MultiIndex.from_product([stats_dict.keys()],
                                   names=['Song ID'])
columns = pd.MultiIndex.from_product([['Chroma', 
                                       'Chroma & DLNCO', 
                                       'Chroma & Spectral Flux'], TOLERANCES],
                                     names=['Feature Type', '$\u03C4$ (ms)'])
n_tol = len(TOLERANCES)
data = np.zeros((len(stats_dict), 3 * n_tol))
for row_idx, song_id in enumerate(stats_dict):
    # Misalignment rates in percent; one block of n_tol columns per feature type
    data[row_idx, :n_tol] = stats_dict[song_id]['chroma']['misalignments'] * 100
    data[row_idx, n_tol:2 * n_tol] = stats_dict[song_id]['chroma_dlnco']['misalignments'] * 100
    data[row_idx, 2 * n_tol:3 * n_tol] = stats_dict[song_id]['chroma_sf']['misalignments'] * 100

df = pd.DataFrame(data, index=rows, columns=columns)
with pd.option_context('display.float_format', '{:0.2f}'.format):
    ipd.display(df)
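The plots below average this table over songs with expressions such as `df['Chroma'].mean()`. Selecting a top-level label of the column `MultiIndex` returns a (songs × thresholds) frame, so `.mean()` and `.std()` yield one value per threshold. A toy frame with made-up values shows the mechanics. Note that pandas' `std` uses the sample standard deviation (`ddof=1`) by default, whereas `np.std` in `get_stats` uses `ddof=0`.

```python
import numpy as np
import pandas as pd

tolerances = [30, 50]
columns = pd.MultiIndex.from_product([['Chroma', 'Chroma & Spectral Flux'], tolerances],
                                     names=['Feature Type', 'tau (ms)'])

# Made-up misalignment rates (%) for two songs
toy_df = pd.DataFrame([[40.0, 20.0, 30.0, 10.0],
                       [60.0, 40.0, 50.0, 30.0]],
                      index=['01', '02'], columns=columns)

# Top-level selection drops the 'Feature Type' level: columns are the thresholds
mean_rates = toy_df['Chroma & Spectral Flux'].mean()  # one mean per threshold
std_rates = toy_df['Chroma & Spectral Flux'].std()    # sample std (ddof=1) across songs
```

Here `mean_rates` is `[40.0, 20.0]` for the thresholds 30 ms and 50 ms.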

In [None]:
chroma_means = np.zeros(len(stats_dict))
chroma_std = np.zeros(len(stats_dict))

chroma_dlnco_means = np.zeros(len(stats_dict))
chroma_dlnco_std = np.zeros(len(stats_dict))

chroma_sf_means = np.zeros(len(stats_dict))
chroma_sf_std = np.zeros(len(stats_dict))

for idx, song_id in enumerate(stats_dict):
    chroma_means[idx] = stats_dict[song_id]['chroma']['mean']
    chroma_std[idx] = stats_dict[song_id]['chroma']['std']
    chroma_dlnco_means[idx] = stats_dict[song_id]['chroma_dlnco']['mean']
    chroma_dlnco_std[idx] = stats_dict[song_id]['chroma_dlnco']['std']
    chroma_sf_means[idx] = stats_dict[song_id]['chroma_sf']['mean']
    chroma_sf_std[idx] = stats_dict[song_id]['chroma_sf']['std']

In [None]:
SONG_LABELS=['01. Gute Nacht', '02. Die Wetterfahne', '03. Gefrorne Tränen', '04. Erstarrung',
             '05. Der Lindenbaum', '06. Wasserflut', '07. Auf dem Flusse', '08. Rückblick',
             '09. Irrlicht', '10. Rast', '11. Frühlingstraum', '12. Einsamkeit',
             '13. Die Post', '14. Der greise Kopf', '15. Die Krähe', '16. Letzte Hoffnung',
             '17. Im Dorfe', '18. Der stürmische Morgen', '19. Täuschung', '20. Der Wegweiser',
             '21. Das Wirtshaus', '22. Muth', '23. Die Nebensonnen', '24. Der Leiermann']

### Errorbar Plot of Misalignment Rates per Feature Type and Threshold $\tau$

In [None]:
plt.figure(figsize=(12,4))

x_axis = (np.array(TOLERANCES) / 1000).astype('str')

plt.errorbar(x=x_axis,
             y=df['Chroma'].mean(), 
             yerr=df['Chroma'].std(),
             marker='x', 
             markersize=7,
             alpha=0.75,
             fmt='gray',
             linestyle=':',
             capsize=3,
             elinewidth=2,
             label='Chroma');

plt.errorbar(x=x_axis,
             y=df['Chroma & DLNCO'].mean(), 
             yerr=df['Chroma & DLNCO'].std(),
             marker='^', 
             markersize=7,
             fmt='red',
             linestyle='--',
             capsize=3,
             elinewidth=2,
             label='Chroma & DLNCO');

plt.errorbar(x=x_axis,
             y=df['Chroma & Spectral Flux'].mean(), 
             yerr=df['Chroma & Spectral Flux'].std(),
             marker='o', 
             linestyle='-',
             alpha=0.6,
             markersize=7,
             elinewidth=2,
             capsize=3,
             label='Chroma & Spectral Flux' )

plt.xticks(x_axis);
plt.ylabel('Misalignment rate (%)', fontsize=14)
plt.xlabel('Threshold $\u03C4$ (seconds)', fontsize=14)
plt.legend(fontsize=13, loc='upper right')
plt.grid(linestyle=':');
plt.savefig(f'{FIGURE_DIR}/misalignment_rates_errorbar.pdf', dpi=300)

In [None]:
plt.figure(figsize=(12,4))

x_axis = (np.array(TOLERANCES) / 1000).astype('str')
x_axis_arr = np.arange(len(TOLERANCES))
BAR_WIDTH = 0.275

plt.bar(x=x_axis_arr,
        height=df['Chroma'].mean(), 
        yerr=df['Chroma'].std(),
        width=BAR_WIDTH,
        color='gray',
        label='Chroma',
        alpha=0.75);
        
plt.bar(x=x_axis_arr + BAR_WIDTH,
        height=df['Chroma & DLNCO'].mean(), 
        yerr=df['Chroma & DLNCO'].std(),
        width=BAR_WIDTH,
        alpha=0.75,
        color='red',
        label='Chroma & DLNCO')


        
plt.bar(x=x_axis_arr + 2 * BAR_WIDTH,
        height=df['Chroma & Spectral Flux'].mean(), 
        yerr=df['Chroma & Spectral Flux'].std(),
        width=BAR_WIDTH,
        alpha=0.75,
        color='#1f77b4',
        label='Chroma & Spectral Flux')


plt.xticks([r + BAR_WIDTH for r in np.arange(len(TOLERANCES))], x_axis)

plt.ylabel('Misalignment rate (%)', fontsize=14)
plt.xlabel('Threshold $\u03C4$ (seconds)', fontsize=14)
plt.legend(fontsize=13, loc='upper right')
plt.grid(linestyle=':');
plt.savefig(f'{FIGURE_DIR}/misalignment_rates_bar.pdf', dpi=300)

### Errorbar Plot for Mean Alignment Error per Song

In [None]:
plt.figure(figsize=(12,4))


plt.errorbar(x=np.arange(1, len(stats_dict)+1) - 0.15,
             y=chroma_means / 1000, 
             yerr=chroma_std / 1000,
             marker='x', 
             markersize=6,
             fmt='gray',
             alpha=0.75,
             linestyle='',
             capsize=3,
             elinewidth=2,
             label='Chroma');

plt.errorbar(x=np.arange(1, len(stats_dict)+1),
             y=chroma_dlnco_means / 1000, 
             yerr=chroma_dlnco_std / 1000,
             marker='^', 
             markersize=6,
             fmt='red',
             linestyle='',
             capsize=3,
             elinewidth=2,
             label='Chroma & DLNCO');

plt.errorbar(x=np.arange(1, len(stats_dict)+1) + 0.15,
             y=chroma_sf_means / 1000, 
             yerr=chroma_sf_std / 1000,
             marker='o', 
             linestyle='',
             markersize=6,
             elinewidth=2,
             capsize=3,
             label='Chroma & Spectral Flux' )

plt.legend(fontsize=13, loc='upper right')
plt.grid(linestyle=':');
plt.ylabel('Mean Alignment Error (sec)', fontsize=14)
plt.xlabel('Song Number in SWD', fontsize=14)

SHOW_SONG_LABELS = False

if SHOW_SONG_LABELS:
    labels = SONG_LABELS
    rotation = 90
else:
    labels = None
    rotation = None
    
plt.xticks(np.arange(1, 25, 1), 
           labels=labels, 
           rotation=rotation);
plt.savefig(f'{FIGURE_DIR}/mean_error_errorbar.pdf', dpi=300)

### Boxplot of Alignment Errors per Song

In [None]:
plt.figure(figsize=(12, 4))
chroma_errors = [stats_dict[song_id]['chroma']['absolute_errors'] for song_id in stats_dict]
chroma_sf_errors = [stats_dict[song_id]['chroma_sf']['absolute_errors'] for song_id in stats_dict]
chroma_dlnco_errors = [stats_dict[song_id]['chroma_dlnco']['absolute_errors'] for song_id in stats_dict]

SHOW_OUTLIERS = True
SHOW_MEANS = True


if SHOW_OUTLIERS:
    sym = '+'
else:
    sym = ''

c = 'gray'
bp1 = plt.boxplot(chroma_errors, 
                  widths = 0.20,
                  sym=sym,
                  boxprops=dict(color=c, alpha=0.75),
                  capprops=dict(color=c),
                  whiskerprops=dict(color=c),
                  medianprops=dict(color=c),
                  positions=np.arange(0, 24) - 0.25,
                  showmeans=SHOW_MEANS,
                  meanprops={"marker":"o",
                             "markerfacecolor":"white", 
                             "markeredgecolor":"gray",
                             "markersize":"5"},
                  flierprops={"markersize":"4",
                             "markerfacecolor":"white",
                             "markeredgecolor":"gray"});

c = 'red'
bp2 = plt.boxplot(chroma_dlnco_errors, 
                  widths = 0.20,
                  sym=sym,
                  boxprops=dict(color=c),
                  capprops=dict(color=c),
                  whiskerprops=dict(color=c),
                  medianprops=dict(color=c),
                  positions=np.arange(0, 24),
                  showmeans=SHOW_MEANS,
                  meanprops={"marker":"o",
                             "markerfacecolor":"white", 
                             "markeredgecolor":"red",
                             "markersize":"5"},
                  flierprops={"markersize":"4",
                             "markerfacecolor":"white",
                             "markeredgecolor":"red"});

c = '#1f77b4'
bp3 = plt.boxplot(chroma_sf_errors, 
                  widths = 0.20,
                  sym=sym,
                  boxprops=dict(color=c),
                  capprops=dict(color=c),
                  whiskerprops=dict(color=c),
                  medianprops=dict(color=c),
                  positions=np.arange(0, 24) + 0.25,
                  showmeans=SHOW_MEANS,
                  meanprops={"marker":"o",
                             "markerfacecolor":"white", 
                             "markeredgecolor":c,
                             "markersize":"5"},
                  flierprops={"markersize":"4",
                             "markerfacecolor":"white",
                             "markeredgecolor":c});



plt.xticks(ticks=np.arange(0,24), labels=stats_dict.keys());
plt.legend([bp1["boxes"][0], bp2["boxes"][0], bp3["boxes"][0]], 
           ['Chroma', 
            'Chroma & DLNCO',
            'Chroma & Spectral Flux'], 
           loc='upper right',
           fontsize=13);
plt.grid(linestyle=':');
plt.ylabel('Alignment Error (sec)', fontsize=14)
plt.xlabel('Song Number in SWD', fontsize=14)
plt.ylim([0, 0.55]);
plt.savefig(f'{FIGURE_DIR}/error_boxplot.pdf', dpi=300);

### References

[1] M. Müller, Y. Özer, M. Krause, T. Prätzlich, and J. Driedger, “Sync toolbox: A Python package for efficient, robust, and accurate music synchronization,” Journal of Open Source Software, vol. 6, no. 64, p. 3434, 2021. [Online]. Available: https://doi.org/10.21105/joss.03434

[2] C. Weiß, F. Zalkow, V. Arifi-Müller, M. Müller, H. V. Koops, A. Volk, and H. Grohganz, “Schubert Winterreise dataset: A multimodal scenario for music analysis,” ACM Journal on Computing and Cultural Heritage (JOCCH), vol. 14, no. 2, pp. 25:1–18, 2021.

[3] T. Prätzlich, J. Driedger, and M. Müller, “Memory-restricted multiscale dynamic time warping,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Shanghai, China, March 2016, pp. 569–573.

[4] S. Ewert, M. Müller, and P. Grosche, “High resolution audio synchronization using chroma onset features,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan, Apr. 2009, pp. 1869–1872.

[5] P. Grosche, M. Müller, and S. Ewert, “Combination of onset-features with applications to high-resolution music synchronization,” in Proceedings of the International Conference on Acoustics (NAG/DAGA), 2009, pp. 357–360.

[6] T. Prätzlich and M. Müller, “Triple-based analysis of music alignments without the need of ground-truth annotations,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Shanghai, China, March 2016, pp. 266–270.