Self-Supervised Hierarchical Metrical Analysis

This is the official repository for the paper "Deep Self-Supervised Hierarchical Metrical Structure Modeling", accepted at ICASSP 2023.

Preprint: https://arxiv.org/abs/2210.17183

Resources

Code and README
Pre-trained models (download the models from here)
Demos
Data annotations (for evaluation only)
More information on training data (to be added)
Model training (to be added)
Baseline models for reproduction (to be added)

Demo

We suggest viewing all MIDI demos in a DAW (e.g., Cakewalk, FL Studio, or GarageBand) with track-wise piano-roll visualization.

See the output folder for the full results of the MIDI analysis model on RWC-POP.

Files ending in _crf.mid contain the results decoded by a Conditional Random Field (CRF). In addition to the tracks of the original MIDI file, each such file contains an extra drum track named Layers. On a level-L boundary above the measure level, the Layers track contains L drum notes (best viewed in a DAW with track-wise piano rolls). A downbeat without any drum notes is interpreted as a level-0 boundary.
Files ending in _raw.mid contain the raw output of the Temporal Convolutional Network (TCN) without CRF decoding. A total of L=8 MIDI tracks named Layer-l (l = 0, ..., 7) are added to the MIDI file; track Layer-l gives the frame-wise probability of a metrical boundary of level at least (l + 1).
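To post-process these files programmatically, the two output formats can be decoded as in the sketch below. For the CRF output, it assumes you have already extracted the downbeat times and the onset times of the Layers drum notes (e.g. with pretty_midi, via `PrettyMIDI.get_downbeats()` and the `notes` of the instrument named Layers), and that each drum note starts exactly on the downbeat it labels. For the raw output, it assumes you have already read the eight Layer-l values for one frame; how those values are encoded in the MIDI track is not specified here. Both function names are hypothetical and not part of this repository.

```python
from typing import List, Sequence


def crf_boundary_levels(
    downbeats: Sequence[float],
    layer_note_onsets: Sequence[float],
    tol: float = 1e-3,
) -> List[int]:
    """Boundary level of each downbeat = number of Layers drum notes on it.

    Assumes each drum note starts on the downbeat it labels (within tol
    seconds). A downbeat with no drum notes gets level 0.
    """
    return [
        sum(1 for t in layer_note_onsets if abs(t - db) <= tol)
        for db in downbeats
    ]


def raw_level_probabilities(p_at_least: Sequence[float]) -> List[float]:
    """Turn one frame of Layer-0..Layer-(L-1) values into per-level probabilities.

    p_at_least[l] is read as P(level >= l + 1), so:
      P(level < 1)  = 1 - p_at_least[0]   (level 0, or no higher-level boundary)
      P(level == k) = p_at_least[k - 1] - p_at_least[k]   for 1 <= k < L
      P(level == L) = p_at_least[L - 1]
    """
    n = len(p_at_least)
    probs = [1.0 - p_at_least[0]]
    for k in range(1, n):
        probs.append(p_at_least[k - 1] - p_at_least[k])
    probs.append(p_at_least[n - 1])
    return probs
```

Because the raw outputs are cumulative ("level at least l + 1"), adjacent differences give per-level probabilities that sum to 1; they are all non-negative only if the eight values decrease with l.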

Pretrained models

The pre-trained models are available here. Please download them and extract all *.sdict files to the cache_data folder before continuing.
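To confirm the models are in place before running anything, you can list the *.sdict files in cache_data. This hypothetical helper only looks at the top level of the folder and does not validate the files' contents.

```python
from pathlib import Path
from typing import List


def find_pretrained_models(cache_dir: str = "cache_data") -> List[str]:
    """Return the sorted names of the *.sdict files directly inside cache_dir."""
    folder = Path(cache_dir)
    if not folder.is_dir():
        return []  # folder missing: nothing has been extracted yet
    return sorted(p.name for p in folder.glob("*.sdict"))


if __name__ == "__main__":
    models = find_pretrained_models()
    if models:
        print("Found pre-trained models:", ", ".join(models))
    else:
        print("No *.sdict files in cache_data; download and extract them first.")
```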

Run models on your own samples

Notes:

You need correct beat labels (required) and downbeat labels (optional) for both MIDI and audio files.
For MIDI files, the labels are calculated automatically from tempo change and time signature events (as in most DAWs).
For audio files, the labels must be provided separately as a text file in the format specified below.
Downbeat labels are used for hypermetrical structure analysis. If the downbeat labels are wrong, the CRF-decoded results will be meaningless, but the raw TCN output is not affected.
Beat labels need to be accurate to the 32nd-note level; less precise beat labels may degrade model performance. Note that many automatic beat tracking models (e.g., madmom) may have precision issues.
Currently, only songs in 4/4 time are supported.
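To put the 32nd-note requirement in numbers: with the quarter note as the beat, a 32nd note lasts one eighth of a beat, i.e. 62.5 ms at 120 BPM. The sketch below computes this tolerance and adds two rough sanity checks on beat labels: a heuristic that flags beats where consecutive inter-beat intervals differ by more than one 32nd note, and a check that downbeats fall every four beats. These checks and their names are illustrative assumptions, not part of this repository, and the first one cannot detect a beat grid that is consistently offset.

```python
from typing import List, Sequence


def thirty_second_note_seconds(bpm: float) -> float:
    """Duration of a 32nd note in seconds, taking the quarter note as the beat."""
    return 60.0 / bpm / 8.0


def suspicious_beats(beats: Sequence[float]) -> List[int]:
    """Heuristic: flag beat i if the interval after it differs from the interval
    before it by more than one 32nd note (1/8 of the previous interval).
    Gradual tempo changes pass; abrupt jitter around beat i is flagged.
    """
    flagged = []
    for i in range(1, len(beats) - 1):
        prev_ibi = beats[i] - beats[i - 1]
        next_ibi = beats[i + 1] - beats[i]
        if abs(next_ibi - prev_ibi) > prev_ibi / 8.0:
            flagged.append(i)
    return flagged


def is_four_four(downbeat_flags: Sequence[bool]) -> bool:
    """True if there is at least one downbeat and downbeats are exactly 4 beats apart."""
    idx = [i for i, is_db in enumerate(downbeat_flags) if is_db]
    return len(idx) > 0 and all(b - a == 4 for a, b in zip(idx, idx[1:]))
```

For example, suspicious_beats([0.0, 0.5, 1.0, 1.6, 2.1]) flags indices 2 and 3, the beats on either side of the irregular 0.6 s interval.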

Run the MIDI model

For the MIDI model, run the following command to analyze example_input/midi_input/RM-P001.SMF_SYNC.MID:

python tcn_downbeat_eval.py example_input/midi_input/RM-P001.SMF_SYNC.MID

The program will produce two MIDI files, *_crf.mid and *_raw.mid, in the output folder. Their formats are described above.
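If you have many MIDI files, a small wrapper can call the script once per file. This sketch only assumes the command-line usage shown above (one MIDI path per call) and that it is run from the repository root; the helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

MIDI_SUFFIXES = {".mid", ".midi"}  # matched case-insensitively, e.g. ".MID"


def midi_commands(input_dir: str) -> List[List[str]]:
    """Build one `python tcn_downbeat_eval.py <file>` command per MIDI file."""
    folder = Path(input_dir)
    if not folder.is_dir():
        return []
    files = sorted(p for p in folder.iterdir()
                   if p.is_file() and p.suffix.lower() in MIDI_SUFFIXES)
    # sys.executable reuses the interpreter (and environment) running this code
    return [[sys.executable, "tcn_downbeat_eval.py", str(p)] for p in files]


def run_all(input_dir: str) -> None:
    """Analyze every MIDI file in input_dir; run this from the repository root."""
    for cmd in midi_commands(input_dir):
        print("Analyzing", cmd[-1])
        subprocess.run(cmd, check=True)  # stop on the first failing file
```

For example, run_all("example_input/midi_input") analyzes the bundled example file.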

Run the audio model

For the audio model, you also need to install Sonic Visualiser and add its installation folder to your PATH environment variable. To check this, run the following command in your console; it should open Sonic Visualiser:

"sonic visualiser"

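You can also check from Python whether Sonic Visualiser is reachable on your PATH. The executable names tried here are assumptions (they differ between platforms and packages, e.g. sonic-visualiser on many Linux distributions), and this is not necessarily how the evaluation script locates the program.

```python
import shutil
from typing import Optional, Sequence

# Assumed executable names; adjust for your platform or package.
CANDIDATES = ("sonic visualiser", "sonic-visualiser", "Sonic Visualiser")


def find_sonic_visualiser(candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    found = find_sonic_visualiser()
    print(found if found else "Sonic Visualiser not found on PATH")
```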
To use the pre-trained model on custom audio samples, you will need a beat label file in the following format:

Each row corresponds to one beat: the first number is the beat position (in seconds), and the second number is the downbeat label (1 = downbeat). The two numbers are separated by a tab.
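A beat label file in this format can be read and validated as in the sketch below, which checks that each row has exactly two tab-separated fields and that beat times strictly increase. Only label 1 (downbeat) is specified above, so treating every other value as a non-downbeat is an assumption; the function name is hypothetical.

```python
from typing import List, Tuple


def read_beat_labels(path: str) -> List[Tuple[float, bool]]:
    """Parse '<time in seconds>\t<label>' rows into (time, is_downbeat) pairs."""
    beats: List[Tuple[float, bool]] = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            fields = line.split("\t")
            if len(fields) != 2:
                raise ValueError(
                    f"line {line_no}: expected 2 tab-separated fields, got {len(fields)}")
            time, label = float(fields[0]), int(float(fields[1]))
            if beats and time <= beats[-1][0]:
                raise ValueError(f"line {line_no}: beat times must strictly increase")
            beats.append((time, label == 1))  # assumption: any label != 1 is not a downbeat
    return beats
```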

Then, run the following command:

python tcn_downbeat_eval.py example_input/audio_input/RWC-POP-001-48kbps.mp3 example_input/audio_input/RWC-POP-001.lab

The program will call Sonic Visualiser (if correctly installed) to visualize the results, which contain three layers:

A beat-sync version of your music file (time-stretched according to your beat labels to ensure a constant BPM)
A time instance layer showing hypermetrical structure analysis results (requires correct downbeat labels)
A spectrogram layer showing the raw prediction of the model on L=8 metrical layers.
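The beat-sync layer can be understood as a piecewise-linear time map: each interval between two consecutive beat labels is stretched or compressed so that every beat lasts the same time. The sketch below maps a time in the original audio to the corresponding time in such a version. The target BPM of 120, the linear extrapolation outside the labeled beats, and the function name are assumptions; it does not perform the actual audio time-stretching, which this repository may implement differently.

```python
from bisect import bisect_right
from typing import Sequence


def beat_sync_time(t: float, beats: Sequence[float], target_bpm: float = 120.0) -> float:
    """Map time t (seconds) in the original audio to the beat-synchronous timeline.

    Between beats[i] and beats[i + 1], time is scaled linearly so that every
    beat interval lasts exactly 60 / target_bpm seconds. Before the first or
    after the last beat, the nearest interval's scaling is extrapolated.
    """
    if len(beats) < 2:
        raise ValueError("need at least two beats")
    period = 60.0 / target_bpm
    # Index of the beat interval containing t, clamped to [0, len(beats) - 2].
    i = min(max(bisect_right(beats, t) - 1, 0), len(beats) - 2)
    frac = (t - beats[i]) / (beats[i + 1] - beats[i])
    return (i + frac) * period
```

For example, with beats at 0.0, 0.4 and 1.0 s and a target of 120 BPM (0.5 s per beat), time 0.2 s maps to 0.25 s and 0.7 s maps to 0.75 s.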

Note: imprecise beat labels will likely cause low-confidence predictions.

Credits

Credits to other repositories:

Supervised metrical structure analysis from https://github.com/music-x-lab/Hierarchical-Metrical-Structure
osu! parser adapted from https://github.com/Awlexus/python-osu-parser (used for data preprocessing)

music-x-lab/Self-Supervised-Metrical-Structure
