ilya16/PianoCoRe

PianoCoRe: Combined and Refined Piano MIDI Dataset

Project repository for the article "PianoCoRe: Combined and Refined Piano MIDI Dataset"

Published in the Transactions of the International Society for Music Information Retrieval

Author: Ilya Borovik

TISMIR Zenodo Dataset

Description

PianoCoRe is a large-scale piano MIDI dataset that unifies and refines major open-source piano corpora. It contains 250,046 performances of 5,625 pieces written by 483 composers, totaling 21,763 hours of performed music.

PianoCoRe provides the most diverse composer- and composition-annotated piano MIDI data. The metadata includes deduplication flags, MIDI quality labels and precise note-level score-performance alignments.

The alignments are refined using the Refined Alignment for Scores and Performances (RAScoP) pipeline, which is integrated into the Symbolic Music Performance modeling (SyMuPe) Python package. The pipeline ensures perfect note-by-note score-performance synchronization for expressive performance modeling.

For more details, please refer to the full article.

Usage examples will be added once the SyMuPe package has been updated.

Dataset Tiers

Dataset        Composers   Pieces   Performances   Hours    Scores   Alignments
PianoCoRe-C    483         5,625    250,046        21,763   75.3%    no
PianoCoRe-B    478         5,591    214,092        18,757   75.0%    no
PianoCoRe-A    151         1,591    157,207        12,509   100%     note
PianoCoRe-A*   137         1,517    130,275        10,330   100%     note

To support different research applications, the dataset is organized into tiered subsets:

  • PianoCoRe-C (Combined): a complete mixed-source piano performance collection.

    Applications: piano performance analysis, data cleaning algorithms.

  • PianoCoRe-B (Base): a deduplicated and quality-filtered subset.

    Applications: large-scale pre-training, piano performance generation.

  • PianoCoRe-A (Aligned): a subset containing performances aligned to score.

    Applications: score-performance analysis, expressive piano performance rendering.

  • PianoCoRe-A*: a high-quality subset containing the best performances and note-level alignments.

    Applications: expressive piano performance rendering, performance-to-score transcription.
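As a rough illustration of how the tiers compare in size, the sketch below derives two numbers from the table above: the share of PianoCoRe-C performances that each tier contains, and the mean performance duration. The performance and hour counts are copied from the table; the function name and output layout are illustrative choices. Because the hours in the table are rounded, the mean durations are approximate.

```python
# Tier statistics copied from the table above: (performances, hours).
TIERS = {
    "PianoCoRe-C": (250_046, 21_763),
    "PianoCoRe-B": (214_092, 18_757),
    "PianoCoRe-A": (157_207, 12_509),
    "PianoCoRe-A*": (130_275, 10_330),
}


def tier_summary(tiers, reference="PianoCoRe-C"):
    """Compute per-tier size relative to a reference tier and mean duration."""
    ref_performances, _ = tiers[reference]
    summary = {}
    for name, (performances, hours) in tiers.items():
        summary[name] = {
            # Fraction of the reference tier's performance count.
            "share_of_reference": performances / ref_performances,
            # Mean duration of one performance, in minutes (hours are rounded).
            "mean_minutes": hours * 60 / performances,
        }
    return summary


if __name__ == "__main__":
    for name, stats in tier_summary(TIERS).items():
        print(
            f"{name:13s} "
            f"{stats['share_of_reference']:6.1%} of C, "
            f"{stats['mean_minutes']:.2f} min per performance"
        )
```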

Tier flags are provided in the metadata of both the Zenodo and Hugging Face versions of the dataset.
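Until official usage examples are available, selecting a tier from the metadata might look roughly like the sketch below. It is only an assumption about the workflow: the column names (`performance_id`, `tier_b`, `tier_a`), the flag encoding, and the sample rows are invented for illustration and do not reflect the actual PianoCoRe metadata schema. Check the Zenodo or Hugging Face release for the real field names.

```python
import csv
import io

# Hypothetical metadata excerpt. The column names and rows are invented
# for illustration and are NOT the real PianoCoRe schema or data.
SAMPLE_METADATA = """performance_id,tier_b,tier_a
perf_0001,1,1
perf_0002,1,0
perf_0003,0,0
"""


def select_tier(rows, flag_column):
    """Return the rows whose tier flag column holds a truthy value."""
    truthy = {"1", "true", "yes"}
    return [row for row in rows if row[flag_column].strip().lower() in truthy]


# Parse the in-memory CSV into a list of dictionaries, one per performance.
rows = list(csv.DictReader(io.StringIO(SAMPLE_METADATA)))

base = select_tier(rows, "tier_b")      # rows flagged for the Base tier
aligned = select_tier(rows, "tier_a")   # rows flagged for the Aligned tier

print([row["performance_id"] for row in base])
print([row["performance_id"] for row in aligned])
```

With a real metadata file, the `io.StringIO` source would be replaced by an opened file handle, and the flag column names by those documented in the release.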

License

The dataset, original and processed files, metadata, and alignment annotations are published under a CC BY-NC-SA 4.0 license. The license respects the licenses used for the source datasets. The underlying MIDI transcriptions are provided strictly for non-commercial research and educational purposes.

Acknowledgments

PianoCoRe is built upon the invaluable contributions of the open music information retrieval community and existing open-source datasets. Credit goes to the creators of the following source corpora:

Dataset           Reference                     Links             License
MAESTRO           Hawthorne et al. (2019)       Paper / Dataset   CC BY-NC-SA 4.0
ASAP              Foscarin et al. (2020)        Paper / Dataset   CC BY-NC-SA 4.0
(n)ASAP           Peter et al. (2023)           Paper / Dataset   CC BY-NC-SA 4.0
ATEPP             Zhang et al. (2022)           Paper / Dataset   CC BY 4.0
GiantMIDI-Piano   Kong et al. (2022)            Paper / Dataset   CC BY 4.0
Aria-MIDI         Bradshaw and Colton (2025)    Paper / Dataset   CC BY-NC-SA 4.0
PERiScoPe         Borovik et al. (2025)         Paper / Dataset   CC BY-NC-SA 4.0
PDMX              Long et al. (2025)            Paper / Dataset   CC BY 4.0

Citation

If you use this dataset in your research, please cite:

Borovik, I. (2026). PianoCoRe: Combined and Refined Piano MIDI Dataset. Transactions of the International Society for Music Information Retrieval, 9(1), 144-163. DOI: 10.5334/tismir.333

@article{borovik2026pianocore,
  title={{PianoCoRe: Combined and Refined Piano MIDI Dataset}},
  author={Borovik, Ilya},
  journal={Transactions of the International Society for Music Information Retrieval},
  volume={9},
  number={1},
  pages={144--163},
  year={2026},
  doi={10.5334/tismir.333}
}
