Check the data sets #1

liebharc · 2024-05-15T06:53:29Z

Introduction

The effectiveness of a transformer model depends heavily on the quality of its training data. However, the original training dataset used by https://github.com/NetEase/Polyphonic-TrOMR/tree/master remains unpublished. Consequently, this repository relies on https://github.com/liebharc/Polyphonic-TrOMR/tree/master, which trains the transformer on datasets sourced from https://grfia.dlsi.ua.es/primus/, https://sites.google.com/view/multiscore-project/datasets, and https://github.com/itec-hust/CPMS. The grandstaff dataset in particular requires extensive preprocessing, including segmenting each grandstaff into individual staves. In the past, correcting errors in the datasets has brought significant performance improvements, for example errors in stave segmentation, in accidental placement, or in the conversion of Humdrum files into the TrOMR semantic format.

The Task Itself

It would be helpful to have another set of eyes go through all the datasets, especially the grandstaff one. Just take a peek at some random staff images and their corresponding semantic representations. If you spot any issues, we should either tweak our preprocessing methods to fix them or just kick those problematic cases out of the datasets. That way, we won't confuse the transformer during training.
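A small helper can make this spot check repeatable. The sketch below is only an illustration and is not part of the repository. It assumes a hypothetical index file in which each non-empty line has the form `image_path,semantic_path`. The names `parse_index`, `sample_pairs`, and `main` are also hypothetical. Because the sampler uses a fixed seed, two reviewers can look at the same sample.

```python
import random
from pathlib import Path


def parse_index(text: str) -> list[tuple[str, str]]:
    """Parse a hypothetical index where each line is 'image_path,semantic_path'."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image, semantic = line.split(",", 1)
        pairs.append((image, semantic))
    return pairs


def sample_pairs(pairs: list[tuple[str, str]], k: int, seed: int = 0) -> list[tuple[str, str]]:
    """Pick up to k distinct pairs; a fixed seed makes the sample reproducible."""
    rng = random.Random(seed)
    return rng.sample(pairs, min(k, len(pairs)))


def main(index_path: str, k: int = 10, seed: int = 0) -> None:
    """Print sampled image paths next to their semantic text for manual review."""
    pairs = parse_index(Path(index_path).read_text(encoding="utf-8"))
    for image, semantic in sample_pairs(pairs, k, seed):
        print("Image:   ", image)
        print("Semantic:", Path(semantic).read_text(encoding="utf-8").strip())
        print()
```

For example, `main("path/to/index.txt", k=20)` would print 20 pairs. You would then open each listed image and compare it with the printed semantic line. If a sample reveals a systematic problem, such as a wrong stave split, that points to a preprocessing fix. A one-off broken sample is more likely a candidate for removal.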

Update

The CPMS dataset has been removed for now, and the "Lieder" dataset has been added. The task itself remains important.

liebharc · 2024-06-30T20:20:06Z

With the changes that led to v0.2.0, the most severe issues in the datasets should be fixed. These fixes led to a significant improvement in performance. I'll leave this issue open, as a second pair of eyes would be really useful.

liebharc added the labels "help wanted" (Extra attention is needed) and "good first issue" (Good for newcomers) on May 15, 2024.
