Please use the current development version and report any bugs you encounter:
`pip install git+https://github.com/DCMLab/wavescapes.git@corpus_structure`

In [1]:
import os, shutil, subprocess
import ms3

In [2]:
DATA_FOLDER = os.path.abspath('../mscz') # on the HPC: /scratch/data/musescore.com/
SAMPLE_FOLDER = os.path.abspath('./mscz')
CONVERSION_FOLDER = os.path.abspath('./mscx')
OUTPUT_PATHS = dict(
    events = os.path.abspath('./events'),
    notes = os.path.abspath('./notes'),
    measures = os.path.abspath('./measures'),
    labels = os.path.abspath('./labels'),
    metadata = os.path.abspath('./metadata'),
)


Creating a sample of 1000 zipped MuseScore 3 files (`.mscz`) for development purposes:
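The sampling step itself is not shown in this notebook. A minimal sketch of how it could be done, assuming the source folder holds the `.mscz` files directly (no subfolders); the helper name `copy_random_sample` and its parameters are hypothetical and not part of `ms3`:

```python
import os
import random
import shutil


def copy_random_sample(source_dir, target_dir, n=1000, seed=None):
    """Copy up to n randomly chosen .mscz files from source_dir to target_dir.

    Hypothetical helper: returns the list of copied file names.
    """
    os.makedirs(target_dir, exist_ok=True)
    # Sort first so that a fixed seed yields the same sample on every machine.
    candidates = sorted(
        entry.name
        for entry in os.scandir(source_dir)
        if entry.is_file() and entry.name.endswith('.mscz')
    )
    rng = random.Random(seed)
    chosen = rng.sample(candidates, min(n, len(candidates)))
    for name in chosen:
        shutil.copy2(os.path.join(source_dir, name), os.path.join(target_dir, name))
    return chosen
```

With the folders defined above, this would be called as `copy_random_sample(DATA_FOLDER, SAMPLE_FOLDER, n=1000, seed=0)`. Fixing the seed keeps the development sample reproducible.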

Using the MuseScore 3 binary (which needs to be installed or available as an AppImage) to convert the files to the format of the current MuseScore 3 version. See the [MuseScore command-line options](https://musescore.org/en/handbook/3/command-line-options).

ToDos:

* parallelize the conversion (anyone interested in learning [ray](https://www.ray.io/)?)
* avoid processing files more than once, making use of the fact that the file names correspond to IDs
* add proper error handling, keeping track of a mapping ID -> `error message` for files that cannot be converted successfully (check out the `stdout` and `stderr` arguments of [subprocess.run()](https://docs.python.org/3/library/subprocess.html#subprocess.run))
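The second and third ToDo could be approached roughly as follows. This is a sketch under assumptions, not a finished implementation: the helper names `convert_score` and `convert_folder` are hypothetical, the command is passed as a list so that extra flags can be prepended, and a file counts as "already processed" simply if its target `.mscx` exists. Only the `-o` option from the MuseScore handbook is used.

```python
import os
import subprocess


def convert_score(command, source_path, target_path, timeout=120):
    """Convert one file; return None on success or an error message (hypothetical helper)."""
    if os.path.exists(target_path):
        return None  # assumed to have been converted in an earlier run
    try:
        result = subprocess.run(
            [*command, "-o", target_path, source_path],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"timed out after {timeout} s"
    except OSError as e:  # e.g. the executable was not found
        return f"could not start conversion: {e}"
    if result.returncode != 0:
        return result.stderr.strip() or f"exit code {result.returncode}"
    return None


def convert_folder(command, source_dir, target_dir):
    """Convert all files in source_dir; return a dict mapping ID -> error message."""
    os.makedirs(target_dir, exist_ok=True)
    errors = {}
    for entry in os.scandir(source_dir):
        if not entry.is_file():
            continue
        ID, _ = os.path.splitext(entry.name)  # file names correspond to IDs
        target_path = os.path.join(target_dir, ID + '.mscx')
        message = convert_score(command, entry.path, target_path)
        if message is not None:
            errors[ID] = message
    return errors
```

Here the call would be `errors = convert_folder([musescore_cmd], SAMPLE_FOLDER, CONVERSION_FOLDER)`. One caveat of the existence check: if a failed run leaves a partial output file behind, that file would be skipped next time, so such leftovers may need to be deleted first.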

In [3]:
# indicate the path to your MuseScore 3 executable or try using the standard path:
musescore_cmd = ms3.get_musescore('auto')
print(musescore_cmd)

C:\Program Files\MuseScore 3\bin\MuseScore3.exe


In [4]:
for i, entry in enumerate(os.scandir(SAMPLE_FOLDER)):
    if i == 10:
        break
    ID, file_extension = os.path.splitext(entry.name)
    converted_file_path = os.path.join(CONVERSION_FOLDER, ID + '.mscx')
    print(f"Converting {entry.path} to {converted_file_path}...", end=' ')
    
    result = subprocess.run(
        [musescore_cmd, "--score-meta", "-o", converted_file_path, entry.path],
        capture_output=True,
        text=True,
    )
    print(f"Exit code: {result.returncode}")
    print(f"Result: {result.stdout.strip()}") # the extraction of metadata as JSON does not work on Windows; please store the JSON to the metadata output folder
    print(f"Errors: {result.stderr.strip()}")

Converting D:\musescore-dataset\ml_project\mscz\100000.mscz to D:\musescore-dataset\ml_project\mscx\100000.mscx... Exit code: 1
Result: m_crashReporterWChar: C:/Program Files/MuseScore 3/bin/MuseScore3-crash-reporter.exe
Errors: Cannot read file D:\musescore-dataset\ml_project\mscz\100000.mscz:
Converting D:\musescore-dataset\ml_project\mscz\100007.mscz to D:\musescore-dataset\ml_project\mscx\100007.mscx... Exit code: 1
Result: m_crashReporterWChar: C:/Program Files/MuseScore 3/bin/MuseScore3-crash-reporter.exe
Errors: Cannot read file D:\musescore-dataset\ml_project\mscz\100007.mscz:
Converting D:\musescore-dataset\ml_project\mscz\100014.mscz to D:\musescore-dataset\ml_project\mscx\100014.mscx... Exit code: 1
Result: m_crashReporterWChar: C:/Program Files/MuseScore 3/bin/MuseScore3-crash-reporter.exe
Errors: Cannot read file D:\musescore-dataset\ml_project\mscz\100014.mscz:
Converting D:\musescore-dataset\ml_project\mscz\1000256.mscz to D:\musescore-dataset\ml_project\mscx\1000256.msc

Using the ms3 parsing library to extract score information:

In [5]:
for entry in os.scandir(CONVERSION_FOLDER):
    if entry.is_dir():
        continue
    parsed = ms3.Score(entry.path, read_only=True)
    ID, _ = os.path.splitext(entry.name)
    tsv_name = f"{ID}.tsv"
    dataframes = dict(
        events = parsed.mscx.events(),
        notes = parsed.mscx.notes(),
        measures = parsed.mscx.measures(),
        labels = parsed.mscx.labels(),
    )
    for facet, df in dataframes.items():
        if df is None:
            continue
        os.makedirs(OUTPUT_PATHS[facet], exist_ok=True)  # to_csv() fails if the folder is missing
        tsv_path = os.path.join(OUTPUT_PATHS[facet], tsv_name)
        df.to_csv(tsv_path, sep='\t', index=False)
    metadata = parsed.mscx.metadata # please add this nested dictionary to the JSON stored in the previous step
    metadata['id'] = ID

	MC 15, the 1st measure of a 2nd volta, should have MN 14, not MN 15.
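
The loop above assembles `metadata` but does not write it anywhere yet. One possible way to store it, sketched under the assumption that one JSON file per ID is kept in the metadata output folder; the helper `update_metadata_json` is hypothetical, and `default=str` is only a fallback for values that `json` cannot serialize natively:

```python
import json
import os


def update_metadata_json(json_path, new_entries):
    """Merge new_entries into the JSON object at json_path, creating the file if needed.

    Hypothetical helper: top-level keys in new_entries overwrite existing ones.
    """
    data = {}
    if os.path.exists(json_path):
        with open(json_path, encoding='utf-8') as f:
            data = json.load(f)
    data.update(new_entries)
    os.makedirs(os.path.dirname(json_path) or '.', exist_ok=True)
    with open(json_path, 'w', encoding='utf-8') as f:
        # default=str turns otherwise unserializable values into strings
        json.dump(data, f, indent=2, ensure_ascii=False, default=str)
    return data
```

Inside the loop this could be used as `update_metadata_json(os.path.join(OUTPUT_PATHS['metadata'], f"{ID}.json"), metadata)`. Whether the ms3 metadata should be merged at the top level or nested under its own key depends on the structure of the JSON produced in the conversion step.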
