k-l-lambda/imslp-mining

IMSLP Mining

Prerequisites

  • Clone the IMSLP Crawling dataset.

    Then configure the environment variables in .env:

     DATABASE_URL=file:/path-to-imslp-crawling/data.db
     IMSLP_FILES_DIR=/path-to-imslp-crawling/files
  • Create a config file in the project root directory: config.local.yaml

    • pyclients: the OMR Python hosts, e.g.
      pyclients:
        semantic: tcp://localhost:12025
        textLoc: tcp://localhost:12026
        textOcr: tcp://localhost:12027
        brackets: tcp://localhost:12028
  • Copy the OMR node package to ./tools/libs/omr.
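
The prerequisites above can be checked before any pipeline step is run. The following is a minimal sketch, not part of the repository: the function names `parse_env` and `missing_prerequisites` are hypothetical, and it assumes that .env contains plain KEY=VALUE lines without quoting and that all paths are relative to the project root.

```python
from pathlib import Path

# Variable names taken from the .env example above.
REQUIRED_ENV = ("DATABASE_URL", "IMSLP_FILES_DIR")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Assumption: no quoting or variable expansion, unlike a full dotenv parser.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_prerequisites(root: Path) -> list:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    env_file = root / ".env"
    env = parse_env(env_file.read_text()) if env_file.is_file() else {}
    for key in REQUIRED_ENV:
        if not env.get(key):
            problems.append(f"{key} is not set in .env")
    if not (root / "config.local.yaml").is_file():
        problems.append("config.local.yaml is missing")
    if not (root / "tools" / "libs" / "omr").is_dir():
        problems.append("./tools/libs/omr is missing")
    return problems
```

Calling `missing_prerequisites(Path("."))` from the project root would list whatever is still missing. The sketch only checks that files and variables exist; it does not verify that the pyclients hosts are reachable.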

Data Pipeline

# set up work folders and create basic.yaml
yarn ts ./tools/dataInit.ts

# copy midi files
yarn ts ./tools/copyMIDI.ts

# Audio
## split audio files and remove silent audio
yarn ts ./tools/audioSplitter.ts
python ./spectrumPlotter.py

## piano audio to MIDI
python ./pianoTranscriber.py

# Sheet music
## page location
yarn ts ./tools/pageReader.ts

## vision
yarn ts ./tools/ocr.ts
yarn ts ./tools/scoreInit.ts
yarn ts ./tools/scoreVision.ts

## regulation
yarn ts ./tools/spartitoConstructor.ts
yarn ts ./tools/spartitoSolver.ts
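
The commands above are meant to be run in the listed order. As a rough sketch of how they could be chained so that a failing step halts the rest, the snippet below wraps them in a small runner. The name `run_pipeline` and the stop-on-first-failure policy are assumptions for illustration, not something the repository provides.

```python
import subprocess

# Commands copied from the Data Pipeline listing, in the order given there.
DATA_PIPELINE = [
    "yarn ts ./tools/dataInit.ts",
    "yarn ts ./tools/copyMIDI.ts",
    "yarn ts ./tools/audioSplitter.ts",
    "python ./spectrumPlotter.py",
    "python ./pianoTranscriber.py",
    "yarn ts ./tools/pageReader.ts",
    "yarn ts ./tools/ocr.ts",
    "yarn ts ./tools/scoreInit.ts",
    "yarn ts ./tools/scoreVision.ts",
    "yarn ts ./tools/spartitoConstructor.ts",
    "yarn ts ./tools/spartitoSolver.ts",
]


def run_pipeline(commands, run=subprocess.run):
    """Run each command in order and return the commands that succeeded.

    Stops at the first command that exits with a non-zero status.
    The `run` parameter exists so the runner can be swapped out in tests.
    """
    completed = []
    for command in commands:
        result = run(command.split(), check=False)
        if result.returncode != 0:
            print(f"stopped: {command!r} exited with {result.returncode}")
            break
        completed.append(command)
    return completed
```

Calling `run_pipeline(DATA_PIPELINE)` from the project root would execute the whole listing. Splitting on whitespace works here only because none of the commands contain quoted arguments.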

Maestro Pipeline

# save MIDI hashes in midi-hash.yaml
yarn ts ./tools/midiIndexing.ts

yarn ts ./tools/maestroIndexer.ts
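
The README does not say how midiIndexing.ts computes its hashes. Purely as an illustration of indexing MIDI files by content, the sketch below groups files under a directory by a SHA-256 digest of their bytes. The hash function, the `*.midi` pattern, and the names `file_digest` and `index_by_hash` are all assumptions and may differ from what the tool actually does.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of a file's bytes (the choice of hash is an assumption)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def index_by_hash(root: Path, pattern: str = "*.midi") -> dict:
    """Map each digest to the relative paths of all files with that content."""
    index = {}
    for path in sorted(root.rglob(pattern)):
        relative = path.relative_to(root).as_posix()
        index.setdefault(file_digest(path), []).append(relative)
    return index
```

Files with identical bytes end up under the same key, so an index like this could be used to spot duplicate MIDI files across directories.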

Data dependencies of scripts

| script | input | output |
| --- | --- | --- |
| dataInit | db | basic.yaml |
| copyMIDI | basic.yaml | origin.midi |
| audioSplitter | .mp3, .ogg, .flac | spleeter.log, .wav |
| spectrumPlotter | .wav | spectrum.log, .wav (delete) |
| pianoTranscriber | .wav | .midi |
| pageReader | basic.yaml, .pdf | layout.json, image-bed |
| ocr | basic.yaml, layout.json, image-bed | omr.yaml, layout.json |
| scoreInit | basic.yaml, layout.json | omr.yaml, score.json |
| scoreVision | basic.yaml, score.json, image-bed | omr.yaml, score.json, image-bed (if enabled gauge) |
| spartitoConstructor | basic.yaml, score.json | omr.yaml, .spartito.json, .spartito.midi |
| spartitoSolver | basic.yaml, omr.yaml, .spartito.json | .spartito.json (in target directory) |
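
The table above can also be read as a dependency graph. The sketch below transcribes it into plain data and derives two things: which inputs must exist before the pipeline starts, and which earlier scripts feed each script. The helper names `external_inputs` and `producers` are hypothetical, and the `.wav` deletion done by spectrumPlotter is left out of its outputs for simplicity.

```python
# (script, inputs, outputs) rows transcribed from the table above.
DEPENDENCIES = [
    ("dataInit", ["db"], ["basic.yaml"]),
    ("copyMIDI", ["basic.yaml"], ["origin.midi"]),
    ("audioSplitter", [".mp3", ".ogg", ".flac"], ["spleeter.log", ".wav"]),
    ("spectrumPlotter", [".wav"], ["spectrum.log"]),
    ("pianoTranscriber", [".wav"], [".midi"]),
    ("pageReader", ["basic.yaml", ".pdf"], ["layout.json", "image-bed"]),
    ("ocr", ["basic.yaml", "layout.json", "image-bed"],
     ["omr.yaml", "layout.json"]),
    ("scoreInit", ["basic.yaml", "layout.json"], ["omr.yaml", "score.json"]),
    ("scoreVision", ["basic.yaml", "score.json", "image-bed"],
     ["omr.yaml", "score.json", "image-bed"]),
    ("spartitoConstructor", ["basic.yaml", "score.json"],
     ["omr.yaml", ".spartito.json", ".spartito.midi"]),
    ("spartitoSolver", ["basic.yaml", "omr.yaml", ".spartito.json"],
     [".spartito.json"]),
]


def external_inputs(rows):
    """Inputs that no earlier row produces, in order of first appearance."""
    produced, external = set(), []
    for _script, inputs, outputs in rows:
        for artifact in inputs:
            if artifact not in produced and artifact not in external:
                external.append(artifact)
        produced.update(outputs)
    return external


def producers(rows):
    """Map each script to the earlier scripts that output one of its inputs."""
    seen = {}  # artifact -> scripts that have produced it so far
    result = {}
    for script, inputs, outputs in rows:
        upstream = []
        for artifact in inputs:
            for producer in seen.get(artifact, []):
                if producer not in upstream:
                    upstream.append(producer)
        result[script] = upstream
        for artifact in outputs:
            seen.setdefault(artifact, []).append(script)
    return result
```

With the rows as transcribed, `external_inputs(DEPENDENCIES)` yields the database plus the audio and PDF files, which matches the role of the IMSLP Crawling dataset in the prerequisites.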
