## Status

This repository is currently inactive and serves only as a supplement to the papers mentioned below. For our current transcription work, see the MT3 blog post and MT3 GitHub repository.

# Onsets and Frames Transcription

State-of-the-art piano and drum transcription models, including velocity estimation.

For more information, see our papers and blog posts:

* [Onsets and Frames: Dual-Objective Piano Transcription](https://arxiv.org/abs/1710.11153)
* [Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset](https://arxiv.org/abs/1810.12247)
* [Improving Perceptual Quality of Drum Transcription with the Expanded Groove MIDI Dataset](https://arxiv.org/abs/2004.00188)

Note that while we provide commits for code used in papers, we can provide support only for the code at HEAD.

You may also be interested in a PyTorch Onsets and Frames implementation by Jong Wook Kim (not supported by the Magenta team).

Finally, we have also open-sourced the align_fine tool for high-performance fine alignment of sequences that are already coarsely aligned, as described in the "Fine Alignment" section of the appendix of the MAESTRO paper.

## JavaScript App

The easiest way to try out the piano transcription model is with our web app: Piano Scribe. You can try transcribing audio files right in your browser without installing any software. You can read more about it on our blog post, Piano Transcription in the Browser with Onsets and Frames.

## Colab Notebook

We also provide an Onsets and Frames Colab Notebook for both piano and drum transcription models.

## Transcription Script

If you would like to run transcription locally, you can use the transcribe script. First, set up your Magenta environment.

For piano transcription, download our pre-trained checkpoint, which is trained on the MAESTRO dataset.
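For example (the zip and directory names here are hypothetical; use whatever you downloaded):

```bash
# Unzip the downloaded checkpoint into a working directory.
unzip maestro_checkpoint.zip -d ~/onsets-frames
# MODEL_DIR below should point at the directory containing the checkpoint files.
```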

After unzipping that checkpoint, you can run the following command:

```bash
MODEL_DIR=<path to directory containing checkpoint>
onsets_frames_transcription_transcribe \
  --model_dir="${MODEL_DIR}" \
  <piano_recording1.wav, piano_recording2.wav, ...>
```
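Each input recording should be transcribed to a MIDI file written next to it (the script appends a `.midi` extension to the input filename).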

For drum transcription, use the checkpoint trained on the E-GMD dataset and run the following command:

```bash
MODEL_DIR=<path to directory containing checkpoint>
onsets_frames_transcription_transcribe \
  --model_dir="${MODEL_DIR}" \
  --config="drums" \
  <drums_recording1.wav, drums_recording2.wav, ...>
```

## Train your own

If you would like to train the model yourself, first set up the Magenta development environment. If you're going to run dataset creation yourself (i.e., not using the pre-generated TFRecord files), you'll also need Apache Beam, so the install command is:

```bash
pip install -e .[beam]
```
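If you don't yet have a Magenta checkout, a minimal setup sketch looks like this (the clone location is up to you):

```bash
# Clone the Magenta source and install it in editable mode with Beam support.
git clone https://github.com/magenta/magenta.git
cd magenta
pip install -e .[beam]
```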

### MAESTRO Dataset

If you plan on using the default dataset creation setup, you can skip the steps below and instead download a pre-generated copy of the TFRecord files they would produce: onsets_frames_dataset_maestro_v1.0.0.zip. If you modify any of the steps or use custom code, you will need to run the full pipeline below.

For training and evaluation, we will use the MAESTRO dataset. These steps will process the raw dataset into training examples containing 20-second chunks of audio/MIDI and validation/test examples containing full pieces.

Our dataset creation tool is written using Apache Beam. These instructions cover how to run it on Google Cloud Dataflow, but you could run it on any platform that supports Beam.

To prepare the dataset, do the following:

  1. Set up Google Cloud Dataflow. The quickest way to do this is described in this guide.
  2. Run the following command:
```bash
BUCKET=bucket_name
PROJECT=project_name
MAGENTA_SETUP_PATH=/path/to/magenta/setup.py

PIPELINE_OPTIONS=\
"--runner=DataflowRunner,"\
"--project=${PROJECT},"\
"--temp_location=gs://${BUCKET}/tmp,"\
"--setup_file=${MAGENTA_SETUP_PATH}"

onsets_frames_transcription_create_dataset \
  --output_directory=gs://${BUCKET}/datagen \
  --pipeline_options="${PIPELINE_OPTIONS}" \
  --alsologtostderr
```

Depending on your setup, this could take up to a couple of hours to run (on Google Cloud, it may cost around $20). Once it completes, you should have about 19 GB of files in the output_directory.

You could also train using Google Cloud, but these instructions assume you have downloaded the generated TFRecord files to your local machine.
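For example, a sketch of copying the output locally with gsutil (the destination directory is just an example):

```bash
# Copy the generated TFRecord files from GCS to a local directory.
mkdir -p ~/maestro_data
gsutil -m cp "gs://${BUCKET}/datagen/*" ~/maestro_data/
```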

### MAPS Dataset (Optional)

Training and evaluation will happen on the MAESTRO dataset. If you would also like to evaluate (or even train) on the MAPS dataset, follow these steps.

First, you'll need to download a copy of the MAPS Database. Unzip the MAPS zip files after you've downloaded them.

Next, you'll need to create TFRecord files that contain the relevant data from MAPS by running the following command:

```bash
MAPS_DIR=<path to directory containing unzipped MAPS dataset>
OUTPUT_DIR=<path where the output TFRecord files should be stored>

onsets_frames_transcription_create_dataset_maps \
  --input_dir="${MAPS_DIR}" \
  --output_dir="${OUTPUT_DIR}"
```

### Custom Dataset

To use your own dataset or the E-GMD dataset, see the instructions in onsets_frames_transcription_create_tfrecords. You'll then need to register the new dataset in configs.py and process it with onsets_frames_transcription_create_dataset.

### Training

Now you can train your own transcription model using the training TFRecord file generated during dataset creation.

Note that if you have the transform_audio hparam set to true (which it is by default), you will need to have the sox binary installed on your system.
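For example, on Debian/Ubuntu systems (other platforms will differ):

```bash
# Install the sox binary plus extra format support.
sudo apt-get install sox libsox-fmt-all
```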

```bash
TRAIN_EXAMPLES=<path to training tfrecord(s) generated during dataset creation>
RUN_DIR=<path where checkpoints and summary events should be saved>

onsets_frames_transcription_train \
  --examples_path="${TRAIN_EXAMPLES}" \
  --model_dir="${RUN_DIR}" \
  --mode='train'
```

You can also run an eval job during training to check metrics:

```bash
TEST_EXAMPLES=<path to eval tfrecord(s) generated during dataset creation>
MODEL_DIR=<path where checkpoints should be loaded>
OUTPUT_DIR=$MODEL_DIR/eval

onsets_frames_transcription_infer \
  --examples_path="${TEST_EXAMPLES}" \
  --model_dir="${MODEL_DIR}" \
  --output_dir="${OUTPUT_DIR}" \
  --hparams="use_cudnn=false" \
  --eval_loop
```
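Note: the use_cudnn=false hparam swaps in an LSTM implementation that does not require cuDNN, which should let the eval job run on a machine without a GPU (for example, alongside a GPU training job).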

During training, you can check on progress using TensorBoard:

```bash
tensorboard --logdir="${RUN_DIR}"
```

The drums config supports training on TPUs. To do that, first create a GCP VM with a TPU attached:

```bash
ctpu up \
  --tf-version=1.15 \
  --machine-type=n1-standard-8 \
  --disk-size-gb=2048 \
  --tpu-size=v3-8 \
  --zone=us-central1-a
```

Then start training (here, ${CLUSTER} is the name of the TPU created above and ${BUCKET} is your GCS bucket):

```bash
onsets_frames_transcription_train \
  --examples_path=gs://${BUCKET}/datagen/train_x2.tfrecord* \
  --model_dir=gs://${BUCKET}/model \
  --config=drums \
  --use_tpu \
  --preprocess_examples=false \
  --hparams=batch_size=64 \
  --tpu_cluster=${CLUSTER}
```

### Inference

To get final performance metrics for the model, run the onsets_frames_transcription_infer script.

```bash
MODEL_DIR=<path where checkpoints should be loaded>
TEST_EXAMPLES=<path to eval tfrecord(s) generated during dataset creation>
OUTPUT_DIR=<path where output should be saved>

onsets_frames_transcription_infer \
  --model_dir="${MODEL_DIR}" \
  --examples_path="${TEST_EXAMPLES}" \
  --output_dir="${OUTPUT_DIR}"
```

You can check on the metrics resulting from inference using TensorBoard:

```bash
tensorboard --logdir="${OUTPUT_DIR}"
```