PyTorch version of Spotify's Basic Pitch, a lightweight audio-to-MIDI converter. The provided weights are converted from those in Spotify's repo using the conversion script in this repository. Hopefully this helps researchers who are more accustomed to PyTorch to reuse the pretrained model.
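As a rough illustration of what such a conversion involves (not the actual script), the bulk of the work is re-ordering TensorFlow tensor layouts into PyTorch's conventions and mapping layer names between the two model definitions; the helper below is hypothetical:

```python
# Hedged sketch: converting a TF Conv2D kernel into a PyTorch Conv2d weight.
# The real conversion script must also match layer names across frameworks.
import numpy as np
import torch

def tf_conv2d_kernel_to_torch(kernel: np.ndarray) -> torch.Tensor:
    # TF stores Conv2D kernels as (H, W, in_ch, out_ch);
    # PyTorch Conv2d expects (out_ch, in_ch, H, W).
    return torch.from_numpy(kernel).permute(3, 2, 0, 1).contiguous()

# Example with a dummy 3x3 kernel, 1 input channel, 8 output channels:
w_tf = np.random.randn(3, 3, 1, 8).astype(np.float32)
w_pt = tf_conv2d_kernel_to_torch(w_tf)
assert w_pt.shape == (8, 1, 3, 3)
```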
For transcribing audio to MIDI, similar to Basic Pitch:

```python
from basic_pitch_torch.inference import predict

audio_path = 'path/to/audio.wav'  # any audio file readable by the inference pipeline
model_output, midi_data, note_events = predict(audio_path)
```
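Assuming the returned `midi_data` mirrors Basic Pitch's API (a `pretty_midi.PrettyMIDI` object; worth verifying against the repo), it can be written straight to disk:

```python
# Assumes midi_data is a pretty_midi.PrettyMIDI object, as in Spotify's Basic Pitch.
midi_data.write('transcription.mid')
```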
For loading the `nn.Module`:

```python
import torch

from basic_pitch_torch.model import BasicPitchTorch

pt_model = BasicPitchTorch()
pt_model.load_state_dict(torch.load('assets/basic_pitch_pytorch_icassp_2022.pth'))
pt_model.eval()

with torch.no_grad():
    output_pt = pt_model(y_torch)
    contour_pt, note_pt, onset_pt = output_pt['contour'], output_pt['note'], output_pt['onset']
```
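A minimal sketch of preparing `y_torch`, assuming the model expects mono audio at Basic Pitch's 22050 Hz sample rate (check the repo's constants before relying on this):

```python
# Hedged sketch: load audio as a mono float tensor for the snippet above.
import librosa
import torch

y, _ = librosa.load('path/to/audio.wav', sr=22050, mono=True)  # 22050 Hz per Basic Pitch
y_torch = torch.from_numpy(y)  # shape (num_samples,); add a batch dim if the model requires one
```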
In `tests/`, we show two levels of validation tests using a test audio from GuitarSet:

- **On model output**: Most of the discrepancies originate from float division (e.g. `normalized_log`) and from error propagation further down the network. The differences should be small enough to be safely ignored during MIDI note creation (a sketch of such a comparison appears after this list):

  ```
  Contour abs diff - max: 0.0003006, min: 0.0, avg: 5.863e-06
  Onset abs diff   - max: 0.0002712, min: 0.0, avg: 1.431e-05
  Note abs diff    - max: 0.0002297, min: 0.0, avg: 6.6e-06
  ```
- **On MIDI transcription**: The MIDI transcribed with the TF and PT models is identical (see `midi_data_pt.mid` and `midi_data_tf.mid`).
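A minimal sketch of how such an element-wise comparison could be computed (the helper below is illustrative, not the actual test code):

```python
# Hedged sketch: summarize element-wise discrepancies between TF and PT outputs.
import numpy as np

def abs_diff_stats(name: str, out_tf: np.ndarray, out_pt: np.ndarray) -> None:
    diff = np.abs(out_tf - out_pt)
    print(f"{name} abs diff - max: {diff.max():.4g}, min: {diff.min():.4g}, avg: {diff.mean():.4g}")

# e.g. abs_diff_stats('Contour', contour_tf, contour_pt.numpy())
```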
Bittner, Rachel M., et al. "A lightweight instrument-agnostic model for polyphonic note transcription and multipitch estimation." ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022.