melodic-dictation
This project is a MATLAB script for automated melodic dictation.
How it works
First, the spectrogram of the music track is computed with a Short-Time Fourier Transform (STFT, or TFCT in French).
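As an illustration of this step, here is a minimal magnitude-spectrogram sketch. It is written in plain Python rather than taken from the project's MATLAB code, and the function name `stft_magnitude`, the Hann window, the frame length, and the hop size are all assumptions made for the example. The result is laid out like V: one row per frequency bin, one column per time frame.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a magnitude spectrogram V with V[bin][frame] (hypothetical helper)."""
    # Periodic Hann window to reduce spectral leakage between bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep only non-negative frequencies
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT of the windowed frame (slow, but dependency-free).
        spectrum = [
            abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(n_bins)
        ]
        frames.append(spectrum)
    # Transpose so that rows are frequency bins and columns are time frames.
    return [list(row) for row in zip(*frames)]


# A 1000 Hz sine sampled at 8000 Hz falls exactly on bin 1000 * 64 / 8000 = 8.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
V = stft_magnitude(tone)
peak_bin = max(range(len(V)), key=lambda k: V[k][0])
```

A real implementation would use an FFT instead of the naive DFT; the loop above is only meant to make the definition explicit.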
Then, the idea is to factor the spectrogram matrix V into the product of two matrices, V = W*H, where:
- W is a precomputed matrix containing the spectrograms of all the notes in the chromatic scale
- H is a matrix describing which notes are played at each time frame, a.k.a. the music score.
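To make the role of W concrete, the sketch below builds an idealized template matrix with one column per note. This is not how the project necessarily builds W: the harmonic model (four harmonics with 1/h amplitudes), the sample rate, the frame length, and the names `midi_to_hz` and `build_templates` are assumptions for illustration. Only the equal-temperament formula (A4 = MIDI note 69 = 440 Hz) is standard.

```python
def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def build_templates(notes, sample_rate=8000, frame_len=1024, n_harmonics=4):
    """Return W with W[bin][note_index]: one idealized spectrum per note."""
    n_bins = frame_len // 2 + 1
    W = [[0.0] * len(notes) for _ in range(n_bins)]
    for j, note in enumerate(notes):
        f0 = midi_to_hz(note)
        for h in range(1, n_harmonics + 1):
            # Nearest frequency bin of the h-th harmonic.
            k = round(h * f0 * frame_len / sample_rate)
            if k < n_bins:
                W[k][j] += 1.0 / h  # assumed 1/h amplitude decay
        # Normalize each column so that templates are comparable.
        total = sum(W[k][j] for k in range(n_bins))
        if total > 0:
            for k in range(n_bins):
                W[k][j] /= total
    return W


notes = list(range(60, 72))  # the twelve notes from C4 to B4
W = build_templates(notes)
```

With these templates, column j of W describes what note j looks like in the spectrum, and row j of H says how strongly that note sounds in each frame.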
W is easy to generate, so the goal is to find H using Non-negative Matrix Factorization (NMF). An algorithm based on the Kullback-Leibler divergence is described in a paper by Lee and Seung.
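The sketch below shows one way to estimate H with W held fixed, using the multiplicative update that Lee and Seung give for the Kullback-Leibler divergence: each entry of H is multiplied by W^T (V ./ (W*H)) and divided by the column sums of W. It is plain Python for illustration, not the project's MATLAB code; the function name `kl_nmf_activations`, the iteration count, and the small `eps` guard against division by zero are assumptions.

```python
def kl_nmf_activations(V, W, n_iter=200, eps=1e-12):
    """Estimate H >= 0 such that V is approximately W*H, with W fixed."""
    n_bins, n_frames = len(V), len(V[0])
    n_notes = len(W[0])
    # Positive initialization: multiplicative updates cannot leave zero.
    H = [[1.0] * n_frames for _ in range(n_notes)]
    col_sums = [sum(W[i][k] for i in range(n_bins)) + eps
                for k in range(n_notes)]
    for _ in range(n_iter):
        # Current reconstruction W*H (eps avoids division by zero).
        WH = [[sum(W[i][k] * H[k][t] for k in range(n_notes)) + eps
               for t in range(n_frames)] for i in range(n_bins)]
        ratio = [[V[i][t] / WH[i][t] for t in range(n_frames)]
                 for i in range(n_bins)]
        # H <- H .* (W^T * ratio) ./ (column sums of W)
        H = [[H[k][t] * sum(W[i][k] * ratio[i][t] for i in range(n_bins))
              / col_sums[k] for t in range(n_frames)]
             for k in range(n_notes)]
    return H


# Toy example: two "notes", three frequency bins, two time frames.
W_toy = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
H_true = [[2.0, 0.0], [0.0, 3.0]]
V_toy = [[sum(W_toy[i][k] * H_true[k][t] for k in range(2)) for t in range(2)]
         for i in range(3)]
H_est = kl_nmf_activations(V_toy, W_toy)
```

In the toy example V is built exactly from known activations, so the estimate should recover them: the first note sounds only in the first frame and the second note only in the second.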
Applying NMF to the previous spectrogram gives the following H matrix:
Some signal processing will help to determine the start and end of each note:
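The exact processing used by the project is not described here, so the following is only one plausible approach: threshold each row of H and keep the runs that stay above the threshold for long enough. The function name `detect_notes`, the threshold value, and the minimum duration are assumptions for the example.

```python
def detect_notes(activation, threshold, min_frames=2):
    """Return (start_frame, end_frame) pairs, end exclusive, for one row of H."""
    segments = []
    start = None
    for t, value in enumerate(activation):
        if value >= threshold and start is None:
            start = t  # note onset
        elif value < threshold and start is not None:
            # Note offset: keep the segment only if it lasted long enough.
            if t - start >= min_frames:
                segments.append((start, t))
            start = None
    # Close a note that is still sounding at the end of the track.
    if start is not None and len(activation) - start >= min_frames:
        segments.append((start, len(activation)))
    return segments


row = [0.0, 0.1, 0.9, 1.0, 0.8, 0.1, 0.0, 0.7, 0.0, 0.9, 0.9]
segments = detect_notes(row, threshold=0.5)
```

Here the single-frame spike at frame 7 is discarded as too short, which leaves two notes: frames 2 to 4 and frames 9 to 10. Frame indices convert to seconds by multiplying by the STFT hop size and dividing by the sample rate.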
Finally, the result can be exported to a music score or a MIDI file.
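For the MIDI case, the sketch below serializes a list of detected notes as a single-track (format 0) Standard MIDI File using only the Python standard library. It is an illustration rather than the project's exporter: the names `vlq` and `notes_to_midi`, the fixed velocity of 64, and the resolution of 480 ticks per quarter note are assumptions.

```python
import struct


def vlq(value):
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # continuation bit on earlier bytes
        value >>= 7
    return bytes(reversed(out))


def notes_to_midi(notes, ticks_per_beat=480):
    """notes: list of (midi_note, start_tick, end_tick). Returns file bytes."""
    events = []
    for note, start, end in notes:
        events.append((start, 1, 0x90, note, 64))  # note on, channel 0
        events.append((end, 0, 0x80, note, 0))     # note off, channel 0
    # Sort by tick; at equal ticks, note-offs (0) come before note-ons (1).
    events.sort()
    track = bytearray()
    now = 0
    for tick, _, status, note, velocity in events:
        track += vlq(tick - now) + bytes([status, note, velocity])
        now = tick
    track += b"\x00\xff\x2f\x00"  # end-of-track meta event
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_beat)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)


# Two quarter notes in a row: C4 then E4.
data = notes_to_midi([(60, 0, 480), (64, 480, 960)])
```

Writing `data` to a file opened in binary mode produces a playable `.mid` file. No tempo event is written, so players fall back to the MIDI default of 120 beats per minute.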
A few examples
Original | Reconstruction |
---|---|
tetris.mid | tetris_out.mid |
zelda.mid | zelda_out.mid |
mario.mid | mario_out.mid |