Implement melody encoder and support glide input #143
Merged
Implementation of #142.
Experiment results
To test the effect of the melody encoder, we trained pitch predictors on three datasets, each containing a single singer:
The maximum RPAs (raw pitch accuracy, with a tolerance of 50 cents) reached after convergence (>150k steps) are compared below.
The results show that the melody encoder carries music score information better than base pitch does, especially on expressive datasets. On TensorBoard, we also observed significant improvements on short slurs and long vibratos. In our internal tests, pitch predictors with the melody encoder also outperformed the old method on out-of-range notes, and they stay responsive to the score even when it lies far above the normal range (e.g. over C7 for a male singer). [Demo]
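For readers unfamiliar with the metric, here is a minimal sketch of how RPA with a 50-cent tolerance can be computed. It is not the evaluation code from this PR: the function names, the use of Hz inputs, and the convention that a non-positive reference value marks an unvoiced frame are all assumptions made for illustration.

```python
import math


def cents_diff(f_pred_hz, f_ref_hz):
    """Signed distance between two frequencies in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f_pred_hz / f_ref_hz)


def raw_pitch_accuracy(pred_hz, ref_hz, tolerance_cents=50.0):
    """Fraction of reference-voiced frames whose prediction is within the tolerance.

    Assumption: a reference value <= 0 marks an unvoiced frame, which is skipped.
    """
    if len(pred_hz) != len(ref_hz):
        raise ValueError("pred_hz and ref_hz must have the same number of frames")
    voiced = 0
    correct = 0
    for p, r in zip(pred_hz, ref_hz):
        if r <= 0:
            continue  # unvoiced in the reference: not counted
        voiced += 1
        # A non-positive prediction on a voiced frame counts as a miss.
        if p > 0 and abs(cents_diff(p, r)) <= tolerance_cents:
            correct += 1
    return correct / voiced if voiced else 0.0


# Hypothetical frames: exact match, one semitone off, unvoiced, slightly sharp.
ref = [440.0, 440.0, 0.0, 220.0]
pred = [440.0, 466.16, 100.0, 223.0]
rpa = raw_pitch_accuracy(pred, ref)
```

In this toy input, two of the three voiced frames fall within 50 cents of the reference, so `rpa` is 2/3.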
Additional experiments on ornaments: the glides
Because the melody encoder models the note sequence, we were able to introduce ornament flags into the architecture of the variance model. In this round we tested glides, where the pitch rises smoothly at the beginning of a note or drops at the end of it. We labeled 31 notes that glide up and 75 notes that glide down in 71 minutes of data from Female#1, and left everything else unchanged. The experiment showed a slightly higher RPA with the glide type embedding than with the baseline. In further tests, the melody encoder with glide type embedding produced accurate and natural glides from simple glide flags, without the manually drawn pitch curves that were needed before. [Demo]
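To illustrate the idea of a glide type embedding, the sketch below maps per-note glide flags to integer IDs and adds a looked-up vector to each note's features. This is a pure-Python illustration, not the model code from this PR: the flag names, the ID assignment, the embedding size, and the choice to add the embedding to the note features are all hypothetical.

```python
import random

# Hypothetical flag vocabulary; the actual labels and IDs may differ.
GLIDE_TYPES = {"none": 0, "up": 1, "down": 2}


def glide_ids(flags):
    """Convert per-note glide flags into integer IDs for an embedding lookup."""
    return [GLIDE_TYPES[f] for f in flags]


def make_embedding_table(num_types, dim, seed=0):
    """Create a small randomly initialised lookup table (one row per glide type)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_types)]


def add_glide_embedding(note_features, flags, table):
    """Add the glide embedding of each note to that note's feature vector."""
    if len(note_features) != len(flags):
        raise ValueError("need exactly one glide flag per note")
    ids = glide_ids(flags)
    return [
        [x + e for x, e in zip(feat, table[i])]
        for feat, i in zip(note_features, ids)
    ]


# Three notes with 4-dimensional features: a plain note, a glide up, a glide down.
table = make_embedding_table(num_types=len(GLIDE_TYPES), dim=4)
features = [[0.0] * 4 for _ in range(3)]
conditioned = add_glide_embedding(features, ["none", "up", "down"], table)
```

With a scheme like this, the user only marks a note as gliding up or down, and the model is left to decide the shape of the resulting pitch curve.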