In [1]:
import sys
import subprocess
import torch
sys.path.append('../midi-generator')

%load_ext autoreload
%autoreload 2

In [2]:
import IPython.display as ipd

from model.dataset import MidiDataset

from utils.load_model import load_model
from utils.generate_midi import generate_midi
from utils.seed import set_seed
from utils.write_notes import write_notes

To listen to MIDI files from a Jupyter notebook, let's define a helper function that converts a `*.mid` file to a `*.wav` file. It calls the `timidity` command-line tool, so that tool must be installed.

In [3]:
def mid2wav(mid_path, wav_path):
    """Render the MIDI file at `mid_path` to a WAV file at `wav_path` using timidity."""
    subprocess.check_output(['timidity', mid_path, '-OwS', '-o', wav_path])

The next step is to load the model from the checkpoint. To make the experiments reproducible, let's also set a random seed.

You can also try the model that was trained with label smoothing (see `../results/smoothing.ch`).

In [4]:
seed = 1234
set_seed(seed)

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model, vocab = load_model(checkpoint_path='../results/test.ch', device=device)

# MIDI file generation

Let's generate a new file. Note that the `seq_len` parameter specifies the length of the generated sequence of notes.

The `generate_midi` function returns the sequence of generated notes and the offsets between them.

## Nucleus (`top-p`) Sampling

Sample from the smallest set of most probable tokens whose cumulative probability reaches `top-p`. If `top_p == 0`, the single most probable token is always chosen.
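
To make the idea concrete, here is a minimal pure-Python sketch of nucleus filtering. It is not the code used inside `generate_midi`; the function names `nucleus_filter` and `sample_nucleus` are hypothetical and only illustrate the technique on a small probability list.

```python
import random


def nucleus_filter(probs, top_p):
    """Return {token_index: renormalized probability} for the nucleus.

    Hypothetical helper for illustration; not part of the midi-generator repo.
    """
    # Sort token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        # Stop as soon as the kept mass reaches top_p. At least one token
        # is always kept, so top_p == 0 reduces to picking the top token.
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def sample_nucleus(probs, top_p, rng=random):
    """Draw one token index from the nucleus of `probs`."""
    nucleus = nucleus_filter(probs, top_p)
    tokens = list(nucleus)
    weights = [nucleus[i] for i in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

For example, with `probs = [0.5, 0.3, 0.15, 0.05]` and `top_p = 0.7`, only the first two tokens survive the filter, and sampling happens between them with renormalized weights.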

## Temperature

As `temperature` → 0, sampling approaches greedy decoding; as `temperature` → ∞, it approaches uniform sampling over the vocabulary.
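
The two limits can be checked with a small sketch. It assumes the common formulation in which logits are divided by the temperature before the softmax; whether `generate_midi` does exactly this is an assumption, and `softmax_with_temperature` is a hypothetical name used only here.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over `logits` after dividing them by `temperature`.

    Illustrative helper; assumes the usual logits / temperature scaling.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
cold = softmax_with_temperature(logits, 0.01)   # nearly all mass on the top token
hot = softmax_with_temperature(logits, 1000.0)  # close to uniform
```

With a very small temperature almost all of the probability mass sits on the highest logit, while a very large temperature flattens the distribution toward `1 / len(logits)` per token.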

In [5]:
note_seq, offset_seq = generate_midi(model, vocab, seq_len=1024, top_p=0, temperature=1, device=device)
note_seq = vocab.decode(note_seq)

Let's listen to the resulting MIDI.

In [None]:
# midi with constant offsets
notes = MidiDataset.decode_notes(note_seq, offset_seq=None)

mid_path = '../results/output_without_offsets.mid'
wav_path = '../results/output_without_offsets.wav'

write_notes(mid_path, notes)
mid2wav(mid_path, wav_path)

ipd.Audio(wav_path)

In [None]:
# midi with generated offsets
notes = MidiDataset.decode_notes(note_seq, offset_seq=offset_seq)

mid_path = '../results/output_with_offsets.mid'
wav_path = '../results/output_with_offsets.wav'

write_notes(mid_path, notes)
mid2wav(mid_path, wav_path)

ipd.Audio(wav_path)

The result with constant offsets sounds better, doesn't it? :)

Feel free to try different generation parameters (`top_p` and `temperature`) to hear how they affect the resulting sound.

You can also train your own model with different hyperparameters (e.g. a different hidden size) or with label smoothing enabled during training.