## MIDI Generator

In [1]:
## Uncomment command below to kill current job:
#!neuro kill $(hostname)

In [None]:
import random
import sys
import subprocess
import torch
sys.path.append('../midi-generator')

%load_ext autoreload
%autoreload 2

In [None]:
import IPython.display as ipd

from model.dataset import MidiDataset

from utils.load_model import load_model
from utils.generate_midi import generate_midi
from utils.seed import set_seed
from utils.write_notes import write_notes

Each `*.mid` file can be thought of as a sequence in which notes and chords follow each other with specified time offsets between them. Under this view, the next note can be predicted with a `seq2seq` model. In this work, a simple `GRU`-based model is used.

Note that the number of notes and chords in the vocabulary is not fixed: it depends on the dataset the model was trained on.
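To make this dataset dependence concrete, here is a minimal sketch of how a vocabulary could be built from the tokens seen in the training data. `ToyVocab` and the token strings are hypothetical illustrations, not the `vocab` object returned by `load_model`; only the `encode`/`decode`/`len` interface mirrors how `vocab` is used in this notebook.

```python
class ToyVocab:
    """Hypothetical stand-in for the notebook's vocab (illustration only)."""

    def __init__(self, sequences):
        # Collect every distinct note/chord token seen in the training data.
        tokens = sorted({tok for seq in sequences for tok in seq})
        self.idx2tok = tokens
        self.tok2idx = {tok: i for i, tok in enumerate(tokens)}

    def __len__(self):
        return len(self.idx2tok)

    def encode(self, tokens):
        # Map tokens to integer indices.
        return [self.tok2idx[t] for t in tokens]

    def decode(self, indices):
        # Map integer indices back to tokens.
        return [self.idx2tok[i] for i in indices]


# Two toy datasets: single notes, plus a chord written as a joined string.
train_a = [["C4", "E4", "G4"], ["C4", "C4.E4.G4"]]
train_b = [["A3", "C4"]]

vocab_a = ToyVocab(train_a)
vocab_b = ToyVocab(train_b)
print(len(vocab_a), len(vocab_b))  # vocabulary size differs per dataset
```

A model trained on one dataset can therefore only emit the notes and chords that its own vocabulary contains.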

To listen to MIDI files from a Jupyter notebook, let's define a helper function that converts a `*.mid` file to a `*.wav` file.

In [None]:
def mid2wav(mid_path, wav_path):
    subprocess.check_output(['timidity', mid_path, '-OwS', '-o', wav_path])
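The helper above assumes that the `timidity` command-line tool is installed. As a hedged sketch, a variant that fails early with a readable message when the converter is missing could look like this. The name `mid2wav_checked` and its `binary` parameter are hypothetical and are not part of this repository.

```python
import shutil
import subprocess


def mid2wav_checked(mid_path, wav_path, binary='timidity'):
    """Convert a MIDI file to WAV, failing early if the converter is missing.

    `binary` is a hypothetical parameter added for illustration.
    """
    if shutil.which(binary) is None:
        raise RuntimeError(f'{binary!r} was not found on PATH; install it first')
    # Same arguments as mid2wav above.
    subprocess.run([binary, mid_path, '-OwS', '-o', wav_path],
                   check=True, capture_output=True)
```

With `check=True`, a failed conversion raises `subprocess.CalledProcessError` instead of silently producing no audio.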

The next step is to load the model from a checkpoint. To make the experiments reproducible, let's also set a random seed.

You can also try the model that was trained with label smoothing (see `../results/smoothing.ch`).

In [None]:
seed = 1234
set_seed(seed)

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(device)
model, vocab = load_model(checkpoint_path='../results/test.ch', device=device)
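For readers curious about what a seeding helper usually does, here is a minimal sketch. It is an assumption about typical behavior, not the actual code of `utils.seed.set_seed`; the name `set_seed_sketch` is hypothetical. It seeds Python's `random` module and, if they are installed, NumPy and PyTorch.

```python
import random


def set_seed_sketch(seed):
    """Hypothetical seed helper; the real utils.seed.set_seed may differ."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed: nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed: nothing to seed


# Re-seeding with the same value reproduces the same random draws.
set_seed_sketch(1234)
first = [random.random() for _ in range(3)]
set_seed_sketch(1234)
second = [random.random() for _ in range(3)]
print(first == second)
```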

Let's also define a helper function to avoid code duplication.

In [None]:
def dump_result(file_prefix, vocab, note_seq, offset_seq=None):
    """Write notes to `<file_prefix>.mid`, render `<file_prefix>.wav`, return the wav path."""
    note_seq = vocab.decode(note_seq)
    notes = MidiDataset.decode_notes(note_seq, offset_seq=offset_seq)

    mid_path = file_prefix + '.mid'
    wav_path = file_prefix + '.wav'

    write_notes(mid_path, notes)
    mid2wav(mid_path, wav_path)

    return wav_path

# MIDI file generation

Let's generate a new file. Note that the parameter `seq_len` specifies the length of the output sequence of notes.

The function `generate_midi` returns the sequence of generated notes and the offsets between them.

## Nucleus (`top-p`) Sampling

Sample only from the smallest set of most probable tokens whose cumulative probability reaches `top-p`. If `top-p == 0`, the single most probable token is always chosen.
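The idea can be illustrated with a short, library-free sketch. This is a generic version of nucleus sampling written for illustration; whether `generate_midi` implements it in exactly this way is an assumption, and the function names `top_p_filter` and `sample_top_p` are hypothetical.

```python
import random


def top_p_filter(probs, top_p):
    """Keep the most probable tokens until their cumulative probability
    reaches top_p; return {index: renormalized probability}."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in ranked:
        kept.append(i)
        total += probs[i]
        if total >= top_p:
            break
    return {i: probs[i] / total for i in kept}


def sample_top_p(probs, top_p, rng=random):
    """Draw one token index from the nucleus."""
    nucleus = top_p_filter(probs, top_p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


probs = [0.5, 0.3, 0.15, 0.05]
print(sorted(top_p_filter(probs, 0.7)))  # indices of the tokens kept in the nucleus
print(sample_top_p(probs, 0.0))          # top_p == 0 keeps only the most probable token
```

Because the first token is always kept before the threshold is checked, `top_p == 0` reduces to greedy decoding, which matches the behavior described above.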

## Temperature

As `temperature` → 0 this approaches greedy decoding, while `temperature` → ∞ asymptotically approaches uniform sampling from the vocabulary.
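A small numeric sketch shows both limits. It assumes the usual formulation in which logits are divided by the temperature before the softmax; the function name `softmax_with_temperature` and the example logits are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
for t in (0.1, 1.0, 100.0):
    # Low t concentrates mass on the top token; high t flattens the distribution.
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```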

In [None]:
note_seq, offset_seq = generate_midi(model, vocab, seq_len=128, top_p=0, temperature=1, device=device)

Let's listen to the resulting MIDI.

In [None]:
# midi with constant offsets
ipd.Audio(dump_result('../results/output_without_offsets', vocab, note_seq, offset_seq=None))

In [None]:
# midi with generated offsets
ipd.Audio(dump_result('../results/output_with_offsets', vocab, note_seq, offset_seq))

The result with constant offsets sounds better, doesn't it? :)

Feel free to try different generation parameters (`top-p` and `temperature`) to understand their impact on the resulting sound.

You can also train your own model with different specs (e.g. different hidden size) or use label smoothing during training.

# Continue existing file

## Continue sampled notes
To begin, let's continue a sequence that consists of notes sampled at random from `vocab`.

In [None]:
seed = 4321
set_seed(seed)

history_notes = random.choices(range(len(vocab)), k=20)
history_offsets = len(history_notes) * [0.5]

In [None]:
ipd.Audio(dump_result('../results/random_history', vocab, history_notes, history_offsets))

It sounds a little bit chaotic. Let's try to continue this with our model.

In [None]:
history = [*zip(history_notes, history_offsets)]
note_seq, offset_seq = generate_midi(model, vocab, seq_len=128, top_p=0, temperature=1, device=device, 
                                     history=history)

In [None]:
# midi with constant offsets
ipd.Audio(dump_result('../results/random_without_offsets', vocab, note_seq, offset_seq=None))

After the sampled part ends, the generated melody starts to sound better.

## Continue an existing melody

In [None]:
raw_notes = MidiDataset.load_raw_notes('../data/mining.mid')

In [None]:
org_note_seq, org_offset_seq = MidiDataset.encode_notes(raw_notes)
org_note_seq = vocab.encode(org_note_seq)

Let's listen to it

In [None]:
ipd.Audio(dump_result('../results/original_sound', vocab, org_note_seq, org_offset_seq))

and take the first 20 elements of the sequence as our history sequence.

In [None]:
history_notes = org_note_seq[:20]
history_offsets = org_offset_seq[:20]

In [None]:
history = [*zip(history_notes, history_offsets)]
note_seq, offset_seq = generate_midi(model, vocab, seq_len=128, top_p=0, temperature=1, device=device, 
                                     history=history)

In [None]:
# result melody without generated offsets
ipd.Audio(dump_result('../results/continue_rand_without_offsets', vocab, note_seq, offset_seq=None))

In [None]:
# result melody with generated offsets
ipd.Audio(dump_result('../results/continue_rand_with_offsets', vocab, note_seq, offset_seq))

You can try to overfit your model on a single melody to get better results. Alternatively, you can use the already pretrained model (`../results/onemelody.ch`).

# Model overfitted on one melody

Let's repeat the previous experiment and continue a melody, but this time with the model that was overfitted on this melody.

In [None]:
seed = 1234
set_seed(seed)

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model, vocab = load_model(checkpoint_path='../results/onemelody.ch', device=device)

In [None]:
raw_notes = MidiDataset.load_raw_notes('../data/Final_Fantasy_Matouyas_Cave_Piano.mid')
org_note_seq, org_offset_seq = MidiDataset.encode_notes(raw_notes)
org_note_seq = vocab.encode(org_note_seq)

Let's listen to it.

In [None]:
ipd.Audio(dump_result('../results/onemelody_original_sound', vocab, org_note_seq, org_offset_seq))

In [None]:
end = 60
history_notes = org_note_seq[:end]
history_offsets = org_offset_seq[:end]

Listen to the history part of the loaded melody.

In [None]:
ipd.Audio(dump_result('../results/onemelody_history', vocab, history_notes, history_offsets))

Now we can try to continue the original melody with our model. But first, you can listen to the original tail of the melody to refresh your memory and have a reference to compare with.

In [None]:
tail_notes = org_note_seq[end:]
tail_offsets = org_offset_seq[end:]
ipd.Audio(dump_result('../results/onemelody_tail', vocab, tail_notes, tail_offsets))

In [None]:
history = [*zip(history_notes, history_offsets)]
note_seq, offset_seq = generate_midi(model, vocab, seq_len=128, top_p=0, temperature=1, device=device, 
                                     history=history)

# delete history part
note_seq = note_seq[end:]
offset_seq = offset_seq[end:]

In [None]:
# result melody without generated offsets
ipd.Audio(dump_result('../results/continue_onemelody_without_offsets', vocab, note_seq, offset_seq=None))

In [None]:
# result melody with generated offsets
ipd.Audio(dump_result('../results/continue_onemelody_with_offsets', vocab, note_seq, offset_seq))

As you can hear, this time the model generated better offsets and the resulting melody does not sound so chaotic.