This notebook collects observations on why the `midi` dataset is not training well.

In [3]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
/notebooks/MusicTransformer-tensorflow2.0


In [4]:
%cd /notebooks/MusicTransformer-tensorflow2.0

/notebooks/MusicTransformer-tensorflow2.0


# 1. Comparing Datasets

After working through some local GPU issues, I trained on `classic_piano` and on `midi` so the two could be compared, and graphed the metrics in TensorBoard.

In [None]:
from train import train
train(
    '/data/classic_piano_preprocessed',
    './save',
    log_dir='/tmp/logs/classic_piano_preprocessed')

In [None]:
from train import train
train(
    '../out/transformer-preprocess',
    './save-transformer-preprocess',
    log_dir='/tmp/logs/transformer-preprocess'
)

Below is a graph of the accuracy, with `classic_piano` through ~65 epochs and `midi` through ~20 epochs. `classic_piano` is clearly on the upswing, and looks [comparable in slope to the results in the original repo](https://github.com/jason9693/MusicTransformer-tensorflow2.0#result).

`midi` is going nowhere fast (which is why I stopped it early).

![Accuracy](./accuracy.png)

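Here's a graph of the loss. The loss for `midi` stays at 0, which is a red flag in itself: a model that is actually learning should start with a non-zero loss.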

![Loss](./loss.png)

The most likely culprit is that there's something wrong with the underlying data. So let's look at that.
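
One way a loss can sit at exactly 0 is sketched below. It assumes, without having checked the training code, that the loss skips padding positions. The function `masked_nll` and the tiny four-token vocabulary are made up for illustration and are not taken from the repo.

```python
import math


def masked_nll(step_probs, targets, pad_id):
    """Sum -log p(target) over non-padding steps, divided by the number of steps.

    Hypothetical stand-in for a padding-aware training loss.
    """
    total = 0.0
    for probs, target in zip(step_probs, targets):
        if target == pad_id:
            continue  # padded positions contribute nothing to the loss
        total += -math.log(probs[target])
    return total / len(targets)


# A model that predicts uniformly over a 4-token vocabulary.
uniform = [0.25, 0.25, 0.25, 0.25]

# Real targets give a positive loss; all-padding targets give exactly zero.
real = masked_nll([uniform] * 4, [1, 2, 3, 1], pad_id=0)
all_pad = masked_nll([uniform] * 4, [0, 0, 0, 0], pad_id=0)
print(real, all_pad)
```

If the real loss behaves like this, a flat zero would point at the targets rather than at the model.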

# 2. Examining the Data

Let's log out the tensor information we're getting from each dataset.

In [68]:
from data import Data
import tensorflow as tf

batch_size = 2
max_seq = 1024

input_paths = [
    '/data/classic_piano_preprocessed',
    '../out/transformer-preprocess',    
]

for input_path in input_paths:
    print('\n\nInput path: {}'.format(input_path))
    dataset = Data(input_path)
    batch_x, batch_y = dataset.slide_seq2seq_batch(batch_size, max_seq)
    print('batch_x')
    print(batch_x.shape)
    print(batch_x[0:5])
    print('min', tf.math.reduce_min(batch_x))
    print('max', tf.math.reduce_max(batch_x))
    print('mean', tf.math.reduce_mean(batch_x))
    print('batch_y')
    print(batch_y.shape)
    print(batch_y)
    print('min', tf.math.reduce_min(batch_y))
    print('max', tf.math.reduce_max(batch_y))
    print('mean', tf.math.reduce_mean(batch_y))



Input path: /data/classic_piano_preprocessed
batch_x
(2, 1024)
[[367  51 256 ... 304 163 361]
 [369  79 281 ...  75 366  60]]
min tf.Tensor(27, shape=(), dtype=int64)
max tf.Tensor(380, shape=(), dtype=int64)
mean tf.Tensor(218, shape=(), dtype=int64)
batch_y
(2, 1024)
[[ 51 256 168 ... 163 361  40]
 [ 79 281 179 ... 366  60 308]]
min tf.Tensor(27, shape=(), dtype=int64)
max tf.Tensor(380, shape=(), dtype=int64)
mean tf.Tensor(218, shape=(), dtype=int64)


Input path: ../out/transformer-preprocess
batch_x
(2, 1024)
[[390. 388. 388. ... 388. 388. 388.]
 [390. 388. 388. ... 388. 388. 388.]]
min tf.Tensor(388.0, shape=(), dtype=float64)
max tf.Tensor(390.0, shape=(), dtype=float64)
mean tf.Tensor(388.001953125, shape=(), dtype=float64)
batch_y
(2, 1024)
[[388. 388. 388. ... 388. 388. 388.]
 [388. 388. 388. ... 388. 388. 388.]]
min tf.Tensor(388.0, shape=(), dtype=float64)
max tf.Tensor(388.0, shape=(), dtype=float64)
mean tf.Tensor(388.0, shape=(), dtype=float64)


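The shapes of each tensor batch are identical, but the contents are very different. `classic_piano` covers a wide range of values (27 to 380), while every value in `midi` falls in a narrow band (388 to 390) that sits above anything seen in `classic_piano`. The `batch_x` mean of 388.001953125 over 2 × 1024 values means only the two leading 390s differ from 388, and `batch_y` is 388 throughout. The `midi` tensors are also `float64` rather than `int64`.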

That seems pretty suspicious.
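
A quick way to put a number on "suspicious" is to count distinct tokens and the share taken by the most common one. The sketch below is a hypothetical helper (`summarize_tokens` is not part of the repo), run on a toy batch built to resemble the output above rather than on the real data.

```python
from collections import Counter


def summarize_tokens(batch):
    """Return (distinct_count, most_common_token, share_of_most_common) for a 2-D batch."""
    flat = [int(tok) for row in batch for tok in row]
    counts = Counter(flat)
    token, hits = counts.most_common(1)[0]
    return len(counts), token, hits / len(flat)


# Toy batch shaped like the suspicious one: 2 rows of 1024, all 388 after a leading 390.
toy = [[390] + [388] * 1023 for _ in range(2)]
distinct, token, share = summarize_tokens(toy)
print(distinct, token, round(share, 4))

# The printed mean pins down how much the batch deviates from 388 in total:
# (388.001953125 - 388) * 2048 entries.
excess = (388.001953125 - 388) * 2 * 1024
print(excess)
```

Because the helper only iterates over rows and casts each entry to `int`, it should also accept a NumPy array. A batch where one token accounts for nearly every entry is almost certainly not real music.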

# 3. Preprocessing MIDI step

Next let's see how the data is being preprocessed in the original repo.

In [None]:
# Download the classic_piano dataset

!sh /notebooks/MusicTransformer-tensorflow2.0/dataset/scripts/classic_piano_downloader.sh /data/classic_piano

In [13]:
# in pachctl-interaction
# get.py --repo midi --out ../musictransformer/dev/out
# !mv ../src/out ../out

In [14]:
import os

CLASSIC_PIANO_DATA = '/data/classic_piano'
MIDI = '../out/midi'

print('{} files in {}'.format(len(os.listdir(CLASSIC_PIANO_DATA)), 'classic_piano'))
print('{} files in {}'.format(len(os.listdir(MIDI)), 'MIDI'))

329 files in classic_piano
156 files in MIDI


As an aside, the sample dataset has 329 files while ours has 156. File counts alone don't say much, though, and I'm not sure how best to compare the lengths of the files. It may be worth measuring more definitively how our dataset compares to `classic_piano` or one of the other sample ones.
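
One possible measure is the number of encoded tokens per file, since that is what the model actually consumes. The sketch below uses a hypothetical helper, `length_stats`, and made-up sequences standing in for the per-file encoder output; it is not a measurement of either dataset.

```python
import statistics


def length_stats(sequences):
    """Summarise encoded sequences: file count, total tokens, median tokens per file."""
    lengths = [len(seq) for seq in sequences]
    return {
        'files': len(lengths),
        'total_tokens': sum(lengths),
        'median_tokens': statistics.median(lengths) if lengths else 0,
    }


# Made-up data: many short files versus fewer long ones.
dataset_a = [[1] * 500, [1] * 1500, [1] * 1000]
dataset_b = [[1] * 4000, [1] * 6000]
print(length_stats(dataset_a))
print(length_stats(dataset_b))
```

A dataset with fewer files can still hold more training material, which is why total tokens is reported alongside the file count.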

We can try running `preprocess` on the different datasets and see if anything jumps out as being fishy.

In [15]:
from preprocess import preprocess_midi_files_under_original

preprocess_midi_files_under_original(CLASSIC_PIANO_DATA, '/tmp/data/classic_piano_preprocessed')
preprocess_midi_files_under_original(MIDI, '/tmp/data/midi')

preprocess.py


MIDI Paths in /data/classic_piano: 100%|██████████| 329/329 [00:35<00:00,  9.37it/s]
MIDI Paths in ../out/midi: 100%|██████████| 156/156 [00:16<00:00,  9.34it/s]


Everything seems legit. Let's try pre-processing individual files:

In [16]:
from preprocess import preprocess_midi_original
import os

def printPreprocessedData(file):
    print(file)
    print(preprocess_midi_original(file)[0:20])

classic_piano_data = ['{folder}/{file}'.format(folder=CLASSIC_PIANO_DATA, file=file) for file in os.listdir(CLASSIC_PIANO_DATA)]
midi_data = [os.path.abspath('{folder}/{file}'.format(folder=MIDI, file=file)) for file in os.listdir(MIDI)]

In [17]:
printPreprocessedData(classic_piano_data[2])

/data/classic_piano/schuim-4_format0.mid
[366, 83, 366, 44, 265, 364, 87, 211, 264, 366, 83, 172, 264, 364, 80, 265, 208, 366, 80, 366]
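
To make token lists like the one above easier to read, here is a small decoder sketch. The range layout (128 note-on, 128 note-off, 100 time-shift and 32 velocity tokens, in that order) is an assumption that has not been confirmed against `midi_processor` in this notebook, and `describe` is a hypothetical helper.

```python
# Assumed token layout; treat these boundaries as unverified.
RANGES = [
    ('note_on', 0, 128),
    ('note_off', 128, 256),
    ('time_shift', 256, 356),
    ('velocity', 356, 388),
]


def describe(token):
    """Map a token id to (event_type, offset_within_range)."""
    for name, start, stop in RANGES:
        if start <= token < stop:
            return (name, token - start)
    # Anything at or above 388 is outside the assumed event ranges.
    return ('special', token)


# The first few tokens printed for schuim-4_format0.mid above.
tokens = [366, 83, 366, 44, 265, 364, 87, 211]
for tok in tokens:
    print(tok, describe(tok))
```

Under this assumed layout the sequence reads as velocity, note-on, time-shift and note-off events, and values of 388 and above fall outside every event range.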


In [18]:
for i in range(0, 5):
    printPreprocessedData(midi_data[i])

/notebooks/out/midi/liquidmind-10-thejoyofquietpt2.wav.mid
[]
/notebooks/out/midi/gas-03-nach1912.wav.mid
[]
/notebooks/out/midi/easternawakening-14-hillsofnepal.wav.mid
[]
/notebooks/out/midi/rudyadrian-06-thelegendofkristylynn.wav.mid
[]
/notebooks/out/midi/deuter-05-thesource.wav.mid
[]


Very interesting! So what jumps out here is that the `MIDI` files are being pre-processed as empty arrays. This would go a good way towards explaining why the incoming tensors in the training step look so fishy.
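
A failure like this is easy to miss because nothing raises an error. A small guard along the following lines would surface it; `find_empty_encodings` is a hypothetical helper, and the dictionary is a stand-in for a real encoder.

```python
def find_empty_encodings(paths, encode):
    """Return the paths whose encoded token list is empty."""
    return [path for path in paths if len(encode(path)) == 0]


# Stand-in encoder for illustration: a lookup table of made-up results.
fake_outputs = {'a.mid': [366, 83], 'b.mid': [], 'c.mid': []}
empty = find_empty_encodings(fake_outputs, fake_outputs.get)
print(empty)
```

In this notebook the equivalent call would be `find_empty_encodings(midi_data, preprocess_midi_original)`.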

Why are they failing to be preprocessed? Let's crack open `midi_processor` and see what `encode_midi` is doing. Below is a reproduction of some of the code from the original `midi_processor/processor.py` file:

In [19]:
import pretty_midi

midi_file_one = pretty_midi.PrettyMIDI(midi_file=midi_data[0])
piano_file_one = pretty_midi.PrettyMIDI(midi_file=classic_piano_data[0])

from midi_processor.processor import _control_preprocess, _note_preprocess, _divide_note, _make_time_sift_events, _snote2events

def _note_preprocess_jupyter(susteins, notes):
    note_stream = []

    for sustain in susteins:
        for note_idx, note in enumerate(notes):
            if note.start < sustain.start:
                note_stream.append(note)
            elif note.start > sustain.end:
                notes = notes[note_idx:]
                sustain.transposition_notes()
                break
            else:
                sustain.add_managed_note(note)

    for sustain in susteins:
        note_stream += sustain.managed_notes

    note_stream.sort(key= lambda x: x.start)
    return note_stream

def encode_midi_jupyter(file_path):
    events = []
    notes = []
    mid = pretty_midi.PrettyMIDI(midi_file=file_path)

    for inst in mid.instruments:
        inst_notes = inst.notes
        # ctrl.number is the number of sustain control. If you want to know about the number type of control,
        # see https://www.midi.org/specifications-old/item/table-3-control-change-messages-data-bytes-2
        ctrls = _control_preprocess([ctrl for ctrl in inst.control_changes if ctrl.number == 64])
        notes += _note_preprocess(ctrls, inst_notes)

    dnotes = _divide_note(notes)

    # print(dnotes)
    dnotes.sort(key=lambda x: x.time)
    # print('sorted:')
    # print(dnotes)
    cur_time = 0
    cur_vel = 0
    for snote in dnotes:
        events += _make_time_sift_events(prev_time=cur_time, post_time=snote.time)
        events += _snote2events(snote=snote, prev_vel=cur_vel)
        # events += _make_time_sift_events(prev_time=cur_time, post_time=snote.time)

        cur_time = snote.time
        cur_vel = snote.velocity

    return [e.to_int() for e in events]

def printEncodedMIDIJupyter(file):
    print(file)
    print(encode_midi_jupyter(file)[0:20])

In [20]:
printEncodedMIDIJupyter(classic_piano_data[0])

/data/classic_piano/scn15_7_format0.mid
[355, 335, 368, 60, 330, 188, 367, 41, 370, 65, 355, 305, 362, 53, 362, 60, 361, 57, 361, 48]


In [21]:
printEncodedMIDIJupyter(midi_data[0])

/notebooks/out/midi/liquidmind-10-thejoyofquietpt2.wav.mid
[]


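What appears to be happening is that the pre-processing step expects the MIDI files to contain sustain events (control changes with controller number 64). Notes are only collected inside the loop over those events, so a file with none yields an empty note stream. Our dataset has no sustain events.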
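
The failure mode can be reproduced in isolation. The sketch below is a deliberately simplified imitation of the control flow in `_note_preprocess`, not the repo's code: `Note`, `Sustain` and `notes_seen_by_sustain_loop` are hypothetical, and the real function does more work per sustain.

```python
from collections import namedtuple

Note = namedtuple('Note', 'start pitch')
Sustain = namedtuple('Sustain', 'start end')


def notes_seen_by_sustain_loop(sustains, notes):
    """Collect notes only from inside the loop over sustains, as the original does."""
    collected = []
    for sustain in sustains:
        for note in notes:
            if note.start <= sustain.end:
                collected.append(note)
    return collected


notes = [Note(start=1.0, pitch=60), Note(start=2.0, pitch=64)]

# With one sustain event covering the notes, they are collected.
with_sustain = notes_seen_by_sustain_loop([Sustain(start=0.0, end=10.0)], notes)

# With no sustain events the outer loop never runs, so nothing is collected.
without_sustain = notes_seen_by_sustain_loop([], notes)
print(len(with_sustain), len(without_sustain))
```

The notes themselves are fine in both cases; they are simply never visited when the list of sustain events is empty.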

Let's try returning the raw notes _if_ no sustain events are found:

In [22]:
import pretty_midi

midi_file_one = pretty_midi.PrettyMIDI(midi_file=midi_data[0])
piano_file_one = pretty_midi.PrettyMIDI(midi_file=classic_piano_data[0])

from midi_processor.processor import _control_preprocess, _note_preprocess, _divide_note, _make_time_sift_events, _snote2events

def _note_preprocess_jupyter(susteins, notes):
    note_stream = []

    for sustain in susteins:
        for note_idx, note in enumerate(notes):
            if note.start < sustain.start:
                note_stream.append(note)
            elif note.start > sustain.end:
                notes = notes[note_idx:]
                sustain.transposition_notes()
                break
            else:
                sustain.add_managed_note(note)

    for sustain in susteins:
        note_stream += sustain.managed_notes

    note_stream.sort(key= lambda x: x.start)
    return note_stream

def encode_midi_jupyter(file_path):
    events = []
    notes = []
    mid = pretty_midi.PrettyMIDI(midi_file=file_path)

    for inst in mid.instruments:
        inst_notes = inst.notes
        # ctrl.number is the number of sustain control. If you want to know about the number type of control,
        # see https://www.midi.org/specifications-old/item/table-3-control-change-messages-data-bytes-2
        ctrls_for_preprocessing = [ctrl for ctrl in inst.control_changes if ctrl.number == 64]
        if len(ctrls_for_preprocessing) > 0:
            ctrls = _control_preprocess(ctrls_for_preprocessing)
            notes += _note_preprocess(ctrls, inst_notes)
        else:
            # Extend rather than overwrite, so notes from earlier instruments are kept.
            notes += inst_notes

    dnotes = _divide_note(notes)

    dnotes.sort(key=lambda x: x.time)
    cur_time = 0
    cur_vel = 0
    for snote in dnotes:
        events += _make_time_sift_events(prev_time=cur_time, post_time=snote.time)
        events += _snote2events(snote=snote, prev_vel=cur_vel)

        cur_time = snote.time
        cur_vel = snote.velocity

    return [e.to_int() for e in events]

def printEncodedMIDIJupyter(file):
    print(file)
    print(encode_midi_jupyter(file)[0:20])

In [23]:
printEncodedMIDIJupyter(midi_data[0])

/notebooks/out/midi/liquidmind-10-thejoyofquietpt2.wav.mid
[261, 361, 55, 261, 362, 60, 268, 183, 362, 55, 361, 52, 268, 188, 362, 60, 258, 183, 363, 55]


Now we're getting data! Let's walk it back up the chain and try preprocessing our data with the fix that handles files with no sustain events:

In [24]:
from preprocess import preprocess_midi
import os

def printPreprocessedData(file):
    print(file)
    print(preprocess_midi(file)[0:2])    

classic_piano_data = ['{folder}/{file}'.format(folder=CLASSIC_PIANO_DATA, file=file) for file in os.listdir(CLASSIC_PIANO_DATA)]
midi_data = [os.path.abspath('{folder}/{file}'.format(folder=MIDI, file=file)) for file in os.listdir(MIDI)]

In [25]:
printPreprocessedData(classic_piano_data[2])

/data/classic_piano/schuim-4_format0.mid
[366, 83]


In [26]:
printEncodedMIDIJupyter(midi_data[0])

/notebooks/out/midi/liquidmind-10-thejoyofquietpt2.wav.mid
[261, 361, 55, 261, 362, 60, 268, 183, 362, 55, 361, 52, 268, 188, 362, 60, 258, 183, 363, 55]


Now a single file is being processed correctly. Let's try the whole dataset.

In [91]:
from preprocess import preprocess_midi_files_under

preprocess_midi_files_under(CLASSIC_PIANO_DATA, '/tmp/data/classic_piano_preprocessed')
preprocess_midi_files_under(MIDI, '/tmp/data/midi')

MIDI Paths: 100%|██████████| 329/329 [00:36<00:00,  9.04it/s]
MIDI Paths: 100%|██████████| 156/156 [00:26<00:00,  5.89it/s]


In [92]:
from data import Data
import tensorflow as tf

batch_size = 2
max_seq = 1024

input_paths = [
    '/tmp/data/classic_piano_preprocessed',
    '/tmp/data/midi',    
]

for input_path in input_paths:
    print('\n\nInput path: {}'.format(input_path))
    dataset = Data(input_path)
    batch_x, batch_y = dataset.slide_seq2seq_batch(batch_size, max_seq)
    print('batch_x')
    print(batch_x.shape)
    print(batch_x[0:5])
    print('min', tf.math.reduce_min(batch_x))
    print('max', tf.math.reduce_max(batch_x))
    print('mean', tf.math.reduce_mean(batch_x))
    print('batch_y')
    print(batch_y.shape)
    print(batch_y)
    print('min', tf.math.reduce_min(batch_y))
    print('max', tf.math.reduce_max(batch_y))
    print('mean', tf.math.reduce_mean(batch_y))



Input path: /tmp/data/classic_piano_preprocessed
batch_x
(2, 1024)
[[256 368  45 ... 371  73 213]
 [372  64 369 ... 257 178 264]]
min tf.Tensor(33, shape=(), dtype=int64)
max tf.Tensor(377, shape=(), dtype=int64)
mean tf.Tensor(221, shape=(), dtype=int64)
batch_y
(2, 1024)
[[368  45 256 ...  73 213 376]
 [ 64 369  60 ... 178 264 364]]
min tf.Tensor(33, shape=(), dtype=int64)
max tf.Tensor(377, shape=(), dtype=int64)
mean tf.Tensor(221, shape=(), dtype=int64)


Input path: /tmp/data/midi
batch_x
(2, 1024)
[[183 364  50 ...  57 268 190]
 [366  59 369 ... 369  57 371]]
min tf.Tensor(29, shape=(), dtype=int64)
max tf.Tensor(374, shape=(), dtype=int64)
mean tf.Tensor(216, shape=(), dtype=int64)
batch_y
(2, 1024)
[[364  50 364 ... 268 190 367]
 [ 59 369  64 ...  57 371  60]]
min tf.Tensor(29, shape=(), dtype=int64)
max tf.Tensor(374, shape=(), dtype=int64)
mean tf.Tensor(216, shape=(), dtype=int64)


This looks much better! Let's give training a shot.

In [None]:
from train import train
train(
    '/tmp/data/midi',
    './save/midi',
    log_dir='/tmp/logs/midi-preprocess'
)

These logs look much better; the `midi` training metrics now track much closer to those of the `classic_piano` dataset:

![New Accuracy](new-accuracy.png)

Loss:

![New Loss](new-loss.png)