## 6. Process MIDI data

This notebook processes the downloaded MIDI in several ways. You don't need to run it, since the processed data can be downloaded HERE.

In [1]:
import music21
from music21 import converter, instrument, note
from pathlib import Path
import os
from tqdm import tqdm
from fractions import Fraction
import numpy as np
from PIL import Image
import re

### Representations

In my project, I explored a few different ways of representing the complex MIDI data. The first step for each of these is to convert the MIDI file into a numpy matrix with dimensions `(n_timesteps, 88)`, where each timestep represents 1/12 of a quarter note (music21 measures offsets and durations in quarter notes, and `convert_compact_notes_to_array` uses `resolution=12`) and 88 is the number of keys on the piano (index 0 is A0, index 87 is C8). For a given coordinate `(i, j)`, the value is 1 if key `j` is held down at timestep `i` and 0 otherwise.
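
A minimal sketch of that conversion, assuming notes are given as hypothetical `(pitch_index, offset, duration)` tuples with offsets and durations in quarter notes (as music21 reports them) and 12 timesteps per quarter note. It mirrors what `convert_compact_notes_to_array` does below, but `toy_piano_roll` is an illustrative name and the notes are made up.

```python
from fractions import Fraction
import numpy as np

def toy_piano_roll(notes, resolution=12):
    """Illustrative helper: build an (n_timesteps, 88) 0/1 matrix.

    notes are (pitch_index, offset, duration) tuples; pitch_index 0 is A0 and 87 is C8,
    offset and duration are in quarter notes.
    """
    end = max(Fraction(offset) + Fraction(duration) for _, offset, duration in notes)
    roll = np.zeros((int(end * resolution), 88), dtype=np.uint8)
    for pitch, offset, duration in notes:
        start = int(Fraction(offset) * resolution)
        stop = int((Fraction(offset) + Fraction(duration)) * resolution)
        roll[start:stop, pitch] = 1
    return roll

# C4 (index 39) held for a half note; E4 (43) and G4 (46) enter on beat 2
notes = [(39, 0, 2), (43, 1, 1), (46, 1, 1)]
roll = toy_piano_roll(notes)
print(roll.shape)                               # (24, 88)
print(roll[:12].sum(axis=0)[[39, 43, 46]].tolist())  # [12, 0, 0]
```

Twelve steps per quarter note represents both sixteenth notes (3 steps) and eighth-note triplets (4 steps) exactly; shorter or irregular durations are truncated by `int()`.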

**Created in this notebook**
- Raw text; `corpus-txt/`
    - Remove duplicate timesteps (held chords)
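    - Convert each timestep to a string of "on" notes, e.g. `"C4,E4,G4,B4"` (note that `arr_to_txt`, as written below, encodes each timestep as an 88-character string of `0`s and `1`s rather than note names)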
    - Large vocabulary
- Pairs; `corpus-pairs-txt/`
    - Similar to raw text, except each chord is decomposed into the unique pairs of notes
    - For example, the chord above gets converted into `"C4,E4 C4,G4 C4,B4 E4,G4 E4,B4 G4,B4"`
    - The idea is that a chord is made up of the intervals within it
    - Much smaller vocabulary
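
The pair decomposition can be sketched with `itertools.combinations`, which yields pairs in the same order as `note_combinations` below; `chord_to_pairs` is an illustrative name. This also shows why the vocabulary shrinks: with 88 keys there are at most 88 × 87 / 2 = 3828 distinct pair tokens, whereas the number of possible chord strings grows exponentially with the number of keys.

```python
from itertools import combinations

def chord_to_pairs(chord_notes):
    """Illustrative helper: decompose a chord (note names, low to high) into its unique note pairs."""
    return ' '.join(','.join(pair) for pair in combinations(chord_notes, 2))

print(chord_to_pairs(['C4', 'E4', 'G4', 'B4']))
# C4,E4 C4,G4 C4,B4 E4,G4 E4,B4 G4,B4
```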

**Created in notebook `08-make-cleaned-chord-dataset.ipynb`**
- Chords; `chords-txt-cleaned/`
    - Timesteps are filtered out if the notes present are not "different enough" from the previous timestep
    - This is an attempt to reduce the number of timesteps to only include big chord changes instead of single notes being added or dropping out
    - To reduce the vocabulary size, only the middle 3 octaves (C3 to B5, 36 keys instead of 88) are used
- Chords augmented; `chords-txt-augmented/`
    - Same as cleaned, except every original MIDI track is transposed to all 12 keys
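
The "different enough" filtering lives in `08-make-cleaned-chord-dataset.ipynb` and isn't shown here. As a rough illustration only, one way to express it is to count how many keys switched on or off relative to the last kept timestep and keep a timestep only when that count reaches a threshold. The `min_changed` parameter and this criterion are assumptions for the sketch, not the rule used in that notebook.

```python
import numpy as np

def filter_small_changes(arr, min_changed=2):
    """Hypothetical filter: keep a timestep only if at least `min_changed` keys
    differ from the last kept one. arr is an (n_timesteps, n_keys) 0/1 array.
    """
    if len(arr) == 0:
        return arr
    kept = [arr[0]]
    for timestep in arr[1:]:
        # Count keys that switched on or off since the last kept timestep
        changed = int(np.sum(timestep != kept[-1]))
        if changed >= min_changed:
            kept.append(timestep)
    return np.stack(kept)

def chord(*keys, n_keys=36):
    """Illustrative helper: one timestep of a 36-key (C3 to B5) roll with the given keys pressed."""
    row = np.zeros(n_keys, dtype=np.uint8)
    row[list(keys)] = 1
    return row

# C major (C4 E4 G4), then B4 added (1 key changes), then F major (4 keys change vs. C major)
roll = np.stack([chord(12, 16, 19), chord(12, 16, 19, 23), chord(12, 17, 21)])
filtered = filter_small_changes(roll)
print(filtered.shape)  # (2, 36)
```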

In [2]:
# Piano MIDI
PIANO_MIDI_DIR = Path('piano-midi/')

# For saving text representations of the midi
TXT_FILEPATH = Path('corpus-txt/')
TXT_FILEPATH.mkdir(exist_ok=True)

# For saving text representations of note pairs within the midi
TXT_PAIRS_FILEPATH = Path('corpus-pairs-txt/')
TXT_PAIRS_FILEPATH.mkdir(exist_ok=True)

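# For saving raw numpy arrays converted from the midi
NP_FILEPATH = Path('midi-np/')
NP_FILEPATH.mkdir(exist_ok=True)

# For saving cleaned arrays (transposed to C, middle 3 octaves only)
NP_CLEANED_FILEPATH = Path('midi-np-cleaned/')
NP_CLEANED_FILEPATH.mkdir(exist_ok=True)

# For saving augmented arrays (each track transposed to all 12 keys)
NP_AUGMENTED_FILEPATH = Path('midi-np-augmented/')
NP_AUGMENTED_FILEPATH.mkdir(exist_ok=True)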

MIDI_FILES = [PIANO_MIDI_DIR / file for file in os.listdir(PIANO_MIDI_DIR)]

In [23]:
class CompactNote:
    # Minimal representation of a note: piano key index, start offset and duration (in quarter notes)
    def __init__(self, pitch, offset, duration):
        self.pitch = pitch
        self.offset = offset
        self.duration = duration
        self.end = Fraction(self.offset) + Fraction(self.duration)
        
    def __repr__(self):
        return f"<CompactNote @ {self.pitch} :: {self.offset} => {self.end} (duration: {self.duration})>"

In [24]:
# Major-scale templates for key estimation: row i is the C major pattern rotated up i semitones (pitch classes C = 0, ..., B = 11)
key_pattern = np.array([1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]) / 7
key_patterns = np.zeros((12, 12))
for i in range(12):
    key_patterns[i] = np.roll(key_pattern, i)
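
To see how the templates are used, here is a small self-contained sketch: build the 12 rotated major-scale templates and pick the one closest (Euclidean distance) to a pitch-class distribution. The input is made up, a G major scale played once, so the expected answer is 7 (G, counting up from C = 0). `nearest_key` is an illustrative name. Because a natural-minor scale has the same pitch classes as its relative major, a minor-key piece will typically be assigned its relative major (A minor → C).

```python
import numpy as np

# Major-scale template for C (pitch classes C=0 ... B=11), normalised to sum to 1
major = np.array([1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]) / 7
# Row i is the major scale rooted i semitones above C
templates = np.stack([np.roll(major, i) for i in range(12)])

def nearest_key(pitch_class_freqs):
    """Illustrative helper: root (0 = C, ..., 11 = B) of the closest major-scale template."""
    distances = np.linalg.norm(templates - pitch_class_freqs, axis=1)
    return int(np.argmin(distances))

# G major scale: G A B C D E F# -> pitch classes 7, 9, 11, 0, 2, 4, 6
g_major_counts = np.zeros(12)
g_major_counts[[7, 9, 11, 0, 2, 4, 6]] = 1
print(nearest_key(g_major_counts / g_major_counts.sum()))  # 7
```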

### Utilities for MIDI processing

The following cells define helper functions for processing MIDI; they can be skipped for the purposes of this project.

In [25]:
step_to_index = {
    'C': 0,
    'D': 2,
    'E': 4,
    'F': 5,
    'G': 7,
    'A': 9,
    'B': 11
}

index_to_step = ['C', 'D-', 'D', 'E-', 'E', 'F', 'G-', 'G', 'A-', 'A', 'B-', 'B']

def lettername_to_base_index(lettername):
    index = step_to_index[lettername[0]]
    if len(lettername) > 1:
        adjuster = 1 if lettername[1] == '#' else -1
        index += adjuster * (len(lettername) - 1)
    return index

def pitch_str_to_pitch_index(pitch_str):
    pitch, octave = pitch_str[:-1], int(pitch_str[-1])
    pitch_index = lettername_to_base_index(pitch) + octave * 12 - 9
    if pitch_index < 0 or pitch_index > 87:
        return None
    return pitch_index

def pitch_index_to_pitch_str(pitch_index):
    pitch_index += 9
    return index_to_step[pitch_index % 12] + str(pitch_index // 12)
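
As a sanity check on the arithmetic: with index 0 = A0, `base_index + 12 * octave - 9` equals the MIDI note number minus 21 (MIDI 21 is A0 and 60 is C4), because MIDI numbers are `12 * (octave + 1) + pitch_class`. The sketch below re-implements the mapping standalone for music21-style names (`#` for sharp, `-` for flat) and checks a few landmarks; `to_index` and `midi_number` are illustrative names. Like `pitch_str_to_pitch_index`, it assumes a single-digit octave, which covers the piano range.

```python
STEP = {'C': 0, 'D': 2, 'E': 4, 'F': 5, 'G': 7, 'A': 9, 'B': 11}

def pitch_class(name):
    # Letter plus one semitone per '#' and minus one per '-'
    return STEP[name[0]] + name.count('#') - name.count('-')

def to_index(pitch_str):
    """Illustrative helper: map e.g. 'C4', 'F#3' or 'B-2' to a 0-87 piano key index."""
    name, octave = pitch_str[:-1], int(pitch_str[-1])
    return pitch_class(name) + 12 * octave - 9

def midi_number(pitch_str):
    """Illustrative helper: standard MIDI note number (C4 = 60)."""
    name, octave = pitch_str[:-1], int(pitch_str[-1])
    return 12 * (octave + 1) + pitch_class(name)

for p in ['A0', 'C4', 'B-3', 'F#5', 'C8']:
    print(p, to_index(p), midi_number(p) - 21)
# A0 0 0
# C4 39 39
# B-3 37 37
# F#5 57 57
# C8 87 87
```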

In [26]:
def compact_note(note_):
    return CompactNote(
        pitch=pitch_str_to_pitch_index(str(note_.pitch)),
        offset=note_.offset,
        duration=note_.quarterLength
    )

In [27]:
def get_note_frequencies(arr):
    arr_sum = np.sum(arr, axis=0)
    note_sums = np.zeros(12)
    for i in range(len(arr_sum)):
        note_sums[(i - 3) % 12] += arr_sum[i]
    note_frequencies = note_sums / np.sum(arr)
    return note_frequencies
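
The `(i - 3) % 12` term folds each of the 88 keys onto a pitch class with C = 0: key 0 is A0, and A is pitch class 9, which is `(0 - 3) % 12`. A vectorised equivalent, sketched below under the same indexing, uses `np.bincount` with the per-key totals as weights; `pitch_class_frequencies` is an illustrative name.

```python
import numpy as np

def pitch_class_frequencies(arr):
    """Illustrative helper: fraction of active (timestep, key) cells per pitch class, C = 0 ... B = 11."""
    key_totals = arr.sum(axis=0)              # activity per piano key, shape (88,)
    pitch_classes = (np.arange(88) - 3) % 12  # key 0 (A0) -> 9, key 3 (C1) -> 0
    sums = np.bincount(pitch_classes, weights=key_totals, minlength=12)
    return sums / arr.sum()

# Toy roll: C4 (key 39) active for 3 timesteps, G4 (key 46) for 1
arr = np.zeros((4, 88), dtype=np.uint8)
arr[:3, 39] = 1
arr[3, 46] = 1
freqs = pitch_class_frequencies(arr)
print(float(freqs[0]), float(freqs[7]))  # 0.75 0.25
```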

In [28]:
def estimate_key(arr):
    # Return the root (0 = C, ..., 11 = B) of the major-scale template closest
    # (Euclidean distance) to the track's pitch-class distribution
    note_frequencies = get_note_frequencies(arr)
    distances = np.linalg.norm(key_patterns - note_frequencies, axis=1)
    return int(np.argmin(distances))

In [29]:
def get_center(arr):
    arr_sum = np.sum(arr, axis=0)
    center = np.average(np.arange(88), weights=arr_sum)
    return center

In [30]:
def transpose_to_key(arr, target_key):
    # Estimated key of the track (0 = C, ..., 11 = B)
    key = estimate_key(arr)
    # Semitones to move down (mod 12) to reach target_key
    diff = (key - target_key) % 12
    if diff == 0:
        return arr
    center = get_center(arr)
    shift_down_amt = -diff
    shift_up_amt = 12 - diff
    # Pick the direction that keeps the piece closest to the middle of the keyboard (key 44)
    shift_up_is_more_centered = abs((center + shift_up_amt) - 44) < abs((center + shift_down_amt) - 44)
    shift_amt = shift_up_amt if shift_up_is_more_centered else shift_down_amt
    # Zero-pad one side and drop the same number of keys from the other (notes pushed off the edge are lost)
    buffer = np.zeros((len(arr), abs(shift_amt)), dtype=arr.dtype)
    if shift_amt > 0:
        transposed = np.hstack((buffer, arr[:, :-shift_amt]))
    else:
        transposed = np.hstack((arr[:, -shift_amt:], buffer))
    assert estimate_key(transposed) == target_key
    assert transposed.shape[1] == 88
    return transposed
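
In `transpose_to_key`, transposition is a column shift of the piano roll: moving every note up k semitones moves column j to column j + k, with zeros filling the vacated columns and anything pushed past A0 or C8 dropped. The function chooses between shifting up `12 - diff` or down `diff` semitones, whichever leaves the weighted center from `get_center` closer to key 44. A standalone sketch of just the shift, with an illustrative `shift_roll` helper:

```python
import numpy as np

def shift_roll(arr, k):
    """Illustrative helper: shift an (n_timesteps, n_keys) roll up k semitones (down if k < 0).

    Columns shifted past either edge of the keyboard are dropped; vacated columns are zero.
    """
    out = np.zeros_like(arr)
    if k > 0:
        out[:, k:] = arr[:, :-k]
    elif k < 0:
        out[:, :k] = arr[:, -k:]
    else:
        out[:] = arr
    return out

roll = np.zeros((1, 88), dtype=np.uint8)
roll[0, [39, 43, 46]] = 1                              # C4 E4 G4
print(np.flatnonzero(shift_roll(roll, 7)).tolist())    # [46, 50, 53] -> G4 B4 D5
print(np.flatnonzero(shift_roll(roll, -5)).tolist())   # [34, 38, 41] -> G3 B3 D4
```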

In [31]:
def convert_midi_to_compact_notes(midi_file):
    score = converter.parse(midi_file).flatten()
    all_notes = []
    for element in score.notes:
        if element.isNote:
            all_notes.append(element)
        else:
            chord_notes = element.notes
            for chord_note in chord_notes:
                new_note = note.Note(str(chord_note.pitch), quarterLength=element.quarterLength)
                new_note.offset = element.offset
                all_notes.append(new_note)
    compact_notes = []
    for note_ in all_notes:
        compact_note_ = compact_note(note_)
        if compact_note_.pitch is not None:
            compact_notes.append(compact_note_)
    return compact_notes

In [32]:
def percent_of_notes_in_middle_registers(arr):
    return np.sum(arr_c3_to_c6(arr)) / np.sum(arr)

In [33]:
def arr_c3_to_c6(arr):
    # Keep key indices 27-62, i.e. C3 up to (but not including) C6: the middle 3 octaves
    return arr[:, 27:63]

In [34]:
def convert_compact_notes_to_array(compact_notes, resolution=12):
    num_timesteps = max(compact_notes, key=lambda x: x.end).end * resolution
    arr = np.zeros((int(num_timesteps), 88), dtype=np.uint8)
    for note_ in compact_notes:
        note_start, note_end = int(note_.offset * resolution), int(note_.end * resolution)
        arr[note_start: note_end, note_.pitch] = 1
    return arr

In [35]:
def convert_midi_to_array(midi_file):
    compact_notes = convert_midi_to_compact_notes(midi_file)
    arr = convert_compact_notes_to_array(compact_notes)
    return arr

In [36]:
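def compress_array(arr, allow_empty=True):
    # Collapse runs of identical consecutive timesteps (e.g. a held chord) into one timestep.
    # With allow_empty=False, silent (all-zero) timesteps are dropped as well.
    new_arr = np.zeros(arr.shape, dtype=np.uint8)
    i = 0
    for timestep in arr:
        if not allow_empty and not np.any(timestep):
            continue
        # Compare against the last kept timestep, so dropped silences don't break up a run
        if i == 0 or not np.array_equal(timestep, new_arr[i - 1]):
            new_arr[i] = timestep
            i += 1
    return new_arr[:i]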
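
For intuition, here is what removing duplicate timesteps does to a held chord, using an 8-key toy roll and a compact vectorised version of the idea (keep a row only when it differs from the row before it); `dedupe_rows` is an illustrative name. Because the piano roll stores no note onsets, re-striking the same chord with no gap is indistinguishable from holding it, so both collapse to a single timestep.

```python
import numpy as np

def dedupe_rows(arr):
    """Illustrative helper: drop rows identical to the row immediately before them."""
    if len(arr) == 0:
        return arr
    changed = np.any(arr[1:] != arr[:-1], axis=1)
    keep = np.concatenate(([True], changed))
    return arr[keep]

# A chord held for 3 timesteps, one timestep of silence, then the same chord again
chord = [1, 0, 0, 1, 0, 0, 1, 0]
silence = [0] * 8
roll = np.array([chord, chord, chord, silence, chord], dtype=np.uint8)
print(dedupe_rows(roll).tolist())
# [[1, 0, 0, 1, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 1, 0, 0, 1, 0]]
```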

In [37]:
def arr_to_txt(arr):
    return ' '.join([''.join([str(x) for x in timestep]) for timestep in arr])

In [38]:
def arr_to_chords(arr):
    chords = []
    for timestep in arr:
        note_indices = np.where(timestep == 1)[0]
        chords.append([pitch_index_to_pitch_str(x) for x in note_indices])
    return chords

In [39]:
def note_combinations(chord_notes):
    # All unique pairs of notes in the chord, preserving low-to-high order
    if len(chord_notes) < 2:
        return []
    combinations = []
    for i, note_a in enumerate(chord_notes[:-1]):
        for note_b in chord_notes[i+1:]:
            combinations.append([note_a, note_b])
    return combinations

In [40]:
def textify_chord_by_note_pairs(arr):
    compressed_arr = compress_array(arr)
    chords = arr_to_chords(compressed_arr)
    all_pairs = [note_combinations(chord) for chord in chords]
    txt = ''
    for chord_pairs in all_pairs:
        chord_pairs_txt = ' '.join([','.join([x for x in pair]) for pair in chord_pairs])
        chord_pairs_txt = '<chord> ' + chord_pairs_txt + ' </chord>' if chord_pairs_txt else '<nochord>'
        txt += ' ' + chord_pairs_txt
    txt = re.sub(' +', ' ', txt)
    return txt.strip()
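
The token stream that `textify_chord_by_note_pairs` produces can be previewed without any MIDI. The sketch below reproduces its output format for a list of chords given as note-name lists (`pairs_to_tokens` is an illustrative name). It also makes one property visible: a single held note has no pairs, so it maps to `<nochord>` exactly like silence does.

```python
from itertools import combinations

def pairs_to_tokens(chords):
    """Illustrative helper: same token format as textify_chord_by_note_pairs, from note-name lists."""
    parts = []
    for chord in chords:
        pairs = ' '.join(','.join(pair) for pair in combinations(chord, 2))
        parts.append(f'<chord> {pairs} </chord>' if pairs else '<nochord>')
    return ' '.join(parts)

print(pairs_to_tokens([['C4', 'E4', 'G4'], ['C4'], []]))
# <chord> C4,E4 C4,G4 E4,G4 </chord> <nochord> <nochord>
```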

### Create datasets

Create raw `txt` files containing text representations of each non-repeated timestep

In [283]:
for file in tqdm(MIDI_FILES):
    # Convert file to array
    arr = convert_midi_to_array(file)
    # Remove duplicate timesteps
    arr_compressed = compress_array(arr)
    with open(TXT_FILEPATH / f"{file.name[:-4]}.txt", "w") as f:
        f.write(arr_to_txt(arr_compressed))

100%|████████████████████████████████████████████████████████████████████████████████████████████████| 284/284 [12:52<00:00,  2.72s/it]


Create note pairs text dataset

In [398]:
for file in tqdm(MIDI_FILES):
    # Convert file to array
    arr = convert_midi_to_array(file)
    # Convert array to note pairs
    text = textify_chord_by_note_pairs(arr)
    with open(TXT_PAIRS_FILEPATH / f"{file.name[:-4]}.txt", "w") as f:
        f.write(text)

100%|████████████████████████████████████████████████████████████████████████████████████████████████| 284/284 [03:05<00:00,  1.53it/s]


Save the raw, unprocessed numpy arrays converted from the MIDI files

In [108]:
for file in tqdm(MIDI_FILES):
    arr = convert_midi_to_array(file)
    np.save(NP_FILEPATH / f"{file.name[:-4]}.npy", arr)

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 284/284 [03:42<00:00,  1.28it/s]


Save the cleaned numpy files (transposed to the key of C, middle 3 octaves only)

In [107]:
for file in tqdm(MIDI_FILES):
    # Convert file to array
    arr = convert_midi_to_array(file)
    # Transpose to C
    transposed = transpose_to_key(arr, target_key=0)
    # Cut out low and high registers
    cut = arr_c3_to_c6(transposed)
    np.save(NP_CLEANED_FILEPATH / f"{file.name[:-4]}.npy", cut)

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 284/284 [04:51<00:00,  1.03s/it]


Save the same cleaned numpy files, but transpose to all 12 keys to augment data

In [22]:
for file in tqdm(MIDI_FILES):
    # Convert file to array
    arr = convert_midi_to_array(file)
    for i in range(12):
        # Transpose to given key
        transposed = transpose_to_key(arr, target_key=i)
        cut = arr_c3_to_c6(transposed)
        np.save(NP_AUGMENTED_FILEPATH / f"{file.name[:-4]}_key{i}.npy", cut)

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 284/284 [15:21<00:00,  3.25s/it]
