# From MIDI Files to Binary Images

This notebook implements a function that processes a compressed file named ```midi_dataset.zip```, which contains a directory of MIDI files. The function produces a compressed file named ```dataset.zip``` containing a dataset of binary piano-roll images.

The function also supports data augmentation techniques to enrich the dataset.

Much of the code in this notebook is adapted from [this article](https://medium.com/analytics-vidhya/convert-midi-file-to-numpy-array-in-python-7d00531890c).


In [None]:
pip install mido

In [None]:
import mido
import string
import numpy as np
from PIL import Image
import os

In [None]:
!unzip midi_dataset.zip

## Auxiliary Functions

The following functions convert MIDI files to binary arrays. They have been adapted for our purposes. For details on how they work, see the original article [here](https://medium.com/analytics-vidhya/convert-midi-file-to-numpy-array-in-python-7d00531890c).

In [None]:
def msg2dict(msg,step=60):
    result = dict()
    if 'note_on' in msg:
        on_ = True
    elif 'note_off' in msg:
        on_ = False
    else:
        on_ = None
    result['time'] = round(int(msg[msg.rfind('time'):].split(' ')[0].split('=')[1].translate(
        str.maketrans({a: None for a in string.punctuation})))/step)

    if on_ is not None:
        for k in ['note', 'velocity']:
            result[k] = int(msg[msg.rfind(k):].split(' ')[0].split('=')[1].translate(
                str.maketrans({a: None for a in string.punctuation})))
    return [result, on_]
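To make the parsing step easier to follow, here is a minimal sketch of the same idea using a regular expression. It assumes the message text has the `key=value` form that mido typically prints (for example `note_on channel=0 note=60 velocity=64 time=120`). The helper name `parse_fields` is hypothetical and is not part of the notebook or of mido.

```python
import re

def parse_fields(msg_text, step=60):
    """Hypothetical helper: pull time/note/velocity out of a message string."""
    # Collect every key=value pair whose value is an integer.
    fields = {k: int(v) for k, v in re.findall(r"(\w+)=(-?\d+)", msg_text)}
    if "note_on" in msg_text:
        on = True
    elif "note_off" in msg_text:
        on = False
    else:
        on = None  # meta or control message: only the time matters
    # Quantize the delta time from ticks to steps.
    result = {"time": round(fields["time"] / step)}
    if on is not None:
        result["note"] = fields["note"]
        result["velocity"] = fields["velocity"]
    return result, on

sample = "note_on channel=0 note=60 velocity=64 time=120"  # assumed format
parsed, is_on = parse_fields(sample)
print(parsed, is_on)
```

Note that Python's built-in `round` rounds exact halves to the nearest even integer, so with `step=60` a delta of 30 ticks maps to 0 steps while 90 ticks maps to 2.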

In [None]:
def correct_velocity(vel):
    return 0 if vel == 0 else 1

def switch_note(last_state, note, velocity, on_=True,offset=0):
    # piano has 88 notes, corresponding to note id 21 to 108, any note out of this range will be ignored
    result = [0] * 88 if last_state is None else last_state.copy()
    if 21 <= note+offset <= 108:
        result[note-21+offset] = correct_velocity(velocity) if on_ else 0
    return result
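The index arithmetic in `switch_note` can be checked in isolation. The sketch below uses a hypothetical helper, `note_to_column`, that mirrors only the mapping from a MIDI note number (plus a transposition offset) to a column of the 88-key piano roll. It is an illustration, not part of the notebook's pipeline.

```python
def note_to_column(note, offset=0, low=21, high=108):
    """Hypothetical helper mirroring the index arithmetic in switch_note."""
    shifted = note + offset
    # Notes that fall outside the piano range after transposition are dropped.
    return shifted - low if low <= shifted <= high else None

print(note_to_column(60))             # middle C
print(note_to_column(60, offset=2))   # transposed up a whole tone
print(note_to_column(108, offset=1))  # pushed above the top key
```

One consequence of this design is that transposing with a large offset silently discards notes near the edges of the keyboard instead of wrapping them around.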

In [None]:
def get_new_state(new_msg, last_state, offset=0, step=60):
    new_msg, on_ = msg2dict(str(new_msg), step=step)
    new_state = switch_note(last_state, note=new_msg['note'], velocity=new_msg['velocity'], on_=on_, offset=offset) if on_ is not None else last_state
    return [new_state, new_msg['time']]

def track2seq(track, offset=0, step=60):
    # piano has 88 notes, corresponding to note id 21 to 108, any note out of the id range will be ignored
    result = []
    last_state, last_time = get_new_state(str(track[0]), [0]*88, offset, step=step)
    # pad with silent rows (one 88-element row per step) for any delay before the first message
    result += [[0]*88] * last_time
    for i in range(1, len(track)):
        # pass step through so that every message is quantized with the same resolution
        new_state, new_time = get_new_state(track[i], last_state, offset, step=step)
        if new_time > 0:
            # the previous state is held for new_time steps before this message takes effect
            result += [last_state]*new_time
        last_state, last_time = new_state, new_time
    return result
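The core idea of `track2seq` is that each message carries a delta time, and the state that was active before the message is held for that many steps. The following simplified sketch shows this on a hand-made event list. The `(delta_steps, column, is_on)` tuple format and the name `events_to_rows` are assumptions made for illustration only.

```python
def events_to_rows(events, n_keys=88):
    """Hypothetical sketch: events are (delta_steps, column, is_on) tuples."""
    rows = []
    state = [0] * n_keys
    for delta, column, is_on in events:
        # Hold the state that was active before this event for `delta` steps.
        rows.extend(state.copy() for _ in range(delta))
        # Then apply the event to obtain the new state.
        state[column] = 1 if is_on else 0
    return rows

events = [
    (0, 39, True),   # column 39 switched on immediately
    (4, 39, False),  # ... and off after 4 steps
    (2, 43, True),   # 2 silent steps, then column 43 on
    (2, 43, False),  # ... and off after 2 steps
]
roll = events_to_rows(events)
print(len(roll))
```

Each row is one time step of the piano roll, so the length of the result equals the sum of the delta times.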

In [None]:
def mid2arry(mid, min_msg_pct=0.1,offset=0,step=60):
    tracks_len = [len(tr) for tr in mid.tracks]
    min_n_msg = max(tracks_len) * min_msg_pct
    # convert each track to nested list
    all_arys = []
    for i in range(len(mid.tracks)):
        if len(mid.tracks[i]) > min_n_msg:
            ary_i = track2seq(mid.tracks[i],offset=offset,step=step)
            all_arys.append(ary_i)
    # make all nested list the same length
    max_len = max([len(ary) for ary in all_arys])
    for i in range(len(all_arys)):
        if len(all_arys[i]) < max_len:
            all_arys[i] += [[0] * 88] * (max_len - len(all_arys[i]))
    all_arys = np.asarray(dtype=np.dtype('uint8'),a=all_arys)
    all_arys = all_arys.max(axis=0)
    return all_arys
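As a small illustration of the last steps of `mid2arry`, the sketch below pads two made-up single-track rolls to the same length and merges them with an element-wise maximum, so that a pixel is set if any track plays that note. The arrays `track_a` and `track_b` are hypothetical data, and `np.pad` is used here for brevity instead of the list concatenation in the function above.

```python
import numpy as np

# Two hypothetical single-track rolls (time steps x 88 keys) of different lengths.
track_a = np.zeros((4, 88), dtype=np.uint8)
track_a[:, 39] = 1  # one note held for 4 steps
track_b = np.zeros((2, 88), dtype=np.uint8)
track_b[:, 43] = 1  # another note held for 2 steps

# Pad the shorter roll with silent rows at the end.
max_len = max(len(track_a), len(track_b))
padded = [np.pad(t, ((0, max_len - len(t)), (0, 0))) for t in (track_a, track_b)]

# Stack to shape (tracks, steps, 88) and merge across tracks.
merged = np.stack(padded).max(axis=0)
print(merged.shape)
```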

## Function ```make_dataset```

This function produces the binary image dataset. Its parameters control different aspects of the generated images, as well as some data augmentation techniques. The parameters are:
- **step:** MIDI files measure time in ticks (in the dataset provided, a semiquaver, or sixteenth note, lasts 120 ticks). The step is the number of ticks merged into one time step, so it sets the time resolution that we want to keep. The default value is 60, which keeps a resolution down to the demisemiquaver, or thirty-second note.
- **span:** The length of the sequence used for each image, that is, the number of pixels along the time axis of the piano roll. The default value is 192, which means that an image contains 6 measures of a piece written in 4/4 when the step is 60.
- **overlap:** This is a parameter for data augmentation. It indicates the degree to which consecutive images overlap. If overlap=0, there is no overlap. For overlap=1, each image starts halfway through the previous image; for overlap=2, one third of the way through; and so on. The default value is 0 (no overlap).
- **offsets:** This is a parameter for data augmentation. It is a list of intervals (in semitones) by which to transpose each MIDI file before making the images. The default value is [0], which processes each file once without transposing.
- **halfs:** This is a parameter for data augmentation. Each entry in the list is the number of times to halve the speed (doubling the length of the piece each time) before making the images. The default value is [0], which processes each file once at the original speed.
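The arithmetic behind these parameters can be checked with a short sketch. The helpers `window_starts` and `stretch` are hypothetical names introduced here for illustration. They mirror the stride computation and the time-stretching idea described above on plain Python lists, and are not the notebook's implementation.

```python
def window_starts(n_steps, span=192, overlap=0):
    """Hypothetical helper: start indices of the images cut from n_steps rows."""
    stride = int(span / (overlap + 1))
    # Windows that would run past the end of the piece are skipped.
    return [i for i in range(0, n_steps - 1, stride) if i + span <= n_steps]

def stretch(rows, half):
    """Hypothetical helper: repeat every row 2**half times (halving the speed)."""
    return [row for row in rows for _ in range(2 ** half)]

# step: 120 ticks per sixteenth note and step=60 give 2 steps per sixteenth,
# hence 32 steps per 4/4 measure and 192 / 32 measures per image.
steps_per_measure = 16 * (120 // 60)
measures_per_image = 192 / steps_per_measure

starts_no_overlap = window_starts(600, overlap=0)
starts_half_overlap = window_starts(600, overlap=1)
slowed = stretch([[1], [0]], half=1)
print(measures_per_image, starts_no_overlap, starts_half_overlap, slowed)
```

For a piece of 600 steps, overlap=0 yields windows starting at 0, 192 and 384, while overlap=1 yields starts every 96 steps, which nearly doubles the number of images cut from the same piece.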

In [None]:
def make_dataset(step=60,span=192,overlap=0,offsets=[0],halfs=[0]):
  os.makedirs('dataset', exist_ok=True)  # do not fail if the folder already exists
  for midifile in os.listdir('midi_dataset'):
      f = os.path.join('midi_dataset', midifile)
      mid = mido.MidiFile(f, clip=True)

      arrays=[]
      for offset in offsets:
        array = mid2arry(mid=mid,offset=offset,step=step)
        for half in halfs:
          if half==0:arrays.append(array)
          else:
            arrays.append(np.repeat(array,[2**(half)]*array.shape[0],axis=0))

      ofc=0
      hac=0
      for result_array in arrays:
        for i in range(0,result_array.shape[0]-1,int(span/(overlap+1))):
          if i+span>result_array.shape[0]: continue
          temp=((result_array)[i:i+span,0:88])*255
          img = Image.fromarray(temp,mode='L').convert('1')
          img.save('dataset/'+os.path.splitext(midifile)[0]+'Of'+str(offsets[ofc])+'Ha'+str(halfs[hac])+'Seg'+str(int(i/(span/(overlap+1))))+'.png')
        hac+=1
        if hac==len(halfs):
          ofc+=1
          hac=0

  !zip -r '/content/dataset.zip' './dataset'