# Extracting Time Series Data from MIDI Files

Let's start by loading all the file names and their channel mappings.

In [1]:
import os
import midi
import pickle

In [2]:
# Pickle files should be opened in binary mode.
with open('channel_mappings.pkl', 'rb') as f:
    channel_mappings = pickle.load(f)

In [3]:
midi_path = './midi/pop/'
midi_files = os.listdir(midi_path)

In [4]:
assert len(midi_files) == len(channel_mappings)

Before we begin, we need to mark which instruments actually play melodies (anything that isn't an unpitched percussion instrument, sound effect, or "pad" instrument). See the chart in [this notebook](./flattening_tracks.ipynb) for reference. If a given channel uses a non-melody instrument, we extract only the rhythmic information from that channel.

In [5]:
# Wrapping in list() keeps the concatenation valid when range() is not a list.
melody_instruments = list(range(88)) + list(range(104, 112))
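
As a rough illustration of what those two ranges keep and drop, here is a minimal sketch. It assumes the instrument numbers are 0-indexed General MIDI program numbers, which are grouped into families of eight; the names `MELODY_PROGRAMS` and `is_melodic` are hypothetical and not part of the notebook.

```python
# Assumption: instrument numbers are 0-indexed General MIDI programs.
# Families left out of the melodic set under that assumption:
#   88-95   Synth Pad
#   96-103  Synth Effects
#   112-119 Percussive
#   120-127 Sound Effects
MELODY_PROGRAMS = set(range(88)) | set(range(104, 112))


def is_melodic(program):
    """Return True if the given program number is treated as melodic."""
    return program in MELODY_PROGRAMS


print(is_melodic(0))    # Acoustic Grand Piano
print(is_melodic(89))   # a Synth Pad program
print(is_melodic(105))  # Banjo (Ethnic family)
print(is_melodic(119))  # Reverse Cymbal (Percussive family)
```

A set is used here only because membership tests are cheaper than on a list; the list above works just as well for 128 programs.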

Now we'll build new object types to hold the time series.

In [18]:
class EventSequence(object):
    """A container for a sequence of events.
    
    Base class, to be subclassed by MelodySequence
    and RhythmSequence.
    """
    
    def __init__(self, num_events = 0, mpqn = 500000):
        """Initialize object with default of zero events and
        tempo at 120 bpm (500000 microseconds per quarter note).
        """
        
        self.num_events = num_events
        self.mpqn = mpqn
    
    def add_event(self):
        """Add one new event."""
        
        self.num_events += 1
        
    def set_mpqn(self, mpqn):
        """Change tempo in units of microseconds per quarter note."""
        
        self.mpqn = mpqn
        
class MelodySequence(EventSequence):
    """A container for a melodic sequence. 
    Inherits from EventSequence.
    """
    
    def __init__(self, num_events = 0, mpqn = 500000):
        super(MelodySequence, self).__init__(num_events, mpqn)
        self.notes = []
        
    def add_note(self, note):
        super(MelodySequence, self).add_event()
        self.notes.append(note)
        
class RhythmSequence(EventSequence):
    """A container for a rhythmic sequence.
    Inherits from EventSequence.
    """
    
    def __init__(self, num_events = 0, mpqn = 500000):
        super(RhythmSequence, self).__init__(num_events, mpqn)
        self.ticks = []
        
    def add_tick(self, tick):
        super(RhythmSequence, self).add_event()
        self.ticks.append(tick)
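
The `mpqn` attribute stores tempo as microseconds per quarter note, so the default of 500000 corresponds to 120 bpm. The conversion can be sketched as below; the helper names `mpqn_to_bpm` and `bpm_to_mpqn` are hypothetical and only illustrate the arithmetic.

```python
MICROSECONDS_PER_MINUTE = 60 * 1000 * 1000


def mpqn_to_bpm(mpqn):
    """Convert microseconds per quarter note to beats per minute."""
    return MICROSECONDS_PER_MINUTE / float(mpqn)


def bpm_to_mpqn(bpm):
    """Convert beats per minute to microseconds per quarter note."""
    return int(round(MICROSECONDS_PER_MINUTE / float(bpm)))


print(mpqn_to_bpm(500000))  # 120.0
print(bpm_to_mpqn(90))      # 666667
```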

Next, a couple of helper functions extract the sequence of events from a given channel in a given MIDI file.

In [25]:
def get_rhythm_sequence(rhythm, mfile, channel):
    """Pull out the rhythms from the given channel in mfile.
    
    Inputs: rhythm is an empty RhythmSequence object
            mfile is a midi.containers.Pattern object
            channel is an integer channel number from 0-15
            
    Output: RhythmSequence object filled with ticks from mfile channel
    """
    
    mfile.make_ticks_abs()
    for track in mfile:
        for event in track:
            if isinstance(event, midi.events.NoteOnEvent) and event.channel == channel and event.data[1] != 0:
                rhythm.add_tick(event.tick)
    rhythm.ticks = sorted(rhythm.ticks)
    return rhythm

def get_melody_sequence(melody, mfile, channel):
    """Pull out the melodies from the given channel in mfile.
    
    Inputs: melody is an empty MelodySequence object
            mfile is a midi.containers.Pattern object
            channel is an integer channel number from 0-15 (shouldn't be 9)
            
    Output: MelodySequence object filled with notes from mfile channel
    """
    
    mfile.make_ticks_abs()
    for track in mfile:
        for event in track:
            if isinstance(event, midi.events.NoteOnEvent) and event.channel == channel and event.data[1] != 0:
                melody.add_note((event.data[0], event.tick))
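    # Sort by absolute tick, then keep only the pitches. The comprehension
    # also handles a channel that contains no notes.
    melody.notes = sorted(melody.notes, key = lambda x: x[1])
    melody.notes = [pitch for pitch, tick in melody.notes]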
    return melody
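
To see the filter, sort, and unzip steps in isolation, here is a minimal sketch that runs without the `midi` library. The tuples are a hypothetical stand-in for note-on events with absolute ticks, and `extract_pitches` and `extract_ticks` are illustrative names rather than functions from this notebook.

```python
# Hypothetical stand-in for note-on events:
# (absolute_tick, channel, pitch, velocity)
events = [
    (480, 0, 64, 90),
    (0,   0, 60, 100),
    (240, 1, 36, 80),   # different channel, skipped
    (240, 0, 62, 0),    # velocity 0 acts as a note-off, skipped
    (960, 0, 67, 70),
]


def extract_pitches(events, channel):
    """Return the pitches sounded on the channel, ordered by tick."""
    sounding = [(tick, pitch) for tick, ch, pitch, vel in events
                if ch == channel and vel != 0]
    sounding.sort(key=lambda pair: pair[0])
    return [pitch for _, pitch in sounding]


def extract_ticks(events, channel):
    """Return the sorted onset ticks for the channel."""
    return sorted(tick for tick, ch, _, vel in events
                  if ch == channel and vel != 0)


print(extract_pitches(events, 0))  # [60, 64, 67]
print(extract_ticks(events, 0))    # [0, 480, 960]
```

The velocity check matters because MIDI allows a note-on with velocity 0 to stand in for a note-off, and those events should not count as onsets.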




The main function below will loop through all of the assigned channels, create the relevant time series objects, and call the two helper functions above to get a rhythm and/or melody sequence for each instrument in the song.

In [29]:
def get_sequences(mfile, mapping):
    """Extract a list of melody sequences and rhythm sequences from mfile.
    
    Inputs: mfile is a midi.containers.Pattern object
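            mapping is a dictionary associating each channel number (0-15)
            with a list of instrument numbers, or None for channels to skip
            
    Outputs: melodies is a list of note lists, one per melodic channel
             rhythms is a list of tick lists, one per channel with instruments
    """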
    melodies = []
    rhythms = []
    for channel in mapping:
        instruments = mapping[channel]
        if instruments is None:
            continue
        
        rhythm = RhythmSequence()
        melody = MelodySequence()
        melodic_channel = True

        for instrument in instruments:
            if instrument not in melody_instruments:
                melodic_channel = False
                break

        if melodic_channel and channel != 9:
            melodies.append(get_melody_sequence(melody, mfile, channel).notes)

        rhythms.append(get_rhythm_sequence(rhythm, mfile, channel).ticks)

    return melodies, rhythms

We'll do a test run on the first two files in the directory.

In [33]:
mfile1 = midi.read_midifile(os.path.join(midi_path, midi_files[0]))
mapping1 = channel_mappings[0]
mfile2 = midi.read_midifile(os.path.join(midi_path, midi_files[1]))
mapping2 = channel_mappings[1]

melodies1, rhythms1 = get_sequences(mfile1, mapping1)
melodies2, rhythms2 = get_sequences(mfile2, mapping2)

Note that, in general, we have more rhythm sequences than melody sequences. This is because we extract rhythmic events for **all** instruments, but melodic information only for the melodic subset of instruments defined above (and never for channel 9).

In [34]:
len(rhythms1) - len(melodies1)

4

In [35]:
len(rhythms2) - len(melodies2)

3

In [36]:
mapping1, mapping2

({0: [1],
  1: [35],
  2: [49],
  3: None,
  4: [5],
  5: [25],
  6: [49],
  7: [119],
  8: [124],
  9: [1],
  10: [98],
  11: [24],
  12: None,
  13: None,
  14: None,
  15: None},
 {0: [0],
  1: [0],
  2: [48],
  3: [33],
  4: [73],
  5: [66],
  6: [27],
  7: [89],
  8: [125],
  9: [0],
  10: None,
  11: None,
  12: None,
  13: None,
  14: None,
  15: None})
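
The differences of 4 and 3 reported above can be checked against the two mappings without reading any MIDI data. The sketch below mirrors the channel test in `get_sequences`; the function name `count_sequences` is hypothetical, and the dictionaries are copied from the output above.

```python
MELODY_PROGRAMS = set(range(88)) | set(range(104, 112))
PERCUSSION_CHANNEL = 9

mapping1 = {0: [1], 1: [35], 2: [49], 3: None, 4: [5], 5: [25],
            6: [49], 7: [119], 8: [124], 9: [1], 10: [98], 11: [24],
            12: None, 13: None, 14: None, 15: None}
mapping2 = {0: [0], 1: [0], 2: [48], 3: [33], 4: [73], 5: [66],
            6: [27], 7: [89], 8: [125], 9: [0], 10: None, 11: None,
            12: None, 13: None, 14: None, 15: None}


def count_sequences(mapping):
    """Return (number of melody sequences, number of rhythm sequences)."""
    n_melody = 0
    n_rhythm = 0
    for channel, instruments in mapping.items():
        if instruments is None:
            continue
        # Every channel with instruments yields a rhythm sequence.
        n_rhythm += 1
        # Only non-percussion channels whose instruments are all melodic
        # also yield a melody sequence.
        if (channel != PERCUSSION_CHANNEL
                and all(p in MELODY_PROGRAMS for p in instruments)):
            n_melody += 1
    return n_melody, n_rhythm


print(count_sequences(mapping1))  # (7, 11)
print(count_sequences(mapping2))  # (7, 10)
```

In the first mapping, channels 7, 8, and 10 use non-melody instruments and channel 9 is excluded outright, which accounts for the gap of 4.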

We still need to be careful about how to account for tempo changes within these songs. The rhythm sequences are in units of "ticks", whose duration varies from file to file: it depends on the file's global resolution and on the locally defined tempo of the song. This will be addressed in the following notebook.
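
To make the dependence on resolution and tempo concrete, here is a minimal sketch of the conversion under the simplifying assumption of a single constant tempo. It is not the approach of the following notebook, which has to handle tempo changes; `ticks_to_seconds` is a hypothetical helper, with resolution given in ticks per quarter note and tempo in microseconds per quarter note.

```python
def ticks_to_seconds(ticks, resolution, mpqn=500000):
    """Convert absolute ticks to seconds, assuming one constant tempo.

    resolution: ticks per quarter note
    mpqn: microseconds per quarter note (500000 is 120 bpm)
    """
    microseconds_per_second = 1e6
    return [t * mpqn / (resolution * microseconds_per_second) for t in ticks]


# The same tick values map to different times at different resolutions.
print(ticks_to_seconds([0, 480, 960], resolution=480))  # [0.0, 0.5, 1.0]
print(ticks_to_seconds([0, 480, 960], resolution=960))  # [0.0, 0.25, 0.5]
```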