# Introduction

This tutorial introduces the basics of music manipulation in Python, particularly through the MIDI file format, which is easy to parse and lets us analyze the structure of music without complex tasks such as frequency analysis. We will cover how to read MIDI files and extract the relevant information about the notes in a given song.

The next part of the tutorial will be an introduction to Markov chains, a way of modeling stochastic processes. We will model a song as a Markov model, and write a program that will generate musical notes based on the works of a particular composer, with the ultimate goal of generating music in the style of that composer. We will be using some collected works of Mozart as our training set.

Finally, we will discuss the results of using Markov chains for generating music, and discuss some of the limitations of the Markov model in this context.

## Installing the Libraries

We will be using the `mido` library for processing MIDI data. For the Markov models, we will be using numpy arrays and scipy sparse matrices.

You can install the libraries with the following command:

```
pip install mido numpy scipy
```

In [1]:
import numpy as np
from scipy.sparse import coo_matrix
from mido import Message, MidiFile, MidiTrack
import os

## The MIDI File Format

MIDI is a structured file format for encoding electronic music data.

As opposed to other music file formats such as mp3 that contain actual audio data, a MIDI file is encoded as a sequence of **events**. Some examples of MIDI events are **note on**, **note off**, **set tempo**, **time signature**, and **key signature**.

MIDI files also can contain multiple **tracks**, which are each an independent sequence of events. There are multiple types of MIDI files - *type 0* files have a single track, *type 1* files have multiple synchronous tracks that play at the same time, and *type 2* files have multiple asynchronous tracks that may not be played at the same time. 

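MIDI tempo is expressed as the number of microseconds per quarter note, set by **set tempo** events (the default is 500,000, which corresponds to 120 bpm). This is inversely proportional to the common tempo measurement, bpm (beats per minute). The *time* field of each MIDI event is a relative offset from the previous event. In the file it is stored in ticks; `mido` converts these offsets to seconds when you iterate over a `MidiFile` object.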
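
The arithmetic behind tempo and timing can be sketched in a few lines of plain Python. The function names below are hypothetical and chosen only for illustration (`mido` ships comparable helpers of its own); the sketch assumes delta times stored in ticks and a file-level ticks-per-beat resolution.

```python
# Pure-Python sketch of MIDI tempo arithmetic (no MIDI library required).
# All names here are illustrative, not part of any library.

MICROSECONDS_PER_MINUTE = 60_000_000

def tempo_to_bpm(tempo_us_per_beat):
    """Convert microseconds per quarter note into beats per minute."""
    return MICROSECONDS_PER_MINUTE / tempo_us_per_beat

def bpm_to_tempo(bpm):
    """Convert beats per minute into microseconds per quarter note."""
    return round(MICROSECONDS_PER_MINUTE / bpm)

def ticks_to_seconds(ticks, ticks_per_beat, tempo_us_per_beat):
    """Convert a delta time in ticks into seconds."""
    return ticks * tempo_us_per_beat / (ticks_per_beat * 1_000_000)

default_tempo = 500_000  # the MIDI default when a file sets no tempo
print(tempo_to_bpm(default_tempo))                # 120.0
print(ticks_to_seconds(480, 480, default_tempo))  # 0.5
```

A slower tempo value means a faster song: halving the microseconds per quarter note doubles the bpm.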

Here is an example of a simple song (Twinkle Twinkle Little Star, whose melody, interestingly enough, Mozart used as the theme for a set of piano variations!).

![MIDI Example](midi_pianotrack_screenshot.jpg)

This is a piece of software called Synthesia that is commonly used for visualizing MIDI files. The bars coming down towards the piano are the notes in the MIDI file. The **blue** and **green** notes are part of two different **tracks**. 

Each individual bar has two events - a **note on** event which happens when the bottom edge of the bar hits the piano, and a **note off** event which happens when the top edge of the bar hits the piano. The speed of the bars as they descend is controlled by the **tempo** and speed settings of the MIDI file.

You can watch the full video [here](https://www.youtube.com/watch?v=KKCsujeeu8o) for a more intuitive visual expression of a MIDI file.

## Loading the Data

The dataset we will be using is a collection of Mozart Piano Sonatas. The ultimate goal will be to write a program that can generate a sequence of notes that sounds "Mozart-like". 

We will load the MIDI files and store them as lists of notes. There will be two lists per song - the right hand and left hand notes. For our input data, these are encoded in channels 0 and 1 of the MIDI file, respectively. These sequences do not encode any information about time or duration, and we will discuss this assumption in more detail later.

Opening a MIDI file is simple - just provide the filename to the library.

In [2]:
test_midi = MidiFile('Mozart/mozk281a.mid')

The MidiFile object contains all the events in the file. Metadata such as MIDI file type and number of tracks can be accessed directly from this object.

In [3]:
print("MIDI file type: " + str(test_midi.type))
print("Number of tracks: " + str(len(test_midi.tracks)))

MIDI file type: 1
Number of tracks: 16


We can see that this is a *type 1* file, which has multiple synchronous tracks. This particular file contains 16 tracks. However, in our case, only two of them are being used - the rest contain basic header data and are otherwise empty.

You can access the *type* of a message directly as well, which is helpful for us to find the note events. Iterating over all the messages in a file is equally simple. To prevent large unnecessary output, the example loop here does nothing.

In [4]:
for msg in test_midi:
    pass

To get a sequence of notes, we need to look at the **note_on** and **note_off** events. Each event has a *time* attribute that represents a relative time offset to the previous event in the sequence. When *time* is zero, the event occurs at the same time as the most recent note with a non-zero time value, which we will use to determine chords versus single notes. 

Essentially, to extract a chord from a MIDI file, we take all the events between two events with a nonzero *time* field.
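
To make the grouping rule concrete, here is a minimal sketch on a hand-written event list. The `Event` tuple is a hypothetical stand-in for a MIDI message, carrying only the fields the rule needs, and the sketch sorts each chord for readable output, which the tutorial's loader does not do.

```python
from collections import namedtuple

# Hypothetical stand-in for a MIDI message: only the fields the grouping rule needs
Event = namedtuple("Event", ["type", "note", "time"])

def group_into_chords(events):
    """Group note_on events into chords, starting a new group at each nonzero time."""
    chords = []
    current = set()
    for event in events:
        if event.time > 0:
            # A nonzero offset means time has advanced: close the current group
            if current:
                chords.append(tuple(sorted(current)))
            current = set()
        if event.type == "note_on":
            current.add(event.note)
    # Keep any notes still pending when the events run out
    if current:
        chords.append(tuple(sorted(current)))
    return chords

events = [
    Event("note_on", 60, 0.5),   # C starts a new group
    Event("note_on", 64, 0),     # E at the same instant
    Event("note_on", 67, 0),     # G at the same instant, completing a C major chord
    Event("note_off", 60, 0.5),
    Event("note_off", 64, 0),
    Event("note_off", 67, 0),
    Event("note_on", 62, 0.5),   # a single D
    Event("note_off", 62, 0.5),
]
print(group_into_chords(events))  # [(60, 64, 67), (62,)]
```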

In [5]:
# Construct a dictionary mapping MIDI note numbers into human-readable notes
all_notes = ["C", "C#/Db", "D", "D#/Eb", "E", "F", "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]
midi_note_map = {}
for i in range(128):
    midi_note_map[i] = all_notes[i % 12]

# Loads an individual MIDI file into a list of notes
def load_midifile(filename):
    mid = MidiFile(filename)
    
    right_hand_notes = []
    left_hand_notes = []
    
    # When parsing MIDI notes into a note sequence:
    # 1) We only care about note_on messages to get a sequence of notes
    # 2) Treat notes with a time value of 0 as part of the same chord as the most recent note with a non-zero time value
    # 3) Use a set to discard duplicate note events (i.e. noisy/faulty MIDI encoding)
    
    current_note_right = set()
    current_note_left = set()
    for msg in mid:
        if ((msg.type == 'note_on') or (msg.type == 'note_off')):
            if (msg.time > 0):
                right_hand_notes.append(list(current_note_right))
                left_hand_notes.append(list(current_note_left))
                
                # Time has advanced (nonzero offset), so start a new note/chord
                current_note_right = set()
                current_note_left = set()
            
            if (msg.type == 'note_on'):
                if (msg.channel == 0):
                    current_note_right.add(msg.note)
                if (msg.channel == 1):
                    current_note_left.add(msg.note)

    # Clean data of empty notes
    # Convert to tuple for immutability later
    right_hand_notes = [tuple(x) for x in right_hand_notes if x != [] ]
    left_hand_notes = [tuple(x) for x in left_hand_notes if x != [] ]
    
    return (right_hand_notes, left_hand_notes)
    
# Process all the songs in the Mozart folder
def load_all_midifiles():
    right_hand_sequences = []
    left_hand_sequences = []
    
    for filename in os.listdir('Mozart'):
        if filename.endswith(".mid"):
            (right_hand, left_hand) = load_midifile('Mozart/' + filename)
            right_hand_sequences.append(right_hand)
            left_hand_sequences.append(left_hand)
            
    return (right_hand_sequences, left_hand_sequences)

(all_right_hand, all_left_hand) = load_all_midifiles()

# Print out, to verify, the first 20 notes/chords of the first file, right hand
# Use the dictionary constructed earlier to map MIDI note numbers into human-readable notes
for chord in all_right_hand[0][0:20]:
    human_readable = ""
    for note in chord:
        human_readable += midi_note_map[note] + ' '
    print(human_readable)

A#/Bb 
C 
A#/Bb 
C 
A#/Bb 
C 
A#/Bb 
C 
A#/Bb 
C 
A 
A#/Bb 
C 
D 
A#/Bb 
A 
C 
A#/Bb 
A 
D#/Eb 


This matches the first measure right-hand notes from Mozart's Sonata no. 3 in B-flat Major, which is the first file in the list. If you don't know how to read sheet music, trust me - it matches.

![Mozart Sonata](mozart_measure1.jpg)

Finally, we will write a quick function that takes a sequence of notes and generates a MIDI file from it. Each MIDI file needs a track, so we will construct a single track and append all our note messages to it.

Earlier, we stated that we were not parsing any timing information from the MIDI file when reading in the note sequences. For outputting a file, we need some timing information - for the sake of this tutorial, we will use the default bpm (beats per minute) and speed of the MIDI format, and give every note the same fixed duration.

In [6]:
def sequence_to_MIDI(note_sequence, output_filename):
    outfile = MidiFile()
    track = MidiTrack()
    outfile.tracks.append(track)
    
    track.append(Message('program_change', program=1, time=0))
    
    for chord in note_sequence:
        # Within a chord, only the first message carries a nonzero time offset (in ticks);
        # the rest use time=0 so that they occur simultaneously.
        # A single note is simply a chord of size one.
        # Append all the 'note_on' messages first, then all the 'note_off' messages
        for i, note in enumerate(chord):
            track.append(Message('note_on', note=note, velocity=60, time=100 if i == 0 else 0))
        for i, note in enumerate(chord):
            track.append(Message('note_off', note=note, velocity=60, time=100 if i == 0 else 0))
            
    outfile.save(output_filename)
    
# Here, I will generate a MIDI file from the right-hand melody sequence of the first Mozart piece
sequence_to_MIDI(all_right_hand[0], 'mozart_rh_midi.mid')

You can use Windows Media Player to directly play the MIDI file, if you are using Windows.

VLC Media Player on Mac should have a codec able to play MIDI as well.

If not, I have included all the MIDI files generated from this project (this test sequence of the first Mozart piece, as well as a generated test sequence from the end of the tutorial) as .mp3 files in the folder as well, which can be played with any media player.

## Markov Models

We will now discuss Markov models. 

A **Markov chain** is a stochastic model that acts as a sort of state machine, with transitions between states behaving according to probabilistic rules. Essentially, it is a system where each state has a certain probability of transitioning to some other state(s), or itself. 

Here is a simple Markov chain, with four states and various probabilities of transitioning between them.

![Markov Chain](markov_chain.png)

State 3, for example, has equal probability (0.25) of transitioning to itself or any of the 3 other states.

The key property of a Markov model is that the probability of transitioning to a future state depends only on the current state. This means we do not need to maintain a history of state transitions to compute what state we transition to next.

Transition probabilities of Markov models can be expressed as a **transition matrix**, which is a matrix containing the probability of transitioning from one state to another. The element *(i,j)* of a transition matrix contains the probability of transitioning from state *i* to state *j*.

We can construct the transition matrix as a standard numpy matrix. The transition matrix for the above Markov chain example is:

In [7]:
example_Tmatrix = np.matrix([[0.4, 0.6, 0, 0], [0.6, 0.4, 0, 0], [0.25, 0.25, 0.25, 0.25], [0, 0, 0, 1]])
print(example_Tmatrix)

[[ 0.4   0.6   0.    0.  ]
 [ 0.6   0.4   0.    0.  ]
 [ 0.25  0.25  0.25  0.25]
 [ 0.    0.    0.    1.  ]]
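
As an illustrative sketch (not part of the original notebook), the same chain can be simulated with the standard library alone. Indices are zero-based here, so "State 3" in the diagram is index 2, and `simulate` is a hypothetical helper name.

```python
import random

# Transition matrix of the four-state example; row i holds the probabilities of leaving state i
transition = [
    [0.4, 0.6, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
    [0.0, 0.0, 0.0, 1.0],
]

# Every row must be a probability distribution
assert all(abs(sum(row) - 1.0) < 1e-9 for row in transition)

def simulate(start_state, num_steps, rng):
    """Walk the chain; each step looks only at the current state, never at the history."""
    states = [start_state]
    for _ in range(num_steps):
        row = transition[states[-1]]
        next_state = rng.choices(range(len(row)), weights=row)[0]
        states.append(next_state)
    return states

rng = random.Random(0)  # fixed seed so the walk is repeatable
print(simulate(2, 10, rng))
```

Two properties of the example are visible from the matrix: the last state is absorbing (once entered, it is never left), and a walk that starts in either of the first two states never leaves that pair.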


## Markov Models for Music

We will now construct a Markov model representing our musical sequence. The states in our model will be the notes and chords that exist in the song. We will compute the transition probabilities using the counts of notes from our training data, in a similar way to computing the counts of words in an n-gram model approach to text classification.

In the code, when referring to a "note", we are referring to either a note or a chord, since we processed the MIDI file in a way that preserved chord information separately from individual note information.

We will create our music model by first constructing the counts of notes following other notes from the training data.
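
Before the full implementation, a toy example may help show what "counts of notes following other notes" means. This sketch uses `collections.Counter` for brevity and a made-up melody; `count_transitions` is a hypothetical name, and notes are written as tuples to match the chord representation used above.

```python
from collections import Counter, defaultdict

def count_transitions(sequence):
    """Count how often each note/chord is immediately followed by each other one."""
    counts = defaultdict(Counter)
    for current_note, next_note in zip(sequence, sequence[1:]):
        counts[current_note][next_note] += 1
    return counts

# Made-up melody: C D C E C D, then a C major chord
melody = [(60,), (62,), (60,), (64,), (60,), (62,), (60, 64, 67)]
counts = count_transitions(melody)

# Dividing each count by its row total turns counts into transition probabilities
probabilities = {
    note: {nxt: c / sum(following.values()) for nxt, c in following.items()}
    for note, following in counts.items()
}

print(dict(counts[(60,)]))    # {(62,): 2, (64,): 1}
print(probabilities[(60,)])
```

Here C is followed by D twice and by E once, so the model would move from C to D with probability 2/3 and to E with probability 1/3.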

In [8]:
def compute_note_counts(note_sequences):
    # Construct a dictionary mapping {note: dictionary of counts of following notes}
    note_counts = {}
    
    # This stores the number of times a note follows any other note, from our training data
    for song in note_sequences:
        for i in range(len(song) - 1):
            current_note = song[i]
            next_note = song[i+1]
            if current_note in note_counts:
                if next_note in note_counts[current_note]:
                    note_counts[current_note][next_note] += 1
                else:
                    note_counts[current_note][next_note] = 1
            else:
                note_counts[current_note] = {}
                note_counts[current_note][next_note] = 1

    return note_counts

right_hand_counts = compute_note_counts(all_right_hand)

Next, we will convert this dictionary of counts into a Markov transition matrix. We do this first by computing a mapping of notes to indices. We will use this mapping for the indices of the transition matrix, so we have a consistent ordering of notes (which our initial counts dictionary does not have).

In [9]:
# Compute maps between notes and indices of the transition matrix
def compute_note_idx_maps(note_counts):
    # Get a set of all possible unique notes
    all_notes = set(note_counts.keys())
    for chord, following_notes in note_counts.items():
        for note in following_notes.keys():
            all_notes.add(note)
            
    # Compute a mapping from unique notes to indices
    # Also compute the inverse mapping - useful when generating notes
    note_to_idx = {}
    idx_to_note = {}
    i = 0
    for chord in all_notes:
        note_to_idx[chord] = i
        idx_to_note[i] = chord
        i += 1
        
    return (note_to_idx, idx_to_note)

# Compute the transition matrix in a sparse matrix representation
def compute_transition_matrix(note_counts, note_to_idx):
    row_idx = []
    col_idx = []
    mat_data = []
    # Using this map, compute the transition matrix
    for chord, following_notes in note_counts.items():
        next_note_sum = sum([v for v in following_notes.values()])
        
        # Construct a row of the transition matrix
        for next_note, count in following_notes.items():
            note_prob = count / next_note_sum
            
            # Only append nonzero entries to the matrix - construct it sparsely
            if (note_prob > 0):
                row_idx.append(note_to_idx[chord]) # Row index is the current note
                col_idx.append(note_to_idx[next_note]) # Column index is the next note
                mat_data.append(note_prob) # Transition probability
                
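    # Construct the transition matrix
    # Pass the shape explicitly so the matrix is square even when the
    # highest-indexed note has no outgoing (or incoming) transitions
    num_states = len(note_to_idx)
    T = coo_matrix((mat_data, (row_idx, col_idx)), shape=(num_states, num_states))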
    
    return T

(note_to_idx, idx_to_note) = compute_note_idx_maps(right_hand_counts)
T = compute_transition_matrix(right_hand_counts, note_to_idx)

Finally, we will use the Markov transition probability matrix to generate a sequence of notes. We start by selecting a random note using the overall distribution of notes (i.e. the first note we select is based on how often notes appear in general). Then, at each step, we use the row of the transition matrix for the most recent note as the probability distribution from which to sample the next note.

In [10]:
# Generate a sequence of notes 
def generate_note_sequence(num_notes, note_counts, T, note_to_idx, idx_to_note):
    # Compute how many times each note is followed by any other note
    notes = list(note_counts.keys())
    note_count_sums = np.array([sum(following.values()) for following in note_counts.values()])

    # Compute the overall probability of any given note being chosen
    note_probability = note_count_sums / note_count_sums.sum()

    # Sample an index rather than the note itself: notes are tuples of
    # varying length, which numpy cannot pack into a 1-D array
    def sample_from_overall_distribution():
        return notes[np.random.choice(len(notes), p=note_probability)]

    # Start the sequence with a random note drawn from the overall distribution
    new_notes = [sample_from_overall_distribution()]

    # Convert once to CSR for row access, instead of densifying the matrix on every step
    T_rows = T.tocsr()
    num_states = T_rows.shape[1]

    while len(new_notes) < num_notes:
        last_note_idx = note_to_idx[new_notes[-1]]

        # Get the row of the transition matrix corresponding to the last note
        if last_note_idx < T_rows.shape[0]:
            next_note_probs = T_rows[last_note_idx, :].toarray().flatten()
        else:
            next_note_probs = np.zeros(num_states)

        if next_note_probs.sum() == 0:
            # This note only ever ended a song, so nothing follows it in the
            # training data: restart from the overall distribution
            next_note = sample_from_overall_distribution()
        else:
            # Renormalize to guard against floating-point rounding, then select the next note
            next_note_probs = next_note_probs / next_note_probs.sum()
            next_note = idx_to_note[int(np.random.choice(num_states, p=next_note_probs))]

        new_notes.append(next_note)

    return new_notes

# Generate a test sequence and save it as a MIDI file
test_sequence = generate_note_sequence(50, right_hand_counts, T, note_to_idx, idx_to_note)
sequence_to_MIDI(test_sequence, 'test_sequence.mid')

## Conclusions

After listening to the test sequence generated by our Markov chain generator, we can hear that it has some hints of Mozart, but in general, it sounds fairly haphazard and random when compared to the original sonatas. There are a few reasons for this, and they involve the limitations of Markov models for processing this kind of data.

1. **A Markov model has no notion of 'history'.**

    Music inherently has a notion of 'history'. Chord progressions are just one example - specific sequences of chords. A Markov model has no way of encoding this kind of history, since the probability of the next note depends only on the note before it.

    A possible solution to this would be to employ a similar approach as the n-gram word model, which uses multiple words (or in this case, notes) to predict the next note to be played. However, this has a tendency to overfit, and whereas it might work with words, would not work as well with music due to the next limitation.

2. **Music has structure.**

    Just as a story has a beginning, middle, and end, musical pieces have distinct sections. They also have individual themes (short melodic sequences) that are often varied upon and repeated in different ways throughout a piece. This kind of big-picture history cannot be captured by a Markov model, even if it were to use multiple notes to predict the next note.

3. **We ignored timing when parsing our training data.**

    This is a simplification we made at the start - that every note has equal duration. Obviously, this is not the case with actual music. However, the number of states is already a problem before timing is even considered. Take, for example, a piano. There are 88 unique keys. Up to 10 of these can be pressed at any given time (as a pianist has ten fingers). Therefore, to uniquely encode all possible chords that can be played on a piano, we require over 4 trillion unique states. This becomes intractable very quickly, and that's still ignoring how long each note is played for! 
    
    In addition, the MIDI file format is limited in recording high-level information about a song, such as overall structure and themes.
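
The state count in the third limitation can be checked with a short calculation. This sketch assumes, as the text does, that a chord is any set of 1 to 10 distinct keys out of 88, and it ignores whether a hand could physically reach them.

```python
import math

PIANO_KEYS = 88
MAX_FINGERS = 10

# Number of distinct chords for each chord size: "88 choose k"
chords_by_size = {k: math.comb(PIANO_KEYS, k) for k in range(1, MAX_FINGERS + 1)}
total_states = sum(chords_by_size.values())

for size, count in chords_by_size.items():
    print(f"{size:2d} keys: {count:,}")
print(f"total:   {total_states:,}")
```

The ten-key chords alone account for more than 4 trillion combinations, so they dominate the total.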
    
Overall, we have found that Markov models are limited in their ability to generate musical sequences. In addition, we have noticed that the MIDI file format is also limited in encoding important information about music. 

## Possible Further Work

Markov models can be very useful for generating themes and shorter sequences, however. Listening to the generated sequences, they do sound reasonably "Mozart-y", and with some improvements, a Markov chain-based approach could be used to generate individual themes or shorter melodic sequences of a song.

Then, a secondary algorithm could be used to assemble these themes into an actual song, taking into account knowledge of music structure and pacing - as well as coming up with variations on the themes produced by the Markov model. 
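
One possible improvement, the n-gram idea mentioned in the conclusions, is to condition on the last few notes instead of only the last one. The following is a minimal sketch under that assumption, not a tuned implementation: the function names and the toy melody are hypothetical, and generation simply stops when it reaches a context that never appeared in the training sequence.

```python
import random
from collections import Counter, defaultdict

def count_ngram_transitions(sequence, order=2):
    """Count which note follows each run of `order` consecutive notes."""
    counts = defaultdict(Counter)
    for i in range(len(sequence) - order):
        context = tuple(sequence[i:i + order])
        counts[context][sequence[i + order]] += 1
    return counts

def generate(counts, seed_context, num_notes, rng):
    """Extend seed_context by sampling each next note from its context's counts."""
    notes = list(seed_context)
    order = len(seed_context)
    while len(notes) < num_notes:
        following = counts.get(tuple(notes[-order:]))
        if not following:
            break  # this context was never seen in training, so stop early
        choices, weights = zip(*following.items())
        notes.append(rng.choices(choices, weights=weights)[0])
    return notes

# Toy melody using MIDI note numbers
melody = [60, 62, 64, 60, 62, 65, 60, 62, 64, 67]
counts = count_ngram_transitions(melody, order=2)

print(dict(counts[(60, 62)]))  # {64: 2, 65: 1}
print(generate(counts, (60, 62), 8, random.Random(0)))
```

The trade-off is the one noted earlier: longer contexts reproduce the training data more faithfully, but many contexts are seen only once, so the model tends to copy passages rather than generalize.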

## Additional Resources and References

* [MIDO - Python MIDI library](https://mido.readthedocs.io/en/latest/)
* [The MIDI File Format](https://www.csie.ntu.edu.tw/~r92092/ref/midi/)
* [Visual Markov chain explanation](http://setosa.io/ev/markov-chains/)
* [Recurrent Neural Networks for Music Generation](https://medium.com/artists-and-machine-intelligence/neural-nets-for-generating-music-f46dffac21c0)