# <center>Generating Music Using LSTM Cells</center>

## Preface

This notebook implements a modified version of the code from [this GitHub repo](https://github.com/corynguyen19/midi-lstm-gan).

The idea is to read in MIDI files and convert them to arrays of notes. Then an RNN will be trained to predict the next note. Finally, music is generated by feeding a random string of notes to the RNN and having it iteratively predict the next note to form a song one note at a time.

The idea for this style of generating music comes from [The Unreasonable Effectiveness of RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) where Andrej Karpathy demonstrates that an RNN trained on characters is capable of writing coherent sentences. This is why we believed it would be effective for generating music one note at a time.

We chose to use video game music primarily because it has a catchy but simple structure and is mostly piano music. This makes it reasonable to assume that each file has one core instrument. Other work has also suggested that piano music is among the easiest to learn because of its mathematical nature.

## Introduction

Generative neural networks have recently been applied to artistic tasks such as image adjustment, style transfer and photo retouching. For example, a generative adversarial network (GAN) trained on photographs can generate new photographs that look at least superficially authentic to human observers. In the same spirit, our goal is to train a recurrent neural network (RNN) to produce music in the style of the input music that sounds at least superficially authentic to human listeners.<br>
Our project extends the work of Cory Nguyen and colleagues in their project, Generating Pokemon-Inspired Music from Neural Networks. By treating the notes and chords within MIDI files as discrete sequential data, we were able to train an RNN to produce new MIDI files whose content sounds as if it was recorded by a human. The generated music can be output using a variety of musical instruments.

Keywords: Generative Neural Networks, Recurrent Neural Networks, MIDI files.

In [9]:
# Imports
import glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import time
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' # Suppresses TensorFlow INFO and deprecation warnings
# !pip install music21
from music21 import converter, instrument, note, chord, stream, duration
from keras import Input
from keras.models import load_model, Model, Sequential
from keras.callbacks import Callback, ModelCheckpoint, History
from keras.layers import CuDNNLSTM, LSTM, Bidirectional, Dense, Dropout, Activation
from keras.utils import np_utils

## Loading and Cleaning the Data

First, all of the MIDI files are read and parsed. We extract the offset, duration and pitch of each note and chord in every MIDI file. This assumes that each song has only one primary instrument, which seems to hold for the Pokemon dataset.

Choose the dataset by setting the *input_dir_choice* variable to an index into *input_dir_names*, or by manually setting the *input_path* variable.

**NOTE:** If you are planning to use the pretrained model, select *input_dir_choice* = 0 for the *Pokemon GSC* MIDIs.<br>
This dataset has been shown to produce good results quickly because it contains only ~60 songs, all from the same game, so the sound is fairly consistent.

In [3]:
def get_notes(path):
    """
        Gets all notes and chords from midi files
    """
    notes = []

    for file in glob.glob(path + "*.mid"):        
        song = []
        midi = converter.parse(file)
        
        print("Parsing %s" % file)
        
        notes_to_parse = None

        try: # file has instrument parts
            s2 = instrument.partitionByInstrument(midi)
            notes_to_parse = s2.parts[0].recurse() 
        except Exception: # file has notes in a flat structure
            notes_to_parse = midi.flat.notes

        for element in notes_to_parse:
            if isinstance(element, note.Note):
                song.append([str(element.pitch), element.offset, element.duration])
            elif isinstance(element, chord.Chord):
                song_note = '.'.join(str(n) for n in element.normalOrder)
                song.append([song_note, element.offset, element.duration])
        notes.append(song)

    return notes

def get_notes_with_key(path, filter_key, mode):
    """
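        Gets all notes and chords from midi files whose key matches filter_key
        (mode 0: tonic only, mode 1: mode only, otherwise tonic + mode, e.g. "Cmajor")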
    """
    notes = []

    for file in glob.glob(path + "*.mid"):        
        song = []
        midi = converter.parse(file)
        
        # Only use music of the same key
        key = midi.analyze('key')
        if(mode==0):
            key_string = str(key.tonic.name)
        elif(mode==1):
            key_string = str(key.mode)
        else:
            key_string = str(key.tonic.name + key.mode)
            
        if(key_string==filter_key):
            print("Parsing %s" % file)

            notes_to_parse = None

            try: # file has instrument parts
                s2 = instrument.partitionByInstrument(midi)
                notes_to_parse = s2.parts[0].recurse() 
            except Exception: # file has notes in a flat structure
                notes_to_parse = midi.flat.notes

            for element in notes_to_parse:
                if isinstance(element, note.Note):
                    song.append([str(element.pitch), element.offset, element.duration])
                elif isinstance(element, chord.Chord):
                    song_note = '.'.join(str(n) for n in element.normalOrder)
                    song.append([song_note, element.offset, element.duration])
            notes.append(song)

    return notes

""" Train a Neural Network to generate music """
# Get notes from midi files
input_dir_choice = 0
input_dir_names = ["Pokemon GSC", "Pokemon", "LoZ OOT", "Pokemon Route", "Undertale"]

input_path = "../MIDIs/" + input_dir_names[input_dir_choice] + " MIDIs/"
# example of each mode: 0 - C, 1 - major, 2 - Cmajor
# notes = get_notes_with_key(input_path, "Cmajor", 2)
notes = get_notes(input_path)

Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon Gold, Silver, Crystal - Cinnabar Island (HGSS Version).mid
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon Gold, Silver, Crystal - S.S. Aqua .mid
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCrystal - Azalea TownBlackthorn City.mid
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCrystal - Bicycle.mid
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCrystal - Bug Catching Contest.mid
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCrystal - Burned Tower.mid
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCrystal - Champion Battle.mid
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCrystal - Cherrygrove CityMahogany Town.mid
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCrystal - Dance Theatre.mid
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCrystal - Dark Cave.mid
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCrystal - Dragons Den.mid
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCrystal -

Each MIDI file can now be analyzed for its key. We found that training only on songs in a similar key led to good-sounding music, but the output hangs around the core note too much to seem realistic and interesting.

In [4]:
def print_key(path):
    key_count = dict()
    for file in glob.glob(path + "*.mid"):
        print("Parsing %s" % file)
        
        song = []
        midi = converter.parse(file)
        
        key = midi.analyze('key')
        key_string = key.tonic.name + key.mode
        if (key_string in key_count): 
            key_count[key_string] += 1
        else: 
            key_count[key_string] = 1
        print(key.tonic.name, key.mode)
    return key_count

key_count = print_key(input_path)
# key_count = print_key("../Undertale MIDIs/")
key_count

Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon Gold, Silver, Crystal - Cinnabar Island (HGSS Version).mid
G major
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon Gold, Silver, Crystal - S.S. Aqua .mid
G major
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCrystal - Azalea TownBlackthorn City.mid
C# major
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCrystal - Bicycle.mid
E minor
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCrystal - Bug Catching Contest.mid
E minor
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCrystal - Burned Tower.mid
E minor
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCrystal - Champion Battle.mid
G# minor
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCrystal - Cherrygrove CityMahogany Town.mid
F major
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCrystal - Dance Theatre.mid
A minor
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCrystal - Dark Cave.mid
A- major
Parsing ../MIDIs/Pokemon GSC MIDIs\Pokemon GoldSilverCryst

{'Gmajor': 5,
 'C#major': 1,
 'Eminor': 3,
 'G#minor': 3,
 'Fmajor': 3,
 'Aminor': 3,
 'A-major': 4,
 'C#minor': 3,
 'Cmajor': 8,
 'Emajor': 4,
 'Dmajor': 8,
 'B-minor': 4,
 'E-major': 3,
 'Bmajor': 2,
 'Amajor': 2,
 'Fminor': 1,
 'Cminor': 1,
 'Dminor': 1,
 'F#major': 1,
 'E-minor': 1}

For the Pokemon GSC dataset, there is a good spread of keys. Although this means the model needs more training to learn the various patterns, the dataset is small, so it can still be learned quickly.
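As a quick check on that spread, the counts printed above can be tallied by mode. This is only a small sketch: the dictionary is copied from the output above, and the helper name `split_by_mode` is ours rather than part of the notebook.

```python
# Key counts copied from the print_key output above (Pokemon GSC dataset)
key_count = {
    'Gmajor': 5, 'C#major': 1, 'Eminor': 3, 'G#minor': 3, 'Fmajor': 3,
    'Aminor': 3, 'A-major': 4, 'C#minor': 3, 'Cmajor': 8, 'Emajor': 4,
    'Dmajor': 8, 'B-minor': 4, 'E-major': 3, 'Bmajor': 2, 'Amajor': 2,
    'Fminor': 1, 'Cminor': 1, 'Dminor': 1, 'F#major': 1, 'E-minor': 1,
}

def split_by_mode(counts):
    """Hypothetical helper: total number of songs per mode."""
    totals = {'major': 0, 'minor': 0}
    for key_string, n_songs in counts.items():
        mode = 'major' if key_string.endswith('major') else 'minor'
        totals[mode] += n_songs
    return totals

total_songs = sum(key_count.values())
mode_totals = split_by_mode(key_count)
largest_key_group = max(key_count.values())
print(total_songs, mode_totals, largest_key_group)
```

The largest single-key groups (C major and D major) hold 8 songs each, so a single-key filter would keep at most 8 of the 61 songs.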

Next, we find all possible notes, offsets and durations. We also convert the offsets from absolute offsets (measured from the start of the song) into relative offsets (measured from the previous note). This is because the songs will be split up into many segments and each segment should be equally valid data.

The durations are also converted from music21's Duration objects to floats, which can be more easily converted to a machine-learnable format.
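To make the offset conversion concrete, here is a minimal sketch on made-up data. The helper name `to_relative_offsets` and the example offsets are assumptions for illustration. Note that the notebook's own cell uses the condition `idx > 1`, so it also gives the second note an offset of 0.0, whereas this sketch only zeroes the first.

```python
def to_relative_offsets(absolute_offsets):
    """Hypothetical helper: gap between each note and the one before it."""
    if not absolute_offsets:
        return []
    relative = [0.0]  # the first note has no predecessor
    for previous, current in zip(absolute_offsets, absolute_offsets[1:]):
        relative.append(round(current - previous, 3))
    return relative

# Made-up absolute offsets in quarter lengths; two notes start together at 1.0
absolute = [0.0, 0.5, 1.0, 1.0, 2.5]
relative = to_relative_offsets(absolute)
print(relative)  # [0.0, 0.5, 0.5, 0.0, 1.5]
```

A relative offset of 0.0 means the note starts at the same time as the previous one, which is how simultaneous notes survive the conversion.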

In [5]:
possibleNotes = set([item[0] for sublist in notes for item in sublist])

# Processing for offsets
possibleOffsets = []
possibleDurations = []

# For each song
for index, song in enumerate(notes):
    song_length = len(song)
    
    # For each note, calculate the difference in offset between this and the previous note
    song_offsets = []
    song_durations = []
    for idx in range(song_length):
        offset = round(song[idx][1] - song[idx - 1][1], 3) if idx > 1 else 0.0
        song_offsets.append(offset)
        if offset not in possibleOffsets:
            possibleOffsets.append(offset)
        
        duration = song[idx][2].quarterLength
        song_durations.append(duration)
        if duration not in possibleDurations:
            possibleDurations.append(duration)
            
    # Update the notes to reflect this
    for idx in range(song_length):
        notes[index][idx][1] = song_offsets[idx]
        notes[index][idx][2] = song_durations[idx]

n_notes = len(possibleNotes)
n_offset = len(possibleOffsets)
n_duration = len(possibleDurations)


possibleNotes = np.array(list(possibleNotes))
possibleOffsets = np.array(list(possibleOffsets))
possibleDurations = np.array(list(possibleDurations))
notes = np.array([list([list(subsublist) for subsublist in sublist]) for sublist in notes])
len(possibleNotes), len(possibleOffsets), len(possibleDurations)

(306, 25, 29)

Now we will prepare the sequences of notes by looking at each song individually. We first take arrays of size **sequence_length** from each song. I have selected 100 as the sequence length because it lets the model learn patterns across 20-30 seconds of music. The notes, offsets and durations are then mapped to integers and normalized to the range 0-1 so that they can be learned more easily by the model.
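The windowing and normalization steps can be sketched on a toy song. The helper `make_windows`, the six-note song and the window length of 3 are all assumptions chosen to keep the example small; only the pitch column is shown, but offsets and durations are handled the same way.

```python
def make_windows(song, sequence_length, step_size=1):
    """Hypothetical helper: (input window, next element) pairs from one song."""
    pairs = []
    for i in range(0, len(song) - sequence_length, step_size):
        pairs.append((song[i:i + sequence_length], song[i + sequence_length]))
    return pairs

song = ['C4', 'E4', 'G4', '0.4.7', 'E4', 'C4']  # toy pitch strings
pitchnames = sorted(set(song))
note_to_int = {name: number for number, name in enumerate(pitchnames)}
n_notes = len(pitchnames)

pairs = make_windows(song, sequence_length=3)
# Inputs are scaled into [0, 1); targets stay as integer class ids
inputs = [[note_to_int[n] / float(n_notes) for n in window] for window, _ in pairs]
targets = [note_to_int[target] for _, target in pairs]
print(len(pairs), inputs[0], targets)  # 3 [0.25, 0.5, 0.75] [0, 2, 1]
```

A song with `n` elements yields `n - sequence_length` windows, so songs shorter than the window length contribute no training data.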

In [6]:
def prepare_sequences(notes, possibleNotes, possibleOffsets, possibleDurations):
    """ Prepare the sequences used by the Neural Network """
    song_end_indices = []
    sequence_length = 100
    step_size = 1

    # create a dictionary to map pitches to integers
    pitchnames = sorted(possibleNotes)
    note_to_int = dict((note, number) for number, note in enumerate(pitchnames))
    
    # create a dictionary to map offset to integers
    offsetnames = sorted(possibleOffsets)
    offset_to_int = dict((offset, number) for number, offset in enumerate(offsetnames))
    
    # create a dictionary to map duration to integers
    durationnames = sorted(possibleDurations)
    duration_to_int = dict((duration, number) for number, duration in enumerate(durationnames))
    
    # find number of each possible choice for normalization
    n_notes = len(possibleNotes)
    n_offset = len(possibleOffsets)
    n_duration = len(possibleDurations)

    network_input = []
    network_output_notes = []
    network_output_offset = []
    network_output_duration = []


    # create input sequences and the corresponding outputs
    for song in notes:
        for i in range(0, len(song) - sequence_length, step_size):
            sequence_in = song[i:i + sequence_length]
            sequence_out = song[i + sequence_length]
            network_input.append([np.array([note_to_int[row[0]] / float(n_notes), offset_to_int[row[1]] / float(n_offset), duration_to_int[row[2]] / float(n_duration)]) for row in sequence_in])
            network_output_notes.append(np.array([note_to_int[sequence_out[0]]]))
            network_output_offset.append(np.array([offset_to_int[sequence_out[1]]]))
            network_output_duration.append(np.array([duration_to_int[sequence_out[2]]]))
        song_end_indices.append(len(network_input)-1)


    # reshape the input into a format compatible with LSTM layers
    n_patterns = len(network_input)
    network_input = np.reshape(network_input, (n_patterns, sequence_length, 3))

    # Make one-hot-encoding
    network_output_notes = np_utils.to_categorical(network_output_notes, num_classes=n_notes)
    network_output_offset = np_utils.to_categorical(network_output_offset, num_classes=n_offset)
    network_output_duration = np_utils.to_categorical(network_output_duration, num_classes=n_duration)


    return (network_input, network_output_notes, network_output_offset, network_output_duration, song_end_indices)

network_input, network_output_notes, network_output_offset, network_output_duration, song_end_indices = prepare_sequences(notes, possibleNotes, possibleOffsets, possibleDurations)
network_input.shape

(19486, 100, 3)
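The three target arrays returned by `prepare_sequences` are one-hot encoded by `np_utils.to_categorical`. As a rough, dependency-free illustration of what that encoding looks like, the sketch below uses a hypothetical `one_hot` helper and made-up class ids; it is not the Keras implementation.

```python
def one_hot(class_id, num_classes):
    """Hypothetical helper: a vector of zeros with a single 1.0 at class_id."""
    if not 0 <= class_id < num_classes:
        raise ValueError("class_id must be in [0, num_classes)")
    vector = [0.0] * num_classes
    vector[class_id] = 1.0
    return vector

target_ids = [0, 2, 1]  # made-up integer ids for three training targets
encoded = [one_hot(class_id, num_classes=4) for class_id in target_ids]
for row in encoded:
    print(row)
```

With the real data, each pitch target is a vector of length 306, each offset target has length 25 and each duration target has length 29, matching the vocabulary sizes printed earlier.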

## Constructing the model

I will now construct the model using CuDNNLSTM cells because they are significantly faster than regular LSTM cells due to being optimized for CUDA. The model has three CuDNNLSTM layers (the second and third wrapped in Bidirectional), followed by two dense layers and a final softmax output layer for each of pitch, offset and duration, so that each output gives a probability for every possible value.

This was built using Keras' functional Model API, which allows multiple inputs and outputs and helped us manage the complexity of the structure shown in the image below:


![RNN Model Image][model_image]

[model_image]: RNNModel.png "RNN Model"

**Hyperparameters:**
* Optimizer - Adam, because it is a widely used optimizer that adapts the learning rate per parameter and usually works well with its default settings
* Loss - categorical_crossentropy, because each output is a multi-class prediction and this loss penalizes assigning low probability to the correct class
* Epochs - More epochs are generally better as long as the model doesn't overfit. I track loss over time and have checkpoints every 10 epochs so this will not be a problem
* Batch Size - This determines how many training sequences are processed before each weight update.
    * We found that a smaller batch size tends to work better, but also takes longer to train so there is a tradeoff between batch_size and number of epochs in terms of time efficiency.
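To show how the loss is put together, the sketch below computes categorical cross-entropy for one sample on each of the three outputs and combines them with the `loss_weights` of 1.0 used in `create_network`. The probabilities are made up, and the function is a plain-Python illustration rather than the Keras implementation.

```python
import math

def categorical_crossentropy(one_hot_target, predicted_probs):
    """Cross-entropy for one sample: -sum(target * log(probability))."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_target, predicted_probs) if t > 0)

# Made-up softmax outputs for the pitch, offset and duration heads
note_loss = categorical_crossentropy([0, 1, 0], [0.1, 0.8, 0.1])
offset_loss = categorical_crossentropy([1, 0], [0.5, 0.5])
duration_loss = categorical_crossentropy([0, 0, 1], [0.25, 0.25, 0.5])

loss_weights = [1.0, 1.0, 1.0]
losses = [note_loss, offset_loss, duration_loss]
total_loss = sum(w * l for w, l in zip(loss_weights, losses))
print(round(note_loss, 4), round(offset_loss, 4), round(total_loss, 4))
```

Only the probability assigned to the correct class matters, so a confident correct prediction (0.8) costs far less than an undecided one (0.5).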

In [7]:
def create_network(network_input, n_notes, n_offset, n_duration):
    """ create the structure of the neural network """
    input = Input(shape=(network_input.shape[1], network_input.shape[2]))
    lstm_1 = CuDNNLSTM(512, input_shape=(network_input.shape[1], network_input.shape[2]), return_sequences=True)(input)
    dropout_1 = Dropout(0.3)(lstm_1)
    lstm_2 = Bidirectional(CuDNNLSTM(512, return_sequences=True))(dropout_1)
    dropout_2 = Dropout(0.3)(lstm_2)
    lstm_3 = Bidirectional(CuDNNLSTM(512))(dropout_2)
    dropout_3 = Dropout(0.3)(lstm_3)
    dense_1 = Dense(128)(dropout_3)
    dropout_4 = Dropout(0.3)(dense_1)
    dense_2 = Dense(128)(dropout_4)
    dropout_5 = Dropout(0.3)(dense_2)
    output_notes = Dense(n_notes, activation='softmax')(dropout_5)
    output_offset = Dense(n_offset, activation='softmax')(dropout_5)
    output_duration = Dense(n_duration, activation='softmax')(dropout_5)
    
    model = Model(inputs=input, outputs=[output_notes, output_offset, output_duration])
    model.compile(loss=['categorical_crossentropy', 'categorical_crossentropy', 'categorical_crossentropy'], optimizer='adam', loss_weights=[1., 1., 1.])
    return model

# Set up the model
model = create_network(network_input, n_notes, n_offset, n_duration)
history = History()

# Save every 10 epochs (because training isn't cheap!!!) so that music can be generated from each checkpoint
outputDest = '../output/LSTM_' + input_dir_names[input_dir_choice] + '_' + str(int(time.time())) + '/'
if not os.path.exists(outputDest):
    os.makedirs(outputDest)
cp_callback = ModelCheckpoint(filepath=outputDest + "LSTMmodel_weights_{epoch:02d}.hdf5",
                              save_weights_only=True,
                              verbose=1,
                              period=10)

# Set parameters
n_epochs = 200
batch_size = 80
model.summary()

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 100, 3)       0                                            
__________________________________________________________________________________________________
cu_dnnlstm_1 (CuDNNLSTM)        (None, 100, 512)     1058816     input_1[0][0]                    
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 100, 512)     0           cu_dnnlstm_1[0][0]               
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, 100, 1024)    4202496     dropout_1[0][0]                  
____________________________________________________________________________________________
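The parameter counts in the summary can be checked by hand. The sketch below assumes that a CuDNNLSTM layer stores, for each of its four gates, an input kernel, a recurrent kernel and two bias vectors; that assumption reproduces the numbers shown above. The function name is ours.

```python
def cudnn_lstm_params(input_dim, units):
    """Parameter count of one CuDNNLSTM layer, assuming 4 gates that each have
    an input kernel, a recurrent kernel and two bias vectors."""
    return 4 * (input_dim * units + units * units + 2 * units)

# First layer: 3 input features (pitch, offset, duration) and 512 units
first_layer = cudnn_lstm_params(input_dim=3, units=512)
# A Bidirectional wrapper holds a forward and a backward copy of the layer
second_layer = 2 * cudnn_lstm_params(input_dim=512, units=512)
print(first_layer, second_layer)  # 1058816 4202496
```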

## Training the Model

To train the model, there is the option of training from scratch using the data loaded earlier or loading the pretrained model that uses the Pokemon GSC data.

Training from scratch will save the weights of the model every 10 epochs as checkpoints, which allows a cell further down to generate music from each checkpoint to show the progression over time. This option will also save a plot of the loss over time once training has been completed.
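As a rough guide to what ends up on disk, the sketch below expands the checkpoint filename template for a 200-epoch run saved every 10 epochs. It assumes that Keras numbers epochs from 1 when filling in `{epoch:02d}`, so the first file is written after epoch 10.

```python
template = "LSTMmodel_weights_{epoch:02d}.hdf5"
n_epochs = 200
period = 10

# One checkpoint per `period` epochs, assuming 1-based epoch numbers in the name
checkpoint_names = [template.format(epoch=epoch)
                    for epoch in range(period, n_epochs + 1, period)]
print(len(checkpoint_names), checkpoint_names[0], checkpoint_names[-1])
```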

Loading the pretrained model will **REQUIRE** loading the Pokemon GSC dataset earlier because it uses samples from this to generate new songs.

In [10]:
# Set to true for training from scratch
train_model = False

# If not training, select which model to use (USES DATA FROM POKEMON GSC)
trained_epoches = ["20", "40", "60", "80", "100"]
chosen_epoch = trained_epoches[4]

if(train_model):
    model.fit(network_input, [network_output_notes, network_output_offset, network_output_duration], callbacks=[history, cp_callback], epochs=n_epochs, batch_size=batch_size)
    model.save(outputDest + 'LSTMmodel_final.h5')
    
    # Plot the model losses
    pd.DataFrame(history.history).plot()
    plt.savefig(outputDest + 'LSTM_Loss_per_Epoch.png', transparent=True)
    plt.close()
else:
    model.load_weights("../output/Pokemon GSC Trained/LSTMmodel_weights_" + chosen_epoch + ".hdf5")

## Generating Music

I will now use the model to generate music by feeding it a random string of notes and having it predict the next one, then the one after that, until a full song has been generated.
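The generation loop itself is simple once the model is treated as a black box. In the sketch below, `toy_model` is a made-up stand-in for the trained network and `generate` is a hypothetical helper; the point is only the sliding window, where each prediction is appended and the oldest element is dropped.

```python
def generate(predict_next, seed, n_steps):
    """Hypothetical helper: roll a fixed-length window forward n_steps times."""
    window = list(seed)
    generated = []
    for _ in range(n_steps):
        next_item = predict_next(window)
        generated.append(next_item)
        window = window[1:] + [next_item]  # drop the oldest, append the newest
    return generated

def toy_model(window):
    """Stand-in for the network: next pitch class from the last two."""
    return (window[-1] + window[-2]) % 12

generated = generate(toy_model, seed=[0, 4, 7], n_steps=5)
print(generated)  # [11, 6, 5, 11, 4]
```

Because every prediction is fed back in as input, the seed only directly influences the first `len(seed)` steps; after that the window consists entirely of generated material.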

There are 3 types of inputs to start the generation:
1. Choosing a random series of notes from anywhere in the input<br>
    This tends to give the best results, although the output sounds more like the original song. As shown in the *SimilarityTest* notebook this isn't a significant issue, but the result does sound more familiar.
2. Choosing a random series from the end of any song<br>
    This also leads to fairly good results and seems more random.
3. Choosing a random series of notes from anywhere in the input and shuffling the order<br>
    This leads to rather chaotic results because LSTMs are good at predicting from patterns, and when the pattern is random the model has difficulty predicting the next note. This tends to correct itself after 30-60 seconds, however it has the worst results overall.
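The three seeding strategies can be sketched with plain lists. The helper `choose_seed`, the toy windows and the end indices are assumptions for illustration. The sketch copies the chosen window before shuffling so that the list of windows itself is left untouched.

```python
import random

def choose_seed(windows, song_end_indices, input_type, rng):
    """Hypothetical helper mirroring the three input types described above."""
    if input_type == 1:  # any window from any song
        return list(windows[rng.randrange(len(windows))])
    if input_type == 2:  # the last window of a randomly chosen song
        end_index = song_end_indices[rng.randrange(len(song_end_indices))]
        return list(windows[end_index])
    if input_type == 3:  # any window, with its order shuffled
        seed = list(windows[rng.randrange(len(windows))])  # copy before shuffling
        rng.shuffle(seed)
        return seed
    raise ValueError("input_type must be 1, 2 or 3")

# Toy data: two songs with two windows each; indices 1 and 3 end a song
windows = [[1, 2, 3], [2, 3, 4], [7, 8, 9], [8, 9, 10]]
song_end_indices = [1, 3]
rng = random.Random(0)

seed_any = choose_seed(windows, song_end_indices, 1, rng)
seed_end = choose_seed(windows, song_end_indices, 2, rng)
seed_shuffled = choose_seed(windows, song_end_indices, 3, rng)
print(seed_any, seed_end, seed_shuffled)
```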

In [12]:
def generate_notes(model, network_input, possibleNotes, possibleOffsets, possibleDurations, song_end_indices, input_type):
    """ Generate notes from the neural network based on a sequence of notes """
    # create a dictionary to map integers back to pitches
    pitchnames = sorted(possibleNotes)
    int_to_note = dict((number, note) for number, note in enumerate(pitchnames))
    
    # create a dictionary to map integers back to offsets
    offsetnames = sorted(possibleOffsets)
    int_to_offset = dict((number, offset) for number, offset in enumerate(offsetnames))
    
    # create a dictionary to map integers back to durations
    durationnames = sorted(possibleDurations)
    int_to_duration = dict((number, duration) for number, duration in enumerate(durationnames))
    
    # find number of each possible choice for normalization
    n_notes = len(possibleNotes)
    n_offset = len(possibleOffsets)
    n_duration = len(possibleDurations)
    
    # choose the starting string (np.random.randint excludes the upper bound)
    if(input_type==1): # To choose a random series from any song
        start = np.random.randint(0, len(network_input))
        pattern = network_input[start]
    elif(input_type==2): # To choose a random series from the end of any song
        start = np.random.randint(0, len(song_end_indices))
        pattern = network_input[song_end_indices[start]]
    elif(input_type==3): # To choose a truly random sequence by selecting a sequence from any of the songs and shuffling its order
        start = np.random.randint(0, len(network_input))
        pattern = network_input[start].copy() # copy so that shuffling does not reorder the training data in place
        np.random.shuffle(pattern)
        
    sequence_length = pattern.shape[0]
    n_dim = pattern.shape[1]
    
    prediction_output = []
    
    # generate 500 notes
    for note_index in range(500):
        prediction_input = np.reshape(pattern, (1, sequence_length, n_dim))

        prediction = model.predict(prediction_input, verbose=0)
        
        note_int = np.argmax(prediction[0])
        note_normalized = note_int / float(n_notes)
        note = int_to_note[note_int]
        
        offset_int = np.argmax(prediction[1])
        offset_normalized = offset_int / float(n_offset)
        offset = int_to_offset[offset_int]
                
        duration_int = np.argmax(prediction[2])
        duration_normalized = duration_int / float(n_duration)
        duration = int_to_duration[duration_int]
        
        result = np.array([note_normalized, offset_normalized, duration_normalized])
        full_prediction = np.array([note, offset, duration])
        
        prediction_output.append(full_prediction)
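        # np.append flattens the window, so dropping the first 3 values
        # removes the oldest (note, offset, duration) row
        pattern = np.append(pattern, result)
        pattern = pattern[3:len(pattern)]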
        
    print([str(x[0]) for x in prediction_output])
    
    return prediction_output

input_type = 1
prediction_output = generate_notes(model, network_input, possibleNotes, possibleOffsets, possibleDurations, song_end_indices, input_type)

['5', '4.7.9', 'F5', '0.5', 'D5', '7.9', 'C5', '5.9', '9', '7', '4', 'F3', '5.9', '0.5', '7.9', 'A5', 'F3', 'A3', '7.0', '5.9', '7.10', '9.10', 'B-5', 'D3', 'C6', 'B-3', 'D6', 'B-3', '4.9', '2.7', '0.4', '10.2', '9.0.4', '7.10', '0.4', '10.2.5', '3.6.9', '4.7', 'F5', '5.9', '5.9', '0', '5.9', '5.9', '5.9', '0', '5.9', '5.9', '5.9', '0', '5.9', '5.9', '0.5', 'F5', '5', '4.9', 'F5', 'F3', '9', '8.0', '4.9', '0.5', '9.0.4', '11.0.5', '7.9.0', '5.9', 'F3', 'A3', 'C3', 'A3', 'C5', '5.9', '8.9', '4.9', '0.5', 'D5', 'F4', 'D3', 'A3', 'A2', '9.1', '9.2', '7.9.1', '2.5', '4.7', 'F5', 'F3', '9', '8.0', '9.0', 'A4', 'F4', 'F3', 'C3', 'C4', 'D3', 'E3', 'D4', 'F3', '10', '10.2', '10.2', 'C5', '5', '4.10', 'B-4', '2', '5.10', 'E4', 'G3', '0', '0.4', '0.4', 'D5', '7', '0.5', 'C5', '4', '7', 'F4', 'A3', '2', '2.5', '2.5', '7.10', '2.5.9', '4.5.10', '5.9.10', '7.0', '0', 'F4', 'A4', '9.2', 'G4', 'F4', '4.7', 'E4', 'G4', 'F3', '5.9', '0.5', '7.9', 'A5', 'F3', 'A3', '7.0', '6.9', '7', 'B-4', '5.10', 'G5'

The pitches of each note have been printed. This helps to show the relative variation in each song, without the overload of information for the offsets and duration.

Finally, this generated music can be put to use by creating a MIDI file from it. As in the *ChangeInstrument* notebook, the output instrument can be chosen.
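Two small steps in that conversion are worth seeing in isolation: telling chords from single notes in the pitch strings, and turning the predicted relative offsets back into absolute positions. The sketch below uses made-up predictions and hypothetical helper names, and leaves out the music21 objects entirely.

```python
from itertools import accumulate

def is_chord(note_str):
    """Chords are stored as dot-separated pitch classes, e.g. '4.7.9' or '5'."""
    return '.' in note_str or note_str.isdigit()

def absolute_offsets(relative_offsets):
    """Hypothetical helper: the first element sits at 0.0, and each later one
    is placed at the previous position plus its own relative offset."""
    if not relative_offsets:
        return []
    return list(accumulate([0.0] + list(relative_offsets[1:])))

# Made-up (pitch, relative offset, duration) predictions
predictions = [('C5', 0.5, 1.0), ('4.7.9', 0.5, 0.5), ('5', 0.0, 0.5), ('B-4', 1.5, 1.0)]

kinds = ['chord' if is_chord(pitch) else 'note' for pitch, _, _ in predictions]
positions = absolute_offsets([offset for _, offset, _ in predictions])
chord_pitch_classes = [int(p) for p in predictions[1][0].split('.')]
print(kinds, positions, chord_pitch_classes)
```

A bare digit such as `'5'` counts as a chord with a single pitch class, while strings like `'B-4'` (B flat in octave 4) are ordinary notes.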

In [13]:
from music21 import duration as D

def create_midi(prediction_output, filename, instrument_choice):
    """ convert the output from the prediction to notes and create a midi file
        from the notes """
    offset = 0
    output_notes = []
    output_notes.append(instrument_choice)

    # create note and chord objects based on the values generated by the model
    count = 0
    for pattern in prediction_output:
        note_str = pattern[0]
        offset_str = pattern[1]
        duration_str = pattern[2]
        if "#-" in note_str:# To fix a rare exception using 2 accidentals
            count += 1 # keep count aligned with prediction_output when skipping
            continue
        # pattern is a chord
        if ('.' in note_str) or note_str.isdigit():
            notes_in_chord = note_str.split('.')
            notes = []
            for current_note in notes_in_chord:
                new_note = note.Note(int(current_note))
                notes.append(new_note)
            new_chord = chord.Chord(notes)
            new_chord.offset = offset
            new_chord.duration = D.Duration(float(duration_str))
            output_notes.append(new_chord)
        # pattern is a note
        else:
            new_note = note.Note(note_str)
            new_note.offset = offset
            new_note.duration = D.Duration(float(duration_str))
            output_notes.append(new_note)
        # increase offset each iteration so that notes do not stack
        offset += (float(prediction_output[count + 1][1])) if (count + 1 < len(prediction_output)) else 0
        count += 1

    midi_stream = stream.Stream(output_notes)
    midi_stream.write('midi', fp='{}.mid'.format(filename))
    
# Select instrument
instruments = {
    'piano': instrument.Piano(),
    'flute': instrument.Flute(),
    'clarinet': instrument.Clarinet(),
    'ocarina': instrument.Ocarina(),
    'harmonica': instrument.Harmonica(),
    'steel_drum': instrument.SteelDrum(),
    'vocals': instrument.Vocalist(),
    'soprano': instrument.Soprano(),
    'guitar': instrument.Guitar(),
    'elec_guitar': instrument.ElectricGuitar(),
    'violin': instrument.Violin(),
    'saxophone': instrument.Saxophone(),
    'trombone': instrument.Trombone(),
    'trumpet': instrument.Trumpet(),
    'english_horn': instrument.EnglishHorn(),
}
instrument_string = 'piano'
instrument_choice = instruments[instrument_string]

create_midi(prediction_output, outputDest + 'LSTM_output_' + instrument_string + '_X', instrument_choice)

The following loop creates an album of 10 songs by iteratively generating and labelling songs. This shows the versatility of the model, which generates many different-sounding songs when given different inputs.

In [15]:
album_length = 10
input_type = 1
for count_output in range(album_length):
    prediction_output = generate_notes(model, network_input, possibleNotes, possibleOffsets, possibleDurations, song_end_indices, input_type)
    create_midi(prediction_output, outputDest + 'LSTM_output_' + instrument_string + '_'+ str(count_output), instrument_choice)
    print(f"Created at {outputDest + 'LSTM_output_final' + str(count_output)}")

['3.7', '7.0', 'C5', '5.9', '0.2.7', 'E-5', '5.9', '0.5', '3.5.8', '0.5', 'F5', 'E-5', '5.8', '0.5', 'F5', '7.10', 'G5', '7.10', '8.0', '7.10', '6.9', '5.8', '7.0', '2.3.7.10', '7.0', '3.7.10', '7.0', '3.7', '7.0', 'C5', '5.9', '0.2.7', 'E-5', '5.9', '8.0', '5.8', '8.0', '0.3', '10.2', '0.3', '2.5', '7.0', 'C2', '3.7', 'C2', '7.0', 'C2', '3.7', '7.0', 'C2', '0.3.7', '7.0', 'C2', '5.9', '7.0', 'C2', '5.9', 'C2', 'C2', '7.0', 'C2', '3.7', 'C2', '7.0', '0.3.7', '7.0', 'C2', '7.10.0', 'C2', 'G3', '5.9.0', 'E-4', '3.5.7', '5.9', 'C2', 'C5', 'F5', '7.0', 'C2', '3.7', 'C2', '7.0', 'C2', '3.7', '7.0', 'C2', '0.3.7', '7.0', 'C2', '5.9', '7.0', 'C2', '5.9', 'C2', 'C2', '7.0', 'C2', '3.7', 'C2', '7.0', '0.3.7', '7.0', 'C2', '7.10.0', 'C2', 'G3', '5.9.0', 'E-4', '3.5.7', '5.9', 'C2', 'C5', 'F5', '7.0', '2.3.7.10', '7.0', '3.7.10', '7.0', '3.7', '7.0', 'C5', '5.9', '0.2.7', 'E-5', '5.9', '0.5', '3.5.8', '0.5', 'F5', 'E-5', '5.8', '0.5', 'F5', '7.10', 'G5', '7.10', '8.0', '7.10', '6.9', '5.8', '7.0'

Created at ../output/LSTM_Pokemon GSC_1571922225/LSTM_output_final2
['A2', '2.4', 'C5', '11.2.4', 'A4', '2.4', '1.3.5', 'E4', 'A2', '4.9', 'D4', '4.9', 'F4', '4.9', '10.2.5', 'E4', 'A2', '4.9', '4.9', '4.9', '10.3', 'A2', '4.9', '4.9', '4.9', '3.5.10', 'A2', '2.4', '2.4', '2.4', '10.1.3', 'A2', '4.9', '4.9', '4.9', '5.8.10', 'A4', 'A2', 'G4', 'A4', '4.9', 'C5', 'E5', '4.9', '4.9', '3.7.10', 'F#4', '6.11', 'E4', '6.11', 'G4', '6.11', '5.7.11', 'C#5', 'B2', '6.11', 'D5', '11.1.6', 'B4', '6.11', '3.5.7', 'F#4', 'B2', '6.11', 'E4', '6.11', 'G4', '6.11', '0.4.7', 'F#4', 'B2', '6.11', '6.11', '6.11', '0.5', 'A2', '4.9', '4.9', '4.9', '3.5.10', 'A2', '2.4', '2.4', '2.4', '10.1.3', 'A2', '4.9', '4.9', '4.9', '4.9', '4.9', '4.9', 'A2', '4.9', 'D4', '4.9', 'F4', '2.4', '1.3.5', 'E4', 'A2', '4.9', '4.9', '4.9', '10.3', 'A2', '4.9', '4.9', '4.9', '3.5.10', 'A2', '2.4', '2.4', '2.4', '10.1.3', 'A2', '4.9', '4.9', '4.9', '5.8.10', 'A4', 'A2', 'G4', 'A4', '4.9', 'C5', 'E5', '4.9', '4.9', '3.7.10', 'F

Created at ../output/LSTM_Pokemon GSC_1571922225/LSTM_output_final5
['C3', 'C6', 'B3', 'C4', 'G#5', 'G#3', 'A5', 'B-5', 'B-2', 'B-3', 'B-2', 'C#6', 'B-2', 'B-5', 'A3', 'B-3', 'G#3', 'C6', 'G#2', 'G#3', 'G#2', 'E-6', '8', 'B-5', 'B5', 'C3', 'E-3', 'C6', 'G3', 'G#3', 'E-3', 'C#6', 'C#3', 'C#4', 'C#3', 'F6', 'C#3', 'C#6', 'C4', 'C#4', 'G#3', 'C6', 'C3', 'C4', 'C3', 'E-6', 'C3', 'C6', 'B3', 'C4', 'G#5', 'G#3', 'A5', 'B-5', 'B-2', 'B-3', 'B-2', 'C#6', 'B-2', 'B-5', 'A3', 'B-3', 'G#3', 'C6', 'G#2', 'G#3', 'G#2', 'E-6', '5.8', '3.8', '1.6', '0.3', '8', '1.5', 'C#5', 'G#4', 'C#5', '8.1', '0.3', 'C5', 'G#4', 'C5', '8.0', '10.1', 'B-2', '0.5', '10.1', '10.0.5', 'G#4', 'E-4', 'G#2', 'C4', 'C#4', 'G#4', '3.8', 'A4', '6.10', 'F#4', 'C#4', 'C#5', '1.6', '6.10', 'E-3', '10.1', '6.8', '3.6.10', '1.5', 'G#3', '3.6', '8', '1.5.8', 'E-5', 'C5', 'G#2', 'G#2', 'G#4', '8.0', '1.5', 'C#5', 'G#4', 'C#5', '8.1', '0.3', 'C5', 'G#4', 'C5', '8.0', '10.1', 'B-2', '0.5', '10.1', '10.0.5', 'G#4', 'E-4', 'G#2', 'C4',

['10.0', 'C5', 'G#3', '1.4', '4.10', 'E3', 'C#5', '8.1', '4', 'F4', 'C4', 'B-4', 'G#3', '1.4', '1.4', 'E3', '1.5.8', '4', '0.5', '10.0', 'C5', 'G#3', '1.4', 'E4', 'E3', 'C#5', '8.1', '4', 'C5', 'F4', 'C4', 'C#5', 'A3', '2.5', '5.11', 'F3', '9.2', '4.5', '1.6', '11.1', 'C#5', 'A3', '2.5', '5.11', 'F3', 'D5', '9.2', '5', 'F#4', 'C#4', 'B4', 'A3', '2.5', '2.5', 'F3', '2.6.9', '5', '1.6', '11.1', 'C#5', 'A3', '2.5', 'F4', 'F3', 'D5', '9.2', '5', 'C#5', 'F#4', 'C#4', '0.5', 'C4', 'B3', '1.6', 'B-3', 'A3', '2.7', 'C#4', 'C4', '1.6', 'B3', 'B-3', 'E5', 'G4', 'A3', 'G3', 'F#4', 'B-3', 'C4', 'A3', 'G3', 'E-5', 'F4', 'A3', 'G3', 'E4', 'B-3', 'C4', 'A3', 'G3', 'B-5', 'G4', 'A3', 'G3', 'F#4', 'B-3', 'C4', 'A3', 'G3', 'A5', 'C5', 'A3', 'G3', '9.10', 'G#3', 'C#4', 'E4', 'G#3', 'C#4', 'E4', '7.8', '0.1', '3.4', '7.8', '0.1', '3.4', 'C5', 'G#3', 'C#4', '4.10', 'C#5', 'E-5', 'C5', 'B-4', 'C5', 'G#3', 'C#4', '4.10', 'C#5', 'B-4', 'G#3', 'C#4', '1.4', 'F5', 'E5', 'C5', 'B-4', 'C5', 'G#3', 'C#4', 'E4', 'C

Alternatively, this script will read through all of the saved weights for the model in the directory and will create a song for each of them. This helps to show the progress of the model as it learns.
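One practical detail: `glob.glob` does not guarantee any particular order, and a plain alphabetical sort would put epoch 100 before epoch 20. If the songs should follow training order, the paths can be sorted by the epoch number in the filename. The sketch below is one way to do that; the `epoch_of` helper and the example paths are hypothetical.

```python
import re

def epoch_of(path):
    """Hypothetical helper: epoch number parsed from a checkpoint filename."""
    match = re.search(r"_weights_(\d+)\.hdf5$", path)
    return int(match.group(1)) if match else -1

# Made-up paths in the arbitrary order a directory listing might return
paths = [
    "out/LSTMmodel_weights_100.hdf5",
    "out/LSTMmodel_weights_20.hdf5",
    "out/LSTMmodel_weights_60.hdf5",
]
ordered = sorted(paths, key=epoch_of)
print([epoch_of(p) for p in ordered])  # [20, 60, 100]
```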

In [19]:
# Have each model make a song
count = 0
filepaths = glob.glob(outputDest + "*.hdf5")
for model_path in filepaths:
    print("Composing from %s" % model_path)
    model.load_weights(model_path)
    prediction_output = generate_notes(model, network_input, possibleNotes, possibleOffsets, possibleDurations, song_end_indices, input_type)
    create_midi(prediction_output, outputDest + 'LSTM_output_' + str(count), instrument_choice)
    print(outputDest + 'LSTM_output_' + str(count))
    count += 1

Composing from ../output/LSTM_Pokemon GSC_1571922225\LSTMmodel_weights_20.hdf5
['7', '9', '7', '9', '9', '9', '3.8', '3.8', 'C#5', 'G2', 'E-5', 'C#5', 'A5', 'G2', 'A5', 'A5', 'G2', 'F#5', 'F#5', 'G2', 'E-5', 'E-5', '8.11.1.4', 'E-5', 'B5', 'G2', 'E-5', 'B5', '8.11.1.4', 'B5', '8.11.1.4', 'C#5', 'B5', 'B2', '4.8.11', '9.1.4', '4.8.11', '8.11.1.4', 'E-5', 'A5', 'G2', 'G2', 'B5', 'G2', 'G2', 'G2', 'F#5', 'G2', 'G2', '9.11', 'G2', 'G2', 'F#5', 'A2', 'F#5', 'E5', 'G2', 'E5', 'G2', 'E5', 'G2', 'F#5', 'A5', 'G2', 'F#5', 'G2', 'F#5', 'F3', 'E5', 'C#5', 'F3', 'F3', '7.0', 'G2', 'A5', '9.1.4', 'A5', 'A2', 'E5', 'C#3', 'G#5', 'G2', 'E5', 'B5', 'D3', 'G2', '7.0', 'G2', 'F#5', 'G2', 'G2', '7.11', 'G3', 'B2', '9.1', 'G3', 'B3', 'E3', '11.4', 'C#4', '1.7', 'C#6', 'C#6', 'F3', 'C#5', 'F3', 'C#5', 'F3', 'G2', 'F3', 'G2', 'B5', 'G2', 'G2', 'C6', 'F#2', 'G2', 'F#2', 'G2', 'C6', 'G2', 'E-5', 'C#5', 'C6', 'G2', 'G2', 'G#5', 'E-5', 'F#2', 'C6', 'F#2', 'G2', 'G#5', 'G2', 'E-5', 'C#5', 'D6', 'G#5', 'G#5', 'G#

## Conclusion

The result of this notebook is a MIDI file that uses an instrument of the user's choice to play a unique track reflecting the style of the set of tracks provided. Extensions to the parameters read and analysed by the model have allowed better output to be generated compared to previous work.
To improve further on our work, a system that produces a continuous output stream would be immensely advantageous for uses such as extended, non-looping audio sequences.

Constructing this upgraded model taught us a lot about the importance of feature selection and data selection in the training of a neural network and we all thoroughly enjoyed the experience.