
# TKBot

Machine learning music generation with LSTMs. Based heavily on [this article](https://towardsdatascience.com/how-to-generate-music-using-a-lstm-neural-network-in-keras-68786834d4c5) published on Towards Data Science. The main improvement is the addition of rhythmic features (note durations) to the music generation.

Trained on guitar tracks by Ling Tosite Sigure.

The MIDI tracks are located in a Google Drive folder, so Drive must be mounted first.

In [1]:
from google.colab import drive
# This will prompt for authorization.
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
from music21 import converter, instrument, note, chord, stream
import numpy
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, Activation
from keras.utils import np_utils
from keras.callbacks import ModelCheckpoint, Callback
import glob
from fractions import Fraction 



# Creating Network Inputs/Outputs from Dataset

We begin by creating the network input. We use the library [music21](https://web.mit.edu/music21/doc/index.html) to parse the MIDI files and isolate the notes and chords.

To improve the accuracy and performance of the network, we only consider the following duration classes, and round every other duration to the nearest available one:

* 0: x (dead note)

* 1: 16th

* 2: 8th

* 3: quarter

* 4: half

The inputs can be either chords or notes. Notes are represented in the form "pitch dur"; for example, "B-4 2" is a B-flat in octave 4 with the duration of an 8th. Chords use their [normal order](https://web.mit.edu/music21/doc/moduleReference/moduleChord.html#music21.chord.Chord.normalOrder), with the pitch classes separated by a '.'. For example, the normal order of a C major chord is [0, 4, 7], so we represent it as "0.4.7 dur". Finally, dead notes are represented by an "R".
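As a minimal sketch of this encoding, the snippet below reproduces the duration thresholds used in the parsing cell without depending on music21. The helper names `duration_class` and `encode` are hypothetical, and the pitch or chord symbol is assumed to arrive as a plain string.

```python
from fractions import Fraction


def duration_class(quarter_length):
    """Map a duration in quarter notes to one of the five duration classes."""
    if quarter_length == 0:
        return 0  # dead note
    if quarter_length < 0.5:
        return 1  # 16th
    if quarter_length < 1:
        return 2  # 8th (also catches triplet values such as 2/3)
    if quarter_length == 1:
        return 3  # quarter
    return 4      # half (everything longer than a quarter)


def encode(symbol, quarter_length):
    """Build the token for a note or chord symbol; dead notes become "R"."""
    cls = duration_class(quarter_length)
    return "R" if cls == 0 else "{} {}".format(symbol, cls)


print(encode("B-4", 0.5))               # B-4 2
print(encode("0.4.7", 1.0))             # 0.4.7 3
print(encode("E2", 0.0))                # R
print(duration_class(Fraction(2, 3)))   # 2
```

Note that the thresholds are not a strict nearest-value rounding: any duration longer than a quarter note, such as a dotted quarter, lands in the half-note class.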



In [None]:
path = "/content/drive/MyDrive/lts_dataset/*.*"
notes = []
for filepath in glob.iglob(path):
    print(filepath)
    midi = converter.parse(filepath)
    notes_to_parse = midi.flat.notes
    for element in notes_to_parse:
        res = ""
        
        if isinstance(element, note.Note):
            res += str(element.pitch)
        elif isinstance(element, chord.Chord):
            res += '.'.join(str(n) for n in element.normalOrder)
        # Quantize the duration into one of the five classes.
        # Triplet 8ths (quarterLength == Fraction(2, 3)) fall into the "< 1"
        # branch, so a separate check would append " 2" twice.
        quarter_length = element.duration.quarterLength
        if quarter_length == 0.0:
            res = "R"
        elif quarter_length < 0.5:
            res += " 1"
        elif quarter_length < 1:
            res += " 2"
        elif quarter_length == 1:
            res += " 3"
        else:
            res += " 4"
        notes.append(res)



The inputs are sequences of 100 notes, each followed by a single output note. We use a one-hot representation for the output.
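To make the shapes concrete, here is a small sketch of the same windowing on a made-up token list, using a window of 3 instead of 100. The tokens and variable names are illustrative assumptions, and `numpy.eye` stands in for Keras' `to_categorical`.

```python
import numpy as np

# Toy token stream (illustrative values, not taken from the dataset)
tokens = ["E4 2", "G4 2", "R", "E4 2", "0.4.7 3", "G4 2"]
vocab = sorted(set(tokens))
token_to_id = {tok: i for i, tok in enumerate(vocab)}

window = 3
inputs, targets = [], []
for i in range(len(tokens) - window):
    # `window` consecutive tokens form the input, the next token is the target
    inputs.append([token_to_id[t] for t in tokens[i:i + window]])
    targets.append(token_to_id[tokens[i + window]])

# Shape (samples, timesteps, features), scaled into [0, 1)
X = np.array(inputs, dtype=float).reshape(len(inputs), window, 1) / len(vocab)
# One row per sample with a single 1 at the target class
Y = np.eye(len(vocab))[targets]

print(X.shape, Y.shape)  # (3, 3, 1) (3, 4)
```

Each row of `Y` sums to 1, and its width equals the vocabulary size, which is what the final softmax layer of the network must match.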

In [None]:
# getting the unique note-duration values, and creating a map
pitches = sorted(set(notes))
n_classes = len(pitches)

noteMap = dict((pitches[i], i) for i in range(len(pitches)))

network_input = []
network_output = []


# generating sequences of 100 notes and the corresponding output
sequence_length = 100

for i in range(0, len(notes) - sequence_length, 1):
    sequence_in = notes[i:i + sequence_length]
    network_input.append([noteMap[x] for x in sequence_in])

    sequence_out = notes[i + sequence_length]
    network_output.append(noteMap[sequence_out])

# reshaping for input and normalizing
n_in = len(network_input)
network_input = numpy.reshape(network_input, (n_in, sequence_length, 1))
network_input = network_input / float(n_classes)

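# pass num_classes explicitly so the one-hot width always matches the softmax layer
network_output = np_utils.to_categorical(network_output, num_classes=n_classes)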
print("Input size : "+str(len(network_input)))
print("Unique classes : " + str(n_classes))

# Model

The model we use contains three LSTM layers, followed by two fully connected layers, with the layer sizes shown below.

In [5]:
model = Sequential()
model.add(LSTM(
    256,
    input_shape=(network_input.shape[1], network_input.shape[2]),
    return_sequences=True
))
model.add(Dropout(0.3))
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(256))
model.add(Dense(256))
model.add(Dropout(0.3))
model.add(Dense(n_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

In [None]:
# Optionally resume from previously saved weights (set to "" to train from scratch)
weights = "/content/drive/MyDrive/weights/57-0.2068.h5"
if weights:
    model.load_weights(weights)

We define a callback that saves the state of the model every 30 epochs. We train the model for 200 epochs.

In [16]:
class CustomSaver(Callback):
    def on_epoch_end(self, epoch, logs=None):
        # epoch is zero-based, so this saves after epochs 0, 30, 60, ...
        if epoch % 30 == 0:
            self.model.save("/content/drive/My Drive/weights/model_{}_v4.hd5".format(epoch))
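As a quick check of which checkpoints the condition `epoch % 30 == 0` produces over a 200-epoch run, the following sketch lists the epoch indices that pass it. It relies on Keras passing a zero-based epoch index to `on_epoch_end`; the variable names are illustrative.

```python
epochs = 200
save_every = 30

# Epoch indices (zero-based) at which the callback condition is true
saved = [epoch for epoch in range(epochs) if epoch % save_every == 0]

print(saved)                # [0, 30, 60, 90, 120, 150, 180]
print(epochs - 1 in saved)  # False
```

The last checkpoint is written at index 180, so the state after the final epoch (index 199) is only kept if the model is saved separately once training ends.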

In [None]:
# train the model, saving checkpoints through the callback
saver = CustomSaver()

model.fit(network_input, network_output, callbacks = [saver], epochs=200, batch_size=64)

# Generating Music

Finally, we get to make some music! We generate a sequence of 500 notes from the network's output. Starting from a randomly chosen input sequence, we generate a note based on the model's prediction. To get the next input, we drop the first note and append the generated note to the end of the current input.
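The sliding-window loop can be tried without a trained network. The sketch below uses a hypothetical stand-in predictor, `toy_predict`, which always votes for the class after the last one in the window; the function names and the tiny vocabulary of 4 classes are assumptions made for illustration only.

```python
import numpy as np


def generate(seed, predict_fn, n_classes, n_steps):
    """Greedy generation: predict, record, then slide the window by one."""
    pattern = list(seed)  # copy so the seed is left untouched
    generated = []
    for _ in range(n_steps):
        x = np.reshape(pattern, (1, len(pattern), 1)) / float(n_classes)
        probabilities = predict_fn(x)
        index = int(np.argmax(probabilities))
        generated.append(index)
        # drop the oldest entry and append the newly generated one
        pattern = pattern[1:] + [index]
    return generated


def toy_predict(x, n_classes=4):
    """Stand-in for model.predict: favors (last class + 1) modulo n_classes."""
    last = int(round(x[0, -1, 0] * n_classes))
    probabilities = np.zeros((1, n_classes))
    probabilities[0, (last + 1) % n_classes] = 1.0
    return probabilities


print(generate([0, 1, 2], toy_predict, n_classes=4, n_steps=5))  # [3, 0, 1, 2, 3]
```

Because the next note is always the arg-max of the prediction, generation is deterministic for a given seed; the only randomness comes from the choice of the starting sequence.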

In [46]:
def generate_sequence(net_input):
    # pick a random seed sequence from the given inputs
    start = numpy.random.randint(0, len(net_input) - 1)
    print(start)
    revNoteMap = dict((i, pitches[i]) for i in range(len(pitches)))
    # copy the seed so that net_input is not modified
    pattern = list(net_input[start])
    prediction_output = []
    # generate 500 notes
    for note_index in range(500):
        prediction_input = numpy.reshape(pattern, (1, len(pattern), 1))
        prediction_input = prediction_input / float(n_classes)
        prediction = model.predict(prediction_input, verbose=0)
        index = numpy.argmax(prediction)
        result = revNoteMap[index]
        prediction_output.append(result)
        # slide the window: append the new note, drop the oldest one
        pattern.append(index)
        pattern = pattern[1:]
    return prediction_output

In [40]:
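# Recreate the integer-encoded sequences as plain lists: network_input was
# turned into a normalized array above, while generate_sequence normalizes
# its own input and appends to the pattern.

network_input = []
network_output = []

# 50-note seeds; note that the model above was defined for 100-step inputs
sequence_length = 50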
for i in range(0, len(notes) - sequence_length, 1):
    sequence_in = notes[i:i + sequence_length]
    network_input.append([noteMap[x] for x in sequence_in])

    sequence_out = notes[i + sequence_length]
    network_output.append(noteMap[sequence_out])

prediction_output = generate_sequence(network_input)

Finally, we create a MIDI file from the generated output. For each prediction, we separate the note value from the duration. We then create the appropriate object (Note/Chord) with the correct duration. Finally, we increase the offset, so that the next note is in its right place.
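The offset bookkeeping can be sketched on its own, without music21. The snippet below only computes where each event would start; the `layout` helper and the sample tokens are hypothetical, while the duration values and the 0.25 step for dead notes follow the cell below.

```python
# Duration class -> length in quarter notes
DURATIONS = {"1": 0.25, "2": 0.5, "3": 1.0, "4": 2.0}


def layout(tokens):
    """Return (kind, value, duration, offset) for every generated token."""
    events = []
    offset = 0.0
    for token in tokens:
        if token == "R":
            # a dead note has no length but still advances time by a 16th
            events.append(("dead", None, 0.0, offset))
            offset += 0.25
            continue
        value, duration_class = token.split(" ")
        duration = DURATIONS[duration_class]
        # chords are dot-separated pitch classes (or a single pitch class)
        kind = "chord" if ("." in value or value.isdigit()) else "note"
        events.append((kind, value, duration, offset))
        offset += duration
    return events


for event in layout(["E4 2", "0.4.7 3", "R", "B-4 1"]):
    print(event)
# ('note', 'E4', 0.5, 0.0)
# ('chord', '0.4.7', 1.0, 0.5)
# ('dead', None, 0.0, 1.5)
# ('note', 'B-4', 0.25, 1.75)
```

Each event starts where the previous one ended, which keeps the output strictly monophonic in time apart from the notes stacked inside a chord.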

In [54]:
offset = 0
output_notes = []
duration_dict = {'0' : 0.0, '1':0.25, '2':0.5, '3':1, '4':2}
# create note and chord objects based on the values generated by the model
for pattern in prediction_output:

    if (pattern == "R"):
        new_note = note.Note('A', quarterLength=0.0)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        output_notes.append(new_note)
        offset += 0.25
        continue

    sp = pattern.split(" ")
    val = sp[0]
    dur = duration_dict[sp[1]]
    # pattern is a chord
    if ('.' in val) or val.isdigit():

        notes_in_chord = val.split('.')
        # use a separate name so the parsed dataset in `notes` is not overwritten
        chord_notes = []
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            chord_notes.append(new_note)
        new_chord = chord.Chord(chord_notes, quarterLength=dur)
        new_chord.offset = offset
        output_notes.append(new_chord)
    else:
        new_note = note.Note(val, quarterLength=dur)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        output_notes.append(new_note)
    # increase offset each iteration so that notes do not stack
    offset += dur

midi_stream = stream.Stream(output_notes)
midi_stream.write('midi', fp='output.mid')