# RATM Music Generator

The purpose of this project is to develop a recurrent neural network (RNN) model that generates music. Specifically, the model is trained on songs by the popular 1990s band Rage Against the Machine (RATM).

Why RATM? Partly because it's a bit different from the classical music that's typical of similar projects, and partly because we don't have an ear for classical music.

This project is based in large part on a very similar project by Sigurður Skúli, in which he designs a model that learns to create piano-based music. Much of the code that we use is based on his work. His project can be found here: https://towardsdatascience.com/how-to-generate-music-using-a-lstm-neural-network-in-keras-68786834d4c5

A couple of other projects also borrow Skúli's code and put their own twist on the model. Those can be found here: https://becominghuman.ai/generating-music-using-lstm-neural-network-545f3ac57552
and here: https://medium.com/@leesurkis/how-to-generate-techno-music-using-deep-learning-17c06910e1b3

One significant change that we're going to be making to our model compared to the aforementioned models is that we'll be using GRU layers rather than LSTM layers. A GRU layer is a simplified version of an LSTM layer but often gives similar results. Our hope is that this will reduce the training time required since some of the other projects mentioned took upwards of 18 hours of training time. 
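
To give a rough sense of why a GRU layer is lighter than an LSTM layer, the sketch below counts trainable weights using the standard gate formulas. The helper names are ours, for illustration only, and the `reset_after` flag mirrors what we understand to be the Keras GRU default (separate input and recurrent biases). Treat the result as an estimate of model size, not a measurement of training time.

```python
def lstm_params(input_dim, units):
    """Weights in one LSTM layer: 4 gates, each with input, recurrent, and bias weights."""
    return 4 * (input_dim * units + units * units + units)


def gru_params(input_dim, units, reset_after=True):
    """Weights in one GRU layer: 3 gates.

    reset_after=True assumes two bias vectors per gate
    (the convention we believe Keras uses by default).
    """
    bias = 2 * units if reset_after else units
    return 3 * (input_dim * units + units * units + bias)


# First recurrent layer of the model in this notebook: 1 input feature, 512 units.
lstm = lstm_params(1, 512)
gru = gru_params(1, 512)
print(f"LSTM: {lstm:,} parameters")
print(f"GRU:  {gru:,} parameters ({gru / lstm:.0%} of the LSTM)")
```

Under these assumptions the GRU layer carries roughly three quarters of the weights of an equally sized LSTM layer, which is one reason to expect somewhat faster training.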

We'll start by importing several libraries and layers.

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

In [None]:
import glob
import pickle
import numpy
from music21 import converter, instrument, midi, note, stream, chord
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import GRU
from keras.layers import Activation
from keras.layers import BatchNormalization as BatchNorm
from keras.utils import np_utils
from keras.callbacks import ModelCheckpoint

In [None]:
# Random seeds from both numpy and tensorflow
from numpy.random import seed
seed(99)
tf.random.set_seed(99)  

Our music files are in MIDI (Musical Instrument Digital Interface) format, which allows for small files that contain information about notes, tempo, instruments, duration, and pitch, but not the sound itself. These files can then be run through software such as Logic, GarageBand, or Audacity to generate music.
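
As a small illustration of the pitch information a MIDI file carries: each note is stored as an integer from 0 to 127. The sketch below converts such a number to a pitch name. The function name is hypothetical, and it assumes the common convention in which note 60 is middle C (C4); some software labels that note C3 instead.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_number_to_name(number):
    """Convert a MIDI note number (0-127) to a pitch name such as 'C4'."""
    if not 0 <= number <= 127:
        raise ValueError("MIDI note numbers range from 0 to 127")
    octave = number // 12 - 1  # assumes note 60 is labeled C4
    return f"{NOTE_NAMES[number % 12]}{octave}"


for number in (28, 40, 60, 69):
    print(number, "->", midi_number_to_name(number))
```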

The original MIDI files are located at this site: https://freemidi.org/artist-826-rage-against-the-machine-P-0

Since we're running this on Colab, we've conveniently stored the files on Google Drive, and we'll access them from there.

In [None]:
# Mount Drive to Colab
from google.colab import drive
drive.mount('/content/drive')

For the following two functions, more information can be found in Skúli's post. Essentially, they allow us to extract notes from our MIDI files. To do this, we're going to use a Python toolkit called Music21. More details, including documentation, can be found here: https://web.mit.edu/music21/doc/about/what.html

In [None]:
def train_network():
    """ Train a Neural Network to generate music """
    notes = get_notes()

    # get amount of pitch names
    n_vocab = len(set(notes))

    network_input, network_output = prepare_sequences(notes, n_vocab)

    model = create_network(network_input, n_vocab)

    train(model, network_input, network_output)

In [None]:
def get_notes():
    """ Get all the notes and chords from the MIDI files in the Drive folder """
    notes = []

    for file in glob.glob("/content/drive/MyDrive/ratm midi/*.mid"):
        midi = converter.parse(file)

        print("Parsing %s" % file)

        notes_to_parse = None

        try: # file has instrument parts
            s2 = instrument.partitionByInstrument(midi)
            notes_to_parse = s2.parts[0].recurse()
        except Exception: # file has notes in a flat structure
            notes_to_parse = midi.flat.notes

        for element in notes_to_parse:
            if isinstance(element, note.Note):
                notes.append(str(element.pitch))
            elif isinstance(element, chord.Chord):
                notes.append('.'.join(str(n) for n in element.normalOrder))

    with open('/content/drive/MyDrive/ratm midi/data/notes', 'wb') as filepath:
        pickle.dump(notes, filepath)

    return notes
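
The note list built above mixes two kinds of tokens: single notes stored as pitch names such as `E4`, and chords stored as dot-joined integers such as `4.7.11` (taken from `normalOrder`). The sketch below, with a hypothetical helper name and made-up tokens, shows how the two can be told apart. It uses the same test that the MIDI-writing code later in this notebook relies on.

```python
def describe_token(token):
    """Classify a token from the notes list as a chord or a single note."""
    # Chords are dot-joined integers; a one-note "chord" is a bare integer.
    if "." in token or token.isdigit():
        return ("chord", [int(p) for p in token.split(".")])
    # Anything else is a pitch name such as "E4" or "F#2".
    return ("note", token)


# Made-up tokens, for illustration only
for token in ["E4", "4.7.11", "7", "F#2"]:
    print(token, "->", describe_token(token))
```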

Listening to the songs, we notice that there are more instruments present in every song than in the piano music that Skúli used. That's going to make our task potentially more difficult because we'll need to extract the notes of multiple instruments. 

We'll need to take a look at which instruments are present in each song. We can use some code referenced here to do so: https://www.kaggle.com/wfaria/midi-music-data-extraction-using-music21

In [None]:
# List of instruments per song
for file in glob.glob("/content/drive/MyDrive/ratm midi/*.mid"):
    midi = converter.parse(file)
    s2 = instrument.partitionByInstrument(midi) 
    print("List of instruments found on:" + str(file)) 
    partStream = s2.parts.stream()
    for p in partStream:
        print(p.partName)
    

List of instruments found on:/content/drive/MyDrive/ratm midi/AshesInTheFall.mid
None
Electric Bass
Piano
List of instruments found on:/content/drive/MyDrive/ratm midi/Bombtrack.mid
None
Electric Bass
Piano
List of instruments found on:/content/drive/MyDrive/ratm midi/BornOfABrokenMan.mid
None
Electric Bass
Piano
List of instruments found on:/content/drive/MyDrive/ratm midi/BornAsGhosts.mid
None
Acoustic Bass
Piano
List of instruments found on:/content/drive/MyDrive/ratm midi/BulletInTheHead.mid
None
Piano
List of instruments found on:/content/drive/MyDrive/ratm midi/BullsOnParade.mid
None
List of instruments found on:/content/drive/MyDrive/ratm midi/CalmLikeABomb.mid
None
Piano
List of instruments found on:/content/drive/MyDrive/ratm midi/DownOnTheStreet.mid
None
Electric Bass
Piano
List of instruments found on:/content/drive/MyDrive/ratm midi/DownRodeo.mid
None
Electric Bass
Piano
List of instruments found on:/content/drive/MyDrive/ratm midi/FistfulOfSteel.mid
None
Electric Bass
Pian

Looking through this list, piano and electric bass are present in just about every song. Interestingly, RATM generally doesn't use piano in their music. The three primary instruments that they use are electric guitar, electric bass, and drums. Drums aren't listed for any of the songs, and electric guitar is only mentioned for a couple. There's also a value of 'None' listed for most of the songs. Unfortunately, that means there are notes from unidentified instruments that we won't be able to extract. As a result, the music that we eventually generate won't be as rich and complex.
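
Rather than reading the printout by eye, the part names can be tallied. The sketch below does this for the first nine songs of the output above, typed in by hand; the variable names are ours. In the notebook itself the same dictionary could be filled inside the loop over the MIDI files.

```python
from collections import Counter

# Part names copied from the printed output above (first nine songs only).
parts_by_song = {
    "AshesInTheFall": [None, "Electric Bass", "Piano"],
    "Bombtrack": [None, "Electric Bass", "Piano"],
    "BornOfABrokenMan": [None, "Electric Bass", "Piano"],
    "BornAsGhosts": [None, "Acoustic Bass", "Piano"],
    "BulletInTheHead": [None, "Piano"],
    "BullsOnParade": [None],
    "CalmLikeABomb": [None, "Piano"],
    "DownOnTheStreet": [None, "Electric Bass", "Piano"],
    "DownRodeo": [None, "Electric Bass", "Piano"],
}

# Each part name appears at most once per song here, so the count
# equals the number of songs that contain that part.
counts = Counter(str(name) for parts in parts_by_song.values() for name in parts)

for name, n in counts.most_common():
    print(f"{name}: {n} of {len(parts_by_song)} songs")
```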

Next, we're going to prepare the notes to be fed into our model. 

In [None]:
def prepare_sequences(notes, n_vocab):
    """ Prepare the sequences used by the Neural Network """
    sequence_length = 100

    # get all pitch names
    pitchnames = sorted(set(item for item in notes))

    # create a dictionary to map pitches to integers
    note_to_int = dict((note, number) for number, note in enumerate(pitchnames))

    network_input = []
    network_output = []

    # create input sequences and the corresponding outputs
    for i in range(0, len(notes) - sequence_length, 1):
        sequence_in = notes[i:i + sequence_length]
        sequence_out = notes[i + sequence_length]
        network_input.append([note_to_int[char] for char in sequence_in])
        network_output.append(note_to_int[sequence_out])

    n_patterns = len(network_input)

    # reshape the input into a format compatible with recurrent (GRU) layers
    network_input = numpy.reshape(network_input, (n_patterns, sequence_length, 1))
    # normalize input
    network_input = network_input / float(n_vocab)

    network_output = np_utils.to_categorical(network_output)

    return (network_input, network_output)
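
To make the windowing step concrete, here is a toy version of the same idea with a sequence length of 3 instead of 100. The note list is made up for illustration. Each window of three notes becomes one input, and the note that follows it becomes the target.

```python
notes = ["E2", "G2", "A2", "E2", "G2", "B2"]  # made-up example
sequence_length = 3

# Map each distinct pitch name to an integer, as in prepare_sequences
pitchnames = sorted(set(notes))
note_to_int = {name: i for i, name in enumerate(pitchnames)}

inputs, targets = [], []
for i in range(len(notes) - sequence_length):
    window = notes[i:i + sequence_length]
    inputs.append([note_to_int[n] for n in window])
    targets.append(note_to_int[notes[i + sequence_length]])

print(note_to_int)
for window, target in zip(inputs, targets):
    print(window, "->", target)
```

A list of N notes yields N minus `sequence_length` training pairs, so six notes give three pairs here.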

We're going to use an RNN structure similar to that used by Skúli (if we choose to revisit this project in the future, we'll likely try other model configurations). However, we are making a couple of changes. First, we're going to use GRU layers rather than LSTM layers. Second, we're not going to keep the dropout layers (but we've left them in the code as comments in case others want to run the network with them). The main reason for these changes is to speed up the learning process. While this may lead to music that doesn't sound as interesting as that generated by Skúli, the training should be quite a bit faster. Hopefully, the trade-off is worth it.

In [None]:
def create_network(network_input, n_vocab):
    """ create the structure of the neural network """
    model = Sequential()
    model.add(GRU(
        512,
        input_shape=(network_input.shape[1], network_input.shape[2]),
        return_sequences=True
    ))
    model.add(GRU(512, return_sequences=True))
    model.add(GRU(512))
    model.add(BatchNorm())
    #model.add(Dropout(0.2))
    model.add(Dense(256))
    model.add(Activation('relu'))
    model.add(BatchNorm())
    #model.add(Dropout(0.2))
    model.add(Dense(n_vocab))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam')

    return model

We expect that the more epochs we let the model run, the better the results. However, we're also going to use ModelCheckpoint to save the weights whenever the loss improves, so that we can stop the process early if we see the loss stall without losing the best weights so far.

In [None]:
def train(model, network_input, network_output):
    """ train the neural network """
    filepath = "weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"
    checkpoint = ModelCheckpoint(
        filepath,
        monitor='loss',
        verbose=0,
        save_best_only=True,
        mode='min'
    )

    callbacks_list = [checkpoint]

    model.fit(network_input, network_output, epochs=100, batch_size=128, callbacks=callbacks_list)

if __name__ == '__main__':
    train_network()

We ran the model for about 70 epochs before the loss stopped decreasing and actually began to increase.
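
We stopped training by watching the loss ourselves, but the same decision can be expressed in code. The sketch below is a plain-Python version of a patience-based stopping rule; the function name and the loss values are made up for illustration. Keras offers an `EarlyStopping` callback with `monitor` and `patience` arguments that follows the same idea, though we did not use it here.

```python
def epoch_to_stop(losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop,
    or None if the loss never stalls for `patience` epochs in a row."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best - min_delta:
            best = loss      # new best loss: reset the counter
            waited = 0
        else:
            waited += 1      # no improvement this epoch
            if waited >= patience:
                return epoch
    return None


# Made-up loss curve, for illustration only
losses = [1.0, 0.8, 0.6, 0.5, 0.55, 0.52, 0.51, 0.6]
print(epoch_to_stop(losses, patience=3))
```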

Now, let's move to the second half of this project: generating music. We need to define functions that load the notes that were used to train the model and prepare the sequences.

In [None]:
def generate():
    """ Generate a midi file """
    #load the notes used to train the model
    with open('/content/drive/MyDrive/ratm midi/data/notes', 'rb') as filepath:
        notes = pickle.load(filepath)

    # Get all pitch names
    pitchnames = sorted(set(item for item in notes))
    # Get the number of distinct pitch names
    n_vocab = len(set(notes))

    network_input, normalized_input = prepare_sequences(notes, pitchnames, n_vocab)
    model = create_network(normalized_input, n_vocab)
    prediction_output = generate_notes(model, network_input, pitchnames, n_vocab)
    create_midi(prediction_output)


In [None]:
def prepare_sequences(notes, pitchnames, n_vocab):
    """ Prepare the sequences used by the Neural Network """
    # map between notes and integers and back
    note_to_int = dict((note, number) for number, note in enumerate(pitchnames))

    sequence_length = 100
    network_input = []
    output = []
    for i in range(0, len(notes) - sequence_length, 1):
        sequence_in = notes[i:i + sequence_length]
        sequence_out = notes[i + sequence_length]
        network_input.append([note_to_int[char] for char in sequence_in])
        output.append(note_to_int[sequence_out])

    n_patterns = len(network_input)

    # reshape the input into a format compatible with LSTM/GRU layers
    normalized_input = numpy.reshape(network_input, (n_patterns, sequence_length, 1))
    # normalize input
    normalized_input = normalized_input / float(n_vocab)

    return (network_input, normalized_input)

We'll now load our best weights. To do this, we'll need to go to our Drive where the weights are saved and copy the filepath.
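
Because the checkpoint filenames embed the epoch and the loss, the best file can also be picked programmatically instead of by hand. The sketch below parses names that follow the pattern used in the training function. The helper name and the example list are hypothetical, and only the epoch-51 filename is taken from this notebook.

```python
import re

# Matches names like "weights-improvement-51-0.0653-bigger.hdf5"
CHECKPOINT_PATTERN = re.compile(r"weights-improvement-(\d+)-(\d+\.\d+)-bigger\.hdf5$")


def best_checkpoint(filenames):
    """Return the filename with the lowest loss, or None if none match."""
    scored = []
    for name in filenames:
        match = CHECKPOINT_PATTERN.search(name)
        if match:
            epoch, loss = int(match.group(1)), float(match.group(2))
            scored.append((loss, epoch, name))
    return min(scored)[2] if scored else None


# Hypothetical directory listing, for illustration only
files = [
    "weights-improvement-01-4.2100-bigger.hdf5",
    "weights-improvement-30-0.3120-bigger.hdf5",
    "weights-improvement-51-0.0653-bigger.hdf5",
    "notes",
]
print(best_checkpoint(files))
```

Since the checkpoint is configured with `save_best_only=True`, the most recently saved file should also be the one with the lowest loss; parsing the loss simply makes that explicit.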

In [None]:
def create_network(network_input, n_vocab):
    """ create the structure of the neural network """
    model = Sequential()
    model.add(GRU(
        512,
        input_shape=(network_input.shape[1], network_input.shape[2]),
        return_sequences=True
    ))
    model.add(GRU(512, return_sequences=True))
    model.add(GRU(512))
    model.add(BatchNorm())
    #model.add(Dropout(0.2))
    model.add(Dense(256))
    model.add(Activation('relu'))
    model.add(BatchNorm())
    #model.add(Dropout(0.2))
    model.add(Dense(n_vocab))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam')

    # Load the weights to each node (copy filepath from Drive)
    model.load_weights('/content/weights-improvement-51-0.0653-bigger.hdf5')

    return model

In [None]:
def generate_notes(model, network_input, pitchnames, n_vocab):
    """ Generate notes from the neural network based on a sequence of notes """
    # pick a random sequence from the input as a starting point for the prediction
    start = numpy.random.randint(0, len(network_input)-1)

    int_to_note = dict((number, note) for number, note in enumerate(pitchnames))

    pattern = network_input[start]
    prediction_output = []

    # generate 500 notes
    for note_index in range(500):
        prediction_input = numpy.reshape(pattern, (1, len(pattern), 1))
        prediction_input = prediction_input / float(n_vocab)

        prediction = model.predict(prediction_input, verbose=0)

        index = numpy.argmax(prediction)
        result = int_to_note[index]
        prediction_output.append(result)

        pattern.append(index)
        pattern = pattern[1:len(pattern)]

    return prediction_output

For our model, piano and electric bass were the two most common instruments, having been found in almost every song. So, we'll generate 500 notes for each instrument, which comes out to about two minutes of music. We'll generate music for both instruments separately and combine them into a single song using Audacity (you can use any music editing software that allows song layering). 
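
The two-minute figure can be checked with a little arithmetic. In the MIDI-writing code, each generated note is placed half a quarter note after the previous one. The sketch below assumes playback at 120 beats per minute, the standard MIDI default when a file sets no tempo; a different tempo scales the result accordingly. The function name is ours.

```python
def estimated_duration_seconds(n_notes, quarter_lengths_per_note=0.5, bpm=120):
    """Estimate playback time for evenly spaced notes.

    bpm=120 is an assumption (the MIDI default when no tempo is set).
    """
    total_quarters = n_notes * quarter_lengths_per_note
    return total_quarters * 60.0 / bpm


seconds = estimated_duration_seconds(500)
print(f"{seconds:.0f} seconds, or about {seconds / 60:.1f} minutes")
```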

In [None]:
def create_midi(prediction_output):
    """ convert the output from the prediction to notes and create a midi file
        from the notes """
    offset = 0
    output_notes = []

    # create note and chord objects based on the values generated by the model
    for pattern in prediction_output:
        # pattern is a chord
        if ('.' in pattern) or pattern.isdigit():
            notes_in_chord = pattern.split('.')
            notes = []
            for current_note in notes_in_chord:
                new_note = note.Note(int(current_note))
                new_note.storedInstrument = instrument.Piano()
                notes.append(new_note)
            new_chord = chord.Chord(notes)
            new_chord.offset = offset
            output_notes.append(new_chord)
        # pattern is a note
        else:
            new_note = note.Note(pattern)
            new_note.offset = offset
            new_note.storedInstrument = instrument.Piano()
            output_notes.append(new_note)

        # increase offset each iteration so that notes do not stack
        offset += 0.5

    midi_stream = stream.Stream(output_notes)

    midi_stream.write('midi', fp='/content/drive/MyDrive/ratm midi/data/test_output1.mid')

if __name__ == '__main__':
    generate()


In [None]:
def create_midi(prediction_output):
    """ convert the output from the prediction to notes and create a midi file
        from the notes """
    offset = 0
    output_notes = []

    # create note and chord objects based on the values generated by the model
    for pattern in prediction_output:
        # pattern is a chord
        if ('.' in pattern) or pattern.isdigit():
            notes_in_chord = pattern.split('.')
            notes = []
            for current_note in notes_in_chord:
                new_note = note.Note(int(current_note))
                new_note.storedInstrument = instrument.ElectricBass()
                notes.append(new_note)
            new_chord = chord.Chord(notes)
            new_chord.offset = offset
            output_notes.append(new_chord)
        # pattern is a note
        else:
            new_note = note.Note(pattern)
            new_note.offset = offset
            new_note.storedInstrument = instrument.ElectricBass()
            output_notes.append(new_note)

        # increase offset each iteration so that notes do not stack
        offset += 0.5

    midi_stream = stream.Stream(output_notes)

    midi_stream.write('midi', fp='/content/drive/MyDrive/ratm midi/data/test_output2.mid')

if __name__ == '__main__':
    generate()


You can find the result of the best combination of instruments after multiple attempts in the same GitHub folder as this notebook.

It certainly doesn't sound like Rage Against the Machine, but that's understandable given that we weren't able to account for electric guitar and drums. We also trained the model fairly quickly, using GRU rather than LSTM layers and omitting dropout, which may have reduced the quality of the music. Still, the experiment was worth running simply to see how faster training methods compare and how much information is actually lost. In our case, the training time was about a quarter of that reported in the other projects. If, in the future, we can generate music for the missing instruments, we may be able to create songs that sound richer and more complex, which might let us claim that the trade-off between speed and quality is worth it.