In [1]:
# 3rd-Party Modules 
import music21
import tqdm
from keras.utils import np_utils
import numpy as np
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import BatchNormalization as BatchNorm
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Activation
from tensorflow.keras.models import Sequential
import IPython.display as ipd

# Built-In Modules 
import glob
import pickle

music21: Certain music21 functions might need the optional package matplotlib;
                  if you run into errors, install it by following the instructions at
                  http://mit.edu/music21/doc/installing/installAdditional.html


## Background Information

LSTM Networks are more developed versions of simple Recurrent Neural Networks. Normally when we want to backpropagate an error in a nerual net (i.e. flow through the layers of the model and use the error function you defined to make the layers better), we have a Gradient Problem. The Gradient Problem occurs when the backpropagation has a gradient for the error. Meaning that (usually) the eariler layers will not recive that 'learning'.

<img src="images/gradient_problem.jpeg" width="400">


This then causes 'short-term-memory' and impairs the ability to create longer sequences as outputs. LSTM's are a response to this problem. Thus they are used in many sequence applications such as speach recognition, translation etc. LSTM's models keep relevent information in order to make predictions. Information is passed along through the sequencer through this idea of 'hidden states'. The following image shows the hidden state being passed along, 'Tanh Function' ensures that the values says between $-1$ and $1$. 

<img src="images/hidden_state.gif" width="400">

The internals of a LSTM layer can be quite complex and involve a lot of math as seen in this graphic:

<img src="images/inside_lstm_layer.png" width="400">

But thanks to the **open-source community** we can use the TensorFlow library to build our own, simpler LSTM model. 


## Set Up

Initally when I ran the `fit` function it was estimating that it would take about 3 hours per Epoch which would have taken an incredibly long amount of time. To expedite this I repurposed my gaming desktop to become my machine learning compiler. 

_Note:_ The 3 hours was taken with a batch size of 124 on the original unmodified model, but this number changed significantly once I changed the model. The sentiment still carries though, utilizing the GPU drastically reduced the runtime. 

I installed Ubuntu on a new partition and installed the NVidia Cuda. Cuda piece of software created by NVidia to enable developers to use NVidia graphics cards for general purpose processing (instead of being restricted to video processing). After that I installed `openssh` so I could ssh remotly into my computer and finally `TensorFlow` along with the other needed pip packages to run my code. 

After 3 re-installs of Ubuntu I was finally able to get everything working. After utilizing my desktops graphics card I was able to reduce the processing time for one Epoch to 30 minutes, so about a $83.33\%$ decrese in processing time.

## Local Environment

Setting up Ubuntu and my desktop actaully turned out to be one of my biggest challenges. I understood that my results would eventually be determined by the content of my model, but being able to develop in a time effient manor was something I also valued. Following what a list of some of the things I was able to accomplish on my desktop PC during my limited time:

* `Ubuntu`

    * Installing Ubuntu seemed like an easy enough task, but it took some time to understand boot loaders and boot devices. In a previous job I had worked with Fedora so I decided to use that as my Linux distribution, but I quickly realized that there was limited (if any) documentation on how to set up my Nvidia drivers on Fedora. So I had to switch to Ubuntu in order to have access to that documentation. This lead to another mistake; not formatting the partition before re-installing Linux. This lead to some limited memory issues and I quickly realized (after I had set up my Nvidia drivers) that I would have to re-format the drive again and re-install Ubuntu. 
    
* `Cuda` & `Cudnn`

    * My knowledge of Cuda lead me to believe that it was the only thing needed to utilize my graphics card for machine learning. But in reality there was actually another library called [Cudnn](https://developer.nvidia.com/cudnn) that was actually doing a lot of the heavy machine learning lifting. What was particularly challenging about this library was there was actually no installer (that I could find) that would do the installation for you. This resulted in me figuring out where the `Cuda` library files were stored, and manually copying in the relevent `Cudnn` files. 
    

* `Jupyter Notebook` & `OpenSSH`

    * These two were definitly _optional_ but things that I wanted :). Initializing a Jupyter Notebook that would be accessable through anyone in the local network to connect to was something on my TODO list. Learning about how local IP address differ from external (exposed) IP address was another thing I was able to learn about. It was not enought to start the notebook on `127.0.0.1` (AKA `localhost`), instead I needed to start the notebook on my computers external local IP address (AKA `10.0.1.40`). `OpenSSH` was a little easier to get started with as most of that configuration was done through the package installation.


## Prepare the Data

Now we can start using our data. I found [this](https://www.kaggle.com/jembishop1/classical-music-piano-rolls?select=music.pk) database on Kaggle and decided it to use it for this project as it seemed to have a wide verity of music. 

Now we must loop over our data that we have and convert them to music21 objects so we can work with the `.mid` files easier. 

**Note:** This can take a while so I've included a .pkl file that can be opened in the next step. 

In [2]:
# Final list of all notes
notes = []

files = glob.glob("input_data/*/*.mid")
for x in tqdm.tqdm(range(len(files))):
    file = files[x]
    # Convert the file into a music21 objects
    midi = music21.converter.parse(file)
    
    # Variable to keep how many different notes will be needed to parse
    notes_to_parse = None
    
    # Seporate our any differnt parts if they exists
    parts = music21.instrument.partitionByInstrument(midi)
    if parts: 
        # File has differnt parts
        notes_to_parse = parts.parts[0].recurse()
    else: 
        # File does not have multiple parts
        notes_to_parse = midi.flat.notes
    
    # Loop over the notes we extracted
    for element in notes_to_parse:
        if isinstance(element, music21.note.Note):
            # If its a Note object -> add its pitch
            notes.append(str(element.pitch))
        elif isinstance(element, music21.chord.Chord):
            # Its its a Chord object -> Loop and add the ID of every note in the chord
            notes.append('.'.join(str(n) for n in element.normalOrder))

100%|██████████| 292/292 [19:05<00:00,  3.92s/it]


In [4]:
# Optional: Pickle the output, and save it for next time 
with open('checkpoints/notes_pickle.pkl', 'wb') as f:
    pickle.dump(notes, f)

## Process the data
Now we have our input data stored as a mix of words and numbers. We need to convert these words to numbers, so lets encode out data by creating a mapping from words to integers. 

In [2]:
# Optional: Open up the pickled file
with open('checkpoints/notes_pickle.pkl', 'rb') as f:
    notes = pickle.load(f)

In [3]:
# Number of 'vocab' words is the number of notes
n_vocab = len(set(notes))

# Variable sequence_length: This controls how much data is needed before a note, to predict the note itself
sequence_length = 100 

# Grab all of the different pitch names and map the piches to integers
pitchnames = sorted(set(item for item in notes))
note_to_int = dict((note, number) for number, note in enumerate(pitchnames))

# Create the corresponging input/output sequences
network_input = []
network_output = []
for i in range(0, len(notes) - sequence_length, 1):
    # Calculate the increment_index (i.e. where we are now + sequenec_length)
    increment_index = i + sequence_length
    
    # Input is the notes leading up till i+sequence_length 
    sequence_in = notes[i:increment_index]
    
    # Output is the first note that comes after 
    sequence_out = notes[increment_index]
    
    # Convert the notes to int for our model
    network_input.append([note_to_int[char] for char in sequence_in])
    network_output.append(note_to_int[sequence_out])

# Reshape the input array so it can be used in our model
# Our new shape is [length of input X number of notes in each input X 1]
new_shape_tuple = (len(network_input), sequence_length, 1) 
network_input = np.reshape(network_input, new_shape_tuple)

# Normalize the input by dividing by the number of words
network_input = network_input / float(n_vocab)

# Finally convert out integers into binary values to feed into our model
network_output = np_utils.to_categorical(network_output)

## Create the Model

Creating the model was a bit difficult. I originally was trying to mimic the model that was used in [this](https://arxiv.org/pdf/1804.07300.pdf) paper. But it proved to be too hard to mimic exactly due to the added complexity of time. I ended up modifying [this](https://github.com/Skuldur/Classical-Piano-Composer) Keras model to better mimic the model in the paper. 

In [4]:
def create_model(network_input, n_vocab):
    """This needs to be a function so we can load in our weights after the fact"""
    # Declare that we are using a Sequential model
    model = Sequential()
    
    # Add our first LSTM layer to indicate model input
    model.add(LSTM(
        512,
        input_shape=(network_input.shape[1], network_input.shape[2]),
        recurrent_dropout=0.75,
        return_sequences=True
    ))
    
    # All LSTM layers will have a 0.75 dropout as refrenced in the original paper [15]
    model.add(LSTM(512, return_sequences=True, recurrent_dropout=0.75,))
    model.add(LSTM(512, recurrent_dropout=0.75))
    model.add(BatchNorm())
    model.add(Dropout(0.3))
    model.add(Dense(256))
    
    # The paper refers to using a softmax activation layer
    model.add(Activation('softmax'))
    
    model.add(BatchNorm())
    model.add(Dropout(0.3))
    
    # The last layer needs to have the same number of nodes as our possible outputs
    model.add(Dense(n_vocab))
    
    # Declare our activation function
    model.add(Activation('softmax'))
    
    # Declare our loss function 
    model.compile(loss='categorical_crossentropy', optimizer='Adadelta')

    return model

model = create_model(network_input, n_vocab)



## Train Model

This again takes an incredibly long amount of time. Originally I had run it for 25 hours, only to realize my partition has run out of memory and all my checkpoints were currupted. Skip to the next cell to load the weights of the last model I trained. 

Couple of takeaways from this part of the project:

* `epochs` Can be set to an unrealistic number. As long as you wrap it in a `except KeyboardInterrupt` there isn't anything wrong with having a huge `epoches` value. 

* Model checkpoints are necessary, not an option. This ensures that even if the model were to crash, things will still be okay. 

* `batch_size` is really important when you run into out-of-memory exepctions. I originally had it set to `124`, but realized that once I changed up the model to mimic the paper this was too large of a value and had to reduce it. I decided to set it to 10 after consoluting the mighty [stack overflow](https://stackoverflow.com/questions/35050753/how-big-should-batch-size-and-number-of-epochs-be-when-fitting-a-model-in-keras) for my most memory intensive model.

In [None]:
filepath = "checkpoints/weights-improvement-model2-2-{epoch:02d}-{loss:.4f}-bigger.hdf5"
checkpoint = ModelCheckpoint(filepath)
callbacks_list = [checkpoint]

try:
    model.fit(network_input, network_output, epochs=50, batch_size=64, callbacks=callbacks_list)
except KeyboardInterrupt:
    print("Ending training...")

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50

In [5]:
# Optional: Use this cell to load weights in that have already been trained
model = create_model(network_input, n_vocab)
model.load_weights("checkpoints/weights-improvement-model2-2-13-5.4803-bigger.hdf5")



In [6]:
# Need a random starting point from somewhere in the input
start = np.random.randint(0, len(network_input)-1)

# Create a helper function to convert to a note 
INT_TO_NOTE = dict((number, note) for number, note in enumerate(pitchnames))
def int_to_note(int_to_convert):
    return INT_TO_NOTE[int_to_convert]

# Initilize some starter variables
pattern = network_input[start]
output = []

# We want our song to be 250 notes long, this is dependent on sequence_length defined above 
for note_index in range(250):
    # Shapre our input to our model, which are all the notes we have predicted up until now
    prediction_input = np.reshape(pattern, (1, len(pattern), 1))
    
    # Normalize our input, as we did prior to feeding out model 
    prediction_input = prediction_input / float(n_vocab)
    
    # Ask the model: Whats next?
    prediction = model.predict(prediction_input, verbose=0)

    # Use the prediction with the highest weight
    index = np.argmax(prediction)
    
    # Convert it to a note and add it to our output
    result = int_to_note(index)
    output.append(result)
    
    # Increment pattern
    pattern += index
    pattern = pattern[1:len(pattern)]




















ValueError: in user code:

    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:1147 predict_function  *
        outputs = self.distribute_strategy.run(
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:951 run  **
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:2290 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:2649 _call_for_each_replica
        return fn(*args, **kwargs)
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:1122 predict_step  **
        return self(x, training=False)
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:927 __call__
        outputs = call_fn(cast_inputs, *args, **kwargs)
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/keras/engine/sequential.py:277 call
        return super(Sequential, self).call(inputs, training=training, mask=mask)
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py:719 call
        convert_kwargs_to_constants=base_layer_utils.call_context().saving)
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py:888 _run_internal_graph
        output_tensors = layer(computed_tensors, **kwargs)
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py:654 __call__
        return super(RNN, self).__call__(inputs, **kwargs)
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:927 __call__
        outputs = call_fn(cast_inputs, *args, **kwargs)
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent_v2.py:1138 call
        zero_output_for_mask=self.zero_output_for_mask)
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/keras/backend.py:4088 rnn
        [inp[0] for inp in flatted_inputs])
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/keras/backend.py:4088 <listcomp>
        [inp[0] for inp in flatted_inputs])
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:984 _slice_helper
        name=name)
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:1150 strided_slice
        shrink_axis_mask=shrink_axis_mask)
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py:10179 strided_slice
        shrink_axis_mask=shrink_axis_mask, name=name)
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:744 _apply_op_helper
        attrs=attr_protos, op_def=op_def)
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py:595 _create_op_internal
        compute_device)
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:3327 _create_op_internal
        op_def=op_def)
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1817 __init__
        control_input_ops, op_def)
    /home/sid/.pyenv/versions/ml-music/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1657 _create_c_op
        raise ValueError(str(e))

    ValueError: slice index 0 of dimension 0 out of bounds. for '{{node sequential_1/lstm_3/strided_slice_2}} = StridedSlice[Index=DT_INT32, T=DT_FLOAT, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1](sequential_1/lstm_3/transpose, sequential_1/lstm_3/strided_slice_2/stack, sequential_1/lstm_3/strided_slice_2/stack_1, sequential_1/lstm_3/strided_slice_2/stack_2)' with input shapes: [0,?,1], [1], [1], [1] and with computed input tensors: input[1] = <0>, input[2] = <1>, input[3] = <1>.


In [9]:
# This was taken mainly from: https://github.com/Skuldur/Classical-Piano-Composer/blob/master/predict.py#L104

def create_mid_file(output, output_name="test_output"):
    offset = 0
    output_notes = []

    # Loop through the output, and create a mid file
    for pattern in output:
        # If our pattern is a cord
        if ('.' in pattern) or pattern.isdigit():
            # Split up the chord to their individual notes
            notes_split = pattern.split('.')
            final_notes = []
            for current_note in notes_split:
                # Loop and convery one by one
                new_note = music21.note.Note(int(current_note))
                # We're going to use a Piano for all our sounds
                new_note.storedInstrument = music21.instrument.Piano()
                final_notes.append(new_note)
            new_chord = music21.chord.Chord(final_notes)
            new_chord.offset = offset
            output_notes.append(new_chord)
        # If our pattern is a note
        else:
            # Similarly if we see a single note
            new_note = music21.note.Note(pattern)
            new_note.offset = offset
            new_note.storedInstrument = music21.instrument.Piano()
            output_notes.append(new_note)

        # Offset each iteration, so we don't have stacking
        offset += 0.5

    midi_stream = music21.stream.Stream(output_notes)

    midi_stream.write('midi', fp=f"{output_name}.mid")

create_mid_file(output)

['7', '7', 'C4', 'G#3', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'E4', 'G#3', 'G#3', 'G#3', 'G4', 'G4', 'G4', 'G4', 'C5']


### Play the music created!
This is the fun part :). Unfortunatly I could not find a plugin that could render the `mid` files in Jupyter Notebook (Music21 developer said he was [occupied](https://github.com/jupyterlab/jupyterlab/issues/5615)). But I used [this](https://onlinesequencer.net/import) website to render the `mid` files for me. 

**Disclaimer:** All my models that I created suffered from the middle of the song being the same note over and over. So this was something consistent throughout all my iterations. All the "music" is centered towards the start and end of the song. So when listening to the song, skip to the start and end to get the actual "music"

#### Model 1
I went through a couple of differnt trial and error cycles. Initally I ran the orginal algorithum I got from the tutorial I found and got this as a result: 

In [1]:
import IPython.display as ipd
ipd.Audio('audio/output_simple_model.mp3')

If we visulize it we can see that there basically no chords being formed.

<img src="images/simple_model_visulization1.jpeg" width="400">
<img src="images/simple_model_visulization2.jpeg" width="400">

#### Model 2

Then I made some slight modifications and tweaked the algorithum based on the paper but ended up getting horrible runtime result. Each `epoch` ended up taking about 15 hours to run and I was only able to run it one time. I also had to set my `batch_size` to 10 as if I set it to anything higher I would get an `OOM` (Out of memory exeption and my grahics card would crash).

In [3]:
ipd.Audio('audio/model2_output.mp3')

If you visulize the keys together however you can see that there are clearly some chords being formed and some hints of a song being created:

<img src="images/simple_model2_visulization1.jpeg" width="400">
<img src="images/simple_model2_visulization2.jpeg" width="400">

#### Model 3

After consulting my friend Anthony Ter-Saakov, he gave me some advice on how I could speed up my algorithum. It turns out that my `sequence_length` was far too long, and as that was being treated as the input I was getting really bad runtimes. I reduced the `sequence_length` and I was then able to increase the `batch_size` which resulted in `epoch` times of about 1 hour. But again I got similar (if not worse) results.

In [4]:
ipd.Audio('audio/model3_output.mp3')

If we visulize it, we can see that the results are actually quite poor, nothing really resembaling a "chord"

<img src="images/simple_model3_visulization1.jpeg" width="400">
<img src="images/simple_model3_visulization2.jpeg" width="400">

### Takeaways

1. Machine learning takes a really *really* long time. A lot of my time was spent just running my algorithum. The model taken from the tutorial took about ~3 hours per `epoch`, the second model took about ~15 hours per `epoch` and the last model took about ~1 hour per `epoch`. Learning how to optmize your code and the input to the network can really make a huge differece not only in terms of runtime but also in terms of accuracy. We can see that between model 2 and 3, the runtime did decrease significantly, but the accuracy also did. There was no sense of a "chord" in model 3.

2. Learning how to utilize the tools around you can make a big difference. Utilizing my gaming computer not only saved me a lot of money (instead of running it on a remote, cloud computer), but also made it a lot easier to develop. I was the one who dictated when the computer would start or stop and I decided when I wanted to run my code. Combine that with the fact that you get the benefit of using standard functions such as `scp`, made it a lot easier to debug and try differnt things. Using an online Jupyter Notebook running on a cloud computer with 10 GPU's might have been faster, but it would have been hard to copy over files back and forth between my local computer in order to explore my options. 

3. The open-source community has fundimentaly changed the way people learn. The algorithums and methodology that I was using in this project was more-or-less cutting edge. This is the best of the best algorithums that can be applied to neural nets in order to generate music, and it was all available for me and you to use for **FREE**. This has changed the way we can learn and interact with these abstract topics, no longer do you need an advanced math degree to play around with these high-level concepts. Now anyone who has a gaming computer or access to a GPU can get started and create amazing pieces of code. 

### Works Cited
* https://github.com/Skuldur/Classical-Piano-Composer
* https://youtu.be/8HyCNIVRbSU
* https://medium.com/@alexissa122/generating-original-classical-music-with-an-lstm-neural-network-and-attention-abf03f9ddcb4
* Anthony Ter-Saakov
