# Music Generation with an LSTM Network: Training

I- Introduction
--------
Deep learning has always been a hard and challenging subject for us, and probably for most data science students. This is due to the complexity of the networks and the large number of hyperparameters that need to be tuned for each case study. However, it remains attractive thanks to the promising results it shows in so many fields.

In addition to traditional tasks such as prediction, classification and translation, deep learning is receiving growing attention as an approach to music generation.

This work describes an algorithmic approach to music generation based on an LSTM model. The key goal is to model and learn sequences of musical notes, then generate new musical content.


II- Import necessary libraries
------------
* **Music21** is a Python toolkit for teaching the fundamentals of music theory, generating music examples and studying music. We use it here to extract the contents of our dataset, and later to translate the output of the neural network back into musical notation.
* **Keras** is a free, open-source Python library for developing and evaluating deep learning models.

In [0]:
import pickle
import numpy
from music21 import converter, instrument, note, chord
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.layers import Activation
from keras.layers import BatchNormalization as BatchNorm
from keras.utils import np_utils
from keras.callbacks import ModelCheckpoint
from mido import MidiFile
import os

In [0]:
def train_network():
    """ Train a Neural Network to generate music """
    notes = get_notes()

    # number of distinct notes and chords (the vocabulary size)
    n_vocab = len(set(notes))

    network_input, network_output = prepare_sequences(notes, n_vocab)

    model = create_network(network_input, n_vocab)

    train(model, network_input, network_output)

III- Data Preparation
-----------

We work with music files in MIDI format. The dataset we used is ***midi_songs***. It is composed of 99 MIDI files of piano music.

In [0]:
midi_songs = os.listdir('midi_songs')
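
Note that `os.listdir` returns every entry in the folder, not only MIDI files. The sketch below shows one possible way to keep only files with a MIDI extension. The helper name `list_midi_files` and the file names are hypothetical and are not part of the original notebook.

```python
import os
import tempfile


def list_midi_files(directory):
    """Return the sorted paths of .mid/.midi files in `directory` (hypothetical helper)."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(('.mid', '.midi'))
    )


# Demonstration on a throwaway folder with made-up file names
with tempfile.TemporaryDirectory() as folder:
    for name in ('song_a.mid', 'song_b.MIDI', 'notes.txt'):
        open(os.path.join(folder, name), 'w').close()
    found = [os.path.basename(path) for path in list_midi_files(folder)]

print(found)
```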

### Extract notes from the dataset

The **get_notes()** function converts the MIDI files into Music21 objects with the statement *converter.parse()*. Then it divides each parsed file into streams with *instrument.partitionByInstrument()*. Each stream represents a different instrument.

Now that the streams are prepared, we extract notes from them. Since we need only one instrument, we work with a single stream (parts[0]). If a file cannot be split by instrument, we fall back to its flattened list of notes.

Here we need to distinguish between Music21 notes and chords in order to extract them properly:

* A pitch refers to the frequency of a sound, or how high or low it is. It is named with one of the letters A to G, optionally followed by an accidental and an octave number. For example, Music21 writes B-flat in octave 2 as `B-2`. A single pitch played for some duration is a `note.Note` object.

* A chord, in music, is a set of pitches that are heard as if sounding simultaneously. In Music21 it is a `chord.Chord` object.

Each note is stored as its pitch name, and each chord is stored as the integers of its normal order joined by dots. Finally, we save the extracted notes to the file `data/notes` with pickle.
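
To make the two string formats concrete, here is a minimal sketch in plain Python, without Music21. The helpers `encode_chord` and `is_chord_token` are hypothetical names introduced for illustration. The rule used to recognise a chord token (it contains a dot or is made only of digits) is an assumption that follows from the encoding described above.

```python
def encode_chord(pitch_classes):
    """Join pitch-class integers with dots, as get_notes() does for chords."""
    return '.'.join(str(n) for n in pitch_classes)


def is_chord_token(token):
    """Assumed rule: chord tokens contain a dot or consist only of digits."""
    return '.' in token or token.isdigit()


# Two note names and two chords (the second chord has a single pitch class)
tokens = ['B-2', encode_chord([8, 0, 11]), 'E4', encode_chord([4])]
kinds = ['chord' if is_chord_token(t) else 'note' for t in tokens]

for token, kind in zip(tokens, kinds):
    print(token, '->', kind)
```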

In [0]:
def get_notes():
    """ Get all the notes and chords from the midi files in the ./midi_songs directory """
    notes = []
    
    for file in midi_songs:
        file = r'midi_songs/' + file
        #midi = MidiFile(file, clip=True)
        midi = converter.parse(file)

        print("Parsing %s" % file)

        notes_to_parse = None

        try: # file has instrument parts
            s2 = instrument.partitionByInstrument(midi) # divide the score into one part per instrument
            notes_to_parse = s2.parts[0].recurse() # iterate over every element of the first part, including nested streams
        except Exception: # file has notes in a flat structure
            notes_to_parse = midi.flat.notes  # fall back to the flattened notes (single instrument)

        for element in notes_to_parse:
            if isinstance(element, note.Note): # collect notes names ex: B-2
                notes.append(str(element.pitch))
            elif isinstance(element, chord.Chord):  # collect chords (notes played simultaneously) ex: 8.0.11
                notes.append('.'.join(str(n) for n in element.normalOrder))

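    # make sure the output directory exists before writing the pickle
    os.makedirs('data', exist_ok=True)
    with open('data/notes', 'wb') as filepath:
        pickle.dump(notes, filepath)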

    return notes

### Prepare sequences used by the Neural Network
The function **prepare_sequences(notes, n_vocab)** creates the input sequences of the network and their corresponding outputs.
 
Here we have chosen a sequence length of 100. This means the model is trained to predict the next note or chord from the previous 100 notes/chords.
 
First, we create a dictionary called *note_to_int*. It maps each distinct note or chord, taken in sorted order, to an integer index. This lets the LSTM network work with integers (the indexes of the notes) instead of strings (the notes themselves) as inputs.
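
As a small illustration of this mapping, the sketch below builds *note_to_int* from a made-up list of six tokens. The token values are invented for the example.

```python
# A tiny, made-up list of extracted notes and chords
notes = ['E4', 'C4', '4.7', 'E4', 'G4', 'C4']

# Distinct tokens in lexical order, then one integer per token
pitchnames = sorted(set(notes))
note_to_int = {name: index for index, name in enumerate(pitchnames)}

# The same melody expressed as integers
encoded = [note_to_int[name] for name in notes]

print(note_to_int)
print(encoded)
```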
 
The variable *network_input* is a list of sub-lists. Each sub-list represents one sequence (100 indexes referring to notes). The variable *network_output* is a list holding, for every sequence in network_input, the index of the note that follows it.
 
*n_patterns* is the number of sequences that will be fed to the LSTM model.
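
The sliding-window construction can be sketched on a short integer sequence. The helper name `make_windows` is hypothetical, and a sequence length of 3 is used instead of 100 so the result stays readable.

```python
def make_windows(encoded, sequence_length):
    """Pair every window of `sequence_length` items with the item that follows it."""
    inputs, outputs = [], []
    for i in range(len(encoded) - sequence_length):
        inputs.append(encoded[i:i + sequence_length])
        outputs.append(encoded[i + sequence_length])
    return inputs, outputs


encoded = [2, 1, 0, 2, 3, 1]
network_input, network_output = make_windows(encoded, sequence_length=3)
n_patterns = len(network_input)

print(network_input)
print(network_output)
print(n_patterns)
```

With 6 items and windows of length 3, there are 6 - 3 = 3 patterns.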
 
Normalizing the data before training a neural network generally speeds up learning and leads to faster convergence. To do so, network_input is divided by the number of distinct notes (n_vocab), which scales every index into the range [0, 1).
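
A minimal sketch of this scaling on plain Python lists, assuming a made-up vocabulary of 4 tokens:

```python
n_vocab = 4  # assumed vocabulary size for this toy example
network_input = [[2, 1, 0], [1, 0, 2], [0, 2, 3]]

# Divide every index by the vocabulary size
normalized = [
    [index / float(n_vocab) for index in sequence]
    for sequence in network_input
]

print(normalized)
```

Because the largest index is n_vocab - 1, every normalized value stays strictly below 1.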

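Now that the list of sequences is prepared, the last step is to reshape it into a format that is compatible with the LSTM network. Keras LSTM layers expect a 3-D input of shape (samples, time steps, features). The number of samples is the number of sequences, here called n_patterns. The number of time steps is how many points (here notes) are in each sequence, that is, 100. The number of features is 1, because each time step holds a single note index. The *input_shape* argument of the first layer leaves out the sample (batch) dimension, so it takes only two values: the time steps and the number of features.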
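
The layout produced by the reshape can be pictured with plain nested lists. This is only an illustration of the shape, not the `numpy.reshape` call used in the notebook, and the toy sequences are made up.

```python
network_input = [[2, 1, 0], [1, 0, 2], [0, 2, 3]]

# Wrap every index in its own one-element list: one feature per time step
reshaped = [[[value] for value in sequence] for sequence in network_input]

# (samples, time steps, features)
shape = (len(reshaped), len(reshaped[0]), len(reshaped[0][0]))

# What the first LSTM layer receives as input_shape: the batch dimension is dropped
input_shape = shape[1:]

print(shape)
print(input_shape)
```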
 
The network_output is one-hot encoded by the **np_utils.to_categorical()** function: each index becomes a vector of zeros with a 1 at the position referring to the note.
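
The encoding can be sketched in plain Python. The helper `one_hot` is a hypothetical stand-in that mimics the result of the Keras function for a known vocabulary size.

```python
def one_hot(index, n_vocab):
    """Return a vector of zeros of length n_vocab with a 1 at position `index`."""
    vector = [0.0] * n_vocab
    vector[index] = 1.0
    return vector


n_vocab = 4  # assumed vocabulary size for this toy example
network_output = [2, 3, 1]
encoded_output = [one_hot(index, n_vocab) for index in network_output]

for row in encoded_output:
    print(row)
```

When no number of classes is passed, `to_categorical` infers it from the largest index present, so passing the vocabulary size explicitly keeps the output width equal to n_vocab.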

In [0]:
def prepare_sequences(notes, n_vocab):
    """ Prepare the sequences used by the Neural Network """
    sequence_length = 100

    # get all pitch names
    pitchnames = sorted(set(notes)) # distinct notes/chords in lexical order

    # create a dictionary to map pitches to integers
    note_to_int = dict((note, number) for number, note in enumerate(pitchnames)) # each pitch name mapped to its index

    network_input = []
    network_output = []

    # create input sequences and the corresponding outputs
    for i in range(0, len(notes) - sequence_length, 1):
        sequence_in = notes[i:i + sequence_length]
        sequence_out = notes[i + sequence_length]
        network_input.append([note_to_int[char] for char in sequence_in])
        network_output.append(note_to_int[sequence_out])

    n_patterns = len(network_input)

    # reshape the input into a format compatible with LSTM layers
    network_input = numpy.reshape(network_input, (n_patterns, sequence_length, 1))
    # normalize input
    network_input = network_input / float(n_vocab)

    # one-hot encode the outputs; num_classes keeps the width equal to n_vocab even if the last index never appears as a target
    network_output = np_utils.to_categorical(network_output, num_classes=n_vocab)

    return (network_input, network_output)



IV- Create the LSTM Neural Network
------
### Why LSTM?
Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in deep learning. It has feedback connections, so it can process whole sequences of data instead of single data points. LSTMs were designed mainly to mitigate the vanishing gradient problem that appears when training traditional RNNs on long sequences. Their gated cell state lets information persist over many time steps.

This type of neural network often performs well on sequential data such as time series and text. Since musical notes can be seen as long-term patterns where historical information matters, LSTM networks are well suited to our situation.
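
To give an idea of what happens inside an LSTM cell, here is a minimal sketch of one time step with scalar inputs and states, following the standard LSTM equations. It is not how Keras implements the layer (Keras works with weight matrices), and the weight values are arbitrary.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a scalar LSTM cell.

    `w` maps each gate name to a tuple (input weight, recurrent weight, bias).
    """
    def gate(name, activation):
        w_x, w_h, b = w[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate('forget', sigmoid)        # how much of the old cell state to keep
    i = gate('input', sigmoid)         # how much of the candidate to write
    g = gate('candidate', math.tanh)   # new candidate value
    o = gate('output', sigmoid)        # how much of the cell state to expose

    c = f * c_prev + i * g             # updated cell state (long-term memory)
    h = o * math.tanh(c)               # updated hidden state (output)
    return h, c


# Arbitrary illustrative weights, identical for every gate
weights = {name: (0.5, 0.5, 0.0) for name in ('forget', 'input', 'candidate', 'output')}

h, c = 0.0, 0.0
for x in [0.5, 0.25, 0.0]:             # a short normalized input sequence
    h, c = lstm_step(x, h, c, weights)

print(h, c)
```

The cell state `c` is updated additively, which is what allows information from early time steps to survive until later ones.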

### Network Architecture
* The network trained here is composed of three stacked LSTM layers with 512 units each. The first two return their full output sequence and use a recurrent dropout of 0.3.
* In order to reduce overfitting, two dropout layers (rate 0.3) are used.
* Activations can become too large or too small as they pass through the weights of the network. Batch normalization layers re-center and re-scale them, which largely avoids this issue.
* Each dense (fully connected) layer is followed by an activation function. ReLU is used in the hidden layer, and softmax is used in the output layer so that the model gives the probability of each note or chord being the next one.
* The RMSprop optimizer is commonly used with LSTM networks. It divides each gradient by a moving average of recent squared gradients, which keeps the update steps on a comparable scale when gradient magnitudes vary widely.
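
The RMSprop update rule can be sketched for a single scalar parameter. The hyperparameter values below are illustrative assumptions, and the function `rmsprop_step` is a hypothetical helper, not the Keras implementation.

```python
import math


def rmsprop_step(param, grad, avg_sq, learning_rate=0.001, rho=0.9, epsilon=1e-7):
    """Apply one RMSprop update to a scalar parameter."""
    # Moving average of the squared gradient
    avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2
    # Gradient divided by the root of that average
    param = param - learning_rate * grad / (math.sqrt(avg_sq) + epsilon)
    return param, avg_sq


# A very large and a very small gradient lead to almost the same step size
new_big, _ = rmsprop_step(1.0, grad=100.0, avg_sq=0.0)
new_small, _ = rmsprop_step(1.0, grad=0.01, avg_sq=0.0)
step_big = 1.0 - new_big
step_small = 1.0 - new_small

print(step_big, step_small)
```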

In [0]:
def create_network(network_input, n_vocab):
    """ create the structure of the neural network """
    model = Sequential()
    model.add(LSTM(
        512,
        input_shape=(network_input.shape[1], network_input.shape[2]),
        recurrent_dropout=0.3,
        return_sequences=True
    ))
    model.add(LSTM(512, return_sequences=True, recurrent_dropout=0.3,))
    model.add(LSTM(512))
    model.add(BatchNorm())
    model.add(Dropout(0.3))
    model.add(Dense(256))
    model.add(Activation('relu'))
    model.add(BatchNorm())
    model.add(Dropout(0.3))
    model.add(Dense(n_vocab))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

    return model
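
The last layer and the loss passed to `compile` can be illustrated in plain Python. This is a sketch of the mathematical definitions of softmax and categorical cross-entropy, with made-up scores for a vocabulary of 4 tokens. It is not the Keras code.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(value - highest) for value in scores]  # shift for numerical stability
    total = sum(exps)
    return [value / total for value in exps]


def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities) if t > 0)


scores = [2.0, 1.0, 0.1, -1.0]   # made-up outputs of the last dense layer
probs = softmax(scores)
target = [1, 0, 0, 0]            # the true next note is token 0
loss = categorical_crossentropy(target, probs)

print(probs)
print(loss)
```

The loss shrinks as the probability given to the true next note grows.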

### Fitting the model
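The **train()** function is called to fit the network on the data already prepared (network_input and network_output). Here we trained the model for *100 epochs with batches of size 64*.

At the end of every epoch, the checkpoint callback compares the training loss with the best value seen so far. The model is saved only when the loss has improved (*save_best_only=True*), and the epoch number and loss are written into the file name.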
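
The saving rule can be simulated without Keras. The function `epochs_that_save` is a hypothetical helper and the loss values are invented. It only mimics the "save when the monitored loss improves" behaviour and the file name pattern used in `train()`.

```python
def epochs_that_save(losses):
    """Return the file names that would be written, given one loss per epoch."""
    pattern = "weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"
    best = float('inf')
    saved = []
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:  # mode='min': a lower loss counts as an improvement
            best = loss
            saved.append(pattern.format(epoch=epoch, loss=loss))
    return saved


losses = [4.8, 4.5, 4.6, 4.2]  # made-up training losses for four epochs
saved = epochs_that_save(losses)

for name in saved:
    print(name)
```

In this toy run, epoch 3 writes nothing because its loss is higher than the best value reached at epoch 2.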

In [0]:
def train(model, network_input, network_output):
    """ train the neural network """
    filepath = "weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"
    checkpoint = ModelCheckpoint(
        filepath,
        monitor='loss',
        verbose=0,
        save_best_only=True,
        mode='min'
    )
    callbacks_list = [checkpoint]

    model.fit(network_input, network_output, epochs=100, batch_size=64, callbacks=callbacks_list)

In [0]:
if __name__ == '__main__':
    train_network()