# FIET Codescrum Workshop

## Music Generation using Recurrent Neural Networks (RNN)

The purpose of this workshop is to show an exciting application of recurrent neural networks (RNNs). These networks excel at processing natural language and at building sequence models for __NLP applications__. RNNs can also be used for __machine translation__, __speech recognition__, __sentiment analysis__, __DNA sequence analysis__, etc. We hope to spark your curiosity so that you can experiment with them and think of novel ways to apply neural networks in your fields of interest.

Deep learning, and RNNs in particular, can also be applied to telecommunications: there are articles on __call volume forecasting__ with RNNs, __learning interconnection network structures__, and __modelling server workloads__ using time series data and RNN sequence models.

### Overview

### 1. Deep Learning

![Deep Learning](./images/deep_learning.png)

Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans: __learn by example__. Deep Learning is based on one of the early machine learning algorithms: __Artificial Neural Networks__.

Neural Networks are inspired by our understanding of the biology of our brains – all those interconnections between the neurons. But, unlike a biological brain where any neuron can connect to any other neuron within a certain physical distance, these artificial neural networks have __discrete layers__, __connections__, and __directions of data propagation__.



### 2. Artificial Neural Networks for Supervised Learning

![Types of Neural Networks](./images/types_nn.png)

There are currently many types of artificial neural networks, each designed for a particular purpose. For example, standard NNs are very useful for classic __regression and classification__ tasks on structured data. Convolutional NNs, on the other hand, are best suited to unstructured visual data such as images or video, for tasks like __image recognition__, __tagging__, etc. Recurrent NNs are used mostly for __Natural Language Processing__ and __sequential data__ (time series, music, etc.).

### 3. Core Concepts

#### 3.1 Logistic Regression

_Regression analysis_ estimates the relationship between statistical input variables in order to predict an outcome variable. Logistic regression is a regression model that uses input variables to predict a categorical outcome variable that can take on one of a limited set of class values, for example “__cancer__” / “__no cancer__”.

Logistic regression applies the logistic sigmoid function to weighted input values to generate a prediction of which of two classes the input data belongs to.

![Sigmoid Function](./images/sigmoid.png)

Logistic regression is similar to a non-linear __perceptron__ or a neural network without _hidden layers_. The main difference from other basic models is that logistic regression is easy to interpret and reliable if some statistical properties for the input variables hold.
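As a minimal sketch of the idea above (pure Python, with hypothetical weights chosen only for illustration), logistic regression computes a weighted sum of the inputs plus a bias, passes it through the sigmoid, and thresholds the resulting probability. The 0.5 threshold is a common default, assumed here:

```python
import math

def sigmoid(z):
    """Logistic sigmoid: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Probability that input x belongs to the positive class."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum of the inputs plus bias
    return sigmoid(z)

def predict_class(x, w, b, threshold=0.5):
    """Return 1 (e.g. "cancer") or 0 (e.g. "no cancer") by thresholding the probability."""
    return 1 if predict_proba(x, w, b) >= threshold else 0

# Hypothetical weights for two input features
w, b = [2.0, -1.0], -0.5
print(predict_proba([1.0, 0.5], w, b))  # about 0.731
print(predict_class([1.0, 0.5], w, b))  # 1
```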

#### 3.2 Activation Function

An activation function takes in weighted data (matrix multiplication between input data and weights) and outputs a non-linear transformation of the data.

![Activation Functions](./images/activations.png)
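The text does not list which functions the figure shows, so the sketch below simply implements three widely used activation functions (sigmoid, tanh and ReLU) and applies one of them to a weighted sum, mirroring the definition above. All names are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))  # squashes to (0, 1)

def tanh(z):
    return math.tanh(z)                # squashes to (-1, 1), centred at 0

def relu(z):
    return max(0.0, z)                 # keeps positive values, zeroes negative ones

def neuron(weights, inputs, bias, activation):
    """Weighted sum of the inputs (one row of the matrix product) followed by a non-linearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

for name, fn in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]:
    print(name, [round(fn(z), 3) for z in (-2.0, 0.0, 2.0)])
```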

#### 3.3 Cost Function (Error Function)

In _Machine Learning_, cost functions are used to estimate how badly models are performing. Put simply, __a cost function is a measure of how wrong the model is in terms of its ability to estimate the relationship between X and y__. This cost function (also referred to as loss or error) can be estimated by iteratively running the model and comparing its predictions against the “ground truth”, the known values of y.

The objective of an ML model, therefore, is to find the parameters, weights, or structure that __minimise__ the cost function.

#### Cross Entropy Loss

![Log Loss](./images/log_loss.png)

Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label.
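For a label `y` in {0, 1} and a predicted probability `p`, the log loss is `-(y*log(p) + (1-y)*log(1-p))`. A minimal pure-Python sketch (the clipping constant `eps` is an assumption added to avoid `log(0)`) shows the loss growing as the predicted probability moves away from the true label:

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average log loss for labels in {0, 1} and predicted probabilities in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# True label is 1: the loss increases as the prediction drifts towards 0
for p in (0.9, 0.5, 0.1):
    print(p, round(binary_cross_entropy([1], [p]), 3))
```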

#### 3.4 Optimization Algorithm

#### Gradient Descent

Gradient descent is a technique to minimize __loss__ by computing the gradients of the loss with respect to the model's parameters, conditioned on the training data. Informally, gradient descent iteratively adjusts the parameters, gradually finding the best combination of __weights__ and bias to minimize the loss.

![Gradient Descent](./images/gradient_descent.png)
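To make the update rule concrete, here is a minimal sketch on a hypothetical one-parameter loss `L(w) = (w - 3)^2`, whose minimum is at `w = 3`. Each step moves the parameter against the gradient, scaled by a learning rate (the value 0.1 is an arbitrary choice for this toy problem):

```python
def loss_grad(w):
    """Gradient of the toy loss L(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient: w <- w - learning_rate * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

w_star = gradient_descent(loss_grad, w0=0.0)
print(w_star)  # very close to 3.0
```

With too large a learning rate (for example 1.1 on this loss) each step overshoots by more than it corrects, so the iterates diverge instead of converging; this is why the learning rate usually needs tuning.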

#### Other Optimization Algorithms

* __Gradient Descent__ Variants
    - __Batch__ gradient descent
    - __Stochastic__ gradient descent
    - __Mini-batch__ gradient descent
* __RMSProp__
* __ADAM__
* __Momentum__
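The variants listed above differ mainly in how each update is computed (batch, stochastic and mini-batch gradient descent differ only in how many examples each gradient uses). The sketch below applies Momentum, RMSProp and Adam to the same hypothetical loss `L(w) = (w - 3)^2`. The decay rates are common defaults; the learning rates and step counts are chosen for this toy problem, not taken from the workshop code:

```python
import math

def grad(w):
    """Gradient of the hypothetical loss L(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

def momentum(w, lr=0.1, beta=0.9, steps=300):
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(w)               # decaying sum of past gradients ("velocity")
        w -= lr * v
    return w

def rmsprop(w, lr=0.01, rho=0.9, eps=1e-8, steps=1000):
    s = 0.0
    for _ in range(steps):
        g = grad(w)
        s = rho * s + (1 - rho) * g * g      # running average of squared gradients
        w -= lr * g / (math.sqrt(s) + eps)   # step normalised by the gradient's recent size
    return w

def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g      # first moment: momentum-like average
        v = beta2 * v + (1 - beta2) * g * g  # second moment: RMSProp-like average
        m_hat = m / (1 - beta1 ** t)         # bias corrections for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

for name, opt in [("momentum", momentum), ("rmsprop", rmsprop), ("adam", adam)]:
    print(name, round(opt(0.0), 4))
```

With a fixed learning rate, RMSProp's normalised step stays roughly `lr` in size, so near the minimum it may hover a small distance from `w = 3` instead of settling exactly.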

### 4. Recurrent Neural Networks

![Types of RNNs](./images/rnn_types.png)

__Sequential processing in the absence of sequences__. You might be thinking that having sequences as inputs or outputs could be relatively rare, but an important point to realize is that even if your inputs/outputs are fixed vectors, it is still possible to use this powerful formalism to process them in a sequential manner. For instance, the figure below shows results from two very nice papers from [DeepMind](http://deepmind.com/). On the left, an algorithm learns a recurrent network policy that steers its attention around an image; in particular, it learns to read out house numbers from left to right ([Ba et al.](http://arxiv.org/abs/1412.7755)). On the right, a recurrent network generates images of digits by learning to sequentially add color to a canvas ([Gregor et al.](http://arxiv.org/abs/1502.04623)):

<img src="./images/house_read.gif" alt="drawing" width="300px"/> <img src="./images/house_generate.gif" alt="drawing" width="400px" height="400px"/>

### Workshop!

Now we are going to see an RNN in action. This workshop is based on the _Classical-Piano-Composer_ GitHub repository by [Skuldur](https://github.com/Skuldur) (Sigurður Skúli Sigurgeirsson). One of the most important steps in generating music with RNNs is data pre-processing: transforming the songs used to train the model into a format appropriate for an RNN (i.e., into a sequence of notes and chords).


#### Extracting the Notes from the music files

To train the RNN we are going to use music files in a special format. The most convenient format to work with is __MIDI__, which describes musical features for electronic instruments such as notes, chords, duration, tempo, etc. In this case the music files consist of piano songs from video games like Final Fantasy and The Legend of Zelda.

Using the __music21__ Python package developed at _MIT_, we can extract the musical features we need to represent the songs as a list of strings for notes and chords. Running the following cell parses the songs and appends them to a list called `notes`. The songs must be in _MIDI_ format and located in the __midi_songs__ folder.

```python
import music21
from lstm import *

notes = get_notes()
len(notes)
```

```
Using TensorFlow backend.

Parsing midi_songs/0fithos.mid
Parsing midi_songs/8.mid
Parsing midi_songs/ahead_on_our_way_piano.mid
Parsing midi_songs/AT.mid
Parsing midi_songs/balamb.mid
Parsing midi_songs/bcm.mid
Parsing midi_songs/BlueStone_LastDungeon.mid
Parsing midi_songs/braska.mid
Parsing midi_songs/caitsith.mid
Parsing midi_songs/Cids.mid
Parsing midi_songs/cosmo.mid
Parsing midi_songs/costadsol.mid
Parsing midi_songs/dayafter.mid
Parsing midi_songs/decisive.mid
Parsing midi_songs/dontbeafraid.mid
Parsing midi_songs/DOS.mid
Parsing midi_songs/electric_de_chocobo.mid
Parsing midi_songs/Eternal_Harvest.mid
Parsing midi_songs/EyesOnMePiano.mid
Parsing midi_songs/ff11_awakening_piano.mid
Parsing midi_songs/ff1battp.mid
Parsing midi_songs/FF3_Battle_(Piano).mid
Parsing midi_songs/FF3_Third_Phase_Final_(Piano).mid
Parsing midi_songs/ff4-airship.mid
Parsing midi_songs/Ff4-BattleLust.mid
Parsing midi_songs/ff4-fight1.mid
Parsing midi_songs/ff4-town.mid
Parsing midi_songs/FF4.mid
Parsing midi_songs/ff4_piano_collec

59020
```

The songs are split into two object types: __Notes__ and __Chords__. Note objects contain information about the pitch, octave, and offset of the note. Chord objects are containers for a set of notes played at the same time.

* __Pitch__ refers to the frequency of the sound, i.e. how high or low it is, and is represented with the letters [A, B, C, D, E, F, G] (plus sharps and flats such as F#). Within an octave, C is the lowest pitch and B the highest.

* __Octave__ refers to which set of pitches on the piano a note belongs to; it is the number in names such as C5 or A4.

* __Offset__ refers to where the note is located in the piece.

```python
print(notes[200:250])
```

```
['C5', 'A4', 'C5', 'A4', '6', 'C5', 'E5', 'F#5', 'E5', '5', 'C5', 'A4', 'C5', 'A4', '4', 'C5', 'E5', 'F5', 'E5', '5', 'C5', 'A4', 'C5', 'A4', '6', 'C5', 'E5', 'F#5', 'E5', '5', 'C5', 'A4', 'C5', 'A4', 'A2', 'C5', 'E5', 'F5', 'A2', 'E5', 'A2', 'C5', 'B4', 'C5', 'A4', 'A2', 'F4', 'E4', 'F4', 'A2']
```


![Music Sheet Representation](./images/music.jpg)

We can see that, to generate music accurately, our neural network has to predict which note or chord comes next. That means the prediction array has to contain every note and chord object encountered in the training set.

Next, we create a mapping from string-based categorical data to integer-based numerical data. This is necessary because neural networks operate on numerical inputs, not on strings.

![Categorical Encoding](./images/categorical.png)
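A minimal sketch of the mapping, assuming a sorted vocabulary so that the indices are reproducible (the names `build_vocabulary`, `note_to_int` and `int_to_note` are illustrative). The reverse dictionary is what lets predicted integers be turned back into notes later:

```python
def build_vocabulary(notes):
    """Map each distinct note/chord string to an integer index, and back."""
    pitchnames = sorted(set(notes))  # sorted so the mapping is the same on every run
    note_to_int = {name: i for i, name in enumerate(pitchnames)}
    int_to_note = {i: name for name, i in note_to_int.items()}
    return note_to_int, int_to_note

# A few tokens taken from the notes[200:250] output above
sample = ['C5', 'A4', 'C5', 'A4', '6', 'C5', 'E5', 'F#5', 'E5', '5']
note_to_int, int_to_note = build_vocabulary(sample)
encoded = [note_to_int[n] for n in sample]
print(note_to_int)
print(encoded)
```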

Now we need to generate the input and output sequences for the model. Here __sequence_length__ is set to 100, meaning the RNN uses the previous 100 notes to predict the next one. The final task in this step is to normalize the inputs and one-hot encode the outputs before feeding them to the RNN.
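The sliding-window step can be sketched as follows. This is a pure-Python illustration, not the repository's code: each input is a window of `sequence_length` integer tokens divided by the vocabulary size, and each target is the next token, one-hot encoded to line up with a softmax output. The inner dimension of size 1 reflects one feature per time step, an assumption consistent with the LSTM parameter counts in the model summary below:

```python
def prepare_sequences(encoded, n_vocab, sequence_length=100):
    """Build (input, target) pairs with a sliding window over the encoded notes."""
    network_input, targets = [], []
    for i in range(len(encoded) - sequence_length):
        window = encoded[i:i + sequence_length]
        # Normalise each integer id to [0, 1) and keep one feature per time step
        network_input.append([[token / n_vocab] for token in window])
        targets.append(encoded[i + sequence_length])  # the note that follows the window
    # One-hot encode the targets so they match a probability over the vocabulary
    network_output = [[1.0 if k == t else 0.0 for k in range(n_vocab)] for t in targets]
    return network_input, network_output

# Tiny hypothetical example: 8 encoded notes, vocabulary of 5, windows of 3
X, Y = prepare_sequences([0, 1, 2, 3, 4, 0, 1, 2], n_vocab=5, sequence_length=3)
print(len(X), X[0], Y[0])
```

With the 59,020 parsed notes and `sequence_length = 100`, this windowing yields 59,020 - 100 = 58,920 training sequences.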

#### RNN Model

The RNN architecture used to train the model is the following:

![RNN Architecture](./images/rnn_architecture_workshop.png)

#### LSTM Layer

__LSTM__ stands for __Long Short-Term Memory__. It is a special kind of recurrent neural network that can learn long-range dependencies in a sequence by using a "_memory cell_" that stores relations between elements of the sequence. This helps to mitigate the __vanishing gradient__ problem, which is very common in deep neural networks.

![Vanishing Gradients](./images/vanishing.png)
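To show what the memory cell does, here is a minimal pure-Python sketch of one LSTM time step using the standard gate equations (input, forget and output gates plus a candidate value). It is an illustration, not Keras's implementation, and the parameter layout is an assumption. The cell state is updated additively and only scaled by the forget gate, so when that gate is close to 1, information (and, during training, gradients) can pass through many time steps with little decay:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for a scalar input x and hidden/cell vectors of size n.

    params maps each gate name ('i', 'f', 'o', 'g') to (W, U, b): W holds the
    input weights (length n), U the recurrent weights (n x n), b the biases (length n).
    """
    n = len(h_prev)

    def pre_activation(gate):
        W, U, b = params[gate]
        return [W[k] * x + sum(U[k][j] * h_prev[j] for j in range(n)) + b[k] for k in range(n)]

    i = [sigmoid(z) for z in pre_activation('i')]    # input gate: how much new information to write
    f = [sigmoid(z) for z in pre_activation('f')]    # forget gate: how much old memory to keep
    o = [sigmoid(z) for z in pre_activation('o')]    # output gate: how much memory to expose
    g = [math.tanh(z) for z in pre_activation('g')]  # candidate values for the memory cell
    c = [f[k] * c_prev[k] + i[k] * g[k] for k in range(n)]  # additive memory-cell update
    h = [o[k] * math.tanh(c[k]) for k in range(n)]           # new hidden state
    return h, c

def make_gate(n, bias=0.0):
    """Hypothetical gate parameters: all weights zero and a constant bias."""
    return [0.0] * n, [[0.0] * n for _ in range(n)], [bias] * n

n = 2
params = {'i': make_gate(n, -10.0), 'f': make_gate(n, 10.0), 'o': make_gate(n), 'g': make_gate(n)}
h, c = [0.0] * n, [1.0] * n
for x in [0.3] * 100:  # run over a 100-step sequence
    h, c = lstm_step(x, h, c, params)
print(c)  # still close to 1.0: a forget gate near 1 preserves the memory
```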

#### Dropout

__Dropout__ is a regularisation technique that, during training, randomly "shuts down" (sets to zero) each of a layer's inputs with a given probability. This helps to prevent __overfitting__, improving how well the model's input/output mapping generalizes to new data.

![Dropout](./images/dropout.png)
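A minimal sketch of dropout using the common "inverted" formulation: during training, each input is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)` so that the expected activation is unchanged; at inference the layer does nothing. The rate 0.3 matches the `Dropout(0.3)` layers in the model; the function itself is illustrative:

```python
import random

def dropout(inputs, rate, training=True, rng=random):
    """Zero each input with probability `rate` and rescale the rest (training only)."""
    if not training or rate == 0.0:
        return list(inputs)  # at inference time the layer is the identity
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in inputs]

rng = random.Random(0)
out = dropout([1.0] * 10, rate=0.3, rng=rng)
print(out)  # roughly 30% zeros, the rest scaled up by 1 / 0.7
```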

#### Dense Layer (Fully Connected Layer)

In a __Dense__ layer each neuron is connected to every neuron in the previous layer, and each connection has its own weight. This layer is usually applied to the output of a non-linear activation from the previous layer, and it is also used right before the classifier: in this model, a Dense layer follows the last LSTM layer, and the final Dense layer feeds the softmax.

![Dense](./images/dense.png)
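In code, a dense layer is a matrix-vector product plus a bias. The sketch below (hypothetical weights, pure Python) also counts the layer's parameters: one weight per connection plus one bias per output neuron, which gives `512 * 256 + 256 = 131328` for a 256-unit layer fed by a 512-unit LSTM, matching `dense_1` in the model summary:

```python
def dense(inputs, weights, biases):
    """Fully connected layer: output[j] = sum_i inputs[i] * weights[i][j] + biases[j].

    weights has one row per input neuron and one column per output neuron,
    so every input is connected to every output by its own weight.
    """
    n_out = len(biases)
    return [sum(x * weights[i][j] for i, x in enumerate(inputs)) + biases[j] for j in range(n_out)]

def dense_param_count(n_in, n_out):
    """One weight per connection plus one bias per output neuron."""
    return n_in * n_out + n_out

x = [1.0, 2.0, 3.0]
W = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
b = [0.0, 1.0]
print(dense(x, W, b))               # approximately [2.2, 3.8]
print(dense_param_count(512, 256))  # 131328
```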

#### Softmax Layer

![Softmax](./images/softmax.png)
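The softmax layer turns the final layer's raw scores into a probability for each note or chord in the vocabulary: `softmax(z)_k = exp(z_k) / sum_j exp(z_j)`. A minimal sketch (subtracting the maximum score first is a standard numerical-stability trick that leaves the result unchanged):

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(probs.index(max(probs)))       # index of the most likely next token: 0
```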

#### Training the Model

The RNN architecture above is then implemented with __Keras__ and __TensorFlow__. Keras makes it easy to build neural networks from predefined, high-level abstractions of the components described above.

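```python
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation

# network_input has shape (n_sequences, sequence_length, 1) and n_vocab is the number
# of distinct notes/chords; both come from the pre-processing step described above.
model = Sequential()
model.add(LSTM(512, input_shape=(network_input.shape[1], network_input.shape[2]), return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(512))              # the last LSTM returns only its final hidden state
model.add(Dense(256))
model.add(Dropout(0.3))
model.add(Dense(n_vocab))         # one output per note/chord in the vocabulary
model.add(Activation('softmax'))  # turn the outputs into probabilities
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
```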

The model uses __Categorical Cross-Entropy__ as the _loss function_ and __RMSProp__ as the optimizer; the optimization algorithm can be chosen empirically.

![Categorical Cross Entropy](./images/cat_log_loss.png)
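Categorical cross-entropy generalizes log loss to more than two classes: for a one-hot target it reduces to `-log(p)` of the probability assigned to the correct class. A minimal sketch with hypothetical predictions over three classes (`eps` is an assumption added to avoid `log(0)`):

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k]) for one-hot targets."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        total += -sum(tk * math.log(max(pk, eps)) for tk, pk in zip(t, p))
    return total / len(y_true)

y_true = [[0, 1, 0]]       # the correct next token is class 1
good = [[0.1, 0.8, 0.1]]   # confident and correct
bad = [[0.6, 0.2, 0.2]]    # puts most probability on the wrong class
print(categorical_cross_entropy(y_true, good))  # equals -log(0.8)
print(categorical_cross_entropy(y_true, bad))   # equals -log(0.2)
```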

Now we can train the model. For this we must define the last training parameters: the number of __epochs__ and the __batch size__. The number of epochs is the number of times the whole training dataset is passed through the network while the parameters are optimized. The batch size is the number of training examples the optimization algorithm uses for each parameter update; it is commonly chosen as a power of 2, such as 32 or 64. If we set the batch size to 1, each update is computed from a single example, as in __Stochastic Gradient Descent__.

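```python
# callbacks_list holds Keras callbacks (e.g. a ModelCheckpoint that saves the weights); it is not defined in this excerpt
model.fit(network_input, network_output, epochs=200, batch_size=64, callbacks=callbacks_list)
```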
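A quick back-of-the-envelope sketch of what these settings mean, assuming the 59,020 parsed notes and a window of 100 give 58,920 training sequences, and that the final, smaller batch of each epoch is still used. The `minibatches` helper is illustrative, not part of Keras:

```python
import math

def minibatches(n_samples, batch_size):
    """Yield (start, end) index ranges that cover one epoch; the last batch may be smaller."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

n_sequences = 59020 - 100                        # notes minus the window length
updates_per_epoch = math.ceil(n_sequences / 64)  # batch_size=64
print(n_sequences, updates_per_epoch)            # 58920 921
print(updates_per_epoch * 200)                   # updates over 200 epochs: 184200
batches = list(minibatches(n_sequences, 64))
print(batches[-1])                               # last, smaller batch: (58880, 58920)
```

With `batch_size=1` the generator yields one batch per example (stochastic updates); with `batch_size` equal to the number of samples it yields a single batch (batch gradient descent).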

![Training the RNN](./images/loss.png)

#### Generating Music!

Generating the musical sequence requires the reverse steps: converting the predicted sequence back into notes and chords, and writing these to a MIDI file to produce the final song. To generate the predictions, we select a random point in the sequence as a starting point and then repeatedly predict the next note. The program is hard-coded to predict the next 500 notes.
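The generation loop can be sketched as follows. It is a hedged illustration, not the repository's `generate()`: `toy_predictor` is a hypothetical stand-in for the trained model, choosing the most probable token (argmax) is an assumption, and the window slides forward by dropping the oldest token and appending the prediction. Mapping the resulting integers back through an `int_to_note`-style dictionary would give the note and chord strings to write to MIDI:

```python
import random

def generate_tokens(predict_next, seed_sequence, n_vocab, n_tokens=500):
    """Autoregressive generation: predict a token, append it, and slide the window forward.

    predict_next maps a window of normalised inputs to a list of n_vocab probabilities;
    in the workshop this role is played by the trained model.
    """
    pattern = list(seed_sequence)
    output = []
    for _ in range(n_tokens):
        window = [token / n_vocab for token in pattern]      # same normalisation as training
        probs = predict_next(window)
        index = max(range(n_vocab), key=lambda k: probs[k])  # greedy choice (argmax)
        output.append(index)
        pattern = pattern[1:] + [index]                      # drop the oldest, append the newest
    return output

N_VOCAB = 4

def toy_predictor(window):
    """Hypothetical stand-in for the model: always predicts (last token + 1) mod N_VOCAB."""
    last = round(window[-1] * N_VOCAB)
    probs = [0.0] * N_VOCAB
    probs[(last + 1) % N_VOCAB] = 1.0
    return probs

encoded_piece = [random.randrange(N_VOCAB) for _ in range(1000)]  # hypothetical encoded song
start = random.randrange(len(encoded_piece) - 100)                # random starting point
seed = encoded_piece[start:start + 100]
print(generate_tokens(toy_predictor, seed, N_VOCAB, n_tokens=10))
```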

Calling the __generate()__ function rebuilds the same network that we trained, loads its trained parameters, and processes the sequences to generate a _MIDI_ file called __test_output.mid__. In the following cell we generate the song and print a summary of the trained network.

```python
from predict import *

generate()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 100, 512)          1052672   
_________________________________________________________________
dropout_1 (Dropout)          (None, 100, 512)          0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 100, 512)          2099200   
_________________________________________________________________
dropout_2 (Dropout)          (None, 100, 512)          0         
_________________________________________________________________
lstm_3 (LSTM)                (None, 512)               2099200   
_________________________________________________________________
dense_1 (Dense)              (None, 256)               131328    
_________________________________________________________________
dropout_3 (Dropout)          (None, 256)               0         
__________
```
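The parameter counts in the summary can be checked by hand. Each LSTM layer has four gates, and each gate has input weights, recurrent weights and a bias, giving `4 * (units * (input_dim + units) + units)` parameters. Assuming one input feature per time step (`input_dim = 1`) for the first layer:

```python
def lstm_param_count(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias vector."""
    return 4 * (units * (input_dim + units) + units)

print(lstm_param_count(1, 512))    # lstm_1: 1052672
print(lstm_param_count(512, 512))  # lstm_2 and lstm_3: 2099200
```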

```python
from midi2audio import FluidSynth
import IPython.display as ipd

# Render the generated MIDI file to audio (needs FluidSynth and a SoundFont installed)
FluidSynth().midi_to_audio('test_output.mid', 'output.wav')
```

```python
ipd.Audio('output.wav')  # load a local WAV file
```



### References

* https://developer.nvidia.com/deep-learning
* https://www.mathworks.com/discovery/deep-learning.html
* [Coursera Deep Learning Specialization, Andrew Ng](https://www.deeplearning.ai/)
* https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/
* https://towardsdatascience.com/machine-learning-fundamentals-via-linear-regression-41a5d11f5220
* https://developers.google.com/machine-learning/crash-course/glossary
* https://www.quora.com/Does-Gradient-Descent-Algo-always-converge-to-the-global-minimum
* http://ruder.io/optimizing-gradient-descent/index.html
* https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
* http://karpathy.github.io/2015/05/21/rnn-effectiveness/
* https://towardsdatascience.com/how-to-generate-music-using-a-lstm-neural-network-in-keras-68786834d4c5