## LSTM and Music Generation

Credits: https://towardsdatascience.com/how-to-generate-music-using-a-lstm-neural-network-in-keras-68786834d4c5

### RNN: Classic Unrolling
![rnn](https://chunml.github.io/images/projects/creating-text-generator-using-recurrent-neural-network/vanilla_RNN.png)



The hidden state at time step $t$, $h_t$, is computed from the current input $x_t$ and the previous hidden state $h_{t-1}$ ($\sigma$ is the sigmoid function):

$$h_t=\sigma(W_{xh}x_t+W_{hh}h_{t-1})$$

The output depends only on the latest hidden state $h_t$:

$$y_t = \mathrm{softmax}(W_{hy}h_t)$$
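The two equations can be sketched in NumPy as follows. This is a minimal illustration, not code from the referenced repository: the layer sizes, the random weights and the helper names (`sigmoid`, `softmax`, `rnn_step`) are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - np.max(z)          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy):
    """One time step of a vanilla RNN: new hidden state, then output."""
    h_t = sigmoid(W_xh @ x_t + W_hh @ h_prev)
    y_t = softmax(W_hy @ h_t)
    return h_t, y_t

# Hypothetical sizes: 3 input features, 4 hidden units, 5 output classes
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 3))
W_hh = rng.normal(size=(4, 4))
W_hy = rng.normal(size=(5, 4))

h = np.zeros(4)                      # initial hidden state
for x in rng.normal(size=(6, 3)):    # unroll over a 6-step input sequence
    h, y = rnn_step(x, h, W_xh, W_hh, W_hy)

print(h.shape, y.shape, y.sum())
```

The same weight matrices are reused at every time step; only `h` changes as the sequence is consumed.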

### RNN problem: Vanishing Gradient
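- As we back-propagate through time, the gradient of the cost function w.r.t. the weights tends to shrink with every time step it passes through.
- This means inputs from earlier time steps have little influence on the weight updates, so the network effectively forgets them.
- Training becomes very slow, and long-range dependencies are hard to learn.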

![vanishing gradient](https://cdn-images-1.medium.com/max/2000/1*FWy4STsp8k0M5Yd8LifG_Q.png)
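To get a feel for the numbers, here is a small sketch using a scalar RNN (one hidden unit, one recurrent weight). The setup is an assumption chosen for simplicity, and `gradient_magnitudes` is a hypothetical helper. It relies on the fact that the sigmoid derivative $\sigma'(z) = \sigma(z)(1-\sigma(z))$ is never larger than 0.25, so each step back in time multiplies the gradient by a small factor.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_magnitudes(w_hh, steps, h0=0.5, x=0.0, w_xh=1.0):
    """Return |d h_t / d h_0| for t = 1..steps in a scalar sigmoid RNN."""
    h = h0
    grad = 1.0
    magnitudes = []
    for _ in range(steps):
        z = w_xh * x + w_hh * h
        h = sigmoid(z)
        # chain rule: d h_t / d h_{t-1} = w_hh * sigma'(z), and sigma'(z) = h * (1 - h)
        grad *= w_hh * h * (1.0 - h)
        magnitudes.append(abs(grad))
    return magnitudes

mags = gradient_magnitudes(w_hh=1.0, steps=20)
for t in (1, 5, 10, 20):
    print("step %2d: %.3e" % (t, mags[t - 1]))
```

With a recurrent weight of 1.0 the gradient shrinks by more than a factor of four per step, so after a few dozen steps the earliest inputs contribute almost nothing to the update.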

### LSTM

![lstm](https://chunml.github.io/images/projects/creating-text-generator-using-recurrent-neural-network/LSTM.png)

- $h_{t-1}$ is the output (hidden state) from time step $t-1$
- The cell state ($C_t$) holds the network's memory (the "long short-term memory") and is controlled by 3 gates:
  - Input gate ($i_t$): decides which values to update
  - Forget gate ($f_t$): decides which values to forget
  - Output gate ($o_t$): decides which values to output

- $h_t$ is the output (hidden state) at time step $t$
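The gates can be made concrete with a NumPy sketch of a single LSTM step. It follows the commonly used formulation (sigmoid gates, a tanh candidate, $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$ and $h_t = o_t \odot \tanh(C_t)$). Stacking all four weight blocks into one matrix `W`, the block order, and the sizes are assumptions of this sketch, not how Keras stores its weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4 * hidden, hidden + inputs) and b has shape (4 * hidden,).
    The four row blocks hold the forget, input, output and candidate weights.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0 * hidden:1 * hidden])      # forget gate
    i_t = sigmoid(z[1 * hidden:2 * hidden])      # input gate
    o_t = sigmoid(z[2 * hidden:3 * hidden])      # output gate
    c_tilde = np.tanh(z[3 * hidden:4 * hidden])  # candidate cell values
    c_t = f_t * c_prev + i_t * c_tilde           # drop old memory, add new
    h_t = o_t * np.tanh(c_t)                     # expose part of the cell state
    return h_t, c_t

# Hypothetical sizes: 3 input features, 4 hidden units
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = rng.normal(scale=0.5, size=(4 * n_hidden, n_hidden + n_in))
b = np.zeros(4 * n_hidden)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in rng.normal(size=(6, n_in)):             # 6-step input sequence
    h, c = lstm_step(x, h, c, W, b)

print(h.shape, c.shape)
```

Because the cell state is updated additively (`f_t * c_prev + ...`) rather than being squashed through a sigmoid at every step, gradients can flow back through many time steps when the forget gate stays close to 1.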

## Attention vs. LSTM / RNNs

In recent months, LSTMs and RNNs have fallen out of favour, with attention-based models taking their place (machine learning is like the fashion industry).

We won't address this in this workbook, but may cover it in the future.

Some background if you are curious:

https://towardsdatascience.com/the-fall-of-rnn-lstm-2d1594c74ce0

https://arxiv.org/abs/1502.03044

https://github.com/philipperemy/keras-attention-mechanism

## Music Generation

Train a neural network to generate MIDI files.

Repository: https://github.com/Skuldur/Classical-Piano-Composer

Blog post: https://towardsdatascience.com/how-to-generate-music-using-a-lstm-neural-network-in-keras-68786834d4c5

### Setup

[Music21](http://web.mit.edu/music21/doc/about/what.html) is a Python toolkit for computer-aided musicology (study of music, editing and composing music)

Install to your environment:

```
(mldds03) pip install music21
```


In [1]:
# clone the repository
!git clone https://github.com/Skuldur/Classical-Piano-Composer

fatal: destination path 'Classical-Piano-Composer' already exists and is not an empty directory.


### Predict

As a starting point, we won't modify the script or setup, but try out the demo to make sure it still works.

Later on, we can add our own midi files, tweak the LSTM network, etc.

In [2]:
# chain cd with the command: each ! line runs in its own shell, starting from the notebook's directory
!cd Classical-Piano-Composer & dir

 Volume in drive D is DATA
 Volume Serial Number is B200-6E0E

 Directory of D:\mldds-courseware\03_TextImage\Classical-Piano-Composer

04/08/2018  08:49 AM    <DIR>          .
04/08/2018  08:49 AM    <DIR>          ..
02/08/2018  02:43 PM                66 .gitattributes
02/08/2018  04:44 PM    <DIR>          data
04/08/2018  08:42 AM    <DIR>          logs
02/08/2018  02:43 PM             3,978 lstm.py
04/08/2018  08:42 AM             4,241 lstm_exercise.py
04/08/2018  08:41 AM    <DIR>          midi_songs
02/08/2018  05:25 PM    <DIR>          midi_songs2
04/08/2018  08:40 AM    <DIR>          midi_songs_bak
02/08/2018  02:43 PM             4,771 predict.py
02/08/2018  02:43 PM               917 README.md
02/08/2018  05:05 PM             8,536 test_output.mid
02/08/2018  02:55 PM             4,752 test_output.mid.back
04/08/2018  08:42 AM        44,081,840 weights-improvement-01-5.1664-bigger.hdf5
04/08/2018  08:43 AM        44,081,840 weights-improvement-02-5.0590-bigger.hdf5
04/08

There is a pre-trained LSTM network (`weights.hdf5`) already present. Let's run it to generate music.

In [3]:
!cd Classical-Piano-Composer & python predict.py

Using TensorFlow backend.
Traceback (most recent call last):
  File "predict.py", line 134, in <module>
    generate()
  File "predict.py", line 24, in generate
    model = create_network(normalized_input, n_vocab)
  File "predict.py", line 70, in create_network
    model.load_weights('weights.hdf5')
  File "C:\Users\issohl\AppData\Local\conda\conda\envs\mldds03\lib\site-packages\keras\engine\network.py", line 1152, in load_weights
    with h5py.File(filepath, mode='r') as f:
  File "C:\Users\issohl\AppData\Local\conda\conda\envs\mldds03\lib\site-packages\h5py\_hl\files.py", line 312, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "C:\Users\issohl\AppData\Local\conda\conda\envs\mldds03\lib\site-packages\h5py\_hl\files.py", line 142, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py\h5f.pyx",

In [4]:
# if you have Visual Studio Code installed, you can run this to inspect predict.py
# if you don't have Visual Studio Code installed, you can replace
# code with notepad or your text editor
!cd Classical-Piano-Composer & code predict.py


Running `predict.py` produces the following file:

`test_output.mid`

In [5]:
# Reference: https://blog.ouseful.info/2016/09/13/making-music-and-embedding-sounds-in-jupyter-notebooks/

def play_midi(filename):
    """Plays a midi file
    Args:
        filename - path to the midi file
    """
    from music21 import midi
    
    mf = midi.MidiFile()
    mf.open(filename)
    mf.read()
    mf.close()
    stream = midi.translate.midiFileToStream(mf)
    stream.show('midi')

play_midi('Classical-Piano-Composer/test_output.mid')

### Train

Now that we've verified that the pre-trained network works, let's try training using our own midi files.

Note that the original network takes about 20 hours to train, so we'll just be training a smaller version of 
the network for demonstration purposes.

### Training with GPU

If you have a machine with an NVIDIA GPU, you can use keras-gpu to speed up training (about a 6x speedup).

This requires uninstalling keras (the CPU version) and installing keras-gpu:

```
(mldds03) conda uninstall keras
(mldds03) conda install keras-gpu
``` 

Steps:

1. Rename the midi_songs folder to midi_songs_original

2. Rename `weights.hdf5` to `weights.hdf5.original`

3. Create an empty midi_songs folder. Download about 5-10 .mid files of your choice into it. The original network was trained on a single instrument.
   - As an experiment, you can add more instruments to see what happens (maybe gibberish),
   - Or if you want to play it safe, pick midi files from just 1 instrument.
   - Example sources:
     - http://meteorheaven.tripod.com/frame/mchi_male.htm
     - http://sanjeevmusic.com/


4. Create a copy of lstm.py called lstm_exercise.py. Edit this file:

   a. Edit the `prepare_sequences` function to reduce the sequence length from 100 to 25. This will speed up training.

   b. Add a TensorBoard callback to monitor training progress.

    ```
    from keras.callbacks import TensorBoard
    from time import time

    ...

    tensorboard = TensorBoard(log_dir='./logs/{}'.format(time()),
                              histogram_freq=0,
                              batch_size=64,
                              write_graph=True)

    ...

    callbacks_list = [checkpoint, tensorboard]

    ```

5. Launch tensorboard:
    ```
   (mldds03) D:\mldds-courseware\03_TextImage\Classical-Piano-Composer>tensorboard --logdir logs --host=0.0.0.0
    
    2018-08-02 15:32:56.198899: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
    TensorBoard 1.8.0 at http://0.0.0.0:6006 (Press CTRL+C to quit)
    
    ```
   Open a browser window to http://localhost:6006

6. Start training:
    ```
    python lstm_exercise.py
    ```

If all goes to plan you should see output like this (depending on whether you are running with or without GPU):
   
### CPU-only
```

    (mldds03) D:\mldds-courseware\03_TextImage\Classical-Piano-Composer>python lstm_exercise.py
    Using TensorFlow backend.
    Parsing midi_songs\aidehuhuan.mid
    Parsing midi_songs\kewang.mid
    Parsing midi_songs\parapara.mid
    Parsing midi_songs\shuinenggaoshuwo.mid
    Parsing midi_songs\zhaomi.mid
    2018-08-02 15:35:15.560156: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
    Epoch 1/10
    5344/5344 [==============================] - 223s 42ms/step - loss: 4.4112
    Epoch 2/10
     832/5344 [===>..........................] - ETA: 3:06 - loss: 4.1681

```

### GPU

```
    (mldds03) D:\mldds-courseware\03_TextImage\Classical-Piano-Composer>python lstm_exercise.py
    Using TensorFlow backend.
    Parsing midi_songs\aidehuhuan.mid
    Parsing midi_songs\kewang.mid
    Parsing midi_songs\parapara.mid
    Parsing midi_songs\shuinenggaoshuwo.mid
    Parsing midi_songs\zhaomi.mid
    2018-08-02 15:48:55.160750: I C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
    2018-08-02 15:48:55.635377: I C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:01:00.0
    totalMemory: 11.00GiB freeMemory: 9.01GiB
    2018-08-02 15:48:55.722389: I C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 1 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:02:00.0
    totalMemory: 11.00GiB freeMemory: 9.01GiB
    2018-08-02 15:48:55.727757: I C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0, 1
    2018-08-02 15:48:57.789074: I C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
    2018-08-02 15:48:57.793350: I C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929]      0 1
    2018-08-02 15:48:57.796512: I C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0:   N N
    2018-08-02 15:48:57.800165: I C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 1:   N N
    2018-08-02 15:48:57.805051: I C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8713 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
    2018-08-02 15:48:58.227726: I C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8713 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)

    Epoch 1/10
    5344/5344 [==============================] - 37s 7ms/step - loss: 4.3308
    Epoch 2/10
    1856/5344 [=========>....................] - ETA: 22s - loss: 4.1595
```

TensorBoard should show the network graph straight away, but it will take some time before a loss curve is shown.

LSTM graph:

![tensorboard](assets/lstm/tensorboard_1.png)

Initial loss values:

![tensorboard](assets/lstm/tensorboard_2.png)

### Predict (again)

Copy the lowest-cost weights to `weights.hdf5`, then run `predict.py`:

1. `copy weights-improvement-<iteration>-<cost>-bigger.hdf5 weights.hdf5`
2. `python predict.py`
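If you prefer not to pick the file by eye, the copy step can be scripted. The sketch below assumes the checkpoint names follow the pattern seen in the directory listing earlier (`weights-improvement-<iteration>-<cost>-bigger.hdf5`); the helper names `find_best_checkpoint` and `promote_best_checkpoint` are made up for this example.

```python
import re
import shutil
from pathlib import Path

# Assumed file name pattern: weights-improvement-<iteration>-<cost>-bigger.hdf5
CHECKPOINT_RE = re.compile(r"weights-improvement-(\d+)-(\d+\.\d+)-bigger\.hdf5$")

def find_best_checkpoint(folder):
    """Return the checkpoint path with the lowest cost in its name, or None."""
    best_path, best_cost = None, float("inf")
    for path in Path(folder).glob("weights-improvement-*.hdf5"):
        match = CHECKPOINT_RE.match(path.name)
        if match and float(match.group(2)) < best_cost:
            best_path, best_cost = path, float(match.group(2))
    return best_path

def promote_best_checkpoint(folder):
    """Copy the lowest-cost checkpoint to weights.hdf5 in the same folder."""
    best = find_best_checkpoint(folder)
    if best is None:
        raise FileNotFoundError("no weights-improvement-*.hdf5 files in %s" % folder)
    target = Path(folder) / "weights.hdf5"
    shutil.copyfile(best, target)
    return best, target
```

For example, `promote_best_checkpoint('Classical-Piano-Composer')` would overwrite `weights.hdf5` in that folder, so keep the renamed `weights.hdf5.original` from the training steps if you want to go back to the pre-trained network.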

In [6]:
play_midi('Classical-Piano-Composer/test_output.mid')

## Appendix

How input and output sequences are generated for LSTM training.
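As a toy illustration of the idea, the sketch below applies the same sliding-window scheme to a hypothetical seven-note melody using only NumPy. `make_windows` and `toy_notes` are names invented for this example; the logic mirrors what `prepare_sequences` in `lstm.py` does, with `np.eye` standing in for Keras' one-hot helper.

```python
import numpy as np

def make_windows(notes, sequence_length):
    """Slide a window over notes; each window predicts the note that follows it."""
    vocab = sorted(set(notes))
    note_to_int = {name: i for i, name in enumerate(vocab)}

    inputs, targets = [], []
    for i in range(len(notes) - sequence_length):
        window = notes[i:i + sequence_length]
        inputs.append([note_to_int[n] for n in window])
        targets.append(note_to_int[notes[i + sequence_length]])

    # Shape (n_patterns, sequence_length, 1), scaled into [0, 1)
    X = np.array(inputs, dtype=float).reshape(len(inputs), sequence_length, 1)
    X = X / len(vocab)
    # One-hot targets: one row per pattern, one column per vocabulary entry
    y = np.eye(len(vocab))[targets]
    return X, y, note_to_int

toy_notes = ['C4', 'E4', 'G4', 'C4', 'E4', 'G4', 'C5']   # hypothetical melody
X, y, mapping = make_windows(toy_notes, sequence_length=3)
print(mapping)
print(X.shape, y.shape)
```

The number of training patterns is `len(notes) - sequence_length`, which is why shortening the sequence length changes both the input shape and the number of samples per epoch.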

In [8]:
# Annotated versions of the same converter functions in lstm.py

def get_notes(filename):
    """ Get all the notes and chords from a given filename """
    from music21 import converter, instrument, note, chord
    
    notes = []

    midi = converter.parse(filename)

    print("Parsing %s" % filename)

    notes_to_parse = None

    try:  # file has instrument parts
        s2 = instrument.partitionByInstrument(midi)
        notes_to_parse = s2.parts[0].recurse()
    except Exception:  # file has notes in a flat structure
        notes_to_parse = midi.flat.notes

    for element in notes_to_parse:
        if isinstance(element, note.Note):
            notes.append(str(element.pitch))
        elif isinstance(element, chord.Chord):
            notes.append('.'.join(str(n) for n in element.normalOrder))
    return notes


file = 'Classical-Piano-Composer/midi_songs/AANE WALA PAL.mid' # change to your midi file
notes = get_notes(file)
print('len(notes):', len(notes))
print(notes)

Parsing Classical-Piano-Composer/midi_songs/AANE WALA PAL.mid
len(notes): 1328
['F5', 'F4', 'F2', 'D5', 'F5', 'C#5', 'C5', 'F5', 'A4', 'F2', 'G#4', 'F5', 'A4', 'F4', 'F5', 'F4', 'F2', 'D5', 'F5', 'C#5', 'C5', 'F5', 'A4', 'F2', 'G#4', 'F5', 'A4', 'F4', 'F5', 'F4', 'E4', 'D2', 'D4', 'E4', 'D4', 'D5', 'E4', 'D4', 'E4', 'D4', 'F5', 'C#5', 'E5', 'D5', 'E5', 'D5', 'C5', 'E5', 'D5', 'E5', 'D5', 'F5', 'A4', 'E4', 'D2', 'D4', 'E4', 'D4', 'G#4', 'E4', 'D4', 'E4', 'D4', 'F5', 'A4', 'E5', 'D5', 'E5', 'F2', 'D5', 'F4', 'E5', 'D5', '4.5', 'G2', '2.7', 'A4', 'A2', 'B-4', 'B-2', 'F3', '10.2', 'B-4', 'F4', 'F3', 'D5', '2.4', 'E3', '0.4', '11.4', 'D3', '4.9', 'B-4', '5.10', '5.10', '2', 'B-4', 'F4', '6.10.0', '10.3', '10.0', '1.5', '9.10.2.5', '4.9.10', 'D4', 'D5', 'B-5', 'C4', '6.10', 'F#3', '10.1.2.5', 'E5', 'C2', '6.10', 'B-3', '10.3', '5', 'B-2', '6.10.0', '10.3', 'B-5', 'F3', '2.4.5.6.9.10', 'C#2', '10.2', 'F5', 'E5', 'B-5', 'C4', '6.10', '10.1.2.5', 'D5', 'F3', '0.6', 'B-5', 'B-3', '10.3', 'B-4', 
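The list mixes two kinds of tokens: pitch names such as `'F5'` or `'B-4'` (music21 writes flats with `-`) for single notes, and dot-separated integers such as `'4.9'` for chords, where each integer is a pitch class from the chord's normal order (0 = C up to 11 = B). A small sketch that tells them apart, using a hypothetical helper `describe_token`:

```python
def describe_token(token):
    """Classify a token from get_notes() as a single note or a chord."""
    parts = token.split('.')
    if all(part.isdigit() for part in parts):
        # Chord: each integer is a pitch class (0 = C, 1 = C#, ..., 11 = B)
        return ('chord', [int(part) for part in parts])
    # Anything else is a pitch name such as 'F5', 'C#5' or 'B-4' (B flat)
    return ('note', token)

for token in ['F5', '4.9', '2', 'B-4', '9.10.2.5']:
    print(token, '->', describe_token(token))
```

Note that a chord token keeps only pitch classes, so the octave of each chord tone is lost in this encoding, while single notes keep their octave.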

In [9]:
# Annotated versions of the same converter functions in lstm.py

def prepare_sequences(notes, n_vocab):
    """ Prepare the sequences used by the Neural Network """
    sequence_length = 25

    import numpy as np
    from keras.utils import np_utils

    # get all pitch names
    pitchnames = sorted(set(notes))

    # create a dictionary to map pitches to integers
    note_to_int = {note: number for number, note in enumerate(pitchnames)}
    print('note_to_int mapping:', note_to_int)

    network_input = []
    network_output = []

    # create input sequences and the corresponding outputs
    for i in range(0, len(notes) - sequence_length, 1):
        sequence_in = notes[i:i + sequence_length]
        sequence_out = notes[i + sequence_length]
        print('sequence_in', sequence_in)
        print('sequence_out', sequence_out)
        
        network_input.append([note_to_int[char] for char in sequence_in])
        network_output.append(note_to_int[sequence_out])

    n_patterns = len(network_input)
    print('n_patterns:', n_patterns)

    # reshape the input into a format compatible with LSTM layers
    network_input = np.reshape(network_input, (n_patterns, sequence_length, 1))
    print('network_input.shape:', network_input.shape)

    # normalize input
    print('network_input:', network_input)
    network_input = network_input / float(n_vocab)
    print('network_input (normalized to 0-1):', network_input)

    print('output (labels):', network_output)
    network_output = np_utils.to_categorical(network_output)
    print('output (categorical):', network_output)

    return (network_input, network_output)

n_vocab = len(set(notes))
print('n_vocab', n_vocab)
X, y = prepare_sequences(notes, n_vocab)

n_vocab 150


Using TensorFlow backend.


note_to_int mapping: {'0': 0, '0.1': 1, '0.2': 2, '0.2.4': 3, '0.2.4.6': 4, '0.2.4.7': 5, '0.3': 6, '0.3.4.6': 7, '0.3.4.6.7': 8, '0.3.5.7': 9, '0.4': 10, '0.4.5': 11, '0.4.6': 12, '0.4.7': 13, '0.5': 14, '0.6': 15, '1.4': 16, '1.5': 17, '1.5.9': 18, '10': 19, '10.0': 20, '10.0.1.2.4.6': 21, '10.0.2.4.6': 22, '10.0.2.5.6': 23, '10.0.2.6': 24, '10.0.3.6': 25, '10.1': 26, '10.1.2.5': 27, '10.11.1': 28, '10.11.1.3.6': 29, '10.2': 30, '10.2.3.5': 31, '10.2.4': 32, '10.3': 33, '10.3.4': 34, '11.4': 35, '2': 36, '2.4': 37, '2.4.5.6.9.10': 38, '2.4.6.7.9.10': 39, '2.4.6.9.10': 40, '2.5': 41, '2.5.6.9': 42, '2.5.6.9.10': 43, '2.5.9': 44, '2.6.9.10': 45, '2.7': 46, '3.4.6.7.10.0': 47, '3.4.6.9.10.0': 48, '3.5.10': 49, '3.5.6.9.10': 50, '3.5.9': 51, '3.6.10': 52, '3.6.7.10': 53, '3.7': 54, '4.5': 55, '4.5.6.10.0': 56, '4.5.6.9.10.0': 57, '4.5.6.9.10.11.0': 58, '4.6.10': 59, '4.6.10.0': 60, '4.6.11': 61, '4.6.7.9.10.0.1': 62, '4.7': 63, '4.7.10': 64, '4.7.9.10.0': 65, '4.7.9.10.0.1': 66, '4.9': 6

sequence_in ['0.6', 'B-5', '5.7', 'A3', '10.3', 'D2', 'D3', '2.5', 'A4', '6.10.0', '10.3', 'F4', 'F3', '10.0', 'A2', '9.10.2.4.5', 'C#2', 'D4', 'B-5', '5.6.10', 'C4', 'F#3', 'D2', '2.5.9', '10.1']
sequence_out G4
sequence_in ['B-5', '5.7', 'A3', '10.3', 'D2', 'D3', '2.5', 'A4', '6.10.0', '10.3', 'F4', 'F3', '10.0', 'A2', '9.10.2.4.5', 'C#2', 'D4', 'B-5', '5.6.10', 'C4', 'F#3', 'D2', '2.5.9', '10.1', 'G4']
sequence_out D3
sequence_in ['5.7', 'A3', '10.3', 'D2', 'D3', '2.5', 'A4', '6.10.0', '10.3', 'F4', 'F3', '10.0', 'A2', '9.10.2.4.5', 'C#2', 'D4', 'B-5', '5.6.10', 'C4', 'F#3', 'D2', '2.5.9', '10.1', 'G4', 'D3']
sequence_out C2
sequence_in ['A3', '10.3', 'D2', 'D3', '2.5', 'A4', '6.10.0', '10.3', 'F4', 'F3', '10.0', 'A2', '9.10.2.4.5', 'C#2', 'D4', 'B-5', '5.6.10', 'C4', 'F#3', 'D2', '2.5.9', '10.1', 'G4', 'D3', 'C2']
sequence_out 6.10
sequence_in ['10.3', 'D2', 'D3', '2.5', 'A4', '6.10.0', '10.3', 'F4', 'F3', '10.0', 'A2', '9.10.2.4.5', 'C#2', 'D4', 'B-5', '5.6.10', 'C4', 'F#3', 'D2',

sequence_in ['0.2.4.7', 'C#2', '2.6.9.10', 'B-5', 'C4', 'C4', '6.10', 'C2', 'E4', 'C3', '10.1', 'G4', 'E2', 'D3', '0.6', 'B-5', 'F3', '10.3', '3.6.7.10', 'F2', 'A3', '0.5', 'A4', '6.10.0', 'A4']
sequence_out 10.0
sequence_in ['C#2', '2.6.9.10', 'B-5', 'C4', 'C4', '6.10', 'C2', 'E4', 'C3', '10.1', 'G4', 'E2', 'D3', '0.6', 'B-5', 'F3', '10.3', '3.6.7.10', 'F2', 'A3', '0.5', 'A4', '6.10.0', 'A4', '10.0']
sequence_out C3
sequence_in ['2.6.9.10', 'B-5', 'C4', 'C4', '6.10', 'C2', 'E4', 'C3', '10.1', 'G4', 'E2', 'D3', '0.6', 'B-5', 'F3', '10.3', '3.6.7.10', 'F2', 'A3', '0.5', 'A4', '6.10.0', 'A4', '10.0', 'C3']
sequence_out 9.10.0.4.5
sequence_in ['B-5', 'C4', 'C4', '6.10', 'C2', 'E4', 'C3', '10.1', 'G4', 'E2', 'D3', '0.6', 'B-5', 'F3', '10.3', '3.6.7.10', 'F2', 'A3', '0.5', 'A4', '6.10.0', 'A4', '10.0', 'C3', '9.10.0.4.5']
sequence_out C#2
sequence_in ['C4', 'C4', '6.10', 'C2', 'E4', 'C3', '10.1', 'G4', 'E2', 'D3', '0.6', 'B-5', 'F3', '10.3', '3.6.7.10', 'F2', 'A3', '0.5', 'A4', '6.10.0', 'A