# Generating music with RNNs the simple way

While most RNN applications require large amounts of training time and resources, music generation gives surprisingly *good* results with little effort. In this tutorial, heavily based on \cite{blog}, we will apply simple RNNs to produce traditional folk tunes.

## The data

Perhaps the most important question to answer before going into greater detail is which data representation to use, since this determines the complexity of the model required. Since we don't want to spend weeks training our models, we will use a very simple approach: ABC notation.

### ABC notation

Quoting Wikipedia \cite{wikipedia}:

>ABC notation is a shorthand form of musical notation. In basic form it uses the letters A through G to represent the given notes, with other elements used to place added value on these - sharp, flat, the length of the note, key, ornamentation. Lines in the first part of the tune notation, beginning with a letter followed by a colon, indicate various aspects of the tune such as the index, when there are more than one tune in a file (X:), the title (T:), the time signature (M:), the default note length (L:), the type of tune (R:) and the key (K:). Lines following the key designation represent the tune. It can be translated into traditional music notation using one of the abc conversion tools.

For instance, here is "Greensleeves" written in ABC notation:

``` abc
X:870
T:Greensleeves
C:anon.
O:England
R:Broadside ballad
Z:Transcribed by Frank Nordberg - http://www.musicaviva.com
F:http://abc.musicaviva.com/tunes/england/greensleeves-dorian.abc
M:6/4
L:1/4
Q:1/2=110
K:Gdor
G|"Gm"B2c d>ed|"F"c2A F>GA|"Gm"B2A G>^FG|"Dm"A2^F D2G|
"Gm"B2c d>ed|"F"c2A F>GA|"Gm"B>AG "D"^F>EF|"Gm"G3 G2z|
"Bb"f3 f>ed|"F"c2A "Dm"F>GA|"Gm"B2G G>^FG|"Dm"A2^F D2z|
"Bb"f3 f>ed|"F"c2A "Dm"F>GA|"Gm"B>AG "D"^F>EF|"G"G3 G2|]
```

This notation is great because it lets us generate music using only plain text characters, without complicated notation and symbols. Also, a vast number of tunes are available on the internet, which is important for learning.
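
Because a tune is plain text, it can be inspected with a few lines of Python. The following is a minimal sketch, not part of char-rnn: `split_abc_tune` is a hypothetical helper that assumes the header ends at the `K:` line, as the quote above describes, and it ignores less common ABC features such as inline fields.

```python
import re

# Header lines look like "T:Greensleeves": one letter, a colon, then the value.
HEADER_RE = re.compile(r"^([A-Za-z]):(.*)$")


def split_abc_tune(text):
    """Split one ABC tune into (header fields, body lines).

    Hypothetical helper for illustration. Assumes the header ends
    at the K: (key) line and everything after it is the tune body.
    """
    headers = {}
    body = []
    in_header = True
    for line in text.strip().splitlines():
        if in_header:
            match = HEADER_RE.match(line)
            if match:
                field, value = match.group(1), match.group(2).strip()
                headers[field] = value
                if field == "K":
                    in_header = False  # the key line closes the header
            # non-matching lines inside the header are skipped
        else:
            body.append(line)
    return headers, body


tune = """X:870
T:Greensleeves
M:6/4
L:1/4
K:Gdor
G|"Gm"B2c d>ed|"F"c2A F>GA|"Gm"B2A G>^FG|"Dm"A2^F D2G|"""

headers, body = split_abc_tune(tune)
print(headers["T"], headers["M"], headers["K"])
print(len(body), "body line(s)")
```

The network never sees this structure explicitly. It only reads the characters one by one, so it has to learn from examples that headers come first and bars follow.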

## The model

For this exercise we will use a simple implementation of an RNN for text called char-rnn, originally developed by the omnipresent Karpathy. An implementation written in TensorFlow is available on GitHub.

It has two scripts, one for training and one for sampling. We can see the many options available by calling them with *-h*.

In [1]:
!git clone https://github.com/sherjilozair/char-rnn-tensorflow.git
!python ./char-rnn-tensorflow/train.py -h
!python ./char-rnn-tensorflow/sample.py -h

Cloning into 'char-rnn-tensorflow'...
remote: Counting objects: 143, done.
remote: Total 143 (delta 0), reused 0 (delta 0), pack-reused 143
Receiving objects: 100% (143/143), 455.78 KiB | 261.00 KiB/s, done.
Resolving deltas: 100% (76/76), done.
Checking connectivity... done.
usage: train.py [-h] [--data_dir DATA_DIR] [--save_dir SAVE_DIR]
                [--rnn_size RNN_SIZE] [--num_layers NUM_LAYERS]
                [--model MODEL] [--batch_size BATCH_SIZE]
                [--seq_length SEQ_LENGTH] [--num_epochs NUM_EPOCHS]
                [--save_every SAVE_EVERY] [--grad_clip GRAD_CLIP]
                [--learning_rate LEARNING_RATE] [--decay_rate DECAY_RATE]
                [--init_from INIT_FROM]

optional arguments:
  -h, --help            show this help message and exit
  --data_dir DATA_DIR   data directory containing input.txt
  --save_dir SAVE_DIR   directory to store checkpointed models
  --rnn_size RNN_SIZE   size of RNN hidden state
  --num_layers NUM_LAYERS
       

As for the default hyperparameters, a look into train.py shows which values it uses.


In [None]:
    parser.add_argument('--data_dir', type=str, default='data/tinyshakespeare',
                       help='data directory containing input.txt')
    parser.add_argument('--save_dir', type=str, default='save',
                       help='directory to store checkpointed models')
    parser.add_argument('--rnn_size', type=int, default=128,
                       help='size of RNN hidden state')
    parser.add_argument('--num_layers', type=int, default=2,
                       help='number of layers in the RNN')
    parser.add_argument('--model', type=str, default='lstm',
                       help='rnn, gru, or lstm')
    parser.add_argument('--batch_size', type=int, default=50,
                       help='minibatch size')
    parser.add_argument('--seq_length', type=int, default=50,
                       help='RNN sequence length')
    parser.add_argument('--num_epochs', type=int, default=50,
                       help='number of epochs')
    parser.add_argument('--save_every', type=int, default=1000,
                       help='save frequency')
    parser.add_argument('--grad_clip', type=float, default=5.,
                       help='clip gradients at this value')
    parser.add_argument('--learning_rate', type=float, default=0.002,
                       help='learning rate')
    parser.add_argument('--decay_rate', type=float, default=0.97,
                       help='decay rate for rmsprop')                       
    parser.add_argument('--init_from', type=str, default=None,
                       help="""continue training from saved model at this path. Path must contain files saved by previous training process: 
                            'config.pkl'        : configuration;
                            'chars_vocab.pkl'   : vocabulary definitions;
                            'checkpoint'        : paths to model file(s) (created by tf).
                                                  Note: this file contains absolute paths, be careful when moving files around;
                            'model.ckpt-*'      : file(s) with model definition (created by tf)
                        """)

Notice how we can specify the size of the hidden state, the number of layers, the number of time steps per training sequence, and the usual settings related to SGD: batch size, learning rate and others. These parameters are used later to build the network inside model.py. The relevant part is shown below, with comments.


In [None]:
# First we choose the type of RNN cell according to the parameters provided. The default is lstm, but we will use gru.
if args.model == 'rnn':
    cell_fn = rnn_cell.BasicRNNCell
elif args.model == 'gru':
    cell_fn = rnn_cell.GRUCell
elif args.model == 'lstm':
    cell_fn = rnn_cell.BasicLSTMCell
else:
    raise Exception("model type not supported: {}".format(args.model))

# The cell is created with rnn_size units and stacked num_layers times. The defaults give 2 layers of 128 units.
cell = cell_fn(args.rnn_size)

self.cell = cell = rnn_cell.MultiRNNCell([cell] * args.num_layers)

# The placeholders of the network are initialized, along with the initial state.
self.input_data = tf.placeholder(tf.int32, [args.batch_size, args.seq_length])
self.targets = tf.placeholder(tf.int32, [args.batch_size, args.seq_length])
self.initial_state = cell.zero_state(args.batch_size, tf.float32)

# The output softmax weights and biases are built here
with tf.variable_scope('rnnlm'):
    softmax_w = tf.get_variable("softmax_w", [args.rnn_size, args.vocab_size])
    softmax_b = tf.get_variable("softmax_b", [args.vocab_size])
    with tf.device("/cpu:0"):
        # The embedding is created here, mapping each input character to a vector of size rnn_size
        embedding = tf.get_variable("embedding", [args.vocab_size, args.rnn_size])
        inputs = tf.split(1, args.seq_length, tf.nn.embedding_lookup(embedding, self.input_data))
        inputs = [tf.squeeze(input_, [1]) for input_ in inputs]

# When sampling (infer=True), the loop function feeds the most likely predicted character back in as the next input
def loop(prev, _):
    prev = tf.matmul(prev, softmax_w) + softmax_b
    prev_symbol = tf.stop_gradient(tf.argmax(prev, 1))
    return tf.nn.embedding_lookup(embedding, prev_symbol)

# We get the final output of the network
outputs, last_state = seq2seq.rnn_decoder(inputs, self.initial_state, cell, loop_function=loop if infer else None, scope='rnnlm')
output = tf.reshape(tf.concat(1, outputs), [-1, args.rnn_size])
# We input the output of the cell to the softmax
self.logits = tf.matmul(output, softmax_w) + softmax_b
self.probs = tf.nn.softmax(self.logits)
# We set the loss
loss = seq2seq.sequence_loss_by_example([self.logits],
        [tf.reshape(self.targets, [-1])],
        [tf.ones([args.batch_size * args.seq_length])],
        args.vocab_size)
# The loss is averaged
self.cost = tf.reduce_sum(loss) / args.batch_size / args.seq_length
self.final_state = last_state
# Learning rate variable
self.lr = tf.Variable(0.0, trainable=False)
tvars = tf.trainable_variables()
# Gradient clipping
grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars),
        args.grad_clip)
# Optimize using Adam
optimizer = tf.train.AdamOptimizer(self.lr)
self.train_op = optimizer.apply_gradients(zip(grads, tvars))
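
The gradient clipping step is worth a closer look, since it is what keeps training stable when gradients explode. As a rough illustration of what `tf.clip_by_global_norm` computes, here is a plain-Python sketch: all gradients are treated as one long vector, and if its L2 norm exceeds the threshold, every gradient is scaled down by the same factor. The function below is a stand-in written for this tutorial, working on lists of floats rather than tensors.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Rescale a list of gradient vectors so their joint L2 norm is at most clip_norm.

    Illustrative re-implementation on plain lists, not the TensorFlow code.
    """
    # Norm of all gradients taken together, as if they were one vector.
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # The scale is 1 when the norm is already below the threshold.
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, global_norm


grads = [[3.0, 4.0], [12.0]]  # global norm = sqrt(9 + 16 + 144) = 13
clipped, norm = clip_by_global_norm(grads, clip_norm=5.0)
print(norm)     # norm before clipping
print(clipped)  # same direction, joint norm reduced to 5
```

Scaling all gradients by one common factor preserves the direction of the update, which clipping each value independently would not.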

Since we don't want to tinker too much with the hyperparameters, we will accept the defaults as good enough, with the exception of using GRU instead of LSTM, since the amount of data is not large. First we need the data. We will use this collection (http://www.norbeck.nu/abc/hn201602.zip), but any ABC tune collection is equally good. After concatenating all the files into one, we are ready to train. Further cleaning is possible, and it will improve the resulting generator.

In [6]:
!wget -N http://www.norbeck.nu/abc/hn201602.zip
!tar xvf hn201602.zip
!mv ./s/* ./i
!cat ./i/*abc > ./char-rnn-tensorflow/data/input.txt
!tail -n 100 ./char-rnn-tensorflow/data/input.txt
!ls ./char-rnn-tensorflow/data
!rm -R ./s ./i

--2016-06-13 15:17:41--  http://www.norbeck.nu/abc/hn201602.zip
Resolving www.norbeck.nu... 194.9.94.73
Connecting to www.norbeck.nu|194.9.94.73|:80... connected.
HTTP request sent, awaiting response... 304 Not Modified
File 'hn201602.zip' not modified on server. Omitting download.

x s/
x s/hngang0.abc
x s/hnhall0.abc
x s/hnjp0.abc
x s/hnk1p0.abc
x s/hnl1p0.abc
x s/hnop0.abc
x s/hnsch0.abc
x s/hnsk0.abc
x s/hnsp0.abc
x s/hnvals0.abc
x i/
x i/hnair0.abc
x i/hnbarn0.abc
x i/hncar0.abc
x i/hnhf0.abc
x i/hnhp0.abc
x i/hnhp1.abc
x i/hnj0.abc
x i/hnj1.abc
x i/hnj2.abc
x i/hnj3.abc
x i/hnj4.abc
x i/hnmarch0.abc
x i/hnmaz0.abc
x i/hnp0.abc
x i/hnp1.abc
x i/hnr0.abc
x i/hnr1.abc
x i/hnr2.abc
x i/hnr3.abc
x i/hnr4.abc
x i/hnr5.abc
x i/hnr6.abc
x i/hnr7.abc
x i/hnr8.abc
x i/hnr9.abc
x i/hnset0.abc
x i/hnsj0.abc
x i/hnsl0.abc
x i/hnsong0.abc
x i/hnstr0.abc
x i/hnwaltz0.abc
X:7
T:New Land, The
R:waltz
C:Otis Tomas
S:Nicholas Quemener
H:Originally in F
Z:hn-waltz-7
M:3/4
K:E
B GF|E3F (3GFE|B4 Bc|B3
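
As an example of the further cleaning mentioned above, one option is to drop header fields that carry no musical information, such as titles, composers and transcription notes, so the model does not spend capacity learning them. The sketch below is only one possible approach. Which fields to keep (`KEEP_FIELDS`) is an assumption made here, and `clean_abc` is a hypothetical helper, not part of char-rnn.

```python
# Assumption: only the index, meter, note length, rhythm and key are worth keeping.
KEEP_FIELDS = {"X", "M", "L", "R", "K"}


def clean_abc(text, keep_fields=KEEP_FIELDS):
    """Remove comment lines and unwanted header fields from ABC text.

    A line counts as a field when it starts with a letter followed by a
    colon. This is a simplification and may misjudge unusual body lines.
    """
    cleaned = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("%"):
            continue  # ABC comment line
        is_field = len(stripped) >= 2 and stripped[0].isalpha() and stripped[1] == ":"
        if is_field and stripped[0] not in keep_fields:
            continue  # e.g. T: (title), C: (composer), Z: (transcriber)
        cleaned.append(line)
    return "\n".join(cleaned)


sample = """X:7
T:New Land, The
R:waltz
C:Otis Tomas
S:Nicholas Quemener
H:Originally in F
Z:hn-waltz-7
M:3/4
K:E
B GF|E3F (3GFE|B4 Bc|"""

print(clean_abc(sample))
```

To apply it, read `input.txt`, pass its contents through `clean_abc`, and write the result back before training. Blank lines are kept on purpose, since they separate tunes.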

## Training

We are ready to start training.
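
The training script reports its progress as `step/total`, so it helps to know where the total comes from. The sketch below assumes that the data loader cuts the text into non-overlapping blocks of `batch_size * seq_length` characters and drops the remainder, and that the total is the number of such batches times the number of epochs. The corpus size in the example is hypothetical, chosen so that the result matches a 25250-step run with the default settings.

```python
def training_steps(num_chars, batch_size=50, seq_length=50, num_epochs=50):
    """Return (batches per epoch, total steps) for a text of num_chars characters.

    Assumes non-overlapping batches; leftover characters are dropped.
    """
    chars_per_batch = batch_size * seq_length
    batches_per_epoch = num_chars // chars_per_batch
    return batches_per_epoch, batches_per_epoch * num_epochs


# Hypothetical corpus size, for illustration only.
per_epoch, total = training_steps(num_chars=1_262_500)
print(per_epoch, total)  # 505 25250
```

Multiplying the total by the time per batch printed in the log gives a rough estimate of how long the full run will take.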

In [12]:
!python char-rnn-tensorflow/train.py --model gru --data_dir ./char-rnn-tensorflow/data --save_dir ./char-rnn-tensorflow/save

loading preprocessed files
0/25250 (epoch 0), train_loss = 4.795, time/batch = 0.801
model saved to ./char-rnn-tensorflow/save/model.ckpt
1/25250 (epoch 0), train_loss = 4.767, time/batch = 0.337
2/25250 (epoch 0), train_loss = 4.674, time/batch = 0.343
3/25250 (epoch 0), train_loss = 4.439, time/batch = 0.334
4/25250 (epoch 0), train_loss = 4.044, time/batch = 0.349
5/25250 (epoch 0), train_loss = 3.935, time/batch = 0.331
6/25250 (epoch 0), train_loss = 3.809, time/batch = 0.334
7/25250 (epoch 0), train_loss = 3.794, time/batch = 0.339
8/25250 (epoch 0), train_loss = 3.814, time/batch = 0.341
9/25250 (epoch 0), train_loss = 3.756, time/batch = 0.334
10/25250 (epoch 0), train_loss = 3.639, time/batch = 0.330
11/25250 (epoch 0), train_loss = 3.687, time/batch = 0.341
12/25250 (epoch 0), train_loss = 3.682, time/batch = 0.436
13/25250 (epoch 0), train_loss = 3.681, time/batch = 0.337
14/25250 (epoch 0), train_loss = 3.653, time/batch = 0.419
15/25250 (epoch 0), train_loss = 3.684, time/