# Composing Through Lyrics

The purpose of this project is to learn the relation between the lyrics of a song and its music. Different kinds of lyrics tend to call for different kinds of music that reflect the connotation behind the words. From a technical point of view, we have two sequences that we are trying to relate:

1) The first sequence is a list of words of the lyrics of the song.

2) The second sequence is a list of notes/chords that compose the music of the song.

With that being said, we built an encoder/decoder seq2seq model with GRU recurrent units to try to capture the relationship between the lyrics and the music.

To acquire training data, we needed to get creative. Our idea was to gather midi files of songs that already contain the lyrics and use these as training data. Although the idea is sound, gathering the data was a major challenge for this project. Most of the midi files on the internet are either expensive to get (they are the same kind of files used for karaoke), or they are claimed to contain lyrics while in reality they do not. Luckily we were able to find http://www.olgris.kiev.ua/des/midi%20lat.html, which was our main source of data. The website contains 230 midi files covering many different genres of music.

## Background Knowledge
To work with midi files, we first needed to understand what a midi file is, so that we would know where to look for the required data. Without going into much detail, midi files are byte-encoded files that hold all the information about a piece of music. The most important building blocks of a midi file are the tracks. A midi file includes a track for every instrument played in the song. One special track, usually that of the piano, also includes the lyrics. Each track consists of several events of different types, and each event carries its own metadata that is used when playing the file. There are two important types of events in every track:

1) The "NOTE" event which is translated to a certain sound, and
    
2) The "LYRIC" event that contains the word or words at a specific position.
    
Events in a track are sequential data and are thus played in a certain order. This is why the piano track, which carries both the lyrics and the notes, is the one considered for this project. The sequence of notes/chords associated with each sentence is extracted and treated as one training sample.
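The idea of pairing each lyric fragment with the notes that accompany it can be sketched in plain Python. This is only an illustration under stated assumptions: the `(kind, data)` tuples are hypothetical stand-ins for midi events, and the rule "notes belong to the most recent lyric event" is an assumption, not necessarily what preprocess.py does.

```python
def group_events(events):
    """Pair each lyric fragment with the notes that follow it.

    `events` is a list of hypothetical (kind, data) tuples in playback order.
    Notes that occur before the first lyric event are ignored.
    """
    samples, lyric, notes = [], None, []
    for kind, data in events:
        if kind == "LYRIC":
            if lyric is not None:
                samples.append((lyric, notes))  # close the previous sample
            lyric, notes = data, []
        elif kind == "NOTE" and lyric is not None:
            notes.append(data)
    if lyric is not None:
        samples.append((lyric, notes))          # close the last sample
    return samples


events = [("NOTE", "C4"), ("LYRIC", "hello"), ("NOTE", "E4"),
          ("NOTE", "G4"), ("LYRIC", "world"), ("NOTE", "A4")]
samples = group_events(events)
print(samples)
```

Each resulting pair plays the role of one (lyrics, notes) training sample.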

As mentioned above, midi files are byte-encoded, and working with them as raw files would have been quite a hassle. This is why we used the music21 library developed at MIT (https://web.mit.edu/music21/). Before getting started, please follow the installation instructions below.

## Installation steps
Please install all packages listed in requirements.txt by running "pip install -r requirements.txt".
In addition, spaCy is used for text preprocessing, so a spaCy model needs to be downloaded with the command "python -m spacy download en". Finally, the package timidity needs to be installed in order to play music in this notebook. Install it with the command "sudo apt-get install timidity timidity-interfaces-extra".

## A view on the data
Thanks to music21 and timidity, we can play midi files from a notebook. Here is an example:

In [None]:
import numpy as np
import tensorflow as tf
from preprocess import *
file = "test_midi/14_Years.mid"
play_midi(file)

The lyrics of the song are also included in the midi file. If we were to play it in a terminal using "timidity -filename-", we would see the lyrics as in a karaoke game. Because the piano track is the only track of interest to us, we need to look at its events to know what kind of data we are dealing with. Here are 10 events from that track:

In [None]:
m = midi.MidiFile()
m.open(file)
m.read()
for track in m.tracks:
    lyrics = [ev.data for ev in track.events if ev.type=="LYRIC"]
    temp_stream = midi.translate.midiTrackToStream(track)
    notes = get_notes_from_stream(temp_stream)
    if len(lyrics) > 0:
        break
print(track.events[10:20])

From these events, we came up with a string representation that allows us to convert the strings back to notes/chords. The string representations of 20 notes/chords from this track are shown below. Notes are converted by mapping them to their pitches. For chords, we record the normal-order representation of each note in the chord, joined by dots. For the detailed operation, please refer to "get_notes_from_stream" in preprocess.py.

In [None]:
print(notes[110:130])
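The string encoding described above can be illustrated without music21. In this minimal sketch a note is assumed to be a pitch name such as "C4" and a chord is assumed to be a list of integers in normal order; `encode_element` and `decode_token` are hypothetical helpers, not functions from preprocess.py.

```python
def encode_element(element):
    """Encode a note (str) or a chord (list of ints) as a single string token."""
    if isinstance(element, str):
        return element                            # note: keep the pitch name
    return ".".join(str(pc) for pc in element)    # chord: join entries with dots


def decode_token(token):
    """Invert encode_element: tokens made of digits and dots are chords."""
    if token.replace(".", "").isdigit():
        return [int(pc) for pc in token.split(".")]
    return token


sequence = ["C4", [0, 4, 7], "E4", [2, 5, 9]]
tokens = [encode_element(el) for el in sequence]
decoded = [decode_token(tok) for tok in tokens]
print(tokens)
```

Because every token is a plain string, notes and chords can share one vocabulary and be handled exactly like words.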

At this stage, the data is ready and can be fed to the network.

The following special tokens are added to every dictionary:

In [None]:
special_words = ["<PAD>", "<GO>", "<END>"]

## Class For Dictionary
The class holds the data and the helper functions of the dictionary used to train the seq2seq model.
Its most important functions map a list of words (a sentence) to a list of corresponding integers (the indices of the words).

In [None]:
class Seq2Seq_Dictionary:
    def __init__(self, sentences ):
        self.word2index_map = dict()
        self.index2word_map = dict()
        self.vocab_size = 0
        self.init_register(sentences)
        
    # Initiates word2index_map and index2word_map
    # Also extracts the max number of words in the sentences and saves it 
    def init_register(self,sentences):
        global special_words
        current_index = 0
        ## save the maximum length among the sentences. 
        self.max_length = max([len(sentence) for sentence in sentences])
        ### map special words, initially the mappings are empty.
        for word in special_words:
            self.word2index_map[word] = current_index
            self.index2word_map[current_index] = word
            current_index+=1
        
        s = set([item for sublist in sentences for item in sublist])
        self.word2index_map.update({e:i+current_index for i,e in enumerate(s)})
        self.index2word_map.update({v:k for k,v in self.word2index_map.items()})
        self.vocab_size = len(self.index2word_map)
    
    ## Returns the index of the word in the dictionary. It is assumed that the word
    ## will be always in dictionary.
    def get_index(self, word):
        return self.word2index_map[word]
    
    ## Maps a sentence, which is a list of words, to the corresponding list of integers.
    ## Each word is looked up in the dictionary; unlike get_index, words that are
    ## missing from the dictionary fall back to index 0, which is <PAD>.
    def map_sentence(self, sentence):
        return [self.get_index(i) if i in self.word2index_map else 0 for i in sentence]
    
    ## Returns the word by its index in dictionary.
    def get_word(self, index):
        return self.index2word_map[index]
    
    ## Appends <PAD> tokens until the sentence has max_length + 2 entries.
    ## The +2 leaves room for the <GO> and <END> tokens, which are added before padding.
    def pad_sentence(self, sentence):
        return sentence + ["<PAD>"] * (self.max_length - len(sentence) + 2)
    
    ## Adds <GO> and <END> to the start and end of the sentence
    def add_start_end_tokens(self, sentence):
        return ["<GO>"] + sentence + ["<END>"]
    
    ## Transforms the sentence in a format suitable for the Neural Network
    def transform_sentence(self, sentence):
        s = self.add_start_end_tokens(sentence)
        s = self.pad_sentence(s)
        s = self.map_sentence(s)
        return s
    
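    ## Maps a list of indices back to words, dropping unknown indices and special tokens.
    ## The membership test comes first so that an unknown index cannot raise a KeyError.
    def reverse_transform(self, sentence):
        return [self.index2word_map[s] for s in sentence
                if s in self.index2word_map and self.index2word_map[s] not in special_words]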
    
    ### Reverse operation of map_sentence. Gets the sentence, list of integers, as input and
    ### maps each entry from index to word
    #def decode_indeces(self,sentence_with_index):
    #    sentence_list = list()
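To make the transformation pipeline concrete, here is a small standalone sketch of the same three steps (add start/end tokens, pad, map to indices). `build_vocab` and `transform` are hypothetical helpers written for this illustration; unlike the class above, the vocabulary is sorted so that the indices are reproducible between runs.

```python
SPECIAL = ["<PAD>", "<GO>", "<END>"]


def build_vocab(sentences):
    """Map special tokens to 0..2 and every distinct word to the indices after them."""
    words = sorted({w for s in sentences for w in s})
    return {w: i for i, w in enumerate(SPECIAL + words)}


def transform(sentence, word2index, max_length):
    """Add <GO>/<END>, pad to max_length + 2 tokens, then map tokens to indices."""
    tokens = ["<GO>"] + sentence + ["<END>"]
    tokens += ["<PAD>"] * (max_length + 2 - len(tokens))
    return [word2index[w] for w in tokens]


sentences = [["i", "love", "you"], ["hello"]]
word2index = build_vocab(sentences)
max_length = max(len(s) for s in sentences)
encoded = [transform(s, word2index, max_length) for s in sentences]
print(encoded)
```

Every encoded sentence has the same length, max_length + 2, which is what allows the sentences to be stacked into fixed-size batches.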

#### Create the dictionary

In [None]:
data, targets = get_data_from_dir("test_midi_small/")

In [None]:
data_dict = Seq2Seq_Dictionary(data)
target_dict = Seq2Seq_Dictionary(targets)
max_sentence_len = data_dict.max_length
max_notes_len = target_dict.max_length

In [None]:
## Transform the data and the targets
transformed_data = [data_dict.transform_sentence(i) for i in data]
transformed_targets = [target_dict.transform_sentence(i) for i in targets]
print(data_dict.max_length)
print(data[10])
print(transformed_data[10])
## Sanity check for the lengths of the data and the targets
# assert np.mean([len(x) for x in transformed_data])==data_dict.max_length
# assert np.mean([len(x) for x in transformed_targets])==target_dict.max_length
# assert len(transformed_data) == len(transformed_targets)

len(data_dict.index2word_map)

In [None]:
print(f"Size of data vocabulary: {data_dict.vocab_size}")
print(f"Size of targets vocabulary: {target_dict.vocab_size}")

print(f"Max. Length of the data: {data_dict.max_length}")
print(f"Max. Length of the target: {target_dict.max_length}")
print(f"Sample data: {data[10]}")
print(f"Corresponding targets: {targets[10]}")
print(f"Sample transformed data: {transformed_data[10]}")
print(f"Corresponding transformed targets: {transformed_targets[10]}")

## Model Creation
Tensorflow initialization

In [None]:
# sess.close()
tf.reset_default_graph()
sess = tf.InteractiveSession()


### General Model Variables 

In [None]:
embedding_size= 256
hidden_units = 128
keep_prob=0.5 # Dropout parameter
batch_size = 32
sentence_vocab_size = len(data_dict.index2word_map)
notes_vocab_size = len(target_dict.index2word_map)
learning_rate = 5e-4

#### Encoder Inputs

In [None]:
_encoder_inputs = tf.placeholder(shape=(batch_size, max_sentence_len+2),
                                 dtype=tf.int32, name='encoder_inputs')
_encoder_seq_len = tf.placeholder(shape=(batch_size),
                                 dtype=tf.int32, name='encoder_seq_lens')
_is_training = tf.placeholder(tf.bool,name="training_or_test")
_target_notes = tf.placeholder(shape=(batch_size, max_notes_len+2) , 
                               dtype=tf.int32, name='target_notes')

The encoder is created here. In this architecture, a bidirectional GRU is applied on top of the embedding layer.

In [None]:
# _encoder_inputs = tf.placeholder(shape=(batch_size, max_sentence_len),
#                                  dtype=tf.int32, name='encoder_inputs')
# _encoder_seq_len = tf.placeholder(shape=(batch_size),
#                                  dtype=tf.int32, name='encoder_seq_lens')
# _is_training = tf.placeholder(tf.bool,name="training_or_test")
# _target_notes = tf.placeholder(shape=(batch_size, max_notes_len) , 
#                                dtype=tf.int32, name='target_notes')
### remove before here
with tf.variable_scope("encoder") as encoder_sc:
    ## embeddings
    enc_embed_var = tf.Variable(
        tf.random_uniform([sentence_vocab_size,
                           embedding_size],
                          -1.0, 1.0), name='embedding')
    
    enc_embed = tf.nn.embedding_lookup(enc_embed_var, _encoder_inputs)
    
    # Forward direction cell
    enc_gru_fw = tf.nn.rnn_cell.GRUCell(hidden_units)
    # Backward direction cell
    enc_gru_bw = tf.nn.rnn_cell.GRUCell(hidden_units)
    
    enc_dropout_fw = tf.contrib.rnn.DropoutWrapper(enc_gru_fw, input_keep_prob=keep_prob,
                                                   output_keep_prob=keep_prob)

    enc_dropout_bw = tf.contrib.rnn.DropoutWrapper(enc_gru_bw, input_keep_prob=keep_prob,
                                                   output_keep_prob=keep_prob)

    
    ## here the state variable contains only the last state information of the cells
    enc_rnn_outputs,enc_rnn_state=tf.nn.bidirectional_dynamic_rnn(enc_dropout_fw,
                                                          enc_dropout_bw, 
                                                          enc_embed,
                                                          sequence_length=_encoder_seq_len,
                                                          dtype=tf.float32)
    ## Get forward and backward last states and outputs of the GRU
    enc_rnn_outputs_fw,enc_rnn_outputs_bw  = enc_rnn_outputs
    enc_rnn_fw_state,enc_rnn_bw_state  = enc_rnn_state
    
    ## concat states and outputs
    _enc_last_state = tf.concat((enc_rnn_bw_state, enc_rnn_fw_state),1)
    _enc_output = tf.concat((enc_rnn_outputs_bw,enc_rnn_outputs_fw),2)

In [None]:
print(_enc_last_state.get_shape())
print(_enc_output.get_shape())


The decoder is created here. Because a bidirectional GRU is used in the encoder, the concatenated state vector is twice the size of the state of a single GRU cell with the same number of hidden units. The GRU in the decoder therefore needs twice as many hidden units.

### Decoder  With While Loop
We use a while loop because each hidden state produced by the decoder GRU is fed back into the network to compute the scores of the next note in the same sequence.

The following is the condition of the while loop. The iteration runs from the first note of each sequence towards the last.

In [None]:

def decoder_condition(t, *args):
    return t<max_notes_len+1

The decoder is written as a function that is called from the body of the while_loop. Note that, in order to reuse the network at every step, we first need to call it once to create its variables and then set reuse to True.

In [None]:
#tf.reset_default_graph()
#_enc_last_state = tf.placeholder(shape=(batch_size, 2*hidden_units),
#                                 dtype=tf.float32, name='decoder_input_enc_last_state')
#_enc_output = tf.placeholder(shape=(batch_size,max_seq_length ,2*hidden_units),
#                                 dtype=tf.float32, name='decoder_input_enc_last_state')
#_decoder_inputs = tf.placeholder(shape=(batch_size),
#                                 dtype=tf.int32, name='decoder_inputs')

def decoder(_decoder_inputs,_hidden_state,reuse=None):
    with tf.variable_scope("decoder",reuse=reuse) as decoder_sc:
        ## Luong's multiplicative score --> score = _hidden_state.T * W * _enc_output

        ### First the W*_enc_output part is handled. It is straightforward with a dense layer, 
        ### and its output size should be hidden_size*2, because we have a bidirectional rnn 
        ### in the encoder. Output shape should be (batch_size, max_len, 2*hidden_size)
        ### because later it will be multiplied with (batch_size,2*hidden_size) (which could be thought
        ### as batch_size, 2*hidden_size, 1) to get the score.
        w_times_enc_output = tf.layers.dense(_enc_output, hidden_units*2)
        print("shape of w_times_enc_output:",w_times_enc_output.get_shape())

        ### First hidden state is taken from the encoder's GRUs last hidden state. So the
        ### shape of it is (batch_size, 2*hidden_size). For each input sentence, there is one
        ### hidden state.
        ### _hidden_state's size is (batch_size, 2*hidden_size) one can think of it as 
        ### (batch_size, 1,2*hidden_size). Semantically, there is only one hidden state vector
        ### for each batch item(iteration).To transpose it, as the formula of Luong's suggests,
        ### we can just expand (batch_size, 2*hidden_size) to (batch_size, 2*hidden_size,1), 
        ### expanding in the 2.nd dimension.
        hidden_state_tr = tf.expand_dims(_hidden_state,2)
        print("shape of enc_last_state_tr:",hidden_state_tr.get_shape())

        ### w_times_enc_output = (batch_size, max_len, 2*hidden_size)
        ### enc_last_state_tr = (batch_size, 2*hidden_size,1)
        ### resulting score = (batch_size, max_len,1)
        score =  tf.matmul(w_times_enc_output,hidden_state_tr)
        print("shape of score:",score.get_shape())

        ### Now the shape of score is (batch_size, max_len, 1). We have a score for each
        ### input word of every sentence in the batch. To normalize them, they are put through
        ### a softmax. The normalization should run over the words of each sentence, so the
        ### axis to apply softmax on is the 1.st one, since axis 0 is used for the batch.
        ### Attention weights (attention_w) have the same shape as score, which is (batch_size, max_len, 1)
        attention_w = tf.nn.softmax(score,1)

        ### attention_w (batch_size, max_len,1),   _enc_output (batch_size, max_len,2*hidden_size).
        ### Multiplication operator supports broadcasting, so that this multiplication does not produce
        ### an error. attention_w is broadcasted to be multiplied with each hidden unit of _enc_output,.
        ### which means multiplying each output of the hidden units with the attention weight of the
        ### associated word.
        ### Resulting context_vec is in shape of (batch_size, max_len, 2*hidden_size)
        context_vec = attention_w * _enc_output

        ### To create a context vector for each sentence in the batch, now we are summing
        ### up along the dimension of the max_len(along words in a sentence) 
        ### so that we are left with size (batch_size, 2*hidden_size).
        context_vec = tf.reduce_sum(context_vec, axis=1)
        print("shape of context_vec:",context_vec.get_shape())

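        ### Input to the decoder is also put through an embedding layer, since it consists
        ### of target notes. tf.get_variable is used instead of tf.Variable so that the
        ### same embedding matrix is shared by every call made with reuse=True
        ### (tf.Variable ignores reuse and would create a new matrix on each call).
        dec_embed_var = tf.get_variable('decoder_embedding',
                                        shape=[notes_vocab_size, embedding_size],
                                        initializer=tf.random_uniform_initializer(-1.0, 1.0))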

        ### Size of the embedded input-> (batch_size, 1, embedding_size)
        dec_embed = tf.nn.embedding_lookup(dec_embed_var, tf.expand_dims(_decoder_inputs,1))

        print("shape of the decoder embedding:",dec_embed.get_shape())

        ### To make the 1.st dimension matching with the embedded input, now the context vector 
        ### is expanded in the 1.st dimension. resulting size is (batch_size, 1, 2*hidden_size)
        context_vec = tf.expand_dims(context_vec, 1)

        ### Concatenate along the last dimension, so the resulting size is
        ### (batch_size, 1, 2*hidden_size + embedding_size)
        dec_before_gru = tf.concat([context_vec, dec_embed], axis=2)

        ### Since we feed the decoder one input at a time, the sequence length is
        ### either 0 or 1 depending on the current input of each sentence.
        ### If the current input is not <PAD> the seq len is 1; if it is <PAD>, i.e.
        ### just padding, the seq len is 0.
        ### tf.equal is used because == does not compare tensors elementwise in TF 1.x.
        pad_index = target_dict.get_index("<PAD>")
        is_pad = tf.equal(tf.cast(_decoder_inputs, tf.int32), pad_index)
        dec_seq_len = tf.where(is_pad, tf.zeros([batch_size]), tf.ones([batch_size]))

        ### Now the input is ready for the GRU.
        dec_gru = tf.nn.rnn_cell.GRUCell(2*hidden_units)

        dec_dropout = tf.contrib.rnn.DropoutWrapper(dec_gru, input_keep_prob=keep_prob,
                                                       output_keep_prob=keep_prob)

        ### dec_rnn_outputs has shape (batch_size, 1, 2*hidden_size)
        ### dec_rnn_state has shape (batch_size, 2*hidden_size)
        dec_rnn_outputs,dec_rnn_state=tf.nn.dynamic_rnn(cell=dec_dropout, inputs=dec_before_gru, 
                                                        initial_state=_hidden_state,
                                                        sequence_length=dec_seq_len)
        ### To make predictions based on the output of the rnn, we drop the time
        ### dimension of size 1 so the output has shape (batch_size, 2*hidden_size).
        ### Squeezing only axis 1 keeps the batch dimension even when batch_size is 1.
        dec_rnn_outputs = tf.squeeze(dec_rnn_outputs, axis=1)

        ### predictions has the shape of (batch_size, notes_vocab_size). This means we are predicting
        ### only the next word for each sentence. For each sentence, there is a vector of
        ### shape notes_vocab_size which contains the likelihood of the corresponding vocabulary
        ### element for the next word in the sentence.
        preds = tf.layers.dense(dec_rnn_outputs, notes_vocab_size)
    
    return preds, dec_rnn_state,dec_seq_len  #,dec_embed_var.read_value()
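The attention computation inside the decoder can be checked in isolation with NumPy. The sketch below follows the same steps (multiplicative score, softmax over the word axis, weighted sum), but it is only an illustration: the sizes are made up, `hidden` plays the role of 2*hidden_units, and the random matrix `W` stands in for the dense layer with its bias omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, max_len, hidden = 2, 5, 4   # hidden stands for 2*hidden_units

enc_output = rng.normal(size=(batch_size, max_len, hidden))
hidden_state = rng.normal(size=(batch_size, hidden))
W = rng.normal(size=(hidden, hidden))   # stand-in for the dense layer weights

# Multiplicative score: score[b, t] = (enc_output[b, t] @ W) . hidden_state[b]
w_times_enc = enc_output @ W                      # (batch, max_len, hidden)
score = w_times_enc @ hidden_state[:, :, None]    # (batch, max_len, 1)

# Softmax over the word axis (axis 1), shifted by the max for numerical stability
score = score - score.max(axis=1, keepdims=True)
attention_w = np.exp(score) / np.exp(score).sum(axis=1, keepdims=True)

# Broadcasted weighting followed by a sum over the words of each sentence
context_vec = (attention_w * enc_output).sum(axis=1)   # (batch, hidden)

print(score.shape, attention_w.shape, context_vec.shape)
```

The attention weights of each sentence sum to 1, so the context vector is a weighted average of the encoder outputs of that sentence.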

In [None]:
#with tf.variable_scope("pred_layer") as pred_layer_sc:  

### Prepare the decoder inputs. 
To reduce memory usage during backpropagation, we put each note of each sequence in a batch into a TensorArray.

The decoder inputs have shape (batch_size, max_notes_len+2); they are the padded target notes of the current input batch. First we transpose them to time-major order, so that indexing with [i,] returns the i.th note of all note sequences in the batch. Then we unstack the result, which gives a TensorArray of size max_notes_len+2 in which each item contains batch_size notes.
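The effect of the transpose and unstack steps is easy to see on a small NumPy array. This is an analogy only: a Python list of rows stands in for the TensorArray, and the index values are made up.

```python
import numpy as np

# Two padded target sequences, batch-major: (batch_size=2, steps=5)
target_notes = np.array([[1, 7, 8, 2, 0],
                         [1, 9, 2, 0, 0]])

time_major = target_notes.T   # (steps, batch_size)
steps = list(time_major)      # plays the role of TensorArray.unstack

# steps[i] holds the i-th note of every sequence in the batch
print(steps[1])
```

Reading element `i` therefore yields exactly one decoder input per sentence, which is what a single iteration of the while loop consumes.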

## Run Decoder

General variables to run the decoder loop. 

Iteration starts from 1, because we are going to call the decoder to initialize it at the beginning. 

init_outputs stores the output predictions of each iteration.

In [None]:
init_i = tf.constant(1, dtype=tf.int32)
init_outputs = tf.TensorArray(dtype=tf.float32,size=max_notes_len+2,clear_after_read=False)
init_seq_len = tf.TensorArray(dtype=tf.float32,size=max_notes_len+2,clear_after_read=False)
#init_embed_vals = tf.random_uniform([vocab_size,embedding_size],-1.0, 1.0)

Initialize the decoder to be able to "reuse" it; note that reuse is None by default. The initial hidden state comes from the encoder, and the first decoder input is <GO>, which is also the 0.th element of every target sequence.

In [None]:
init_preds,init_hidden_state,temp_seq_len = decoder(
                                        [target_dict.get_index("<GO>")]*(batch_size), 
                                        _enc_last_state)
                                       #init_embed_vals)
init_outputs = init_outputs.write(0, init_preds)
init_seq_len = init_seq_len.write(0, temp_seq_len)

Run the decoder with the while loop. With teacher forcing, which is used during training, the loop body is the function below. At test time the target notes are not available, so we use decoder_body_test, defined in the following cell, instead.

In [None]:
def decoder_body_teacher_forcing(iteration,outputs,body_hidden_state,seq_len,decoder_input_arr):
    ## Feed the previous target note, so that the prediction written at position
    ## `iteration` is made without seeing the target at that position.
    temp_preds,temp_hid_state,temp_seq_len = decoder(
                                    _decoder_inputs=decoder_input_arr.read(iteration-1), 
                                    _hidden_state=body_hidden_state,
                                    #_embedding_var = embed_val,
                                    reuse=True)
    outputs = outputs.write(iteration, temp_preds)
    seq_len = seq_len.write(iteration, temp_seq_len)
    return iteration+1, outputs, temp_hid_state,seq_len,decoder_input_arr ##,temp_embed_vals

In [None]:
def decoder_body_test(iteration,outputs,body_hidden_state,seq_len,fake_decoder_input):
    print("before argmax")
    print(tf.argmax(outputs.read(iteration-1), axis=1).get_shape())
    print("after argmax")
    temp_preds,temp_hid_state,temp_seq_len = decoder(
                                    _decoder_inputs=tf.argmax(outputs.read(iteration-1), axis=1), 
                                    _hidden_state=body_hidden_state,
                                    #_embedding_var = embed_val,
                                    reuse=True)
    outputs = outputs.write(iteration, temp_preds)
    seq_len = seq_len.write(iteration, temp_seq_len)
    return iteration+1, outputs, temp_hid_state,seq_len,fake_decoder_input       ##,temp_embed_vals
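The difference between the two loop bodies can be shown with a toy decoder step in plain Python. `toy_step` is a hypothetical stand-in for the real decoder (it has nothing to do with a GRU); the sketch uses the common formulation in which teacher forcing feeds the previous ground-truth token and greedy decoding feeds the previous prediction.

```python
def toy_step(token, state):
    """Hypothetical decoder step: returns (predicted token, new state)."""
    new_state = state + token
    return new_state % 5, new_state


def decode_teacher_forcing(targets, state=0):
    """Training: the input at step t is the ground-truth token at t - 1."""
    preds = []
    for t in range(1, len(targets)):
        pred, state = toy_step(targets[t - 1], state)
        preds.append(pred)
    return preds


def decode_greedy(go_token, steps, state=0):
    """Test time: the input at step t is the model's own prediction at t - 1."""
    preds, token = [], go_token
    for _ in range(steps):
        token, state = toy_step(token, state)
        preds.append(token)
    return preds


teacher_preds = decode_teacher_forcing([1, 3, 4, 2])
greedy_preds = decode_greedy(go_token=1, steps=3)
print(teacher_preds, greedy_preds)
```

Both loops start from the same token and state, but they diverge as soon as a prediction differs from the ground truth, because only the greedy loop carries its own output forward.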

Finally, run the while loop. We don't need the final hidden state or the iteration count; the only thing needed is the predictions of the decoder.

In [None]:
def decoder_while_teacher_f():
    decoder_inputs_tr = tf.transpose(_target_notes)
    decoder_input_arr = tf.TensorArray(dtype=tf.int32, size=max_notes_len+2,clear_after_read=False)
    decoder_input_arr = decoder_input_arr.unstack(decoder_inputs_tr)
    return tf.while_loop(decoder_condition, decoder_body_teacher_forcing, 
                    [init_i, init_outputs, init_hidden_state,init_seq_len,decoder_input_arr])
def decoder_while_test():
    return tf.while_loop(decoder_condition, decoder_body_test, 
                                [init_i, init_outputs, init_hidden_state,init_seq_len,1.0])
_, dec_preds, _ ,seq_len,_= tf.cond(pred=_is_training, 
                                  true_fn=decoder_while_teacher_f,
                                  false_fn=decoder_while_test
                                 )

Now dec_preds contains the predictions for each step: each item in the TensorArray is a tensor of shape (batch_size, vocab_size).

Next we create a single tensor of shape (batch_size, max_notes_len+2, vocab_size). To achieve this, we stack dec_preds, which results in (max_notes_len+2, batch_size, vocab_size), and then transpose to get (batch_size, max_notes_len+2, vocab_size). The same is done for seq_len, except that we want it in shape (batch_size, max_notes_len+2).

The all_seq_len variable is used as a mask on the loss: where it is 0, the corresponding decoder input is a <PAD>.

In [None]:
preds = tf.transpose(dec_preds.stack(), [1,0,2])
all_seq_len = tf.to_float(tf.transpose(seq_len.stack()))
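The stack-then-transpose step has a direct NumPy counterpart, shown here with made-up sizes. Each per-step array is filled with its step index so that the reordering is easy to verify.

```python
import numpy as np

steps, batch_size, vocab_size = 4, 2, 3

# One (batch_size, vocab_size) array per decoding step, filled with the step index
per_step = [np.full((batch_size, vocab_size), t, dtype=float) for t in range(steps)]

stacked = np.stack(per_step)                        # (steps, batch_size, vocab_size)
batch_major = np.transpose(stacked, (1, 0, 2))      # (batch_size, steps, vocab_size)

print(stacked.shape, batch_major.shape)
```

After the transpose, `batch_major[b, t]` holds the scores of sentence `b` at step `t`, which matches the layout of the target notes.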

### Optimizer and the Loss function

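Define the mask for the backpropagation: if the target is <PAD>, nothing should be backpropagated for that position. The mask all_seq_len computed above is meant to be 0 at <PAD> positions and 1 elsewhere, so multiplying the cross entropy by it removes the padded positions from the loss.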

In [None]:
### If the input word is <PAD>, then there is no need for optimization for that input.
#target_one_hot = tf.one_hot(indices=_target_notes, depth=notes_vocab_size)
print(preds.shape, _target_notes.shape)
cross_ent = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=preds, labels=_target_notes) * all_seq_len

### mean of the cross entropy is the loss of this batch
loss = tf.reduce_mean(cross_ent)

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
### minimize the scalar loss rather than the per-position cross entropy tensor
train_step = optimizer.minimize(loss)

# sess.run(tf.global_variables_initializer())
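A masked cross entropy can be written in a few lines of NumPy, which makes the role of the mask explicit. This is a sketch under assumptions, not the notebook's exact loss: the mask is derived from the labels, `pad_index=0` is assumed, and the sum is divided by the number of unmasked positions rather than by all positions.

```python
import numpy as np


def masked_cross_entropy(logits, labels, pad_index=0):
    """Mean cross entropy over the positions whose label is not padding.

    logits: float array of shape (batch, steps, vocab)
    labels: int array of shape (batch, steps)
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)       # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    mask = (labels != pad_index).astype(float)                   # 0 where label is padding
    return -(picked * mask).sum() / mask.sum()


logits = np.zeros((1, 3, 4))        # uniform scores over a vocabulary of 4
labels = np.array([[2, 3, 0]])      # the last position is padding
loss = masked_cross_entropy(logits, labels)
print(loss)
```

With uniform scores every unmasked position contributes log(4), and the padded position contributes nothing.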

## Run the Session

In [None]:
from preprocess import *
def generate_batches(x, y, batch_size):
    assert len(x) == len(y)
    shuffeled_pairs = list(zip(x,y))
    np.random.shuffle(shuffeled_pairs)
    x = [i[0] for i in shuffeled_pairs]
    y = [i[1] for i in shuffeled_pairs]
    res = []
    for i in range(0, len(x), batch_size):
        x_batch = x[i:min(i + batch_size, len(x))] 
        y_batch = y[i:min(i + batch_size, len(y))]
        if len(x_batch) == batch_size:  # keep full batches only; the placeholders have a fixed batch size
            res.append({
                _encoder_inputs: x_batch,
                _encoder_seq_len: [sum(i > 2 for i in seq) for seq in x_batch],
                _target_notes: y_batch,
                _is_training:True
            })
    return res

sess.run(tf.global_variables_initializer())
epochs = 10
loss_track = []
evaluate_every = 8
for epoch in range(epochs):
    batches = generate_batches(transformed_data, transformed_targets, batch_size)
    if epoch == 0:
        print(f"Number of batches: {len(batches)}")
    print(f"Epoch {epoch+1}/{epochs}")
    for e, batch in enumerate(batches) :
        feed_dict = batch
        _, l = sess.run([train_step, loss], feed_dict)
        loss_track.append(l)
    print(f"Last batch loss: {l}")
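The batching strategy used above (shuffle the pairs, then keep only full batches) can be tried out without TensorFlow. `make_batches` is a hypothetical helper for illustration; it returns plain lists instead of feed dictionaries, and the optional `seed` is an addition that makes the shuffle reproducible.

```python
import random


def make_batches(x, y, batch_size, seed=None):
    """Shuffle (x, y) pairs together and split them into full batches only."""
    assert len(x) == len(y)
    pairs = list(zip(x, y))
    random.Random(seed).shuffle(pairs)
    batches = []
    # Stop before the last incomplete chunk so that every batch has batch_size items
    for start in range(0, len(pairs) - batch_size + 1, batch_size):
        chunk = pairs[start:start + batch_size]
        batches.append(([p[0] for p in chunk], [p[1] for p in chunk]))
    return batches


batches = make_batches(list(range(10)), list(range(10, 20)), batch_size=4, seed=0)
print(len(batches))
```

Dropping the incomplete batch wastes a few samples per epoch, but because the data is reshuffled every epoch, different samples are dropped each time.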



In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(loss_track)

In [None]:
from preprocess import *
def compose(lyrics):
    lyrics_list, mask = prepare_prediction(lyrics, batch_size, data_dict)
    pred_ = sess.run([preds],
        feed_dict={
            _encoder_inputs: lyrics_list,
            _encoder_seq_len: [sum(i > 2 for i in seq) for seq in lyrics_list],
            _is_training:False,
            _target_notes:np.ones((batch_size,target_dict.max_length+2))
        })
    pred = np.argmax(pred_[0], axis=2)
    pred = [target_dict.reverse_transform(i) for i in pred]
    return pred, mask
    
lyrics = ["sad","happy","i whine like a baby","i want to love, laugh and smile"]

pred, mask = compose(lyrics)
for e, p in enumerate(pred):
    if e < batch_size-mask:
        print(lyrics[e])
        fname = f"sample_{lyrics[e]}"
        notes_to_midi(p, fname)
        play_midi(f"output/{fname}.mid")

## Conclusion and Future Work