# NLP: Character Generation of the Next Chainsmokers Song Using RNN 

## Read the data

In [None]:
# We'll need to import OS, Time, Tensorflow, and Numpy

import os
import time

#Make sure Tensorflow is version 2.0.0 or higher
import tensorflow as tf
import numpy as np

# load the path to the file
path = "chainsmokers.txt"

We want to read in the data and count the unique characters, which tells us the size of the vocabulary the model will work with.

In [22]:
# Read the raw bytes and decode them as UTF-8 text
with open(path, "rb") as f:
    file = f.read().decode(encoding='utf-8')
vocab = sorted(set(file))
print("There are " + str(len(vocab)) + " unique characters")

There are 64 unique characters


## Vectorizing the text

The first step before training this NLP program is to map the characters to a numerical representation using two lookup tables: one maps characters to numbers and the other maps numbers back to characters.

In [None]:
# create the mapping
ch_to_int = {a:i for i, a in enumerate(vocab)}
    
int_to_ch = np.array(vocab)

text_as_int = np.array([ch_to_int[c] for c in file])
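As a quick sanity check, the two tables should round-trip: encoding a string and decoding the result should give back the original text. The sketch below uses a small made-up string (`sample`) instead of the lyrics file, and every `demo_` name is illustrative only.

```python
import numpy as np

# Hypothetical sample text standing in for the lyrics file
sample = "closer closer"

# Same construction as above, applied to the sample
demo_vocab = sorted(set(sample))
demo_ch_to_int = {ch: i for i, ch in enumerate(demo_vocab)}
demo_int_to_ch = np.array(demo_vocab)

# Encode characters to integers, then decode back to text
encoded = np.array([demo_ch_to_int[c] for c in sample])
decoded = "".join(demo_int_to_ch[encoded])

print(demo_vocab)
print(encoded)
print(decoded == sample)
```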

## Create training examples

Now we have to divide the text into "example" sequences (one row of a dataset) that the model will learn from. Each input sequence will contain `max_sentence_len` characters from the text.

For instance, if we have a text of "closer" and the sequence length is 4, the input sequence will be "clos", and the target sequence will be the same sequence length except shifted one character to the right: "lose".
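The "closer" example can be reproduced with plain string slicing. This is only an illustration of the shift; the names `text`, `seq_len`, and `chunk` are made up for the sketch.

```python
text = "closer"
seq_len = 4

# Take seq_len + 1 characters so input and target both have length seq_len
chunk = text[:seq_len + 1]   # "close"
input_text = chunk[:-1]      # everything except the last character
target_text = chunk[1:]      # everything except the first character

print(input_text, target_text)
```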

Some Useful Terms:

* An Epoch is when the entire dataset is passed through once.

* A sequence is a set of data with a defined/specific order.

In [None]:
max_sentence_len = 125
examples_per_epoch = len(file)//(max_sentence_len+1)

# We call upon Tensorflow to finally create our training examples 
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

# The batch function converts individual characters into sequences of our specified size
sequences = char_dataset.batch(max_sentence_len+1, drop_remainder=True)
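To see what `batch(..., drop_remainder=True)` does to the character stream, here is a minimal pure-Python sketch of the same grouping. It assumes a stand-in list `ids` in place of `text_as_int` and a tiny `seq_len`, and it mimics the behavior rather than calling TensorFlow.

```python
ids = list(range(10))        # stand-in for text_as_int
seq_len = 3                  # stand-in for max_sentence_len

chunk_size = seq_len + 1
n_chunks = len(ids) // chunk_size          # same formula as examples_per_epoch

# Group consecutive ids into full chunks; leftover ids are dropped
chunks = [ids[i * chunk_size:(i + 1) * chunk_size] for i in range(n_chunks)]

print(chunks)   # ids 8 and 9 do not fill a whole chunk, so they are left out
```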

In [None]:
# split_target builds an (input, target) pair from one sequence:
# the input drops the last character and the target drops the first.
# `map` then applies it to every sequence in the dataset.
def split_target(text_data):
    input_text = text_data[:-1]
    target_text = text_data[1:]
    return input_text, target_text

dataset = sequences.map(split_target)

In [26]:
# Sanity check: print the first (input, target) pair,
# then the first 10 characters of each
for input_example, target_example in  dataset.take(1):
    print ("Input data: ", str(''.join(int_to_ch[input_example.numpy()])))
    print ("Target data: ", str(''.join(int_to_ch[target_example.numpy()])))
    
# zip pairs each input character with its target; enumerate adds the position
for i, (input_idx, target_idx) in enumerate(zip(input_example[:10], target_example[:10])):
    print("Character #" + str(i))
    print("Input char: " + str(input_idx) + str(int_to_ch[input_idx]))
    print("Expected output char: " + str(target_idx) + str(int_to_ch[target_idx]))

Input data:  Hey, I was doing just fine before I met you
I drink too much and that's an issue but I'm okay
Hey, you tell your friends it w
Target data:  ey, I was doing just fine before I met you
I drink too much and that's an issue but I'm okay
Hey, you tell your friends it wa
Character #0
Input char: tf.Tensor(22, shape=(), dtype=int64)H
Expected output char: tf.Tensor(42, shape=(), dtype=int64)e
Character #1
Input char: tf.Tensor(42, shape=(), dtype=int64)e
Expected output char: tf.Tensor(62, shape=(), dtype=int64)y
Character #2
Input char: tf.Tensor(62, shape=(), dtype=int64)y
Expected output char: tf.Tensor(7, shape=(), dtype=int64),
Character #3
Input char: tf.Tensor(7, shape=(), dtype=int64),
Expected output char: tf.Tensor(1, shape=(), dtype=int64) 
Character #4
Input char: tf.Tensor(1, shape=(), dtype=int64) 
Expected output char: tf.Tensor(23, shape=(), dtype=int64)I
Character #5
Input char: tf.Tensor(23, shape=(), dtype=int64)I
Expected output char: tf.Tensor(1, shape=(), dt

## Create training batches

A batch is the set of examples (rows of the dataset) used in one training iteration, that is, one update of the model's weights. The batch size is the number of examples in a batch.

In [27]:
BATCH_SIZE = 64

# Size of the buffer that shuffle() fills and then samples from at random
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

dataset

<BatchDataset shapes: ((64, 125), (64, 125)), types: (tf.int64, tf.int64)>
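The role of the buffer size is easier to see in a simplified sketch. The function below (a hypothetical `buffered_shuffle`, not TensorFlow's actual implementation) keeps a fixed-size buffer, emits a random element from it, and refills from the input. When the buffer is at least as large as the dataset, this amounts to a full shuffle.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in a randomized order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Buffer is full: move a random element to the end and emit it
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Input exhausted: drain what is left in random order
    rng.shuffle(buffer)
    yield from buffer

shuffled = list(buffered_shuffle(range(20), buffer_size=5))
print(shuffled)
```

With a small buffer, the first element emitted can only come from the first `buffer_size` items, so a buffer that is too small leaves the data only locally shuffled.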

At this point in the program, we will be dealing with TensorFlow's built-in functions and terminology, including the Keras API. This saves us from coding out the math behind these algorithms by hand. Most importantly, we can focus on the general structure: how we go from deciding what model to use to generating the output.

But first, we need to go over a bit more terminology to understand what's going on.

* Tensor: The primary data structure in TensorFlow programs. Tensors are N-dimensional (where N could be very large) data structures, most commonly scalars, vectors, or matrices. The elements of a Tensor can hold integer, floating-point, or string values.

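* Embedding: A categorical feature represented as a continuous-valued feature. In other words, each category (here, each character) is mapped to a tensor of floating-point values that are learned during training. In the model below each character gets 256 values, and they are not restricted to any fixed range.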
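Conceptually, an embedding layer is a lookup table with one row of floats per category. The NumPy sketch below illustrates the lookup with made-up sizes (`demo_vocab_size`, `demo_embedding_dim`) and random values; in a real model the rows are trained rather than random.

```python
import numpy as np

rng = np.random.default_rng(0)

demo_vocab_size = 5      # hypothetical small vocabulary
demo_embedding_dim = 3   # hypothetical small embedding size

# One row of floats per character id; in a real model these are trained
embedding_table = rng.normal(size=(demo_vocab_size, demo_embedding_dim))

char_ids = np.array([2, 0, 2, 4])
vectors = embedding_table[char_ids]   # row lookup, one vector per id

print(vectors.shape)
```

The same id always maps to the same vector, so positions 0 and 2 of `vectors` are identical here.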

## Build the model

We will be using a simple sequential model: 
 * 1 layer of input (tf.keras.layers.Embedding) 
 * 1 hidden (tf.keras.layers.GRU) 
 * 1 output (tf.keras.layers.Dense)

In [None]:
# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension
embedding_dim = 256

# Number of RNN units/neurons
rnn_units = 1024

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units,
                            return_sequences=True,
                            stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
        ])
    return model

model = build_model(vocab_size = len(vocab), embedding_dim=embedding_dim, rnn_units=rnn_units, batch_size=BATCH_SIZE)

As a good coding habit, we should verify that the model behaves as expected before we spend hours and GPU power training it.

In [29]:
for input_example_batch, target_example_batch in dataset.take(1):
  example_batch_predictions = model(input_example_batch)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

model.summary()

sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()

print("Input: \n", repr("".join(int_to_ch[input_example_batch[0]])))
print()
print("Next Char Predictions: \n", repr("".join(int_to_ch[sampled_indices ])))

(64, 125, 64) # (batch_size, sequence_length, vocab_size)
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (64, None, 256)           16384     
_________________________________________________________________
gru_2 (GRU)                  (64, None, 1024)          3938304   
_________________________________________________________________
dense_2 (Dense)              (64, None, 64)            65600     
Total params: 4,020,288
Trainable params: 4,020,288
Non-trainable params: 0
_________________________________________________________________
Input: 
 "ng you up\nCome on and give me some love tonight\nOoh, you're all that I want\nNo good at giving you up\nCome on and give me some"

Next Char Predictions: 
 'AD DYLq-ijd.R)q?8CVRFnM-Lw.LR\nKMJYzs\'pAGh!Uun1KgNEqAddFoA!a"hc8BpGLlvJl-rzBS-2U8Y4TuwYw?ElT\nsR1Lz\n AmyLWKx"!VqP4xUuM\nuDY8nkr\n'
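The parameter counts in the summary can be checked by hand. The sketch below assumes the GRU uses two bias vectors per weight set, which matches the TensorFlow 2 default (`reset_after=True`) and is consistent with the 3,938,304 shown above.

```python
vocab_size = 64
embedding_dim = 256
rnn_units = 1024

# Embedding: one embedding_dim vector per vocabulary entry
embedding_params = vocab_size * embedding_dim

# GRU: 3 weight sets (update gate, reset gate, candidate state), each with
# input weights, recurrent weights, and two bias vectors (assumed reset_after=True)
gru_params = 3 * (embedding_dim * rnn_units + rnn_units * rnn_units + 2 * rnn_units)

# Dense: weights plus one bias per output unit
dense_params = rnn_units * vocab_size + vocab_size

print(embedding_params, gru_params, dense_params)
print(embedding_params + gru_params + dense_params)
```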


## Optimize and Loss Reduction

* Loss: A measure of how far a model's predictions are from the labels; in other words, how bad the model is. To determine this value, a model must define a loss function. For example, linear regression models typically use mean squared error, while logistic regression models use log loss. The tf.keras.losses.sparse_categorical_crossentropy loss function works in this case because the labels are integer character ids and the loss is applied across the last dimension of the predictions (the vocabulary axis). The smaller this number is, the better.

* Adam optimizer: Short for adaptive moment estimation. Adam is a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments of the gradients.

* Logits: Vector of raw, unnormalized prediction scores, one per class. They are not probabilities themselves; a softmax turns them into normalized probabilities.

* Label: The "answer" we're trying to train the model to achieve. 
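To make the relationship between logits, probabilities, and loss concrete, here is a small pure-Python sketch. The logits and label are made-up values for a three-character vocabulary, and the helper names are illustrative; Keras computes the same quantity in a vectorized way.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(logits)                              # subtract max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_cross_entropy(label, logits):
    """Negative log-probability assigned to the correct class index."""
    return -math.log(softmax(logits)[label])

logits = [2.0, 0.5, -1.0]   # made-up scores for a 3-character vocabulary
label = 0                   # index of the correct next character

probs = softmax(logits)
print(probs)
print(sparse_cross_entropy(label, logits))
```

The loss is small when the correct class has the highest score and grows as the model puts less probability on it.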

In [30]:
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

example_batch_loss  = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("Mean loss:      ", example_batch_loss.numpy().mean())

model.compile(optimizer='adam', loss=loss)

Prediction shape:  (64, 125, 64)  # (batch_size, sequence_length, vocab_size)
Mean loss:       4.158063
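
The mean loss of the untrained model is a useful sanity check: a model that assigns equal probability to each of the 64 characters would score a cross-entropy of ln(64), and the value above is very close to that.

```python
import math

vocab_size = 64
uniform_loss = math.log(vocab_size)   # cross-entropy of guessing uniformly
print(round(uniform_loss, 4))   # 4.1589
```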


## Set up the checkpoints

Checkpoints enable exporting model weights, as well as performing training across multiple sessions. Checkpoints also enable training to continue past errors (for example, job preemption). Note that the graph itself is not included in a checkpoint.

In [None]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)
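
The `{epoch}` placeholder in the prefix is filled in with the epoch number each time a checkpoint is written. The sketch below only illustrates the resulting names using Python's own `str.format`; it assumes Keras substitutes the placeholder in an equivalent way for this simple template.

```python
import os

checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

# Illustrative only: fill in the placeholder the way a format string would
for epoch in (1, 2, 80):
    print(checkpoint_prefix.format(epoch=epoch))
```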

## Train the model

Now comes the training piece: we call Keras' fit() method, which adjusts the model parameters to minimize the loss.

This is now a standard classification problem: a classification model distinguishes among two or more classes, and here each character in the vocabulary is a class. Given the previous RNN state and the input at this time step, the model predicts the class of the next character.

In [32]:
# Increase the EPOCHS for better results
EPOCHS=80

history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/80
...
Epoch 80/80


## Restore last checkpoint

Because of the way the RNN state is passed from timestep to timestep, the model only accepts a fixed batch size once built. For generation we use a batch size of 1 to keep things fast and simple, which means we need to rebuild the model with that batch size and restore the trained weights from the latest checkpoint.

In [None]:
tf.train.latest_checkpoint(checkpoint_dir)

model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

model.build(tf.TensorShape([1, None]))

The following code block generates the text:

1) We choose the starting string, initializing the RNN state and setting the number of characters to generate.

2) Get the prediction distribution of the next character using the start string and the RNN state.

3) Then, sample from a categorical distribution over those predictions to get the index of the predicted character. Use this predicted character as our next input to the model.

4) The RNN state returned by the model is fed back into the model so that it now has more context, rather than only one character. After predicting the next character, the updated RNN state is again fed back into the model. No learning happens at this stage: the weights stay fixed, and the state simply carries context from the previously predicted characters.
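
The four steps above can be sketched with a toy model in NumPy. Everything here is an assumption made for illustration: the sizes, the random untrained weights, and the helper names `toy_step` and `toy_generate`. It uses a minimal vanilla RNN cell rather than the GRU, so the output is meaningless; the point is how the sampled id and the state are fed back in.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 5, 8   # hypothetical toy sizes

# Random, untrained weights for a minimal vanilla RNN cell (not the GRU above)
W_in = rng.normal(size=(vocab_size, hidden_size))
W_rec = rng.normal(size=(hidden_size, hidden_size))
W_out = rng.normal(size=(hidden_size, vocab_size))

def toy_step(char_id, state):
    """Return logits for the next character and the updated state."""
    new_state = np.tanh(W_in[char_id] + state @ W_rec)
    return new_state @ W_out, new_state

def toy_generate(start_ids, num_generate):
    state = np.zeros(hidden_size)            # step 1: initialize the state
    for char_id in start_ids:                # step 2: run the start sequence
        logits, state = toy_step(char_id, state)
    generated = []
    for _ in range(num_generate):
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        next_id = int(rng.choice(vocab_size, p=probs))   # step 3: sample an index
        generated.append(next_id)
        logits, state = toy_step(next_id, state)         # step 4: feed id and state back
    return generated

print(toy_generate([0, 3], num_generate=10))
```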

## Loop of text generation

In [None]:
def generate_text(model, start_string):

    # Number of characters final song will be. For a standard 280ish word song, that'll be around 1700 characters
    # Experiment around here
    num_generate = 1700

    # Vectorize our start string
    input_eval = [ch_to_int[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)

    # String to store results
    text_generated = []

    # Low temperature = predictable text.
    # Higher temperature = more surprising text.
    temperature = 0.33

    # Here batch size == 1
    model.reset_states()
    for i in range(num_generate):
        predictions = model(input_eval)
        # remove the batch dimension
        predictions = tf.squeeze(predictions, 0)

        # Scale the logits by the temperature, then sample the next character id
        # from the resulting categorical distribution
        predictions = predictions / temperature
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

        # We pass the predicted character as the next input to the model
        # along with the previous hidden state
        input_eval = tf.expand_dims([predicted_id], 0)

        text_generated.append(int_to_ch[predicted_id])

    return (start_string + ''.join(text_generated))
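
The effect of the temperature can be seen by dividing a fixed set of logits by different values before applying a softmax. The logits below are made up for illustration, and the `softmax` helper is a plain-Python stand-in for what the sampling step does implicitly.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]   # made-up scores for three characters

for temperature in (0.33, 1.0, 3.0):
    scaled = [x / temperature for x in logits]
    print(temperature, [round(p, 3) for p in softmax(scaled)])
```

A low temperature concentrates probability on the top-scoring character, while a high temperature flattens the distribution toward uniform.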

## Create the next banger

In [38]:
# Add your first lyrics by replacing the string with whatever you want
print(generate_text(model, start_string="I punched your dad in my Rover "))

I punched your dad in my Rover (who?) Bof me I'm like anybody else
In the town full of fancy cars and crowded bars and supermodels
Looks exactly the way it did inside my head
When I dreamed about it
All the things I could live without
I need it now 'cause they're all around me
Only thing that I can't afford is to lose myself
Tryna be somebody, somebody
Somebody, somebody
(You know, just know what I like)
Somebody
You should just this is all we know
'Cause this is all we know
I'll break your heart so you don't break mine
Before I love you (nah, nah, nah)
I'm gonna leave you (nah, nah, nah)
I'm gonna leave you (nah, nah, nah)
I'm gonna leave you (nah, nah, nah)
Even if I'm not here to stay
I still want your heart
Your heart for takeaway, yeah, yeah, yeah yeah
Your heart for takeaway, yeah, yeah, yeah yeah
Your heart for takeaway, yeah, yeah, yeah yeah
Your heart for takeaway, yeah, yeah, yeah yeah
Your heart for takeaway, yeah, yeah, yeah yeah
Your heart for takeaway, yeah, yeah, yeah ye

The easiest thing you can do to improve the results is to train it for longer, so increase the EPOCHS variable. If you can get the loss to below 0.5, you're sitting pretty.

You can also experiment with a different start string, try adding another RNN layer to improve the model's accuracy, or adjust the temperature parameter to generate more or less random predictions.