# Eminem's Rap Generation Using RNN

RNNs are among the most powerful deep-learning models for sequential data. Because past data plays a major role in predicting what comes next in a sequence, an RNN takes into account not only the most recent input but also what it has seen earlier.

## What is RNN?
A Recurrent Neural Network (RNN) is a generalization of a feedforward neural network that has an internal memory. It is recurrent because it performs the same function for every element of the input, while the output for the current element depends on the previous computation. After the output is produced, it is copied and fed back into the network. To make a decision, the network considers both the current input and what it has learned from the previous inputs.
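To make the recurrence concrete, here is a minimal sketch in plain Python with a single scalar state. The names `rnn_step` and `run_rnn` and the weights `w_x`, `w_h`, and `b` are made up for illustration; they are not the parameters of the model trained later in this notebook.

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent update: combine the current input with the previous state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(inputs, h0=0.0):
    """Apply the same step function to every input, carrying the state forward."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(x, h)
        states.append(h)
    return states

# Two sequences that end with the same input but differ at the start
states_a = run_rnn([1.0, 0.0, 1.0])
states_b = run_rnn([0.0, 0.0, 1.0])
print(states_a[-1], states_b[-1])
```

Both sequences end with the same input, yet the final states differ because the first input is still present in the carried state. That is the "internal memory" described above.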

## What's the Data?

I simply copied the lyrics of Eminem's Rap God and pasted them into eminem.txt. The file has only 7855 characters, which is very little. Realistic predictions usually need at least around a million characters of training text. However, this is just a test.

## Importing Libraries and Data 

In [29]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [30]:
import tensorflow as tf

In [31]:
path_to_file = "eminem.txt"

In [32]:
with open(path_to_file, 'r') as f:
    text = f.read()

In [33]:
#print(text[7854:])

In [34]:
vocab = sorted(set(text))

In [35]:
vocab

['\n',
 ' ',
 '!',
 '"',
 "'",
 '(',
 ')',
 ',',
 '-',
 '.',
 '/',
 '0',
 '2',
 '4',
 '7',
 '?',
 'A',
 'B',
 'C',
 'D',
 'E',
 'F',
 'G',
 'H',
 'I',
 'J',
 'K',
 'L',
 'M',
 'N',
 'O',
 'P',
 'R',
 'S',
 'T',
 'U',
 'W',
 'Y',
 'Z',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z']

In [36]:
len(vocab)

64

## Text Preprocessing 

1. Vectorizing
2. Encoding

In [38]:
# for pair in enumerate(vocab):
#     print(pair)

In [39]:
char_to_ind = {char:ind for ind,char in enumerate(vocab)}

In [41]:
char_to_ind['!']

2

In [42]:
ind_to_char = np.array(vocab)

In [45]:
ind_to_char[2]

'!'

In [52]:
# Iterate over the whole text and map each character
# to its integer index using char_to_ind

encoded_text = np.array([char_to_ind[c] for c in text])

In [54]:
sample = text[:50]
sample

'Look, I was gonna go easy on you and not to hurt y'

In [56]:
encoded_text[:50]

array([27, 53, 53, 49,  7,  1, 24,  1, 60, 39, 56,  1, 45, 53, 52, 52, 39,
        1, 45, 53,  1, 43, 39, 56, 62,  1, 53, 52,  1, 62, 53, 58,  1, 39,
       52, 42,  1, 52, 53, 57,  1, 57, 53,  1, 46, 58, 55, 57,  1, 62])
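As a quick sanity check on the two lookup tables, the following sketch encodes a short string and decodes it back. It uses plain Python lists instead of NumPy arrays, and the names `sample_text`, `sample_vocab`, `to_ind`, and `to_char` are illustrative stand-ins for `text`, `vocab`, `char_to_ind`, and `ind_to_char`.

```python
# Hypothetical small sample standing in for the full lyrics
sample_text = "Look, I was gonna go easy"

# Sorted unique characters define the vocabulary
sample_vocab = sorted(set(sample_text))

# Character -> index, and index -> character (a list works as the reverse map)
to_ind = {ch: i for i, ch in enumerate(sample_vocab)}
to_char = sample_vocab

# Encode to integers, then decode back to text
encoded = [to_ind[ch] for ch in sample_text]
decoded = "".join(to_char[i] for i in encoded)

print(encoded)
print(decoded)
```

If the round trip returns the original string, the mapping is lossless, which is what the generation step relies on when it turns predicted indices back into characters.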

## Creating Batches

1. Text Sequences
2. Shuffle and Generate Batches using TF

In [58]:
print(text[:500])

Look, I was gonna go easy on you and not to hurt your feelings
But I'm only going to get this one chance
Something's wrong, I can feel it (Six minutes, Slim Shady, you're on)
Just a feeling I've got, like something's about to happen, but I don't know what
If that means, what I think it means, we're in trouble, big trouble,
And if he is as bananas as you say, I'm not taking any chances
You were just what the doctor ordered
I'm beginning to feel like a Rap God, Rap God
All my people from the front


In [60]:
line = "Look, I was gonna go easy on you and not to hurt your feelings"
len(line)

62

In [61]:
lines = """
Look, I was gonna go easy on you and not to hurt your feelings
But I'm only going to get this one chance
Something's wrong, I can feel it (Six minutes, Slim Shady, you're on)
Just a feeling I've got, like something's about to happen, but I don't know what
"""

In [62]:
print(lines)


Look, I was gonna go easy on you and not to hurt your feelings
But I'm only going to get this one chance
Something's wrong, I can feel it (Six minutes, Slim Shady, you're on)
Just a feeling I've got, like something's about to happen, but I don't know what



In [63]:
len(lines)

257

In [68]:
seq_len = 150

In [69]:
total_num_seq = len(text)//(seq_len+1)

In [70]:
total_num_seq

1404
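The division by `seq_len + 1` is there because each chunk has to hold one extra character, so that the input and the shifted target can both have length `seq_len`. A small worked example, using a hypothetical text length of 1000 characters and a hypothetical helper `count_sequences`:

```python
def count_sequences(num_chars, seq_len):
    """Return (number of full chunks, leftover characters) for a given text length."""
    chunk = seq_len + 1          # seq_len inputs plus one extra character for the shifted target
    full = num_chars // chunk    # chunks that fit completely
    leftover = num_chars % chunk # trailing characters that do not fill a chunk
    return full, leftover

full, leftover = count_sequences(1000, 150)
print(full, leftover)  # 6 94
```

The leftover characters are the ones that get discarded when batching with `drop_remainder=True`.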

In [74]:
#create training sequences
char_dataset = tf.data.Dataset.from_tensor_slices(encoded_text)

In [75]:
type(char_dataset)

tensorflow.python.data.ops.dataset_ops.TensorSliceDataset

In [79]:
#converts into sequence we can feed in as batch
# for item in char_dataset.take(500):
#     print(ind_to_char[item.numpy()])

In [80]:
sequences = char_dataset.batch(seq_len+1, drop_remainder= True)

In [82]:
def create_seq_targets(seq):
    input_txt = seq[:-1] #look i am gonn
    target_txt = seq[1:] #ook i am gonna
    return input_txt, target_txt

In [83]:
dataset = sequences.map(create_seq_targets)

In [88]:
for input_txt, target_txt in dataset.take(1):
    print(input_txt.numpy())
    print("".join(ind_to_char[input_txt.numpy()]))
    print('\n')
    print(target_txt.numpy())
    print("".join(ind_to_char[target_txt.numpy()]))

[27 53 53 49  7  1 24  1 60 39 56  1 45 53 52 52 39  1 45 53  1 43 39 56
 62  1 53 52  1 62 53 58  1 39 52 42  1 52 53 57  1 57 53  1 46 58 55 57
  1 62 53 58 55  1 44 43 43 50 47 52 45 56  0 17 58 57  1 24  4 51  1 53
 52 50 62  1 45 53 47 52 45  1 57 53  1 45 43 57  1 57 46 47 56  1 53 52
 43  1 41 46 39 52 41 43  0 33 53 51 43 57 46 47 52 45  4 56  1 60 55 53
 52 45  7  1 24  1 41 39 52  1 44 43 43 50  1 47 57  1  5 33 47 61  1 51
 47 52 58 57 43 56]
Look, I was gonna go easy on you and not to hurt your feelings
But I'm only going to get this one chance
Something's wrong, I can feel it (Six minutes


[53 53 49  7  1 24  1 60 39 56  1 45 53 52 52 39  1 45 53  1 43 39 56 62
  1 53 52  1 62 53 58  1 39 52 42  1 52 53 57  1 57 53  1 46 58 55 57  1
 62 53 58 55  1 44 43 43 50 47 52 45 56  0 17 58 57  1 24  4 51  1 53 52
 50 62  1 45 53 47 52 45  1 57 53  1 45 43 57  1 57 46 47 56  1 53 52 43
  1 41 46 39 52 41 43  0 33 53 51 43 57 46 47 52 45  4 56  1 60 55 53 52
 45  7  1 24  1 41 39 52
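The chunk-and-shift idea behind `create_seq_targets` can also be seen on a short string without TensorFlow. This is only a sketch: `make_pairs` is a hypothetical helper that mimics `batch(seq_len+1, drop_remainder=True)` followed by the input/target split, and it uses a tiny `seq_len` so the result is easy to read.

```python
def make_pairs(encoded, seq_len):
    """Cut a sequence into chunks of seq_len + 1 and split each into (input, target)."""
    chunk = seq_len + 1
    pairs = []
    # Step through the data one full chunk at a time; an incomplete tail is dropped
    for start in range(0, len(encoded) - chunk + 1, chunk):
        seq = encoded[start:start + chunk]
        pairs.append((seq[:-1], seq[1:]))  # target is the input shifted by one
    return pairs

pairs = make_pairs(list("Look, I was gonna"), seq_len=5)
for inp, tgt in pairs:
    print(repr("".join(inp)), "->", repr("".join(tgt)))
```

At every position the target holds the character that follows the corresponding input character, so the model is trained to predict the next character.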

In [94]:
batch_size = 150

In [95]:
# The model should be able to learn from any part of the text,
# so let's shuffle the sequences

In [98]:
buffer_size = 7500

dataset = dataset.shuffle(buffer_size).batch(batch_size, drop_remainder=True)

In [102]:
dataset

<BatchDataset shapes: ((150, 150), (150, 150)), types: (tf.int64, tf.int64)>
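The number of batches per epoch follows from the sequence count and the batch size. The sketch below imitates shuffle-then-batch in plain Python; `shuffle_and_batch` is a hypothetical helper, and `random.shuffle` is a stand-in for `tf.data`'s buffered shuffle (with a buffer larger than the number of sequences, the two should behave similarly).

```python
import random

def shuffle_and_batch(items, batch_size, seed=0):
    """Shuffle items, then group them into full batches, dropping the remainder."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_full = len(items) // batch_size
    return [items[i * batch_size:(i + 1) * batch_size] for i in range(n_full)]

# 1404 sequences (the count computed earlier) in batches of 150
batches = shuffle_and_batch(range(1404), batch_size=150)
print(len(batches))  # 9
```

With 1404 sequences and a batch size of 150, nine full batches fit and the remaining 54 sequences are dropped.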

## Creating a Model

1. Embedding
2. GRU
3. Dense

In [103]:
vocab_size = len(vocab)

In [104]:
vocab_size

64

In [105]:
embed_dim = 64

In [106]:
rnn_neurons = 756 #random

In [109]:
from tensorflow.keras.losses import sparse_categorical_crossentropy

In [110]:
help(sparse_categorical_crossentropy)

Help on function sparse_categorical_crossentropy in module tensorflow.python.keras.losses:

sparse_categorical_crossentropy(y_true, y_pred, from_logits=False, axis=-1)



In [111]:
def sparse_cat_loss(y_true, y_pred):
    return sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
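The wrapper is needed because the final `Dense` layer has no activation, so the model outputs raw scores (logits) rather than probabilities. With `from_logits=True` the loss applies the softmax itself. As a rough sketch of what that means for a single position, here is the same quantity in plain Python; `sparse_ce_from_logits` is a hypothetical name, not a Keras function.

```python
import math

def sparse_ce_from_logits(true_index, logits):
    """Cross-entropy for one position: -log(softmax(logits)[true_index])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[true_index]

# A model that knows nothing gives equal scores to all 64 characters
uniform_loss = sparse_ce_from_logits(3, [0.0] * 64)
print(uniform_loss, math.log(64))
```

For 64 equally likely characters the loss is ln(64), about 4.16, which is a useful reference point for the loss at the start of training.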

In [112]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense

In [116]:
def create_model(vocab_size, embed_dim, rnn_neurons, batch_size):
    
    model = Sequential()
    model.add(Embedding(vocab_size, embed_dim, batch_input_shape=[batch_size, None]))
    model.add(GRU(rnn_neurons,return_sequences=True, stateful=True, recurrent_initializer='glorot_uniform'))
    model.add(Dense(vocab_size))
    
    model.compile('adam',loss=sparse_cat_loss)
    
    return model

In [117]:
model = create_model(vocab_size= vocab_size,
                   embed_dim=embed_dim,
                   rnn_neurons= rnn_neurons,
                   batch_size=batch_size)

In [119]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (150, None, 64)           4096      
_________________________________________________________________
gru (GRU)                    (150, None, 756)          1864296   
_________________________________________________________________
dense (Dense)                (150, None, 64)           48448     
Total params: 1,916,840
Trainable params: 1,916,840
Non-trainable params: 0
_________________________________________________________________
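The parameter counts in the summary can be reproduced by hand. The sketch below assumes the TensorFlow 2 default for `GRU` (`reset_after=True`), which keeps two bias vectors per gate; the helper function names are made up for this example.

```python
def embedding_params(vocab_size, embed_dim):
    """One embed_dim-sized vector per vocabulary entry."""
    return vocab_size * embed_dim

def gru_params(input_dim, units):
    """Three gates, each with an input kernel, a recurrent kernel, and two bias vectors."""
    return 3 * (input_dim * units + units * units + 2 * units)

def dense_params(input_dim, units):
    """Weight matrix plus one bias per output unit."""
    return input_dim * units + units

emb = embedding_params(64, 64)
gru = gru_params(64, 756)
dense = dense_params(756, 64)
print(emb, gru, dense, emb + gru + dense)
```

These come out to 4096, 1864296, and 48448, for a total of 1,916,840, matching the summary. Almost all of the parameters sit in the GRU layer.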


## Training Model 

In [120]:
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)

In [123]:
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)

In [125]:
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()

In [127]:
ind_to_char[sampled_indices]

array(['M', 'G', 'R', 'h', '!', 'T', '-', '0', 'J', 'P', 'S', 't', 'x',
       '7', 'g', 'Y', 'p', ' ', '4', '7', '.', '.', 'f', '"', 'y', 'P',
       'u', 'd', '.', 'i', '!', '!', 'w', '0', 'N', 'z', 'y', 'c', 't',
       ',', 'l', '-', 'I', 'J', 'B', 'e', 'Y', 'T', 'O', 'i', 'e', 'v',
       'i', 'g', 'J', 'o', '4', 'P', 'n', '4', 'K', 'p', 'N', 'o', 'A',
       'j', 'b', 'c', 'f', 'Y', 'c', 'M', '2', 'o', 'S', '4', 'L', "'",
       'c', '(', 'k', 'c', 'a', '(', 'S', 'd', ')', ',', 'k', 'y', 'Y',
       "'", 'I', 'B', 'E', 'K', 'M', 's', 'f', 't', '4', 'Y', '7', 'z',
       'o', '(', 'H', 'y', ',', '"', 'O', ')', 'Y', 'P', 't', 'w', 't',
       'U', 'R', 'Z', 'p', 'O', 'B', 'v', 'P', 'R', 'I', 'C', 'g', 'R',
       'r', 'y', '4', 'o', 's', ' ', 'o', 'o', 'T', 'l', 'W', '2', 'p',
       'I', ',', 'u', '0', 'g', 's', 'L'], dtype='<U1')

In [128]:
epochs = 40

In [129]:
model.fit(dataset, epochs=epochs)

Train for 9 steps
Epoch 1/40
...
Epoch 40/40


<tensorflow.python.keras.callbacks.History at 0x7f980982d190>

In [130]:
model.save('eminem_rap.h5')

In [185]:
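# Rebuild the same architecture with batch_size=1 so the stateful GRU
# can generate from a single seed sequence
model = create_model(vocab_size, embed_dim, rnn_neurons, batch_size=1)

# Load the trained weights into the new model
model.load_weights('eminem_rap.h5')

model.build(tf.TensorShape([1, None]))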

In [186]:
model.summary()

Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_6 (Embedding)      (1, None, 64)             4096      
_________________________________________________________________
gru_5 (GRU)                  (1, None, 756)            1864296   
_________________________________________________________________
dense_5 (Dense)              (1, None, 64)             48448     
Total params: 1,916,840
Trainable params: 1,916,840
Non-trainable params: 0
_________________________________________________________________


In [167]:
def generate_text(model, start_seed,gen_size=100,temp=1.0):

  num_generate = gen_size

  input_eval = [char_to_ind[s] for s in start_seed]

  input_eval = tf.expand_dims(input_eval, 0)

  text_generated = []

  temperature = temp

  # Here batch size == 1
  model.reset_states()

  for i in range(num_generate):

      # Generate Predictions
      predictions = model(input_eval)

      # Remove the batch shape dimension
      predictions = tf.squeeze(predictions, 0)

      # Use a categorical distribution to select the next character
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # Pass the predicted character as the next input
      input_eval = tf.expand_dims([predicted_id], 0)

      # Transform back to character letter
      text_generated.append(ind_to_char[predicted_id])

  return (start_seed + ''.join(text_generated))

In [180]:
print(generate_text(model,"I am Beginning to Feel like a", gen_size=50))

I am Beginning to Feel like a Rap God, Rake I La thow urenan to Sain's wrong to
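The `temp` argument of `generate_text` divides the logits before sampling. To show its effect in isolation, here is a small plain-Python sketch; the function names and the example logits are made up for illustration, and `random.choices` stands in for `tf.random.categorical`.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(logits, temperature=1.0, rng=random):
    """Draw one index at random according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three characters
cold = softmax_with_temperature(logits, 0.5)
base = softmax_with_temperature(logits, 1.0)
warm = softmax_with_temperature(logits, 2.0)
print(cold, base, warm)
print(sample_index(logits, temperature=0.5, rng=random.Random(0)))
```

A temperature below 1 sharpens the distribution toward the most likely character, which tends to give more repetitive but more coherent text. A temperature above 1 flattens it, which gives more varied but noisier output.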
