# Eminem's Rap Generation Using RNN

A recurrent neural network (RNN) is a powerful deep-learning architecture that works well on sequential data. Because past data plays a major role in predicting what comes next in a sequence, an RNN takes into account not only the most recent input but also the outputs it produced earlier.

## What is an RNN?
A recurrent neural network is a generalization of a feedforward neural network that has an internal memory. It is recurrent because it performs the same function on every element of the input, while the output for the current element depends on the computation for the previous one. Once an output is produced, it is copied and fed back into the network. To make a decision, the network therefore considers both the current input and what it has learned from the previous inputs.
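
To make the "same function for every input, with memory" idea concrete, here is a minimal sketch of a single-unit vanilla RNN in plain Python. The function names and the weights `w_x`, `w_h`, and `b` are made up for illustration. The model built later in this notebook uses a GRU layer instead, which adds gates on top of this basic recurrence.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One step of a single-unit vanilla RNN: h_t = tanh(w_x*x_t + w_h*h_prev + b)."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Apply the same step to every input, carrying the hidden state forward."""
    h = 0.0  # initial hidden state: the "memory" starts empty
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states are non-zero too,
# because each step also sees the previous hidden state.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

The second and third states are non-zero only because of the recurrent term `w_h * h_prev`. With `w_h = 0` the network would forget the first input immediately.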

## What's the Data?

I simply copied the lyrics of Eminem's "Rap God" and pasted them into eminem.txt. The file has only 7855 characters, which is very little. Realistic predictions typically need on the order of a million characters or more. However, this is just a test.

## Importing Libraries and Data 

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
import tensorflow as tf

In [3]:
path_to_file = "eminem.txt"

In [4]:
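# Read as UTF-8 (the text contains emoji) and close the file when done
with open(path_to_file, 'r', encoding='utf-8') as f:
    text = f.read()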

In [5]:
#print(text[7854:])

In [6]:
vocab = sorted(set(text))

In [7]:
vocab

['\n',
 ' ',
 '!',
 '#',
 '&',
 "'",
 '(',
 ')',
 '+',
 ',',
 '-',
 '.',
 '/',
 '0',
 '1',
 '2',
 '3',
 '4',
 '5',
 '7',
 '8',
 '9',
 ':',
 '?',
 'A',
 'B',
 'C',
 'D',
 'E',
 'F',
 'G',
 'H',
 'I',
 'J',
 'K',
 'L',
 'M',
 'N',
 'O',
 'P',
 'R',
 'S',
 'T',
 'V',
 'W',
 'Y',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 '❤',
 '🇳',
 '🇵',
 '😊',
 '🙏']

In [8]:
len(vocab)

76

## Text Preprocessing 

1. Vectorizing
2. Encoding

In [9]:
# for pair in enumerate(vocab):
#     print(pair)

In [10]:
char_to_ind = {char:ind for ind,char in enumerate(vocab)}

In [11]:
char_to_ind['!']

2

In [12]:
ind_to_char = np.array(vocab)

In [13]:
ind_to_char[2]

'!'

In [14]:
# Iterate over the whole text and map each character
# to its integer index using char_to_ind

encoded_text = np.array([char_to_ind[c] for c in text])

In [15]:
sample = text[:50]
sample

' Hi & Good Morning !! I am writing in from India. '

In [16]:
encoded_text[:50]

array([ 1, 31, 54,  1,  4,  1, 30, 60, 60, 49,  1, 36, 60, 63, 59, 54, 59,
       52,  1,  2,  2,  1, 32,  1, 46, 58,  1, 68, 63, 54, 65, 54, 59, 52,
        1, 54, 59,  1, 51, 63, 60, 58,  1, 32, 59, 49, 54, 46, 11,  1])
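
The encoding above is reversible, and generation relies on that when indices are turned back into characters. The sketch below repeats the same idea on a short made-up string using only built-in types. The `demo_*` names are hypothetical and are kept separate from the notebook's `vocab`, `char_to_ind`, and `ind_to_char`.

```python
# Hypothetical sample string, not read from eminem.txt
sample_text = "Hi & Good Morning !!"

# Vocabulary: every distinct character, in sorted order
demo_vocab = sorted(set(sample_text))

# Character -> index, and index -> character (a list works as the reverse map)
demo_char_to_ind = {ch: i for i, ch in enumerate(demo_vocab)}
demo_ind_to_char = demo_vocab

# Encode to integers, then decode back to text
encoded = [demo_char_to_ind[ch] for ch in sample_text]
decoded = "".join(demo_ind_to_char[i] for i in encoded)

print(encoded)
print(decoded)
```

Decoding the encoded list gives back the original string, so no information is lost by working with integers.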

## Creating Batches

1. Text Sequences
2. Shuffle and Generate Batches using TF

In [17]:
print(text[:500])

 Hi & Good Morning !! I am writing in from India. We interacted yesterday about my wish to place an order for a birthday cake to my loved one residing there in your service area. Can we interact further on the same ?
amaste 🙏🇳🇵🇳🇵🇳🇵from Yourkoseli Family. We are so glad to be part of 10000 + Celebrations and Bringing Happiness among your loved ones. Life is not just to live, it is to celebrate as well.
My query is still the same. As I am not the resident of  nepal but very keen on ordering a Birt


In [18]:
line = "Look, I was gonna go easy on you and not to hurt your feelings"
len(line)

62

In [19]:
lines = """
Look, I was gonna go easy on you and not to hurt your feelings
But I'm only going to get this one chance
Something's wrong, I can feel it (Six minutes, Slim Shady, you're on)
Just a feeling I've got, like something's about to happen, but I don't know what
"""

In [20]:
print(lines)


Look, I was gonna go easy on you and not to hurt your feelings
But I'm only going to get this one chance
Something's wrong, I can feel it (Six minutes, Slim Shady, you're on)
Just a feeling I've got, like something's about to happen, but I don't know what



In [21]:
len(lines)

257

In [22]:
seq_len = 50

In [23]:
total_num_seq = len(text)//(seq_len+1)

In [24]:
total_num_seq

358875

In [25]:
#create training sequences
char_dataset = tf.data.Dataset.from_tensor_slices(encoded_text)

In [26]:
type(char_dataset)

tensorflow.python.data.ops.dataset_ops.TensorSliceDataset

In [27]:
#converts into sequence we can feed in as batch
# for item in char_dataset.take(500):
#     print(ind_to_char[item.numpy()])

In [28]:
sequences = char_dataset.batch(seq_len+1, drop_remainder= True)

In [29]:
def create_seq_targets(seq):
    input_txt = seq[:-1] #look i am gonn
    target_txt = seq[1:] #ook i am gonna
    return input_txt, target_txt

In [30]:
dataset = sequences.map(create_seq_targets)

In [31]:
for input_txt, target_txt in dataset.take(1):
    print(input_txt.numpy())
    print("".join(ind_to_char[input_txt.numpy()]))
    print('\n')
    print(target_txt.numpy())
    print("".join(ind_to_char[target_txt.numpy()]))

[ 1 31 54  1  4  1 30 60 60 49  1 36 60 63 59 54 59 52  1  2  2  1 32  1
 46 58  1 68 63 54 65 54 59 52  1 54 59  1 51 63 60 58  1 32 59 49 54 46
 11  1]
 Hi & Good Morning !! I am writing in from India. 


[31 54  1  4  1 30 60 60 49  1 36 60 63 59 54 59 52  1  2  2  1 32  1 46
 58  1 68 63 54 65 54 59 52  1 54 59  1 51 63 60 58  1 32 59 49 54 46 11
  1 44]
Hi & Good Morning !! I am writing in from India. W
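
The chunking and the one-character shift can be reproduced without TensorFlow. This is a small sketch on a list of integers, under the assumption that `batch(seq_len+1, drop_remainder=True)` simply cuts the stream into consecutive chunks and discards the incomplete tail. The helper names `make_sequences` and `split_input_target` are hypothetical.

```python
def make_sequences(encoded, seq_len):
    """Cut the stream into consecutive chunks of seq_len + 1, dropping the tail."""
    chunk = seq_len + 1
    n = len(encoded) // chunk
    return [encoded[i * chunk:(i + 1) * chunk] for i in range(n)]

def split_input_target(seq):
    """Input is everything but the last item; target is the same chunk shifted by one."""
    return seq[:-1], seq[1:]

demo = list(range(11))            # stand-in for encoded_text
chunks = make_sequences(demo, seq_len=4)
pairs = [split_input_target(c) for c in chunks]

for inp, tgt in pairs:
    print(inp, "->", tgt)
```

Each chunk holds `seq_len + 1` items so that both the input and the target end up with exactly `seq_len` items. At every position the target is the character that follows the input character.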


In [32]:
batch_size = 150

In [33]:
# The model should be able to learn from any part of the text,
# so let's shuffle the sequences

In [34]:
buffer_size = 7500

dataset = dataset.shuffle(buffer_size).batch(batch_size, drop_remainder=True)
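
As a rough illustration of what shuffling and batching do, the sketch below shuffles a small list and groups it into fixed-size batches, dropping the incomplete last one. It is a simplification: `tf.data`'s `shuffle(buffer_size)` draws from a sliding buffer rather than shuffling everything at once, and the helper `shuffle_and_batch` is hypothetical.

```python
import random

def shuffle_and_batch(examples, batch_size, seed=0):
    """Shuffle all examples, then group them into full batches only."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled) // batch_size  # an incomplete final batch is dropped
    return [shuffled[i * batch_size:(i + 1) * batch_size] for i in range(n)]

batches = shuffle_and_batch(range(10), batch_size=3)
print(batches)

# Same arithmetic for the numbers in this notebook:
# 358875 sequences in batches of 150
steps_per_epoch = 358875 // 150
print(steps_per_epoch)
```

With the notebook's values, 358875 sequences in batches of 150 give 2392 full batches, which is the number of steps per epoch that Keras reports during training.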

In [35]:
dataset

<BatchDataset shapes: ((150, 50), (150, 50)), types: (tf.int64, tf.int64)>

## Creating a Model

    Embedding
    GRU
    Dense

In [36]:
vocab_size = len(vocab)

In [37]:
vocab_size

76

In [38]:
embed_dim = 64

In [39]:
rnn_neurons = 756  # arbitrary choice

In [40]:
from tensorflow.keras.losses import sparse_categorical_crossentropy

In [41]:
help(sparse_categorical_crossentropy)

Help on function sparse_categorical_crossentropy in module tensorflow.python.keras.losses:

sparse_categorical_crossentropy(y_true, y_pred, from_logits=False, axis=-1)



In [42]:
def sparse_cat_loss(y_true, y_pred):
    return sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
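
`from_logits=True` is needed because the final `Dense` layer has no activation, so the model outputs raw scores rather than probabilities. The sketch below shows, for a single prediction, what sparse categorical cross-entropy from logits computes. It is a plain-Python illustration, not the Keras implementation, and `sparse_ce_from_logits` is a hypothetical name.

```python
import math

def sparse_ce_from_logits(true_index, logits):
    """Cross-entropy for one example: -log(softmax(logits)[true_index])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[true_index]

# All 76 characters equally likely: the loss equals ln(76)
uniform_loss = sparse_ce_from_logits(0, [0.0] * 76)

# A strongly favoured, correct class: the loss is close to zero
confident_loss = sparse_ce_from_logits(2, [0.0, 0.0, 10.0])

print(round(uniform_loss, 4), round(confident_loss, 6))
```

"Sparse" means the target is an integer index rather than a one-hot vector, which matches the integer-encoded text used here. A model that spreads probability evenly over all 76 characters scores about 4.33, which is a useful reference point when reading the loss early in training.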

In [43]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense

In [44]:
def create_model(vocab_size, embed_dim, rnn_neurons, batch_size):
    
    model = Sequential()
    model.add(Embedding(vocab_size, embed_dim, batch_input_shape=[batch_size, None]))
    model.add(GRU(rnn_neurons,return_sequences=True, stateful=True, recurrent_initializer='glorot_uniform'))
    model.add(Dense(vocab_size))
    
    model.compile('adam',loss=sparse_cat_loss)
    
    return model

In [45]:
model = create_model(vocab_size= vocab_size,
                   embed_dim=embed_dim,
                   rnn_neurons= rnn_neurons,
                   batch_size=batch_size)

In [46]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (150, None, 64)           4864      
_________________________________________________________________
gru (GRU)                    (150, None, 756)          1864296   
_________________________________________________________________
dense (Dense)                (150, None, 76)           57532     
Total params: 1,926,692
Trainable params: 1,926,692
Non-trainable params: 0
_________________________________________________________________
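
The parameter counts in the summary can be checked by hand. The sketch below assumes the TensorFlow 2 default GRU variant (`reset_after=True`), in which each of the three gates has an input kernel, a recurrent kernel, and two bias vectors. The helper functions are hypothetical and only do the arithmetic.

```python
def embedding_params(vocab_size, embed_dim):
    """One embed_dim-sized vector per vocabulary entry."""
    return vocab_size * embed_dim

def gru_params(input_dim, units):
    """3 gates x (input kernel + recurrent kernel + 2 bias vectors)."""
    return 3 * (input_dim * units + units * units + 2 * units)

def dense_params(input_dim, units):
    """Weight matrix plus one bias per output unit."""
    return input_dim * units + units

counts = [
    embedding_params(76, 64),   # embedding
    gru_params(64, 756),        # gru
    dense_params(756, 76),      # dense
]
total = sum(counts)
print(counts, total)
```

These values match the summary above: 4864, 1864296, and 57532 parameters, for a total of 1,926,692. Nearly all of them sit in the GRU's recurrent kernel, which grows with the square of `rnn_neurons`.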


## Training Model 

In [47]:
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)

In [48]:
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)

In [49]:
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()

In [50]:
ind_to_char[sampled_indices]

array(['w', 'L', '/', 'q', 'y', 'D', ' ', 's', '😊', 'O', 'I', 'T', '!',
       '\n', 'y', '9', 'S', 'u', 'e', 'j', '🇵', 'P', 'k', 'p', 'V', 'w',
       'E', 'H', 'r', 'J', 'l', '😊', 'O', 'A', 'P', 'L', 's', 's', '/',
       'c', 'm', 'F', "'", '!', 'T', '😊', 'R', 'W', 'a', '&'], dtype='<U1')

In [51]:
epochs = 40

In [None]:
model.fit(dataset, epochs=epochs)

Train for 2392 steps
Epoch 1/40
  42/2392 [..............................] - ETA: 49:04 - loss: 3.8137

In [None]:
model.save('yk.h5')

In [52]:
from tensorflow.keras.models import load_model

model = create_model(vocab_size, embed_dim, rnn_neurons, batch_size=1)

model.load_weights('yk.h5')

model.build(tf.TensorShape([1, None]))

In [53]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (1, None, 64)             4096      
_________________________________________________________________
gru_1 (GRU)                  (1, None, 756)            1864296   
_________________________________________________________________
dense_1 (Dense)              (1, None, 64)             48448     
Total params: 1,916,840
Trainable params: 1,916,840
Non-trainable params: 0
_________________________________________________________________


In [54]:
def generate_text(model, start_seed,gen_size=100,temp=1.0):

  num_generate = gen_size

  input_eval = [char_to_ind[s] for s in start_seed]

  input_eval = tf.expand_dims(input_eval, 0)

  text_generated = []

  temperature = temp

  # Here batch size == 1
  model.reset_states()

  for i in range(num_generate):

      # Generate Predictions
      predictions = model(input_eval)

      # Remove the batch shape dimension
      predictions = tf.squeeze(predictions, 0)

      # Use a categorical distribution to select the next character
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # Pass the predicted character as the next input
      input_eval = tf.expand_dims([predicted_id], 0)

      # Transform back to character letter
      text_generated.append(ind_to_char[predicted_id])

  return (start_seed + ''.join(text_generated))
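
The `temp` argument controls how adventurous the sampling is. Dividing the logits by the temperature before sampling sharpens the distribution when the temperature is below 1 and flattens it when the temperature is above 1. The sketch below shows the effect in plain Python with made-up logits. The function names are hypothetical, and `random.choices` stands in for `tf.random.categorical`.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature, then convert them to probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(logits, temperature=1.0, rng=random):
    """Draw one index according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.0]  # made-up scores for three characters
cold = softmax_with_temperature(logits, 0.5)
warm = softmax_with_temperature(logits, 1.0)
hot = softmax_with_temperature(logits, 2.0)

print([round(p, 3) for p in cold])
print([round(p, 3) for p in warm])
print([round(p, 3) for p in hot])
print(sample_index(logits, temperature=0.5, rng=random.Random(0)))
```

A low temperature makes the most likely character dominate, which tends to give safer but more repetitive text. A high temperature gives more varied output with more spelling mistakes.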

In [56]:
print(generate_text(model,"Look, I was gonna go easy on you and not to hurt your feelings But I'm only going to get this one chance Something's wrong, I can feel it (Six minutes", gen_size=50))

Look, I was gonna go easy on you and not to hurt your feelings But I'm only going to get this one chance Something's wrong, I can feel it (Six minutes, pof, so hortsonid theide
And be whon you mainthi
