# Language Models with RNNs [10 pts]

In this homework, your goal is to train an RNN that can generate text 💬 from a corpus of your choosing!

Be prepared to discuss your results in the upcoming recitation class.

<img src="friendly-chatbot.jpeg" width="500">

<div class="alert alert-info">
    
**Important Notes:** 
    
1. No pre-trained or transformer-based models! Save these for your individual project / public presentation. 🤗
    
2. This homework is by LT.
    
</div>

## Summary

Fill in the following tables. You can add more information if you feel it is relevant.

Dataset Information:

|  |  |
| ----- | ----- |
| **Dataset** | Song Lyrics dataset |
| **Size on Disk** | 1049 KB |
| **Number of Unique Tokens** | 3158 tokens (Taylor Swift Songs) |

Final Configuration:

|  |  |
| ----- | ----- |
| **CPU** | Intel i7-9750H |
| **RAM** | 16 GB |
| **GPU** | NVIDIA GTX 1650 |
| **Optimizer** | Adam |
| **Batch Size** | 256 |
| **Epochs** | 1000 |
| **Time per Epoch** | 76 ms/step |
| **Training Time** | 10 min |
| **Number of Parameters** | 3,611,991 |
| **Model Size on Disk** | 42332 KB |

## Q&A

Answer the following:

<div class="alert alert-success">
    
1. **[2 pts]** Describe the dataset that you've chosen. 

    >The dataset was retrieved from Kaggle. It contains songs by various artists, with each song's lyrics stored as a single string. Artists include Taylor Swift, Billie Eilish, The Beatles, Ed Sheeran, and several more.
    

2. **[2 pts]** Why did you choose this dataset? What do you want your model to do?
> We chose this dataset because it lets us generate lyrics in the style of different artists. For this homework, we focused on generating lyrics for a Taylor Swift song.

3. **[2 pts]** Describe your final architecture. 
> Our final architecture is a sequential model with a word embedding layer, a GRU layer, and a final dense layer that outputs logits. We trained the final model for up to 1000 epochs and reached a training loss of 0.05.

4. **[2 pts]** How did you end up with your architecture? Document your experimentation procedure.
> We based our architecture on the one in the lecture notebook and experimented by changing hyperparameters such as batch size, buffer size, sequence length, and the type of RNN unit.

5. **[2 pts]** How would you evaluate the quality of your model? Are there quantitative ways to do so?
    
     >We measured model quality using the training loss, and we tuned the hyperparameters to bring the loss as low as possible.

    
</div>
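
One quantitative option beyond the raw loss is perplexity, the exponential of the average per-token cross-entropy. The sketch below is a minimal illustration, assuming the loss is measured in nats (natural logarithm). The helper name `perplexity` is ours, and the 0.05 value is the loss reported in the answer to question 3, so the result describes fit to the training data rather than generalization.

```python
import math

def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert an average per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)

# Loss reported in the answer to question 3.
train_ppl = perplexity(0.05)

# Reference point: a model guessing uniformly over all 3159 output classes.
uniform_ppl = perplexity(math.log(3159))

print(f"training perplexity: {train_ppl:.3f}")
print(f"uniform-guess perplexity: {uniform_ppl:.0f}")
```

A training perplexity this close to 1 suggests the model may have nearly memorized the corpus. Computing the same number on lyrics held out from training would likely be a more informative measure.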

In [38]:
len(pd.Series(df.groupby('Artist')['Lyrics'].sum()['Taylor Swift'].split()).unique())

3158

In [1]:
import numpy as np
import tensorflow as tf
import pandas as pd
import nltk

In [2]:
import re

In [3]:
df = pd.read_csv('Songs.csv')
df.head()

Unnamed: 0,Artist,Title,Lyrics
0,Taylor Swift,cardigan,"Vintage tee, brand new phone\nHigh heels on co..."
1,Taylor Swift,exile,"I can see you standing, honey\nWith his arms a..."
2,Taylor Swift,Lover,We could leave the Christmas lights up 'til Ja...
3,Taylor Swift,the 1,"I'm doing good, I'm on some new shit\nBeen say..."
4,Taylor Swift,Look What You Made Me Do,I don't like your little games\nDon't like you...


In [4]:
df.groupby('Artist')['Lyrics'].sum()['Billie Eilish']



In [31]:
len(tokens)

19431

In [6]:
tokens = df.groupby('Artist')['Lyrics'].sum()['Taylor Swift'].split()
vocab = list(set(tokens))
vocab_size = len(vocab)

In [7]:
ids_from_tokens = tf.keras.layers.StringLookup(vocabulary=vocab)

In [8]:
ids = ids_from_tokens(tokens)

In [9]:
# Reverse lookup
tokens_from_ids = tf.keras.layers.StringLookup(vocabulary=ids_from_tokens.get_vocabulary(), invert=True) 
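
As a rough mental model of the two lookup layers, the sketch below mimics them with plain dictionaries. It assumes the layer defaults (one out-of-vocabulary slot at index 0, shown as `[UNK]`, and no mask token). The names `toy_vocab`, `token_to_id`, `id_to_token`, `encode`, and `decode` are hypothetical and not part of the notebook.

```python
# Toy vocabulary; the notebook builds the real one from the lyrics.
toy_vocab = ["Vintage", "tee,", "brand", "new", "phone"]

# Index 0 is reserved for out-of-vocabulary tokens, so real tokens start at 1.
token_to_id = {token: i + 1 for i, token in enumerate(toy_vocab)}
id_to_token = {0: "[UNK]", **{i: t for t, i in token_to_id.items()}}

def encode(tokens):
    """Map tokens to integer ids, sending unknown tokens to id 0."""
    return [token_to_id.get(t, 0) for t in tokens]

def decode(ids):
    """Map integer ids back to tokens."""
    return [id_to_token[i] for i in ids]

ids_example = encode(["brand", "new", "cardigan"])
print(ids_example)           # [3, 4, 0]
print(decode(ids_example))   # ['brand', 'new', '[UNK]']
```

The reserved slot at index 0 is the reason the model later sizes its embedding and output layers as `vocab_size+1`.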

In [10]:
# Create a tf dataset object for easy batching / looping over data
ids_dataset = tf.data.Dataset.from_tensor_slices(ids_from_tokens(tokens))

for ids in ids_dataset.take(10):
    print(tokens_from_ids(ids).numpy().decode('utf-8'))

Vintage
tee,
brand
new
phone
High
heels
on
cobblestones
When


In [11]:
seq_length = 20 # hyperparameter, can tune. Longer sequences = longer unrolled RNNs = more vanishing gradient problems
examples_per_epoch = len(tokens)

sequences = ids_dataset.batch(seq_length+1, drop_remainder=True) # seq_length+1 because we want to predict the next word

for seq in sequences.take(1):
    print(tokens_from_ids(seq))

tf.Tensor(
[b'Vintage' b'tee,' b'brand' b'new' b'phone' b'High' b'heels' b'on'
 b'cobblestones' b'When' b'you' b'are' b'young,' b'they' b'assume' b'you'
 b'know' b'nothing' b'Sequin' b'smile,' b'black'], shape=(21,), dtype=string)


In [12]:
list(sequences.take(1))

[<tf.Tensor: shape=(21,), dtype=int64, numpy=
 array([1614,  313,  831, 2313, 1540,   31, 3010, 2904,  255, 1917,  300,
         711, 1504, 1629, 2202,  300,  829, 1345, 2152, 2919, 2711],
       dtype=int64)>]

In [13]:
def split_input_target(sequence):
    input_text = sequence[:-1]
    target_text = sequence[1:]
    return input_text, target_text

split_input_target('Tensorflow is very cool.'.split())

(['Tensorflow', 'is', 'very'], ['is', 'very', 'cool.'])

In [14]:
dataset = sequences.map(split_input_target)

for input_example, target_example in dataset.take(1):
    print("Input :", tokens_from_ids(input_example))
    print("Target:", tokens_from_ids(target_example))

Input : tf.Tensor(
[b'Vintage' b'tee,' b'brand' b'new' b'phone' b'High' b'heels' b'on'
 b'cobblestones' b'When' b'you' b'are' b'young,' b'they' b'assume' b'you'
 b'know' b'nothing' b'Sequin' b'smile,'], shape=(20,), dtype=string)
Target: tf.Tensor(
[b'tee,' b'brand' b'new' b'phone' b'High' b'heels' b'on' b'cobblestones'
 b'When' b'you' b'are' b'young,' b'they' b'assume' b'you' b'know'
 b'nothing' b'Sequin' b'smile,' b'black'], shape=(20,), dtype=string)


In [15]:
# Batch size
batch_size = 256

# Buffer size to shuffle the dataset
buffer_size = 10000

dataset = sequences.map(split_input_target)
dataset = dataset.shuffle(buffer_size).batch(batch_size, drop_remainder=True) # Shuffle and batch. Shuffling mixes sequences so each batch is not made of consecutive lyrics

dataset

<BatchDataset shapes: ((256, 20), (256, 20)), types: (tf.int64, tf.int64)>
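
The number of training steps per epoch follows from the token count and the two `drop_remainder=True` calls. Here is a quick back-of-the-envelope check, assuming the 19431-token count printed by `len(tokens)` and the `seq_length` and `batch_size` values used in this notebook:

```python
num_tokens = 19431      # len(tokens) from the notebook output
seq_length = 20
batch_size = 256

# Each chunk holds seq_length + 1 tokens; a trailing partial chunk is dropped.
num_sequences = num_tokens // (seq_length + 1)

# Only full batches are kept; a trailing partial batch is dropped as well.
steps_per_epoch = num_sequences // batch_size
sequences_used = steps_per_epoch * batch_size

print(num_sequences)                    # 925
print(steps_per_epoch)                  # 3
print(num_sequences - sequences_used)   # 157
```

Under these assumptions, each epoch runs only 3 steps and leaves out 157 of the 925 sequences, so a smaller batch size may be worth trying.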

In [16]:
# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 512

# Word-level RNN
model = tf.keras.Sequential()
model.add(tf.keras.layers.Embedding(vocab_size+1, embedding_dim)) # +1 for the unknown token
model.add(tf.keras.layers.GRU(rnn_units, return_sequences=True)) # return outputs at every time step
model.add(tf.keras.layers.Dense(vocab_size+1)) # notice how there's no softmax here (you can put it in the loss function!)
          
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, None, 256)         808704    
_________________________________________________________________
gru (GRU)                    (None, None, 512)         1182720   
_________________________________________________________________
dense (Dense)                (None, None, 3159)        1620567   
Total params: 3,611,991
Trainable params: 3,611,991
Non-trainable params: 0
_________________________________________________________________
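
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the Keras GRU default of `reset_after=True`, which keeps two bias vectors per weight set. The variable names are ours.

```python
vocab_plus_oov = 3159   # vocab_size + 1
embedding_dim = 256
rnn_units = 512

# Embedding: one embedding_dim-sized vector per token id.
embedding_params = vocab_plus_oov * embedding_dim

# GRU: 3 weight sets (update gate, reset gate, candidate state), each with
# input weights, recurrent weights, and 2 bias vectors.
gru_params = 3 * (rnn_units * (embedding_dim + rnn_units) + 2 * rnn_units)

# Dense: weight matrix plus one bias per output class.
dense_params = rnn_units * vocab_plus_oov + vocab_plus_oov

total_params = embedding_params + gru_params + dense_params
print(embedding_params, gru_params, dense_params, total_params)
# 808704 1182720 1620567 3611991
```

The dense output layer accounts for the largest share of the parameters because its size scales with the vocabulary.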


In [17]:
loss = tf.losses.SparseCategoricalCrossentropy(from_logits=True)

model.compile(optimizer='adam', loss=loss)


# Checkpointing in case the kernel crashes. The model weights are saved at the end of every epoch.
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath='./training_checkpoints/ckpt_{epoch}',
    save_weights_only=True)
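
With `from_logits=True`, the loss applies the softmax itself, which is why the final dense layer has no activation. Below is a plain-Python sketch of the per-token computation. It is not the TensorFlow implementation, and `sparse_ce_from_logits` is a hypothetical helper.

```python
import math

def sparse_ce_from_logits(logits, target_id):
    """Cross-entropy (in nats) of one integer target against raw logits."""
    # Subtract the max logit for numerical stability before exponentiating.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    # Equivalent to -log(softmax(logits)[target_id]).
    return log_sum_exp - logits[target_id]

# Equal logits over two classes: probability 0.5, so the loss is ln(2).
loss_uniform = sparse_ce_from_logits([0.0, 0.0], 0)

# A confident, correct prediction gives a loss near zero.
loss_confident = sparse_ce_from_logits([10.0, 0.0, 0.0], 0)

print(loss_uniform, loss_confident)
```

The value Keras reports during training is this quantity averaged over the tokens in a batch.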

In [18]:
epochs = 1000

history = model.fit(dataset, epochs=epochs, callbacks=[checkpoint_callback])

Epoch 1/1000
...
Epoch 72/1000
...

In [33]:
temperature = 2 # Tuneable
prompt = 'love'

gen_len = 15 # generate 15 new words

for i in range(gen_len):

    output = model(tf.expand_dims(ids_from_tokens(prompt.split()), axis=0))
    output = output[:, -1, :]
    output = output/temperature 
    output = tf.random.categorical(output, num_samples=1) # sample one token id from the scaled logits
    output = tf.squeeze(output, axis=-1)

    output_text = tokens_from_ids(output)
    output_text = output_text.numpy()[0].decode('utf-8')
    
    prompt = prompt + ' ' + output_text

print(prompt)

love as pure as it And thеn it fades into the gray of my day-old tea
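
Dividing the logits by `temperature` before sampling controls how adventurous the output is. The plain-Python sketch below uses made-up logits and a hypothetical helper name to show how a temperature of 2, as used in this cell, flattens the distribution relative to a temperature of 1.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature."""
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - max_scaled) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [4.0, 2.0, 0.0]   # made-up values for three candidate tokens

probs_t1 = softmax_with_temperature(toy_logits, 1.0)
probs_t2 = softmax_with_temperature(toy_logits, 2.0)

print([round(p, 3) for p in probs_t1])
print([round(p, 3) for p in probs_t2])
```

At the higher temperature the most likely token loses probability mass and the least likely token gains it, so sampled lyrics become more varied but also less predictable.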


In [34]:
"love as pure as it And thеn it fades into the gray of my day-old tea"

'love as pure as it And thеn it fades into the gray of my day-old tea'