# Character-level language model - Try to make an artist name

In [3]:
import numpy as np
from RNN import *
import random
import pandas as pd
import re
import unidecode

## 1 - Generate some Latin artist names using an RNN

### 1.1 - Dataset and Preprocessing

Read the dataset of artist names, build the list of unique characters (a-z, plus space and newline), and compute the dataset and vocabulary sizes. The query below pulls Latin artist names.

In [64]:
q_artist = """
    SELECT artist_name
    FROM `gro-analytics.discovery_tool.discovery_extract_anon_201806*` 
    WHERE conn_country IN ('MX','SP')
    AND LOWER(metagenre) = 'latin'
    AND _table_suffix BETWEEN '01' AND '19'
    AND length(artist_name ) >= 5
    --AND length(artist_name ) <= 35    
    AND REGEXP_CONTAINS(artist_name, r"[0-9]") IS NOT TRUE
    AND REGEXP_CONTAINS(artist_name, r"[&#@!]") IS NOT TRUE
    GROUP BY 1
  """
df_artist = pd.io.gbq.read_gbq(q_artist, dialect='standard', project_id='gro-analytics')

In [65]:
artist_name = list(df_artist.artist_name.values)
artist_name = [name.lower() for name in artist_name]
artist_name = [unidecode.unidecode(name) for name in artist_name]
artist_name = [re.sub(r'[^a-z ]',r'',name) for name in artist_name]

In [66]:
artist_name = [name for name in artist_name if len(name)>=5 and name[0]!=' ']
artist_name[:10]

['leo dan',
 'nigga',
 'via zaragoza',
 'los nocheros',
 'maritza rodriguez',
 'alci acostadaniel santosorlando contreraskike vega',
 'camilo septimo',
 'grupo rehen',
 'the sacados',
 'christian nodal']
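
The cleaning steps above can be tried without access to the BigQuery table. The sketch below is a minimal stand-in, not the notebook's code: `normalize_name` and the sample `raw_names` list are hypothetical, and the standard-library `unicodedata` module is used in place of `unidecode`, so it only strips accents rather than transliterating arbitrary scripts.

```python
import re
import unicodedata

def normalize_name(name):
    """Lowercase, strip accents, and keep only a-z and spaces."""
    name = name.lower()
    # Decompose accented characters, then drop the combining marks.
    # Assumption: this approximates unidecode.unidecode for accented Latin text only.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z ]", "", ascii_only)

# Hypothetical sample input standing in for df_artist.artist_name
raw_names = ["Leo Dan", "Camilo Séptimo", "Grupo Rehén", " Maná", "J.B."]
cleaned = [normalize_name(n) for n in raw_names]

# Same filter as the notebook: at least 5 characters, no leading space
kept = [n for n in cleaned if len(n) >= 5 and n[0] != " "]
print(kept)
```

Because every character outside `a-z` and space is deleted, any punctuation that separated several artists in one field disappears too, which is likely why the sixth name in the output above reads as several names fused together.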

In [67]:
artist = '\n'.join(artist_name)
artist[:50]

'leo dan\nnigga\nvia zaragoza\nlos nocheros\nmaritza ro'

In [68]:
chars = list(set(artist))
data_size, vocab_size = len(artist), len(chars)
print('There are %d total characters and %d unique characters in your data.' % (data_size, vocab_size))

There are 81251 total characters and 28 unique characters in your data.


`char_to_ix` and `ix_to_char` are Python dictionaries that map each character to an integer index and each index back to its character.

In [69]:
char_to_ix = { ch:i for i,ch in enumerate(sorted(chars)) }
ix_to_char = { i:ch for i,ch in enumerate(sorted(chars)) }
print(ix_to_char)

{0: '\n', 1: ' ', 2: 'a', 3: 'b', 4: 'c', 5: 'd', 6: 'e', 7: 'f', 8: 'g', 9: 'h', 10: 'i', 11: 'j', 12: 'k', 13: 'l', 14: 'm', 15: 'n', 16: 'o', 17: 'p', 18: 'q', 19: 'r', 20: 's', 21: 't', 22: 'u', 23: 'v', 24: 'w', 25: 'x', 26: 'y', 27: 'z'}
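
As a quick check of how the two dictionaries are used, the sketch below encodes a name into indices and decodes it back. It rebuilds the 28-character vocabulary from a literal string instead of from the dataset, which is an assumption made to keep the example self-contained.

```python
# Rebuild the same 28-symbol vocabulary: newline, space, and a-z
chars = sorted(set("abcdefghijklmnopqrstuvwxyz \n"))
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}

name = "leo dan"
encoded = [char_to_ix[ch] for ch in name]              # characters -> indices
decoded = "".join(ix_to_char[ix] for ix in encoded)    # indices -> characters

print(encoded)   # [13, 6, 16, 1, 5, 2, 15]
print(decoded)   # leo dan
```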


### 1.2 - Overview of the model


- Initialize parameters 
- Run the optimization loop
    - Forward propagation to compute the loss function
    - Backward propagation to compute the gradients with respect to the loss function
    - Clip the gradients to avoid exploding gradients
    - Using the gradients, update the parameters with the gradient descent update rule.
- Return the learned parameters 
    

## 2 - Building blocks of the model

- Gradient clipping: to avoid exploding gradients
- Sampling: a technique used to generate characters


### 2.1 - Clipping the gradients in the optimization loop

Perform gradient clipping when needed to make sure that gradients are not "exploding," meaning taking on overly large values. 


In [70]:
def clip(gradients, maxValue):
    '''
    Clips the gradients' values between minimum and maximum.
    
    Arguments:
    gradients -- a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
    maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue
    
    Returns: 
    gradients -- a dictionary with the clipped gradients.
    '''
    
    dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']
   
    # Clip each gradient in place to [-maxValue, maxValue] to mitigate exploding gradients.
    for gradient in [dWax, dWaa, dWya, db, dby]:
        np.clip(gradient, -maxValue, maxValue, out=gradient)
    gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "db": db, "dby": dby}
    
    return gradients

In [71]:
np.random.seed(3)
dWax = np.random.randn(5,3)*10
dWaa = np.random.randn(5,5)*10
dWya = np.random.randn(2,5)*10
db = np.random.randn(5,1)*10
dby = np.random.randn(2,1)*10
gradients = {"dWax": dWax, "dWaa": dWaa, "dWya": dWya, "db": db, "dby": dby}
gradients = clip(gradients, 10)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("gradients[\"dWax\"][3][1] =", gradients["dWax"][3][1])
print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
print("gradients[\"db\"][4] =", gradients["db"][4])
print("gradients[\"dby\"][1] =", gradients["dby"][1])

gradients["dWaa"][1][2] = 10.0
gradients["dWax"][3][1] = -10.0
gradients["dWya"][1][2] = 0.2971381536101662
gradients["db"][4] = [10.]
gradients["dby"][1] = [8.45833407]


### 2.2 - Sampling

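To generate new text, feed the network's own output back in: at each step, sample one character from the predicted distribution and use it as the next input, stopping at a newline or after 30 characters.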

In [72]:
def sample(parameters, char_to_ix, seed):
    """
    Sample a sequence of characters according to a sequence of probability distributions output of the RNN

    Arguments:
    parameters -- python dictionary containing the parameters Waa, Wax, Wya, by, and b. 
    char_to_ix -- python dictionary mapping each character to an index.
    seed -- integer used to seed NumPy's random generator at each step, so that sampling is reproducible.

    Returns:
    indices -- a list of length n containing the indices of the sampled characters.
    """
    
    # Retrieve parameters and relevant shapes from "parameters" dictionary
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    vocab_size = by.shape[0]
    n_a = Waa.shape[1]
    
    # Step 1: Create the all-zero input vector x that starts the sequence generation.
    x = np.zeros((vocab_size,1))
    # Step 1': Initialize a_prev as zeros.
    a_prev = np.zeros((n_a,1))
    
    # List that will hold the indices of the generated characters.
    indices = []
    
    # Idx is a flag to detect a newline character, we initialize it to -1
    idx = -1 
    
    # Loop over time-steps t. At each time-step, sample a character from a probability distribution and append 
    # its index to "indices". We stop after 30 characters if no newline has been sampled, which helps 
    # debugging and prevents entering an infinite loop. 
    counter = 0
    newline_character = char_to_ix['\n']
    
    while (idx != newline_character and counter != 30):
        
        # Step 2: Forward propagate x through one RNN step (equations in the trailing comments)
        a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)  # a⟨t+1⟩=tanh(Wax*x⟨t⟩+Waa*a⟨t⟩+b)
        z = np.dot(Wya, a) + by  # z⟨t+1⟩=Wya*a⟨t+1⟩+by
        y = softmax(z)
        
        np.random.seed(counter+seed) 
        
        # Step 3: Sample the index of a character within the vocabulary from the probability distribution y
        idx = np.random.choice(range(vocab_size), p = y.ravel())

        # Append the index to "indices"
        indices.append(idx)
        
        # Step 4: Overwrite the input character as the one corresponding to the sampled index.
        x = np.zeros((vocab_size,1))
        x[idx] = 1
        
        # Update "a_prev" to be "a"
        a_prev = a
        
        seed += 1
        counter +=1

    if (counter == 30):
        indices.append(char_to_ix['\n'])
    
    return indices
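
`sample` relies on a `softmax` helper imported from `RNN.py`, which is not shown here. The sketch below assumes the standard, numerically stable definition and walks through Steps 3 and 4 on a toy three-symbol vocabulary; the logits `z` and the use of `np.random.default_rng` are illustrative choices, not the notebook's.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a column vector (assumed definition)."""
    e = np.exp(z - np.max(z))
    return e / e.sum(axis=0)

rng = np.random.default_rng(0)

# Toy logits for a 3-symbol vocabulary
z = np.array([[2.0], [0.5], [-1.0]])
y = softmax(z)                            # probabilities, shape (3, 1)

# Step 3: draw one index according to the distribution y
idx = rng.choice(len(y), p=y.ravel())

# Step 4: one-hot encode the sampled index as the next input
x = np.zeros((len(y), 1))
x[idx] = 1

print(y.ravel(), idx)
```

Sampling from the distribution, rather than always taking `np.argmax(y)`, is what lets the same trained parameters produce different names.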

## 3 - Building the language model 

It is time to build the character-level language model for text generation. 


### 3.1 - Gradient descent 

Implementation steps:

- Forward propagate through the RNN to compute the loss
- Backward propagate through time to compute the gradients of the loss with respect to the parameters
- Clip the gradients if necessary 
- Update your parameters using gradient descent  

The following functions are implemented in RNN.py

```python
def rnn_forward(X, Y, a_prev, parameters, vocab_size):
    """ Performs the forward propagation through the RNN and computes the cross-entropy loss.
    It returns the loss' value as well as a "cache" storing values to be used in the backpropagation."""
    ....
    return loss, cache
    
def rnn_backward(X, Y, parameters, cache):
    """ Performs the backward propagation through time to compute the gradients of the loss with respect
    to the parameters. It returns also all the hidden states."""
    ...
    return gradients, a

def update_parameters(parameters, gradients, learning_rate):
    """ Updates parameters using the Gradient Descent Update Rule."""
    ...
    return parameters
```
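
The body of `update_parameters` is elided above. As a rough idea of what a plain gradient descent update looks like for this parameter dictionary, here is a minimal sketch; `update_parameters_sketch` is a hypothetical name and this is an assumption about the rule, not the contents of `RNN.py`.

```python
import numpy as np

def update_parameters_sketch(parameters, gradients, learning_rate):
    """Vanilla gradient descent: theta <- theta - learning_rate * d_theta (in place)."""
    for name in ("Wax", "Waa", "Wya", "b", "by"):
        # Gradient keys follow the notebook's convention: "d" + parameter name
        parameters[name] -= learning_rate * gradients["d" + name]
    return parameters

# Toy parameters and gradients, all the same shape for brevity
params = {k: np.ones((2, 2)) for k in ("Wax", "Waa", "Wya", "b", "by")}
grads = {"d" + k: np.full((2, 2), 0.5) for k in params}

update_parameters_sketch(params, grads, learning_rate=0.1)
print(params["Wax"])
```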

In [77]:
def optimize(X, Y, a_prev, parameters, learning_rate = 0.01, vocab_size = vocab_size):
    """
    Execute one step of the optimization to train the model.
    
    Arguments:
    X -- list of integers, where each integer is a number that maps to a character in the vocabulary.
    Y -- list of integers, exactly the same as X but shifted one index to the left.
    a_prev -- previous hidden state.
    parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        b --  Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    learning_rate -- learning rate for the model.
    vocab_size -- number of unique characters in the vocabulary, passed through to rnn_forward.
    
    Returns:
    loss -- value of the loss function (cross-entropy)
    gradients -- python dictionary containing:
                        dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
                        dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
                        dWya -- Gradients of hidden-to-output weights, of shape (n_y, n_a)
                        db -- Gradients of bias vector, of shape (n_a, 1)
                        dby -- Gradients of output bias vector, of shape (n_y, 1)
    a[len(X)-1] -- the last hidden state, of shape (n_a, 1)
    """
    
    # Forward propagate through time
    loss, cache = rnn_forward(X, Y, a_prev, parameters,vocab_size)
    
    # Backpropagate through time 
    gradients, a = rnn_backward(X, Y, parameters, cache)
    
    # Clip gradients between -5 (min) and 5 (max)
    gradients = clip(gradients, 5)
    
    # Update parameters 
    parameters = update_parameters(parameters, gradients, learning_rate)

    
    return loss, gradients, a[len(X)-1]



In [74]:
# test
np.random.seed(1)
n_a, vocab_size = 30, len(chars)
#vocab_size, n_a = 53, 100
a_prev = np.random.randn(n_a, 1)
Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}
X = [12,3,5,11,22,3]
Y = [4,14,11,22,25,26]

loss, gradients, a_last = optimize(X, Y, a_prev, parameters, learning_rate = 0.01,vocab_size = vocab_size)
print("Loss =", loss)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("np.argmax(gradients[\"dWax\"]) =", np.argmax(gradients["dWax"]))
print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
print("gradients[\"db\"][4] =", gradients["db"][4])
print("gradients[\"dby\"][1] =", gradients["dby"][1])
print("a_last[4] =", a_last[4])

Loss = 73.82355093017527
gradients["dWaa"][1][2] = -0.6881049834946623
np.argmax(gradients["dWax"]) = 152
gradients["dWya"][1][2] = -0.0766191173363582
gradients["db"][4] = [-0.07707836]
gradients["dby"][1] = [0.08717006]
a_last[4] = [0.99989098]


### 3.2 - Training the model 

Given the dataset of artist names, we use each line of the dataset (one name) as one training example. Every 2000 steps of stochastic gradient descent, we sample 5 names to see how the algorithm is doing; samples of 4 characters or fewer (counting the trailing newline) are not printed.
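
For one training example, the targets are the inputs shifted one position to the left, with a newline as the final target. A minimal sketch of that construction follows, using a vocabulary rebuilt from a literal string (an assumption made for self-containment):

```python
# Same 28-symbol vocabulary as the notebook: newline, space, a-z
chars = sorted(set("abcdefghijklmnopqrstuvwxyz \n"))
char_to_ix = {ch: i for i, ch in enumerate(chars)}

name = "nodal"
X = [char_to_ix[ch] for ch in name]      # input at each time-step
Y = X[1:] + [char_to_ix["\n"]]           # target: next character, then newline

print(X)   # [15, 16, 5, 2, 13]
print(Y)   # [16, 5, 2, 13, 0]
```

At time-step `t` the network sees `X[t]` and is trained to predict `Y[t]`, so the newline target teaches it where a name ends.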


In [75]:
def model(data, ix_to_char, char_to_ix, num_iterations = 60000, n_a = 15, num_names = 5, vocab_size = vocab_size):
    """
    Trains the model and generates artist names. 
    
    Arguments:
    data -- text corpus
    ix_to_char -- dictionary that maps the index to a character
    char_to_ix -- dictionary that maps a character to an index
    num_iterations -- number of iterations to train the model for
    n_a -- number of units of the RNN cell
    num_names -- number of artist names you want to sample at each iteration. 
    vocab_size -- number of unique characters found in the text, size of the vocabulary
    
    Returns:
    parameters -- learned parameters
    """
    
    # Retrieve n_x and n_y from vocab_size
    n_x, n_y = vocab_size, vocab_size
    
    # Initialize parameters
    parameters = initialize_parameters(n_a, n_x, n_y)
    
    # Initialize the smoothed loss that smooth() updates during training
    loss = get_initial_loss(vocab_size, num_names)
    
    # Build list of all names (training examples) from the newline-separated corpus.
    examples = data.split('\n')
    
    # Shuffle list of all artist names
    np.random.seed(0)
    np.random.shuffle(examples)
    
    # Initialize the hidden state of the RNN
    a_prev = np.zeros((n_a, 1))
    
    # Optimization loop
    for j in range(num_iterations):
        
        index = j % len(examples)
        X = [char_to_ix[ch] for ch in examples[index]] 
        Y = X[1:] + [char_to_ix["\n"]]
        
        # Perform one optimization step: Forward-prop -> Backward-prop -> Clip -> Update parameters
        # The learning rate is fixed at 0.05; pass vocab_size explicitly rather than relying on the default
        curr_loss, gradients, a_prev = optimize(X, Y, a_prev, parameters, learning_rate = 0.05, vocab_size = vocab_size)
        
        # Update the smoothed loss with the current example's loss, using smooth() from RNN.py
        loss = smooth(loss, curr_loss)

        # Every 2000 iterations, sample and print up to num_names names
        seed = 0
        if j % 2000 == 0:
            
            print('Iteration: %d' % (j) + '\n')
            
            # Sample num_names names; only those longer than 4 characters are printed
            # seed = 0
            for name in range(num_names):
                #print("name",name)
                
                # Sample indices and print them
                sampled_indices = sample(parameters, char_to_ix, seed)
                #while sampled_indices[0] == 0:
                #    seed += 1
                #    sampled_indices = sample(parameters, char_to_ix, seed)
                #def print_sample(sample_ix, ix_to_char):
                txt = ''.join(ix_to_char[ix] for ix in sampled_indices)
                #print("sample:",sampled_indices,"\n")
                if len(txt)>4:
                    print(txt)
                #print("sample:",sampled_indices,"\n")
                #print_sample(sampled_indices, ix_to_char)
                
                seed += 1  # Advance the seed so each sampled name differs while the run stays reproducible.
      
            print('\n')
        
    return parameters
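
`get_initial_loss` and `smooth` are imported from `RNN.py`, which is not shown here. A common way to write such helpers, offered only as an assumption about what they might do, is to start from the loss of a uniform guess and then track an exponential moving average. The `_sketch` names and the `decay` value below are hypothetical.

```python
import numpy as np

def get_initial_loss_sketch(vocab_size, seq_length):
    """Cross-entropy of a uniform guess over the vocabulary, summed over seq_length steps."""
    return -np.log(1.0 / vocab_size) * seq_length

def smooth_sketch(loss, cur_loss, decay=0.999):
    """Exponential moving average: mostly the old value, nudged toward the new one."""
    return decay * loss + (1 - decay) * cur_loss

loss = get_initial_loss_sketch(vocab_size=28, seq_length=5)
for cur_loss in [40.0, 35.0, 30.0]:       # made-up per-example losses
    loss = smooth_sketch(loss, cur_loss)

print(loss)
```

Per-example losses vary a lot with name length, so a smoothed value gives a steadier picture of training progress than the raw `curr_loss`.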

Run the following cell. At the first iteration the model outputs random-looking characters; after a few thousand iterations it should start generating reasonable-looking names.

In [76]:
parameters = model(artist, ix_to_char, char_to_ix)

Iteration: 0

njzwwtcmeqodygspv sijivt

jne 

jzwwtcmeqodygspv sijivt

zwwtcmeqodygspv sijivt



Iteration: 2000

ilutro maridues s

en b dos cantitedivel

ezos celio varlo

usro maricues s



Iteration: 4000

ertoso marrez

exsos larrdreses

usos karrez



Iteration: 6000

iltolo ladido res

ezpoe lalmatenio

ie a roca

uspo lalo y res



Iteration: 8000

ikyos charmbulos arerevo

eztor jariates parenasi hily d

uto charo varos



Iteration: 10000

litosa

in a dus aboretedoto ay gora

ivosa

utro



Iteration: 12000

elupon lariatamez

en a

ezmos

ec actra

uvora coray pos



Iteration: 14000

entosico piez

ezkon

el aguon

uto cominas



Iteration: 16000

icurro

ie balos acho senez dez

ivoscala mexdo sarichoi

utro



Iteration: 18000

iluton mano

ere agunda

izkor

utor



Iteration: 20000

ievoro

exton jarnaterno

ic adro

usto



Iteration: 22000

iguosa lamicthupo

eqe acos canteta ito cu de cas

eyos cherlas los

urrmanana vermo



Iteration: 24000

ikusto

ezprialaliauer

### 3.3 - Results

Final names after iteration: 59000, Loss: 47.687334

ejusto

ezhin de las nor pendio

ec almo cantes diticay

toro
