In this project we feed the computer a list of the names of all existing Pokémon and train it to generate new names that look similar. We use a recurrent neural network (RNN) for this task. Since the dataset is small (around 800 names), the network is kept very simple. The idea for this project came from the following sources:
- https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py
- https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/
- https://towardsdatascience.com/character-level-language-model-1439f5dd87fe
- assignment 2 of week 1 of the 5th course in the deep learning specialization on Coursera

# Gather and clean up the data

The data that I need is the names of all the Pokémon. It is surely easy to find such a list online and download it, but I found a beautiful website (https://pokeapi.co) that has an API for all sorts of information about Pokémon, and I thought it would be fun to learn how to use it since I had never used an API.

In [1]:
import numpy as np
import requests

It is actually very simple to get a list of the names because a single API endpoint returns all of them.

In [12]:
result = requests.get("https://pokeapi.co/api/v2/pokemon/?limit=1000").json()
print(result.keys())

dict_keys(['count', 'next', 'previous', 'results'])


We need to extract just the names from the list.

In [15]:
# iterate over the returned entries directly, so the list never indexes past what the API sent back
full_names = [entry['name'] for entry in result['results']]
full_names[:10]

['bulbasaur',
 'ivysaur',
 'venusaur',
 'charmander',
 'charmeleon',
 'charizard',
 'squirtle',
 'wartortle',
 'blastoise',
 'caterpie']
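To make the structure of the response concrete, here is a minimal sketch that runs without network access. The `sample_result` dictionary is a hand-written stand-in with the same four keys as the real response (it is not actual API output), and `extract_names` is a hypothetical helper name.

```python
# Hand-written stand-in for the JSON returned by the API (not real API output).
sample_result = {
    'count': 3,
    'next': None,
    'previous': None,
    'results': [
        {'name': 'bulbasaur'},
        {'name': 'ivysaur'},
        {'name': 'venusaur'},
    ],
}


def extract_names(payload):
    """Return the 'name' field of every entry in payload['results']."""
    return [entry['name'] for entry in payload['results']]


sample_names = extract_names(sample_result)
print(sample_names)
```

Looping over the entries themselves, rather than over a range built from 'count', keeps the extraction safe if 'count' were ever larger than the number of entries actually returned.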

The dataset is small enough that we can inspect it manually. Many names are repeated with a suffix at the end, such as the Alola forms and the mega evolutions. The suffix is separated from the name by a '-', so we will use that to get rid of the extras. This causes problems in the few cases where the '-' is actually part of the name. We can take care of these by hand:
- mr-mime should become mr mime (number 122)
- ho-oh should become ho oh (number 250)
- type-null should become type null (number 772)
- mime-jr should become mime jr (number 439)
- should remove the tapu for the ones that contain it (numbers 785-788)
- should transform porygon2 into porygon two (number 233)
- porygon-z should become porygon z (number 474)

In [16]:
full_names[121] = 'mr mime'
full_names[249] = 'ho oh'
full_names[771] = 'type null'
full_names[438] = 'mime jr'
full_names[784] = 'koko'
full_names[785] = 'lele'
full_names[786] = 'bulu'
full_names[787] = 'fini'
full_names[232] = 'porygon two'
full_names[473] = 'porygon z'

Now we split all the names at the '-' and keep only the first part, which, if we did the cleaning correctly, should be the actual name of the Pokémon.

In [17]:
names_duplicates = list(map(lambda s: s.split('-')[0], full_names))
len(names_duplicates)

964
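The manual fixes rely on list positions, which would silently break if the order of the list ever changed. As an alternative sketch, the same cleaning can be keyed on the raw names instead. The names `SPECIAL_CASES` and `clean_name` are hypothetical, the raw spellings of the four 'tapu' entries are assumed from the description of the special cases, and 'charizard-mega-x' is only an illustrative example of a suffixed name.

```python
# Special cases where the '-' (or a digit) is part of the name.
# The raw keys for the four 'tapu' entries are an assumption.
SPECIAL_CASES = {
    'mr-mime': 'mr mime',
    'ho-oh': 'ho oh',
    'type-null': 'type null',
    'mime-jr': 'mime jr',
    'tapu-koko': 'koko',
    'tapu-lele': 'lele',
    'tapu-bulu': 'bulu',
    'tapu-fini': 'fini',
    'porygon2': 'porygon two',
    'porygon-z': 'porygon z',
}


def clean_name(raw):
    """Apply a special case if one exists, otherwise drop everything after the first '-'."""
    if raw in SPECIAL_CASES:
        return SPECIAL_CASES[raw]
    return raw.split('-')[0]


raw_names = ['mr-mime', 'charizard-mega-x', 'porygon2', 'tapu-koko', 'pikachu']
cleaned = [clean_name(name) for name in raw_names]
print(cleaned)
```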

Having removed the extra parts of the names, we now have a lot of repeated names, which we get rid of here.

In [18]:
names = list(set(names_duplicates))
len(names)

806
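Note that a set does not keep the original order of the names, so the resulting list comes out shuffled. If a stable order matters (for instance to reproduce a run), one possible variation is to deduplicate with `dict.fromkeys`, which keeps the first occurrence of each name. The toy list below is made up for illustration.

```python
# Toy list with repeats, standing in for names_duplicates.
names_with_repeats = ['charizard', 'charizard', 'pikachu', 'charizard', 'mew']

# dict keys are unique and remember insertion order, so this keeps first occurrences.
unique_names = list(dict.fromkeys(names_with_repeats))
print(unique_names)  # ['charizard', 'pikachu', 'mew']
```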

The last step is to add a '.' at the end of each name. This tells the RNN that the name is over.

In [19]:
names = list(map(lambda s: s + '.', names))
names[:10]

['throh.',
 'politoed.',
 'beartic.',
 'ariados.',
 'dragonair.',
 'lapras.',
 'chatot.',
 'maractus.',
 'tangela.',
 'horsea.']

# Transform the data

Now that our data is cleaned up, it's time to transform it into a form that the neural network can process. The details of the model are explained later; for now all we need to know is that we feed characters, not words, into the network. Each character is converted to a number using the following dictionaries:

In [11]:
# Convert from character to index
char_to_index = dict( (chr(i+96), i) for i in range(1,27))
char_to_index[' '] = 0
char_to_index['.'] = 27

# Convert from index to character
index_to_char = dict( (i, chr(i+96)) for i in range(1,27))
index_to_char[0] = ' '
index_to_char[27] = '.'
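As a quick sanity check of the two dictionaries, the sketch below encodes a name into indices and decodes it back. The helpers `encode` and `decode` are hypothetical names introduced only for this illustration.

```python
# Same mappings as above: ' ' -> 0, 'a'..'z' -> 1..26, '.' -> 27.
char_to_index = {chr(i + 96): i for i in range(1, 27)}
char_to_index[' '] = 0
char_to_index['.'] = 27
index_to_char = {index: char for char, index in char_to_index.items()}


def encode(name):
    """Convert a string into a list of character indices."""
    return [char_to_index[c] for c in name]


def decode(indices):
    """Convert a list of character indices back into a string."""
    return ''.join(index_to_char[i] for i in indices)


encoded = encode('mew.')
print(encoded)          # [13, 5, 23, 27]
print(decode(encoded))  # mew.
```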

We need to define a few constants that will be useful later.

In [12]:
# maximum number of characters in Pokémon names
# this will be the number of time steps in the RNN
max_char = len(max(names, key=len))

# number of elements in the list of names, this is the number of training examples
m = len(names)

# number of potential characters, this is the length of the input of each of the RNN units
char_dim = len(char_to_index)

Finally we convert the list of names into a training dataset. The input X of the network is an array of shape (m, max_char, char_dim). It contains one matrix for each of the m names, and each matrix contains one row per character of the name. (The number of rows is always max_char; if the name is shorter, the remaining rows are all zeros.) Each row encodes one character as a one-hot vector: a vector of zeros with a single one in the entry that corresponds to that character.

The output Y is the same as the input but shifted by one position: the ith character in Y is the (i+1)th character of the actual name. In other words, the network is trained to predict the character that follows each character in the sequence.

In [13]:
X = np.zeros((m, max_char, char_dim))
Y = np.zeros((m, max_char, char_dim))

for i in range(m):
    name = list(names[i])
    for j in range(len(name)):
        X[i, j, char_to_index[name[j]]] = 1
        if j < len(name)-1:
            Y[i, j, char_to_index[name[j+1]]] = 1
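To see the encoding and the one-step shift on something small, here is the same construction applied to a toy list of two names. The toy list is an assumption made for illustration; the loop itself mirrors the cell above.

```python
import numpy as np

char_to_index = {chr(i + 96): i for i in range(1, 27)}
char_to_index[' '] = 0
char_to_index['.'] = 27

# Toy dataset standing in for the full list of names.
toy_names = ['mew.', 'abra.']
m = len(toy_names)
max_char = max(len(name) for name in toy_names)
char_dim = len(char_to_index)

X = np.zeros((m, max_char, char_dim))
Y = np.zeros((m, max_char, char_dim))
for i, name in enumerate(toy_names):
    for j, char in enumerate(name):
        X[i, j, char_to_index[char]] = 1
        if j < len(name) - 1:
            # the target at step j is the character at step j + 1
            Y[i, j, char_to_index[name[j + 1]]] = 1

print(X.shape)                    # (2, 5, 28)
print(X[0, :4].argmax(axis=1))    # indices of 'm', 'e', 'w', '.'
print(Y[0, :3].argmax(axis=1))    # indices of 'e', 'w', '.'
print(X[0].sum(axis=1))           # the last row is padding and stays all zeros
```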

# RNN model

The model that we will use is a many-to-many recurrent neural network. Such a network consists of a given number of 'time' steps that are chained together and all apply the same weights to their individual inputs. Each time step takes in one input (in this case one character) and outputs a vector of probabilities for the input of the next time step.

In [14]:
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.callbacks import LambdaCallback

Using TensorFlow backend.


Here we use a single recurrent layer, an LSTM with 128 units. We return its output at every time step and feed it into a fully connected dense layer that converts it into a vector of size char_dim using a softmax activation. We use categorical cross-entropy as the loss function, since the output is a softmax, and the Adam optimizer. There is no obviously useful metric to judge how well the model does, so we will mostly just look at the generated names.

In [68]:
model = Sequential()
model.add(LSTM(128, input_shape=(max_char, char_dim), return_sequences=True))
model.add(Dense(char_dim, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam')
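To get a feeling for how small this network is, the number of trainable parameters can be worked out by hand. The sketch below assumes the standard LSTM layout (four gates, each with an input kernel, a recurrent kernel and one bias vector) and a dense layer with bias; it uses char_dim = 28, the size of the character dictionary defined earlier.

```python
units = 128     # number of LSTM units
char_dim = 28   # 26 letters + ' ' + '.'

# Each of the 4 gates has an input kernel (char_dim x units),
# a recurrent kernel (units x units) and a bias vector (units).
lstm_params = 4 * (units * (units + char_dim) + units)

# The dense layer maps the 128 LSTM outputs to char_dim values, plus a bias.
dense_params = units * char_dim + char_dim

print(lstm_params)                  # 80384
print(dense_params)                 # 3612
print(lstm_params + dense_params)   # 83996
```

If the assumption about the layer layout holds, these numbers should match what `model.summary()` reports.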

Once the model is trained, we will use it to create new Pokémon names with the following function. The idea is to feed empty characters to the trained network and use the output of the first time step as a probability distribution for the first letter of the name. We then sample the first character at random from this distribution, record it, and update the input so that this character is passed to the second time step. We continue in the same way for the following time steps to build up a name.

This is where the '.' at the end of each name becomes important: we stop the procedure as soon as we get a '.' as an output, meaning that the generated name is done. Also, if we reach the length of the longest name in the training set, we put a '.' and end the procedure.

In [40]:
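def make_name(model):
    name = []
    # start from an all-zero input, i.e. no characters yet
    x = np.zeros((1, max_char, char_dim))
    end = False
    i = 0

    while not end:
        # distribution over the next character, read off at time step i
        probs = np.asarray(model.predict(x)[0, i], dtype=np.float64)
        # renormalize so that the probabilities sum to 1 for np.random.choice
        probs = probs / np.sum(probs)
        index = np.random.choice(range(char_dim), p=probs)
        if i == max_char-2:
            # maximum length reached: force the end of the name
            character = '.'
            end = True
        else:
            character = index_to_char[index]
        name.append(character)
        # pass the sampled character as the input of the next time step
        x[0, i+1, index] = 1
        i += 1
        if character == '.':
            end = True

    print(''.join(name))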
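The sampling loop can be tried out without a trained network. In the sketch below, `uniform_predict` is a hypothetical stand-in for `model.predict` that returns the same uniform distribution at every time step, and `max_char = 12` is an arbitrary assumed value. The output is therefore random gibberish; the point is only to show the mechanics of sampling, feeding the character back in, and stopping at '.'.

```python
import numpy as np

index_to_char = {i: chr(i + 96) for i in range(1, 27)}
index_to_char[0] = ' '
index_to_char[27] = '.'
char_dim = len(index_to_char)
max_char = 12  # assumed value, for illustration only


def uniform_predict(x):
    """Stand-in for model.predict: a uniform distribution at every time step."""
    return np.full((1, max_char, char_dim), 1.0 / char_dim)


def sample_name(predict, rng):
    """Sample one name, character by character, and return it as a string."""
    x = np.zeros((1, max_char, char_dim))
    name = []
    for i in range(max_char - 1):
        probs = np.asarray(predict(x)[0, i], dtype=np.float64)
        probs = probs / probs.sum()
        index = int(rng.choice(char_dim, p=probs))
        # on the last allowed step the name is closed regardless of the sample
        character = '.' if i == max_char - 2 else index_to_char[index]
        name.append(character)
        # the sampled character becomes the input of the next time step
        x[0, i + 1, index] = 1
        if character == '.':
            break
    return ''.join(name)


rng = np.random.default_rng(0)
generated = [sample_name(uniform_predict, rng) for _ in range(5)]
print(generated)
```

Returning the string instead of printing it makes the function easier to reuse, for example to collect many names in a list.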

Now we want to use this function during training to monitor how the generated names improve. To this end we create a function that is passed to the model when we fit it. It simply runs the previous function a few times every 25 epochs and prints the results.

In [41]:
def generate_name_loop(epoch, _):
    if epoch % 25 == 0:
        
        print('Names generated after epoch %d:' % epoch)

        for i in range(3):
            make_name(model)
        
        print()

This wraps the function in a callback so that Keras can call it at the end of each epoch.

In [42]:
name_generator = LambdaCallback(on_epoch_end = generate_name_loop)

We fit the model with the function and look at the results. It is clear that the names make more and more sense as we train.

In [69]:
model.fit(X, Y, batch_size=64, epochs=300, callbacks=[name_generator], verbose=0)

Names generated after epoch 0:
dxjaemprwpk.
zykhv.
uzvlzwbvisa.

Names generated after epoch 25:
.
nponre.
tiagy.

Names generated after epoch 50:
onteel.
ouf.
memowskros.

Names generated after epoch 75:
sarlite.
 ichhile.
.

Names generated after epoch 100:
heppodosn.
regentes.
hiternotr.

Names generated after epoch 125:
ugerli.
yunof.
mhanurs.

Names generated after epoch 150:
uflon.
howedile.
labet.

Names generated after epoch 175:
repig.
andurat.
randrul.

Names generated after epoch 200:
actreo.
imosease.
idriouno.

Names generated after epoch 225:
ilicinioaa.
arbok.
umphoos.

Names generated after epoch 250:
elekrdsss.
nislea.
hoon.

Names generated after epoch 275:
urcono.
iggyy.
louk.



<keras.callbacks.History at 0x239431999e8>

We can now use the final trained model to generate as many names as we want.

In [70]:
for i in range(20):
    make_name(model)

 om.
yanmog.
 oto.
entor.
weinole.
 om.
oosgiss.
angorstr.
oscalish.
utterg.
pine.
lickio.
incono.
enege.
cameruee.
inccinon.
unteri.
weidoe.
areacita.
rielu.


It might be possible to improve the network by changing the hyperparameters (number of units in the LSTM layer, parameters of the Adam optimizer, adding an extra LSTM layer), but I am pretty satisfied with the results, so I will leave it at that. One thing to notice is that in some cases the model seems to have overfitted, and we recover known names or names very close to known ones.
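One simple way to spot the copied names is to compare the generated names with the training set. The sketch below uses a tiny hand-picked training set and a few of the names printed above; in practice `training_names` would be the full list of names, and the generating function would need to return its result instead of printing it.

```python
# Tiny hand-picked stand-in for the full training set.
training_names = {'arbok.', 'pikachu.', 'lickitung.'}

# A few names taken from the outputs above.
generated_names = ['arbok.', 'uflon.', 'labet.', 'lickio.']

copied = [name for name in generated_names if name in training_names]
novel = [name for name in generated_names if name not in training_names]

print(copied)  # ['arbok.']
print(novel)   # ['uflon.', 'labet.', 'lickio.']
```

This only catches exact copies; names that are merely very close to known ones would need a similarity measure, which is not covered here.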