# Pet Names (Python)

This trains a neural network with Keras that generates pet names. It saves the neural to a file at the end, to use in a function in a separate notebook.

First, load up the appropriate TensorFlow and other libraries.

In [1]:
import numpy as np
import io
import itertools
import pandas as pd
import random
import string

from tensorflow import keras
from tensorflow.keras import layers

Next, create some lookups and variables. The character list is the set of characters to use in the names. The lookup is that converted into a dictionary of integers for encoding.

In [2]:
character_list = list(string.ascii_lowercase) + [".","-"," ","+"]
character_lookup = dict(zip(character_list, range(len(character_list))))
max_length = 10
num_characters = len(character_lookup)

This loads the data, converts it to lower case, and removes any pet that doesn't have a name or species (or the name has invalid characters).

In [5]:
def load_pet_data():
    """Load the pet data
    This loads the pet data from the csv file and cleans it appropriately.
    It removes extra columns, rows with malformed names or breeds, and makes the names
    lowercase.
    """
    pet_data = pd.read_csv("seattle_pet_licenses.csv", dtype = {"Animal's Name": str, 'Species': str,'Primary Breed': str,
                                                        'Secondary Breed': str},
                                                        usecols=["Animal's Name",'Species',
                                                        'Primary Breed','Secondary Breed'])
    
    pet_data = pet_data.rename({"Name":"name",
                     "Species": "species",
                     "Primary Breed": "primary_breed",
                     "Secondary Breed": "secondary_breed"})
    pet_data = pet_data.dropna(subset=['name', 'species'])

    for column in pet_data.columns:
        pet_data[column] = pet_data[column].str.lower()

    pet_data = pet_data[pet_data["name"].str.match("^[ \\.a-z-]+$")]

    return pet_data

pet_data = load_pet_data()

      Animal's Name Species         Primary Breed           Secondary Breed
0       Tinkerdelle     Cat    Domestic Shorthair                       NaN
1            Pepper     Cat                  Manx                       Mix
2          Grey Fox     Cat               Siamese                       Mix
3            Hannah     Cat     Domestic Longhair                       NaN
4             Daisy     Cat    Domestic Shorthair                       NaN
...             ...     ...                   ...                       ...
55956    Carly Rose     Dog   Retriever, Labrador                       NaN
55957       Cricket     Dog     Poodle, Miniature  Spaniel, American Cocker
55958     Caledonia     Dog  Bernese Mountain Dog                       NaN
55959        Ziggie     Dog    Miniature Pinscher                       NaN
55960          Nala     Dog               Kai Ken                       NaN

[55961 rows x 4 columns]


KeyError: ['name', 'species']

This code below:

1. Has a function that converts the DataFrame into a list of objects, where each object has some metadata plus the list of subsequences (partials of the name string. So like "spot", has ["s", "sp", "spo", "spot", "spot+"])
2. Grabs the subsequences, which then are a list of lists, and flattens them into a single list.
3. Shuffles said list.

In [17]:
def make_subsequences(name):
    characters = name + '+'
    subsequences = [list(characters[0:(i+1)]) for i in range(len(characters))]
    return subsequences

# Change this to adding a column to pandas instead of a dict of lists
def get_all_subsequences(pet_data):
    subsequences = pet_data["name"].map(make_subsequences)
    return subsequences

pet_data['subsequences'] = get_all_subsequences(pet_data)

Then we convert the characters to ints using the lookup, pad them, and one hot encode them.

In [21]:
def characters_to_matrix(character_data):
    character_data = [[character_lookup[chr] for chr in c] for c in character_data]
    padded_character_data = keras.utils.pad_sequences(character_data, maxlen = max_length+1)
    text_matrix = keras.utils.to_categorical(padded_character_data, num_classes = num_characters)
    return text_matrix

# Use explode here, especially with multiple columns
# random shuffle seed

character_data = list(itertools.chain.from_iterable(pet_data["subsequences"]))
random.shuffle(character_data)
text_matrix = characters_to_matrix(character_data)

x_name = text_matrix[:,range(max_length),:]
y_name = text_matrix[:,max_length,:]

Now we start the neural network part. Below is the model architecture definition.

In [58]:
model = keras.Sequential(
    [
        keras.Input(shape=(max_length, num_characters)),
        layers.LSTM(32, return_sequences = True),
        layers.LSTM(32),
        layers.Dropout(0.2),
        layers.Dense(num_characters, activation="softmax"),
    ]
)
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss="categorical_crossentropy", optimizer=optimizer)

This is training the model.

In [62]:
model.fit(x_name, y_name, batch_size = 64, epochs = 25)

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.callbacks.History at 0x2b732a340d0>

Last we save the model.

In [60]:
model.save("model.h5")