# Animalese: An Animal Crossing Dialogue Generator
The Animal Crossing series is one of the most popular video game franchises in the world. One of the games' biggest draws is their extensive dialogue: each character appears to have its own personality, despite drawing from a limited script. However, long-time players can become frustrated when this dialogue eventually starts to repeat. Because so much source material is available, this seemed like a great opportunity to apply neural networks to generate "new" dialogue from the games.

Unlike procedurally generated, scripted variations of existing phrases, neural networks can generate genuinely new sequences of text, and because each output depends on the input you give the network, the space of possible outputs is effectively unlimited. In this notebook, we will try to generate comprehensible English text in the style of Animal Crossing dialogue using an RNN (recurrent neural network), a type of neural network designed for sequential data such as English text.

Our first step is obtaining existing data to train the network on: in this case, lines of dialogue from the Animal Crossing series. Because none of the scripts are publicly available, I use a web scraper to retrieve fan-transcribed lines from online wikis. I then process this text for use in two different RNN models, build the models, and use both to generate new dialogue.

## Obtaining text through web scraping

Here are the packages that we will need to scrape the webpages that contain lines of dialogue and then save them for future use.

```python
import requests
import re
import pandas as pd
from bs4 import BeautifulSoup
```

Here are the pages we'll download dialogue from, hosted on the Animal Crossing Fandom wiki and on Nookipedia, another Animal Crossing wiki. Because the formatting of these pages varies slightly, a few of them go in a second list that we process slightly differently from the first.

```python
url_list = ["https://animalcrossing.fandom.com/wiki/Guide:Cranky_dialogues_(New_Leaf)",
            "https://animalcrossing.fandom.com/wiki/Guide:Peppy_dialogues_(New_Leaf)",
            "https://animalcrossing.fandom.com/wiki/Guide:Player_dialogues",
            "https://animalcrossing.fandom.com/wiki/Guide:Normal_dialogues_(New_Leaf)",
            "https://animalcrossing.fandom.com/wiki/Guide:Lazy_dialogues_(New_Leaf)",
            "https://animalcrossing.fandom.com/wiki/Guide:Sisterly_dialogues_(New_Leaf)",
            "https://animalcrossing.fandom.com/wiki/Guide:Jock_dialogues_(New_Leaf)",
            "https://animalcrossing.fandom.com/wiki/Guide:Smug_dialogues_(New_Leaf)",
            "https://animalcrossing.fandom.com/wiki/Guide:Snooty_dialogues_(New_Leaf)",
            "https://nookipedia.com/wiki/Cranky/New_Horizons_dialogue",
            "https://nookipedia.com/wiki/Lazy/Wild_World_dialogue",
            "https://nookipedia.com/wiki/Lazy/Pocket_Camp_dialogue"]

# One label per URL above, in the same order ("uchi" is another common name
# for the sisterly personality type)
label_list = ["cranky", "peppy", "player", "normal", "lazy", "uchi",
              "jock", "smug", "snooty", "cranky", "lazy", "lazy"]

# The format of these pages is slightly different, so they get their own list
p_urls = ["https://animalcrossing.fandom.com/wiki/Guide:Isabelle_dialogues",
          "https://animalcrossing.fandom.com/wiki/Guide:Resetti_dialogues_(Animal_Crossing)",
          "https://animalcrossing.fandom.com/wiki/Franklin_Dialogue_(GCN)",
          "https://animalcrossing.fandom.com/wiki/Jingle_Dialogue_(GCN)"]
p_labels = ["isabelle", "resetti", "franklin", "jingle"]
```

We'll be storing the scraped data in two lists that can later be stored as a Pandas dataframe and exported as a csv file.

```python
dialogue = []
labels = []
```

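We will use this regular expression to clean the text obtained from each webpage: it keeps only strings that begin with a double quote and captures everything between that opening quote and the last double quote.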

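```python
# [\s\S] matches any character (including newlines), so this captures everything
# between an opening double quote and the last double quote in the string.
# A raw string avoids Python's invalid-escape-sequence warning for \s and \S.
pattern = re.compile(r'"([\s\S]+)"')
```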
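To see what the pattern keeps and discards, here is a small standalone check on made-up sample strings (not scraped text). It also shows a limitation: because the match is greedy, a string containing several quoted passages is captured from the first quote to the last. The helper `extract_quoted` and the name `quote_pattern` are hypothetical, chosen so they don't overwrite `pattern`.

```python
import re

# Same pattern as above, as a raw string: capture everything between an opening
# double quote at the start of the string and the LAST double quote in it.
quote_pattern = re.compile(r'"([\s\S]+)"')

def extract_quoted(text):
    """Return the quoted dialogue if `text` starts with a double quote, else None."""
    match = quote_pattern.match(text)  # match() only succeeds at position 0
    return match.group(1) if match else None

samples = [
    '"Yo, [player]! Whaddya want?" (greeting)',  # trailing note is dropped
    'Cranky villagers are grumpy.',              # no leading quote -> None
    '"Hi," she said. "Bye."',                    # greedy: runs to the last quote
]
for s in samples:
    print(repr(extract_quoted(s)))
```

Because `match()` is anchored at the start, a line such as `He said "hi"` is skipped entirely rather than partially extracted.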

Here we'll scrape the first group of URLs, where the dialogue is contained in `<li>` (list item) tags on each webpage. We get each page's HTML content using the requests package, then parse it using the BeautifulSoup4 package. For each list item, we keep only the children whose `.string` is not None (plain text, or a tag wrapping a single string), then clean that text using the regular expression above.

```python
for url, label in zip(url_list, label_list):
    page = requests.get(url, timeout=30)
    page.raise_for_status()  # stop early on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(page.content, 'html.parser')
    for item in soup.find_all('li'):
        for child in item.children:
            text = child.string
            if text is not None:
                # use the regex to keep only the quoted dialogue
                clean = pattern.match(text)
                if clean is not None:
                    dialogue.append(clean.group(1))
                    labels.append(label)

print("Done scraping first set of URLs!")
```

```
Done scraping first set of URLs!
```


Now we will do the same with the second group of URLs where the dialogue is contained in paragraph tags.

```python
for url, label in zip(p_urls, p_labels):
    page = requests.get(url, timeout=30)
    page.raise_for_status()  # stop early on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(page.content, 'html.parser')
    for paragraph in soup.find_all('p'):
        text = paragraph.string
        if text is not None:
            # use the regex to keep only the quoted dialogue
            clean = pattern.match(text)
            if clean is not None:
                dialogue.append(clean.group(1))
                labels.append(label)

print("Done scraping second set of URLs!")
```

```
Done scraping second set of URLs!
```


Now we store everything in a pandas DataFrame and export it as a CSV file for future use. The DataFrame also records the personality type (or, for special characters, the character name) associated with each line, which could be useful for text classification or for training a separate generator per villager type in the future.

```python
df_data = {'dialogue': dialogue, 'labels': labels}
dialogue_df = pd.DataFrame(df_data)
dialogue_df.to_csv('dialogue.csv', index=False)
print("Dialogue has been saved!")
```

```
Dialogue has been saved!
```


Let's check out what our dataframe looks like.

```python
dialogue_df.head()
```

|   | dialogue | labels |
|---|----------|--------|
| 0 | Yo, [player]! Whaddya want? [catchphrase]! | cranky |
| 1 | Hey, hey, [player]! You got somethin' you wann... | cranky |
| 2 | Yo, [player]! What're ya doin'? [catchphrase]? | cranky |
| 3 | Whoa, easy now, [player]. Deep breaths... OK. ... | cranky |
| 4 | Oh. Hey, [player]. Whaddya want from me? [Catc... | cranky |


And let's see how many lines of dialogue we obtained in total.

```python
dialogue_df.shape
```

```
(601, 2)
```

It looks like there are just over 600 lines of dialogue on these pages. While it would be ideal to have a greater amount of training data, we can use this as a starting point for generating our dialogue.

## Text Pre-Processing

Neural networks operate on numbers, not raw text, so we need to convert the dialogue into a numerical form before training. In this case, we will build a character-based model: each line of dialogue becomes a sequence of integer character IDs, which an embedding layer inside the model then maps to learned vectors (character embeddings) that the network can use.

Here are the packages we will need to process the text for our model.

```python
import tensorflow as tf
import numpy as np
import os
import time
```

### Creating character embeddings

We will need an ID for each character in our vocabulary, so let's see how many unique characters these lines of dialogue contain. Placeholders like [player] and [item], which stand in for the player's name or a specific in-game item, are common in villager dialogue, so I chose not to remove symbols like '[' and ']' from the dataset.

```python
# Join all lines of dialogue into one string, separated by single spaces
full_text = ' '.join(dialogue_df['dialogue']).strip()
vocab = sorted(set(full_text))
print(f'{len(vocab)} unique characters')
print(vocab)
```

```
78 unique characters
[' ', '!', '"', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', ':', '<', '>', '?', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'Y', 'Z', '[', ']', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'à', 'é', '—', '…', '✮']
```


```python
len(full_text)
```

```
72034
```

We'll now split up our dialogue into its individual characters so we can encode them. TensorFlow contains a function to help us do just this, which I use below.

```python
chars = tf.strings.unicode_split(full_text, input_encoding='UTF-8')
```

Here is what this looks like for the first 10 characters of dialogue. The characters are stored in a 1D tensor, similar to a Python list. The b prefix on each element means it is a Python bytes object: TensorFlow string tensors store raw bytes, and here each element holds the UTF-8 encoding of one character.

```python
print(chars[0:10])
```

```
tf.Tensor([b'Y' b'o' b',' b' ' b'[' b'p' b'l' b'a' b'y' b'e'], shape=(10,), dtype=string)
```
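Our vocabulary includes non-ASCII characters such as 'é', '…' and '✮', which take more than one byte in UTF-8. That is why we split on Unicode characters rather than on bytes. Here is a standalone illustration using Python's built-in string encoding; the sample string is made up, and the variable names are chosen so they don't overwrite the notebook's `chars` tensor.

```python
text = "Café…"
char_list = list(text)                              # one element per Unicode character
byte_list = [c.encode("utf-8") for c in char_list]  # the bytes stored for each element
print(char_list)   # ['C', 'a', 'f', 'é', '…']
print(byte_list)   # [b'C', b'a', b'f', b'\xc3\xa9', b'\xe2\x80\xa6']
print(len(char_list), "characters,", len(text.encode("utf-8")), "bytes")  # 5 characters, 8 bytes
```

Splitting this string byte by byte would cut 'é' and '…' into fragments that are not valid characters on their own.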


Rather than processing the characters directly, our model uses integer IDs to represent each character. Below we create a StringLookup preprocessing layer that encodes our dialogue characters as integers. We also create a second, inverted StringLookup layer that translates the integer IDs the model outputs back into characters.

In addition, we create a `text_from_ids` helper that converts a sequence of IDs back into human-readable text.

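```python
# Maps each character to an integer ID (ID 0 is reserved for unknown characters, "[UNK]")
ids_from_chars = tf.keras.layers.StringLookup(
    vocabulary=list(vocab), mask_token=None)

# The inverse mapping: integer IDs back to characters
chars_from_ids = tf.keras.layers.StringLookup(
    vocabulary=ids_from_chars.get_vocabulary(), invert=True, mask_token=None)

def text_from_ids(ids):
    # Look up each ID's character, then join the characters into strings
    return tf.strings.reduce_join(chars_from_ids(ids), axis=-1)
```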
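Here is a plain-Python sketch of what the two StringLookup layers do, assuming their default behavior with `mask_token=None`: ID 0 is reserved for out-of-vocabulary characters ("[UNK]") and the sorted vocabulary follows. This is also why the vocabulary size used later is 79 rather than 78. The helpers `build_lookup`, `encode` and `decode` are hypothetical and not part of TensorFlow.

```python
def build_lookup(text):
    """Mimic the StringLookup pair: ID 0 is "[UNK]", then the sorted characters."""
    id_to_char = ["[UNK]"] + sorted(set(text))
    char_to_id = {c: i for i, c in enumerate(id_to_char)}
    return char_to_id, id_to_char

def encode(text, char_to_id):
    # Characters not seen when building the vocabulary map to the OOV ID 0
    return [char_to_id.get(c, 0) for c in text]

def decode(ids, id_to_char):
    return "".join(id_to_char[i] for i in ids)

char_to_id, id_to_char = build_lookup("Yo, [player]!")
example_ids = encode("Yo!", char_to_id)
print(example_ids, repr(decode(example_ids, id_to_char)))
print(encode("Z", char_to_id))  # unseen character -> [0]
```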

```python
ids = ids_from_chars(chars)
```

```python
ids_dataset = tf.data.Dataset.from_tensor_slices(ids)
```

### Creating sequences for training

When generating text, our model tries to predict the most probable next character given the sequence of characters it has just seen. For training, we need to provide sequences of a reasonable length: not so long that they are hard for the model to learn from as a whole, but not so short that the same snippet appears in many widely varying contexts. Since our lines of dialogue are relatively short, we will use sequences of 50 characters.

We break the entire text into chunks of 51 characters (50 input characters plus one extra for the target, explained below) and compute how many such examples one epoch of training contains. `examples_per_epoch` is informational only; it is not passed to the model later.

```python
seq_length = 50
examples_per_epoch = len(full_text)//(seq_length+1)

sequences = ids_dataset.batch(seq_length+1, drop_remainder=True)
```

For each position, the model predicts the next character in the sequence. Along with the input, we therefore also need to give it the correct answers: a target sequence. Here the target is simply the input text offset by one character, so each target character is the one that follows the corresponding input character.

![](https://www.tensorflow.org/text/tutorials/images/text_generation_sampling.png)

Here we'll turn each raw 51-character chunk from our dataset into a 50-character input sequence and a 50-character target sequence.

```python
def split_input_target(sequence):
    input_text = sequence[:-1]   # all but the last character
    target_text = sequence[1:]   # all but the first character
    return input_text, target_text
```

```python
dataset_seq = sequences.map(split_input_target)
```

```python
for input_example, target_example in dataset_seq.take(1):
    print("Input :", text_from_ids(input_example).numpy())
    print("Target:", text_from_ids(target_example).numpy())
```

```
Input : b'Yo, [player]! Whaddya want? [catchphrase]! Hey, he'
Target: b'o, [player]! Whaddya want? [catchphrase]! Hey, hey'
```


Here we finish preparing the data by shuffling the sequences, grouping them into batches, and prefetching upcoming batches so samples can be fed to the model more efficiently.

```python
# How many sequences we'll see in each batch
BATCH_SIZE = 64
# Size of the shuffle buffer. With only ~1,400 sequences, a buffer of 10,000
# holds the whole dataset, so this performs a full shuffle each epoch.
BUFFER_SIZE = 10000

dataset = (
    dataset_seq
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.AUTOTUNE))  # prepare upcoming batches in advance

# Check the shape of the input data
print(dataset)
```

```
<PrefetchDataset shapes: ((64, 50), (64, 50)), types: (tf.int64, tf.int64)>
```
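To make the size of one epoch concrete, here is the arithmetic using the 72,034-character length printed earlier, together with the sequence length and batch size above. Two separate `drop_remainder=True` calls discard data: the first drops the leftover characters that don't fill a 51-character chunk, and the second drops the sequences that don't fill a final batch of 64 (which ones changes each epoch because of the shuffle). The variable names are my own, chosen so they don't overwrite the notebook's.

```python
text_length = 72034   # len(full_text), printed earlier
chunk_length = 50 + 1  # seq_length + 1: 50 input characters plus one for the target
batch_size = 64        # BATCH_SIZE

n_sequences = text_length // chunk_length                # same as examples_per_epoch
leftover_chars = text_length % chunk_length              # dropped when chunking the text
n_batches = n_sequences // batch_size                    # full batches per epoch
unused_sequences = n_sequences - n_batches * batch_size  # dropped when batching

print(n_sequences, leftover_chars, n_batches, unused_sequences)  # 1412 22 22 4
```

So each epoch consists of 22 batches of 64 sequences.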


## Building our models

To generate dialogue snippets, we will try two different types of recurrent layer, each connected to a dense output layer: LSTM cells and GRUs. An RNN (recurrent neural network) is a type of network designed for sequential data, such as stock market prices, language, and even music.

In addition to producing an output that can be passed to the next layer of the network, a recurrent layer carries a hidden state from one time step to the next, so each output can depend on everything the layer has seen so far in the sequence. GRUs and LSTM cells are recurrent layers with learned gates that decide which information to keep for later and which to forget. I'll go over the important differences between the two as we build each model.
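To make the idea of a carried-over state concrete, here is a minimal pure-Python sketch of a simple (Elman) RNN, the basic recurrence that LSTMs and GRUs build on. The tiny weights are made up for illustration; the point is that two input sequences ending in the same input produce different final states because of what came before. The helper names are hypothetical.

```python
import math

def rnn_step(x, h_prev, W_x, W_h, b):
    """One step of a simple (Elman) RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_x[j], x))         # contribution of the new input
            + sum(w * hi for w, hi in zip(W_h[j], h_prev))  # contribution of the previous state
            + b[j]
        )
        for j in range(len(b))
    ]

def run_rnn(inputs, W_x, W_h, b):
    h = [0.0] * len(b)                   # start from an all-zeros state
    for x in inputs:
        h = rnn_step(x, h, W_x, W_h, b)  # the new state is passed to the next step
    return h

# Hand-picked toy weights: 2-dimensional inputs, 2 hidden units.
W_x = [[0.5, -0.2], [0.1, 0.4]]
W_h = [[0.3, 0.0], [0.0, 0.3]]
b = [0.0, 0.1]

# Same final input, different history -> different final state.
print(run_rnn([[1.0, 0.0], [0.0, 1.0]], W_x, W_h, b))
print(run_rnn([[0.0, 0.0], [0.0, 1.0]], W_x, W_h, b))
```

A plain RNN like this tends to struggle to retain information over long sequences, which is what the gates in LSTMs and GRUs are designed to address.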

To minimize training time in this toy example, each will contain only a single layer of RNN cells.

```python
# Here is some basic information we will need for fitting both models
# Length of the vocabulary in chars
vocab_size = len(ids_from_chars.get_vocabulary())

# The embedding dimension - this determines how large the vector space is for our character embeddings
embedding_dim = 256

# Number of RNN units
rnn_units = 1024
```

## Our First Model: LSTM

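LSTM stands for **Long Short-Term Memory**. An LSTM layer keeps a separate cell state alongside its hidden state and uses three learned gates to control it: a forget gate decides how much of the old cell state to discard, an input gate decides how much new information to write, and an output gate decides how much of the cell state to expose as output.

![A Long Short-Term Memory cell](https://d2l.ai/_images/lstm-0.svg)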

Because each LSTM cell computes three gates and a candidate cell state from both the new input and the previous hidden state at every time step, and learns on its own what to retain without any human decision making, this is a fairly computationally intensive type of layer. Hopefully the quality of its predictions will make up for the compute cost! Let's build one and see.
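The gate equations can be written out for a single LSTM unit. This is a minimal pure-Python sketch with made-up weights, following the standard LSTM formulation (the one Keras's `LSTM` layer uses with its default activations), scaled down to a scalar input and state; `lstm_step` and the parameter layout are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a standard LSTM cell with scalar input, hidden and cell state.

    `p` maps each gate name to (input weight, recurrent weight, bias)."""
    def pre(gate):
        w_x, w_h, bias = p[gate]
        return w_x * x + w_h * h_prev + bias

    i = sigmoid(pre("input"))        # how much new information to write
    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate values to write
    c = f * c_prev + i * g           # new cell state
    h = o * math.tanh(c)             # new hidden state (the layer's output)
    return h, c

params = {
    "input": (1.0, 0.5, 0.0),
    "forget": (0.0, 0.0, 2.0),  # bias 2 -> forget gate = sigmoid(2), about 0.88: memory mostly kept
    "output": (1.0, 0.5, 0.0),
    "candidate": (1.0, 0.5, 0.0),
}

h, c = 0.0, 0.0
for x in [1.0, 0.0, 0.0, -1.0]:
    h, c = lstm_step(x, h, c, params)
    print(f"x={x:+.1f}  h={h:+.3f}  c={c:+.3f}")
```

A real LSTM layer does the same computation with vectors and weight matrices, once per unit and per time step, which is where the compute cost comes from.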

```python
lstm_model = tf.keras.Sequential()
lstm_model.add(tf.keras.layers.Embedding(vocab_size, embedding_dim))
lstm_model.add(tf.keras.layers.LSTM(rnn_units,
                                    return_sequences=True,
                                    return_state=False))
lstm_model.add(tf.keras.layers.Dense(vocab_size))
lstm_model.summary()
```

```
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, None, 256)         20224     
                                                                 
 lstm (LSTM)                 (None, None, 1024)        5246976   
                                                                 
 dense (Dense)               (None, None, 79)          80975     
                                                                 
Total params: 5,348,175
Trainable params: 5,348,175
Non-trainable params: 0
_________________________________________________________________
```
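The parameter counts in the summary can be reproduced by hand, which also tells us how large the GRU model built later will be. The GRU formula below assumes the Keras default `reset_after=True` (separate input and recurrent bias vectors); the GRU model's summary isn't printed in this notebook, so its total here is computed rather than reported by Keras. The function name is hypothetical.

```python
def model_param_counts(vocab_size=79, embedding_dim=256, rnn_units=1024):
    """Reproduce the parameter counts for our embedding -> RNN -> dense models."""
    embedding = vocab_size * embedding_dim       # one vector per character ID
    dense = rnn_units * vocab_size + vocab_size  # weights + biases
    # LSTM: 4 blocks (input, forget, candidate, output), each with an input kernel,
    # a recurrent kernel and one bias vector.
    lstm = 4 * (embedding_dim * rnn_units + rnn_units * rnn_units + rnn_units)
    # GRU: 3 blocks (update, reset, candidate). With reset_after=True, the Keras
    # default in TF 2.x, the input and recurrent parts get separate bias vectors.
    gru = 3 * (embedding_dim * rnn_units + rnn_units * rnn_units + 2 * rnn_units)
    return {
        "embedding": embedding,
        "lstm": lstm,
        "gru": gru,
        "dense": dense,
        "lstm_model": embedding + lstm + dense,
        "gru_model": embedding + gru + dense,
    }

param_counts = model_param_counts()
for name, n in param_counts.items():
    print(f"{name:>10}: {n:,}")
```

The LSTM total matches the 5,348,175 reported above; the GRU model comes out at roughly 4.0 million parameters because it has three weight blocks instead of four.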


This model contains three layers: an embedding layer, which maps each integer character ID to a learned 256-dimensional vector; the LSTM layer; and a dense output layer, which produces one raw score (logit) per vocabulary entry for every position in the sequence. Before compiling the model, we'll check that its predictions have the expected shape.

```python
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = lstm_model(input_example_batch)
    print(example_batch_predictions.shape, "lstm_model: (batch_size, sequence_length, vocab_size)")
```

```
(64, 50, 79) lstm_model: (batch_size, sequence_length, vocab_size)
```


Before training, we need to compile the model, which means assigning it a loss function (the value it tries to minimize) and an optimizer (the algorithm it uses to update its parameters to reduce the loss). Here we use the Adam optimizer to minimize sparse categorical cross-entropy: the negative log of the probability the model assigns to the correct next character, averaged over all predictions. "Sparse" means the targets are integer IDs rather than one-hot vectors, and `from_logits=True` tells the loss that the dense layer outputs raw scores rather than probabilities.

```python
loss = tf.losses.SparseCategoricalCrossentropy(from_logits=True)
```

```python
# run_eagerly=True makes debugging easier but slows down training
lstm_model.compile(optimizer='adam', loss=loss, run_eagerly=True)
```
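Here is a small pure-Python sketch of the per-character loss computed from logits; the helper name is hypothetical, and Keras computes the same quantity and by default averages it over every position in every sequence of a batch. It also gives a useful reference point: a model that scores every character equally has a loss of ln(79), about 4.37, so an untrained model's loss typically starts near that value.

```python
import math

def sparse_ce_from_logits(logits, target_id):
    """Cross-entropy for one prediction: -log(softmax(logits)[target_id])."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_id]

# Raw scores for a made-up 3-character vocabulary
toy_logits = [2.0, 0.5, -1.0]
print(sparse_ce_from_logits(toy_logits, 0))  # correct character scored highest -> small loss
print(sparse_ce_from_logits(toy_logits, 2))  # correct character scored lowest -> large loss

# A model that scores all 79 vocabulary entries equally has loss ln(79), about 4.37
print(sparse_ce_from_logits([0.0] * 79, 5), math.log(79))
```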

Now that our model has been compiled, we will train it.

```python
EPOCHS = 50  # You can increase this number for better-quality predictions (with longer training)
```

```python
lstm_history = lstm_model.fit(dataset, epochs=EPOCHS)  # Fitting the model
```

```
Epoch 1/50
...
Epoch 50/50
```


Now that the model has been trained, we can use it to generate new dialogue. Here we create a separate model class that predicts one character at a time: it converts the input text to IDs, runs the model, samples the next character from the prediction for the last position, and converts the sampled ID back into a character. Joining these characters together gives human-readable dialogue.

```python
class OneStep(tf.keras.Model):
  def __init__(self, model, chars_from_ids, ids_from_chars, temperature=1.0):
    super().__init__()
    self.temperature = temperature
    self.model = model
    self.chars_from_ids = chars_from_ids
    self.ids_from_chars = ids_from_chars

    # Create a mask to prevent "[UNK]" from being generated.
    skip_ids = self.ids_from_chars(['[UNK]'])[:, None]
    sparse_mask = tf.SparseTensor(
        # Put a -inf at each bad index.
        values=[-float('inf')]*len(skip_ids),
        indices=skip_ids,
        # Match the shape to the vocabulary
        dense_shape=[len(ids_from_chars.get_vocabulary())])
    self.prediction_mask = tf.sparse.to_dense(sparse_mask)

  @tf.function
  def generate_one_step(self, inputs, states=None):
    # Note: `states` is unused, so the model starts from a zero state on every
    # call and only sees the characters passed in `inputs`.

    # Convert strings to token IDs.
    input_chars = tf.strings.unicode_split(inputs, 'UTF-8')
    input_ids = self.ids_from_chars(input_chars).to_tensor()

    # Run the model.
    # predicted_logits.shape is [batch, char, next_char_logits]
    predicted_logits = self.model(inputs=input_ids)
    # Only use the last prediction.
    predicted_logits = predicted_logits[:, -1, :]
    # Scale by the temperature: values below 1 sharpen the distribution.
    predicted_logits = predicted_logits/self.temperature
    # Apply the prediction mask: prevent "[UNK]" from being generated.
    predicted_logits = predicted_logits + self.prediction_mask

    # Sample the output logits to generate token IDs.
    predicted_ids = tf.random.categorical(predicted_logits, num_samples=1)
    predicted_ids = tf.squeeze(predicted_ids, axis=-1)

    # Convert from token ids to characters
    predicted_chars = self.chars_from_ids(predicted_ids)

    # Return only the predicted characters (no model state is returned).
    return predicted_chars
```

```python
one_step_lstm = OneStep(lstm_model, chars_from_ids, ids_from_chars, temperature=0.5)
# You can adjust the temperature if desired. Any positive value works: values below 1
# give more conservative predictions and values above 1 give more random ones.
# Here I chose a lower temperature, which results in more conservative model predictions.
```
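`OneStep` divides the logits by the temperature before `tf.random.categorical` samples from them. This plain-Python sketch shows the effect on a made-up set of three logits: a temperature below 1 concentrates probability on the top choice, and a temperature above 1 flattens the distribution. The helper names are hypothetical.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Divide the logits by the temperature, then apply softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_id(logits, temperature, rng):
    # Draw one ID in proportion to the temperature-adjusted probabilities
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

toy_logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(toy_logits, t)])

rng = random.Random(0)
print([sample_id(toy_logits, 0.5, rng) for _ in range(10)])
```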

Finally, let's see what kind of dialogue we can generate!

```python
start = time.time()
next_char = tf.constant(['Wow'])  # Set this string to a starting word or phrase for your dialogue
result = [next_char]

for n in range(200):
    # Only the most recently generated character is passed back in, with no carried-over state
    next_char = one_step_lstm.generate_one_step(next_char, states=None)
    result.append(next_char)

result = tf.strings.join(result)
end = time.time()
print(result[0].numpy().decode('utf-8'), '\n\n' + '_'*80)
print('\nRun time:', end - start)
```

```
Wow! thtong at'm Thit'e ayou y? to at are t ind che ily? Chr y a te y shit it stoumer theave witint tom yotindanthellondathind wayor t ast ind yor hathpin it Whe thplanor I t t hithithin'sondond an wat t 

________________________________________________________________________________

Run time: 1.4359660148620605
```


Hmm... it's a little difficult to tell what exactly this model was trying to say here (it certainly doesn't look like any English I've seen). Let's see if another model will produce any better results.
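One thing worth checking first: as written, the loop above passes only the most recently generated character back into `generate_one_step`, and with `states=None` no hidden state carries over, so after the first step every prediction is conditioned on a single character. One way around this, without changing `OneStep`, is to pass in the entire text generated so far. Here is a minimal, framework-free sketch of such a loop; `generate` and the toy `echo_three_back` step function are hypothetical helpers for illustration, not part of TensorFlow.

```python
def generate(step_fn, prompt, num_chars):
    """Grow `prompt` by num_chars characters, giving step_fn the full text so far.

    step_fn is any callable mapping the text generated so far to one next character."""
    text = prompt
    for _ in range(num_chars):
        text += step_fn(text)
    return text

# Toy step function: "predict" the character 3 positions back, which is only
# possible when the step function sees more than one character of context.
def echo_three_back(text):
    return text[-3] if len(text) >= 3 else " "

print(generate(echo_three_back, "Wow", 6))  # prints WowWowWow
```

With the notebook's objects, a step function could be `lambda text: one_step_lstm.generate_one_step(tf.constant([text]))[0].numpy().decode('utf-8')`, passed as `generate(step, 'Wow', 200)`. This reruns the model over the whole text at every step, which is slower but needs no changes to `OneStep`; its effect on output quality is untested here.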

## A Lighter Model: Using GRUs

In addition to our LSTM-based model, we will also try a model with a layer of GRUs (Gated Recurrent Units). A GRU is a lighter-weight alternative to the LSTM: it has no separate cell state and uses two gates instead of three. The reset gate controls how much of the previous hidden state is used when computing a new candidate state, and the update gate controls how much of the previous hidden state is kept versus replaced by that candidate.

![An image of a Gated Recurrent Unit](https://d2l.ai/_images/gru-1.svg)
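As with the LSTM, the GRU equations can be written out for a single unit. This is a minimal pure-Python sketch with made-up weights. It follows the Keras convention `h = z * h_prev + (1 - z) * candidate` (some texts swap the roles of z and 1 - z) and applies the reset gate to the previous state before the recurrent weight, as in the original formulation; Keras's default `reset_after=True` applies it after the recurrent matrix multiplication instead. `gru_step` and the parameter layout are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev, p):
    """One step of a GRU cell with scalar input and hidden state.

    `p` maps each gate name to (input weight, recurrent weight, bias)."""
    def pre(gate, h):
        w_x, w_h, bias = p[gate]
        return w_x * x + w_h * h + bias

    z = sigmoid(pre("update", h_prev))                   # how much of the old state to keep
    r = sigmoid(pre("reset", h_prev))                    # how much old state feeds the candidate
    candidate = math.tanh(pre("candidate", r * h_prev))  # proposed new state
    return z * h_prev + (1.0 - z) * candidate            # blend old state and candidate

params = {
    "update": (0.0, 0.0, 0.0),  # z = 0.5: an even blend of old state and candidate
    "reset": (0.0, 0.0, 0.0),
    "candidate": (1.0, 1.0, 0.0),
}

h = 0.0
for x in [1.0, 0.0, -1.0]:
    h = gru_step(x, h, params)
    print(f"x={x:+.1f}  h={h:+.3f}")
```

Setting the update gate's bias very high makes z close to 1, so the cell simply carries its previous state forward; this is how a GRU can remember information across many steps.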

This time we subclass `tf.keras.Model` to define our own model class. Like the previous network, it uses an embedding layer for the input and a dense output layer, but it uses a GRU layer instead of an LSTM layer. Its `call` method can also optionally accept and return the GRU's hidden state.

```python
class GRUModel(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, rnn_units):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(rnn_units,
                                       return_sequences=True,
                                       return_state=True)
        self.dense = tf.keras.layers.Dense(vocab_size)

    def call(self, inputs, states=None, return_state=False, training=False):
        x = self.embedding(inputs, training=training)
        # With initial_state=None, the GRU starts from an all-zeros state
        x, states = self.gru(x, initial_state=states, training=training)
        x = self.dense(x, training=training)

        if return_state:
            return x, states
        else:
            return x
```

```python
gru_model = GRUModel(
    # Be sure the vocabulary size matches the `StringLookup` layers.
    vocab_size=len(ids_from_chars.get_vocabulary()),
    embedding_dim=embedding_dim,
    rnn_units=rnn_units)
```

```python
# A second shuffled, batched copy of the same sequences (reusing `dataset` would also work)
dataset2 = (
    dataset_seq
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.AUTOTUNE))
```

Let's double check the size of our outputs.

```python
for input_example_batch, target_example_batch in dataset2.take(1):
    gru_example_batch_predictions = gru_model(input_example_batch)
    print(gru_example_batch_predictions.shape, "gru_model: (batch_size, sequence_length, vocab_size)")
```

```
(64, 50, 79) gru_model: (batch_size, sequence_length, vocab_size)
```


And we'll compile this model using the same optimizer and loss function as we did in the LSTM model.

```python
gru_model.compile(optimizer='adam', loss=loss)
```

As before, we'll fit the model and then create a `OneStep` generator that produces characters from the model one at a time.

```python
history2 = gru_model.fit(dataset2, epochs=EPOCHS)
```

```
Epoch 1/50
...
Epoch 50/50
```


```python
gru_one_step = OneStep(gru_model, chars_from_ids, ids_from_chars, temperature=0.5)
```

Let's see what we can generate using this model!

```python
start = time.time()
next_char = tf.constant(['Wow'])  # Set this string to a starting word or phrase for your dialogue
result = [next_char]

for n in range(200):
    # Only the most recently generated character is passed back in, with no carried-over state
    next_char = gru_one_step.generate_one_step(next_char, states=None)
    result.append(next_char)

result = tf.strings.join(result)
end = time.time()
print(result[0].numpy().decode('utf-8'), '\n\n' + '_'*80)
print('\nRun time:', end - start)
```

```
Wow, at thit'se bu ou thitofule t dand thit t me t and ast That m ing thathitouthidout I haserar oume t bu ton athise ar wit t ithe t wanoullatout t at thin'se thit'tid thind win'st t athe t thesthis m t 

________________________________________________________________________________

Run time: 1.789717435836792
```


This model didn't produce comprehensible English dialogue either... but this isn't that surprising.

While it may be difficult to tell, this model did learn something: it generates characters from our vocabulary in sequences with observable patterns, such as common letter combinations like "th" and short words like "at", "and" and "I". Unfortunately, these patterns don't add up to actual English! Why not?

Well... the answer to that is complicated. It is very time-consuming to look inside each layer and cell of a neural network and work out which parameters affect which outputs: the LSTM model above has about 5.3 million parameters and the GRU model about 4 million. There are a few likely reasons the networks are not producing the desired output:
- **The set of training data could be too small.** Our full text is about 72,000 characters, split into roughly 1,400 fifty-character sequences. GPT-3, one of the best-known large language models, was trained on roughly 300 billion tokens. More unique training examples would very likely improve the performance of these models.
- **Our models could need more training to be effective.** Large models like GPT-3 and BERT (another high-performing language model) take days or longer of training on specialized hardware to sufficiently refine their parameters. Our models could both be trained in a matter of hours on a CPU, which is much slower than training on a GPU. Additional rounds of training (epochs) may help the performance of our models.
- **Our network structure could be suboptimal.** Each of our models consists of an embedding layer, a single recurrent layer, and a dense output layer, with roughly 4 to 5.3 million parameters in total. While this may sound like a large number, GPT-3 alone has 175 billion parameters, and some recent models exceed a trillion. Adding more layers would give the network more trainable parameters and potentially better predictions (dialogue generation).
- **The generation loop gives the model very little context.** In both generation loops above, each call to `generate_one_step` receives only the previously generated character, and `states=None` means no hidden state is carried over. After the first step, each prediction is therefore based on a single character of context, which by itself would tend to produce plausible letter pairs rather than words.

With these considerations in mind, what would be the best next step to generate dialogue that better resembles English and the style of Animal Crossing?

One of the easiest next steps would be to train the model for longer, although its performance would still be limited by the small amount of training data. The model could also be trained on words rather than individual characters. Since the dialogue uses a limited vocabulary, this is a reasonable way to preprocess the data, and it would guarantee that every generated token is a real word (or a placeholder such as [player]) from the dialogue.
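A word-level model needs a tokenizer that keeps placeholders like [player] intact. Here is a minimal sketch, assuming a simple regular-expression tokenizer (my own illustrative choice, not something used in this notebook) applied to two lines from the dataset. It only handles ASCII letters, so accented characters would be split off as separate tokens; `tokenize`, `TOKEN_RE` and `word_vocab` are hypothetical names.

```python
import re
from collections import Counter

# Bracketed placeholders like [player] stay whole; words may contain apostrophes
# ("What're", "doin'"); any other non-space character becomes its own token.
TOKEN_RE = re.compile(r"\[[^\]]+\]|[A-Za-z]+(?:'[A-Za-z]*)*|[^\sA-Za-z]")

def tokenize(line):
    return TOKEN_RE.findall(line)

lines = ["Yo, [player]! Whaddya want? [catchphrase]!",
         "Yo, [player]! What're ya doin'? [catchphrase]?"]
token_counts = Counter(tok for line in lines for tok in tokenize(line))
word_vocab = ["[UNK]"] + sorted(token_counts)  # ID 0 reserved for unknown words, as before

print(tokenize(lines[1]))
print(len(word_vocab), "tokens in the word-level vocabulary")
```

The resulting `word_vocab` list could then be passed to a StringLookup layer in place of the character vocabulary.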

Some steps that are more difficult to implement but likely more effective would be to use a deeper, more complex model architecture (though this would be of limited benefit without more training data), or to fine-tune an existing pretrained model such as GPT-Neo (EleutherAI's open-source GPT-3-style model) or BERT. Of the two, a GPT-style model is the more natural fit for generating dialogue, since BERT is designed mainly for understanding text rather than producing it. Pretrained models can be adapted to particular tasks, letting users leverage their powerful capabilities while customizing the flavor of the output.

I plan to try some of these strategies, including obtaining additional training data, using word embeddings, and changing our model architecture, in a future notebook. Stay tuned!