This project shows how to host a handful of pre-trained text-generating models on a Node server. The GitHub repository with information on these models and how they were trained can be found here.
Three different approaches were used in creating these text-generating models.
The first approach is based on the LSTM text-generation model presented in this TensorFlow.js example, which generates one character at a time. I will call this the char-based model. Because these text-generating models were intended for a younger audience, we found it favorable to apply some spelling correction and some word filtering to the generated text. (TODO: improve the word filtering.)
The second approach uses a very similar LSTM model which generates entire words at a time instead of character by character. I will call this the word-based model. With the word-based model, we do not need to run a spell check on the output, since the model only chooses from words it has seen before, ideally spelled correctly in the corpus. Additionally, we hoped that this model would be faster than the char-based model. Although it was, the speed increase was not as dramatic as we would have liked.
The third and final approach strays away from LSTM models and applies an Unsmoothed Maximum Likelihood Character Level Language Model. I will call this the max-likelihood model. In short, this model looks at the entire corpus and tracks the probability that a certain character follows a given phrase. For more detail, I recommend looking at the notebook linked above. This model is incredibly fast and will never make a spelling mistake, though syntactical and grammatical errors remain a problem.
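To give a feel for the idea, here is a minimal sketch of an unsmoothed maximum-likelihood character-level model in plain JavaScript. This is illustrative only — the project's actual implementation lives in the notebook mentioned above, and all function names here are invented for the example:

```javascript
// Count, for every context of `order` characters, how often each
// following character appears in the corpus.
function trainCharLM(corpus, order) {
  const counts = new Map();
  const padded = '~'.repeat(order) + corpus; // pad so every char has a context
  for (let i = order; i < padded.length; i++) {
    const context = padded.slice(i - order, i);
    const next = padded[i];
    if (!counts.has(context)) counts.set(context, new Map());
    const dist = counts.get(context);
    dist.set(next, (dist.get(next) || 0) + 1);
  }
  return counts;
}

// Sample the next character from the empirical distribution for a context.
function sampleNext(counts, context) {
  const dist = counts.get(context);
  if (!dist) return null; // unseen context: no smoothing, so we give up
  const total = [...dist.values()].reduce((a, b) => a + b, 0);
  let r = Math.random() * total;
  for (const [ch, n] of dist) {
    r -= n;
    if (r <= 0) return ch;
  }
  return null;
}

// Generate `length` more characters, starting from the seed text.
function generate(counts, order, seed, length) {
  let text = seed;
  for (let i = 0; i < length; i++) {
    const next = sampleNext(counts, text.slice(-order));
    if (next == null) break;
    text += next;
  }
  return text;
}
```

Because every next-character probability is just a count lookup, generation is extremely fast, and every emitted character was seen in the corpus — which is why spelling mistakes are impossible while grammar mistakes are not.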
By default, this project runs instances of the char-based models trained on a number of corpora in the index.js file. If you would like to run a node server which generates text using the word-based model or the max-likelihood model, run `node word-based-index.js` or `node max-likelihood-index.js` instead. The instructions for querying the models are given in the section below.
Before you can run a node server with either the char-based or word-based models, install the tfjs-node package with `npm install @tensorflow/tfjs-node`. To run the project, run the command `node index.js`.
Before you can run a node server with the max-likelihood model, install the python-shell package with `npm install python-shell`.
When querying the node server, you can provide various parameters, such as the model you want to use, the seed text, and the length you would like the generated text to be (in characters). The only required parameter is the seed text, which for the char-based models must be at least 40 characters long.
You can specify the model you would like to load by adding `model=` followed by the name of the model. The names of the models currently included can be found in the variable `modelFileNames` in the index.js file.
For the char-based and word-based models, the structure of the model names is `<name>_<num epochs>`.
Corpus | Char-Based | Word-Based | Max-Likelihood |
---|---|---|---|
alice-in-wonderland.txt | `aliceInWonderland_0` `aliceInWonderland_1` `aliceInWonderland_5` `aliceInWonderland_20` | (WIP) `aliceInWonderland_0` `aliceInWonderland_25` `aliceInWonderland_100` `aliceInWonderland_500` | `aliceInWonderland` |
drseuss.txt | `drSeuss_0` `drSeuss_1` `drSeuss_5` `drSeuss_20` | (WIP) `drSeuss_0` `drSeuss_25` `drSeuss_100` `drSeuss_500` | `drSeuss` |
harry-potter-1.txt | `harryPotter_0` `harryPotter_1` `harryPotter_5` `harryPotter_20` | (WIP) `harryPotter_0` `harryPotter_25` `harryPotter_100` `harryPotter_500` | `harryPotter` |
nancy-drew.txt | `nancy_0` `nancy_1` `nancy_5` `nancy_20` | (WIP) `nancy_0` `nancy_25` `nancy_100` `nancy_500` | `nancy` |
narnia-1.txt | `narnia_1_0` `narnia_1_1` `narnia_1_5` `narnia_1_20` | `narnia_0` `narnia_25` `narnia_100` `narnia_500` | `narnia` |
tomsawyer.txt | `tomSawyer_0` `tomSawyer_1` `tomSawyer_5` `tomSawyer_20` | (WIP) `tomSawyer_0` `tomSawyer_25` `tomSawyer_100` `tomSawyer_500` | `tomSawyer` |
wizard-of-oz.txt | `wizardOfOz_0` `wizardOfOz_1` `wizardOfOz_5` `wizardOfOz_20` | (WIP) `wizardOfOz_0` `wizardOfOz_25` `wizardOfOz_100` `wizardOfOz_500` | `wizardOfOz` |
In addition, the max-likelihood model has the models `hamlet`, `hungerGames`, and `shakespeare`, which you can use to generate text.
When querying the node server, the only required input is the seed text, `inputText=`. When using the char-based model, the seed text needs to be at least 40 characters long. For the word-based and max-likelihood models, the seed text only needs to be at least one character long.
You can also specify how long you want the resulting string to be. Simply add `outputLength=` followed by a number (characters for the LSTM models, words for the max-likelihood model). The default values are:
Model Architecture | Default Output Length |
---|---|
Char-Based | 40 characters |
Word-Based | 40 characters |
Max-Likelihood | 40 words |
The spellcheck setting applies only to the char-based model. By default, the model's output is run through a spell checker, which takes the first suggestion for each correction. You can deactivate the spellchecker by setting `spellcheck=0` in the query.
`path-to-the-node-server:1234/?inputText=hello%20there%20how%20are%20you%20today%20because%20I%20am%20doing%20pretty%20well&model=drSeuss_20&outputLength=10`
To simply test if the server is online and responding properly, the server checks to see if `inputText=test`. If it does, it will return "The test returned correctly" for all of the node servers.
`path-to-the-node-server:1234/?inputText=test&model=narnia_100`
Overview from Original Source
This example illustrates how to use TensorFlow.js to train an LSTM model to generate random text based on the patterns in a text corpus such as Nietzsche's writing or the source code of TensorFlow.js itself.
The LSTM model operates at the character level. It takes a tensor of shape `[numExamples, sampleLen, charSetSize]` as the input. The input is a one-hot encoding of sequences of `sampleLen` characters. The characters belong to a set of `charSetSize` unique characters. With the input, the model outputs a tensor of shape `[numExamples, charSetSize]`, which represents the model's predicted probabilities of the character that follows the input sequence. The application then draws a random sample based on the predicted probabilities to get the next character. Once the next character is obtained, its one-hot encoding is concatenated with the previous input sequence to form the input for the next time step. This process is repeated in order to generate a character sequence of a given length. The randomness (diversity) is controlled by a temperature parameter.
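The encoding and sampling steps described above can be sketched in plain JavaScript. The actual example works on TensorFlow.js tensors; the function names here are invented for illustration:

```javascript
// One-hot encode a string over a fixed character set.
function oneHot(text, charSet) {
  return [...text].map((ch) => charSet.map((c) => (c === ch ? 1 : 0)));
}

// Re-weight the model's predicted probabilities by a temperature and
// draw a random character index. A low temperature sharpens the
// distribution (conservative output); a high temperature flattens it
// (more diverse output).
function sampleWithTemperature(probs, temperature) {
  const logits = probs.map((p) => Math.log(p) / temperature);
  const maxLogit = Math.max(...logits);
  const exps = logits.map((l) => Math.exp(l - maxLogit)); // stable softmax
  const total = exps.reduce((a, b) => a + b, 0);
  let r = Math.random() * total;
  for (let i = 0; i < exps.length; i++) {
    r -= exps[i];
    if (r <= 0) return i;
  }
  return exps.length - 1;
}
```

The sampled index is looked up in the character set, appended to the text, one-hot encoded, and fed back in as part of the next input window.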
The UI allows creation of models consisting of a single LSTM layer or multiple, stacked LSTM layers.
This example also illustrates how to save a trained model in the browser's IndexedDB using TensorFlow.js's model saving API, so that the result of the training may persist across browser sessions. Once a previously-trained model is loaded from the IndexedDB, it can be used in text generation and/or further training.
This example is inspired by the LSTM text generation example from Keras: https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py
`yarn && node index.js`