# Train AITextGen Models with GPU/TPU for Free

by [Mohamed Rashad](https://github.com/MohamedAliRashad)

For more about `aitextgen`, you can visit [this GitHub repository](https://github.com/minimaxir/aitextgen).


## Install Dependencies

In [1]:
!pip3 install -q aitextgen
from aitextgen import aitextgen

## Get the Shakespeare dataset (optional)

In [2]:
!git clone https://github.com/ravexina/shakespeare-plays-dataset-scraper.git
!mv /content/shakespeare-plays-dataset-scraper/shakespeare-db/ /content/db/

fatal: destination path 'shakespeare-plays-dataset-scraper' already exists and is not an empty directory.
mv: cannot stat '/content/shakespeare-plays-dataset-scraper/shakespeare-db/': No such file or directory
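
The two errors above most likely come from running the cell a second time: the clone target already exists, and the `shakespeare-db` folder has already been moved to `/content/db/`. A minimal sketch of a guard that decides which steps are still needed is shown here. The helper name `pending_dataset_steps` is hypothetical and not part of the notebook or of `aitextgen`.

```python
from pathlib import Path

def pending_dataset_steps(repo_dir, db_dir):
    """List the steps that still need to run, based on what exists on disk."""
    repo_dir, db_dir = Path(repo_dir), Path(db_dir)
    if db_dir.is_dir():
        return []  # dataset already moved into place, nothing to do
    steps = []
    if not repo_dir.is_dir():
        steps.append("clone")  # repository has not been cloned yet
    steps.append("move")  # the dataset folder still has to be moved
    return steps
```

In Colab, `pending_dataset_steps("/content/shakespeare-plays-dataset-scraper", "/content/db")` could be checked before issuing the `!git clone` and `!mv` commands.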


## Upload dataset to train on

The training data must be plain text stored in a single file.

In [None]:
from google.colab import files
uploaded = files.upload()
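
If the text is spread over several files, it can be merged into one before training. The following is a minimal sketch using only the standard library. The function name `merge_text_files` and the `*.txt` pattern are assumptions made for illustration.

```python
from pathlib import Path

def merge_text_files(source_dir, output_path, pattern="*.txt"):
    """Concatenate every matching file in source_dir into one training file."""
    source_dir = Path(source_dir)
    output_path = Path(output_path)
    # Skip the output file itself in case it already lives in source_dir.
    paths = sorted(
        p for p in source_dir.glob(pattern)
        if p.resolve() != output_path.resolve()
    )
    with output_path.open("w", encoding="utf-8") as out:
        for path in paths:
            text = path.read_text(encoding="utf-8", errors="replace")
            # Make sure each file ends with exactly one newline.
            out.write(text.rstrip("\n") + "\n")
    return len(paths)
```

For example, `merge_text_files("/content/db", "/content/all_plays.txt")` would combine the downloaded plays, and `training_file_path` could then point at the merged file.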

## Enter configurations for training

In [3]:
training_file_path = "/content/db/Hamlet.txt" #@param {type:"string"}
gpt_model = '124M' #@param ["124M", "355M", "774M", "1558M"]
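
Before starting a long training run, it can help to check these two settings. The sketch below is one possible check, not part of the original notebook. `check_config` is a hypothetical helper, and the list of sizes is copied from the form field above.

```python
from pathlib import Path

# Model sizes offered by the form field in the configuration cell.
VALID_GPT2_SIZES = ("124M", "355M", "774M", "1558M")

def check_config(training_file_path, gpt_model):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    path = Path(training_file_path)
    if not path.is_file():
        problems.append(f"training file not found: {path}")
    elif path.stat().st_size == 0:
        problems.append(f"training file is empty: {path}")
    if gpt_model not in VALID_GPT2_SIZES:
        problems.append(f"unknown GPT-2 size: {gpt_model}")
    return problems
```

Calling `check_config(training_file_path, gpt_model)` right after the configuration cell and printing the result surfaces a wrong path before any model is downloaded.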

## Training Script

In [4]:
from aitextgen.TokenDataset import TokenDataset
from aitextgen.tokenizers import train_tokenizer
from aitextgen import aitextgen

# Train a custom BPE Tokenizer on the downloaded text
# This will save one file: `aitextgen.tokenizer.json`, which contains the
# information needed to rebuild the tokenizer.
train_tokenizer(training_file_path)
tokenizer_file = "aitextgen.tokenizer.json"

# Instantiate aitextgen with the selected GPT-2 model size and the custom tokenizer
ai = aitextgen(tf_gpt2=gpt_model, tokenizer_file=tokenizer_file)

# Build the training dataset with TokenDataset, which tokenizes the file
# and serves it to the trainer in sequences of `block_size` tokens.
data = TokenDataset(training_file_path, tokenizer_file=tokenizer_file, block_size=64)

# Train the model! It will save pytorch_model.bin periodically and after completion to the `trained_model` folder.
# On a 2020 8-core iMac, this took ~25 minutes to run.
ai.train(data, batch_size=64, num_steps=1000, generate_every=500, save_every=500)



GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]






500 steps reached: saving model to /trained_model
500 steps reached: generating sample texts.
mwood, wormwood.
Player Queen
     Since fool no where but in's own house.
     Enter LAERTES and O, forced!
Ghost
     My hour is almost come, good mother.
     Since my dear Give houch dear lord, sir,--
HAMLET
HORATIO
     In my dear Ran's eyeild and mind's eye, tell us, Horatio,--
     He sword, Horatio,--
HORATIO
     I saw's wonder dear lord, som in the old theme.
     Enter Keep; and all you, seen, see this came be wond, twent, with you see the same before let's goodman defectain!
HAMLET
HAMLET
HAMLET
HORATIO
     Young For uses, thou art, sir.
HORATIO
     I saws
     I sawason!
HORATIO
HAMLET
HAMLET
HORATIO
     Mostrew's a harposen's a ha!
     Marry, sir, twit's goodman defe
1,000 steps reached: saving model to /trained_model
1,000 steps reached: generating sample texts.
ent piece of two brothers.
     Seee, what a grace was seated on this brow;
     H
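
The log above shows a checkpoint and a sample at steps 500 and 1,000, which follows from `num_steps=1000`, `save_every=500` and `generate_every=500`. The sketch below works out that schedule for any settings. `training_schedule` is a hypothetical helper, and the sequence count assumes each step consumes exactly one batch.

```python
def training_schedule(num_steps, batch_size, save_every, generate_every):
    """Work out when checkpoints and samples occur during training."""
    return {
        # Steps at which the model is written to disk.
        "save_steps": list(range(save_every, num_steps + 1, save_every)),
        # Steps at which sample text is printed.
        "sample_steps": list(range(generate_every, num_steps + 1, generate_every)),
        # Assumption: one batch per step, no gradient accumulation.
        "sequences_seen": num_steps * batch_size,
    }

schedule = training_schedule(
    num_steps=1000, batch_size=64, save_every=500, generate_every=500
)
print(schedule["save_steps"])      # [500, 1000]
print(schedule["sequences_seen"])  # 64000
```

Lowering `save_every` produces more frequent checkpoints, which can be useful when a Colab session might disconnect before training finishes.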

## Load weights and Generate text

In [7]:
# Reload the trained model at any time by pointing aitextgen at the folder that
# holds pytorch_model.bin and its config, together with the tokenizer file.
ai2 = aitextgen(model_folder="trained_model",
                tokenizer_file="aitextgen.tokenizer.json")

ai2.generate(1, prompt="Hamlet")

Hamlets are horse, when he meant to beg it; might it not?
HORATIO
     Ay, my lord.
HAMLET
     Why, e'en so: and now my Lady Worm's; chapless, and
     knocked about the mazzard with a sexton's spade:
     here's fashion, ans ans it ans mere, answer here in the man
     Let memory, I must wife' betwerewering you shall hangers
     Not a ford.
     Not a port
     are another, and pile
     are most in the fire of their addddddddesty would perughted gi would perce would perualents
     and pers
     Py, and their
     and sil, their
     Players
     Players
     their
     and their to sch
     and as fashion will fire
     fit, their in the fire
     fire
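
Reloading depends on the model folder containing the weights and the config, plus the tokenizer file. A small sketch of a pre-flight check is shown below. `missing_model_files` is a hypothetical helper, and the two file names are the ones that appear in this notebook's `trained_model` folder.

```python
from pathlib import Path

# Files this notebook's training run writes into the model folder.
REQUIRED_FILES = ("config.json", "pytorch_model.bin")

def missing_model_files(model_folder, tokenizer_file):
    """Return the names of files needed for reloading that are not on disk."""
    folder = Path(model_folder)
    missing = [name for name in REQUIRED_FILES if not (folder / name).is_file()]
    if not Path(tokenizer_file).is_file():
        missing.append(str(tokenizer_file))
    return missing
```

An empty result from `missing_model_files("trained_model", "aitextgen.tokenizer.json")` suggests the reload call has what it needs.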


## Download trained weights

In [8]:
!zip -r /content/trained_weights.zip /content/trained_model

from google.colab import files
files.download("/content/trained_weights.zip")

  adding: content/trained_model/ (stored 0%)
  adding: content/trained_model/config.json (deflated 50%)
  adding: content/trained_model/pytorch_model.bin (deflated 9%)
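
The archive can also be built in Python instead of with the `zip` shell command, for example when running outside Colab. This is a minimal sketch using `shutil.make_archive` from the standard library. The wrapper name `archive_model` is an assumption for illustration.

```python
import shutil
from pathlib import Path

def archive_model(model_folder, archive_base):
    """Zip model_folder and return the path of the created .zip file.

    archive_base is the output path without the .zip extension.
    """
    model_folder = Path(model_folder)
    return shutil.make_archive(
        str(archive_base),
        "zip",
        root_dir=model_folder.parent,  # paths in the archive are relative to this
        base_dir=model_folder.name,    # only this folder is included
    )
```

For example, `archive_model("/content/trained_model", "/content/trained_weights")` would create `/content/trained_weights.zip`, which can then be passed to `files.download`.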

