This repository is a simple and clean GPT implementation in TensorFlow, tested with the following package versions:
- Python 3.9.16
- TensorFlow 2.12.0
- TensorFlow Text 2.12.1
- TensorFlow Datasets 4.9.2
- KerasNLP 0.5.2
- Datasets 2.13.0
The model is trained by default on the OpenWebText dataset. Use `--model_dir=<model_dir>` to specify the model directory name:

```shell
python train.py --model_dir=<model_dir>
```
Some other options:
- The `model.py` functions are compiled with XLA. To disable XLA, set `jit_compile=False`.
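As a rough illustration of this toggle (the function below is a stand-in, not the repository's `model.py` code), XLA compilation in TensorFlow is controlled per `tf.function` via the `jit_compile` argument:

```python
import tensorflow as tf

# Stand-in function; in this repository the flag is set on the model.py
# functions. Change jit_compile=True to jit_compile=False to disable XLA.
@tf.function(jit_compile=True)
def matmul_step(a, b):
    return tf.matmul(a, b)

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.eye(2)  # identity, so the result equals a
print(matmul_step(a, b).numpy())
```

The first call traces and XLA-compiles the function; subsequent calls with the same input signature reuse the compiled executable.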
To generate text from a trained model, use `--model_dir=<model_dir>` and `--context=<context>` to specify the model directory name and the generation context:

```shell
python generate.py --model_dir=<model_dir> --context=<context>
```
To download and try pretrained GPT-Mini, run `demo.ipynb`. If you want to fine-tune GPT-Mini using the pretrained weights, you will need to modify the code in the `demo.ipynb` notebook or create a new notebook specifically for fine-tuning.
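A generic Keras fine-tuning pattern looks like the sketch below. This uses a toy stand-in model, not the repository's GPT class; the layer names, checkpoint path, and learning rate are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow import keras

# Toy stand-in for a pretrained model (the real one is loaded in demo.ipynb).
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu", name="body"),
    keras.layers.Dense(2, name="head"),
])
model.save_weights("pretrained.weights.h5")  # hypothetical checkpoint path

# Fine-tuning: reload the pretrained weights, optionally freeze the body,
# and recompile with a smaller learning rate before continuing training.
model.load_weights("pretrained.weights.h5")
model.get_layer("body").trainable = False
model.compile(optimizer=keras.optimizers.Adam(1e-4), loss="mse")
```

Freezing lower layers and lowering the learning rate are common choices when adapting pretrained weights; the right split for GPT-Mini depends on your task and data size.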
Adjust hyperparameters in the `config.py` file.
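For illustration, a hyperparameter file might look like the fragment below. These names and values are hypothetical, not the repository's actual settings; check `config.py` for the real ones:

```python
# Hypothetical hyperparameter fragment; the actual names and values
# live in config.py.
n_layer = 6        # number of transformer blocks
n_head = 8         # attention heads per block
n_embd = 512       # embedding dimension (must be divisible by n_head)
block_size = 1024  # context length in tokens
batch_size = 32
learning_rate = 3e-4
```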
To monitor training, run `tensorboard --logdir ./`.
References:
- Improving language understanding by generative pre-training
- Language models are unsupervised multitask learners
- Language models are few-shot learners
- minGPT
License: MIT