This repo contains code to train a GPT-style model from scratch. The training data is sampled from the RedPajama 1T dataset; only a subset of the full corpus is used for training. The transformer implementation is similar to LitGPT's.
The trained model has roughly 160M parameters. The final training loss was 3.2154.
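As a rough sanity check on the ~160M figure, the sketch below estimates the parameter count of a GPT-style transformer from its dimensions. The configuration used (12 layers, 768-wide embeddings, GPT-2-style vocabulary, untied LM head) is an assumption for illustration only, not the repo's actual config.

```python
def gpt_param_count(n_layer, n_embd, vocab_size, block_size, tied_head=False):
    """Approximate parameter count of a GPT-style decoder-only transformer."""
    # Token and learned positional embeddings
    emb = vocab_size * n_embd + block_size * n_embd
    # Per block: attention (QKV + output projection, with biases)
    attn = 4 * n_embd * n_embd + 4 * n_embd
    # Per block: MLP with 4x expansion (two linear layers, with biases)
    mlp = 8 * n_embd * n_embd + 5 * n_embd
    # Per block: two LayerNorms (scale + shift each)
    ln = 4 * n_embd
    blocks = n_layer * (attn + mlp + ln)
    final_ln = 2 * n_embd
    # A separate (untied) LM head adds another vocab_size x n_embd matrix
    head = 0 if tied_head else vocab_size * n_embd
    return emb + blocks + final_ln + head

# Hypothetical config, chosen only because it lands near 160M parameters
total = gpt_param_count(n_layer=12, n_embd=768, vocab_size=50257,
                        block_size=1024, tied_head=False)
print(f"{total / 1e6:.1f}M")  # ~163M with an untied head; ~124M if tied
```

With tied input/output embeddings the same dimensions give roughly 124M parameters (the GPT-2 small figure), so whether the head is tied makes a large difference at this scale.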
The Hugging Face implementation can be found here.
Training details are in the attached notebooks. The first training phase was stopped when the loss reached about 4. Training was then resumed from that checkpoint and stopped once the loss dropped below 3.5.
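The two-phase training described above relies on checkpointing. The sketch below shows a minimal PyTorch save/resume pattern; the key names (`"model"`, `"optimizer"`, `"step"`), the checkpoint path, and the stand-in model are illustrative assumptions, not necessarily what the repo's notebooks use.

```python
import torch
import torch.nn as nn

# Stand-in model; the real repo trains a GPT, but the pattern is identical
model = nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# End of phase 1: save weights AND optimizer state, plus the step counter,
# so AdamW's moment estimates survive the restart
ckpt = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "step": 10_000,  # hypothetical step at which phase 1 stopped
}
torch.save(ckpt, "checkpoint.pt")

# Start of phase 2: restore everything and continue from the saved step
ckpt = torch.load("checkpoint.pt")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
start_step = ckpt["step"]
```

Restoring the optimizer state matters here: reinitializing AdamW at resume time would reset its running gradient statistics and typically causes a temporary loss spike.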