
GPT from scratch

This repo contains code to train a GPT from scratch. The dataset is drawn from the RedPajama 1T dataset; only a sample of the full corpus is used for training. The transformer implementation is similar to LitGPT.
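
As a rough illustration, a minimal decoder-only model in this style could look like the sketch below; the layer layout and hyperparameters are assumptions for illustration, not the exact LitGPT configuration used here.

```python
# Minimal sketch of a decoder-only GPT (illustrative hyperparameters, not this repo's config).
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, n_embd, n_head):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x):
        # Causal mask: each position may only attend to earlier tokens.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        x = x + self.attn(h, h, h, attn_mask=mask, need_weights=False)[0]
        x = x + self.mlp(self.ln2(x))
        return x

class GPT(nn.Module):
    def __init__(self, vocab_size=50304, block_size=1024, n_layer=12, n_head=12, n_embd=768):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(*[Block(n_embd, n_head) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd)
        self.head = nn.Linear(n_embd, vocab_size, bias=False)

    def forward(self, idx):
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        return self.head(self.ln_f(self.blocks(x)))
```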

The trained model has about 160M parameters, and the final training loss was 3.2154.
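
The parameter count can be checked directly from the model. Continuing the sketch above (with placeholder hyperparameters, since the exact config behind the ~160M figure is not listed here):

```python
# Count trainable parameters; the defaults below are placeholders, not the repo's config.
model = GPT()
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{n_params / 1e6:.1f}M trainable parameters")
```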

The Hugging Face implementation can be found here.

The training details can be found in the attached notebooks. The initial training was stopped when the loss was around 4.
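
As a hedged sketch of what such a run can look like, the loop below logs the loss and periodically writes a checkpoint so training can be stopped and later resumed; `get_batch()` is a hypothetical data loader over the tokenized RedPajama sample, and the hyperparameters are placeholders rather than the values from the notebooks.

```python
# Pretraining loop sketch (continuing the GPT sketch above).
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT().to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
criterion = torch.nn.CrossEntropyLoss()
max_steps = 100_000  # illustrative

for step in range(1, max_steps + 1):
    xb, yb = get_batch("train")  # hypothetical: (B, T) input ids and next-token targets
    xb, yb = xb.to(device), yb.to(device)
    logits = model(xb)
    loss = criterion(logits.view(-1, logits.size(-1)), yb.view(-1))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()

    if step % 1000 == 0:
        print(f"step {step}: loss {loss.item():.4f}")
        torch.save({"model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                    "step": step}, "ckpt.pt")
```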

Training was then resumed from this checkpoint for a second stage and stopped once the loss dropped below 3.5.
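
Resuming from a checkpoint typically just restores the model and optimizer state and continues the same loop; a sketch following the checkpoint format assumed above (file name and dict keys are assumptions, not the repo's actual format):

```python
# Restore model/optimizer state and continue training from the saved step.
ckpt = torch.load("ckpt.pt", map_location=device)
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
start_step = ckpt["step"] + 1
# ...run the same loop from start_step and stop once the loss falls below ~3.5
```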

Sample Predictions

(Screenshots of sample generations from the trained model.)
