ehartford/gpt

gpt

This is my first GPT. I adapted it from Andrej Karpathy's excellent video "Let's build GPT: from scratch, in code, spelled out", refactored it to separate training from generation, and added configuration.

It is tested and works on both native Windows and WSL2 Ubuntu.

I run Windows with an AMD Radeon 6800 XT and wanted to use it to train a GPT, but CUDA doesn't work with Radeon cards, and ROCm only works on Linux.

The solution is DirectML. This code targets DirectML, but switching it to CUDA, ROCm, or any other PyTorch backend would be a small change.
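To give a rough idea of how small that change can be, here is a minimal sketch of backend selection. It is not taken from this repo's code: `choose_backend` and `get_device` are hypothetical names used for illustration, and the sketch assumes the `torch` and `torch-directml` packages. ROCm builds of PyTorch expose themselves through the `cuda` device type, so one branch covers both.

```python
import importlib.util


def choose_backend(has_directml: bool, has_cuda: bool) -> str:
    """Pick a backend name: DirectML first, then CUDA/ROCm, then CPU."""
    if has_directml:
        return "directml"
    if has_cuda:
        return "cuda"  # ROCm builds of PyTorch also use the "cuda" device type
    return "cpu"


def get_device():
    """Probe the installed packages and return a device usable with tensor.to()."""
    import torch  # imported lazily so choose_backend() works without PyTorch

    has_directml = importlib.util.find_spec("torch_directml") is not None
    backend = choose_backend(has_directml, torch.cuda.is_available())
    if backend == "directml":
        import torch_directml  # assumption: the torch-directml package is installed

        return torch_directml.device()
    return torch.device(backend)
```

With something like this in place, the model and every tensor would be moved with `.to(get_device())`, so switching backends would touch only this one function.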

Setup

Install Miniconda (Miniconda specifically is required for DirectML support), then run the following commands from the repository root:

conda env create
conda activate gpt

Training

First, generate model.pt by running the following command. This takes 1.5 hours on my 6800 XT; it may be faster or slower on your hardware.

python train.py

Generating

Once training has produced model.pt, you can generate text by running the following command:

python generate.py
