gpt

This is my first GPT. I adapted it from Andrej Karpathy's excellent video "Let's build GPT: from scratch, in code, spelled out." I refactored it to separate training from generation and added configuration.

It is tested and works on both native Windows and WSL2 Ubuntu.

I run Windows with an AMD Radeon 6800 XT and wanted to use it to train a GPT, but CUDA doesn't work with Radeon, and ROCm only works on Linux.

The solution is DirectML. This code targets DirectML, but switching it to CUDA, ROCm, or any other PyTorch backend would be a small change.
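As a rough illustration of what that backend switch could look like, here is a minimal sketch. It is not taken from train.py or generate.py; the helper names `pick_backend` and `to_torch_device` are hypothetical. It assumes the `torch-directml` package (imported as `torch_directml`) for the DirectML path and falls back to CUDA or the CPU when that package is not installed.

```python
import importlib.util


def pick_backend(preferred=("directml", "cuda", "cpu")):
    """Return the name of the first usable backend in `preferred`.

    Hypothetical helper: the names here are illustrative and do not
    come from this repository's code.
    """
    for name in preferred:
        if name == "directml":
            # torch-directml installs a top-level module named torch_directml.
            if importlib.util.find_spec("torch_directml") is not None:
                return "directml"
        elif name == "cuda":
            # ROCm builds of PyTorch also expose their GPUs through torch.cuda.
            if importlib.util.find_spec("torch") is not None:
                import torch
                if torch.cuda.is_available():
                    return "cuda"
        elif name == "cpu":
            return "cpu"
    # Nothing in the preference list was usable; the CPU always is.
    return "cpu"


def to_torch_device(backend):
    """Turn a backend name into an object accepted by tensor.to(...)."""
    if backend == "directml":
        import torch_directml
        return torch_directml.device()
    import torch
    return torch.device(backend)


backend = pick_backend()
print(f"selected backend: {backend}")
```

With a helper like this, the model and batches would be moved with `.to(to_torch_device(backend))`, so changing the backend means changing only the preference order.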

Setup

Install Miniconda (Miniconda specifically is required for DirectML support), then run the following commands:

conda env create
conda activate gpt

Training

First, generate model.pt by running the following command (this takes 1.5 hours on my 6800 XT; it may be faster or slower on your hardware):

python train.py
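Training time depends heavily on hyperparameters as well as hardware. The repository keeps its settings in config.ini, whose contents are not shown here. As a sketch only, this is how an INI file of that kind might be read with Python's standard `configparser`. The section names, key names, and values below are all hypothetical and may not match the real file.

```python
import configparser

# Hypothetical INI text: the real config.ini may use different
# sections, keys, and values.
EXAMPLE_INI = """
[model]
n_embd = 384
n_head = 6
n_layer = 6
block_size = 256

[training]
batch_size = 64
max_iters = 5000
learning_rate = 3e-4
"""


def load_settings(text):
    """Parse INI text into a flat dict of typed hyperparameters."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "n_embd": parser.getint("model", "n_embd"),
        "n_head": parser.getint("model", "n_head"),
        "n_layer": parser.getint("model", "n_layer"),
        "block_size": parser.getint("model", "block_size"),
        "batch_size": parser.getint("training", "batch_size"),
        "max_iters": parser.getint("training", "max_iters"),
        "learning_rate": parser.getfloat("training", "learning_rate"),
    }


settings = load_settings(EXAMPLE_INI)

# The embedding width must split evenly across attention heads.
assert settings["n_embd"] % settings["n_head"] == 0
print(settings)
```

In a setup like this, lowering `max_iters` or the model sizes is the usual way to shorten a run on slower hardware, at the cost of output quality.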

Generating

Then you can generate text by running the following command:

python generate.py

