
gpt2-dro

This repository attempts to replicate the GPT-2 training process using a Conditional Value-at-Risk Distributionally Robust Optimization (CVaR DRO) optimality target, rather than the common Empirical Risk Minimization (ERM) optimality target.

This is achieved with a performant minibatch α-CVaR DRO implementation.
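
To illustrate the core idea only (this is a hedged sketch, not the repository's actual implementation), a minibatch α-CVaR DRO loss can be formed by averaging the worst α-fraction of per-example losses in each minibatch, so gradients come from the hardest examples:

    import math
    import torch

    def minibatch_cvar_loss(per_example_loss: torch.Tensor, alpha: float) -> torch.Tensor:
        """Average the worst alpha-fraction of per-example losses in a minibatch.

        per_example_loss: shape (batch_size,), e.g. token-averaged cross-entropy per sequence.
        alpha: uncertainty set size in (0, 1]; alpha = 1 recovers the ERM objective.
        """
        batch_size = per_example_loss.shape[0]
        k = max(1, math.ceil(alpha * batch_size))     # number of worst examples kept
        worst_k, _ = torch.topk(per_example_loss, k)  # gradients flow only through these
        return worst_k.mean()

With GPT-2 this would typically be applied to per-sequence cross-entropy losses computed with reduction="none" and averaged over non-padding tokens, before the usual backward pass.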

Since the GPT-2 training dataset (WebText) is proprietary to OpenAI, the OpenWebText dataset is used instead.
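
For reference, a minimal sketch (assumed, not copied from gpt2-openwebtext-dro.py) of loading and tokenizing OpenWebText with the HuggingFace datasets and transformers libraries:

    from datasets import load_dataset
    from transformers import GPT2TokenizerFast

    # Dataset id and column name assumed from the public OpenWebText release on the Hub.
    dataset = load_dataset("openwebtext", split="train")
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

    def tokenize(batch):
        return tokenizer(batch["text"])

    tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])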

The primary file is gpt2-openwebtext-dro.py, containing data loading, training, and logging code.

  • HuggingFace is used for model checkpoints.

    For example, see the publicly accessible models trained from this repo on the kdbanman (Kirby Banman) HuggingFace profile. Here, α refers to the uncertainty set size for α-CVaR DRO.

  • WandB is used for training logs.

    For example, the gpt2-dro workspace on Weights & Biases monitors the ~60 and ~200 hour training runs for the models above. Again, α refers to the uncertainty set size for α-CVaR DRO. (A minimal sketch of this logging and checkpointing pattern follows this list.)
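
The general pattern, assuming only the standard wandb and transformers APIs (project name, config keys, and Hub repo id below are illustrative, not the repository's actual values), looks roughly like:

    import wandb
    from transformers import GPT2LMHeadModel

    # Project name, config keys, and Hub repo id are illustrative only.
    wandb.init(project="gpt2-dro", config={"alpha": 0.8})

    model = GPT2LMHeadModel.from_pretrained("gpt2")
    for step in range(3):                      # stand-in for the real training loop
        loss_value = 0.0                       # stand-in for the minibatch CVaR loss
        wandb.log({"train/loss": loss_value, "step": step})

    model.push_to_hub("your-username/gpt2-dro-example")  # requires HuggingFace auth
    wandb.finish()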

The training is designed for data parallelism across multiple large* GPUs within a single node using the HuggingFace Accelerate library. For example,

accelerate launch gpt2-openwebtext-dro.py config-0.8.json

which runs the experiment configuration in config-0.8.json. This command is exactly what the launch-0.8.sh file runs headlessly on a compute node, and what the slurm_launch.sh file submits from a login node on a Slurm cluster.
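
For context, the script receives the JSON config path as its single argument. A hypothetical sketch of how such a config might be consumed (the field names here are illustrative, not taken from the actual config files):

    import json
    import sys

    with open(sys.argv[1]) as f:          # e.g. config-0.8.json
        config = json.load(f)

    # Field names below are hypothetical.
    alpha = config.get("alpha", 1.0)      # alpha = 1.0 would recover plain ERM
    batch_size = config.get("batch_size", 32)
    print(f"Running alpha-CVaR DRO with alpha={alpha}, batch size {batch_size}")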


* "large" GPUs as of early 2023 - the models and gradients fit fine in ~20GB VRAM with modest batch sizes (less than 100 samples)
