
SimpleGPT


This repository provides a reference implementation of SimpleGPT, a novel architecture that explores a unified normalization strategy for Transformer-based large language models.

Instead of treating normalization as a special operation inserted at specific locations (e.g., pre-norm, post-norm, or QK-norm), SimpleGPT adopts SimpleNorm, a unified design principle in which every affine (linear) transformation is immediately followed by normalization.
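As a rough illustration of the principle (not the exact implementation from the paper), the idea can be sketched in NumPy, assuming RMS normalization as the norm; the paper's SimpleNorm may use a different normalization or learned scale parameters:

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    """Scale each row to unit RMS (no learned gain, for simplicity)."""
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def affine_then_norm(x, W, b=None):
    """SimpleNorm principle, sketched: every affine (linear) map is
    immediately followed by normalization, instead of inserting norms
    only at block boundaries (pre-/post-norm) or on Q/K projections."""
    y = x @ W
    if b is not None:
        y = y + b
    return rms_norm(y)

# Illustrative token states (batch of 2, model dim 4) and a projection to dim 8.
np.random.seed(0)
x = np.random.randn(2, 4)
W = np.random.randn(4, 8)
h = affine_then_norm(x, W)  # every row of h has RMS close to 1
```

In a full Transformer block this rule would apply uniformly to the attention projections and the MLP layers alike, which is what removes the need to decide where norms go.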

We show that this simplification improves training stability, enables larger learning rates, and yields competitive or improved performance compared to standard GPT architectures, without significant overhead. Please refer to our paper SimpleGPT: Improving GPT via a Simple Normalization Strategy for more details.

SimpleNorm and SimpleGPT Architecture

Definition of SimpleNorm

SimpleGPT architecture overview

Training loss on Llama-7B

Running the Code

Setup

Install dependencies

We use the Torchtitan codebase to parallelize training. To run the code, please install dependencies with the following commands or refer to instructions from the official Torchtitan repository.

git clone https://github.com/Ocram7/SimpleGPT
cd SimpleGPT/src
pip install -r requirements.txt
pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu129 --force-reinstall # for CUDA 12.9; choose the index URL matching your CUDA installation
pip3 install --pre torchdata --index-url https://download.pytorch.org/whl/nightly

Download tokenizer

Our codebase currently implements variants of the Llama2 and Llama3 architectures. Please follow the instructions from the official meta-llama repository to ensure you have access to the Llama model weights. Once you have confirmed access, you can run the following commands to download the Llama2 and Llama3 tokenizers to your local machine.

# Get your HF token from https://huggingface.co/settings/tokens

# llama3 tokenizer.model
python torchtitan/datasets/download_tokenizer.py --repo_id meta-llama/Meta-Llama-3-8B --tokenizer_path "original" --hf_token=...

# llama2 tokenizer.model
python torchtitan/datasets/download_tokenizer.py --repo_id meta-llama/Llama-2-13b-hf --hf_token=...

After downloading the tokenizers, please ensure that the model.tokenizer_path attribute in the TOML configuration files points to your local tokenizer.

Download dataset

If you wish to train on the C4 dataset, please download it from https://huggingface.co/datasets/allenai/c4 and update the training.dataset_path attribute in the configuration files.

For validation, please download the C4-mini dataset (download from Google Drive) and move it under src/torchtitan/datasets/c4_mini. Note that in our paper we do not report validation loss for the Llama-based experiments and train only on C4.

Training SimpleGPT

To run the code on a single node, execute the run.sh bash script with the desired config file as the first parameter:

./run.sh "train_configs/llama2_1B/simplegpt.toml"

Config files

We include sample configuration files for variants of llama2_1B, llama2_7B, and llama3_8B in train_configs. For each model scale, we provide three configs: standard prenorm-only Llama (prenorm.toml), Llama with prenorm and qknorm (preqknorm.toml), and Llama-based SimpleGPT (simplegpt.toml).

Special configuration parameters

| Parameter | Type | Description |
| --- | --- | --- |
| model.name | str | Architecture variant (llama2, llama2_preqknorm, llama2_simplegpt, llama3, llama3_preqknorm, llama3_simplegpt) |
| model.tokenizer_path | str | Path to the local tokenizer |
| training.dataset | str | Dataset name (c4, c4_mini) |
| training.dataset_path | str | Path to the dataset |
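As an illustration, a configuration might set these parameters as follows; the paths and values below are placeholders, so please consult the sample files under train_configs for the actual settings:

```toml
[model]
name = "llama2_simplegpt"
# Placeholder path: point this at your downloaded tokenizer.model
tokenizer_path = "./torchtitan/datasets/tokenizer/tokenizer.model"

[training]
dataset = "c4_mini"
# Placeholder path: point this at your local copy of the dataset
dataset_path = "./torchtitan/datasets/c4_mini"
```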

Note on hyperparameters:
The sample configuration files are provided for ease of use and debugging, and may differ slightly from the hyperparameters reported in the paper. These differences do not affect the conclusions of the work. Please refer to the paper for exact settings.

Changelog

  1. [2026-03-01]: Initial code released!

Acknowledgements

This codebase is heavily based on the implementations of Torchtitan and Adam-mini. We thank the authors of these projects for making their code publicly available.

Note. In our paper, we evaluate SimpleGPT on both Llama-based and nanoGPT-based architectures. This repository contains only the code used for the Llama-based experiments. The nanoGPT-based experiments were conducted using the original nanoGPT codebase, available at https://github.com/karpathy/nanoGPT.

Citation

If you find our work useful, please cite:

@article{chen2026simplegpt,
  title={SimpleGPT: Improving GPT via A Simple Normalization Strategy},
  author={Chen, Marco and Qi, Xianbiao and He, Yelin and Ye, Jiaquan and Xiao, Rong},
  journal={arXiv preprint arXiv:2602.01212},
  year={2026}
}
