### 1. Mount Google Drive as external storage in Google Colab.

```python
from google.colab import drive
drive.mount('/content/gdrive')
```
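Before moving on, it can help to confirm that the mount succeeded. This is a minimal sketch that assumes Colab's default layout, in which `drive.mount('/content/gdrive')` exposes your files under `/content/gdrive/MyDrive`. The helper name `drive_is_mounted` is hypothetical.

```python
import os


def drive_is_mounted(mount_point: str = "/content/gdrive") -> bool:
    """Hypothetical helper: True if the Drive root folder is visible under mount_point.

    Assumes Colab's default layout, where your files appear in <mount_point>/MyDrive.
    """
    return os.path.isdir(os.path.join(mount_point, "MyDrive"))


print("Google Drive mounted:", drive_is_mounted())
```

In a Colab cell this should print `True` after the mount; outside Colab it prints `False`.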


### 2. Clone the GitHub repository.

We will use the official implementation of the Sophia-G optimizer from the paper at https://arxiv.org/abs/2305.14342, together with the GPT-2 training scripts from https://github.com/Liuhong99/Sophia. Clone this repository into a folder on your mounted Google Drive, changing into that folder first so the clone does not land in Colab's temporary `/content` directory.

A big thank you to Hong Liu, Zhiyuan Li, David Hall, Percy Liang, and Tengyu Ma for their work on this project!


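```python
# <your_path> is a folder inside your mounted Drive (under /content/gdrive/MyDrive)
%cd <your_path>
!git clone https://github.com/Liuhong99/Sophia
```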

```python
%cd <your_path>/Sophia
```
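After changing into the clone, a quick check that the files used by the later steps are present can catch a clone that landed in the wrong folder. This is a sketch: `missing_repo_files` and `REQUIRED_FILES` are hypothetical names, and the file list is taken only from the commands used in this notebook.

```python
from pathlib import Path

# Files invoked by later steps of this notebook (taken from the commands used here).
REQUIRED_FILES = [
    "train_sophiag.py",
    "train_adam.py",
    "config/train_gpt2_small_sophiag.py",
    "config/train_gpt2_small_adam.py",
    "data/openwebtext/prepare.py",
]


def missing_repo_files(repo_dir: str, required: list[str] = REQUIRED_FILES) -> list[str]:
    """Hypothetical helper: return the required files that are absent from repo_dir."""
    root = Path(repo_dir)
    return [rel for rel in required if not (root / rel).is_file()]


print("Missing files:", missing_repo_files("."))
```

Run it from inside `<your_path>/Sophia`; an empty list means every expected file was found.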


### 3. Install the required libraries.

```python
!pip install torch torchvision torchaudio
!pip install transformers
!pip install datasets
!pip install tiktoken
!pip install wandb
```
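To verify the installs without launching training, you can check that each package is importable. The helper below is a hypothetical sketch built on the standard-library `importlib.util.find_spec`; it assumes each package's import name matches its pip name, which holds for the packages listed above.

```python
import importlib.util

# Import names of the packages installed above; for these packages the
# import name is the same as the pip name.
PACKAGES = ["torch", "torchvision", "torchaudio", "transformers", "datasets", "tiktoken", "wandb"]


def missing_packages(names: list[str] = PACKAGES) -> list[str]:
    """Hypothetical helper: return the names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing packages:", missing_packages())
```

An empty list means all packages were found. To confirm that the Colab runtime also has a GPU, `torch.cuda.is_available()` should return `True`.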

### 4. Download and tokenize the OpenWebText dataset.

Tokenizing OpenWebText can take a long time. If it does, you can instead download the pre-tokenized dataset from this link: https://drive.google.com/drive/folders/1-hUTokBt_Y0tN3wTMWpe1g4FVK1QoB0c?usp=sharing

```python
!python data/openwebtext/prepare.py
```
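Once tokenization finishes (or after you copy in the pre-tokenized files), you can sanity-check the output. This sketch assumes, as in nanoGPT's `prepare.py`, that the script writes `train.bin` and `val.bin` into `data/openwebtext/` as flat arrays of `uint16` token IDs, so each token takes 2 bytes. `token_counts` is a hypothetical helper.

```python
import os
from typing import Optional

# Assumption: as in nanoGPT, prepare.py stores token IDs as uint16 (2 bytes each).
BYTES_PER_TOKEN = 2


def token_counts(data_dir: str = "data/openwebtext") -> dict[str, Optional[int]]:
    """Hypothetical helper: number of tokens in train.bin and val.bin (None if missing)."""
    counts: dict[str, Optional[int]] = {}
    for split in ("train", "val"):
        path = os.path.join(data_dir, f"{split}.bin")
        counts[split] = os.path.getsize(path) // BYTES_PER_TOKEN if os.path.isfile(path) else None
    return counts


print(token_counts())
```

A `None` entry means that file was not found in the expected folder.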

### 5. Start training the GPT-2 model using the **Sophia** optimizer.

1. A larger `--batch_size` may cause a "CUDA out of memory" error on Google Colab. If it does, lower `--batch_size` and raise `--gradient_accumulation_steps` to keep the same effective batch size.

2. In `train_sophiag.py`, setting `dtype = 'bfloat16'` may cause an error on Google Colab, because some Colab GPUs (such as the T4) lack native bfloat16 support. If so, try `dtype = 'float32'` instead.
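If you hit the `bfloat16` error, you can edit the script by hand or patch it from a notebook cell. The helper below is a hypothetical sketch; it assumes the precision is chosen by a top-level line beginning with `dtype =` and replaces that whole line (including any trailing comment or conditional) with a fixed value. If the config file passed to `torchrun` also sets `dtype`, that value may take precedence, so check it as well.

```python
import re
from pathlib import Path

# Assumption: the script picks its precision with a top-level line that starts
# with "dtype =" (e.g. dtype = 'bfloat16'). The whole line is replaced.
DTYPE_LINE = re.compile(r"^dtype\s*=.*$", re.MULTILINE)


def set_dtype(script_path: str, new_dtype: str = "float32") -> int:
    """Hypothetical helper: rewrite the dtype line(s); return how many were changed."""
    path = Path(script_path)
    source = path.read_text()
    updated, count = DTYPE_LINE.subn(f"dtype = '{new_dtype}'", source)
    if count:
        path.write_text(updated)  # only touch the file if something matched
    return count
```

For example, `set_dtype("train_sophiag.py")` should return `1` if the script has a single such line, and the same call works for `train_adam.py`. A return value of `0` means no matching line was found and the file was left unchanged.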

```python
!torchrun --standalone --nproc_per_node=1 train_sophiag.py config/train_gpt2_small_sophiag.py --batch_size=2 --gradient_accumulation_steps=8
```

### 6. Start training the GPT-2 model using the **Adam** optimizer.

1. A larger `--batch_size` may cause a "CUDA out of memory" error on Google Colab. If it does, lower `--batch_size` and raise `--gradient_accumulation_steps` to keep the same effective batch size.

2. In `train_adam.py`, setting `dtype = 'bfloat16'` may cause an error on Google Colab, because some Colab GPUs (such as the T4) lack native bfloat16 support. If so, try `dtype = 'float32'` instead.
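The `--batch_size=2 --gradient_accumulation_steps=8` arguments used in both training commands trade memory for time: each optimizer step accumulates gradients over several small micro-batches. The arithmetic below is a sketch assuming nanoGPT-style semantics (each micro-batch holds `batch_size` sequences of `block_size` tokens, single process) and a `block_size` of 1024, GPT-2's context length. Both helper names are hypothetical.

```python
# Assumption: nanoGPT-style semantics, where each micro-batch holds `batch_size`
# sequences of `block_size` tokens, and gradients from `grad_accum_steps`
# micro-batches are accumulated before one optimizer step (single process).
GPT2_BLOCK_SIZE = 1024  # GPT-2's context length; assumed to match the config


def tokens_per_step(batch_size: int, grad_accum_steps: int, block_size: int = GPT2_BLOCK_SIZE) -> int:
    """Tokens that contribute to one optimizer step."""
    return batch_size * grad_accum_steps * block_size


def accum_steps_for(target_tokens: int, batch_size: int, block_size: int = GPT2_BLOCK_SIZE) -> int:
    """Accumulation steps needed to reach target_tokens per step with a given micro-batch."""
    per_micro_batch = batch_size * block_size
    if target_tokens % per_micro_batch:
        raise ValueError("target_tokens must be a multiple of batch_size * block_size")
    return target_tokens // per_micro_batch


print(tokens_per_step(2, 8))      # 16384 tokens per step with --batch_size=2 --gradient_accumulation_steps=8
print(accum_steps_for(16384, 1))  # 16 accumulation steps if batch_size drops to 1
```

So if even `--batch_size=2` runs out of memory, `--batch_size=1 --gradient_accumulation_steps=16` keeps the same number of tokens per step, at the cost of more forward and backward passes per step.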

```python
!torchrun --standalone --nproc_per_node=1 train_adam.py config/train_gpt2_small_adam.py --batch_size=2 --gradient_accumulation_steps=8
```

### Reference Result
You can find reference results for GPT-2 models trained with the Sophia and Adam optimizers at this link: https://api.wandb.ai/links/ilovedorayaki1998/sb5hhjbb

Please note that this link leads to an external website, and its availability or content may be subject to change.