# Deep Dive into LitGPT

## What you will learn in this course 🧐🧐

In the previous lecture, we fine-tuned a model without spending much time on the underlying technology. Let's fix that in this lecture and dive deeper into PyTorch Lightning, and especially LitGPT.

## What is LitGPT

LitGPT is a library from the [LightningAI](https://lightning.ai/docs/overview/getting-started) suite, a set of open-source tools built on PyTorch to build, scale and deploy large models, especially LLMs.

LitGPT has been specifically made to train and fine-tune LLMs.

## Demo setup 

To dive deeper into LitGPT, we will need to use GPUs. If you don't have a GPU on your local machine, we advise you to spin up a **LightningAI Studio**. To do so:

1. (If not done already): Create an account on [LightningAI](https://lightning.ai/)
2. Go to your Dashboard and click on "New Studio" 
3. Once your Studio is up, simply switch your hardware to A10 GPUs 

<Note type="note">

If you want to use VSCode locally, you can SSH into your LightningAI Studio. Follow the guidelines here:

* [Connect to Local IDE](https://lightning.ai/docs/overview/studios/connect-local-ide)

</Note>

Once you are done, open a new terminal in VSCode whenever you need to run a command:

In [None]:
# Install the following library 
# Don't use % if you are already on the terminal
%pip install "litgpt[all]"

In [2]:
# Restart your kernel and test if everything works
# Don't use ! if you are in the terminal (not in Jupyter Notebook)
!litgpt --help

usage: litgpt [-h] [--config CONFIG] [--print_config[=flags]]
              {download,chat,finetune,finetune_lora,finetune_full,finetune_adapter,finetune_adapter_v2,pretrain,generate,generate_full,generate_adapter,generate_adapter_v2,generate_sequentially,generate_tp,convert_to_litgpt,convert_from_litgpt,convert_pretrained_checkpoint,merge_lora,evaluate,serve}
              ...

options:
  -h, --help            Show this help message and exit.
  --config CONFIG       Path to a configuration file.
  --print_config[=flags]
                        Print the configuration after applying all other
                        arguments and exit. The optional flags customizes the
                        output and are one or more keywords separated by
                        comma. The supported flags are: comments,
                        skip_default, skip_null.

subcommands:
  For more details of each subcommand, add it as an argument followed by
  --help.

  Available subcommands:
    downloa

If you see the output above, LitGPT is correctly installed. Reading through the list of subcommands already tells you a lot about what `litgpt` can do. In this course, we will focus on the two main components:

* Pretraining: training an LLM like Llama from scratch. The weights are initialized at random, but the model architecture stays the same
* Fine-tuning: adapting a pretrained model's answers using a custom dataset. This is the equivalent of the Transfer Learning that you saw earlier in the program

## Download model (with weights)

If you want to fine-tune a model, you will need to have one at your disposal. LitGPT gives you access to the most popular open-source LLMs.

In [None]:
!litgpt download list

Please specify --repo_id <repo_id>. Available values:
allenai/OLMo-1B-hf
allenai/OLMo-7B-hf
allenai/OLMo-7B-Instruct-hf
BSC-LT/salamandra-2b
BSC-LT/salamandra-2b-instruct
BSC-LT/salamandra-7b
BSC-LT/salamandra-7b-instruct
codellama/CodeLlama-13b-hf
codellama/CodeLlama-13b-Instruct-hf
codellama/CodeLlama-13b-Python-hf
codellama/CodeLlama-34b-hf
codellama/CodeLlama-34b-Instruct-hf
codellama/CodeLlama-34b-Python-hf
codellama/CodeLlama-70b-hf
codellama/CodeLlama-70b-Instruct-hf
codellama/CodeLlama-70b-Python-hf
codellama/CodeLlama-7b-hf
codellama/CodeLlama-7b-Instruct-hf
codellama/CodeLlama-7b-Python-hf
deepseek-ai/DeepSeek-R1-Distill-Llama-70B
deepseek-ai/DeepSeek-R1-Distill-Llama-8B
EleutherAI/pythia-1.4b
EleutherAI/pythia-1.4b-deduped
EleutherAI/pythia-12b
EleutherAI/pythia-12b-deduped
EleutherAI/pythia-14m
EleutherAI/pythia-160m
EleutherAI/pythia-160m-deduped
EleutherAI/pythia-1b
EleutherAI/pythia-1b-deduped
EleutherAI/pythia-2.8b
EleutherAI/pythia-2.8b-deduped
EleutherAI/pythia-31m
El

In [3]:
# List all the downloadable models
# Don't use ! if you are in the terminal (not in Jupyter Notebook)
!litgpt download list

Please specify --repo_id <repo_id>. Available values:
codellama/CodeLlama-13b-hf
codellama/CodeLlama-13b-Instruct-hf
codellama/CodeLlama-13b-Python-hf
codellama/CodeLlama-34b-hf
codellama/CodeLlama-34b-Instruct-hf
codellama/CodeLlama-34b-Python-hf
codellama/CodeLlama-70b-hf
codellama/CodeLlama-70b-Instruct-hf
codellama/CodeLlama-70b-Python-hf
codellama/CodeLlama-7b-hf
codellama/CodeLlama-7b-Instruct-hf
codellama/CodeLlama-7b-Python-hf
databricks/dolly-v2-12b
databricks/dolly-v2-3b
databricks/dolly-v2-7b
EleutherAI/pythia-1.4b
EleutherAI/pythia-1.4b-deduped
EleutherAI/pythia-12b
EleutherAI/pythia-12b-deduped
EleutherAI/pythia-14m
EleutherAI/pythia-160m
EleutherAI/pythia-160m-deduped
EleutherAI/pythia-1b
EleutherAI/pythia-1b-deduped
EleutherAI/pythia-2.8b
EleutherAI/pythia-2.8b-deduped
EleutherAI/pythia-31m
EleutherAI/pythia-410m
EleutherAI/pythia-410m-deduped
EleutherAI/pythia-6.9b
EleutherAI/pythia-6.9b-deduped
EleutherAI/pythia-70m
EleutherAI/pythia-70m-deduped
garage-bAInd/Camel-Plat

Let's download Llama-3.2-1B, which is relatively lightweight but still performs well.

<Note type="important">

If you are following along and want to try other models, keep in mind that the more parameters a model has, the heavier it is, not just in terms of storage but also in terms of memory (RAM). Don't pick a model that is too large for your local computer, otherwise it will freeze.

</Note>
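
To make the note above more concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone take. The parameter counts are rounded assumptions chosen for illustration, and the helper name `weight_memory_gb` is hypothetical (it is not part of LitGPT).

```python
# Bytes needed to store one parameter for a few common data types.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(n_params, dtype="bfloat16"):
    """Memory needed just to hold the weights, in GB (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


# Approximate parameter counts (assumptions, rounded for illustration).
models = [("1B-class model", 1.2e9), ("7B-class model", 7e9), ("70B-class model", 70e9)]
for name, n_params in models:
    print(f"{name}: {weight_memory_gb(n_params):.1f} GB in bfloat16, "
          f"{weight_memory_gb(n_params, 'float32'):.1f} GB in float32")
```

Training needs noticeably more than this estimate, since gradients and optimizer state also have to fit in memory.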

In [9]:
# Download Llama-3.2-1B (replace <read_token> with your HuggingFace access token)
# Don't use ! if you are in the terminal (not in Jupyter Notebook)
!litgpt download meta-llama/Llama-3.2-1B --access_token <read_token>

Setting HF_HUB_ENABLE_HF_TRANSFER=1
config.json: 100%|█████████████████████████████| 843/843 [00:00<00:00, 8.42MB/s]
generation_config.json: 100%|██████████████████| 185/185 [00:00<00:00, 2.35MB/s]
model.safetensors: 100%|███████████████████▉| 2.47G/2.47G [00:04<00:00, 527MB/s]
tokenizer.json: 100%|██████████████████████| 9.09M/9.09M [00:00<00:00, 13.5MB/s]
tokenizer_config.json: 100%|███████████████| 50.5k/50.5k [00:00<00:00, 90.4MB/s]
Converting checkpoint files to LitGPT format.
{'checkpoint_dir': PosixPath('checkpoints/meta-llama/Llama-3.2-1B'),
 'debug_mode': False,
 'dtype': None,
 'model_name': None}
Loading weights: model.safetensors: 100%|███████████████| 00:39<00:00,  2.54it/s
Saving converted checkpoint to checkpoints/meta-llama/Llama-3.2-1B


<Note type="important" title="If you want to use LLama official models">

To use an official Llama model from Meta, you will need to:

1. Go to HuggingFace and [accept Meta's terms of use](https://huggingface.co/meta-llama/Llama-3.2-1B)
2. Wait for the approval from Meta
3. Provide your HuggingFace token as a parameter of your command, like this:

```bash
# Prefix the command with ! only if you run it from a Jupyter Notebook
litgpt download meta-llama/Llama-3.2-1B --access_token YOUR_HF_TOKEN
```

You can get your access token by clicking on your profile > ACCESS TOKENS

It should take about 30 minutes for you to get approved.

</Note>


Once the command has run successfully, you should see a `checkpoints/NAME_OF_YOUR_MODEL` directory in the folder where you executed the command.

Now you should be able to interact with your model the following way:

In [12]:
# RUN THIS COMMAND IN A STANDALONE TERMINAL NOT IN JUPYTER 
# otherwise you won't be able to interact 😉
# Don't use ! if you are in the terminal (not in Jupyter Notebook)
!litgpt chat checkpoints/meta-llama/Llama-3.2-1B


{'access_token': None,
 'checkpoint_dir': PosixPath('checkpoints/meta-llama/Llama-3.2-1B'),
 'compile': False,
 'max_new_tokens': 50,
 'multiline': False,
 'precision': None,
 'quantize': None,
 'temperature': 0.8,
 'top_k': 50,
 'top_p': 1.0}
Now chatting with Llama-3.2-1B.
To exit, press 'Enter' on an empty prompt.

Seed set to 1234
>> Prompt: ^C

Memory used: 3.10 GB


## Fine-tuning 

Now if you want to fine-tune a model, you will need first to build a dataset that follows this structure:

```json 
[
    {
        "instruction": "THIS IS THE USER PROMPT",
        "input": "THIS IS ADDITIONAL CONTEXT FOR THE INSTRUCTION",
        "output": "THIS IS THE EXPECTED ANSWER"
    }
]
```
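
As a minimal sketch, the structure above can be produced from Python with the standard `json` module. Only the three keys come from the format above. The sample records, the 80/20 split, the `data` output folder and the helper names (`validate`, `write_split`) are assumptions made for illustration.

```python
import json
import random
from pathlib import Path

REQUIRED_KEYS = {"instruction", "input", "output"}


def validate(records):
    """Check that every record has exactly the three expected string fields."""
    for i, record in enumerate(records):
        if set(record) != REQUIRED_KEYS:
            raise ValueError(f"record {i} has keys {sorted(record)}")
        if not all(isinstance(value, str) for value in record.values()):
            raise ValueError(f"record {i} contains a non-string value")


def write_split(records, out_dir, val_fraction=0.2, seed=42):
    """Shuffle the records and write train.json / val.json into out_dir."""
    validate(records)
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    splits = {"val.json": shuffled[:n_val], "train.json": shuffled[n_val:]}
    for name, rows in splits.items():
        (out_dir / name).write_text(json.dumps(rows, indent=4), encoding="utf-8")
    return len(splits["train.json"]), len(splits["val.json"])


# Hypothetical sample records, only here to make the sketch runnable.
samples = [
    {"instruction": f"Question {i}", "input": "", "output": f"Answer {i}"}
    for i in range(10)
]
n_train, n_val = write_split(samples, "data")
print(n_train, n_val)  # 8 2
```

Validating the keys before writing catches malformed records early, instead of discovering them once a training job has already started.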

You can download the template files here:

* [train.json](https://full-stack-assets.s3.eu-west-3.amazonaws.com/train.json)
* [val.json](https://full-stack-assets.s3.eu-west-3.amazonaws.com/val.json)

Then you can simply run:

In [26]:
# 1st line: Path to the model checkpoint to fine-tune
# 2nd line: Set the data format to JSON (for custom data)
# 3rd line: Path to the folder containing train.json and val.json
# 4th line: Path for saving the fine-tuned model output
# 5th line: Define the global batch size
# 6th line: Define the number of epochs
# Don't use ! if you are in the terminal (not in Jupyter Notebook)
!litgpt finetune_lora checkpoints/meta-llama/Llama-3.2-1B \
--data JSON \
--data.json_path $(pwd)/data \
--out_dir results/fine-tuned-llama-3.2-1B \
--train.global_batch_size=32 \
--train.epochs=5

{'access_token': None,
 'checkpoint_dir': PosixPath('checkpoints/meta-llama/Llama-3.2-1B'),
 'data': JSON(json_path=PosixPath('/teamspace/studios/this_studio/data'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x7ff54c929270>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 1,
 'eval': EvalArgs(interval=100,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=True,
                  evaluate_example='first'),
 'logger_name': 'csv',
 'lora_alpha': 16,
 'lora_dropout': 0.05,
 'lora_head': False,
 'lora_key': False,
 'lora_mlp': False,
 'lora_projection': False,
 'lora_query': True,
 'lora_r': 8,
 'lora_value': True,
 'num_nodes': 1,
 'optimizer': 'AdamW',
 'out_dir': PosixPath('results/fine-tuned-llama-3.2-1B'),
 'precision': None,
 'quantize': None,
 'see

### Further fine-tune models

If you want to train a model further, you can use the same command as before. The only thing that changes is the checkpoint path, which now points to the directory where the fine-tuned model is stored:

```bash
# 1st line: Change the path to the directory where the results are
litgpt finetune_lora results/fine-tuned-llama-3.2-1B/final \
--data JSON \
--data.json_path $(pwd)/00-Lectures/src \
--out_dir results/fine-tuned-llama-3.2-1B \
--train.global_batch_size=32 \
--train.epochs=5
```

If you need to configure your training further, I advise you to create a `config.yaml` file and pass it as a flag in your command:

```bash
litgpt finetune_lora results/fine-tuned-llama-3.2-1B/final/ \
--config config.yaml
```

You can use the following template for your `config.yaml`:

* [Configuration template](https://full-stack-assets.s3.eu-west-3.amazonaws.com/config.yaml)

<Note type="tip">

Keep the `results` folder somewhere. This is what you will use whenever you want to stage your model into production later on 😉

</Note>

## Pretraining

One thing to note about these commands: when you launch a fine-tuning job, you will see this output:

```bash 
Number of trainable parameters: 1,126,400
Number of non-trainable parameters: 1,100,048,384
```
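
To put the two numbers above in perspective, here is a small arithmetic sketch. The first part only uses the counts printed above. The second part shows how a LoRA adapter count of this size can arise: rank 8 on the query and value projections matches the `lora_r`, `lora_query` and `lora_value` values printed in the fine-tuning configuration, but the layer count and matrix shapes are assumptions (they are not stated in this lecture), picked because they reproduce the printed count.

```python
trainable = 1_126_400
frozen = 1_100_048_384

# Share of the weights that is actually updated during LoRA fine-tuning.
share = trainable / (trainable + frozen)
print(f"Trainable share: {share:.4%}")


def lora_params(d_in, d_out, rank):
    """A rank-r adapter adds two small matrices: (rank x d_in) and (d_out x rank)."""
    return rank * d_in + d_out * rank


# Assumed shapes (hypothetical): 22 transformer blocks, hidden size 2048,
# query projection 2048 -> 2048, value projection 2048 -> 256, rank 8.
n_layers, hidden, rank = 22, 2048, 8
per_layer = lora_params(hidden, 2048, rank) + lora_params(hidden, 256, rank)
print(n_layers * per_layer)  # 1126400
```

In other words, roughly one weight in a thousand is trained, while the rest stay frozen.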

This means that most of the model's weights are frozen and won't be trained any further. If you want to train all of the weights instead, you will need another command called `pretrain`. Let's see an example:

<Note type="important" title="Disclaimer">

As of today, pretraining (whether with Python or with the command line) is fairly limited, as custom datasets are not well supported (see the code below for more info). This should improve over time.

However, I want to stress again that pretraining LLMs from scratch is very costly. You will see poor performance unless you have:

* A 1TB dataset to train on
* A large model with large GPUs

The example below is mainly there to illustrate this point. Unless you work in a tech company that builds foundation models, you most likely won't need to train your own LLM from scratch (and we definitely don't advise you to!)

</Note>

### Prepare dataset 

Before running a training job, we need to prepare a dataset. LitGPT accepts two types of data:

- `.txt` local files
- [LitData](https://github.com/Lightning-AI/litdata): These are datasets optimized via LitData and hosted on S3

For this course we will cover the first option. You can download the whole dataset here:

- [Star Wars Text Dataset](https://full-stack-assets.s3.eu-west-3.amazonaws.com/Text_files.zip)

Now all you have to do is to run:

```bash 
# 1st line: This is a limitation, but you must specify a "base" model from the litgpt pretrain list
# 2nd line: Since we fine-tuned a model based on Llama-3.2-1B, let's further pretrain it without freezing any weights
# 3rd line: Reuse the tokenizer of the fine-tuned model
# 4th line: Define the output directory
# 5th line: Set the data type to TextFiles to read .txt files
# 6th line: Define the path where the .txt files are (here ./Text_files)
# 7th line: Set a maximum number of tokens to train on
# 8th line: Set the maximum sequence length in tokens. AS OF TODAY there is a bug in the library and you need to use a number below 2048
# See more here: https://github.com/Lightning-AI/litgpt/issues/1450
litgpt pretrain meta-llama/Llama-3.2-1B \
   --initial_checkpoint_dir results/fine-tuned-llama-3.2-1B/final \
   --tokenizer_dir results/fine-tuned-llama-3.2-1B/final \
   --out_dir ./new_pretrained_checkpoint \
   --data TextFiles \
   --data.train_data_path Text_files \
   --train.max_tokens 1_000_000 \
   --train.max_seq_length 1000
```

By default, the training job will run for 6 epochs. Once it is done, you can try your new model:

```bash 
litgpt chat new_pretrained_checkpoint/final
```

You might be a bit disappointed with the results as you might see something along the lines below:

```
him ! ! him very ... oh!l a h...!.! .!. him. oh!  !  !'.... !..! oh... it..! him  !
```

This is normal. Two phenomena come into play:

1. The dataset is relatively small and contains a lot of `him` / `very` / `oh` tokens, so the model blindly repeats those words
2. When all the weights are trained, the model can forget what it learned in previous training epochs: the loss gets higher as we go through more epochs

A solution would be to increase the size of the dataset and the number of epochs, so that the model can better learn the subtleties of the dataset. This is a great example to illustrate that continued training is the costliest option when it comes to training LLMs.
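
To check the first point on your own data, a quick sketch like the one below counts the most frequent words across the `.txt` files. It uses a simple regular expression rather than the model's tokenizer, so the counts are only a rough proxy for token frequencies. The helper name `top_words` is hypothetical, and `Text_files` is the folder used in the pretraining command above.

```python
import re
from collections import Counter
from pathlib import Path


def top_words(folder, n=10):
    """Return the n most frequent lowercase words across all .txt files in folder."""
    counts = Counter()
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        counts.update(re.findall(r"[a-z']+", text))
    return counts.most_common(n)


# Only runs if the dataset folder is present next to this script.
if Path("Text_files").is_dir():
    for word, count in top_words("Text_files"):
        print(f"{word}: {count}")
```

If a handful of words dominate the list, a small training run is likely to reproduce them over and over, as in the output above.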

<Note type="important">

If you want to follow along and run the above pretraining script, you will need to switch your Studio to **L40S** GPUs, as loading the full model and training it requires a lot of memory.

</Note>

## Python API

Playing in the terminal is great, but you might want more control and use litgpt inside an API. If that's the case, you should use the LitGPT Python API.

In [4]:
from litgpt import LLM

llm = LLM.load("checkpoints/meta-llama/Llama-3.2-1B") # You can also add the path of any of your checkpoints
text = llm.generate("What is the name of the most powerful Jedi in the galaxy?", top_k=1, max_new_tokens=300, stream=True)


# With stream=True, generate() yields the text piece by piece
for token in text:
    print(token, end="", flush=True)

 The answer is simple: Darth Vader. He is the most powerful Jedi in the galaxy, and he is the most powerful Sith in the galaxy. He is the most powerful Sith in the galaxy, and he is the most powerful Sith in the galaxy. He is the most powerful Sith in the galaxy, and he is the most powerful Sith in the galaxy. He is the most powerful Sith in the galaxy, and he is the most powerful Sith in the galaxy. He is the most powerful Sith in the galaxy, and he is the most powerful Sith in the galaxy. He is the most powerful Sith in the galaxy, and he is the most powerful Sith in the galaxy. He is the most powerful Sith in the galaxy, and he is the most powerful Sith in the galaxy. He is the most powerful Sith in the galaxy, and he is the most powerful Sith in the galaxy. He is the most powerful Sith in the galaxy, and he is the most powerful Sith in the galaxy. He is the most powerful Sith in the galaxy, and he is the most powerful Sith in the galaxy. He is the most powerful Sith in the galaxy, 
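
The answer above quickly falls into a loop. One likely contributor is `top_k=1`: the model then always picks the single most probable next token, so once a phrase repeats, nothing pushes it out of the loop. The toy sketch below illustrates the idea of top-k sampling on a made-up distribution. It is not LitGPT's implementation, and the function name `sample_top_k` and the probabilities are assumptions for illustration.

```python
import random


def sample_top_k(probs, top_k, rng):
    """Keep the top_k most likely tokens and draw one in proportion to its weight."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    tokens = [token for token, _ in ranked]
    weights = [weight for _, weight in ranked]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy next-token distribution (made-up numbers, not real model output).
next_token_probs = {"Sith": 0.40, "Jedi": 0.35, "pilot": 0.15, "droid": 0.10}
rng = random.Random(1234)

greedy = {sample_top_k(next_token_probs, 1, rng) for _ in range(20)}
varied = {sample_top_k(next_token_probs, 3, rng) for _ in range(20)}

print(greedy)          # with top_k=1, only the most likely token can ever be picked
print(sorted(varied))  # with top_k=3, several different tokens can show up
```

With the real model, trying a larger `top_k` in `llm.generate` is a cheap experiment to see whether the repetition goes away.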

<Note type="note">

LitGPT technically also supports pretraining in Python, but it is fairly limited as of today: you won't be able to use a custom dataset (the documentation shows how to, but the code won't work). However, if you are curious, feel free to check out the documentation:

- [Python API for Pretraining jobs](https://github.com/Lightning-AI/litgpt/blob/main/tutorials/python-api.md)

</Note>

## Resources 📚📚

* [LightningAI](https://lightning.ai/docs/overview/getting-started)
* [Connect to Local IDE](https://lightning.ai/docs/overview/studios/connect-local-ide)
* [LitData](https://github.com/Lightning-AI/litdata)
* [Python API](https://github.com/Lightning-AI/litgpt/blob/main/tutorials/python-api.md)