# **Basic Training: Fine-Tuning an LLM with Axolotl**

Follow along for a basic rundown of running your first fine-tune with Axolotl!

## **Resources Used:**
- [Axolotl GitHub Repo](https://github.com/OpenAccess-AI-Collective/axolotl) 
- [Hamel's Blog](https://hamel.dev/blog/posts/axolotl/) (Debugging)
- [Axolotl Docs](https://openaccess-ai-collective.github.io/axolotl/docs/debugging.html#general-tips) (Debugging)
- [jarvislabs Axolotl QuickStart](https://jarvislabs.ai/templates/axolotl) 

## **General Workflow:**
- Select our Framework (Axolotl)
- Pick a model from the examples folder in Axolotl's GitHub repository
- Pick the dataset you want the model to train on
- Create a GPU instance with enough storage for training
- Launch the GPU instance in the Cloud
- Clone the repo, cd into it, create a venv
- Download dependencies
- Authenticate with Hugging Face: some models and datasets require this
- Run a preprocessing dataset command for the model
- Run a training command
- Run an inference command
- Debug

**Note**: If you're a beginner, much of the vernacular will be foreign to you, as it was to me. It's best not to worry about details (the difference between lora/qlora, how to evaluate models and datasets, etc.) for the first few iterations. Once you get a feel for it, come back and visit the notes section or the links embedded in these blogs to get some context.

## **Start**
### **1. Selecting a Framework** 

I'll start by running my first fine-tuning job with Axolotl. The Mastering LLMs conference bundle came with tons of compute (GPU-power) from various GPU providers, so I arbitrarily picked [jarvislabs.ai](https://jarvislabs.ai/). It's time to fine-tune!

Once inside JarvisLabs (after making an account, etc.), I clicked on Axolotl when given the option to select a framework for training. Then hit "run on cloud". Note that you may have to pay for compute on Jarvis if you don't already own some for training.

### **2 & 3. Pick a Dataset and Model**
Let's start by visiting the [Axolotl GitHub Repo](https://github.com/OpenAccess-AI-Collective/axolotl) and navigating to the examples folder. It contains many models that have been run on Axolotl, each with one or more configuration (config) files that were used to train it. Pick a model (try tiny-llama, for example, as it's better to start with smaller models for debugging purposes). Inside, you can see various configs. Click on lora.yml. Notice the parameters it sets, like data paths, tokenization settings, and any preprocessing steps needed for the dataset. These YAML files also specify the model type, learning rate, batch size, and more.

Examine the dataset in the YAML file. This example configuration lists [alpaca_2k_test](https://huggingface.co/datasets/mhenrichsen/alpaca_2k_test) as the dataset used for training. Let's stick with this.
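To get a feel for what a config like this holds, here is a minimal sketch that pulls the top-level `key: value` pairs out of a YAML-style string. The excerpt and its values are illustrative assumptions (check the real lora.yml in the repo), and `top_level_scalars` is a hypothetical helper, not a real YAML parser: it ignores nested entries such as the `datasets:` list.

```python
# Illustrative excerpt in the style of an Axolotl LoRA config.
# The values are assumptions for demonstration, not copied from the repo.
EXAMPLE_CONFIG = """\
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
adapter: lora
datasets:
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca
micro_batch_size: 2
num_epochs: 4
learning_rate: 0.0002
output_dir: ./outputs/lora-out
"""


def top_level_scalars(text):
    """Collect `key: value` pairs that start at column 0 (no nesting support)."""
    pairs = {}
    for line in text.splitlines():
        # Skip blanks, indented (nested) lines, comments, and list items.
        if not line or line[0] in " #-" or ":" not in line:
            continue
        key, _, value = line.partition(":")
        if value.strip():  # keys with nested content have no inline value
            pairs[key.strip()] = value.strip()
    return pairs


for key, value in top_level_scalars(EXAMPLE_CONFIG).items():
    print(f"{key:>18} = {value}")
```

For real work, a proper YAML library is the better choice; this is only meant to make the structure of the file less intimidating on a first read.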

### **4 & 5. Pick a GPU and Launch**

Switch back to JarvisLabs. It's time to pick a GPU to train our model. This choice dictates how much CPU, RAM, and VRAM will be available to you when training. I picked the 1 x RTX6000Ada. Now, skip the rest of the options and launch it! Select VSCode or Jupyter (whichever is more comfortable) to open the instance in the cloud.

### **6 & 7. Clone Axolotl and Create a Venv**

First, clone the Axolotl repository and navigate inside it. Then, create your virtual environment, where we will begin to install dependencies. These package installs can easily cause dependency issues when running later commands, so it's important to keep track of which dependencies you install. 

```sh
git clone https://github.com/OpenAccess-AI-Collective/axolotl
cd axolotl
```

```sh
python3 -m venv axolotl-env
source axolotl-env/bin/activate
```

```sh
pip3 install packaging
pip3 install -e '.[flash-attn,deepspeed]'
```
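Since dependency issues are easy to run into here, a quick sanity check can help. The sketch below (my own helper names, nothing Axolotl-specific) reports whether the running interpreter belongs to a virtual environment and how many packages `pip freeze` sees, which gives you a baseline to compare against after later installs.

```python
import subprocess
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv."""
    return sys.prefix != sys.base_prefix


def installed_packages():
    """Return the lines of `pip freeze` for the current interpreter."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True,
        text=True,
        check=False,  # an empty list is returned if pip is unavailable
    )
    return result.stdout.splitlines()


print("inside a venv:", in_virtualenv())
print("packages installed:", len(installed_packages()))
```

Saving the output of `installed_packages()` to a file before and after an install makes it easier to spot which package changed when something breaks.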

### **8. Authenticate to HF (if necessary)**

Sometimes, you need to accept permissions to use a model or dataset on HuggingFace. Find your HuggingFace token at [HF Token](https://huggingface.co/settings/tokens), then log in with the commands below.

```sh
pip install huggingface_hub
huggingface-cli login
```

### **9, 10, 11. Preprocess, Train, and Inference the Model**

This is where the money gets made! You can use any config in the following commands, but we'll use examples/tiny-llama/lora.yml.

**Preprocessing Datasets (optional):**

```sh
python -m axolotl.cli.preprocess examples/tiny-llama/lora.yml
```

This runs the preprocess module from the axolotl.cli package on the config file, which specifies the dataset and preprocessing options. If you want to use particular GPUs, set CUDA_VISIBLE_DEVICES to the IDs of those GPUs (e.g., 0 for the first GPU). If you want to use all available GPUs, leave the variable unset. If you set it to an empty string (CUDA_VISIBLE_DEVICES=""), no GPU is visible and the run uses only the CPU.

Preprocess reads the yml file to understand how the dataset should be processed, then tokenizes the raw text data into a format suitable for the model. Tokenizers are often specific to the model itself. Any dataset transformations specified in the yml (such as prompt formatting) are applied to the dataset here. The preprocessed dataset is saved in a ready-to-go format for the training step.
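The three CUDA_VISIBLE_DEVICES cases described above can be sketched in Python as an environment you hand to a subprocess. `env_with_gpus` is a hypothetical helper of mine, not part of Axolotl; the command list is the preprocess command from this post and is only defined here, not executed.

```python
import os

# The preprocess command from this post, as an argument list.
PREPROCESS_CMD = [
    "python", "-m", "axolotl.cli.preprocess", "examples/tiny-llama/lora.yml",
]


def env_with_gpus(gpu_ids):
    """Copy the current environment, restricting CUDA to `gpu_ids`.

    None leaves the variable untouched (all GPUs stay visible);
    an empty list sets it to "" and hides every GPU.
    """
    env = dict(os.environ)
    if gpu_ids is not None:
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


print(repr(env_with_gpus([0])["CUDA_VISIBLE_DEVICES"]))     # '0'
print(repr(env_with_gpus([0, 1])["CUDA_VISIBLE_DEVICES"]))  # '0,1'
print(repr(env_with_gpus([])["CUDA_VISIBLE_DEVICES"]))      # ''
```

To actually launch the step on the first GPU, you would pass the result along, e.g. `subprocess.run(PREPROCESS_CMD, env=env_with_gpus([0]), check=True)`.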

**Fine Tune the Model:**

```sh
accelerate launch -m axolotl.cli.train examples/tiny-llama/lora.yml
```

This accelerate launch command launches the training process using the accelerate library, which handles distributed training and optimizes usage of available hardware resources.

-m specifies the module to run (the train module from axolotl.cli). We then pass the tiny-llama LoRA config, which tells the module which model and dataset to train on.

Here's a quick rundown of what is happening during the training loop:
- Forward Pass: Data is passed through the model to compute predictions.
- Loss Calculation: The difference between the predictions and the actual labels (targets) is computed as the loss.
- Backward Pass: Gradients of the loss with respect to the trainable weights are computed using backpropagation.
- Optimization Step: The optimizer adjusts the model weights based on the computed gradients.
- Checkpointing: Periodically, the model state is saved (checkpoints) to allow for resuming training or for future inference.
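The five steps above can be sketched in plain Python on a toy problem. This is not what Axolotl runs internally; it is a minimal illustration that fits a line to four points, with hand-written gradients standing in for backpropagation and JSON files standing in for checkpoints. All names and hyperparameters are my own choices.

```python
import json
import os
import tempfile

# Toy dataset following y = 2x + 1; the "model" is just w * x + b.
data = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
w, b = 0.0, 0.0
learning_rate = 0.05
checkpoint_dir = tempfile.mkdtemp()

for step in range(1, 501):
    # Forward pass: compute predictions for every input.
    preds = [w * x + b for x, _ in data]

    # Loss calculation: mean squared error against the targets.
    loss = sum((p - y) ** 2 for p, (_, y) in zip(preds, data)) / len(data)

    # Backward pass: gradients of the loss with respect to w and b.
    grad_w = sum(2 * (p - y) * x for p, (x, y) in zip(preds, data)) / len(data)
    grad_b = sum(2 * (p - y) for p, (_, y) in zip(preds, data)) / len(data)

    # Optimization step: plain gradient descent update.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

    # Checkpointing: save the current state every 100 steps.
    if step % 100 == 0:
        path = os.path.join(checkpoint_dir, f"checkpoint-{step}.json")
        with open(path, "w") as f:
            json.dump({"step": step, "w": w, "b": b, "loss": loss}, f)

print(f"w={w:.3f} b={b:.3f} loss={loss:.6f}")
print("checkpoints:", sorted(os.listdir(checkpoint_dir)))
```

A real fine-tune swaps the line for a language model, the hand-written gradients for automatic differentiation, and plain gradient descent for an optimizer such as AdamW, but the shape of the loop is the same.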

**Inference:**

```sh
accelerate launch -m axolotl.cli.inference examples/tiny-llama/lora.yml \
    --lora_model_dir="./outputs/lora-out" --gradio
```

We pass the same config file as an argument to the inference module from the axolotl.cli package.

Option: 
```sh
--lora_model_dir="./outputs/lora-out"
```
Explanation: This specifies the directory where the fine-tuned weights are stored. The inference process will load the model from this directory.

Under the hood:
- Loading Model: The fine-tuned model, including the LoRA-specific weights, is loaded from the specified directory.
- Inference Loop: The model processes the input data to generate predictions. This involves two steps:
  - Forward Pass: Input data is passed through the model to generate predictions.
  - Post-Processing: The raw output from the model is converted into a human-readable or application-specific format.
- Output: The predictions are returned to the user.
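To make the inference loop concrete, here is a toy sketch. Everything in it is a stand-in I made up for illustration: a four-word vocabulary, a whitespace tokenizer, and a lookup table playing the role of the model. It is not Axolotl's inference code, but it walks through the same stages: tokenize, forward pass, pick the next token, and post-process back into text.

```python
# Toy stand-ins: a tiny vocabulary and a lookup-table "model".
VOCAB = ["<eos>", "hello", "world", "again"]
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}

# NEXT_SCORES[current_id] holds one score per vocabulary entry.
NEXT_SCORES = {
    1: [0.1, 0.0, 0.8, 0.1],    # after "hello", "world" scores highest
    2: [0.2, 0.0, 0.1, 0.7],    # after "world", "again" scores highest
    3: [0.9, 0.0, 0.05, 0.05],  # after "again", "<eos>" scores highest
}


def forward(token_ids):
    """Forward pass: return next-token scores given the tokens so far."""
    return NEXT_SCORES[token_ids[-1]]


def generate(prompt, max_new_tokens=5):
    # Tokenize the prompt into ids.
    ids = [TOKEN_TO_ID[token] for token in prompt.split()]
    for _ in range(max_new_tokens):
        scores = forward(ids)
        # Greedy decoding: take the highest-scoring token.
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if VOCAB[next_id] == "<eos>":
            break
        ids.append(next_id)
    # Post-processing: turn ids back into readable text.
    return " ".join(VOCAB[i] for i in ids)


print(generate("hello"))  # hello world again
```

Real models usually sample from the scores rather than always taking the top one, which is why the same prompt can produce different answers between runs.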

### **12. Debugging**

You'll want to change parameters in your yaml file between iterations to learn, but sometimes you'll need to change them just to debug. Use these links for common ways to debug your yaml file and other tips:
- [Hamel's Blog](https://hamel.dev/blog/posts/axolotl/) (Debugging)
- [Axolotl Docs](https://openaccess-ai-collective.github.io/axolotl/docs/debugging.html#general-tips) (Debugging)

Aside from that, here are some parameters I've changed throughout various runs that have saved me from errors:
- use_reentrant: flip to False
- flash_attention: flip to False
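One way to keep debug changes like these separate from your main settings is to hold them as a small set of overrides. The sketch below is a generic pattern, not an Axolotl feature. The flat dictionary is an assumption for illustration: where `use_reentrant` actually sits in a real config (top level or nested under another key) is worth confirming in the Axolotl docs.

```python
import copy

# Hypothetical config excerpt; the keys mirror the two parameters named above.
config = {"flash_attention": True, "use_reentrant": True, "num_epochs": 4}
debug_overrides = {"flash_attention": False, "use_reentrant": False}


def apply_overrides(base, overrides):
    """Return a copy of `base` with `overrides` applied, plus the changed keys."""
    updated = copy.deepcopy(base)
    changed = []
    for key, value in overrides.items():
        if updated.get(key) != value:
            changed.append(key)
        updated[key] = value
    return updated, changed


debug_config, changed = apply_overrides(config, debug_overrides)
print("changed for debugging:", changed)
print("original untouched:", config["flash_attention"])
```

Keeping the original config untouched makes it easy to go back once the error is resolved.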