# Finetuning using Axolotl and Python

This notebook is a minimal example of how to finetune an LLM using Axolotl and Python. Axolotl is a CLI tool that uses config files to drive different methods of LLM finetuning. We created a Python wrapper around the CLI that covers the end-to-end workflow.

In the example below, we show how you can define or load a finetuning configuration, start a finetuning job, and push the resulting model to the Hugging Face Hub.

## Setup

Make sure to run this notebook on a system with enough compute resources for finetuning, and follow the setup instructions in the README to install axolotl and related libraries.

Let's start by loading the code components we need. The `FinetuneConfig` class holds the configuration, and the `Finetune` class is used to create and run a finetuning job.

In [3]:
import sys
import os

sys.path.insert(0, os.path.join(os.getcwd(), '..'))

from finetune import Finetune, FinetuneConfig

Also make sure to log in to the Hugging Face Hub so that the output model can be saved there.

In [2]:
from huggingface_hub import login
login()

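On a remote or headless machine the interactive login widget may not be available. Below is a minimal sketch of a non-interactive alternative, assuming the access token is stored in the `HF_TOKEN` environment variable. The helper names `get_hf_token` and `login_from_env` are hypothetical and not part of this repository; only `huggingface_hub.login(token=...)` is the library's own call.

```python
import os
from typing import Optional


def get_hf_token(env_var: str = "HF_TOKEN") -> Optional[str]:
    """Return the token stored in `env_var`, or None if it is unset or blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None


def login_from_env(env_var: str = "HF_TOKEN") -> None:
    """Log in to the Hugging Face Hub without the interactive widget."""
    # Imported lazily so get_hf_token() works even without huggingface_hub installed.
    from huggingface_hub import login

    token = get_hf_token(env_var)
    if token is None:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    login(token=token)
```

Calling `login_from_env()` in place of `login()` keeps the token out of the notebook itself.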

## Configuration
Next, we load a config for QLoRA finetuning of Llama 2 from a config file stored locally. Optionally, we can set the `hub_model_id` field, which names the Hugging Face repository the finetuned model will be pushed to. The cell below also sets `eval_sample_packing` to `False`.

In [6]:
import yaml

# Specify the path to your YAML file
file_path = os.path.join(os.getcwd(), '..', 'finetune/examples/llama-2/qlora.yml')

# Open the file and load the data
with open(file_path, encoding='utf-8') as file:
    config_dict = yaml.safe_load(file)  # Load the existing data

config_dict['hub_model_id'] = 'vijil/my_lora_tune'  # Add or update the Hub repo the trained model is pushed to
config_dict['eval_sample_packing'] = False
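If you change several fields at once, it can help to leave the loaded config untouched and apply the overrides to a copy. The sketch below uses a hypothetical helper, `with_overrides`, and a small stand-in dict instead of the full YAML file. It is an illustration only and not part of the `finetune` package.

```python
import copy


def with_overrides(base: dict, overrides: dict) -> dict:
    """Return a deep copy of `base` with `overrides` applied on top."""
    merged = copy.deepcopy(base)
    merged.update(overrides)
    return merged


# Stand-in for the dict loaded from qlora.yml (only a few fields shown).
base_config = {
    "base_model": "NousResearch/Llama-2-7b-hf",
    "adapter": "qlora",
    "sample_packing": True,
}

tuned_config = with_overrides(
    base_config,
    {"hub_model_id": "vijil/my_lora_tune", "eval_sample_packing": False},
)

print(sorted(tuned_config))  # keys of the merged config
print("hub_model_id" in base_config)  # the original dict is not modified
```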

In [7]:
config_dict

{'base_model': 'NousResearch/Llama-2-7b-hf',
 'model_type': 'LlamaForCausalLM',
 'tokenizer_type': 'LlamaTokenizer',
 'load_in_8bit': False,
 'load_in_4bit': True,
 'strict': False,
 'datasets': [{'path': 'mhenrichsen/alpaca_2k_test', 'type': 'alpaca'}],
 'dataset_prepared_path': None,
 'val_set_size': 0.05,
 'output_dir': './qlora-out',
 'adapter': 'qlora',
 'lora_model_dir': None,
 'sequence_len': 4096,
 'sample_packing': True,
 'pad_to_sequence_len': True,
 'lora_r': 32,
 'lora_alpha': 16,
 'lora_dropout': 0.05,
 'lora_target_modules': None,
 'lora_target_linear': True,
 'lora_fan_in_fan_out': None,
 'wandb_project': None,
 'wandb_entity': None,
 'wandb_watch': None,
 'wandb_name': None,
 'wandb_log_model': None,
 'gradient_accumulation_steps': 4,
 'micro_batch_size': 2,
 'num_epochs': 4,
 'optimizer': 'paged_adamw_32bit',
 'lr_scheduler': 'cosine',
 'learning_rate': 0.0002,
 'train_on_inputs': False,
 'group_by_length': False,
 'bf16': 'auto',
 'fp16': None,
 'tf32': False,
 'gradi
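
Two derived quantities are worth reading off this config. The effective batch size per optimizer step is `micro_batch_size` times `gradient_accumulation_steps` times the number of GPUs, and in the standard LoRA formulation the adapter update is scaled by `lora_alpha / lora_r`. Here is a minimal sketch, assuming a single GPU (the logs in this notebook report only `RANK:0`); the helper names are illustrative.

```python
def effective_batch_size(micro_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_gpus: int = 1) -> int:
    """Number of sequences that contribute to one optimizer step."""
    return micro_batch_size * gradient_accumulation_steps * num_gpus


def lora_scaling(lora_alpha: float, lora_r: int) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return lora_alpha / lora_r


# Values copied from the config shown above.
cfg = {
    "micro_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "lora_r": 32,
    "lora_alpha": 16,
}

batch = effective_batch_size(cfg["micro_batch_size"],
                             cfg["gradient_accumulation_steps"])
scale = lora_scaling(cfg["lora_alpha"], cfg["lora_r"])

print(batch)  # 8
print(scale)  # 0.5
```

With `sample_packing` enabled, each of those sequences is a packed sequence of up to `sequence_len` tokens rather than a single training example.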

Let's now load the config dict into a `FinetuneConfig` object.

In [8]:
# see all config options in './finetune/examples/config.qmd'
config = FinetuneConfig(config_dict)

[2024-05-03 18:16:02,711] [DEBUG] [axolotl.normalize_config:79] [PID:18933] [RANK:0] bf16 support detected, enabling for this configuration.
[2024-05-03 18:16:02,835] [INFO] [axolotl.normalize_config:182] [PID:18933] [RANK:0] GPU memory usage baseline: 0.000GB (+0.423GB misc)


## Run the finetuning job
Now load the config into a `Finetune` object and kick off the job.

In [9]:
# create a finetune object with the config and run
finetune = Finetune(config)

In [None]:
finetune.run()  # start training

[INFO] Job ID: 385293fa-0979-11ef-b980-024398e8947b
[2024-05-03 18:16:11,511] [DEBUG] [axolotl.load_tokenizer:279] [PID:18933] [RANK:0] EOS: 2 / </s>
[2024-05-03 18:16:11,512] [DEBUG] [axolotl.load_tokenizer:280] [PID:18933] [RANK:0] BOS: 1 / <s>
[2024-05-03 18:16:11,512] [DEBUG] [axolotl.load_tokenizer:281] [PID:18933] [RANK:0] PAD: 0 / <unk>
[2024-05-03 18:16:11,513] [DEBUG] [axolotl.load_tokenizer:282] [PID:18933] [RANK:0] UNK: 0 / <unk>
[2024-05-03 18:16:11,513] [INFO] [axolotl.load_tokenizer:293] [PID:18933] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2024-05-03 18:16:11,514] [INFO] [axolotl.load_tokenized_prepared_datasets:183] [PID:18933] [RANK:0] Unable to find prepared dataset in last_run_prepared/5c09ab14f30fc60ead5860bcbbb2e263
[2024-05-03 18:16:11,514] [INFO] [axolotl.load_tokenized_prepared_datasets:184] [PID:18933] [RANK:0] Loading raw datasets...
[2024-05-03 18:16:11,515] [INFO] [axolotl.lo

Saving the dataset (0/1 shards):   0%|          | 0/2000 [00:00<?, ? examples/s]

[2024-05-03 18:16:18,626] [DEBUG] [axolotl.log:61] [PID:18933] [RANK:0] total_num_tokens: 414_041
[2024-05-03 18:16:18,666] [DEBUG] [axolotl.log:61] [PID:18933] [RANK:0] `total_supervised_tokens: 294_246`
[2024-05-03 18:16:22,644] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:18933] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 414041
[2024-05-03 18:16:22,644] [DEBUG] [axolotl.log:61] [PID:18933] [RANK:0] data_loader_len: 6
[2024-05-03 18:16:22,645] [INFO] [axolotl.log:61] [PID:18933] [RANK:0] sample_packing_eff_est across ranks: [0.9719637357271634]
[2024-05-03 18:16:22,645] [DEBUG] [axolotl.log:61] [PID:18933] [RANK:0] sample_packing_eff_est: 0.98
[2024-05-03 18:16:22,646] [DEBUG] [axolotl.log:61] [PID:18933] [RANK:0] total_num_steps: 24
[2024-05-03 18:16:22,656] [DEBUG] [axolotl.train.log:61] [PID:18933] [RANK:0] loading tokenizer... NousResearch/Llama-2-7b-hf




[2024-05-03 18:16:22,908] [DEBUG] [axolotl.load_tokenizer:279] [PID:18933] [RANK:0] EOS: 2 / </s>
[2024-05-03 18:16:22,909] [DEBUG] [axolotl.load_tokenizer:280] [PID:18933] [RANK:0] BOS: 1 / <s>
[2024-05-03 18:16:22,909] [DEBUG] [axolotl.load_tokenizer:281] [PID:18933] [RANK:0] PAD: 0 / <unk>
[2024-05-03 18:16:22,910] [DEBUG] [axolotl.load_tokenizer:282] [PID:18933] [RANK:0] UNK: 0 / <unk>
[2024-05-03 18:16:22,910] [INFO] [axolotl.load_tokenizer:293] [PID:18933] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2024-05-03 18:16:22,911] [DEBUG] [axolotl.train.log:61] [PID:18933] [RANK:0] loading model and peft_config...
[2024-05-03 18:16:23,278] [INFO] [axolotl.load_model:359] [PID:18933] [RANK:0] patching with flash attention for sample packing
[2024-05-03 18:16:23,279] [INFO] [axolotl.load_model:408] [PID:18933] [RANK:0] patching _expand_mask
[2024-05-03 18:16:23,604] [INFO] [accelerate.utils.modeling.get

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]



[2024-05-03 18:18:03,505] [INFO] [axolotl.load_model:720] [PID:18933] [RANK:0] GPU memory usage after model load: 3.710GB (+0.255GB cache, +0.709GB misc)
[2024-05-03 18:18:03,524] [INFO] [axolotl.load_model:771] [PID:18933] [RANK:0] converting PEFT model w/ prepare_model_for_kbit_training
[2024-05-03 18:18:03,528] [INFO] [axolotl.load_model:780] [PID:18933] [RANK:0] converting modules to torch.bfloat16 for flash attention
[2024-05-03 18:18:03,533] [INFO] [axolotl.load_lora:924] [PID:18933] [RANK:0] found linear modules: ['gate_proj', 'o_proj', 'q_proj', 'up_proj', 'k_proj', 'v_proj', 'down_proj']
trainable params: 79,953,920 || all params: 6,818,369,536 || trainable%: 1.172625208678628
[2024-05-03 18:18:04,304] [INFO] [axolotl.load_model:825] [PID:18933] [RANK:0] GPU memory usage after adapters: 3.859GB (+1.239GB cache, +0.709GB misc)
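
The trainable parameter count reported in the log can be reproduced by hand. LoRA adds two low-rank matrices to each adapted linear layer, one of shape `(r, d_in)` and one of shape `(d_out, r)`, which comes to `r * (d_in + d_out)` parameters per layer. The sketch below assumes the published Llama 2 7B dimensions (hidden size 4096, MLP intermediate size 11008, 32 decoder layers) together with `lora_r = 32` from the config and the seven linear modules listed in the log.

```python
HIDDEN = 4096          # assumed: Llama 2 7B hidden size
INTERMEDIATE = 11008   # assumed: Llama 2 7B MLP intermediate size
NUM_LAYERS = 32        # assumed: Llama 2 7B decoder layers
LORA_R = 32            # lora_r from the config

# (d_in, d_out) of each linear module the log reports as adapted.
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, LORA_R)
                for d_in, d_out in LINEAR_SHAPES.values())
trainable = per_layer * NUM_LAYERS

ALL_PARAMS = 6_818_369_536  # "all params" as reported in the log
trainable_pct = 100 * trainable / ALL_PARAMS

print(f"{trainable:,}")  # 79,953,920
print(trainable_pct)
```

The result matches the `trainable params` line, which suggests that only the LoRA matrices are being trained in this run.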


Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


[2024-05-03 18:18:04,483] [INFO] [axolotl.train.log:61] [PID:18933] [RANK:0] Pre-saving adapter config to ./qlora-out
[2024-05-03 18:18:04,487] [INFO] [axolotl.train.log:61] [PID:18933] [RANK:0] Starting trainer...
[2024-05-03 18:18:04,739] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:18933] [RANK:0] packing_efficiency_estimate: 0.98 total_num_tokens per device: 414041
[2024-05-03 18:18:04,741] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:18933] [RANK:0] packing_efficiency_estimate: 0.98 total_num_tokens per device: 414041
[2024-05-03 18:18:04,795] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:18933] [RANK:0] packing_efficiency_estimate: 0.98 total_num_tokens per device: 414041




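| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 1 | 1.1521 | 1.170463 |
| 3 | 1.0962 | 1.166554 |
| 6 | 1.0649 | 1.107039 |
| 9 | 0.9359 | 0.985908 |
| 12 | 0.9264 | 0.948929 |
| 15 | 0.9515 | 0.936418 |
| 18 | 0.9272 | 0.921776 |
| 21 | 0.8475 | 0.910943 |
| 24 | 0.973 | 0.903616 |
| 27 | 0.8718 | 0.89928 |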
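
The logged losses can be summarized with a few lines of standard-library Python. This is a rough sketch that copies the values above into a CSV string; it checks whether the validation loss fell at every evaluation and computes its relative reduction over the run. The variable names are illustrative.

```python
import csv
import io

# Loss values copied from the training output above.
LOSS_LOG = """Step,Training Loss,Validation Loss
1,1.1521,1.170463
3,1.0962,1.166554
6,1.0649,1.107039
9,0.9359,0.985908
12,0.9264,0.948929
15,0.9515,0.936418
18,0.9272,0.921776
21,0.8475,0.910943
24,0.973,0.903616
27,0.8718,0.89928
"""

rows = list(csv.DictReader(io.StringIO(LOSS_LOG)))
val_losses = [float(row["Validation Loss"]) for row in rows]

# True if each evaluation improved on the previous one.
val_always_decreasing = all(
    later < earlier for earlier, later in zip(val_losses, val_losses[1:])
)

# Relative drop in validation loss from the first to the last evaluation.
relative_reduction = (val_losses[0] - val_losses[-1]) / val_losses[0]

print(val_always_decreasing)
print(f"{relative_reduction:.1%}")
```

The training loss is noisier than the validation loss because each logged value comes from a single optimizer step.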


[2024-05-03 18:20:43,476] [INFO] [accelerate.accelerator.log:61] [PID:18933] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.
[2024-05-03 18:21:16,142] [INFO] [axolotl.callbacks.on_step_end:125] [PID:18933] [RANK:0] GPU memory usage while training: 3.875GB (+7.346GB cache, +1.358GB misc)
[2024-05-03 18:23:53,372] [INFO] [accelerate.accelerator.log:61] [PID:18933] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.
[2024-05-03 18:27:35,881] [INFO] [accelerate.accelerator.log:61] [PID:18933] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.
[2024-05-03 18:31:18,414] [INFO] [accelerate.accelerator.log:61] [PID:18933] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.
[2024-05-03 18:35:00,929] [INFO] [accelerate.accelerator.log:61] [PID:18933] The used dataset had no length, returning gathe



[2024-05-03 18:35:36,859] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:18933] [RANK:0] packing_efficiency_estimate: 0.98 total_num_tokens per device: 414041
[2024-05-03 18:38:46,701] [INFO] [accelerate.accelerator.log:61] [PID:18933] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.
[2024-05-03 18:42:29,216] [INFO] [accelerate.accelerator.log:61] [PID:18933] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.
[2024-05-03 18:46:11,786] [INFO] [accelerate.accelerator.log:61] [PID:18933] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.
[2024-05-03 18:49:54,324] [INFO] [accelerate.accelerator.log:61] [PID:18933] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.




[2024-05-03 18:50:57,994] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:18933] [RANK:0] packing_efficiency_estimate: 0.98 total_num_tokens per device: 414041
[2024-05-03 18:53:35,211] [INFO] [accelerate.accelerator.log:61] [PID:18933] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.
[2024-05-03 18:57:17,625] [INFO] [accelerate.accelerator.log:61] [PID:18933] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.
[2024-05-03 19:01:00,064] [INFO] [accelerate.accelerator.log:61] [PID:18933] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.
[2024-05-03 19:04:42,526] [INFO] [accelerate.accelerator.log:61] [PID:18933] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.




[2024-05-03 19:08:27,541] [INFO] [accelerate.accelerator.log:61] [PID:18933] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.
[2024-05-03 19:08:27,645] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:18933] [RANK:0] packing_efficiency_estimate: 0.98 total_num_tokens per device: 414041
[2024-05-03 19:12:10,016] [INFO] [accelerate.accelerator.log:61] [PID:18933] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.
[2024-05-03 19:15:52,487] [INFO] [accelerate.accelerator.log:61] [PID:18933] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.
[2024-05-03 19:19:34,980] [INFO] [accelerate.accelerator.log:61] [PID:18933] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.




[2024-05-03 19:19:44,329] [INFO] [axolotl.train.log:61] [PID:18933] [RANK:0] Training Completed!!! Saving pre-trained model to ./qlora-out




adapter_model.bin:   0%|          | 0.00/160M [00:00<?, ?B/s]

## Status
To check the job status, use the `finetune_axolotl-status.ipynb` notebook.