
This notebook is based on the blog post *A Beginner’s Guide to LLM Fine-Tuning* by Maxime Labonne (mlabonne): https://mlabonne.github.io/blog/posts/A_Beginners_Guide_to_LLM_Finetuning.html


In [None]:
# clone the axolotl GitHub repository and move into it
!git clone https://github.com/OpenAccess-AI-Collective/axolotl
%cd axolotl

Cloning into 'axolotl'...
remote: Enumerating objects: 6975, done.
remote: Counting objects: 100% (75/75), done.
remote: Compressing objects: 100% (43/43), done.
remote: Total 6975 (delta 32), reused 63 (delta 29), pack-reused 6900
Receiving objects: 100% (6975/6975), 2.14 MiB | 28.83 MiB/s, done.
Resolving deltas: 100% (4452/4452), done.
/content/axolotl


# Setup

In [None]:
# !pip install -q -U transformers accelerate git+https://github.com/huggingface/peft.git
# !pip install -q datasets bitsandbytes einops wandb

In [None]:
!pip install packaging
# install axolotl in editable mode with the FlashAttention extra
# (use '.[flash-attn,deepspeed]' instead to also install DeepSpeed)
!pip install -e '.[flash-attn]'

Obtaining file:///content/axolotl
  Preparing metadata (setup.py) ... done
Collecting peft@ git+https://github.com/huggingface/peft.git (from axolotl==0.3.0)
  Cloning https://github.com/huggingface/peft.git to /tmp/pip-install-qparl251/peft_f1baf948af3d4f7691d9400f6d96a494
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/peft.git /tmp/pip-install-qparl251/peft_f1baf948af3d4f7691d9400f6d96a494
  Resolved https://github.com/huggingface/peft.git to commit 0c16918c347bf32ad9532823068441f5fb76197a
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting transformers@ git+https://github.com/huggingface/transformers.git@bd6205919aad4d3a2300a39a98a642f1cc3a5348 (from axolotl==0.3.0)
  Cloning https://github.com/huggingface/transformers.git (to revision bd6205919aad4d3a2300a39a98a642f1cc3a5348) to /tmp/pip-install-qparl251

In [None]:
# remove the preinstalled NumPy (answer Y at the prompt, or add -y to skip it)
!pip uninstall numpy

Found existing installation: numpy 1.25.2
Uninstalling numpy-1.25.2:
  Would remove:
    /usr/local/bin/f2py
    /usr/local/bin/f2py3
    /usr/local/bin/f2py3.10
    /usr/local/lib/python3.10/dist-packages/numpy-1.25.2.dist-info/*
    /usr/local/lib/python3.10/dist-packages/numpy.libs/libgfortran-040039e1.so.5.0.0
    /usr/local/lib/python3.10/dist-packages/numpy.libs/libopenblas64_p-r0-5007b62f.3.23.dev.so
    /usr/local/lib/python3.10/dist-packages/numpy.libs/libquadmath-96973f99.so.0.0.0
    /usr/local/lib/python3.10/dist-packages/numpy/*
Proceed (Y/n)? Y
  Successfully uninstalled numpy-1.25.2


In [None]:
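# pin NumPy 1.24.3; note that pip warns below that axolotl 0.3.0 requires
# numpy>=1.24.4, so pinning numpy==1.24.4 instead may avoid that conflict
!pip install numpy==1.24.3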

Collecting numpy==1.24.3
  Downloading numpy-1.24.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.3/17.3 MB 61.9 MB/s eta 0:00:00
Installing collected packages: numpy
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
axolotl 0.3.0 requires numpy>=1.24.4, but you have numpy 1.24.3 which is incompatible.
Successfully installed numpy-1.24.3


In [None]:
# log in to the Hugging Face Hub (you will be prompted for an access token)
from huggingface_hub import login
login()


## Fine-tuning a Llama model with Axolotl
The growing interest in Large Language Models (LLMs) has led to a surge in tools and wrappers designed to streamline their training process.

Popular options include FastChat from LMSYS (used to train Vicuna) and Hugging Face’s transformers/**trl** libraries (used in my last fine-tuning session).

For this fine-tuning, we will use **Axolotl**, a tool created by the **OpenAccess AI Collective**, to fine-tune a Code Llama 7B model on an Evol-Instruct dataset of 1,000 Python code samples.

## Why Axolotl

The main appeal of **Axolotl** is that it provides a one-stop solution, which includes numerous features, model architectures, and an active community. Here’s a quick list of my favorite things about it:

**Configuration**: All the parameters used to train an LLM are neatly stored in a YAML config file, which makes models convenient to share and reproduce. The Axolotl repository includes example configs, for instance for Llama 2.

**Dataset Flexibility**: Axolotl allows the specification of multiple datasets with varied prompt formats such as alpaca (`{"instruction": "...", "input": "...", "output": "..."}`), sharegpt:chat (`{"conversations": [{"from": "...", "value": "..."}]}`), and raw completion (`{"text": "..."}`). Combining datasets is seamless, and the hassle of unifying the prompt format is eliminated.

**Features**: Axolotl is packed with state-of-the-art (SOTA) techniques such as FSDP, DeepSpeed, LoRA, QLoRA, ReLoRA, sample packing, GPTQ, FlashAttention, xFormers, and RoPE scaling.

**Utilities**: Numerous user-friendly utilities are integrated, such as adding or altering special tokens, or using a custom Weights & Biases (wandb) configuration.
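To make the three dataset formats listed above concrete, here is a minimal sketch that flattens a record in any of them into a single training string. The helper `to_text` and the Alpaca-style template are hypothetical and chosen for illustration only; in Axolotl the format is declared per dataset in the YAML config, and its own prompt templates differ from this one.

```python
from typing import Any, Dict

# Hypothetical Alpaca-style template, for illustration only (not Axolotl's exact format).
ALPACA_TEMPLATE = "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n{output}"


def to_text(record: Dict[str, Any]) -> str:
    """Flatten an alpaca, sharegpt-style, or raw completion record into one string."""
    if "text" in record:  # raw completion: use the text as-is
        return record["text"]
    if "conversations" in record:  # sharegpt-style chat: one "speaker: message" line per turn
        return "\n".join(f'{turn["from"]}: {turn["value"]}' for turn in record["conversations"])
    if all(key in record for key in ("instruction", "output")):  # alpaca
        return ALPACA_TEMPLATE.format(
            instruction=record["instruction"],
            input=record.get("input", ""),
            output=record["output"],
        )
    raise ValueError(f"Unrecognized record format: {sorted(record)}")


samples = [
    {"instruction": "Reverse a string in Python.", "input": "", "output": "s[::-1]"},
    {"conversations": [{"from": "human", "value": "Hi"}, {"from": "gpt", "value": "Hello!"}]},
    {"text": "def add(a, b):\n    return a + b"},
]

for sample in samples:
    print(to_text(sample))
    print("---")
```

Once every record is reduced to plain text, datasets in different formats can be mixed freely, which is the convenience the paragraph above describes.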

In [None]:
# download the training configuration (YAML) into /content
%cd /content
!wget https://gist.githubusercontent.com/mlabonne/8055f6335e2b85f082c8c75561321a66/raw/e02351e171db5fc2fe3d55121cf2ef13d3717e1e/EvolCodeLlama-7b.yaml

/content
--2023-10-10 12:27:05--  https://gist.githubusercontent.com/mlabonne/8055f6335e2b85f082c8c75561321a66/raw/e02351e171db5fc2fe3d55121cf2ef13d3717e1e/EvolCodeLlama-7b.yaml
Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1250 (1.2K) [text/plain]
Saving to: ‘EvolCodeLlama-7b.yaml’


2023-10-10 12:27:05 (108 MB/s) - ‘EvolCodeLlama-7b.yaml’ saved [1250/1250]



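NB: edit the YAML file so that it sets `tf32: true`. TF32 is only supported on NVIDIA Ampere (or newer) GPUs such as the A100; on older GPUs such as the T4, leave it set to `false`.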
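If you prefer to make this change from a notebook cell rather than by hand, here is a minimal sketch using only the standard library. It assumes `tf32` appears (if at all) as an unindented, top-level `key: value` line; the helper name `set_top_level_key` is hypothetical, and a YAML library such as PyYAML would be more robust for nested keys.

```python
import re
from pathlib import Path


def set_top_level_key(path: str, key: str, value: str) -> None:
    """Set `key: value` at the top level of a simple YAML file, appending it if missing."""
    text = Path(path).read_text()
    # Match the key only at the very start of a line (no indentation = top level).
    pattern = re.compile(rf"^{re.escape(key)}:.*$", flags=re.MULTILINE)
    if pattern.search(text):
        # Use a function replacement so backslashes in `value` are not treated as escapes.
        text = pattern.sub(lambda _: f"{key}: {value}", text)
    else:
        if text and not text.endswith("\n"):
            text += "\n"
        text += f"{key}: {value}\n"
    Path(path).write_text(text)


# Usage, with the path the wget cell above saved the config to:
# set_top_level_key("/content/EvolCodeLlama-7b.yaml", "tf32", "true")
```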

Before we start training our model, I want to introduce a few parameters that are important to understand:

**QLoRA**: We’re using QLoRA for fine-tuning, which is why we’re loading the base model in 4-bit precision (NF4 format). You can check this article from Benjamin Marie to learn more about QLoRA.
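A quick back-of-the-envelope calculation shows why 4-bit loading matters for a 7B-parameter model, and how few parameters LoRA actually trains. This sketch counts the weights alone, ignoring quantization constants, activations, and optimizer state, so real VRAM usage is higher. The rank `r=32` is an illustrative assumption; the value actually used is whatever `lora_r` the YAML config sets.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory needed to store the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight: B (d_out x r) plus A (r x d_in)."""
    return r * (d_out + d_in)


n_params = 7e9  # nominal parameter count of Code Llama 7B
for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("4-bit NF4", 4)]:
    print(f"{label:>10}: {weight_memory_gb(n_params, bits):5.1f} GB")

# One 4096 x 4096 projection (4096 is the hidden size of the 7B Llama models).
full = 4096 * 4096
added = lora_params(4096, 4096, r=32)
print(f"LoRA adds {added:,} trainable params vs {full:,} frozen ones ({added / full:.1%})")
```

The weights shrink from roughly 14 GB in 16-bit to about 3.5 GB in 4-bit, while the adapter for each adapted matrix stays around 1.6% of that matrix’s size.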

**Gradient checkpointing**: It lowers VRAM requirements by discarding some activations during the forward pass and recomputing them on demand during the backward pass. According to Hugging Face’s documentation, this slows training down by about 20%.
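The trade-off can be illustrated with a toy, pure-Python sketch (not how PyTorch implements it): the gradient of a chain of scalar functions is computed either by storing every intermediate activation, or by storing only the input of every `every`-th layer (a "checkpoint") and recomputing the rest segment by segment during the backward pass. Both give the same gradient; the checkpointed version keeps fewer activations alive at the cost of extra forward computation. In practice, Hugging Face models expose this via `model.gradient_checkpointing_enable()`, and Axolotl via its `gradient_checkpointing` config key.

```python
import math
from typing import Callable, List, Tuple

# Each toy "layer" is a pair (f, df), where df is the derivative of f.
Layer = Tuple[Callable[[float], float], Callable[[float], float]]


def grad_store_all(layers: List[Layer], x: float) -> Tuple[float, int]:
    """d(output)/dx keeping every activation. Returns (gradient, activations stored)."""
    acts = [x]
    for f, _ in layers:
        acts.append(f(acts[-1]))
    g = 1.0
    for i in reversed(range(len(layers))):
        g *= layers[i][1](acts[i])  # chain rule: multiply by f_i'(input of layer i)
    return g, len(acts)


def grad_checkpointed(layers: List[Layer], x: float, every: int) -> Tuple[float, int]:
    """Same gradient, keeping only segment inputs. Returns (gradient, checkpoints stored)."""
    checkpoints = {}
    h = x
    for i, (f, _) in enumerate(layers):
        if i % every == 0:
            checkpoints[i] = h  # keep the segment input, discard the other activations
        h = f(h)
    g = 1.0
    for start in sorted(checkpoints, reverse=True):  # walk the segments backwards
        end = min(start + every, len(layers))
        acts = [checkpoints[start]]
        for f, _ in layers[start:end - 1]:  # recompute the activations inside this segment
            acts.append(f(acts[-1]))
        for j in reversed(range(start, end)):
            g *= layers[j][1](acts[j - start])
    return g, len(checkpoints)


layers: List[Layer] = [
    (math.sin, math.cos),
    (lambda v: v * v + 1, lambda v: 2 * v),
    (math.tanh, lambda v: 1 - math.tanh(v) ** 2),
] * 4  # 12 layers

print(grad_store_all(layers, 0.5))
print(grad_checkpointed(layers, 0.5, every=4))
```

With 12 layers, the first version keeps 13 activations, while the checkpointed one keeps 3 checkpoints plus at most 4 recomputed activations at a time.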

**FlashAttention**: This implements the FlashAttention mechanism, which improves the speed and memory efficiency of our model thanks to a clever fusion of GPU operations (learn more about it in this article from Aleksa Gordić).
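Part of what makes this possible is computing softmax attention block by block with a running ("online") softmax, so the full row of attention scores never has to be materialized at once. Below is a minimal pure-Python sketch of that idea for a single query vector. It only illustrates the math; the real FlashAttention kernel fuses these steps on the GPU and works on tiles of queries as well.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_naive(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """softmax(q . k / sqrt(d)) weighted sum of values, computing all scores at once."""
    scale = 1 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(len(values[0]))]


def attention_tiled(q: Vector, keys: List[Vector], values: List[Vector], block_size: int = 2) -> Vector:
    """Same result, visiting keys/values block by block with a running max and sum."""
    scale = 1 / math.sqrt(len(q))
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])  # unnormalized weighted sum of values
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [dot(q, k) * scale for k in k_blk]
        new_max = max(running_max, max(scores))
        correction = math.exp(running_max - new_max)  # rescale earlier partial results
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vi for a, vi in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [0.1, 0.2, 0.3]
keys = [[1.0, 0.0, 0.5], [0.2, 0.1, 0.0], [0.3, 0.9, 0.4], [0.0, 0.5, 1.0], [0.7, 0.7, 0.7]]
values = [[1.0, 2.0], [0.5, 0.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
print(attention_naive(q, keys, values))
print(attention_tiled(q, keys, values, block_size=2))
```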

**Sample packing**: A smart way of creating batches with as little padding as possible by reorganizing the order of the samples (a bin packing problem). As a result, fewer batches are needed to train the model on the same dataset. It was inspired by the Multipack Sampler (see my note) and Krell et al.
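To get a feel for the idea, here is a minimal sketch that packs tokenized sample lengths into fixed-size sequences using the classic first-fit-decreasing bin-packing heuristic. The sample lengths and the 2,048-token limit are made up for illustration, and this is not Axolotl’s actual multipack implementation.

```python
from typing import List


def pack_first_fit_decreasing(lengths: List[int], max_len: int) -> List[List[int]]:
    """Group sample indices into bins whose total token count is at most max_len."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: List[List[int]] = []
    free: List[int] = []  # remaining capacity of each bin
    for i in order:
        if lengths[i] > max_len:
            raise ValueError(f"sample {i} ({lengths[i]} tokens) exceeds max_len")
        for b, room in enumerate(free):
            if lengths[i] <= room:  # first existing bin with enough room
                bins[b].append(i)
                free[b] -= lengths[i]
                break
        else:  # nothing fits: open a new bin
            bins.append([i])
            free.append(max_len - lengths[i])
    return bins


lengths = [1200, 300, 800, 1500, 400, 700, 100, 900]  # hypothetical token counts
packed = pack_first_fit_decreasing(lengths, max_len=2048)
for b in packed:
    print(b, sum(lengths[i] for i in b))
print(f"{len(lengths)} samples -> {len(packed)} packed sequences")
```

Here, eight samples that would otherwise each occupy a padded 2,048-token sequence fit into three packed sequences.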

You can find FlashAttention in some other tools, but sample packing is relatively new. As far as I know, OpenChat was the first project to use sample packing during fine-tuning. Thanks to Axolotl, we get both techniques for free.

## Fine-tune Code Llama
Now that the config file is ready, we can launch the training.

In [None]:
# older entry point, kept for reference:
# !accelerate launch /content/axolotl/scripts/finetune.py EvolCodeLlama-7b.yaml
# launch training through the axolotl CLI module
!accelerate launch -m axolotl.cli.train /content/EvolCodeLlama-7b.yaml

The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
                              dP            dP   dP 
                              88            88   88 
   .d8888b. dP.  .dP .d8888b. 88 .d8888b. d8888P 88 
   88'  `88  `8bd8'  88'  `88 88 88'  `88   88   88 
   88.  .88  .d88b.  88.  .88 88 88.  .88   88   88 
   `88888P8 dP'  `dP `88888P' dP `88888P'   dP   dP 
                                                    
                                                    

Downloading (…)lve/main/config.json: 100% 637/637 [00:00<00:00, 2.85MB/s]
[2023-10-10 12:28:01,345] [INFO] [axolotl.normalize_config:122] [PID:2766] [RANK:0] GPU memory usage baseline: 0.000GB (+0.439GB misc)
Downloading (…)okenizer_config.json: 100% 749/749 [00:00<00:00, 4.17MB/s]
Download

# Inference

In [None]:
# run inference with the base model plus the trained LoRA adapter
!accelerate launch -m axolotl.cli.inference /content/EvolCodeLlama-7b.yaml \
    --lora_model_dir="./lora-out"

The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
                              dP            dP   dP 
                              88            88   88 
   .d8888b. dP.  .dP .d8888b. 88 .d8888b. d8888P 88 
   88'  `88  `8bd8'  88'  `88 88 88'  `88   88   88 
   88.  .88  .d88b.  88.  .88 88 88.  .88   88   88 
   `88888P8 dP'  `dP `88888P' dP `88888P'   dP   dP 
                                                    
                                                    

[2023-10-10 13:01:08,420] [INFO] [axolotl.normalize_config:122] [PID:11396] [RANK:0] GPU memory usage baseline: 0.000GB (+0.439GB misc)
[2023-10-10 13:01:08,422] [INFO] [axolotl.common.cli.load_model_and_tokenizer:38] [PID:11396] [RANK:0] loading tokenizer... codellama/CodeLlama-7b-hf


In [None]:
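# copy one training record and swap in a new question
# (`trn` is assumed to be a dataset split loaded elsewhere, not in this notebook,
# whose records have "context" and "question" fields)
tst = dict(trn[3])
tst['question'] = 'Get the count of competition hosts by theme.'
tst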

In [None]:
# prompt format
fmt = """SYSTEM: Use the following contextual information to concisely answer the question.

USER: {}
===
{}
ASSISTANT:"""

In [None]:
# build the prompt from a record's SQL context and question
def sql_prompt(d):
    return fmt.format(d["context"], d["question"])

In [None]:
# printing the result
print(sql_prompt(tst))

SYSTEM: Use the following contextual information to concisely answer the question.

USER: CREATE TABLE farm_competition (Hosts VARCHAR, Theme VARCHAR)
===
Get the count of competition hosts by theme.
ASSISTANT:
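Because `trn` is not defined anywhere in this notebook, the cells above will not run on their own. Here is a self-contained sketch of the same prompt-building step; the record is hard-coded from the printed output above for illustration, and in practice it would come from your own dataset split.

```python
# Prompt template copied from the cell above: context first, then the question.
fmt = """SYSTEM: Use the following contextual information to concisely answer the question.

USER: {}
===
{}
ASSISTANT:"""


def sql_prompt(d: dict) -> str:
    """Fill the template with a record's SQL context and natural-language question."""
    return fmt.format(d["context"], d["question"])


# Record reproduced from the printed output above (hard-coded for illustration).
tst = {
    "context": "CREATE TABLE farm_competition (Hosts VARCHAR, Theme VARCHAR)",
    "question": "Get the count of competition hosts by theme.",
}
print(sql_prompt(tst))
```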
