# SFT - Instruction tuning the Aguila 7B model with Axolotl

This notebook is based on the [official notebook](https://github.com/OpenAccess-AI-Collective/axolotl#egg=axolotl) by OpenAccess-AI-Collective for running Axolotl on Google Colab to tune a Falcon model, and on the excellent work by [Maxime Labonne](https://github.com/mlabonne/llm-course/blob/main/Fine_tune_LLMs_with_Axolotl.ipynb).


<a href="https://colab.research.google.com/github/edumunozsala/llama-2-7B-4bit-python-coder/blob/main/Aguila-7B-Instruction-tuned-Axolot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Axolotl

Axolotl is a tool designed to streamline the fine-tuning of various AI models, offering support for multiple configurations and architectures.

Features:
- Train various Huggingface models such as llama, pythia, falcon, mpt
- Supports fullfinetune, lora, qlora, relora, and gptq
- Customize configurations using a simple yaml file or CLI overwrite
- Load different dataset formats, use custom formats, or bring your own tokenized datasets
- Integrated with xformer, flash attention, rope scaling, and multipacking
- Works with single GPU or multiple GPUs via FSDP or Deepspeed
- Easily run with Docker locally or on the cloud
- Log results and optionally checkpoints to wandb or mlflow

[GitHub repo](https://github.com/OpenAccess-AI-Collective/axolotl)

## Install the libraries

In [None]:
!pip install torch=="2.1.2"
!pip install -e git+https://github.com/OpenAccess-AI-Collective/axolotl#egg=axolotl
!pip install flash-attn=="2.5.0"
!pip install deepspeed=="0.13.1"
!pip install huggingface_hub
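
Because the versions above are pinned, it can be worth confirming that the pins actually took effect before training. The snippet below is a minimal sketch using only the standard library; `installed_version` and `check_pins` are hypothetical helpers, and the distribution names in `EXPECTED` are assumed to match what pip registered.

```python
from importlib import metadata

# Pins taken from the install cell above
EXPECTED = {"torch": "2.1.2", "flash-attn": "2.5.0", "deepspeed": "0.13.1"}

def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def check_pins(expected):
    """Map each package to a tuple (expected, installed, matches)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        # Local builds may carry a suffix such as "+cu121"; compare the public part only
        matches = found is not None and found.split("+")[0] == wanted
        report[package] = (wanted, found, matches)
    return report

for package, (wanted, found, ok) in check_pins(EXPECTED).items():
    print(f"{package}: expected {wanted}, found {found}, {'OK' if ok else 'MISMATCH'}")
```

A mismatch usually means a later install replaced one of the pinned packages, in which case re-running the corresponding `pip install` line is the simplest fix.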


## Load and review the dataset

In [None]:
!pip install datasets


In [None]:
from datasets import load_dataset

dataset_name = "MBZUAI/Bactrian-X"
dataset = load_dataset(dataset_name, "es", split="train")

Downloading data:   0%|          | 0.00/31.6M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/67017 [00:00<?, ? examples/s]

In [None]:
dataset[0]

{'instruction': 'Crea un ejemplo de algo que represente la metáfora dada.',
 'input': 'Más vale pájaro en mano que ciento volando',
 'id': 'alpaca-16757',
 'output': 'Es mejor aceptar la oferta de trabajo que te ofrecen con un salario ya establecido, que estar buscando por meses uno que te pague mucho más, pero que aún no has encontrado. Más vale tener un ingreso seguro que estar esperando por uno hipotético e incierto.'}

In [None]:
dataset = dataset.remove_columns(column_names=["id"])

In [None]:
# Select only 50,000 rows to reduce training time
dataset = dataset.select(range(50_000))

In [None]:
dataset

Dataset({
    features: ['instruction', 'input', 'output'],
    num_rows: 50000
})
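
The three remaining columns are the fields an Alpaca-style prompt is assembled from. As a rough sketch, the function below renders one record with the original Stanford Alpaca template. `render_alpaca` is a hypothetical helper, the sample record is made up, and the exact template text Axolotl applies to an Alpaca-type dataset may differ, so treat the wording as an assumption.

```python
# Hypothetical helper: renders one record with the Stanford Alpaca template.
# The exact wording Axolotl uses internally is an assumption here.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def render_alpaca(record):
    """Return (prompt, target) for a dict with instruction/input/output keys."""
    # Records with an empty input use the shorter template
    template = PROMPT_WITH_INPUT if record.get("input") else PROMPT_NO_INPUT
    prompt = template.format(
        instruction=record["instruction"], input=record.get("input", "")
    )
    return prompt, record["output"]

# Made-up record, only to show the shape of the result
sample = {"instruction": "Traduce al inglés.", "input": "Hola", "output": "Hello"}
prompt, target = render_alpaca(sample)
print(prompt + target)
```

Splitting the result into a prompt and a target mirrors the usual instruction-tuning setup, where the loss can be restricted to the target tokens.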

In [None]:
dataset.push_to_hub("edumunozsala/Bactrian-X-es-50k")

Uploading the dataset shards:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/50 [00:00<?, ?ba/s]

CommitInfo(commit_url='https://huggingface.co/datasets/edumunozsala/Bactrian-X-es-50k/commit/fa237d67844582b3130fad41698a20c899af456b', commit_message='Upload dataset', commit_description='', oid='fa237d67844582b3130fad41698a20c899af456b', pr_url=None, pr_revision=None, pr_num=None)

## Create the config.yaml file

In [None]:
import yaml

This section creates the YAML configuration file that defines all the parameters for the model, the dataset, quantization, LoRA, and training.

In [None]:
new_model = "edumunozsala/aguila-7b-instructft-bactrian-x"

yaml_string = """
base_model: projecte-aina/aguila-7b
# required by falcon custom model code: https://huggingface.co/tiiuae/falcon-7b/tree/main
trust_remote_code: true
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
is_falcon_derived_model: true
load_in_8bit: false
# enable 4bit for QLoRA
load_in_4bit: true
gptq: false
strict: false

push_dataset_to_hub:
datasets:
  - path: edumunozsala/Bactrian-X-es-50k
    type: alpaca
dataset_prepared_path:
val_set_size: 0.05
# enable QLoRA
adapter: qlora
lora_model_dir:
sequence_len: 2048
max_packed_sequence_len:

# hyperparameters from QLoRA paper Appendix B.2
# "We find hyperparameters to be largely robust across datasets"
lora_r: 64
lora_alpha: 16
# 0.1 for models up to 13B
# 0.05 for 33B and 65B models
lora_dropout: 0.05
# add LoRA modules on all linear layers of the base model
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

output_dir: ./qlora-out

# QLoRA paper Table 9
# - 16 for 7b & 13b
# - 32 for 33b, 64 for 65b
# Max size tested on A6000
# - 7b: 40
# - 40b: 4
# decrease if OOM, increase for max VRAM utilization
micro_batch_size: 4
gradient_accumulation_steps: 2
num_epochs: 2
# Optimizer for QLoRA
optimizer: paged_adamw_32bit
torchdistx_path:
lr_scheduler: cosine
# QLoRA paper Table 9
# - 2e-4 for 7b & 13b
# - 1e-4 for 33b & 65b
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true
gradient_checkpointing: true
# stop training after this many evaluation losses have increased in a row
# https://huggingface.co/transformers/v4.2.2/_modules/transformers/trainer_callback.html#EarlyStoppingCallback
# early_stopping_patience: 3
resume_from_checkpoint:
auto_resume_from_checkpoints: true
local_rank:
logging_steps: 200
xformers_attention: true
flash_attention:
gptq_groupsize:
gptq_model_v1:
warmup_steps: 10
evals_per_epoch: 1
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.000001
fsdp:
fsdp_config:
special_tokens:
  pad_token: "<|endoftext|>"
  bos_token: "<|endoftext|>"
  eos_token: "<|endoftext|>"
"""

# Convert the YAML string to a Python dictionary
yaml_dict = yaml.safe_load(yaml_string)

# Specify your file path
yaml_file = 'config.yaml'

# Write the YAML file
with open(yaml_file, 'w') as file:
    yaml.dump(yaml_dict, file)
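
To get a feel for what these settings imply, the arithmetic below estimates the effective batch size and the number of optimizer steps. It is only a sketch: it assumes a single GPU, no sample packing, and the 50,000-row dataset pushed earlier, so the step count reported by the trainer may differ slightly.

```python
import math

# Values copied from the config above
micro_batch_size = 4
gradient_accumulation_steps = 2
num_epochs = 2
val_set_size = 0.05

# Assumptions (not part of the config): one GPU, 50,000 rows, no sample packing
num_gpus = 1
num_rows = 50_000

# Sequences consumed per optimizer step
effective_batch = micro_batch_size * gradient_accumulation_steps * num_gpus

# Rows held out for evaluation versus rows used for training
eval_rows = round(num_rows * val_set_size)
train_rows = num_rows - eval_rows

steps_per_epoch = math.ceil(train_rows / effective_batch)
total_steps = steps_per_epoch * num_epochs

print(f"effective batch size: {effective_batch}")
print(f"train rows: {train_rows}, eval rows: {eval_rows}")
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
```

Under these assumptions, `logging_steps: 200` yields a few dozen log lines per epoch, and `warmup_steps: 10` is a very small fraction of the run.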

## Run the training

Now we can launch the training with a single CLI command.

In [None]:
!accelerate launch -m axolotl.cli.train config.yaml

The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
[2024-02-15 06:37:33,438] [INFO] [datasets.<module>:58] [PID:4680] PyTorch version 2.1.2 available.
[2024-02-15 06:37:33,439] [INFO] [datasets.<module>:95] [PID:4680] TensorFlow version 2.15.0 available.
[2024-02-15 06:37:33,440] [INFO] [datasets.<module>:108] [PID:4680] JAX version 0.4.23 available.
2024-02-15 06:37:34.880738: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-15 06:37:34.880781: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registere

## Merge the adapter into the base model

The following command merges your LoRA adapter into the base model. You can pass the `--lora_model_dir` argument to specify the directory where the LoRA adapter was saved; otherwise, it is inferred from `output_dir` in your Axolotl config file. The merged model is saved in the subdirectory `{lora_model_dir}/merged`.

In [None]:
!python3 -m axolotl.cli.merge_lora config.yaml --lora_model_dir="./qlora-out"

[2024-02-15 13:30:33,387] [INFO] [datasets.<module>:58] [PID:105337] PyTorch version 2.1.2 available.
[2024-02-15 13:30:33,388] [INFO] [datasets.<module>:95] [PID:105337] TensorFlow version 2.15.0 available.
[2024-02-15 13:30:33,389] [INFO] [datasets.<module>:108] [PID:105337] JAX version 0.4.23 available.
2024-02-15 13:30:34.427112: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-15 13:30:34.427167: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-15 13:30:34.428841: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[2024-02-15 13:30:36,374] [INFO] [real_accelerator.py:191
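
Conceptually, merging a LoRA adapter folds the low-rank update back into each adapted weight matrix: `W_merged = W + (alpha / r) * B @ A`, which is the standard LoRA formulation. The toy sketch below illustrates that arithmetic in plain Python. `matmul` and `merge_lora` are illustrative helpers, not Axolotl or PEFT functions, and the 2x2 matrices are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(w, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * B @ A for plain nested lists."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]

# Toy base weight (2x2 identity) and a rank-1 adapter, kept tiny for readability.
# The scale uses lora_alpha=16 and lora_r=64 from the config, i.e. 16 / 64 = 0.25,
# even though the toy matrices themselves only have rank 1.
w = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # shape (rank, in_features)
lora_b = [[4.0], [8.0]]  # shape (out_features, rank)

merged = merge_lora(w, lora_a, lora_b, alpha=16, r=64)
print(merged)  # [[2.0, 2.0], [2.0, 5.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged folder can be loaded like any regular checkpoint.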

## Upload the model to the Hugging Face Hub

Once the model is trained and merged, we can upload it to the Hugging Face Hub.

In [None]:
from huggingface_hub import HfApi
from google.colab import userdata

In [None]:
# Set the model name in the hub
new_model = "edumunozsala/aguila-7b-instructft-bactrian-x"


In [None]:
# Read the HF_TOKEN secret defined in the Secrets tab in Google Colab
api = HfApi(token=userdata.get("HF_TOKEN"))

# Upload the merged model folder
api.create_repo(
    repo_id=new_model,
    repo_type="model",
    exist_ok=True,
)
api.upload_folder(
    repo_id=new_model,
    folder_path="qlora-out/merged",
)

pytorch_model-00001-of-00003.bin:   0%|          | 0.00/4.85G [00:00<?, ?B/s]

Upload 3 LFS files:   0%|          | 0/3 [00:00<?, ?it/s]

pytorch_model-00002-of-00003.bin:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

pytorch_model-00003-of-00003.bin:   0%|          | 0.00/3.89G [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/edumunozsala/aguila-7b-instructft-bactrian-x/commit/1e00bbdee0a31169b0e4abc20ffded65b75fe025', commit_message='Upload folder using huggingface_hub', commit_description='', oid='1e00bbdee0a31169b0e4abc20ffded65b75fe025', pr_url=None, pr_revision=None, pr_num=None)

## Test the model

Finally, we download the model from the Hub and test it to make sure it works as expected.

In [None]:
!pip install transformers
!pip install einops
!pip install accelerate



Download and load the model and tokenizer.

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "edumunozsala/aguila-7b-instructft-bactrian-x"

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True)

pytorch_model.bin.index.json:   0%|          | 0.00/16.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]

pytorch_model-00001-of-00003.bin:   0%|          | 0.00/4.85G [00:00<?, ?B/s]

pytorch_model-00002-of-00003.bin:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

pytorch_model-00003-of-00003.bin:   0%|          | 0.00/3.89G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]

Next, we format the input as an Alpaca-style prompt (here with Spanish section headers) and run inference to get the response.

In [None]:
instruction = "Piense en una solución para reducir la congestión del tráfico."

# Optional context for the instruction; empty in this example
input_text = ""

prompt = f"""### Instrucción:
{instruction}

### Entrada:
{input_text}

### Respuesta:
"""

# Tokenize the prompt and move it to the device the model was loaded on
input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.to(model.device)
# with torch.inference_mode():
outputs = model.generate(input_ids=input_ids, max_new_tokens=256, do_sample=True, top_p=0.9, temperature=0.3)

# Decode the output and strip the echoed prompt so only the completion is printed
print(f"Prompt:\n{prompt}\n")
print(f"Generated instruction:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}")




Prompt:
### Instrucción:
Piense en una solución para reducir la congestión del tráfico.

### Entrada:


### Respuesta:


Generated instruction:
Una solución para reducir la congestión del tráfico podría ser implementar un sistema de transporte público eficiente y accesible para todos. Esto incluiría la construcción de nuevas líneas de metro, trenes de alta velocidad, autobuses y tranvías, así como la ampliación de las redes de transporte público existentes. Además, se podría fomentar el uso de vehículos eléctricos y la implementación de políticas de reducción de emisiones de gases de efecto invernadero. También se podría trabajar en la promoción de la bicicleta como medio de transporte sostenible y saludable. En resumen, se trata de una solución integral que involucra a diferentes sectores y que busca reducir la congestión del tráfico y mejorar la calidad de vida de la población.


In [None]:
instruction = "Encuentra todos los numeros divisibles por 3 entre 1 y 30."

# Optional context for the instruction; empty in this example
input_text = ""

prompt = f"""### Instrucción:
{instruction}

### Entrada:
{input_text}

### Respuesta:
"""

# Tokenize the prompt and move it to the device the model was loaded on
input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.to(model.device)
# with torch.inference_mode():
outputs = model.generate(input_ids=input_ids, max_new_tokens=256, do_sample=True, top_p=0.9, temperature=0.3)

# Decode the output and strip the echoed prompt so only the completion is printed
print(f"Prompt:\n{prompt}\n")
print(f"Generated instruction:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}")


Prompt:
### Instrucción:
Encuentra todos los numeros divisibles por 3 entre 1 y 30.

### Entrada:


### Respuesta:


Generated instruction:
Lo siento, como modelo de lenguaje de IA, no tengo la capacidad de buscar números en una lista. Pero puedo decirte que la respuesta es:

1, 3, 7, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 166, 167, 169, 171, 173, 175, 177, 181, 183, 187, 189, 190, 191, 192, 193, 194, 195, 196, 197, 199, 202, 203, 204, 205, 206, 207, 208
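
When judging a response like the one above, it helps to compare it against a directly computed reference. The sketch below builds the expected answer and checks the first few numbers of the generated text against it. `parse_ints` is a hypothetical helper, and `generated_prefix` copies only the beginning of the response shown above.

```python
import re

def parse_ints(text):
    """Extract every integer that appears in a piece of text."""
    return [int(token) for token in re.findall(r"\d+", text)]

# Reference answer: multiples of 3 from 1 to 30 inclusive
expected = [n for n in range(1, 31) if n % 3 == 0]
print(expected)  # [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]

# First numbers of the generated response above
generated_prefix = "1, 3, 7, 11, 13, 15, 17, 19, 21"
generated = parse_ints(generated_prefix)

# Numbers the model got right, and expected numbers missing from this prefix
hits = [n for n in generated if n in expected]
missing = [n for n in expected if n not in generated]
print(f"correct: {hits}")
print(f"missing: {missing}")
```

A check of this kind makes it easy to see that the model answers fluently on open-ended prompts but is unreliable on simple arithmetic tasks.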
