## This example notebook uses Axolotl to fine-tune large foundation models

[Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) is a tool designed to streamline the fine-tuning of various AI models, offering support for multiple configurations and architectures.

Features:

- Train various Hugging Face models such as Llama, Pythia, Falcon, and MPT
- Supports full fine-tuning, LoRA, QLoRA, ReLoRA, and GPTQ
- Customize configurations using a simple YAML file or CLI overrides
- Load different dataset formats, use custom formats, or bring your own tokenized datasets
- Integrated with xformers, Flash Attention, RoPE scaling, and multipacking
- Works with a single GPU or multiple GPUs via FSDP or DeepSpeed
- Easily run with Docker locally or on the cloud

In [3]:
%pip install -Uq sagemaker
%pip install -Uq datasets
%pip install -Uq transformers==4.33.1
%pip install -Uq bitsandbytes peft accelerate
%pip install -q scipy

Note: you may need to restart the kernel to use updated packages.

In [4]:
import boto3
import sagemaker
import json
from sagemaker import Model, image_uris, serializers, deserializers
import time
from pathlib import Path
from utils import download_model

boto3_session = boto3.session.Session()

smr = boto3_session.client("sagemaker-runtime")  # SageMaker runtime client for invoking endpoints
sm = boto3_session.client("sagemaker")  # SageMaker control-plane client
s3_rsr = boto3_session.resource("s3")  # S3 resource used to list bucket contents
role = sagemaker.get_execution_role()  # IAM role assumed by the training job

# SageMaker session for interacting with the different AWS APIs
sess = sagemaker.session.Session(boto3_session, sagemaker_client=sm, sagemaker_runtime_client=smr)
bucket = sess.default_bucket()  # default S3 bucket of this SageMaker session
region = sess.boto_region_name  # region name of the current SageMaker session
s3_prefix = "code-llama7b"

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml




### Download Model

In [5]:
# Download the model from the Hugging Face Hub; the helper skips the download if the local copy already exists
local_model_path = download_model("codellama/CodeLlama-7b-hf", "CodeLlama-7b-hf")

Model already exists at CodeLlama-7b-hf
Skipping download


In [16]:
if list(s3_rsr.Bucket(bucket).objects.filter(Prefix=s3_prefix)):
    print("Model already exists on the S3 bucket")
    print(f"If you want to upload a new model, please delete the existing model from the S3 bucket with the following command: \n !aws s3 rm --recursive s3://{bucket}/{s3_prefix}")
    s3_model_location = f"s3://{bucket}/{s3_prefix}"
else:
    s3_model_location = sess.upload_data(path=local_model_path.as_posix(), bucket=bucket, key_prefix=s3_prefix)
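
One caveat with the check above: S3 prefix filtering is a plain string-prefix match on object keys, so it is also satisfied by the `data/` and `tensorboard/` keys that later cells write under the same `code-llama7b` prefix. The sketch below mimics that matching on a list of keys without calling S3. The helper name and the narrower `code-llama7b/model/` prefix are illustrative assumptions, not something this notebook defines.

```python
def keys_under_prefix(keys, prefix):
    """Mimic S3 Prefix filtering: keep the keys that start with the given string."""
    return [key for key in keys if key.startswith(prefix)]


# Only the training data has been uploaded in this scenario, no model files yet.
existing_keys = ["code-llama7b/data/spider_create_context_train.json"]

# The broad prefix matches the data file, so the check would report "model exists".
broad_match = keys_under_prefix(existing_keys, "code-llama7b")

# A dedicated sub-prefix for the model (hypothetical) would not match.
narrow_match = keys_under_prefix(existing_keys, "code-llama7b/model/")

print(len(broad_match), len(narrow_match))
```

If that matters for your workflow, keeping the model under its own sub-prefix is one way to make the existence check unambiguous.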

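### Download Data and upload to S3

The training data is the [b-mc2/sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context) dataset, which builds on WikiSQL and Spider. Each record pairs a natural-language question with the `CREATE TABLE` context it refers to and the SQL answer.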

In [17]:
import datasets

# Download the b-mc2/sql-create-context training data with the Hugging Face datasets library and save it as JSON Lines
dataset = datasets.load_dataset("b-mc2/sql-create-context")
print(dataset)

data_path = Path("data")
data_path.mkdir(exist_ok=True)

dataset["train"].to_pandas().to_json("data/spider_create_context_train.json", orient="records", lines=True)
s3_data = sess.upload_data(path="data/spider_create_context_train.json", bucket=bucket, key_prefix=f"{s3_prefix}/data")

print(f"Uploaded training data file to {s3_data}")

DatasetDict({
    train: Dataset({
        features: ['question', 'context', 'answer'],
        num_rows: 78577
    })
})
Uploaded training data file to s3://sagemaker-us-west-2-376678947624/code-llama7b/data/spider_create_context_train.json
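
With `orient="records", lines=True`, `to_json` writes JSON Lines: one JSON object per line rather than a single JSON array. The stdlib sketch below illustrates that layout. The first record is the sample quoted later in this notebook; the second is a made-up toy row, and the helper names are hypothetical. pandas may escape some characters differently, so this is about the layout, not a byte-for-byte reproduction.

```python
import json


def to_json_lines(records):
    """Serialize a list of dicts as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


def from_json_lines(text):
    """Parse JSON Lines back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = [
    {
        "question": "Name the comptroller for office of prohibition",
        "context": "CREATE TABLE table_22607062_1 (comptroller VARCHAR, ticket___office VARCHAR)",
        "answer": 'SELECT comptroller FROM table_22607062_1 WHERE ticket___office = "Prohibition"',
    },
    {
        # Toy row for illustration only; not taken from the dataset.
        "question": "How many rows are there?",
        "context": "CREATE TABLE toy (id INTEGER)",
        "answer": "SELECT COUNT(*) FROM toy",
    },
]

text = to_json_lines(records)
round_tripped = from_json_lines(text)
print(len(text.splitlines()), "lines for", len(records), "records")
```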


In [18]:
!aws s3 ls $s3_data

2023-10-27 21:24:00   19871585 spider_create_context_train.json


In [19]:
from sagemaker.pytorch import PyTorch
from sagemaker.debugger import TensorBoardOutputConfig
import time

str_time = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())

tb_output_config = TensorBoardOutputConfig(s3_output_path=f"s3://{bucket}/{s3_prefix}/tensorboard/{str_time}",
    container_local_output_path="/opt/ml/output/tensorboard")

hyperparameters = {
    "config": "code-llama-7b-qlora.yml",
    "deepspeed": "axolotl/deepspeed/zero2.json"
}


estimator = PyTorch(
    source_dir="src",
    entry_point="axolotl/src/axolotl/cli/train.py",
    sagemaker_session=sess,
    role=role,
    instance_count=2,
    hyperparameters=hyperparameters,
    instance_type="ml.g5.2xlarge",
    framework_version="2.0.1",
    py_version="py310",
    disable_profiler=True,
    max_run=60 * 60 * 24 * 2,  # stop the job after at most 2 days
    keep_alive_period_in_seconds=3600,  # keep the instances warm for 1 hour after the job ends
    tensorboard_output_config=tb_output_config,
    environment={
        "HUGGINGFACE_HUB_CACHE": "/tmp",
        "LIBRARY_PATH": "/opt/conda/lib/",
        "TRANSFORMERS_CACHE": "/tmp",
        "NCCL_P2P_LEVEL": "NVL",
    },
    distribution={"torch_distributed": {"enabled": True}},  # launch the entry point with torchrun
)
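
In SageMaker script mode, the `hyperparameters` dict is handed to the entry point as `--key value` command-line arguments. The sketch below approximates that mapping for the two entries used here. `to_cli_args` is a hypothetical helper, and the exact ordering and quoting are decided by the training toolkit, so treat the result as an illustration rather than the literal command.

```python
def to_cli_args(hyperparameters):
    """Approximate how a hyperparameters dict becomes `--key value` arguments."""
    args = []
    for key, value in hyperparameters.items():
        args += [f"--{key}", str(value)]
    return args


hyperparameters = {
    "config": "code-llama-7b-qlora.yml",
    "deepspeed": "axolotl/deepspeed/zero2.json",
}

cli_args = to_cli_args(hyperparameters)
print(" ".join(cli_args))

# max_run is given in seconds; 60*60*24*2 corresponds to 48 hours.
max_run_hours = 60 * 60 * 24 * 2 / 3600
print(max_run_hours)
```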

In [20]:
estimator.fit({"model": s3_model_location, "train": s3_data})

INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
Using provided s3_resource
INFO:sagemaker:Creating training-job with name: pytorch-training-2023-10-27-21-24-37-446

2023-10-27 21:24:39 Starting - Starting the training job...
2023-10-27 21:24:55 Starting - Preparing the instances for training......
2023-10-27 21:26:08 Downloading - Downloading input data......................................................
2023-10-27 21:35:06 Training - Training image download completed. Training in progress..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2023-10-27 21:35:08,717 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2023-10-27 21:35:08,730 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2023-10-27 21:35:08,739 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2023-10-27 21:35:08,746 sagemaker_pytorch_container.training INFO     Invoking TorchDistributed...
2023-10-27 21:35:08,746 sagemaker_pytorch_container.trai

## Check TensorBoard report

In [21]:
f"s3://{bucket}/{s3_prefix}/tensorboard/{str_time}"

's3://sagemaker-us-west-2-376678947624/code-llama7b/tensorboard/2023-10-27-21-24-01'

## Test model performance before and after fine-tuning

In [6]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    local_model_path,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

Loading checkpoint shards: 100%|██████████| 2/2 [00:48<00:00, 24.37s/it]
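
`load_in_8bit=True` stores most weights at roughly one byte per parameter instead of two for float16. As a back-of-the-envelope check, the sketch below assumes about 7 billion parameters (a round figure suggested by the model name, not an exact count) and counts weights only. Activations, the KV cache, and any layers kept in higher precision are ignored, so real usage will be higher.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 7e9  # assumption: round figure, not the exact parameter count

fp16_gib = weight_memory_gib(NUM_PARAMS, 2)  # float16: 2 bytes per parameter
int8_gib = weight_memory_gib(NUM_PARAMS, 1)  # int8: 1 byte per parameter

print(f"float16 weights: {fp16_gib:.1f} GiB, int8 weights: {int8_gib:.1f} GiB")
```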


In [7]:
tokenizer = AutoTokenizer.from_pretrained(local_model_path)

### Before fine-tuning

In [8]:
eval_prompt = """You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.

You must output the SQL query that answers the question.
### Input:
Which Class has a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?

### Context:
CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)

### Response:
"""
# {'question': 'Name the comptroller for office of prohibition', 'context': 'CREATE TABLE table_22607062_1 (comptroller VARCHAR, ticket___office VARCHAR)', 'answer': 'SELECT comptroller FROM table_22607062_1 WHERE ticket___office = "Prohibition"'}
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.

You must output the SQL query that answers the question.
### Input:
Which Class has a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?

### Context:
CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)

### Response:
SELECT * FROM table_name_12 WHERE class > '91.5' AND city_of_license = 'hyannis'

### Input:
Which Class has a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?

### Context:
CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_lic
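
The decoded output echoes the prompt, and the base model keeps generating further `### Input:` blocks after its first answer. The sketch below shows one way to build the prompt from a question and context and to keep only the first answer. `PROMPT_TEMPLATE`, `build_prompt`, and `extract_response` are hypothetical helpers that mirror the hard-coded prompt in this notebook; they are not part of Axolotl or transformers.

```python
PROMPT_TEMPLATE = (
    "You are a powerful text-to-SQL model. Your job is to answer questions about a database. "
    "You are given a question and context regarding one or more tables.\n\n"
    "You must output the SQL query that answers the question.\n"
    "### Input:\n{question}\n\n"
    "### Context:\n{context}\n\n"
    "### Response:\n"
)


def build_prompt(question, context):
    """Fill the prompt template with a question and its CREATE TABLE context."""
    return PROMPT_TEMPLATE.format(question=question, context=context)


def extract_response(generated, prompt):
    """Drop the echoed prompt, then cut the completion at the next '###' header."""
    if generated.startswith(prompt):
        completion = generated[len(prompt):]
    else:
        # Fallback if decoding did not reproduce the prompt exactly.
        completion = generated.split("### Response:", 1)[-1]
    return completion.split("###", 1)[0].strip()


question = "Which Class has a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?"
context = "CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)"
prompt = build_prompt(question, context)

# Simulated base-model output: first answer followed by a repeated "### Input:" block.
generated = (
    prompt
    + "SELECT * FROM table_name_12 WHERE class > '91.5' AND city_of_license = 'hyannis'\n\n"
    + "### Input:\n" + question + "\n"
)
answer = extract_response(generated, prompt)
print(answer)
```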


### After fine-tuning

In [13]:
# lora_path = estimator.model_data
lora_path = "s3://sagemaker-us-west-2-376678947624/pytorch-training-2023-10-27-21-24-37-446/output/model.tar.gz"

In [14]:
!aws s3 cp {lora_path} .

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
download: s3://sagemaker-us-west-2-376678947624/pytorch-training-2023-10-27-21-24-37-446/output/model.tar.gz to ./model.tar.gz


In [16]:
!mkdir -p lora



In [15]:
!tar -xzf model.tar.gz -C lora

tar: Ignoring unknown extended header keyword 'LIBARCHIVE.creationtime'

In [9]:
from peft import PeftModel
model = PeftModel.from_pretrained(model, "lora")

In [10]:
eval_prompt = """You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.

You must output the SQL query that answers the question.
### Input:
Which Class has a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?

### Context:
CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)

### Response:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.

You must output the SQL query that answers the question.
### Input:
Which Class has a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?

### Context:
CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)

### Response:
 SELECT class FROM table_name_12 WHERE frequency_mhz > 91.5 AND city_of_license = "hyannis, nebraska"
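
A cheap sanity check on generated SQL is whether it compiles against the schema given in the prompt. The sketch below uses an in-memory SQLite database for that. `is_valid_sql` is a hypothetical helper, and it only checks syntax and table or column names under SQLite's dialect, not whether the query actually answers the question. The base model's query from the earlier cell passes this check even though it filters on the wrong column, which shows the limits of the approach.

```python
import sqlite3


def is_valid_sql(context_ddl, query):
    """Return True if `query` compiles against the tables created by `context_ddl`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(context_ddl)  # create the (empty) tables from the prompt context
        conn.execute(f"EXPLAIN {query}")  # compile the query without needing any rows
        return True
    except (sqlite3.Error, sqlite3.Warning):
        return False
    finally:
        conn.close()


context = "CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)"

# Base-model output from the earlier cell: valid SQL, but it does not answer the question.
base_query = "SELECT * FROM table_name_12 WHERE class > '91.5' AND city_of_license = 'hyannis'"
# Made-up query with a misspelled column, for illustration only.
broken_query = "SELECT klass FROM table_name_12"

print(is_valid_sql(context, base_query), is_valid_sql(context, broken_query))
```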
