## This example notebook uses Axolotl to fine-tune large foundation models

[Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) is a tool designed to streamline the fine-tuning of various AI models, offering support for multiple configurations and architectures.

Features:

- Train various Hugging Face models such as LLaMA, Pythia, Falcon, and MPT
- Supports full fine-tuning, LoRA, QLoRA, ReLoRA, and GPTQ
- Customize configurations using a simple YAML file or CLI overrides
- Load different dataset formats, use custom formats, or bring your own tokenized datasets
- Integrates with xFormers, Flash Attention, RoPE scaling, and multipacking
- Works with a single GPU or multiple GPUs via FSDP or DeepSpeed
- Easily run with Docker locally or in the cloud

In [1]:
%pip install -Uq sagemaker
%pip install -Uq datasets
%pip install -Uq transformers==4.33.1
%pip install -Uq bitsandbytes peft accelerate
%pip install -q scipy

Note: you may need to restart the kernel to use updated packages.


In [12]:
import boto3
import sagemaker
import json
from sagemaker import Model, image_uris, serializers, deserializers
from sagemaker.local import LocalSession
import time
from pathlib import Path
from utils import download_model

boto3_session = boto3.session.Session()

smr = boto3_session.client("sagemaker-runtime")  # SageMaker runtime client for invoking endpoints
sm = boto3_session.client("sagemaker")  # SageMaker control-plane client
s3_rsr = boto3_session.resource("s3")  # S3 resource used to list existing objects
role = sagemaker.get_execution_role()  # IAM role assumed by the training job

sess = sagemaker.session.Session(boto3_session, sagemaker_client=sm, sagemaker_runtime_client=smr)  # SageMaker session for interacting with different AWS APIs
bucket = sess.default_bucket()  # default S3 bucket of this SageMaker session
region = sess.boto_region_name  # region name of the current SageMaker environment
s3_prefix = "llama2-7b-spider"

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


### Download Model

In [2]:
# download the model from the Hugging Face Hub (comment this out if the model is already available locally)
local_model_path = download_model("TheBloke/Llama-2-7B-fp16", "Llama-2-7B-fp16")

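
`download_model` is imported from a local `utils` module that this notebook does not show. The following is a minimal sketch of what such a helper might look like, assuming it wraps `huggingface_hub.snapshot_download` and returns a `pathlib.Path` (the later cell calls `.as_posix()` on the result). The `base_dir` parameter and its `models` default are hypothetical, and the real helper may differ.

```python
from pathlib import Path


def download_model(repo_id: str, local_dir_name: str, base_dir: str = "models") -> Path:
    """Download all files of a Hub repository into base_dir/local_dir_name.

    Hypothetical stand-in for the notebook's utils.download_model helper.
    """
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = Path(base_dir) / local_dir_name
    target.mkdir(parents=True, exist_ok=True)
    # snapshot_download fetches every file of the repository into local_dir.
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Returning a `Path` keeps the call site in the notebook unchanged: `local_model_path.as_posix()` can be passed directly to `sess.upload_data`.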

In [3]:
# skip the upload if objects already exist under the prefix
if list(s3_rsr.Bucket(bucket).objects.filter(Prefix=s3_prefix)):
    print("Model already exists in the S3 bucket")
    print(f"To upload a new model, first delete the existing one from the S3 bucket with the following command:\n!aws s3 rm --recursive s3://{bucket}/{s3_prefix}")
    s3_model_location = f"s3://{bucket}/{s3_prefix}"
else:
    s3_model_location = sess.upload_data(path=local_model_path.as_posix(), bucket=bucket, key_prefix=s3_prefix)
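
The existence check above materializes every object under the prefix into a list before testing it. A minimal sketch of a lighter variant, assuming a boto3 `Bucket` resource is passed in; the helper name `prefix_exists` is hypothetical and not part of the notebook:

```python
def prefix_exists(bucket_resource, prefix: str) -> bool:
    """Return True if at least one object key starts with prefix.

    bucket_resource is expected to be a boto3 S3 Bucket resource,
    e.g. s3_rsr.Bucket(bucket) in the notebook.
    """
    # limit(1) stops after the first match instead of listing every object.
    matches = bucket_resource.objects.filter(Prefix=prefix).limit(1)
    return any(True for _ in matches)
```

With this helper the condition would read `if prefix_exists(s3_rsr.Bucket(bucket), s3_prefix):`, which avoids listing all model shards just to learn that the prefix is non-empty.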

### Download Data and upload to S3
[Spider dataset with schema](https://huggingface.co/datasets/b-mc2/sql-create-context)

In [4]:
import datasets

# download the b-mc2/sql-create-context training data using the Hugging Face datasets library and save it as JSON Lines
dataset = datasets.load_dataset("b-mc2/sql-create-context")
print(dataset)

data_path = Path("data")
data_path.mkdir(exist_ok=True)

dataset["train"].to_pandas().to_json("data/spider_create_context_train.json", orient="records", lines=True)
s3_data = sess.upload_data(path="data/spider_create_context_train.json", bucket=bucket, key_prefix=f"{s3_prefix}/data")

print(f"Uploaded training data file to {s3_data}")


DatasetDict({
    train: Dataset({
        features: ['question', 'context', 'answer'],
        num_rows: 78577
    })
})
Uploaded training data file to s3://sagemaker-us-west-2-376678947624/llama2-7b-spider/data/spider_create_context_train.json
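
The training file is written with `orient="records", lines=True`, which produces JSON Lines: one JSON object per line with the `question`, `context`, and `answer` fields. A small standard-library sketch of that layout follows; the two records are made up for illustration and are not rows from the dataset.

```python
import json
import tempfile
from pathlib import Path

# Illustrative records with the same three fields as the dataset (not real rows).
records = [
    {
        "question": "List the names of all employees.",
        "context": "CREATE TABLE employees (name VARCHAR)",
        "answer": "SELECT name FROM employees",
    },
    {
        "question": "How many orders are there?",
        "context": "CREATE TABLE orders (id INTEGER)",
        "answer": "SELECT COUNT(*) FROM orders",
    },
]


def write_jsonl(rows, path: Path) -> None:
    """Write one JSON object per line (the JSON Lines layout)."""
    with path.open("w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row) + "\n")


def read_jsonl(path: Path) -> list:
    """Read a JSON Lines file back into a list of dicts."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.json"
    write_jsonl(records, sample_path)
    line_count = len(sample_path.read_text(encoding="utf-8").splitlines())
    round_trip = read_jsonl(sample_path)
```

Reading the first few lines of `data/spider_create_context_train.json` this way is a quick check that the upload contains the expected fields before starting a long training job.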


In [5]:
!aws s3 ls $s3_data

2023-10-28 13:16:56   19871585 spider_create_context_train.json


In [19]:
from sagemaker.pytorch import PyTorch
from sagemaker.debugger import TensorBoardOutputConfig
import time

str_time = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())

tb_output_config = TensorBoardOutputConfig(s3_output_path=f"s3://{bucket}/{s3_prefix}/tensorboard/{str_time}",
    container_local_output_path="/opt/ml/output/tensorboard")

hyperparameters = {
    "config": "llama2-7b-qlora.yml",
    "deepspeed": "axolotl/deepspeed/zero2.json"
}

# local_sess = LocalSession()
# local_sess.config = {'local': {'local_code': True}}


estimator = PyTorch(
    source_dir="src",
    entry_point="axolotl/src/axolotl/cli/train.py",
    sagemaker_session=sess,
    role=role,
    instance_count=1,
    hyperparameters=hyperparameters,
    instance_type="ml.g5.2xlarge",  # single NVIDIA A10G GPU
    framework_version="2.0.1",
    py_version="py310",
    disable_profiler=True,
    max_run=60 * 60 * 24 * 2,  # stop the job after at most 2 days
    keep_alive_period_in_seconds=3600,  # keep the warm pool for 1 hour after the job ends
    tensorboard_output_config=tb_output_config,
    environment={
        "HUGGINGFACE_HUB_CACHE": "/tmp",
        "LIBRARY_PATH": "/opt/conda/lib/",
        "TRANSFORMERS_CACHE": "/tmp",
        "NCCL_P2P_LEVEL": "NVL",
    },
    distribution={"torch_distributed": {"enabled": True}},  # launch the entry point with torchrun
)
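
In SageMaker script mode, the `hyperparameters` dictionary is handed to the entry point as `--key value` command-line arguments. The sketch below approximates that mapping to show what `axolotl/src/axolotl/cli/train.py` would receive; the function name is hypothetical, and the exact command line is assembled inside the training container.

```python
def hyperparameters_to_args(hyperparameters: dict) -> list:
    """Flatten a hyperparameter dict into a list of --key value arguments."""
    args = []
    for key, value in hyperparameters.items():
        args.extend([f"--{key}", str(value)])
    return args


hyperparameters = {
    "config": "llama2-7b-qlora.yml",
    "deepspeed": "axolotl/deepspeed/zero2.json",
}

cli_args = hyperparameters_to_args(hyperparameters)
print(" ".join(cli_args))
```

Both values are relative paths, so they are presumably resolved relative to the uploaded `source_dir` inside the container.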

In [None]:
estimator.fit({"model": s3_model_location, "train": s3_data})
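
Each key passed to `fit` becomes an input channel. SageMaker copies the channel data to `/opt/ml/input/data/<channel>` in the container and exposes the path through the `SM_CHANNEL_<CHANNEL>` environment variable. A minimal sketch of resolving those paths inside a training script; the helper name `channel_dir` is hypothetical, and whether `llama2-7b-qlora.yml` (not shown here) points at these directories is an assumption.

```python
import os
from pathlib import Path


def channel_dir(channel: str, environ=None) -> Path:
    """Return the local directory of a SageMaker input channel."""
    env = os.environ if environ is None else environ
    default = f"/opt/ml/input/data/{channel}"
    # SageMaker sets SM_CHANNEL_<NAME> for every channel passed to fit().
    return Path(env.get(f"SM_CHANNEL_{channel.upper()}", default))


model_dir = channel_dir("model", environ={})
train_dir = channel_dir("train", environ={})
```

For the `fit` call above, this resolves to `/opt/ml/input/data/model` for the base model and `/opt/ml/input/data/train` for the JSON Lines file.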

INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.


Using provided s3_resource


INFO:sagemaker:Creating training-job with name: pytorch-training-2023-10-28-16-34-30-833


2023-10-28 16:34:31 Starting - Starting the training job...
2023-10-28 16:34:47 Starting - Preparing the instances for training......
2023-10-28 16:35:54 Downloading - Downloading input data...........................................................
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2023-10-28 16:45:39,563 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2023-10-28 16:45:39,576 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2023-10-28 16:45:39,585 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2023-10-28 16:45:39,588 sagemaker_pytorch_container.training INFO     Invoking TorchDistributed...
2023-10-28 16:45:39,588 sagemaker_pytorch_container.training INFO     Invoking user training script.
2023-10-28 16:45:41,243 sagema


2023-10-28 16:45:37 Training - Training image download completed. Training in progress.
Collecting hf_transfer (from axolotl==0.3.0->-r requirements.txt (line 6))
Downloading hf_transfer-0.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.9/3.9 MB 114.5 MB/s eta 0:00:00
Collecting bert-score==0.3.13 (from axolotl==0.3.0->-r requirements.txt (line 6))
Downloading bert_score-0.3.13-py3-none-any.whl (61 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.1/61.1 kB 16.8 MB/s eta 0:00:00
Collecting evaluate==0.4.0 (from axolotl==0.3.0->-r requirements.txt (line 6))
Downloading evaluate-0.4.0-py3-none-any.whl (81 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 81.4/81.4 kB 20.7 MB/s eta 0:00:00
Collecting rouge-score==0.1.2 (from axolotl==0.3.0->-r requirements.txt (line 6))
Downloading rouge_score-0.1.2.tar.gz (17 kB)
Preparing met

Collecting svgwrite (from wavedrom->markdown2[all]->fschat==0.2.29->axolotl==0.3.0->-r requirements.txt (line 6))
Downloading svgwrite-1.4.3-py3-none-any.whl (67 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 67.1/67.1 kB 24.0 MB/s eta 0:00:00
Downloading tensorboard-2.15.0-py3-none-any.whl (5.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.6/5.6 MB 116.7 MB/s eta 0:00:00
Downloading pydantic-1.10.13-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.1/3.1 MB 102.3 MB/s eta 0:00:00
Downloading fschat-0.2.29-py3-none-any.whl (200 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200.7/200.7 kB 49.1 MB/s eta 0:00:00
Downloading absl_py-2.0.0-py3-none-any.whl (130 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 130.2/130.2 kB 31.4 MB/s eta 0:00:00
Downloading bitsandbytes-0.41.1-py3-none-any.whl (92.6 MB)
━━━━━━━━━━━━━━━━━━━━━━

Installing collected packages: sentencepiece, pathtools, nh3, bitsandbytes, appdirs, addict, xxhash, termcolor, tensorboard-data-server, svgwrite, sniffio, smmap, shortuuid, setproctitle, sentry-sdk, safetensors, rouge, regex, pynvml, pydantic, pyasn1-modules, oauthlib, multidict, markdown2, markdown, humanfriendly, hf_transfer, h11, grpcio, frozenlist, docker-pycreds, cachetools, async-timeout, art, absl-py, yarl, wavedrom, uvicorn, tiktoken, scikit-learn, responses, requests-oauthlib, nltk, huggingface-hub, google-auth, gitdb, fire, coloredlogs, anyio, aiosignal, xformers, tokenizers, starlette, rouge-score, httpcore, google-auth-oauthlib, GitPython, flash-attn, deepspeed, aiohttp, wandb, transformers, tensorboard, httpx, fastapi, peft, fschat, datasets, bert-score, optimum, evaluate, auto-gptq, axolotl
Attempting uninstall: pydantic
Found existing installation: pydantic 2.4.1
Uninstalling pydantic-2.4.1:
Successfully uninstalled pydantic-2.4.

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0it [00:00, ?it/s]
0it [00:00, ?it/s]
[2023-10-28 16:46:38,980] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
                              dP            dP   dP 
                              88            88   88 
   .d8888b. dP.  .dP .d8888b. 88 .d8888b. d8888P 88 
   88'  `88  `8bd8'  88'  `88 88 88'  `88   88   88 
   88.  .88  .d88b.  88.  .88 88 88.  .88   88   88 
   `88888P8 dP'  `dP `88888P' dP `88888P'   dP   dP
[2023-10-28 16:46:42,709] [INFO] [axolotl.normalize_config:122] [PID:197] [RANK:0] GPU memory usage baseline: 0.000GB (+0.302GB misc)
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. T

Map (num_proc=8):  38%|███▊      | 30166/78577 [00:13<00:20, 2355.56 examples/s]
Map (num_proc=8):  39%|███▊      | 30404/78577 [00:13<00:21, 2291.81 examples/s]
Map (num_proc=8):  39%|███▉      | 30648/78577 [00:13<00:20, 2296.22 examples/s]
Map (num_proc=8):  39%|███▉      | 30883/78577 [00:14<00:21, 2264.74 examples/s]
Map (num_proc=8):  40%|███▉      | 31120/78577 [00:14<00:21, 2203.44 examples/s]
Map (num_proc=8):  40%|███▉      | 31355/78577 [00:14<00:22, 2135.86 examples/s]
Map (num_proc=8):  40%|████      | 31575/78577 [00:14<00:23, 1997.90 examples/s]
Map (num_proc=8):  40%|████      | 31823/78577 [00:14<00:23, 2007.48 examples/s]
Map (num_proc=8):  41%|████      | 32036/78577 [00:14<00:24, 1891.11 examples/s]
Map (num_proc=8):  41%|████      | 32271/78577 [00:14<00:23, 1998.15 examples/s]
Map (num_proc=8):  41%|████▏     | 32513/78577 [00:14<00:22, 2079.05 examples/s]
Map (

Map (num_proc=8):  96%|█████████▌| 75102/78577 [00:33<00:01, 2353.23 examples/s]
Map (num_proc=8):  96%|█████████▌| 75352/78577 [00:33<00:01, 2312.94 examples/s]
Map (num_proc=8):  96%|█████████▋| 75636/78577 [00:34<00:01, 2434.56 examples/s]
Map (num_proc=8):  97%|█████████▋| 75896/78577 [00:34<00:01, 2399.71 examples/s]
Map (num_proc=8):  97%|█████████▋| 76140/78577 [00:34<00:01, 2276.11 examples/s]
Map (num_proc=8):  97%|█████████▋| 76385/78577 [00:34<00:00, 2237.03 examples/s]
Map (num_proc=8):  98%|█████████▊| 76655/78577 [00:34<00:00, 2283.60 examples/s]
Map (num_proc=8):  98%|█████████▊| 76904/78577 [00:34<00:00, 2293.90 examples/s]
Map (num_proc=8):  98%|█████████▊| 77167/78577 [00:34<00:00, 2356.20 examples/s]
Map (num_proc=8):  99%|█████████▊| 77420/78577 [00:34<00:00, 2208.02 examples/s]
Map (num_proc=8):  99%|█████████▉| 77646/78577 [00:34<00:00, 1957.96 examples/s]
Map (

Map (num_proc=8):  94%|█████████▎| 72901/77791 [00:09<00:00, 6530.27 examples/s]
Map (num_proc=8):  95%|█████████▍| 73764/77791 [00:10<00:00, 6925.72 examples/s]
Map (num_proc=8):  96%|█████████▌| 74468/77791 [00:10<00:00, 6885.47 examples/s]
Map (num_proc=8):  97%|█████████▋| 75504/77791 [00:10<00:00, 7272.42 examples/s]
Map (num_proc=8):  98%|█████████▊| 76316/77791 [00:10<00:00, 7194.27 examples/s]
Map (num_proc=8):  99%|█████████▉| 77188/77791 [00:10<00:00, 7201.29 examples/s]
Map (num_proc=8): 100%|██████████| 77791/77791 [00:10<00:00, 7273.24 examples/s]
Map (num_proc=8):   0%|          | 0/786 [00:00<?, ? examples/s]
Map (num_proc=8):  13%|█▎        | 99/786 [00:00<00:01, 674.50 examples/s]
Map (num_proc=8): 100%|██████████| 786/786 [00:00<00:00, 2888.17 examples/s]
[2023-10-28 16:47:38,437] [INFO] [axolotl.calculate_total_num_steps:438] [PID:197] [RANK:0] calculating total_num_tokens

0%|          | 1/720 [00:34<6:52:08, 34.39s/it]
{'loss': 1.1016, 'learning_rate': 0.0, 'epoch': 0.01}
[2023-10-28 16:50:06,050] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
[2023-10-28 16:50:06,058] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:197] [RANK:0] generating packed batches
[2023-10-28 16:50:06,058] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:197] [RANK:0] 56bcedc7505086c9ea4dc0cf83dd8be99917f944e643943c0bfe2b71a7416760
[2023-10-28 16:50:06,062] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
[2023-10-28 16:50:08,556] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_toke

4%|▍         | 32/720 [18:16<6:28:03, 33.84s/it]
{'loss': 0.0889, 'learning_rate': 0.00019408450704225353, 'epoch': 0.18}
5%|▍         | 33/720 [18:50<6:27:30, 33.84s/it]
{'loss': 0.0806, 'learning_rate': 0.00019380281690140847, 'epoch': 0.18}
5%|▍         | 34/720 [19:23<6:26:55, 33.84s/it]
{'loss': 0.0699, 'learning_rate': 0.0001935211267605634, 'epoch': 0.19}
5%|▍         | 35/720 [19:57<6:26:22, 33.84s/it]
{'loss': 0.078, 'learning_rate': 0.00019323943661971832, 'epoch': 0.19}
5%|▌         | 36/720 [20:31<6:25:50, 33.85s/it]
{'loss': 0.0793, 'learning_rate': 0.00019295774647887326, 'epoch': 0.2}
[2023-10

9%|▉         | 63/720 [35:58<6:10:31, 33.84s/it]
{'loss': 0.0349, 'learning_rate': 0.00018535211267605635, 'epoch': 0.35}
9%|▉         | 64/720 [36:32<6:09:57, 33.84s/it]
{'loss': 0.04, 'learning_rate': 0.00018507042253521126, 'epoch': 0.36}
9%|▉         | 65/720 [37:05<6:09:24, 33.84s/it]
{'loss': 0.0367, 'learning_rate': 0.0001847887323943662, 'epoch': 0.36}
9%|▉         | 66/720 [37:39<6:08:53, 33.84s/it]
{'loss': 0.0442, 'learning_rate': 0.00018450704225352114, 'epoch': 0.37}
9%|▉         | 67/720 [38:13<6:08:17, 33.84s/it]
{'loss': 0.0381, 'learning_rate': 0.00018422535211267606, 'epoch': 0.37}
9%|▉

13%|█▎        | 93/720 [53:06<5:53:44, 33.85s/it]
{'loss': 0.0389, 'learning_rate': 0.00017690140845070425, 'epoch': 0.52}
13%|█▎        | 94/720 [53:40<5:53:08, 33.85s/it]
{'loss': 0.0354, 'learning_rate': 0.00017661971830985917, 'epoch': 0.52}
13%|█▎        | 95/720 [54:14<5:52:32, 33.84s/it]
{'loss': 0.037, 'learning_rate': 0.0001763380281690141, 'epoch': 0.53}
13%|█▎        | 96/720 [54:48<5:52:00, 33.85s/it]
{'loss': 0.0303, 'learning_rate': 0.00017605633802816902, 'epoch': 0.53}
13%|█▎        | 97/720 [55:21<5:51:24, 33.84s/it]
{'loss': 0.0368, 'learning_rate': 0.00017577464788732396, 'epoch': 0.54}

17%|█▋        | 123/720 [1:10:14<5:36:58, 33.87s/it]
{'loss': 0.0275, 'learning_rate': 0.0001684507042253521, 'epoch': 0.68}
17%|█▋        | 124/720 [1:10:48<5:36:20, 33.86s/it]
{'loss': 0.0223, 'learning_rate': 0.00016816901408450705, 'epoch': 0.69}
17%|█▋        | 125/720 [1:11:22<5:35:43, 33.85s/it]
{'loss': 0.0262, 'learning_rate': 0.000167887323943662, 'epoch': 0.69}
18%|█▊        | 126/720 [1:11:56<5:35:06, 33.85s/it]
{'loss': 0.0268, 'learning_rate': 0.0001676056338028169, 'epoch': 0.7}
18%|█▊        | 127/720 [1:12:30<5:34:32, 33.85s/it]
{'loss': 0.0351, 'learning_rate': 0.00016732394366197184, 'epoch': 0.7}

21%|██        | 152/720 [1:26:49<5:23:23, 34.16s/it]
{'loss': 0.0279, 'learning_rate': 0.0001602816901408451, 'epoch': 0.84}
21%|██▏       | 153/720 [1:27:22<5:21:55, 34.07s/it]
{'loss': 0.0232, 'learning_rate': 0.00016, 'epoch': 0.85}
21%|██▏       | 154/720 [1:27:56<5:20:43, 34.00s/it]
{'loss': 0.0273, 'learning_rate': 0.00015971830985915495, 'epoch': 0.85}
22%|██▏       | 155/720 [1:28:30<5:19:44, 33.95s/it]
{'loss': 0.0244, 'learning_rate': 0.00015943661971830987, 'epoch': 0.86}
22%|██▏       | 156/720 [1:29:04<5:18:51, 33.92s/it]
{'loss': 0.0279, 'learning_rate': 0.0001591549295774648, 'epoch': 0.87}

[2023-10-28 18:33:08,576] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 5981037
[2023-10-28 18:33:08,576] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:197] [RANK:0] generating packed batches
[2023-10-28 18:33:08,707] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:197] [RANK:0] 9b552f6299ea956339a049be965f47b35a071d052f4306f6fa9aef71df7c2145
[2023-10-28 18:33:09,067] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 5981037
25%|██▌       | 181/720 [1:44:02<7:25:08, 49.55s/it]
{'loss': 0.0198, 'learning_rate': 0.00015211267605633804, 'epoch': 1.0}
25%|██▌       | 182/720 [1:44:36<6:42:02, 44.84s/it]
{'loss': 0.026, 'learning_rate': 0.0001518309

[2023-10-28 18:53:29,228] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
80%|████████  | 4/5 [00:07<00:02,  2.09s/it]
[2023-10-28 18:53:31,820] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
100%|██████████| 5/5 [00:10<00:00,  2.27s/it]
{'eval_loss': 0.02307100035250187, 'eval_runtime': 12.8835, 'eval_samples_per_second': 61.008, 'eval_steps_per_second': 30.504, 'epoch': 1.2}
30%|███       | 216/720 [2:04:00<4:44:17, 33.84s/it]
30%|███       | 217/720 [2:04:34<5:16:08, 37.71s/it]
{'loss': 0.0185, 'learning_rate': 0.0001419718309859155, 'epoch': 1.2}
30%|███       | 218/720 [2:05:07<5:05:47, 36.55s/it]

[2023-10-28 19:13:55,345] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
40%|████      | 2/5 [00:02<00:03,  1.28s/it]
[2023-10-28 19:13:57,907] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
60%|██████    | 3/5 [00:05<00:03,  1.82s/it]
[2023-10-28 19:14:00,469] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
80%|████████  | 4/5 [00:07<00:02,  2.09s/it]
[2023-10-28 19:14:03,063] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
100%|██████████| 5/5 [00:10<00:00,  2.27s/it]
{'eval_loss': 0.020420636981725693, 'eval_runtime': 12.8873, 'eval_sam

40%|████      | 288/720 [2:44:49<4:03:37, 33.84s/it]
{'loss': 0.0251, 'learning_rate': 0.0001219718309859155, 'epoch': 1.6}
[2023-10-28 19:34:21,485] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
[2023-10-28 19:34:21,493] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:197] [RANK:0] generating packed batches
[2023-10-28 19:34:21,493] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:197] [RANK:0] 56bcedc7505086c9ea4dc0cf83dd8be99917f944e643943c0bfe2b71a7416760
[2023-10-28 19:34:21,497] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
[2023-10-28 19:34:23,993] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_e

44%|████▍     | 317/720 [3:01:24<3:47:18, 33.84s/it]
{'loss': 0.0222, 'learning_rate': 0.00011380281690140846, 'epoch': 1.76}
44%|████▍     | 318/720 [3:01:57<3:46:46, 33.85s/it]
{'loss': 0.0172, 'learning_rate': 0.00011352112676056339, 'epoch': 1.76}
44%|████▍     | 319/720 [3:02:31<3:46:12, 33.85s/it]
{'loss': 0.025, 'learning_rate': 0.00011323943661971832, 'epoch': 1.77}
44%|████▍     | 320/720 [3:03:05<3:45:38, 33.85s/it]
{'loss': 0.0187, 'learning_rate': 0.00011295774647887324, 'epoch': 1.78}
45%|████▍     | 321/720 [3:03:39<3:45:03, 33.84s/it]
{'loss': 0.0224, 'learning_rate': 0.00011267605633802819, 'epoch': 1.78}

50%|█████     | 362/720 [3:27:51<4:26:22, 44.64s/it]
{'loss': 0.0154, 'learning_rate': 0.00010112676056338028, 'epoch': 2.01}
50%|█████     | 363/720 [3:28:25<4:06:20, 41.40s/it]
{'loss': 0.012, 'learning_rate': 0.00010084507042253521, 'epoch': 2.01}
51%|█████     | 364/720 [3:28:59<3:52:09, 39.13s/it]
{'loss': 0.0181, 'learning_rate': 0.00010056338028169015, 'epoch': 2.02}
51%|█████     | 365/720 [3:29:32<3:42:05, 37.54s/it]
{'loss': 0.0193, 'learning_rate': 0.00010028169014084508, 'epoch': 2.02}
51%|█████     | 366/720 [3:30:06<3:34:53, 36.42s/it]
{'loss': 0.0172, 'learning_rate': 0.0001, 'epoch': 2.03}

[2023-10-28 20:36:43,635] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
80%|████████  | 4/5 [00:07<00:02,  2.09s/it]
[2023-10-28 20:36:46,227] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
100%|██████████| 5/5 [00:10<00:00,  2.27s/it]
{'eval_loss': 0.01725311391055584, 'eval_runtime': 12.8838, 'eval_samples_per_second': 61.007, 'eval_steps_per_second': 30.503, 'epoch': 2.2}
55%|█████▌    | 396/720 [3:47:14<3:02:41, 33.83s/it]
55%|█████▌    | 397/720 [3:47:48<3:22:57, 37.70s/it]
{'loss': 0.0168, 'learning_rate': 9.126760563380283e-05, 'epoch': 2.2}
55%|█████▌    | 398/720 [3:48:22<3:16:06, 36.54s/it]

[2023-10-28 20:57:09,405] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
40%|████      | 2/5 [00:02<00:03,  1.28s/it]
[2023-10-28 20:57:11,967] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
60%|██████    | 3/5 [00:05<00:03,  1.82s/it]
[2023-10-28 20:57:14,529] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
80%|████████  | 4/5 [00:07<00:02,  2.09s/it]
[2023-10-28 20:57:17,121] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
100%|██████████| 5/5 [00:10<00:00,  2.27s/it]
{'eval_loss': 0.018914448097348213, 'eval_runtime': 12.8941, 'eval_sam

65%|██████▌   | 468/720 [4:28:03<2:22:08, 33.84s/it]
{'loss': 0.0142, 'learning_rate': 7.126760563380283e-05, 'epoch': 2.6}
[2023-10-28 21:17:35,432] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
[2023-10-28 21:17:35,440] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:197] [RANK:0] generating packed batches
[2023-10-28 21:17:35,440] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:197] [RANK:0] 56bcedc7505086c9ea4dc0cf83dd8be99917f944e643943c0bfe2b71a7416760
[2023-10-28 21:17:35,444] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
[2023-10-28 21:17:37,940] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_e

69%|██████▉   | 497/720 [4:44:37<2:05:46, 33.84s/it]
{'loss': 0.0177, 'learning_rate': 6.309859154929578e-05, 'epoch': 2.76}
69%|██████▉   | 498/720 [4:45:11<2:05:12, 33.84s/it]
{'loss': 0.0183, 'learning_rate': 6.28169014084507e-05, 'epoch': 2.76}
69%|██████▉   | 499/720 [4:45:45<2:04:38, 33.84s/it]
{'loss': 0.0146, 'learning_rate': 6.253521126760565e-05, 'epoch': 2.77}
69%|██████▉   | 500/720 [4:46:19<2:04:03, 33.83s/it]
{'loss': 0.0177, 'learning_rate': 6.225352112676056e-05, 'epoch': 2.77}
70%|██████▉   | 501/720 [4:46:53<2:03:29, 33.83s/it]
{'loss': 0.0175, 'learning_rate': 6.197183098591549e-05, 'epoch': 2.78}

73%|███████▎  | 527/720 [5:01:45<1:48:50, 33.84s/it]
{'loss': 0.0201, 'learning_rate': 5.464788732394367e-05, 'epoch': 2.92}
73%|███████▎  | 528/720 [5:02:19<1:48:16, 33.84s/it]
{'loss': 0.0152, 'learning_rate': 5.43661971830986e-05, 'epoch': 2.93}
73%|███████▎  | 529/720 [5:02:53<1:47:43, 33.84s/it]
{'loss': 0.0154, 'learning_rate': 5.408450704225352e-05, 'epoch': 2.93}
74%|███████▎  | 530/720 [5:03:27<1:47:08, 33.84s/it]
{'loss': 0.015, 'learning_rate': 5.380281690140845e-05, 'epoch': 2.94}
74%|███████▍  | 531/720 [5:04:01<1:46:34, 33.84s/it]
{'loss': 0.0165, 'learning_rate': 5.352112676056338e-05, 'epoch': 2.95}

77%|███████▋  | 553/720 [5:17:14<1:34:44, 34.04s/it]
{'loss': 0.0119, 'learning_rate': 4.7323943661971834e-05, 'epoch': 3.07}
77%|███████▋  | 554/720 [5:17:48<1:34:00, 33.98s/it]
{'loss': 0.0127, 'learning_rate': 4.704225352112676e-05, 'epoch': 3.07}
77%|███████▋  | 555/720 [5:18:22<1:33:19, 33.94s/it]
{'loss': 0.0136, 'learning_rate': 4.676056338028169e-05, 'epoch': 3.08}
77%|███████▋  | 556/720 [5:18:56<1:32:40, 33.91s/it]
{'loss': 0.0133, 'learning_rate': 4.647887323943662e-05, 'epoch': 3.08}
77%|███████▋  | 557/720 [5:19:30<1:32:03, 33.89s/it]
{'loss': 0.0142, 'learning_rate': 4.619718309859155e-05, 'epoch': 3.09}

81%|████████  | 582/720 [5:33:49<1:19:19, 34.49s/it]
{'loss': 0.0129, 'learning_rate': 3.9154929577464786e-05, 'epoch': 3.23}
81%|████████  | 583/720 [5:34:22<1:18:18, 34.29s/it]
{'loss': 0.016, 'learning_rate': 3.887323943661972e-05, 'epoch': 3.23}
81%|████████  | 584/720 [5:34:56<1:17:25, 34.16s/it]
{'loss': 0.0097, 'learning_rate': 3.859154929577465e-05, 'epoch': 3.24}
81%|████████▏ | 585/720 [5:35:30<1:16:39, 34.07s/it]
{'loss': 0.0146, 'learning_rate': 3.8309859154929575e-05, 'epoch': 3.25}
81%|████████▏ | 586/720 [5:36:04<1:15:56, 34.00s/it]
{'loss': 0.0102, 'learning_rate': 3.802816901408451e-05, 'epoch': 3.25}

85%|████████▌ | 613/720 [5:51:30<1:07:14, 37.71s/it]
{'loss': 0.0105, 'learning_rate': 3.0422535211267606e-05, 'epoch': 3.4}
85%|████████▌ | 614/720 [5:52:04<1:04:34, 36.55s/it]
{'loss': 0.0113, 'learning_rate': 3.0140845070422537e-05, 'epoch': 3.41}
85%|████████▌ | 615/720 [5:52:38<1:02:32, 35.73s/it]
{'loss': 0.0126, 'learning_rate': 2.9859154929577465e-05, 'epoch': 3.41}
86%|████████▌ | 616/720 [5:53:12<1:00:57, 35.16s/it]
{'loss': 0.011, 'learning_rate': 2.9577464788732395e-05, 'epoch': 3.42}
86%|████████▌ | 617/720 [5:53:46<59:40, 34.76s/it]
{'loss': 0.0139, 'learning_rate': 2.9295774647887326e-05, 'epoch': 3.42}

[2023-10-28 23:00:57,045] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
80%|████████  | 4/5 [00:07<00:02,  2.09s/it]
[2023-10-28 23:00:59,638] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
100%|██████████| 5/5 [00:10<00:00,  2.27s/it]
{'eval_loss': 0.018453557044267654, 'eval_runtime': 12.8836, 'eval_samples_per_second': 61.008, 'eval_steps_per_second': 30.504, 'epoch': 3.6}
90%|█████████ | 648/720 [6:11:28<40:35, 33.83s/it]
90%|█████████ | 649/720 [6:12:01<44:36, 37.70s/it]
{'loss': 0.0134, 'learning_rate': 2.028169014084507e-05, 'epoch': 3.6}
90%|█████████ | 650/720 [6:12:35<42:37, 36.54s/it]
{'los

[2023-10-28 23:21:28,039] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
80%|████████  | 4/5 [00:07<00:02,  2.09s/it]
[2023-10-28 23:21:30,632] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
100%|██████████| 5/5 [00:10<00:00,  2.27s/it]
{'eval_loss': 0.01807444542646408, 'eval_runtime': 12.8845, 'eval_samples_per_second': 61.003, 'eval_steps_per_second': 30.502, 'epoch': 3.79}
95%|█████████▌| 684/720 [6:31:59<20:18, 33.84s/it]
95%|█████████▌| 685/720 [6:32:32<21:59, 37.70s/it]
{'loss': 0.0135, 'learning_rate': 1.0140845070422535e-05, 'epoch': 3.8}
95%|█████████▌| 686/720 [6:33:06<20:42, 36.54s/it]
95%|

[2023-10-28 23:41:58,950] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
80%|████████  | 4/5 [00:07<00:02,  2.09s/it]
[2023-10-28 23:42:01,543] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:197] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 57782
100%|██████████| 5/5 [00:10<00:00,  2.27s/it]
{'eval_loss': 0.017803680151700974, 'eval_runtime': 12.8854, 'eval_samples_per_second': 60.999, 'eval_steps_per_second': 30.5, 'epoch': 3.99}
100%|██████████| 720/720 [6:52:29<00:00, 33.83s/it]
{'train_runtime': 24786.4228, 'train_samples_per_second': 12.554, 'train_steps_per_second': 0.029, 'train_loss': 0.03561719365987099, 'epoch': 3.99}
100%|██████████| 720/720 [6:53:06<00:00, 33.83s/it]
100%|██████████| 720/720 [6:53:06<00:00, 34.43
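
The training log mixes progress bars, INFO lines, and metric dictionaries, so reading the loss trend by eye is tedious. Below is a minimal sketch of one way to pull the metric dictionaries out of captured log text. The helper name `parse_metrics` and the `sample` string are illustrative assumptions, not part of Axolotl or the SageMaker SDK; the sample lines are copied from the log output above.

```python
import ast
import re

# Matches ANSI color sequences such as "[34m" or "#033[39m",
# with or without a leading escape byte.
ANSI_RE = re.compile(r"(?:\x1b|#033)?\[\d+(?:;\d+)*m")


def parse_metrics(log_text):
    """Return the metric dicts (lines shaped like {...}) found in log_text."""
    records = []
    for raw in log_text.splitlines():
        line = ANSI_RE.sub("", raw).strip()
        if not (line.startswith("{") and line.endswith("}")):
            continue  # progress bars, INFO lines, truncated fragments
        try:
            record = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue  # not a well-formed Python literal
        if isinstance(record, dict):
            records.append(record)
    return records


# Hypothetical captured text; the lines themselves come from the log above.
sample = """[34m{'loss': 0.0177, 'learning_rate': 6.309859154929578e-05, 'epoch': 2.76}[0m
[34m69%|██████▉   | 497/720 [4:44:37<2:05:46, 33.84s/it][0m
[34m{'eval_loss': 0.017803680151700974, 'eval_runtime': 12.8854, 'eval_samples_per_second': 60.999, 'eval_steps_per_second': 30.5, 'epoch': 3.99}[0m
[34m{'los"""

records = parse_metrics(sample)
train_losses = [r["loss"] for r in records if "loss" in r]
eval_losses = [r["eval_loss"] for r in records if "eval_loss" in r]
print(train_losses, eval_losses)
```

Lines that are cut off mid-dictionary are skipped rather than repaired, so the result only contains metrics that were logged in full.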

## Check the TensorBoard report

In [26]:
f"s3://{bucket}/{s3_prefix}/tensorboard/{str_time}"

's3://sagemaker-us-west-2-376678947624/llama2-7b-spider/tensorboard/2023-10-28-16-34-30'
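
The cell above only prints the log location. As a rough sketch, the same path can be turned into a TensorBoard command line. The helper `tensorboard_logdir` is a hypothetical name, and this assumes the local TensorBoard installation is able to read `s3://` paths; if it is not, copying the directory locally first (for example with `aws s3 sync`) and pointing `--logdir` at the local copy is an alternative.

```python
def tensorboard_logdir(bucket, s3_prefix, str_time):
    """Build the S3 location where the training job wrote its TensorBoard logs."""
    return f"s3://{bucket}/{s3_prefix}/tensorboard/{str_time}"


# Values taken from the cell output above.
logdir = tensorboard_logdir(
    "sagemaker-us-west-2-376678947624",
    "llama2-7b-spider",
    "2023-10-28-16-34-30",
)

# Command to run in a terminal; it is only assembled here, not executed.
command = ["tensorboard", "--logdir", logdir]
print(" ".join(command))
```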

## Test model performance before and after fine-tuning

In [27]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the base model in 8-bit and let accelerate place it on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(
    local_model_path,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

INFO:accelerate.utils.modeling:We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [28]:
tokenizer = AutoTokenizer.from_pretrained(local_model_path)

### Before fine-tuning

In [30]:
eval_prompt = """You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.

You must output the SQL query that answers the question.
### Input:
Which Class has a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?

### Context:
CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)

### Response:
"""
# {'question': 'Name the comptroller for office of prohibition', 'context': 'CREATE TABLE table_22607062_1 (comptroller VARCHAR, ticket___office VARCHAR)', 'answer': 'SELECT comptroller FROM table_22607062_1 WHERE ticket___office = "Prohibition"'}
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))

You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.

You must output the SQL query that answers the question.
### Input:
Which Class has a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?

### Context:
CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)

### Response:
SELECT * FROM table_name_12 WHERE frequency_mhz > 91.5 AND city_of_license = 'hyannis, nebraska'

### Input:
Which Class has a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?

### Context:
CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz
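
The decoded output echoes the whole prompt, and the base model keeps generating a new `### Input:` block until `max_new_tokens` runs out. A minimal sketch of trimming the decoded text down to the first answer follows; `extract_response` is a hypothetical helper, and the marker strings are taken from the prompt template used in this notebook.

```python
def extract_response(generated_text, marker="### Response:", stop="### Input:"):
    """Return the text after the first response marker, cut at the next prompt header."""
    _, found, tail = generated_text.partition(marker)
    if not found:
        return ""  # the marker never appeared in the text
    answer, _, _ = tail.partition(stop)
    return answer.strip()


# Shortened copy of the decoded output shown above.
decoded = (
    "You must output the SQL query that answers the question.\n"
    "### Input:\n"
    "Which Class has a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?\n\n"
    "### Context:\n"
    "CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)\n\n"
    "### Response:\n"
    "SELECT * FROM table_name_12 WHERE frequency_mhz > 91.5 AND city_of_license = 'hyannis, nebraska'\n\n"
    "### Input:\n"
    "Which Class has a Frequency MHz larger than 91.5"
)

print(extract_response(decoded))
```

The same helper can be applied to the fine-tuned model's output, which makes the two answers easier to compare side by side.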


### After fine-tuning

In [33]:
lora_path = estimator.model_data
# lora_path = "s3://sagemaker-us-west-2-376678947624/pytorch-training-2023-10-28-16-34-30-833/output/model.tar.gz"

In [34]:
!aws s3 cp {lora_path} .

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
download: s3://sagemaker-us-west-2-376678947624/pytorch-training-2023-10-28-16-34-30-833/output/model.tar.gz to ./model.tar.gz
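
The `huggingface/tokenizers` warning above appears because the `!` shell commands fork the kernel process after the tokenizer has already been used. The warning text itself suggests setting `TOKENIZERS_PARALLELISM` explicitly. A small sketch, assuming it is run in a cell before the shell commands:

```python
import os

# Explicitly disable tokenizer parallelism, as the warning message suggests,
# so that processes forked by `!` shell commands should no longer trigger it.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])
```

Setting the variable to `"false"` trades some tokenization throughput for quieter output, which is usually acceptable for the single-prompt tests in this notebook.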


In [35]:
!mkdir -p lora



In [36]:
!tar -xzf model.tar.gz -C lora

tar: Ignoring unknown extended header keyword `LIBARCHIVE.creationtime'


In [37]:
from peft import PeftModel
model = PeftModel.from_pretrained(model, "lora")
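
`PeftModel.from_pretrained` expects the directory that holds `adapter_config.json`. Depending on how the training job packaged `model.tar.gz`, that file may sit in a subdirectory of `lora` rather than at its top level. The sketch below shows one way to locate it; `find_adapter_dir` is a hypothetical helper, not a PEFT function.

```python
from pathlib import Path


def find_adapter_dir(root):
    """Return the shallowest directory under root containing adapter_config.json, or None."""
    matches = sorted(
        Path(root).rglob("adapter_config.json"),
        key=lambda p: (len(p.parts), str(p)),
    )
    return matches[0].parent if matches else None


adapter_dir = find_adapter_dir("lora")
if adapter_dir is None:
    print("No adapter_config.json found under ./lora")
else:
    print(f"Adapter directory: {adapter_dir}")
    # The adapter could then be loaded from that directory, e.g.:
    # model = PeftModel.from_pretrained(model, str(adapter_dir))
```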

In [38]:
eval_prompt = """You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.

You must output the SQL query that answers the question.
### Input:
Which Class has a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?

### Context:
CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)

### Response:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))

You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.

You must output the SQL query that answers the question.
### Input:
Which Class has a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?

### Context:
CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)

### Response:
 SELECT class FROM table_name_12 WHERE frequency_mhz > 91.5 AND city_of_license = "hyannis, nebraska"
