Now let's get to the fun part -- training a model. I'll start by installing the dependencies.


In [1]:
%%capture
%pip install peft==0.5.0 python-dotenv==1.0.0

!git clone https://github.com/OpenAccess-AI-Collective/axolotl
%pip install -e "./axolotl[flash-attn]"

Note to the reader: since we'll be basing our fine-tuned model on Meta's Llama 2, you need to apply for access to the weights (which will be automatically granted). Follow the steps on [Hugging Face](https://huggingface.co/meta-llama/Llama-2-7b-hf), then create a read-only access token [here](https://huggingface.co/settings/tokens) and copy it into your `.env` file as `HUGGING_FACE_HUB_TOKEN`, which is the variable the next cell checks for.


In [3]:
import dotenv
import os

dotenv.load_dotenv()

has_token = os.getenv("HUGGING_FACE_HUB_TOKEN") is not None

print(f"Hugging Face token set: {has_token}")


Hugging Face token set: True


I'll use the [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) library to manage this training run. It includes a lot of neat tricks that speed up training without sacrificing quality.

In this case I'm using 8-bit training to reduce GPU memory use, and sample packing (concatenating several short examples into each training sequence) to maximize GPU utilization. You can read more about the available options at https://github.com/OpenAccess-AI-Collective/axolotl.

The training run options are defined in [training-config.yaml](./training-config.yaml).
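
To see which of those options a config actually sets, a small helper like the one below can summarize it. This is only a sketch: `summarize_config` is a hypothetical name, and the key names (`base_model`, `adapter`, `load_in_8bit`, `sample_packing`, `output_dir`) are my assumption about how axolotl spells these options, so check them against the axolotl README and your own config file.

```python
import os

# Assumed axolotl option names; verify them against the axolotl README.
KEYS_OF_INTEREST = ("base_model", "adapter", "load_in_8bit", "sample_packing", "output_dir")


def summarize_config(cfg: dict) -> dict:
    """Return only the options discussed above; keys that are not set map to None."""
    return {key: cfg.get(key) for key in KEYS_OF_INTEREST}


# Only read the file if it is present, so the cell is safe to run anywhere.
if os.path.exists("training-config.yaml"):
    import yaml  # PyYAML

    with open("training-config.yaml") as f:
        print(summarize_config(yaml.safe_load(f)))
```

Any key that comes back as `None` is either unset in the config or spelled differently from what I assumed here.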


In [8]:
!accelerate launch ./axolotl/scripts/finetune.py training-config.yaml

The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`

                           dP            dP   dP
                           88            88   88
.d8888b. dP.  .dP .d8888b. 88 .d8888b. d8888P 88
88'  `88  `8bd8'  88'  `88 88 88'  `88   88   88
88.  .88  .d88b.  88.  .88 88 88.  .88   88   88
`88888P8 dP'  `dP `88888P' dP `88888P'   dP   dP

[2023-08-24 20:18:54,867] [INFO] [axolotl.normalize_config:72] [PID:125016] GPU memory usage baseline: 0.000GB (+0.674GB misc)
[2023-08-24 20:18:54,867] [INFO] [axolotl.scripts.train:189] [PID:125016] loading tokenizer... meta-llama/Llama-2-7b-hf
[2023-08-24 20:18:55,078] [DEBUG] [axolotl.load_tokenizer:64] [PID:125016] EOS: 2 / </s>
[2023-08-24 20:18:55,078] [DEBUG] [axolotl.load_tokenizer:65] [PID:125016]

Sweet! I now have a new directory `./models/run1`. This contains my trained model, which I can use to classify more recipes.

There's one more step, though. I trained the model using [LoRA](https://huggingface.co/docs/peft/conceptual_guides/lora), a memory-efficient fine-tuning method that trains a small set of adapter weights instead of updating all of the base model's weights. The inference library I'll use for testing doesn't support LoRA models directly yet, so I need to "merge" the LoRA adapter into the base weights to produce a standard Llama 2-shaped model. I've defined a small helper called `merge_lora_model` to do that, which I'll use below.


In [4]:
from utils import merge_lora_model

print("Merging model (this could take a while)")
final_model_dir = merge_lora_model("training-config.yaml")
print(f"Final model saved to '{final_model_dir}'")


Merging model (this could take a while)
Loading base model


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading PEFT model
Running merge_and_unload
Model saved to ./models/run1/merged
Final model saved to './models/run1/merged'
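
If you're curious what a helper like `merge_lora_model` might do under the hood, here is a minimal sketch. It is not the actual code in `utils`; the function names `merged_output_dir` and `merge_lora_sketch` are hypothetical, and I'm assuming the adapter was saved to the run directory and that the base model fits in memory in half precision. The steps mirror the log lines printed above.

```python
def merged_output_dir(run_dir: str) -> str:
    """Merged weights go in a `merged` subfolder of the run directory."""
    return f"{run_dir.rstrip('/')}/merged"


def merge_lora_sketch(base_model: str, run_dir: str) -> str:
    """Fold a LoRA adapter into its base model and save the result (sketch only)."""
    # Heavy imports are deferred so defining this function stays cheap.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    print("Loading base model")
    # Load unquantized (fp16) weights, since the adapter is merged into them.
    base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)

    print("Loading PEFT model")
    model = PeftModel.from_pretrained(base, run_dir)

    print("Running merge_and_unload")
    # Adds the LoRA deltas to the base weights and drops the adapter wrappers.
    merged = model.merge_and_unload()

    out_dir = merged_output_dir(run_dir)
    merged.save_pretrained(out_dir)
    # Save the tokenizer alongside so the folder is usable on its own.
    AutoTokenizer.from_pretrained(base_model).save_pretrained(out_dir)
    print(f"Model saved to {out_dir}")
    return out_dir
```

Under those assumptions, calling `merge_lora_sketch("meta-llama/Llama-2-7b-hf", "./models/run1")` would write the merged model to `./models/run1/merged`, which can then be loaded like any other Llama 2 checkpoint.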


Ok, I have a model, but is it actually any good? I'll run some evaluations in [./3-evaluate.ipynb](./3-evaluate.ipynb) to check.
