# 🔍 Predicting Item Prices from Descriptions (Part 6)
---
- Data Curation & Preprocessing
- Model Benchmarking – Traditional ML vs LLMs
- E5 Embeddings & RAG
- Fine-Tuning GPT-4o Mini
- Evaluating LLaMA 3.1 8B Quantized
- ➡️ Fine-Tuning LLaMA 3.1 with QLoRA
- Evaluating Fine-Tuned LLaMA
- Summary & Leaderboard

---

# ⚙️ Part 6: Fine-Tuning LLaMA 3.1 with QLoRA

- ⚙️ Hardware: ⚠️ GPU required (use Google Colab)
- 🛠️ Requirements: 🔑 HF token, wandb API key
- Tasks:
    - Load and split the dataset (train/validation); set up Weights & Biases logging
    - Load the quantized LLaMA 3.1 8B model and tokenizer
    - Prepare data with a collator for fine-tuning
    - Configure QLoRA (LoraConfig), training settings (SFTConfig), and tune key hyperparameters
    - Fine-tune and push the best model to the Hugging Face Hub

⚠️ I attempted to fine-tune the model on the full 400K dataset using an A100 on Google Colab, but the session consistently crashed. For now, I'm training on a 20K subset to understand the process, experiment with hyperparameters, track progress in Weights & Biases, and push the best checkpoint to the Hub.

⏱️ Training on 20,000 examples took over 2 hours.

The model fine-tuned on the complete 400K dataset is available thanks to our instructor, Ed. Much appreciated!  
We'll dive into that model in the next notebook, so **stay tuned** 😉

In [None]:
# pip installs
!pip install -q datasets transformers torch peft bitsandbytes trl accelerate wandb

In [None]:
# imports

import os, torch, wandb
from datetime import datetime
from datasets import load_dataset
from huggingface_hub import login  # used by the Hugging Face login cell below
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, EarlyStoppingCallback
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig, DataCollatorForCompletionOnlyLM
from google.colab import userdata

In [None]:
# Log in to HuggingFace

hf_token = userdata.get('HF_TOKEN')
login(hf_token, add_to_git_credential=True)

# 🔀 Load Dataset from HF and Split into Train/Validation

In [None]:
HF_USER = "Lizk75" # your HF name here!

DATASET_NAME = f"{HF_USER}/pricer-data"
dataset = load_dataset(DATASET_NAME)
train = dataset['train']
test = dataset['test']
split_ratio = 0.1  # 10% for validation

##############################################################################
# Optional: limit training dataset to TRAIN_SIZE for testing/debugging
# Comment the two lines below to use the full dataset
TRAIN_SIZE = 20000
train = train.select(range(TRAIN_SIZE))
##############################################################################

total_size = len(train)
val_size = int(total_size * split_ratio)

val_data = train.select(range(val_size))
train_data = train.select(range(val_size, total_size))


In [None]:
print(f"Train data size     : {len(train_data)}")
print(f"Validation data size: {len(val_data)}")
print(f"Test data size      : {len(test)}")

Train data size     : 18000
Validation data size: 2000
Test data size      : 2000
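The split above uses the first 10% of the selected rows for validation. If the rows are not already in random order, a seeded shuffle before splitting may give a more representative validation set. The sketch below shows only the index arithmetic in plain Python; `split_indices` is a hypothetical helper, not part of `datasets`. With a Hugging Face `Dataset`, the two index lists could be passed to `select`, or `Dataset.train_test_split` could be used instead.

```python
import random

def split_indices(total_size, val_ratio=0.1, seed=42):
    """Return (train_idx, val_idx) after a seeded shuffle of range(total_size)."""
    indices = list(range(total_size))
    random.Random(seed).shuffle(indices)      # deterministic for a given seed
    val_size = int(total_size * val_ratio)    # same rounding as the cell above
    return indices[val_size:], indices[:val_size]

train_idx, val_idx = split_indices(20_000, val_ratio=0.1)
print(len(train_idx), len(val_idx))           # sizes of the two index lists
```

Fixing the seed keeps the validation set identical across runs, so eval losses stay comparable when hyperparameters change.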


# 🛠️ Hugging Face Configuration

In [None]:
PROJECT_NAME = "llama3-pricer"

# Run name for saving the model in the hub

RUN_NAME =  f"{datetime.now():%Y-%m-%d_%H.%M.%S}-size{total_size}"
PROJECT_RUN_NAME = f"{PROJECT_NAME}-{RUN_NAME}"
HUB_MODEL_NAME = f"{HF_USER}/{PROJECT_RUN_NAME}"
HUB_MODEL_NAME

'Lizk75/llama3-pricer-2025-04-08_18.44.04-size20000'

# 🛠️ wandb Configuration

In [None]:
# Log in to Weights & Biases

wandb_api_key = userdata.get('WANDB_API_KEY')
os.environ["WANDB_API_KEY"] = wandb_api_key
wandb.login()

# Configure Weights & Biases to record against our project

LOG_TO_WANDB = True

os.environ["WANDB_PROJECT"] = PROJECT_NAME
os.environ["WANDB_LOG_MODEL"] = "checkpoint" if LOG_TO_WANDB else "end"
os.environ["WANDB_WATCH"] = "gradients"

if LOG_TO_WANDB:
  wandb.init(project=PROJECT_NAME, name=RUN_NAME)

wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: lizk75 (lizk75-lisek) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


# 📥 Load the Tokenizer and Model

In [None]:
BASE_MODEL = "meta-llama/Meta-Llama-3.1-8B"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,  # Reduce the precision to 4 bits
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4"
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=quant_config,
    device_map="auto",
)
base_model.generation_config.pad_token_id = tokenizer.pad_token_id

print(f"Memory footprint: {base_model.get_memory_footprint() / 1e6:.1f} MB")

Memory footprint: 5591.5 MB
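A back-of-the-envelope check can help interpret the printed footprint. The sketch below assumes the architecture values from the published Llama 3.1 8B config (they are not stated in this notebook), assumes the linear layers inside the transformer blocks are stored at 4 bits per weight, and assumes the embeddings, `lm_head`, and norm weights remain in a 16-bit dtype. It ignores the quantization constants, so it is an estimate rather than an exact accounting.

```python
# Architecture values assumed from the published Llama 3.1 8B config
# (not stated in this notebook).
VOCAB, HIDDEN, INTERMEDIATE, LAYERS = 128_256, 4_096, 14_336, 32
KV_DIM = 8 * 128   # 8 key/value heads x head dimension 128

# Parameters in the linear layers assumed to be quantized to 4 bits.
attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM   # q_proj, o_proj + k_proj, v_proj
mlp = 3 * HIDDEN * INTERMEDIATE                    # gate_proj, up_proj, down_proj
linear_params = LAYERS * (attn + mlp)

# Parameters assumed to stay in 16-bit: embeddings, lm_head, and norm weights.
embed_params = 2 * VOCAB * HIDDEN
norm_params = LAYERS * 2 * HIDDEN + HIDDEN

# 0.5 bytes per 4-bit weight, 2 bytes per 16-bit weight.
total_bytes = linear_params * 0.5 + (embed_params + norm_params) * 2
print(f"Estimated footprint: {total_bytes / 1e6:.1f} MB")
```

Under these assumptions, roughly 2.1 GB of the footprint comes from the unquantized embedding and output matrices, which is why the model is noticeably larger than "8B parameters at half a byte each".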


# ⚙️ Fine-tune our LLaMA 3.1 8B (4-bit quantized) model with QLoRA
- 1. Prepare the Data with a Data Collator
- 2. Define the QLoRA Configuration (LoraConfig)
- 3. Set the Training Parameters (SFTConfig)
- 4. Initialize the Fine-Tuning Trainer (SFTTrainer)
- 5. Run Fine-Tuning and Push to Hub

## 🔄 1. Prepare the Data with a Data Collator

We only want the model to learn the price, not the product description. Everything up to and including "Price is $" is context, not a training target. TRL's DataCollatorForCompletionOnlyLM handles this masking automatically:

1. Tokenizes the response_template ("Price is $")
2. Finds its token position in each input
3. Masks all tokens up to and including the template (context)
4. Trains the model only on the tokens after it (the price)


Example:

Input: "Product: Red T-shirt. Price is $12.99"

Masked: "Product: Red T-shirt. Price is $" → masked (no loss)

"12.99" → not masked (model is trained to predict this)

So the model learns to generate 12.99 given the context, but isn't trained to reproduce the description.
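To make the masking concrete, here is a minimal plain-Python sketch of the idea. It is not TRL's implementation: `mask_prompt_labels` is a hypothetical helper, and the token ids are invented for illustration. The value `-100` is the label that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def mask_prompt_labels(input_ids, template_ids, ignore_index=IGNORE_INDEX):
    """Copy input_ids to labels, masking everything through the end of the template."""
    labels = list(input_ids)
    n = len(template_ids)
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == template_ids:
            end = start + n
            labels[:end] = [ignore_index] * end
            return labels
    # Template not found: mask the whole example so it contributes no loss.
    return [ignore_index] * len(input_ids)

# Toy token ids, invented for illustration (a real tokenizer yields different ids).
prompt_ids = [11, 12, 13, 14]      # "Product: Red T-shirt."
template_ids = [21, 22, 23]        # "Price is $"
price_ids = [31, 32, 33]           # "12.99"

labels = mask_prompt_labels(prompt_ids + template_ids + price_ids, template_ids)
print(labels)   # [-100, -100, -100, -100, -100, -100, -100, 31, 32, 33]
```

Only the three price tokens keep real labels, so only they contribute to the loss.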

In [None]:
response_template = "Price is $"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)

## 🧠 2. Define the QLoRA Configuration (LoraConfig)

In [None]:
LORA_R = 32
LORA_ALPHA = 64
TARGET_MODULES = ["q_proj", "v_proj", "k_proj", "o_proj"]
LORA_DROPOUT = 0.1

lora_parameters = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    target_modules=TARGET_MODULES,
    lora_dropout=LORA_DROPOUT,
    bias="none",
    task_type="CAUSAL_LM", # Specifies we're doing causal language modeling
)
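To get a feel for how small the trainable adapter is, the sketch below counts the LoRA parameters this configuration would add. The layer shapes are assumptions taken from the published Llama 3.1 8B config (hidden size 4096, key/value projection width 1024, 32 layers); they are not stated in this notebook.

```python
# Layer shapes assumed from the published Llama 3.1 8B config (not stated in this notebook).
HIDDEN, KV_DIM, LAYERS = 4_096, 1_024, 32
LORA_R, LORA_ALPHA = 32, 64

# (in_features, out_features) of each targeted projection.
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

# LoRA adds A (r x in) and B (out x r) per module: r * (in + out) parameters.
per_layer = sum(LORA_R * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
trainable = per_layer * LAYERS
scaling = LORA_ALPHA / LORA_R   # factor applied to the low-rank update B @ A

print(f"Trainable LoRA parameters: {trainable:,}")
print(f"Adapter size in 32-bit floats: {trainable * 4 / 1e6:.0f} MB")
print(f"Scaling (alpha / r): {scaling}")
```

Under these assumptions the adapter holds about 27 million parameters, a small fraction of the 8B base model, and doubling `LORA_R` would double that count.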

## ⚙️ 3. Set the Training Parameters (SFTConfig)

In [None]:
# 📦 Training Setup:
EPOCHS = 1
BATCH_SIZE = 16                     # A100 GPU can go up to 16
GRADIENT_ACCUMULATION_STEPS = 2
MAX_SEQUENCE_LENGTH = 182          # Max token length per input

# ⚙️ Optimization:
LEARNING_RATE = 1e-4
LR_SCHEDULER_TYPE = 'cosine'
WARMUP_RATIO = 0.03
OPTIMIZER = "paged_adamw_32bit"

# 💾 Checkpointing & Logging:
SAVE_STEPS = 200        # Save a checkpoint every 200 steps
STEPS = 20              # Log and evaluate every 20 steps
save_total_limit = 10   # Keep only the 10 most recent checkpoints


LOG_TO_WANDB = True

HUB_MODEL_NAME = f"{HF_USER}/{PROJECT_RUN_NAME}"

train_parameters = SFTConfig(
    # Output & Run
    output_dir=PROJECT_RUN_NAME,
    run_name=RUN_NAME,
    dataset_text_field="text",
    max_seq_length=MAX_SEQUENCE_LENGTH,

    # Training
    num_train_epochs=EPOCHS,
    per_device_train_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    max_steps=-1,
    group_by_length=True,

    # Evaluation
    eval_strategy="steps",
    eval_steps=STEPS,
    per_device_eval_batch_size=1,

    # Optimization
    learning_rate=LEARNING_RATE,
    lr_scheduler_type=LR_SCHEDULER_TYPE,
    warmup_ratio=WARMUP_RATIO,
    optim=OPTIMIZER,
    weight_decay=0.001,
    max_grad_norm=0.3,

    # Precision
    fp16=False,
    bf16=True,

    # Logging & Saving
    logging_steps=STEPS,            # See loss after each {STEP} batches
    save_strategy="steps",
    save_steps=SAVE_STEPS,          # Model Checkpointed locally
    save_total_limit=save_total_limit,
    report_to="wandb" if LOG_TO_WANDB else "none",  # the string "none" disables reporting

    # Hub
    push_to_hub=True,
    hub_strategy="end",  # Only push once, at the end
    load_best_model_at_end=True, # Loads the best eval_loss checkpoint
    metric_for_best_model="eval_loss", # Monitors eval_loss
    greater_is_better=False, # Lower eval_loss = better model
)
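The settings above imply a concrete schedule, which the sketch below works out in plain Python. The rounding of the step count is an assumption chosen because it reproduces the 562 steps seen in this run's logs, and `lr_at` is a hypothetical helper that mimics a linear warmup followed by a half-cosine decay; the exact behaviour of the library scheduler may differ in details.

```python
import math

TRAIN_EXAMPLES = 18_000
BATCH_SIZE, GRAD_ACCUM = 16, 2
LEARNING_RATE, WARMUP_RATIO = 1e-4, 0.03
EVAL_STEPS, SAVE_STEPS = 20, 200

effective_batch = BATCH_SIZE * GRAD_ACCUM             # examples per optimizer step
batches = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)      # dataloader batches per epoch
total_steps = batches // GRAD_ACCUM                   # optimizer steps in one epoch
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def lr_at(step):
    """Linear warmup followed by a single half-cosine decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))

print(f"Effective batch size : {effective_batch}")
print(f"Optimizer steps      : {total_steps}")
print(f"Warmup steps         : {warmup_steps}")
print(f"Evaluations          : {total_steps // EVAL_STEPS}")
print(f"Periodic checkpoints : {total_steps // SAVE_STEPS}")
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(f"lr at step {step:>3}: {lr_at(step):.2e}")
```

Note that `SAVE_STEPS` is a multiple of the evaluation interval, which `load_best_model_at_end=True` relies on so that every saved checkpoint has a matching eval loss.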



## 🧩 4. Initialize the Fine-Tuning Trainer (SFTTrainer)
This step combines the model, datasets, LoRA config, training arguments, collator, and early-stopping callback.

In [None]:
# The latest version of trl is showing a warning about labels - please ignore this warning
fine_tuning = SFTTrainer(
    model=base_model,
    train_dataset=train_data,
    eval_dataset=val_data,
    peft_config=lora_parameters,    # QLoRA config
    args=train_parameters,          # SFTConfig
    data_collator=collator,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)] # Stop early if eval_loss fails to improve for 5 consecutive evaluations
)

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
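The patience setting counts evaluations, not training steps. The sketch below is a simplified model of that rule, not the library's implementation: `first_stop_index` is a hypothetical helper that tracks the best loss itself and stops after `patience` consecutive evaluations without improvement. The first list reuses the validation losses reported for steps 20 to 200 in this notebook's training log; the second list is invented.

```python
def first_stop_index(eval_losses, patience=5, threshold=0.0):
    """Return the index of the evaluation that would trigger a stop, or None."""
    best = None
    bad_evals = 0
    for i, loss in enumerate(eval_losses):
        if best is None or best - loss > threshold:
            best = loss          # improvement: remember it and reset the counter
            bad_evals = 0
        else:
            bad_evals += 1       # no improvement at this evaluation
            if bad_evals >= patience:
                return i
    return None

# Validation losses for steps 20..200, copied from this notebook's training log.
logged = [1.895691, 1.881497, 1.884208, 1.862127, 1.85751,
          1.858189, 1.856585, 1.853515, 1.848313, 1.840907]
print(first_stop_index(logged))             # None: never 5 misses in a row

# Invented losses that plateau after the second evaluation.
plateau = [2.0, 1.9, 1.95, 1.92, 1.91, 1.93, 1.90, 1.89]
print(first_stop_index(plateau))            # 6
```

With evaluations every 20 steps, a patience of 5 corresponds to 100 training steps without a new best eval loss.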


## 🚀 5. Run Fine-Tuning and Push to Hub

In [None]:
fine_tuning.train()
print(f"✅ Best model pushed to HF Hub: {HUB_MODEL_NAME}")

| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 20 | 2.2862 | 1.895691 |
| 40 | 1.92 | 1.881497 |
| 60 | 1.9196 | 1.884208 |
| 80 | 1.8686 | 1.862127 |
| 100 | 1.8868 | 1.85751 |
| 120 | 1.8824 | 1.858189 |
| 140 | 1.8457 | 1.856585 |
| 160 | 1.8345 | 1.853515 |
| 180 | 1.8566 | 1.848313 |
| 200 | 1.8552 | 1.840907 |


wandb: Adding directory to artifact (./llama3-pricer-2025-04-08_18.44.04-size20000/checkpoint-200)... Done. 0.7s
wandb: Adding directory to artifact (./llama3-pricer-2025-04-08_18.44.04-size20000/checkpoint-400)... Done. 0.7s
wandb: Adding directory to artifact (./llama3-pricer-2025-04-08_18.44.04-size20000/checkpoint-562)... Done. 0.7s


✅ Best model pushed to HF Hub: Lizk75/llama3-pricer-2025-04-08_18.44.04-size20000


This chart shows training loss vs. evaluation loss over steps during fine-tuning of Llama 3.1 8B (4-bit, 20K samples).

- Blue line (train/loss): decreasing overall, with some noise. Final value: 1.8596.
- Orange line (eval/loss): smoother and consistently lower than training loss. Final value: 1.8103.

- No sign of overfitting: eval loss stays below train loss throughout and keeps decreasing.
- Stable convergence: both curves flatten around step 500, suggesting the model is approaching a plateau for this amount of data.
- Eval loss fell from about 1.90 at step 20 to about 1.81 at the end, which suggests reasonable generalization to the held-out validation set.

This fine-tuning run looks healthy. We can likely push further with more data, as in the 400K run.
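One way to put the loss values in perspective is to convert them to a relative improvement and a perplexity. The sketch below assumes the reported loss is a mean cross-entropy in nats over the unmasked tokens, which is the usual convention for this kind of training but is not confirmed in this notebook.

```python
import math

first_eval_loss = 1.895691   # validation loss at step 20, from the training log
final_eval_loss = 1.8103     # final eval/loss quoted in the chart notes

relative_drop = (first_eval_loss - final_eval_loss) / first_eval_loss
perplexity = math.exp(final_eval_loss)   # exp of mean cross-entropy

print(f"Relative drop in eval loss: {relative_drop:.1%}")
print(f"Per-token perplexity      : {perplexity:.2f}")
```

A perplexity near 6 would mean the model is, on average, about as uncertain as a uniform choice among six candidates for each predicted token, so token-level loss alone says little about how close the predicted prices are in dollars.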

In [None]:
if LOG_TO_WANDB:
  wandb.finish()

Run history (wandb sparklines omitted): eval/loss, eval/mean_token_accuracy, eval/num_tokens, eval/runtime, eval/samples_per_second, eval/steps_per_second, train/epoch, train/global_step, train/grad_norm, train/learning_rate

Run summary:
eval/loss,1.80983
eval/mean_token_accuracy,0.67383
eval/num_tokens,3202344.0
eval/runtime,286.5711
eval/samples_per_second,6.979
eval/steps_per_second,6.979
total_flos,1.4527312239447245e+17
train/epoch,0.99911
train/global_step,562.0
train/grad_norm,1.17486
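The summary numbers also hint at where the wall-clock time went. The rough estimate below assumes every evaluation took about as long as the last one (the per-evaluation runtime may have varied), so treat the result as an order-of-magnitude figure only.

```python
total_steps = 562            # train/global_step from the run summary
eval_steps = 20              # evaluation interval set in SFTConfig
eval_runtime_s = 286.5711    # eval/runtime of the last evaluation, from the run summary

num_evals = total_steps // eval_steps
eval_hours = num_evals * eval_runtime_s / 3600

print(f"Evaluations run        : {num_evals}")
print(f"Approx. time evaluating: {eval_hours:.1f} h")
```

If this estimate is in the right range, evaluation accounts for a large share of the run time. Raising `per_device_eval_batch_size` above 1, evaluating less often, or validating on a smaller subset are options worth trying before scaling up to more data.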


Now that our best model is pushed to Hugging Face, let's put it to the test.

🔜 See you in the next notebook.