# Week 7 Day 1

Fine-tune an open-source model to predict product prices

Please see this notebook in Google Colab:

https://colab.research.google.com/drive/15rqdMTJwK76icPBxNoqhI7Ww8UM-Y7ni?usp=sharing

In [18]:
!pip install -q bitsandbytes


In [19]:
!pip install -q pydrive gdown

In [6]:
# imports

import os
import re
import math
import torch
import transformers

from tqdm import tqdm
from dotenv import load_dotenv
from huggingface_hub import login

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, set_seed
from peft import LoraConfig, PeftModel
from datetime import datetime

In [7]:
# environment

load_dotenv()
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')
os.environ['ANTHROPIC_API_KEY'] = os.getenv('ANTHROPIC_API_KEY', 'your-key-if-not-using-env')
os.environ['HF_TOKEN'] = os.getenv('HF_TOKEN', 'your-key-if-not-using-env')
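
Because the cell above falls back to a placeholder string when a key is absent, a missing `HF_TOKEN` only surfaces later as a failed login. A minimal sketch of an early check follows; the helper name `missing_keys` and the `PLACEHOLDER` constant are hypothetical and not part of the original notebook.

```python
# Hypothetical helper: flag keys that are absent, empty, or still the placeholder.
PLACEHOLDER = "your-key-if-not-using-env"

def missing_keys(env, names):
    """Return the names whose value is missing, empty, or the placeholder."""
    return [name for name in names if env.get(name, "") in ("", PLACEHOLDER)]

# Example with a plain dict standing in for os.environ
example_env = {"HF_TOKEN": PLACEHOLDER, "OPENAI_API_KEY": "sk-example"}
print(missing_keys(example_env, ["HF_TOKEN", "OPENAI_API_KEY", "ANTHROPIC_API_KEY"]))
```

In the notebook itself, this could be called as `missing_keys(os.environ, ["HF_TOKEN"])` before attempting the login.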

In [8]:
# Log in to HuggingFace

hf_token = os.environ['HF_TOKEN']
login(hf_token, add_to_git_credential=True)

Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


In [9]:
%matplotlib inline

In [10]:
# Constants

BASE_MODEL = "meta-llama/Meta-Llama-3.1-8B"
FINETUNED_MODEL = "nndk91/ft:gpt-4o-mini-2024-07-18:personal:pricer:AsnYjTRg"

# Hyperparameters for QLoRA Fine-Tuning

LORA_R = 32
LORA_ALPHA = 64
TARGET_MODULES = ["q_proj", "v_proj", "k_proj", "o_proj"]
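
To get a feel for what these hyperparameters imply, the sketch below estimates how many trainable parameters a rank-32 adapter adds across the four target modules. The layer shapes (hidden size 4096, key/value projection width 1024, 32 decoder layers) are assumptions about Llama 3.1 8B and are not stated in this notebook; `lora_param_count` is a hypothetical helper. With `LORA_ALPHA = 64` and `LORA_R = 32`, the usual LoRA scaling factor alpha / r would be 2.

```python
# Assumed Llama 3.1 8B attention shapes (not stated in this notebook).
HIDDEN = 4096       # model hidden size
KV_DIM = 1024       # key/value projection width under grouped-query attention
NUM_LAYERS = 32     # number of decoder layers

LORA_R = 32
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

def lora_param_count(shapes, r, num_layers):
    """Each adapted (d_in, d_out) layer gains matrices A (r x d_in) and B (d_out x r)."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers

total = lora_param_count(TARGET_SHAPES, LORA_R, NUM_LAYERS)
print(f"Trainable LoRA parameters: {total:,}")
print(f"Adapter size at 32-bit: {total * 4 / 1e6:,.1f} MB")
```

Under these assumptions the adapter is a small fraction of the base model, which is the point of LoRA: only the low-rank matrices are trained.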

In [11]:
base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Some parameters are on the meta device because they were offloaded to the disk and cpu.


In [12]:
print(f"Memory footprint: {base_model.get_memory_footprint() / 1e9:,.1f} GB")

Memory footprint: 32.1 GB
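
The 32.1 GB figure is consistent with storing every weight as a 32-bit float. A rough back-of-the-envelope sketch, assuming about 8.03 billion parameters (an assumption, not a value printed by this notebook), shows how the footprint would scale with precision. Real quantized loads tend to come out somewhat higher than these estimates, since some layers are typically kept at higher precision.

```python
NUM_PARAMS = 8.03e9  # assumed parameter count for the 8B base model

def footprint_gb(num_params, bits_per_param):
    """Estimate weight storage in GB (1 GB = 1e9 bytes), ignoring overhead."""
    return num_params * bits_per_param / 8 / 1e9

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{footprint_gb(NUM_PARAMS, bits):,.1f} GB")
```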


In [21]:
!pip install -q "transformers>=4.45.1"

In [25]:
# Load the base model on CPU in full precision (about 4 bytes per parameter)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    device_map="cpu",  # keep every weight on the CPU
    torch_dtype=torch.float32,
)

# Apply dynamic quantization for CPU inference
quantized_model = torch.quantization.quantize_dynamic(
    model,  # model to quantize
    {torch.nn.Linear},  # layer types to quantize
    dtype=torch.qint8,  # store weights as 8-bit integers
)

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

OSError: The paging file is too small for this operation to complete. (os error 1455)

In [None]:
# Load the Base Model using 8 bit

quant_config = BitsAndBytesConfig(load_in_8bit=True)

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=quant_config,
    device_map="auto",
)
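
For intuition about what 8-bit loading does to the weights, here is a toy sketch of absmax quantization in pure Python: floats are mapped to integer codes in [-127, 127] with one shared scale, then mapped back. This is a simplified illustration only. bitsandbytes uses a more elaborate scheme, and the function names here are hypothetical.

```python
def quantize_absmax(values):
    """Map floats to int8-range codes in [-127, 127] using one shared scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]

weights = [0.5, -1.0, 0.25, 0.003]
codes, scale = quantize_absmax(weights)
restored = dequantize(codes, scale)
for w, c, r in zip(weights, codes, restored):
    print(f"{w:+.4f} -> {c:+4d} -> {r:+.4f}")
```

Each restored value differs from the original by at most half a quantization step, which is the precision traded away for a roughly fourfold reduction in storage relative to 32-bit floats.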

In [14]:
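# Reports whichever base_model is currently in memory; run the 8-bit load cell first to measure the quantized model
print(f"Memory footprint: {base_model.get_memory_footprint() / 1e9:,.1f} GB")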

Memory footprint: 32.1 GB
