# Fine-Tuning a Qwen 1.5 Model and Logging to a Model Registry

This notebook fine-tunes a small Qwen model (`Qwen/Qwen1.5-0.5B-Chat`) on a public instruction dataset. We use Parameter-Efficient Fine-Tuning (PEFT) with LoRA, which trains a small set of adapter weights while the base model stays frozen, keeping memory use low.

**Key Steps:**
1.  **Setup**: Install required libraries and import necessary modules.
2.  **Configuration**: Define all parameters for the model, dataset, and training.
3.  **Data Preparation**: Load and prepare the dataset for instruction fine-tuning.
4.  **Model Loading and Fine-Tuning**: Load the pre-trained model and tokenizer, and then fine-tune it using `trl`'s `SFTTrainer`.
5.  **Evaluation**: Compare the performance of the base model with the fine-tuned model.
6.  **Model Logging**: Log the fine-tuned model and its metrics to a model registry.

## 1. Setup

First, we'll install the necessary Python libraries and import all the required modules for the entire workflow.

In [None]:
!pip install -q -U transformers datasets accelerate peft trl bitsandbytes

In [1]:
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
)
from trl import SFTTrainer
import frogml # Assuming frogml is the library for your JFrog integration


## 2. Configuration

We'll define all our configurations in one place. This makes the notebook cleaner and easier to modify for future experiments.

In [2]:
# Model and tokenizer configuration
model_id = "Qwen/Qwen1.5-0.5B-Chat"
new_model_adapter = "qwen-0.5b-devops-adapter"

# Dataset configuration
dataset_name = "Szaid3680/Devops"

# LoRA configuration
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Training arguments
training_args = TrainingArguments(
    output_dir="./qwen-finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    logging_steps=10,
    max_steps=100,
    fp16=False, # Ensure this is False for CPU/MPS
)
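
To make the configuration above more concrete, here is a rough back-of-the-envelope sketch of what it implies. The architecture numbers (`HIDDEN_SIZE`, `NUM_LAYERS`, square attention projections) are assumptions about `Qwen1.5-0.5B` that are not stated in this notebook; check them against `model.config` before relying on the result. The sketch also assumes a single device and the two-example training subset used later in the demo.

```python
# Rough estimate of LoRA size and training schedule for the config above.
# ASSUMPTION: architecture values below are not from this notebook;
# verify them with model.config (hidden_size, num_hidden_layers).
HIDDEN_SIZE = 1024   # assumed hidden size of Qwen1.5-0.5B
NUM_LAYERS = 24      # assumed number of transformer layers
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]

LORA_R = 16
LORA_ALPHA = 32


def lora_param_count(d_in, d_out, r):
    """LoRA adds two matrices per adapted layer: A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


# ASSUMPTION: every targeted projection is HIDDEN_SIZE x HIDDEN_SIZE.
per_module = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, LORA_R)
trainable_params = per_module * len(TARGET_MODULES) * NUM_LAYERS

# The LoRA update is scaled by alpha / r before being added to the frozen weight.
scaling = LORA_ALPHA / LORA_R

# Training schedule implied by TrainingArguments (single device assumed).
per_device_batch = 1
grad_accum = 8
max_steps = 100
train_examples = 2  # size of the demo subset selected in Data Preparation

effective_batch = per_device_batch * grad_accum
examples_seen = effective_batch * max_steps
passes_over_data = examples_seen / train_examples

print(f"LoRA trainable parameters (estimate): {trainable_params:,}")
print(f"LoRA scaling factor (alpha / r): {scaling}")
print(f"Effective batch size: {effective_batch}")
print(f"Passes over the training subset: {passes_over_data:.0f}")
```

Under these assumptions the adapter is a small fraction of the base model, and the schedule revisits each training example hundreds of times, which is worth keeping in mind when reading the loss values later on.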

## 3. Data Preparation

We will load the `Szaid3680/Devops` dataset, split it into training and evaluation sets, and define a formatting function for instruction-based fine-tuning.

In [3]:
dataset = load_dataset(dataset_name, split="train")
dataset = dataset.train_test_split(test_size=0.1)
train_dataset = dataset["train"]
eval_dataset = dataset["test"]

# For a quick demo, keep only two examples per split (far too few for real training)
train_dataset = train_dataset.select(range(2))
eval_dataset = eval_dataset.select(range(2))

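def format_instruction(example):
    """Format one dataset example as a single ChatML training string."""
    instruction = example.get('Instruction', '')
    inp = example.get('Prompt', '')
    response = example.get('Response', '')

    # Qwen chat models use ChatML markers rather than Llama-style [INST] tags,
    # so the training text uses the same markers as the chat template at inference.
    full_prompt = (
        f"<|im_start|>user\n{instruction}\n{inp}<|im_end|>\n"
        f"<|im_start|>assistant\n{response}<|im_end|>"
    )
    return full_prompt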

# Let's look at a sample from the training set
print("Sample from the training dataset:")
print(train_dataset[0])

Sample from the training dataset:
{'Response': "\n\n\n\n\n\n\n\n            0\n        \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nI strongly recommend you to keep track of user's data on your server. \nThat way, even if the user deletes the app, the data will still be available after retrieving it from the server. Also, this will make you able to sync the same data between different platforms, not only devices.\nAfter you retrieve the data, you might want to store it in NSUserDefaults and updated when needed.\n\n\n\n\n\n\n\n\nShare\n\n\nImprove this answer\n\n\n\n                        Follow\n                    \n\n\n\n\n\n\n\n\n\n            answered May 22, 2014 at 6:46\n\n\n\n\n\n\nSebydddSebyddd\n\n4,30522 gold badges3939 silver badges4343 bronze badges\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nAdd a comment\n\xa0|\xa0\n\n\n\n\n", 'Instruction': '\nI have implemented in-app-purchases like Mission packs or "full version" before. However, I am now looking into selling in-game credits. \nWhat are 
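
The sample above shows long runs of newlines and non-breaking spaces left over from web scraping. One possible cleanup step, sketched here with the standard library only, collapses that whitespace before the text reaches the formatting function. The helper names `clean_text` and `clean_example` are hypothetical and are not part of the original notebook; the sketch does not try to remove page residue such as vote counts or "Share" links.

```python
import re


def clean_text(text):
    """Collapse whitespace runs left over from scraping (hypothetical helper)."""
    text = text.replace("\xa0", " ")        # non-breaking spaces -> plain spaces
    text = re.sub(r"[ \t]+", " ", text)     # squeeze runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)    # drop spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)  # allow at most one blank line in a row
    return text.strip()


def clean_example(example):
    """Return a copy of the example with every string field cleaned."""
    return {
        key: clean_text(value) if isinstance(value, str) else value
        for key, value in example.items()
    }


# Made-up example mimicking the shape of the dataset rows shown above.
raw = {
    "Instruction": "\n\n\n  How do I roll back a deployment? \n\n\n\n",
    "Prompt": "",
    "Response": "\n\n   Use the rollout\xa0undo command. \n\n\n\nThen verify.\n\n",
}
cleaned = clean_example(raw)
print(cleaned)
```

If this fits your data, it could be applied with `train_dataset.map(clean_example)` before training, so that `format_instruction` receives the cleaned fields.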

## 4. Model Loading and Fine-Tuning

Now, we'll load the base model and tokenizer. Then, we will apply the LoRA configuration and start the fine-tuning process.

In [4]:

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token


In [10]:
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cpu" # Use CPU for local demo
)
# SFTTrainer applies LoRA itself when given peft_config, so a separate
# get_peft_model(model, lora_config) call beforehand is redundant.
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=lora_config,
    formatting_func=format_instruction,
    args=training_args,
)

print("--- Starting Fine-Tuning ---")
trainer.train()
print("--- Fine-Tuning Complete ---")

Applying formatting function to train dataset: 100%|██████████| 2/2 [00:00<00:00, 124.25 examples/s]
Converting train dataset to ChatML: 100%|██████████| 2/2 [00:00<00:00, 854.76 examples/s]
Adding EOS to train dataset: 100%|██████████| 2/2 [00:00<00:00, 945.09 examples/s]
Tokenizing train dataset: 100%|██████████| 2/2 [00:00<00:00, 135.71 examples/s]
Truncating train dataset: 100%|██████████| 2/2 [00:00<00:00, 609.37 examples/s]
Applying formatting function to eval dataset: 100%|██████████| 2/2 [00:00<00:00, 653.83 examples/s]
Converting eval dataset to ChatML: 100%|██████████| 2/2 [00:00<00:00, 1065.49 examples/s]
Adding EOS to eval dataset: 100%|██████████| 2/2 [00:00<00:00, 899.39 examples/s]
Tokenizing eval dataset: 100%|██████████| 2/2 [00:00<00:00, 406.42 examples/s]
Truncating eval dataset: 100%|██████████| 2/2 [00:00<00:00, 463.48 examples/s]


--- Starting Fine-Tuning ---




| Step | Training Loss |
|------|---------------|
| 10   | 3.8066        |
| 20   | 2.2832        |
| 30   | 1.0085        |
| 40   | 0.1991        |
| 50   | 0.0344        |
| 60   | 0.0110        |
| 70   | 0.0083        |
| 80   | 0.0073        |
| 90   | 0.0068        |
| 100  | 0.0069        |


--- Fine-Tuning Complete ---


## 5. Evaluation

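Let's evaluate the fine-tuned model on the held-out examples, then compare its response with the base model's for a sample DevOps prompt. Because this demo trains on only two examples for 100 steps, the training loss collapses toward zero while the evaluation loss stays high, which is a sign of memorization. Treat the metrics as a smoke test of the pipeline rather than a measure of model quality.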

In [11]:
metrics = trainer.evaluate()
print("--- Evaluation Metrics ---")
print(metrics)



--- Evaluation Metrics ---
{'eval_loss': 5.667745113372803, 'eval_runtime': 3.329, 'eval_samples_per_second': 0.601, 'eval_steps_per_second': 0.3}
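
One way to read these numbers is to convert the losses to perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training. It uses the final logged training loss and the `eval_loss` printed above.

```python
import math

# Values copied from the training log and the evaluation output above.
train_loss = 0.0069              # training loss logged at step 100
eval_loss = 5.667745113372803    # eval_loss from trainer.evaluate()

# ASSUMPTION: loss is mean per-token cross-entropy in nats,
# so perplexity is exp(loss).
train_ppl = math.exp(train_loss)
eval_ppl = math.exp(eval_loss)
loss_gap = eval_loss - train_loss

print(f"train perplexity: {train_ppl:.3f}")
print(f"eval perplexity:  {eval_ppl:.1f}")
print(f"loss gap (eval - train): {loss_gap:.2f}")
```

A training perplexity very close to 1 next to a much larger evaluation perplexity suggests the model has memorized the two training examples rather than learned something that transfers to unseen ones.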


In [7]:
# Save the trained model adapter
trainer.model.save_pretrained(new_model_adapter)

In [5]:
# Merge the LoRA adapter with the base model for easy inference
base_model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")
finetuned_model = PeftModel.from_pretrained(base_model, new_model_adapter)
finetuned_model = finetuned_model.merge_and_unload()

# Define a prompt for evaluation
prompt = "How do I expose a deployment in Kubernetes using a service?"
messages = [
    {"role": "system", "content": "You are a helpful DevOps assistant."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages, 
    tokenize=False, 
    add_generation_prompt=True
)

# Generate response from the fine-tuned model
print("------------------- FINE-TUNED MODEL RESPONSE -------------------")
model_inputs = tokenizer([text], return_tensors="pt").to("cpu")
# Unpack the encoding so generate() also receives the attention mask.
# The decoded text includes the prompt, since generate() returns prompt + new tokens.
generated_ids = finetuned_model.generate(**model_inputs, max_new_tokens=256)
response_finetuned = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(response_finetuned)

# Generate response from the original base model for comparison.
# Reload it from the hub: the merge above changed the weights of base_model.
print("\n------------------- BASE MODEL RESPONSE -------------------")
original_model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")
generated_ids_base = original_model.generate(**model_inputs, max_new_tokens=256)
response_base = tokenizer.decode(generated_ids_base[0], skip_special_tokens=True)
print(response_base)

------------------- FINE-TUNED MODEL RESPONSE -------------------
system
You are a helpful DevOps assistant.
user
How do I expose a deployment in Kubernetes using a service?
assistant
To expose a deployment in Kubernetes using a service, you can follow these steps:

1. Define the desired service: Start by defining the service that you want to expose. This should include the name of the service, the API group that it belongs to (if applicable), and any other details that you need to specify.

2. Create the Kubernetes API resource: Use the Kubernetes CLI to create an API resource for your service. This will involve creating a secret with the name "service_name" and setting its value to the desired service name.

3. Add the app definition: In the API resource, add the app definition for your deployment. This includes specifying the desired configuration (e.g., resources, API version, etc.) and any additional settings that you need 
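
The printed response starts with the system and user turns because a causal language model's `generate()` output contains the prompt tokens followed by the new ones. A minimal sketch of separating the two, using plain Python lists in place of tensors, is shown below. The helper `strip_prompt` and the token ids are made up for illustration.

```python
def strip_prompt(generated_ids, prompt_length):
    """Return only the tokens produced after the prompt (hypothetical helper)."""
    return generated_ids[prompt_length:]


# Made-up token ids standing in for real tokenizer output.
prompt_ids = [101, 7, 8, 9]
generated = prompt_ids + [42, 43, 44]

new_tokens = strip_prompt(generated, len(prompt_ids))
print(new_tokens)

# With the notebook's objects, the equivalent slice would be roughly:
#   new_ids = generated_ids[0][model_inputs.input_ids.shape[1]:]
#   answer = tokenizer.decode(new_ids, skip_special_tokens=True)
```

Decoding only the new tokens makes the fine-tuned and base responses easier to compare side by side, since the shared prompt is no longer repeated in each.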

## 6. Model Logging

Finally, we log our fine-tuned model, its tokenizer, and the evaluation metrics to the model registry.

In [None]:
# REPLACE WITH YOUR OWN FILESYSTEM BASE PATH WHERE THE PROJECTS RESIDE
base_projects_directory = "your_projects_path"

try:
    import frogml

    frogml.huggingface.log_model(
        model=finetuned_model,
        tokenizer=tokenizer,
        repository="llms",  # The JFrog repository to upload the model to
        model_name="finetuned_qwen",  # The uploaded model name
        version="",  # Optional; when empty, the current datetime is used as the version
        parameters={"finetuning-dataset": dataset_name},
        code_dir=f"{base_projects_directory}/qwak-examples/llm_finetuning/code_dir",
        dependencies=[f"{base_projects_directory}/qwak-examples/llm_finetuning/main/conda.yaml"],
        metrics=metrics,
        predict_file=f"{base_projects_directory}/qwak-examples/llm_finetuning/code_dir/predict.py",
    )
    print("--- Model Logged Successfully ---")
except Exception as e:
    print(f"An error occurred during model logging: {e}")

INFO:frogml.sdk.model_version.utils.model_log_config:No version provided; using current datetime as the version
INFO:HuggingfaceModelVersionManager:Logging model finetuned_qwen to llms
INFO:JmlCustomerClient:Customer exists in JML.
INFO:JmlCustomerClient:Getting project key for repository llms
INFO:frogml.sdk.model_version.utils.files_tools:Code directory, predict file and dependencies are provided. Setup template files for model_name finetuned_qwen
/private/var/folders/mt/wvz9xr_s7k3cwk3r0b96hyn00000gn/T/tmpm7sazn05/finetuned_qwen.pretrained_model/added_tokens.json: 100%|██████████| 80.0/80.0 [00:00<00:00, 1.78MB/s]
/private/var/folders/mt/wvz9xr_s7k3cwk3r0b96hyn00000gn/T/tmpm7sazn05/finetuned_qwen.pretrained_model/tokenizer_config.json: 100%|██████████| 1.33k/1.33k [00:00<00:00, 49.7MB/s]
/private/var/folders/mt/wvz9xr_s7k3cwk3r0b96hyn00000gn/T/tmpm7sazn05/finetuned_qwen.pretrained_model/config.json: 100%|██████████| 729/729 [00:00<00:00, 24.3MB/s]
/private/var/folders/mt/wvz9xr_s7k3

2025-08-05 12:45:51,638 - INFO - frogml.storage.logging._log_config.frog_ml.__upload_model:533 - Model: "finetuned_qwen", version: "2025-08-05-09-45-26-787" has been uploaded successfully





--- Model Logged Successfully ---
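
Since the logging call depends on three local paths (the code directory, `conda.yaml`, and `predict.py`), it can help to check that they exist before uploading. The sketch below is one way to do that with `pathlib`. The helper `missing_artifacts` is hypothetical and not part of `frogml`; the relative paths are the ones used in the cell above.

```python
from pathlib import Path


def missing_artifacts(base_dir):
    """List the files and folders the logging call expects but cannot find."""
    project = Path(base_dir) / "qwak-examples" / "llm_finetuning"
    required = [
        project / "code_dir",
        project / "main" / "conda.yaml",
        project / "code_dir" / "predict.py",
    ]
    return [str(path) for path in required if not path.exists()]


# "your_projects_path" is the placeholder from the cell above; replace it first.
missing = missing_artifacts("your_projects_path")
if missing:
    print("Missing before logging:")
    for path in missing:
        print(f"  {path}")
else:
    print("All local artifacts found.")
```

Running a check like this first gives a clearer message than the generic exception handler in the logging cell when a path is misconfigured.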
