# Fine-Tuning a Qwen 1.5 Model and Logging to a Model Registry

This notebook demonstrates the process of fine-tuning a small-scale Qwen model (`Qwen/Qwen1.5-0.5B-Chat`) on a public instruction-based dataset. We will use Parameter-Efficient Fine-Tuning (PEFT) with LoRA to make the process memory-efficient.

**Key Steps:**
1.  **Setup**: Install required libraries and import necessary modules.
2.  **Configuration**: Define all parameters for the model, dataset, and training.
3.  **Data Preparation**: Load and prepare the dataset for instruction fine-tuning.
4.  **Model Loading and Fine-Tuning**: Load the pre-trained model and tokenizer, and then fine-tune it using `trl`'s `SFTTrainer`.
5.  **Evaluation**: Compare the performance of the base model with the fine-tuned model.
6.  **Model Logging**: Log the fine-tuned model and its metrics to a model registry.

## 1. Setup

First, we'll install the necessary Python libraries and import all the required modules for the entire workflow.

In [None]:
!pip install -q -U transformers datasets accelerate peft trl bitsandbytes

In [1]:
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
)
from trl import SFTTrainer
import frogml # Assuming frogml is the library for your JFrog integration

  from .autonotebook import tqdm as notebook_tqdm


## 2. Configuration

We'll define all our configurations in one place. This makes the notebook cleaner and easier to modify for future experiments.

In [None]:
# Model and tokenizer configuration
model_id = "Qwen/Qwen1.5-0.5B-Chat"
new_model_adapter = "qwen-0.5b-devops-adapter"

# Dataset configuration
dataset_name = "Szaid3680/Devops"

# LoRA configuration
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Training arguments
training_args = TrainingArguments(
    output_dir="./qwen-finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    logging_steps=10,
    max_steps=1,
    fp16=False, # Ensure this is False for CPU/MPS
)

## 3. Data Preparation

We will load the `Szaid3680/Devops` dataset, split it into training and evaluation sets, and define a formatting function for instruction-based fine-tuning.

In [4]:
dataset = load_dataset(dataset_name, split="train")
dataset = dataset.train_test_split(test_size=0.1)
train_dataset = dataset["train"]
eval_dataset = dataset["test"]

# For a quick demo, we'll use a small subset of the data
train_dataset = train_dataset.select(range(2))
eval_dataset = eval_dataset.select(range(2))

def format_instruction(example):
    """Formats the dataset examples into a structured prompt."""
    instruction = example.get('Instruction', '')
    inp = example.get('Prompt', '')
    response = example.get('Response', '')
    
    full_prompt = f"<s>[INST] {instruction}\n{inp} [/INST] {response} </s>"
    return full_prompt

# Let's look at a sample from the training set
print("Sample from the training dataset:")
print(train_dataset[0])

Sample from the training dataset:
{'Response': 'yes, as thekubectl and kptsays, the first step in getting prepared to install cluster is installinggcloudthat is CLI that manages authentication, local configuration, developer workflow, interactions withGoogle Cloud APIs.\nWithout is you simply cant work with objects(in your case you need to enablekpt anthoscli beta) and perform tasks likecreating a Compute Engine VM instance, managing a Google Kubernetes\nEngine cluster, and deploying an App Engine application, either from\nthe command line or in scripts and other automations..', 'Instruction': 'I am trying to use KubeFlow on GCP and I am following thiscodelab, but "click-to-deploy" is no longer supported so I followed the documentation of "kubectl and kpt". However, I keep getting this "You cannot perform this action because the Cloud SDK component manager is disabled for this installation." error and none of the solutions I found worked. I have 2 other friends told me they tried to ma

## 4. Model Loading and Fine-Tuning

Now, we'll load the base model and tokenizer. Then, we will apply the LoRA configuration and start the fine-tuning process.

In [5]:

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token


In [6]:
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cpu" # Use CPU for local demo
)
# Apply LoRA configuration to the model
model = get_peft_model(model, lora_config)

# Create the SFTTrainer
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=lora_config,
    formatting_func=format_instruction,
    args=training_args,
)

print("--- Starting Fine-Tuning ---")
trainer.train()
print("--- Fine-Tuning Complete ---")

  warn("The installed version of bitsandbytes was compiled without GPU support. "


'NoneType' object has no attribute 'cadam32bit_grad_fp32'


Applying formatting function to train dataset: 100%|██████████| 2/2 [00:00<00:00, 287.99 examples/s]
Converting train dataset to ChatML: 100%|██████████| 2/2 [00:00<00:00, 782.81 examples/s]
Adding EOS to train dataset: 100%|██████████| 2/2 [00:00<00:00, 848.19 examples/s]
Tokenizing train dataset: 100%|██████████| 2/2 [00:00<00:00, 109.06 examples/s]
Truncating train dataset: 100%|██████████| 2/2 [00:00<00:00, 602.41 examples/s]
Applying formatting function to eval dataset: 100%|██████████| 2/2 [00:00<00:00, 625.18 examples/s]
Converting eval dataset to ChatML: 100%|██████████| 2/2 [00:00<00:00, 1086.33 examples/s]
Adding EOS to eval dataset: 100%|██████████| 2/2 [00:00<00:00, 1113.58 examples/s]
Tokenizing eval dataset: 100%|██████████| 2/2 [00:00<00:00, 422.49 examples/s]
Truncating eval dataset: 100%|██████████| 2/2 [00:00<00:00, 875.64 examples/s]


--- Starting Fine-Tuning ---




Step,Training Loss
10,3.3769


--- Fine-Tuning Complete ---


## 5. Evaluation

Let's evaluate the fine-tuned model and compare its response to the base model's response for a sample DevOps-related prompt.

In [7]:
metrics = trainer.evaluate()
print("--- Evaluation Metrics ---")
print(metrics)



--- Evaluation Metrics ---
{'eval_loss': 3.707406520843506, 'eval_runtime': 3.3473, 'eval_samples_per_second': 0.598, 'eval_steps_per_second': 0.299}


In [8]:
# Save the trained model adapter
trainer.model.save_pretrained(new_model_adapter)

In [9]:
# Merge the LoRA adapter with the base model for easy inference
base_model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")
finetuned_model = PeftModel.from_pretrained(base_model, new_model_adapter)
finetuned_model = finetuned_model.merge_and_unload()

# Define a prompt for evaluation
prompt = "How do I expose a deployment in Kubernetes using a service?"
messages = [
    {"role": "system", "content": "You are a helpful DevOps assistant."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages, 
    tokenize=False, 
    add_generation_prompt=True
)


# Generate response from the fine-tuned model
print("------------------- FINE-TUNED MODEL RESPONSE -------------------")
model_inputs = tokenizer([text], return_tensors="pt").to("cpu")
# Store the length of the input prompt tokens
input_ids_len = model_inputs['input_ids'].shape[1]
generated_ids = finetuned_model.generate(model_inputs.input_ids, max_new_tokens=256)

# We keep all batch items (:) and slice each one from the end of the input length onwards.
response_only_ids = generated_ids[:, input_ids_len:]
response_finetuned = tokenizer.decode(response_only_ids[0], skip_special_tokens=True)
print(response_finetuned)

# Generate response from the original base model for comparison
print("\n------------------- BASE MODEL RESPONSE -------------------")
original_model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")
generated_ids_base = original_model.generate(model_inputs.input_ids, max_new_tokens=256)
response_only_ids_base = generated_ids_base[:, input_ids_len:]
response_base = tokenizer.decode(response_only_ids_base[0], skip_special_tokens=True)
print(response_base)

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


------------------- FINE-TUNED MODEL RESPONSE -------------------
To expose a deployment in Kubernetes using a service, you can follow these steps:

  1. Create a Kubernetes deployment for the service by running `kubectl create deployment deployment_name` in the terminal.
  2. Define the service and its labels, such as `apiVersion: apps/v1`, `kind: Deployment`, and `metadata如下:
```
        type: Deployment
      spec:
        selector:
          matchLabels:
            app: my-app
          namespace: default
        template:
          metadata:
            labels:
              app: my-app
          spec:
            containers:
              - name: my-app
                image: my-app:latest
                ports:
                  - containerPort: 80
```

In this example, we define a `Deployment` with an ID of `deployment_name`. We also specify that the deployment should be applied to the `default` namespace.

  3. Run `kubectl apply -f deployment.yaml` in the terminal to create 

## 6. Model Logging

Finally, we log our fine-tuned model, its tokenizer, and the evaluation metrics to the model registry.

In [None]:
# REPLACE WITH YOUR OWN FILESYSTEM BASE PATH WHERE THE PROJECTS RESIDE
base_projects_directory = "/root/jfrog"  # change it to your own projects path where the examples repo was cloned to

try:
    import frogml

    frogml.huggingface.log_model(   
    model= finetuned_model,
        tokenizer= tokenizer,
        repository="llm",    # The JFrog repository to upload the model to.
        model_name="finetuned_qwen",     # The uploaded model name
        version="",     # Optional. The uploaded model version
        parameters={"finetuning-dataset": dataset_name},
        code_dir=f"{base_projects_directory}/qwak-examples/llm_finetuning/code_dir",
        dependencies=[f"{base_projects_directory}/qwak-examples/llm_finetuning/main/conda.yaml"],
        metrics = metrics,
        predict_file=f"{base_projects_directory}/qwak-examples/llm_finetuning/code_dir/predict.py"
        )
    print("--- Model Logged Successfully ---")
except Exception as e:
    print(f"An error occurred during model logging: {e}")

INFO:frogml.sdk.model_version.utils.model_log_config:No version provided; using current datetime as the version
INFO:HuggingfaceModelVersionManager:Logging model finetuned_llm to llms
INFO:JmlCustomerClient:Customer exists in JML.
INFO:JmlCustomerClient:Getting project key for repository llms
INFO:frogml.sdk.model_version.utils.files_tools:Code directory, predict file and dependencies are provided. Setup template files for model_name finetuned_llm
/private/var/folders/mt/wvz9xr_s7k3cwk3r0b96hyn00000gn/T/tmp8a70qeg9/finetuned_llm.pretrained_model/tokenizer_config.json: 100%|██████████| 970/970 [00:00<00:00, 4.20MB/s]
/private/var/folders/mt/wvz9xr_s7k3cwk3r0b96hyn00000gn/T/tmp8a70qeg9/finetuned_llm.pretrained_model/special_tokens_map.json: 100%|██████████| 250/250 [00:00<00:00, 280kB/s]
/private/var/folders/mt/wvz9xr_s7k3cwk3r0b96hyn00000gn/T/tmp8a70qeg9/finetuned_llm.pretrained_model/added_tokens.json: 100%|██████████| 80.0/80.0 [00:00<00:00, 1.63MB/s]
/private/var/folders/mt/wvz9xr_s7

2025-08-20 18:00:19,084 - INFO - frogml.storage.logging._log_config.frog_ml.__upload_model:533 - Model: "finetuned_llm", version: "2025-08-20-14-55-37-138" has been uploaded successfully





--- Model Logged Successfully ---
