# Fine-Tuning a Qwen 1.5 Model and Logging to a Model Registry

This notebook demonstrates the process of fine-tuning a small-scale Qwen model (`Qwen/Qwen1.5-0.5B-Chat`) on a public instruction-based dataset. We will use Parameter-Efficient Fine-Tuning (PEFT) with LoRA to make the process memory-efficient.

**Key Steps:**
1.  **Setup**: Install required libraries and import necessary modules.
2.  **Configuration**: Define all parameters for the model, dataset, and training.
3.  **Data Preparation**: Load and prepare the dataset for instruction fine-tuning.
4.  **Model Loading and Fine-Tuning**: Load the pre-trained model and tokenizer, and then fine-tune it using `trl`'s `SFTTrainer`.
5.  **Evaluation**: Compare the performance of the base model with the fine-tuned model.
6.  **Model Logging**: Log the fine-tuned model and its metrics to a model registry.

## 1. Setup

First, we'll install the necessary Python libraries and import all the required modules for the entire workflow.

In [None]:
!pip install -q -U transformers datasets accelerate peft trl bitsandbytes
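
Because `-U` pulls the latest release of each package, and the `trl` and `peft` APIs change between versions, it can help to record which versions were actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            report[name] = None
    return report


libraries = ["transformers", "datasets", "accelerate", "peft", "trl", "bitsandbytes"]
for name, ver in installed_versions(libraries).items():
    print(f"{name}: {ver or 'not installed'}")
```

Keeping this printout next to the training results makes a later run easier to reproduce.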

In [1]:
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
)
from trl import SFTTrainer
import frogml # Assuming frogml is the library for your JFrog integration



## 2. Configuration

We'll define all our configurations in one place. This makes the notebook cleaner and easier to modify for future experiments.

In [2]:
# Model and tokenizer configuration
model_id = "Qwen/Qwen1.5-0.5B-Chat"
new_model_adapter = "qwen-0.5b-devops-adapter"

# Dataset configuration
dataset_name = "Szaid3680/Devops"

# LoRA configuration
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Training arguments
training_args = TrainingArguments(
    output_dir="./qwen-finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    logging_steps=10,
    max_steps=100,
    fp16=False, # Ensure this is False for CPU/MPS
)
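
To get a feel for what these settings imply, the sketch below estimates how many trainable parameters the LoRA adapter adds and what the nominal effective batch size is. The architecture values (`hidden_size = 1024`, `num_layers = 24`, square attention projections) are assumptions about `Qwen/Qwen1.5-0.5B-Chat` that are not stated in this notebook; check `model.config` before relying on them.

```python
def lora_param_count(in_features, out_features, r):
    """Parameters added to one linear layer: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


# Values taken from the configuration cell above.
r, lora_alpha = 16, 32
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"]
per_device_train_batch_size = 1
gradient_accumulation_steps = 8

# ASSUMED architecture values (not given in this notebook).
hidden_size = 1024
num_layers = 24

per_module = lora_param_count(hidden_size, hidden_size, r)
total = per_module * len(target_modules) * num_layers
scaling = lora_alpha / r  # factor applied to the low-rank update
effective_batch = per_device_train_batch_size * gradient_accumulation_steps

print(f"LoRA parameters per projection: {per_module:,}")
print(f"Total trainable LoRA parameters: {total:,}")
print(f"LoRA scaling (lora_alpha / r): {scaling}")
print(f"Nominal effective batch size (single device): {effective_batch}")
```

Under these assumptions the adapter adds roughly 3.1 million trainable parameters, a small fraction of a model with about half a billion weights.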

## 3. Data Preparation

We will load the `Szaid3680/Devops` dataset, split it into training and evaluation sets, and define a formatting function for instruction-based fine-tuning.

In [3]:
dataset = load_dataset(dataset_name, split="train")
dataset = dataset.train_test_split(test_size=0.1, seed=42)  # fixed seed keeps the split reproducible
train_dataset = dataset["train"]
eval_dataset = dataset["test"]

# For a quick demo, we'll use a small subset of the data
train_dataset = train_dataset.select(range(2))
eval_dataset = eval_dataset.select(range(2))

def format_instruction(example):
    """Formats the dataset examples into a structured prompt."""
    instruction = example.get('Instruction', '')
    inp = example.get('Prompt', '')
    response = example.get('Response', '')
    
    full_prompt = f"<s>[INST] {instruction}\n{inp} [/INST] {response} </s>"
    return full_prompt

# Let's look at a sample from the training set
print("Sample from the training dataset:")
print(train_dataset[0])

Sample from the training dataset:
{'Response': 'Establishing a database connection is a pretty expensive operation. Ideally a web application should be using a connection pool, so that you create create pool of database sessions initially and they remain there for the life of the application. The app tier will ask for a connection from the pool as it needs to interact with the database.So utopia is to see an initial set of LOGON records and then no LOGOFF records until your shut the application down.ShareFollowansweredMar 11, 2022 at 8:03Connor McDonaldConnor McDonald10.9k11 gold badge1212 silver badges1919 bronze badgesAdd a comment|', 'Instruction': 'I have a web application built by ASP.NET Web API and the database is Oracle.When I published the site on the IIS and run it, I recognized the following:I found many records in the viewDBA_AUDIT_SESSIONand that\'s recordsLOGOFF/LOGONin the order.After that, I let the site open for a while on a tab in the Chrome Browser without any intera
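
One thing to be aware of: the `[INST] ... [/INST]` markers in `format_instruction` follow the Llama/Mistral prompt convention, whereas Qwen1.5 chat models use ChatML-style turns delimited by `<|im_start|>` and `<|im_end|>`, which is also the layout `tokenizer.apply_chat_template` produces in the evaluation cell. Below is a hedged sketch of an alternative formatter that mirrors that layout. The function name `format_instruction_chatml`, the system prompt default, and the toy row are illustrative assumptions; the tokenizer's own chat template remains the authoritative source for the exact format.

```python
def format_instruction_chatml(example, system_prompt="You are a helpful DevOps assistant."):
    """Render one dataset row as a ChatML-style training string (illustrative)."""
    instruction = example.get("Instruction", "")
    extra = example.get("Prompt", "")
    response = example.get("Response", "")
    # Join the two input fields and drop the trailing newline when 'Prompt' is empty.
    user_turn = f"{instruction}\n{extra}".strip()
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_turn}<|im_end|>\n"
        f"<|im_start|>assistant\n{response}<|im_end|>\n"
    )


# Made-up row using the same column names as the dataset.
toy_row = {
    "Instruction": "How do I list pods?",
    "Prompt": "",
    "Response": "Run `kubectl get pods`.",
}
print(format_instruction_chatml(toy_row))
```

Training on the same turn structure that is used at inference time generally makes the comparison in the evaluation section easier to interpret.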

## 4. Model Loading and Fine-Tuning

Now, we'll load the base model and tokenizer. Then, we will apply the LoRA configuration and start the fine-tuning process.

In [4]:
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cpu" # Use CPU for local demo
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Apply the LoRA configuration to the model
model = get_peft_model(model, lora_config)

# Create the SFTTrainer. The model is already wrapped with LoRA above,
# so peft_config is not passed again to avoid applying the adapter twice.
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    formatting_func=format_instruction,
    args=training_args,
)

print("--- Starting Fine-Tuning ---")
trainer.train()
print("--- Fine-Tuning Complete ---")



Applying formatting function to train dataset: 100%|██████████| 2/2 [00:00<00:00, 387.16 examples/s]
Adding EOS to train dataset: 100%|██████████| 2/2 [00:00<00:00, 945.20 examples/s]
Tokenizing train dataset: 100%|██████████| 2/2 [00:00<00:00, 84.95 examples/s]
Truncating train dataset: 100%|██████████| 2/2 [00:00<00:00, 497.75 examples/s]
Applying formatting function to eval dataset: 100%|██████████| 2/2 [00:00<00:00, 616.54 examples/s]
Adding EOS to eval dataset: 100%|██████████| 2/2 [00:00<00:00, 840.96 examples/s]
Tokenizing eval dataset: 100%|██████████| 2/2 [00:00<00:00, 289.59 examples/s]
Truncating eval dataset: 100%|██████████| 2/2 [00:00<00:00, 722.78 examples/s]


--- Starting Fine-Tuning ---




| Step | Training Loss |
|------|---------------|
| 10 | 3.2438 |
| 20 | 2.2005 |
| 30 | 1.2756 |
| 40 | 0.4578 |
| 50 | 0.1014 |
| 60 | 0.0284 |
| 70 | 0.0123 |
| 80 | 0.0089 |
| 90 | 0.0078 |
| 100 | 0.0076 |


--- Fine-Tuning Complete ---
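
The training loss falls to almost zero because the demo subset contains only two examples, so the model sees the same data many times. The sketch below gives a rough estimate of the number of passes over the data. It assumes that an epoch shorter than the accumulation window still triggers one optimizer update, which is how the Hugging Face `Trainer` is understood to behave; treat the result as an approximation.

```python
import math


def estimated_epochs(num_examples, batch_size, grad_accum_steps, max_steps):
    """Roughly estimate how many passes over the data `max_steps` updates imply."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    # ASSUMPTION: at least one optimizer update happens per epoch, even when
    # the epoch has fewer batches than the accumulation window.
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return max_steps / updates_per_epoch


epochs = estimated_epochs(num_examples=2, batch_size=1, grad_accum_steps=8, max_steps=100)
print(f"Approximate passes over the training subset: {epochs:.0f}")
```

With about 100 passes over two examples, the low final loss reflects memorization rather than learning that generalizes.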


## 5. Evaluation

We first compute the evaluation loss on the held-out examples, then compare the fine-tuned model's response with the base model's response for a sample DevOps-related prompt.

In [5]:
metrics = trainer.evaluate()
print("--- Evaluation Metrics ---")
print(metrics)



--- Evaluation Metrics ---
{'eval_loss': 5.611345291137695, 'eval_runtime': 3.4177, 'eval_samples_per_second': 0.585, 'eval_steps_per_second': 0.293}
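
The reported losses are easier to compare as perplexities. Assuming they are mean per-token cross-entropy values in nats, perplexity is simply the exponential of the loss. This sketch uses the final training loss and the `eval_loss` printed above.

```python
import math


def perplexity(cross_entropy_loss):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(cross_entropy_loss)


final_train_loss = 0.0076       # last logged training loss
eval_loss = 5.611345291137695   # from trainer.evaluate()

print(f"Training perplexity:   {perplexity(final_train_loss):.4f}")
print(f"Evaluation perplexity: {perplexity(eval_loss):.1f}")
```

A training perplexity near 1 alongside an evaluation perplexity in the hundreds is consistent with the model having memorized the two training examples. With only two evaluation examples, the evaluation figure is also very noisy.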


In [7]:
# Save the trained model adapter
trainer.model.save_pretrained(new_model_adapter)
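
The saved adapter contains only the low-rank LoRA matrices. Merging such an adapter into the base weights, which is what PEFT's `merge_and_unload()` is for, amounts to computing `W + (lora_alpha / r) * B @ A` once so that inference needs no extra matrix multiplications. The toy example below illustrates the arithmetic with plain Python lists; it is a sketch of the idea, not PEFT's implementation, and all values are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(W, A, B, lora_alpha, r):
    """Fold a LoRA update into W: W + (lora_alpha / r) * (B @ A)."""
    scaling = lora_alpha / r
    delta = matmul(B, A)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


# Toy 2x2 weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]       # r x in  (1 x 2)
B = [[0.5], [-1.0]]    # out x r (2 x 1)
x = [[3.0], [4.0]]     # input as a column vector

merged = merge_lora(W, A, B, lora_alpha=2, r=1)
y_merged = matmul(merged, x)

# Unmerged path: base output plus the scaled adapter output.
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
y_adapter = [[b[0] + 2.0 * a[0]] for b, a in zip(base_out, adapter_out)]

print(y_merged, y_adapter)
```

Both paths give the same output, which is why the merged model can be used like an ordinary `transformers` model.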

In [None]:
# Merge the LoRA adapter with the base model for easy inference
base_model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")
finetuned_model = PeftModel.from_pretrained(base_model, new_model_adapter)
finetuned_model = finetuned_model.merge_and_unload()

# Define a prompt for evaluation
prompt = "How do I expose a deployment in Kubernetes using a service?"
messages = [
    {"role": "system", "content": "You are a helpful DevOps assistant."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages, 
    tokenize=False, 
    add_generation_prompt=True
)

# Generate response from the fine-tuned model
print("------------------- FINE-TUNED MODEL RESPONSE -------------------")
model_inputs = tokenizer([text], return_tensors="pt").to("cpu")
# Pass the attention mask along with the input IDs; pad and EOS share a token here
generated_ids = finetuned_model.generate(**model_inputs, max_new_tokens=256)
response_finetuned = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(response_finetuned)

# Generate response from the original base model for comparison
print("\n------------------- BASE MODEL RESPONSE -------------------")
original_model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")
generated_ids_base = original_model.generate(**model_inputs, max_new_tokens=256)
response_base = tokenizer.decode(generated_ids_base[0], skip_special_tokens=True)
print(response_base)

--- Fine-Tuned Model Response ---
system
You are a helpful DevOps assistant.
user
How do I expose a deployment in Kubernetes using a service?
assistant
To expose a deployment in Kubernetes using a service, you can follow these steps:

  1. Create a Kubernetes resource group for your application and label it with the appropriate namespace.
  2. Define the service that you want to expose. You can use the `app` label to specify the name of your app, and the container image to specify the runtime environment.
  3. In the Kubernetes API, you can find services by their name. Find the service you created in step 2, then select the associated resource group.
  4. Make the desired setup and apply any other settings as desired.

For example, if your application needs to be accessible over a specific port (e.g., 80), you would set the service to listen on that port. If your application does not need to be accessible over a specific port, you would use the default setting of serving all requests.
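
The response above repeats the system and user turns because, for decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones. To show only the answer, a common approach is to slice off the prompt before decoding. The sketch below shows the idea with plain lists and made-up token IDs.

```python
def strip_prompt(generated_ids, prompt_length):
    """Keep only the tokens produced after the prompt."""
    return generated_ids[prompt_length:]


prompt_ids = [101, 7, 8, 9]             # made-up prompt token IDs
generated = prompt_ids + [42, 43, 44]   # the prompt is echoed, then new tokens follow

new_tokens = strip_prompt(generated, len(prompt_ids))
print(new_tokens)
```

In this notebook the equivalent would presumably be `generated_ids[0][model_inputs.input_ids.shape[1]:]` passed to `tokenizer.decode`, though that line has not been run here.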


## 6. Model Logging

Finally, we log our fine-tuned model, its tokenizer, and the evaluation metrics to the model registry.
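
Before logging, it can be useful to make sure the metrics dictionary holds only plain, finite numbers. Whether `frogml` requires this is an assumption, not something this notebook confirms. The helper name `clean_metrics` is hypothetical, and the sample dictionary is the evaluation output printed earlier.

```python
import math


def clean_metrics(raw_metrics, ndigits=4):
    """Keep only finite numeric metrics, rounded for readability."""
    cleaned = {}
    for name, value in raw_metrics.items():
        # Skip booleans and anything that is not a number.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        # Skip NaN and infinite values.
        if not math.isfinite(value):
            continue
        cleaned[name] = round(float(value), ndigits)
    return cleaned


example_metrics = {
    "eval_loss": 5.611345291137695,
    "eval_runtime": 3.4177,
    "eval_samples_per_second": 0.585,
    "eval_steps_per_second": 0.293,
}
print(clean_metrics(example_metrics))
```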

In [None]:
try:
    import frogml

    frogml.huggingface.log_model(
        model=finetuned_model,
        tokenizer=tokenizer,
        repository="llms",            # The JFrog repository to upload the model to
        model_name="finetuned_qwen",  # The uploaded model name
        version="v0-2.2",             # Optional. The uploaded model version
        properties={"dataset": "Szaid3680-Devops"},
        metrics=metrics,
    )
    print("--- Model Logged Successfully ---")
except Exception as e:
    print(f"An error occurred during model logging: {e}")

INFO:HuggingfaceModelVersionManager:Logging model finetuned_qwen to llms
INFO:JmlCustomerClient:Getting project key for repository llms
INFO:JmlCustomerClient:Customer exists in JML.
/private/var/folders/mt/wvz9xr_s7k3cwk3r0b96hyn00000gn/T/tmp6kwjnj6m/finetuned_qwen.pretrained_model/tokenizer_config.json: 100%|██████████| 970/970 [00:00<00:00, 2.69kB/s]
/private/var/folders/mt/wvz9xr_s7k3cwk3r0b96hyn00000gn/T/tmp6kwjnj6m/finetuned_qwen.pretrained_model/config.json: 100%|██████████| 684/684 [00:00<00:00, 13.5MB/s]

/private/var/folders/mt/wvz9xr_s7k3cwk3r0b96hyn00000gn/T/tmp6kwjnj6m/finetuned_qwen.pretrained_model/added_tokens.json: 100%|██████████| 80.0/80.0 [00:00<00:00, 2.35MB/s]
/private/var/folders/mt/wvz9xr_s7k3cwk3r0b96hyn00000gn/T/tmp6kwjnj6m/finetuned_qwen.pretrained_model/tokenizer_config.json: 100%|██████████| 970/970 [00:00<00:00, 1.33kB/s]
/private/var/folders/mt/wvz9xr_s7k3cwk3r0b96hyn00000gn/T/tmp6kwjnj6m/finetuned_qwen.pretrained_model/merges.txt: 100%|██████████| 1.67M/