This notebook walks through the steps required to fine-tune an LLM with the **LoRA** method.

The full run takes approximately 6 hours on a basic GPU, so if no GPU is available, treat the code as a reference only.

# Install dependencies

Make sure your environment has the following packages installed:

* *bitsandbytes* for quantization
* *peft* for implementing LoRA
* *transformers* for loading the Hugging Face models
* *datasets* for loading and using the fine-tuning dataset
* *trl* for the trainer class

*pip install bitsandbytes peft transformers datasets trl tensorboardX wandb*

# Create a Hugging Face token to push the model into the HF Hub

Make sure you have a Hugging Face token with the **write role** so that you can push the model to the Hub. Also, expose that token as the *HF_TOKEN* environment variable.
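As a minimal sketch of the environment-variable approach, the helper below reads the token and fails early when it is missing. The name `get_hf_token` is hypothetical and is not part of any Hugging Face library.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the token stored in `env_var`, or raise if it is unset or empty."""
    # Strip whitespace in case the value was copied with a trailing newline
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable before pushing to the Hub.")
    return token
```

Calling `get_hf_token()` once at the top of the notebook surfaces a missing token before the long training run rather than after it.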

# Import the libraries

In [2]:
from enum import Enum
from functools import partial
import pandas as pd
import torch
import json

from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from peft import LoraConfig, TaskType

import os

In [3]:
seed = 42
set_seed(seed)

In [4]:
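# Read the token from a local file; strip() removes the trailing newline kept by readline()
with open("../hf_token.txt", "r") as f:
    hf_token = f.readline().strip()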

os.environ["HF_TOKEN"] = hf_token

# Process the dataset into inputs

The training set must be converted into the format the LLM expects so that the model learns the output format we want.

This sample uses a function-calling dataset based on *NousResearch/hermes-function-calling-v1*, augmented with thinking steps generated by **deepseek-ai/DeepSeek-R1-Distill-Qwen-32B**.

Processing the data matters because the model needs a specific format to follow the conversation with the user and to know when to call the functions.
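To make the target format concrete, here is a plain-Python approximation of the turn layout produced by the chat template assigned to the tokenizer in this notebook. The function `render_turns` and the example messages are illustrative assumptions. The real rendering is done by `tokenizer.apply_chat_template`.

```python
def render_turns(messages, bos_token="<bos>", add_generation_prompt=False):
    """Approximate the Gemma-style chat template used in this notebook."""
    if messages and messages[0]["role"] == "system":
        raise ValueError("System role not supported")
    text = bos_token
    for message in messages:
        # Each turn is wrapped in <start_of_turn>role ... <end_of_turn><eos>
        text += ("<start_of_turn>" + message["role"] + "\n"
                 + message["content"].strip() + "<end_of_turn><eos>\n")
    if add_generation_prompt:
        # Open the model turn so that generation continues from here
        text += "<start_of_turn>model\n"
    return text


example = [
    {"role": "human", "content": "Hi, I need to convert 500 USD to Euros."},
    {"role": "model", "content": "<think>The user wants a currency conversion.</think>"},
]
print(render_turns(example))
```

The template rejects a `system` role, which is why the preprocessing step folds the system prompt into the first user message.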

In [None]:
model_name = "google/gemma-2-2b-it"
dataset_name = "Jofthomas/hermes-function-calling-thinking-V1"
tokenizer = AutoTokenizer.from_pretrained(model_name)

tokenizer.chat_template = "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn><eos>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"

In [5]:
def preprocess(sample):
    messages = sample["messages"]
    first_message = messages[0]

    # Instead of adding a system message, we merge the content into the first user message
    if first_message["role"] == "system":
        system_message_content = first_message["content"]

        # Merge system content with the first user message
        messages[1]["content"] = system_message_content + "Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>\n\n" + messages[1]["content"]

        # Remove the system message from the conversation
        messages.pop(0)

    return {"text": tokenizer.apply_chat_template(messages, tokenize = False)}
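The merge step in `preprocess` can be tried on a toy conversation without loading the tokenizer. This is a sketch under assumptions: `merge_system_into_first_user` is a hypothetical helper and the sample messages are invented. Unlike the cell above, it works on a copy and leaves the input untouched.

```python
import copy

THINK_INSTRUCTION = (
    "Also, before making a call to a function take the time to plan the function to take. "
    "Make that thinking process between <think>{your thoughts}</think>\n\n"
)


def merge_system_into_first_user(messages):
    """Return a copy of `messages` with the system turn folded into the next turn."""
    messages = copy.deepcopy(messages)  # avoid mutating the caller's sample
    if len(messages) > 1 and messages[0]["role"] == "system":
        system_content = messages.pop(0)["content"]
        # After pop(0), index 0 is the former first user message
        messages[0]["content"] = system_content + THINK_INSTRUCTION + messages[0]["content"]
    return messages


toy = [
    {"role": "system", "content": "You are a function calling AI model."},
    {"role": "human", "content": "Convert 500 USD to Euros."},
]
merged = merge_system_into_first_user(toy)
print(len(merged), merged[0]["role"])
```

The result has one fewer message, and the first remaining turn carries the system text, the thinking instruction, and the original user query in that order.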

In [None]:
# Get the data for finetuning
dataset = load_dataset(dataset_name)
dataset = dataset.rename_column("conversations", "messages")

# A Dedicated Dataset for this Unit

The *NousResearch/hermes-function-calling-v1* dataset is considered a reference for fine-tuning LLMs on function calling. However, its messages do not include a **thinking** step.

Giving the LLM room to think before it generates an answer or performs an action has been shown to improve the quality of its responses.

The dataset used here is derived from the aforementioned one: it was previously passed through DeepSeek-R1 to add *\<think\>* blocks before any function call.

In [None]:
dataset = dataset.map(preprocess, remove_columns = "messages")
dataset = dataset["train"].train_test_split(0.1)
print(dataset)

# Checking the inputs

The following lines of code show what the input messages look like. In general, the conversations have the following structure:

* A **user message** containing the necessary information, with the list of available tools between the *\<tools\> \</tools\>* tokens, followed by the user query.
* An **assistant message** (model) that works in two phases: a *thinking* phase contained in the *\<think\> \</think\>* tokens, and an *act* phase contained in the *\<tool_call\> \</tool_call\>* tokens.
* If the model emits a *\<tool_call\>* token, the result of that action is appended in a new **tool message** between the *\<tool_response\> \</tool_response\>* tokens.
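The structure described above can be sketched as a small parser that separates the thinking phase from the act phase of a model turn. The helper `parse_model_turn` and the sample string are assumptions made for illustration. The fallback to `ast.literal_eval` assumes that some payloads use Python-style single quotes instead of strict JSON.

```python
import ast
import json
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_model_turn(text):
    """Split a model turn into its thinking text and a list of parsed tool calls."""
    think_match = THINK_RE.search(text)
    thoughts = think_match.group(1).strip() if think_match else None
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        raw = raw.strip()
        try:
            calls.append(json.loads(raw))
        except json.JSONDecodeError:
            # Assumption: the payload is a Python-style dict literal with single quotes
            calls.append(ast.literal_eval(raw))
    return thoughts, calls


sample = (
    "<think>Convert 500 USD to EUR with convert_currency.</think>"
    "<tool_call>\n{'name': 'convert_currency', 'arguments': "
    "{'amount': 500, 'from_currency': 'USD', 'to_currency': 'EUR'}}\n</tool_call>"
)
thoughts, calls = parse_model_turn(sample)
print(thoughts)
print(calls[0]["name"], calls[0]["arguments"])
```

A caller would execute each parsed call and feed the result back to the model wrapped in the *\<tool_response\>* tokens.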

In [None]:
print(dataset["train"][8]["text"])

In [None]:
# Review the tokens used for the model (sanity check)
print(tokenizer.pad_token)
print(tokenizer.eos_token)

# Modifying the tokenizer

By default, the tokenizer splits the given text into sub-words (tokens). However, this is not the behavior we want for the newly added tokens.

Even though the dataset already contains *\<think\>*, *\<tool_call\>*, and *\<tool_response\>*, we still need to tell the tokenizer to treat each of them as a single token.

Finally, we need to set the chat_template on the tokenizer again so that it reflects those changes as well.
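The toy splitter below illustrates what "treating a marker as a single token" means. It is not the real tokenizer: `toy_tokenize` is a hypothetical function, and its word-and-punctuation split is only a stand-in for sub-word tokenization.

```python
import re


def toy_tokenize(text, special_tokens=()):
    """Toy splitter: keeps listed special tokens whole, splits everything else crudely."""
    if special_tokens:
        # Longest first, so a longer marker wins over any shorter overlapping one
        ordered = sorted(special_tokens, key=len, reverse=True)
        pattern = "(" + "|".join(re.escape(t) for t in ordered) + ")"
        chunks = re.split(pattern, text)
    else:
        chunks = [text]
    tokens = []
    for chunk in chunks:
        if chunk in special_tokens:
            tokens.append(chunk)  # registered marker stays in one piece
        else:
            # Stand-in for sub-word splitting: words and single punctuation marks
            tokens.extend(re.findall(r"\w+|[^\w\s]", chunk))
    return tokens


text = "<think>plan</think>"
print(toy_tokenize(text))
print(toy_tokenize(text, special_tokens=("<think>", "</think>")))
```

Without registration the markers break into several pieces. With registration each marker is one unit, which is the effect that passing `additional_special_tokens` to the tokenizer is meant to achieve.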

In [None]:
class ChatmlSpecialTokens(str, Enum):
    tools = "<tools>"
    eotools = "</tools>"
    think = "<think>"
    eothink = "</think>"
    tool_call = "<tool_call>"
    eotool_call = "</tool_call>"
    tool_response = "<tool_response>"
    eotool_response = "</tool_response>"
    pad_token = "<pad>"
    eos_token = "<eos>"

    @classmethod
    def list(cls):
        return [c.value for c in cls]

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name,
                                          pad_token = ChatmlSpecialTokens.pad_token.value,
                                          additional_special_tokens = ChatmlSpecialTokens.list())
tokenizer.chat_template = "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn><eos>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"

model = AutoModelForCausalLM.from_pretrained(model_name,
                                             attn_implementation = "eager",
                                             device_map = "auto")
model.resize_token_embeddings(len(tokenizer))
model.to(torch.bfloat16)

# Configure the LoRA

This block defines the parameters of the adapter that we'll attach to the LLM and train. We need to set the size (rank) of the adapter and how strongly its updates are scaled during training.
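To give a feel for these two knobs, the sketch below computes the scaling factor and the number of trainable parameters that a rank-`r` adapter adds to one linear layer. In standard LoRA, the frozen weight receives a low-rank update `B @ A` scaled by `lora_alpha / r`. The layer size of 2048 x 2048 is a hypothetical value chosen for illustration, not a dimension read from the model.

```python
def lora_scaling(lora_alpha, r):
    """Scaling applied to the low-rank update in standard LoRA."""
    return lora_alpha / r


def lora_param_count(d_in, d_out, r):
    """Trainable parameters of one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


d_in = d_out = 2048      # hypothetical layer size, for illustration only
r, lora_alpha = 16, 64   # values used in this notebook

full_params = d_in * d_out
adapter_params = lora_param_count(d_in, d_out, r)

print("scaling:", lora_scaling(lora_alpha, r))
print("adapter params:", adapter_params)
print("share of the full layer: {:.2%}".format(adapter_params / full_params))
```

A larger rank adds capacity at the cost of more trainable parameters, while a larger `lora_alpha` at a fixed rank makes the adapter's contribution stronger.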

In [None]:
# rank_dimension: rank of the LoRA update matrices (smaller = more compression)
rank_dimension = 16

# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation)
lora_alpha = 64

# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting)
lora_dropout = 0.05

peft_config = LoraConfig(r = rank_dimension,
                         lora_alpha = lora_alpha,
                         lora_dropout = lora_dropout,
                         target_modules = ["gate_proj","q_proj","lm_head","o_proj","k_proj","embed_tokens","down_proj","up_proj","v_proj"], # specify the layers in the transformer that will be targeted for training
                         task_type = TaskType.CAUSAL_LM)

# Defining the Trainer and Fine-Tuning Hyperparameters

In [None]:
hf_username = "germanebr"

# The directory where the trained model checkpoints, logs, and other artifacts will be saved. It will also be the default name of the model when pushed to the hub if not redefined later.
output_dir = "gemma-2-2B-it-thinking-function_calling-V0"
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
gradient_accumulation_steps = 4
logging_steps = 5
learning_rate = 1e-4 # The initial learning rate for the optimizer.

max_grad_norm = 1.0
num_train_epochs=1
warmup_ratio = 0.1
lr_scheduler_type = "cosine"
max_seq_length = 1500

training_arguments = SFTConfig(output_dir = output_dir,
                               per_device_train_batch_size = per_device_train_batch_size,
                               per_device_eval_batch_size = per_device_eval_batch_size,
                               gradient_accumulation_steps = gradient_accumulation_steps,
                               save_strategy = "no",
                               eval_strategy = "epoch",
                               logging_steps = logging_steps,
                               learning_rate = learning_rate,
                               max_grad_norm = max_grad_norm,
                               weight_decay = 0.1,
                               warmup_ratio = warmup_ratio,
                               lr_scheduler_type = lr_scheduler_type,
                               report_to = "tensorboard",
                               bf16 = True,
                               hub_private_repo = False,
                               push_to_hub = False,
                               num_train_epochs = num_train_epochs,
                               gradient_checkpointing = True,
                               gradient_checkpointing_kwargs = {"use_reentrant": False},
                               packing = True,
                               max_seq_length = max_seq_length)
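Two of these settings interact in ways that are easy to miss: the batch size combined with gradient accumulation, and the warmup ratio combined with the cosine scheduler. The sketch below works through both. The step counts (250 total, 25 warmup) are hypothetical, because the real number of optimizer steps depends on the dataset size and on packing. `cosine_lr` is a hand-written approximation of a linear-warmup cosine schedule, not the library's own function.

```python
import math


def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of sequences that contribute to each optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


def cosine_lr(step, total_steps, warmup_steps, peak_lr):
    """Linear warmup from 0 to peak_lr, then cosine decay towards 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size(1, 4))

total_steps, warmup_steps, peak_lr = 250, 25, 1e-4  # hypothetical step counts
for step in (0, 12, 25, 137, 250):
    print(step, cosine_lr(step, total_steps, warmup_steps, peak_lr))
```

With a per-device batch size of 1 and 4 accumulation steps, each optimizer update sees 4 sequences. The learning rate reaches its peak of 1e-4 at the end of warmup and then decays.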

In [None]:
# Training with a Supervised Fine-Tuning Trainer (SFTTrainer)
trainer = SFTTrainer(model = model,
                     args = training_arguments,
                     train_dataset = dataset["train"],
                     eval_dataset = dataset["test"],
                     processing_class = tokenizer,
                     peft_config = peft_config)

In [None]:
# Training the model
trainer.train()
trainer.save_model()

# Push the New Model and the Tokenizer to the HF Hub

The model will be pushed under the given Hugging Face username plus the output_dir we specified earlier.

In [None]:
trainer.push_to_hub(f"{hf_username}/{output_dir}")

In [None]:
tokenizer.eos_token = "<eos>"
# Push the tokenizer to the Hub (replace with your username and your previously specified output_dir)
tokenizer.push_to_hub(f"{hf_username}/{output_dir}", token = True)

# Test the Model

To test it, we need to follow these steps:

1. Load the adapter from the hub
2. Load the base model: "**google/gemma-2-2b-it**" from the hub
3. Resize the model's token embeddings to account for the new tokens we introduced during fine-tuning

In [None]:
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from datasets import load_dataset
import torch

In [None]:
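# 4-bit quantization settings, kept for reference: they only take effect when passed
# to from_pretrained via quantization_config, which this cell does not do
bnb_config = BitsAndBytesConfig(load_in_4bit = True,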
                                bnb_4bit_quant_type = "nf4",
                                bnb_4bit_compute_dtype = torch.bfloat16,
                                bnb_4bit_use_double_quant = True)

peft_model_id = f"{hf_username}/{output_dir}" # replace with your newly trained adapter
device = "auto"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path,
                                             device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(model, peft_model_id)
model.to(torch.bfloat16)
model.eval()

In [None]:
print(dataset["test"][8]["text"])

In [None]:
# This prompt is taken from one of the test set examples. It ends right after the model turn opens, so generation continues from the <think> token.
prompt="""<bos><start_of_turn>human
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The starting location'}, 'end_location': {'type': 'string', 'description': 'The ending location'}}, 'required': ['start_location', 'end_location']}}}] </tools>Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{tool_call}
</tool_call>Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>

Hi, I need to convert 500 USD to Euros. Can you help me with that?<end_of_turn><eos>
<start_of_turn>model
<think>"""

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
# Move the inputs to the device the model was loaded on
inputs = {k: v.to(model.device) for k, v in inputs.items()}
outputs = model.generate(**inputs,
                         max_new_tokens = 300, # Adapt as necessary
                         do_sample = True,
                         top_p = 0.95,
                         temperature = 0.01,
                         repetition_penalty = 1.0,
                         eos_token_id = tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0]))