# TRL SFT Tool Calling Training - Complete Tutorial

This notebook demonstrates the end-to-end workflow for:
1. Generating a tool-calling dataset with DeepFabric
2. Formatting it for HuggingFace TRL SFTTrainer
3. Loading and training a model with tool calling capabilities

## Prerequisites

Install required packages:
```bash
pip install deepfabric transformers trl datasets peft accelerate
```

## References
- [HuggingFace TRL SFTTrainer Tool Calling](https://huggingface.co/docs/trl/en/sft_trainer#tool-calling-with-sft)
- [Fine-tuning for Tool Calling](https://www.stephendiehl.com/posts/fine_tuning_tools/)

## Step 1: Generate Tool-Calling Dataset with DeepFabric

First, we'll use DeepFabric to generate a dataset with tool-calling examples that include:
- Realistic tool usage scenarios
- Step-by-step reasoning
- Proper tool parameter construction

In [None]:
import asyncio
import json
import os

from deepfabric.config import DeepFabricConfig
from deepfabric.dataset import Dataset
from deepfabric.generator import DataSetGenerator
from deepfabric.tree import Tree

### Configure Dataset Generation

Define the configuration for generating our tool-calling dataset:

In [None]:
config = {
    "dataset_system_prompt": """You are a helpful AI assistant with access to various tools.
Use tools when needed to answer questions accurately.""",

    "topic_tree": {
        "topic_prompt": "Real-world tasks requiring AI assistant tool usage",
        "provider": "openai",
        "model": "gpt-4o-mini",
        "temperature": 0.7,
        "depth": 2,
        "degree": 3,
    },

    "data_engine": {
        "generation_system_prompt": """Generate realistic AI assistant scenarios with tool usage.
Show clear reasoning about tool selection and parameter construction.""",
        "provider": "openai",
        "model": "gpt-4o-mini",
        "temperature": 0.8,
        "conversation_type": "agent_cot_tools",
        "available_tools": ["web_search", "calculator", "get_weather"],
        "max_tools_per_query": 2,
    },

    "dataset": {
        "creation": {
            "num_steps": 5,
            "batch_size": 2,
            "sys_msg": True
        },
        "save_as": "trl_raw.jsonl",
        "formatters": [
            {
                "name": "trl_sft",
                "template": "builtin://trl_sft_tools",
                "output": "trl_formatted.jsonl",
                "config": {
                    "include_system_prompt": True,
                    "validate_tool_schemas": True,
                    "remove_available_tools_field": False,
                }
            }
        ]
    }
}

print("📊 Configuration:")
print(f"  - Topic depth: {config['topic_tree']['depth']}")
print(f"  - Topic degree: {config['topic_tree']['degree']}")
print(f"  - Samples: {config['dataset']['creation']['num_steps'] * config['dataset']['creation']['batch_size']}")
print(f"  - Conversation type: {config['data_engine']['conversation_type']}")

### Generate Topic Tree

Create diverse topic paths for our dataset:

In [None]:
print("Generating topic tree...")

df_config = DeepFabricConfig(**config)
tree_params = df_config.get_topic_tree_params()
tree = Tree(**tree_params)

async def _build_tree():
    async for _ in tree.build_async():
        pass

await _build_tree()
print(f"✓ Generated {len(tree.tree_paths)} topic paths")
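
As a rough sanity check on the number printed above, the sketch below counts the paths in a fully expanded tree. It assumes every node yields exactly `degree` subtopics, which the real generator may not guarantee (a failed or partial expansion would produce fewer paths). The helper names are hypothetical and not part of DeepFabric.

```python
from itertools import product


def count_leaf_paths(depth: int, degree: int) -> int:
    """Root-to-leaf paths in a full tree with `degree` children per node."""
    return degree ** depth


def enumerate_paths(depth: int, degree: int) -> list[tuple[int, ...]]:
    """Every path as a tuple of child indices, one index per tree level."""
    return list(product(range(degree), repeat=depth))


# Values taken from the "topic_tree" section of the config above.
paths = enumerate_paths(depth=2, degree=3)
print(count_leaf_paths(depth=2, degree=3), len(paths))  # 9 9
print(paths[:3])  # [(0, 0), (0, 1), (0, 2)]
```

With `num_steps * batch_size = 10` requested samples and about 9 paths, each topic path would be used roughly once under this assumption.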

### Generate Agent Tool-Calling Samples

Create the actual training examples with tool usage:

In [None]:
print("Generating agent tool-calling samples...")

engine_params = df_config.get_engine_params()
generator = DataSetGenerator(**engine_params)

dataset = generator.create_data(
    num_steps=df_config.dataset.creation.num_steps,
    batch_size=df_config.dataset.creation.batch_size,
    sys_msg=df_config.dataset.creation.sys_msg,
    topic_model=tree,
)

print(f"✓ Generated {len(dataset)} samples")

### Save Raw Dataset

Save the unformatted dataset for reference:

In [None]:
print("💾 Saving dataset...")

dataset.save("trl_raw.jsonl")
print("✓ Saved raw dataset to trl_raw.jsonl")

### Apply TRL SFT Formatter

Convert the dataset to TRL-compatible format with OpenAI function calling schema:

In [None]:
print("🔧 Applying TRL SFT formatter...")

formatter_configs = df_config.get_formatter_configs()
formatted_datasets = dataset.apply_formatters(formatter_configs)
formatted_dataset = formatted_datasets["trl_sft"]

print(f"✓ Formatted {len(formatted_dataset)} samples for TRL")
print("✓ Saved formatted dataset to trl_formatted.jsonl")

### Inspect Example

Let's look at what the formatted data looks like:

In [None]:
if len(formatted_dataset) > 0:
    print("📋 Example formatted sample:")
    example = formatted_dataset[0]
    
    print(f"  - Messages: {len(example.get('messages', []))} message(s)")
    print(f"  - Tools: {len(example.get('tools', []))} tool(s)")
    
    if "tools" in example and example["tools"]:
        print("\n  First tool schema:")
        tool = example["tools"][0]
        print(f"    - Name: {tool['function']['name']}")
        print(f"    - Description: {tool['function']['description']}")
        print(f"    - Parameters: {list(tool['function']['parameters']['properties'].keys())}")
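
For orientation, here is a hand-written sample in the general shape the inspection code above expects: a `messages` list plus a `tools` list of OpenAI-style function schemas. This is an illustrative sketch, not actual DeepFabric output. The exact field layout (for example, whether `arguments` is a dict or a JSON string) is an assumption and may differ in your generated file. `check_sample` is a hypothetical helper.

```python
# Hypothetical sample written by hand for illustration (not DeepFabric output).
sample = {
    "messages": [
        {"role": "system", "content": "You are a helpful AI assistant with access to various tools."},
        {"role": "user", "content": "What's the weather in Paris?"},
        {
            "role": "assistant",
            "content": "",
            "tool_calls": [
                {
                    "type": "function",
                    "function": {"name": "get_weather", "arguments": {"location": "Paris"}},
                }
            ],
        },
        {"role": "tool", "name": "get_weather", "content": "18°C, partly cloudy"},
        {"role": "assistant", "content": "It is 18°C and partly cloudy in Paris."},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {"location": {"type": "string", "description": "City name"}},
                    "required": ["location"],
                },
            },
        }
    ],
}


def check_sample(sample: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic shape looks consistent."""
    problems = []
    if not sample.get("messages"):
        problems.append("no messages")
    # Names of all tools declared in the sample's schema list.
    declared = {tool["function"]["name"] for tool in sample.get("tools", [])}
    for message in sample.get("messages", []):
        for call in message.get("tool_calls") or []:
            name = call["function"]["name"]
            if name not in declared:
                problems.append(f"call to undeclared tool: {name}")
    return problems


print(check_sample(sample))  # []
```

A check like this can be run over every line of `trl_formatted.jsonl` before training to catch tool calls that reference a tool missing from the schema list.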

## Step 2: Load Dataset for Training

Convert the formatted dataset to HuggingFace Dataset format:

In [None]:
from datasets import Dataset as HFDataset

print("Loading dataset for training...")

# Load samples from file, skipping any blank lines
samples = []
with open("trl_formatted.jsonl", encoding="utf-8") as f:
    for line in f:
        if line.strip():
            samples.append(json.loads(line))

# Convert to HuggingFace Dataset
hf_dataset = HFDataset.from_list(samples)

print(f"✓ Loaded {len(hf_dataset)} samples")
print(f"  - Features: {list(hf_dataset.features.keys())}")

# Verify format
first_sample = hf_dataset[0]
print("\n✓ Sample validation:")
print(f"  - Has 'messages': {'messages' in first_sample}")
print(f"  - Has 'tools': {'tools' in first_sample}")
if "tools" in first_sample:
    print(f"  - Number of tools: {len(first_sample['tools'])}")

## Step 3: Setup TRL SFTTrainer

Configure the model, tokenizer, and training arguments:

In [None]:
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

print("Setting up training components...")

### Select Model

Choose a model appropriate for your use case:
- **For testing**: `Qwen/Qwen2.5-0.5B-Instruct` (small, fast)
- **For production**: `Qwen/Qwen2.5-7B-Instruct` or `meta-llama/Llama-3.1-8B-Instruct`

In [None]:
model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # Small model for testing

print(f"Model: {model_name}")
print("\nLoading tokenizer and model...")

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    trust_remote_code=True,
)

print("✓ Model loaded")

### Configure LoRA

LoRA (Low-Rank Adaptation) makes fine-tuning cheaper by freezing the base weights and training only small low-rank adapter matrices added to the selected layers:

In [None]:
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

print("LoRA Configuration:")
print(f"  - r: {peft_config.r}")
print(f"  - alpha: {peft_config.lora_alpha}")
print(f"  - target modules: {peft_config.target_modules}")
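
To give a feel for what `r=16` and `lora_alpha=32` imply, the sketch below counts the parameters LoRA adds to a single linear layer. The layer dimensions are hypothetical round numbers chosen for illustration, not the actual sizes of any Qwen projection.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    LoRA learns two matrices: A with shape (r, d_in) and B with shape (d_out, r).
    """
    return r * d_in + d_out * r


# Hypothetical layer size; r and alpha match the LoraConfig above.
d_in, d_out, r, alpha = 1024, 1024, 16, 32

full = d_in * d_out                       # weights in the frozen base layer
added = lora_param_count(d_in, d_out, r)  # weights in the trainable adapter
scaling = alpha / r                       # standard LoRA scales the update by alpha / r

print(full, added, scaling)  # 1048576 32768 2.0
```

For this hypothetical layer the adapter holds about 3% as many weights as the layer it adapts, and the learned update is multiplied by 2 before being added to the frozen output.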

### Training Configuration

Set up training hyperparameters:

In [None]:
training_args = SFTConfig(
    output_dir="./trl_tool_calling_model",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    logging_steps=10,
    save_steps=100,
    warmup_steps=50,
    max_seq_length=2048,  # Newer TRL releases rename this to `max_length`
    # Tool calling specific
    dataset_text_field=None,  # Not needed: SFTTrainer applies the chat template to "messages"
    packing=False,
)

print("Training Configuration:")
print(f"  - Epochs: {training_args.num_train_epochs}")
print(f"  - Batch size: {training_args.per_device_train_batch_size}")
print(f"  - Learning rate: {training_args.learning_rate}")
print(f"  - Max sequence length: {training_args.max_seq_length}")

### Initialize Trainer

Create the SFTTrainer with all components:

In [None]:
print("Initializing SFTTrainer...")

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=hf_dataset,
    tokenizer=tokenizer,  # Newer TRL releases expect processing_class=tokenizer instead
    peft_config=peft_config,
)

print("✓ Trainer initialized")

## Step 4: Train the Model

⚠️ **Note**: This is a demonstration with a small dataset. For production:
- Use a larger dataset (1000+ samples)
- Use a larger model (7B+ parameters)
- Train for more epochs with validation
- Monitor metrics and adjust hyperparameters
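
To see why the demo settings need adjusting, it helps to estimate how many optimizer steps the run will take. The sketch below is an approximation that assumes a single device and rounds partial batches up. The exact count depends on the Trainer version, and `estimate_optimizer_steps` is a hypothetical helper.

```python
import math


def estimate_optimizer_steps(
    num_samples: int,
    per_device_batch_size: int,
    grad_accum_steps: int,
    num_epochs: int,
    num_devices: int = 1,
) -> tuple[int, int]:
    """Return (effective batch size, approximate total optimizer steps)."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = max(1, math.ceil(num_samples / effective_batch))
    return effective_batch, steps_per_epoch * num_epochs


# Demo settings from this notebook: 10 samples, batch size 2, accumulation 4, 3 epochs.
print(estimate_optimizer_steps(10, 2, 4, 3))    # (8, 6)
# A 1000-sample dataset with the same settings.
print(estimate_optimizer_steps(1000, 2, 4, 3))  # (8, 375)
```

Under these assumptions the demo run finishes in roughly 6 optimizer steps, far fewer than `warmup_steps=50` and `logging_steps=10`. The learning rate would still be warming up when training ends and no intermediate loss would be logged, so both values should be scaled to the dataset size.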

Set `RUN_TRAINING=true` in the environment to actually run training:

In [None]:
RUN_TRAINING = os.environ.get("RUN_TRAINING", "false").lower() == "true"

if RUN_TRAINING:
    print("🏋️  Starting training...")

    try:
        trainer.train()
        print("\n✓ Training completed!")

        # Save the model
        trainer.save_model("./trl_tool_calling_model/final")
        print("✓ Model saved to ./trl_tool_calling_model/final")

    except Exception as e:
        print(f"\n❌ Training failed: {e}")
        print("This is expected in a demo environment without GPU/proper setup")
else:
    print("Skipping training (set RUN_TRAINING=true to train)")
    print("Training requires: GPU, sufficient memory, and time")

## Step 5: Using the Trained Model

After training, you can use the model for inference:

In [None]:
# Example code: remove the surrounding triple quotes to run it after training
"""
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the trained model
base_model = "Qwen/Qwen2.5-0.5B-Instruct"
adapter_path = "./trl_tool_calling_model/final"

tokenizer = AutoTokenizer.from_pretrained(base_model)
base = AutoModelForCausalLM.from_pretrained(base_model)
model = PeftModel.from_pretrained(base, adapter_path)

# Example inference with tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }
]

messages = [
    {"role": "system", "content": "You are a helpful assistant with tools."},
    {"role": "user", "content": "What's the weather in Paris?"}
]

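# Format with chat template (add_generation_prompt opens the assistant turn) and generate
input_text = tokenizer.apply_chat_template(
    messages, tools=tools, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)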

print(response)
"""
print("📖 See the cell above for inference example code")
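
The generated text still has to be turned into structured tool calls before anything can be executed. The sketch below assumes the model wraps each call in `<tool_call>...</tool_call>` tags containing a JSON object, which is a common convention but depends on the model's chat template. Check a real generation before relying on it. `extract_tool_calls` is a hypothetical helper.

```python
import json
import re

# Assumed output convention: one JSON object per <tool_call>...</tool_call> block.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Parse every well-formed tool call in the text, skipping malformed JSON."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # Ignore blocks the model did not format as valid JSON
    return calls


# Hand-written stand-in for a model response.
generated = (
    "Let me check.\n"
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"location": "Paris"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(generated))
# [{'name': 'get_weather', 'arguments': {'location': 'Paris'}}]
```

Skipping malformed blocks keeps the parser from crashing on a bad generation, and counting how often that happens is a simple quality metric for the fine-tuned model.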

## Summary

You've completed the full workflow:

1. ✅ Generated a tool-calling dataset with DeepFabric
2. ✅ Formatted it for TRL SFTTrainer
3. ✅ Configured training with LoRA
4. ⏸️ (Optional) Trained the model
5. 📖 Learned how to use the trained model

### Next Steps

1. Review the generated datasets (`trl_raw.jsonl`, `trl_formatted.jsonl`)
2. Adjust configuration for your use case
3. Scale up dataset size and model size
4. Add evaluation and validation splits
5. Monitor training metrics and adjust hyperparameters
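
As a starting point for the validation split in step 4, the sketch below shuffles the loaded samples with a fixed seed and holds out a fraction for evaluation. It uses only the standard library, and `split_samples` is a hypothetical helper. If the data is already in a HuggingFace `Dataset`, its `train_test_split` method does the same job.

```python
import random


def split_samples(samples: list, eval_fraction: float = 0.1, seed: int = 42) -> tuple[list, list]:
    """Shuffle deterministically and return (train, eval) lists."""
    shuffled = list(samples)  # Copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))  # Hold out at least one sample
    return shuffled[n_eval:], shuffled[:n_eval]


# Stand-in records; in the notebook this would be the `samples` list loaded in Step 2.
records = [{"id": i} for i in range(10)]
train_split, eval_split = split_samples(records, eval_fraction=0.2)
print(len(train_split), len(eval_split))  # 8 2
```

The fixed seed keeps the split reproducible across runs, so evaluation numbers stay comparable while hyperparameters change.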

### Files Created

- `trl_raw.jsonl` - Raw DeepFabric dataset
- `trl_formatted.jsonl` - TRL-formatted dataset with OpenAI schema
- `./trl_tool_calling_model/` - Training checkpoints (if training was run)
- `./trl_tool_calling_model/final/` - Final trained model (if training was run)