<div align="center">
<a href="https://rapidfire.ai/"><img src="https://raw.githubusercontent.com/RapidFireAI/rapidfireai/main/docs/images/RapidFire - Blue bug -white text.svg" width="115"></a>
<a href="https://discord.gg/6vSTtncKNN"><img src="https://raw.githubusercontent.com/RapidFireAI/rapidfireai/main/docs/images/discord-button.svg" width="145"></a>
<a href="https://oss-docs.rapidfire.ai/"><img src="https://raw.githubusercontent.com/RapidFireAI/rapidfireai/main/docs/images/documentation-button.svg" width="125"></a>
<br/>
Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/RapidFireAI/rapidfireai">GitHub</a></i> ⭐
<br/>
To install RapidFire AI on your own machine, see the <a href="https://oss-docs.rapidfire.ai/en/latest/walkthrough.html">Install and Get Started</a> guide in our docs.
</div>

# RapidFire AI Tutorial: SFT with Trackio Experiment Tracking

This tutorial demonstrates how to use **[Trackio](https://github.com/gradio-app/trackio)** as the experiment tracking backend for RapidFire AI. Trackio is a lightweight, local-first experiment tracking library that provides:

- **No server required**: Runs locally by default, with no tracking server to set up or maintain
- **Simple API**: Just `trackio.init()`, `trackio.log()`, and `trackio.finish()`
- **Beautiful dashboard**: View metrics with the `trackio show` command
- **Gradio integration**: Built by the Gradio team for seamless ML workflow integration

We'll fine-tune a model on customer support data using Supervised Fine-Tuning (SFT) while tracking all metrics with Trackio.

### Configure Trackio as the Tracking Backend

RapidFire AI supports multiple tracking backends: MLflow, TensorBoard, and Trackio. Here we configure Trackio as the **standalone** tracking backend by setting environment variables **before** importing RapidFire components.

In [None]:
import os

# Enable Trackio as the tracking backend
os.environ["RF_TRACKIO_ENABLED"] = "true"

# Disable other tracking backends for standalone Trackio usage
os.environ["RF_MLFLOW_ENABLED"] = "false"
os.environ["RF_TENSORBOARD_ENABLED"] = "false"

print("✅ Trackio configured as standalone tracking backend")
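To double-check which tracking flags are set before importing RapidFire, you can read them back with a small helper. This is a hypothetical sketch. The variable names come from the cell above, but the rule for what counts as "enabled" (a case-insensitive `true`, `1`, or `yes`) is an assumption, not RapidFire AI's documented parsing logic.

```python
import os

# Hypothetical helper: the variable names match the cell above, but treating
# "true", "1", or "yes" (case-insensitive) as enabled is an assumption,
# not RapidFire AI's documented parsing rule.
TRACKING_FLAGS = {
    "mlflow": "RF_MLFLOW_ENABLED",
    "tensorboard": "RF_TENSORBOARD_ENABLED",
    "trackio": "RF_TRACKIO_ENABLED",
}


def active_tracking_backends(env=None):
    """Return the sorted names of backends whose flag holds a truthy string."""
    env = os.environ if env is None else env
    truthy = {"true", "1", "yes"}
    return sorted(
        name
        for name, var in TRACKING_FLAGS.items()
        if env.get(var, "").strip().lower() in truthy
    )


# Example with an explicit mapping instead of the real environment
print(active_tracking_backends({"RF_TRACKIO_ENABLED": "true", "RF_MLFLOW_ENABLED": "false"}))
# ['trackio']
```

Calling `active_tracking_backends()` with no argument reads `os.environ` instead. Under the parsing rule assumed here, it returns `['trackio']` after the cell above has run.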

### Import RapidFire Components

In [None]:
from rapidfireai import Experiment
from rapidfireai.automl import List, RFGridSearch, RFModelConfig, RFLoraConfig, RFSFTConfig

### Understanding Trackio: What Gets Tracked

[Trackio](https://github.com/gradio-app/trackio) is a free, open-source experiment tracking library from Hugging Face. When enabled in RapidFire AI, it automatically captures:

**Training Metrics (logged every `logging_steps`):**
- `loss` - Training loss at each logging step
- `learning_rate` - Current learning rate (useful for seeing scheduler effects)
- `epoch` and `step` - Progress indicators

**Evaluation Metrics (logged every `eval_steps`):**
- `eval_loss` - Validation loss
- Custom metrics from your `compute_metrics` function (e.g., `rougeL`, `bleu`)

**Run Configuration:**
- All hyperparameters (learning rate, batch size, LoRA settings, etc.)
- Model name and training arguments
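To estimate how many points each curve will have, you can work out the logging cadence from the training arguments used later in this notebook (`max_steps=128`, `logging_steps=2`, `eval_steps=4`). The sketch below assumes the Hugging Face Trainer convention of logging whenever the global step is a multiple of the interval. RapidFire's chunk-based scheduling may shift the exact step numbers, so treat the counts as approximate.

```python
def logging_schedule(max_steps, logging_steps, eval_steps):
    """Steps at which train and eval metrics would be logged, assuming logging
    happens whenever the global step is a multiple of the interval."""
    steps = range(1, max_steps + 1)
    train_points = [s for s in steps if s % logging_steps == 0]
    eval_points = [s for s in steps if s % eval_steps == 0]
    return train_points, eval_points


train_points, eval_points = logging_schedule(max_steps=128, logging_steps=2, eval_steps=4)
print(len(train_points), len(eval_points))  # 64 32
print(train_points[:3], eval_points[:3])    # [2, 4, 6] [4, 8, 12]
```

Under that assumption, each run's training-loss curve gets about 64 points and its eval curves get about 32.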

**How RapidFire AI integrates with Trackio:**

Under the hood, RapidFire AI wraps Trackio's simple API:
```python
# What RapidFire does automatically:
trackio.init(project="experiment-name", name="run-name", config={...})
trackio.log({"loss": 0.5, "step": 100})  # Called during training
trackio.finish()  # Called when run completes
```

You don't need to call these directly—RapidFire handles it. Just enable Trackio and run your experiments!
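If you ever log extra values to Trackio yourself, pair every `init()` with a `finish()`, even when the code in between fails. The sketch below wraps that lifecycle in a context manager. `tracked_run` and `RecordingTracker` are hypothetical names used for illustration. The stand-in tracker only records calls so the example runs without Trackio installed, and this is not how RapidFire AI implements its integration.

```python
from contextlib import contextmanager


@contextmanager
def tracked_run(tracker, project, name, config):
    """Start a run, hand back the tracker's log function, and always finish the run."""
    tracker.init(project=project, name=name, config=config)
    try:
        yield tracker.log
    finally:
        tracker.finish()  # runs even if the body raises


class RecordingTracker:
    """Hypothetical stand-in exposing init/log/finish; it only records calls."""

    def __init__(self):
        self.calls = []

    def init(self, **kwargs):
        self.calls.append(("init", kwargs))

    def log(self, metrics):
        self.calls.append(("log", metrics))

    def finish(self):
        self.calls.append(("finish", {}))


tracker = RecordingTracker()
with tracked_run(tracker, "exp1-sft-trackio-demo", "run-1", {"learning_rate": 1e-3}) as log:
    for step in (2, 4):
        log({"loss": 1.0 / step, "step": step})

print([name for name, _ in tracker.calls])  # ['init', 'log', 'log', 'finish']
```

With Trackio installed, you could pass the module itself (`import trackio`, then `tracked_run(trackio, ...)`), since it exposes `init`, `log`, and `finish`.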

### Load Dataset and Specify Train and Eval Partitions

In [None]:
from datasets import load_dataset

dataset = load_dataset("bitext/Bitext-customer-support-llm-chatbot-training-dataset")

# Select small, non-overlapping subsets of the dataset for demo purposes
train_dataset = dataset["train"].select(range(128))
eval_dataset = dataset["train"].select(range(128, 152))
train_dataset = train_dataset.shuffle(seed=42)
eval_dataset = eval_dataset.shuffle(seed=42)
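Both partitions are drawn from the dataset's `train` split, so check that the index ranges passed to `select()` do not share rows. Otherwise the eval metrics partly measure memorization of training examples. `shared_indices` is a hypothetical helper for that check.

```python
def shared_indices(train_idx, eval_idx):
    """Return the sorted row indices that appear in both partitions."""
    return sorted(set(train_idx) & set(eval_idx))


# Rows 100-123 fall inside rows 0-127, so these two ranges overlap completely.
print(len(shared_indices(range(128), range(100, 124))))  # 24
# Taking eval rows from after the training rows avoids any overlap.
print(len(shared_indices(range(128), range(128, 152))))  # 0
```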

### Define Data Processing Function

In [None]:
def sample_formatting_function(row):
    """Function to preprocess each example from dataset"""
    # System prompt prepended to every conversation
    SYSTEM_PROMPT = "You are a helpful and friendly customer support assistant. Please answer the user's query to the best of your ability."
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": row["instruction"]},
        ],
        "completion": [
            {"role": "assistant", "content": row["response"]}
        ]
    }
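The formatting function returns a conversational prompt/completion pair: a list of chat messages for the prompt and a list of assistant messages for the completion. A quick structural check can catch formatting mistakes before a long run. This validator is a hypothetical sketch. The rules it enforces (allowed roles, a prompt that ends with a user turn) are assumptions based on the structure used above, not a documented TRL or RapidFire requirement, and the sample row is made up.

```python
def check_prompt_completion(example):
    """Raise ValueError if example is not a conversational prompt/completion pair."""
    allowed_roles = {"system", "user", "assistant"}
    for key in ("prompt", "completion"):
        messages = example.get(key)
        if not isinstance(messages, list) or not messages:
            raise ValueError(f"{key!r} must be a non-empty list of messages")
        for msg in messages:
            if set(msg) != {"role", "content"} or msg["role"] not in allowed_roles:
                raise ValueError(f"malformed message in {key!r}: {msg}")
    if example["prompt"][-1]["role"] != "user":
        raise ValueError("prompt should end with a user turn")
    if any(msg["role"] != "assistant" for msg in example["completion"]):
        raise ValueError("completion should contain only assistant turns")
    return True


# Made-up row with the same column names as the dataset
row = {"instruction": "How do I reset my password?", "response": "Click 'Forgot password' on the login page."}
example = {
    "prompt": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": row["instruction"]},
    ],
    "completion": [{"role": "assistant", "content": row["response"]}],
}
print(check_prompt_completion(example))  # True
```

In the notebook you could run it on the output of `sample_formatting_function(train_dataset[0])`.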

### Initialize Experiment

When Trackio is enabled, RapidFire AI will automatically initialize Trackio with the experiment name and log all training metrics.

In [None]:
# Every experiment instance must be uniquely named
experiment = Experiment(experiment_name="exp1-sft-trackio-demo", mode="fit")
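Each experiment needs a unique name, so re-running this notebook with the same `experiment_name` may conflict with an earlier run. One option is to append a timestamp. `unique_experiment_name` is a hypothetical helper, not part of RapidFire AI.

```python
from datetime import datetime


def unique_experiment_name(base, now=None):
    """Append a timestamp so re-running the notebook yields a fresh, unique name."""
    if now is None:
        now = datetime.now()
    return f"{base}-{now:%Y%m%d-%H%M%S}"


print(unique_experiment_name("exp1-sft-trackio-demo", datetime(2025, 1, 2, 3, 4, 5)))
# exp1-sft-trackio-demo-20250102-030405
```

For example: `experiment = Experiment(experiment_name=unique_experiment_name("exp1-sft-trackio-demo"), mode="fit")`. If the experiment name is also used as the Trackio project name (as in the `trackio.init(project=...)` snippet earlier), pass the timestamped name to `trackio show --project`.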

### Define Custom Eval Metrics Function

In [None]:
def sample_compute_metrics(eval_preds):  
    """Optional function to compute eval metrics based on predictions and labels"""
    predictions, labels = eval_preds

    # Standard text-based eval metrics: Rouge and BLEU
    import evaluate
    rouge = evaluate.load("rouge")
    bleu = evaluate.load("bleu")

    rouge_output = rouge.compute(predictions=predictions, references=labels, use_stemmer=True)
    rouge_l = rouge_output["rougeL"]
    bleu_output = bleu.compute(predictions=predictions, references=labels)
    bleu_score = bleu_output["bleu"]

    return {
        "rougeL": round(rouge_l, 4),
        "bleu": round(bleu_score, 4),
    }
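ROUGE-L, one of the metrics returned above, scores a prediction by the longest common subsequence (LCS) of tokens it shares with the reference. The sketch below computes the ROUGE-L F-measure from scratch to show what the number means. It is a simplified illustration (lowercased whitespace tokens, no stemming, one pair at a time). Its values will not exactly match `evaluate`'s `rouge` metric, which uses its own tokenizer and aggregates over the batch.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend a match diagonally, or carry over the best result so far
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """Simplified ROUGE-L F-measure: harmonic mean of LCS precision and recall."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


print(round(rouge_l_f1("please reset your password", "you can reset your password here"), 4))  # 0.6
```

Here the LCS is "reset your password" (3 tokens), so precision is 3/4, recall is 3/6, and F = 2 · 0.75 · 0.5 / (0.75 + 0.5) = 0.6.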

### Define Multi-Config Knobs for Model, LoRA, and SFT Trainer using RapidFire AI Wrapper APIs

In [None]:
# Two lightweight LoRA PEFT configs with different adapter capacities
peft_configs_lite = List([
    RFLoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.1,
        target_modules=["q_proj", "v_proj"],  # Standard transformer naming
        bias="none"
    ),
    RFLoraConfig(
        r=32,
        lora_alpha=64,
        lora_dropout=0.1,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Standard naming
        bias="none"
    )
])

# Same base model with 2 learning rates x 2 PEFT configs = 4 combinations in total
config_set_lite = List([
    RFModelConfig(
        model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # 1.1B model
        peft_config=peft_configs_lite,
        training_args=RFSFTConfig(
            learning_rate=1e-3,  # Higher LR for very small model
            lr_scheduler_type="linear",
            per_device_train_batch_size=4,
            per_device_eval_batch_size=4,
            max_steps=128,
            gradient_accumulation_steps=1,   # No accumulation needed
            logging_steps=2,
            eval_strategy="steps",
            eval_steps=4,
            fp16=True,
        ),
        model_type="causal_lm",
        model_kwargs={"device_map": "auto", "torch_dtype": "auto", "use_cache": False},
        formatting_func=sample_formatting_function,
        compute_metrics=sample_compute_metrics,
        generation_config={
            "max_new_tokens": 256,
            "temperature": 0.8,  # Higher temp for tiny model
            "top_p": 0.9,
            "top_k": 30,         # Reduced top_k
            "repetition_penalty": 1.05,
        }
    ),
    RFModelConfig(
        model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # 1.1B model
        peft_config=peft_configs_lite,
        training_args=RFSFTConfig(
            learning_rate=1e-4,  # Lower LR for comparison with the 1e-3 config
            lr_scheduler_type="linear",
            per_device_train_batch_size=4,  # Same batch size as the first config
            per_device_eval_batch_size=4,
            max_steps=128,
            gradient_accumulation_steps=1,   # No accumulation needed
            logging_steps=2,
            eval_strategy="steps",
            eval_steps=4,
            fp16=True,
        ),
        model_type="causal_lm",
        model_kwargs={"device_map": "auto", "torch_dtype": "auto", "use_cache": False},
        formatting_func=sample_formatting_function,
        compute_metrics=sample_compute_metrics,
        generation_config={
            "max_new_tokens": 256,
            "temperature": 0.8,  # Higher temp for tiny model
            "top_p": 0.9,
            "top_k": 30,         # Reduced top_k
            "repetition_penalty": 1.05,
        }
    )
])
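The two LoRA configs differ in capacity because the number of trainable parameters grows with both the rank `r` and the set of `target_modules`. Adapting a `d_in × d_out` projection adds an `r × d_in` matrix A and a `d_out × r` matrix B. The sketch below counts them for TinyLlama-1.1B. It assumes dimensions from the model's published config: hidden size 2048, 22 layers, and 4 key/value heads of size 64, so `k_proj` and `v_proj` output 256. With `bias="none"` no bias terms are added, and `lora_alpha` only scales the update, so neither changes the count.

```python
# Projection shapes (d_in, d_out) assumed from TinyLlama-1.1B's published config:
# hidden_size=2048, 32 query heads, 4 key/value heads (grouped-query attention),
# head_dim=64, so k_proj and v_proj map 2048 -> 256.
HIDDEN, KV_DIM, NUM_LAYERS = 2048, 256, 22
PROJ_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_param_count(r, target_modules, num_layers=NUM_LAYERS):
    """Trainable LoRA params: each adapted d_in x d_out matrix gets A (r x d_in) and B (d_out x r)."""
    per_layer = sum(r * (PROJ_SHAPES[m][0] + PROJ_SHAPES[m][1]) for m in target_modules)
    return per_layer * num_layers


small = lora_param_count(8, ["q_proj", "v_proj"])
large = lora_param_count(32, ["q_proj", "k_proj", "v_proj", "o_proj"])
print(small, large)  # 1126400 9011200
```

Under these assumptions, the `r=32` config trains 8 times as many adapter parameters as the `r=8` config (about 9.0M vs 1.1M), still a small fraction of the 1.1B base weights.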

#### Define Model Creation Function for All Model Types Across Configs

In [None]:

def sample_create_model(model_config):
    """Function to create model object for any given config; must return tuple of (model, tokenizer)"""
    from transformers import AutoModelForCausalLM, AutoTokenizer, AutoModelForSeq2SeqLM, AutoModelForMaskedLM

    model_name = model_config["model_name"]
    model_type = model_config["model_type"]
    model_kwargs = model_config["model_kwargs"]

    if model_type == "causal_lm":
        model = AutoModelForCausalLM.from_pretrained(model_name, **model_kwargs)
    elif model_type == "seq2seq_lm":
        model = AutoModelForSeq2SeqLM.from_pretrained(model_name, **model_kwargs)
    elif model_type == "masked_lm":
        model = AutoModelForMaskedLM.from_pretrained(model_name, **model_kwargs)
    elif model_type == "custom":
        # Replace this with your own loading logic, e.g., loading your own checkpoints.
        # Raising here avoids an UnboundLocalError on `model` further down.
        raise NotImplementedError("Add custom model loading logic for model_type='custom'")
    else:
        # Default to causal LM
        model = AutoModelForCausalLM.from_pretrained(model_name, **model_kwargs)

    tokenizer = AutoTokenizer.from_pretrained(model_name)

    return (model, tokenizer)
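An alternative to the `if`/`elif` chain is a lookup table from `model_type` to a loader. This makes the routing easy to test with stub loaders, and it raises on unknown types instead of silently falling back to a causal LM. `make_create_model` is a hypothetical helper. Whether RapidFire AI accepts a closure like this as its model-creation function (for example, if it needs to serialize the function) is an assumption you should verify. The plain module-level function above is the safer default.

```python
def make_create_model(loaders, load_tokenizer):
    """Build a create-model function from a {model_type: loader} table.

    Each loader is called as loader(model_name, **model_kwargs);
    load_tokenizer is called as load_tokenizer(model_name).
    """
    def create_model(model_config):
        model_type = model_config["model_type"]
        if model_type not in loaders:
            raise ValueError(f"unsupported model_type {model_type!r}; expected one of {sorted(loaders)}")
        name = model_config["model_name"]
        model = loaders[model_type](name, **model_config.get("model_kwargs", {}))
        return model, load_tokenizer(name)

    return create_model


# Stub loaders so the routing can be checked without downloading any weights
create = make_create_model(
    {"causal_lm": lambda name, **kw: ("causal", name, kw)},
    lambda name: ("tokenizer", name),
)
print(create({"model_name": "tiny", "model_type": "causal_lm", "model_kwargs": {"use_cache": False}}))
# (('causal', 'tiny', {'use_cache': False}), ('tokenizer', 'tiny'))
```

With Transformers, the table could map `"causal_lm"` to `AutoModelForCausalLM.from_pretrained` and `"seq2seq_lm"` to `AutoModelForSeq2SeqLM.from_pretrained`, and so on, with `AutoTokenizer.from_pretrained` as the tokenizer loader.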

#### Generate Config Group

In [None]:
# Simple grid search across all sets of config knob values = 4 combinations in total
config_group = RFGridSearch(
    configs=config_set_lite,
    trainer_type="SFT"
)
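A grid search expands every combination of knob values. Here the two training configs differ only in learning rate, and each is paired with both LoRA configs, giving 2 × 2 = 4 runs. The `itertools.product` sketch below reproduces that expansion for illustration. It is not RapidFire's `RFGridSearch` implementation, and RapidFire may number its runs in a different order.

```python
from itertools import product

# Knob values mirrored from config_set_lite / peft_configs_lite above
learning_rates = [1e-3, 1e-4]
lora_settings = [
    {"r": 8, "target_modules": ["q_proj", "v_proj"]},
    {"r": 32, "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"]},
]

# A grid search is the Cartesian product of the knob lists
grid = [{"learning_rate": lr, **lora} for lr, lora in product(learning_rates, lora_settings)]

for i, cfg in enumerate(grid, start=1):
    print(f"config {i}: lr={cfg['learning_rate']:g}, r={cfg['r']}")
# config 1: lr=0.001, r=8
# config 2: lr=0.001, r=32
# config 3: lr=0.0001, r=8
# config 4: lr=0.0001, r=32
```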

### Run Multi-Config Training

All training metrics will be automatically logged to Trackio. You can view them in real-time using the Trackio dashboard (see next section).

In [None]:
# Launch training of all configs in the config_group with swap granularity of 4 chunks
experiment.run_fit(config_group, sample_create_model, train_dataset, eval_dataset, num_chunks=4, seed=42)
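`num_chunks=4` controls how often RapidFire AI can swap configs. Roughly, the training data is split into chunks and the configs take turns training on them. The sketch below works through the arithmetic for this notebook (128 training examples, 4 chunks, `per_device_train_batch_size=4` on one device) and prints a simple round-robin order. Equal-sized chunks and a strict round-robin order are simplifying assumptions. RapidFire's actual scheduler may size or order the work differently.

```python
import math


def chunk_sizes(num_examples, num_chunks):
    """Split num_examples into num_chunks near-equal contiguous chunks."""
    base, extra = divmod(num_examples, num_chunks)
    return [base + (1 if i < extra else 0) for i in range(num_chunks)]


def round_robin_schedule(config_ids, num_chunks):
    """(config, chunk) pairs where every config finishes chunk k before any config starts chunk k + 1."""
    return [(cfg, chunk) for chunk in range(num_chunks) for cfg in config_ids]


sizes = chunk_sizes(num_examples=128, num_chunks=4)
steps_per_chunk = [math.ceil(size / 4) for size in sizes]  # batch size 4, no gradient accumulation
print(sizes)            # [32, 32, 32, 32]
print(steps_per_chunk)  # [8, 8, 8, 8]
print(round_robin_schedule(["run1", "run2", "run3", "run4"], num_chunks=4)[:5])
# [('run1', 0), ('run2', 0), ('run3', 0), ('run4', 0), ('run1', 1)]
```

Smaller chunks mean more frequent swaps. That gives you earlier side-by-side comparisons in Trackio, at the cost of more swapping overhead.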

### View Metrics with Trackio Dashboard

Trackio provides a beautiful Gradio-powered dashboard to visualize your experiment metrics. You can launch it in two ways:

**Option 1: From a terminal** (recommended for real-time viewing during training)
```bash
trackio show
```

**Option 2: From Python** (opens the dashboard in your browser)
```python
import trackio
trackio.show()
```

You can also load a specific project directly:
```bash
trackio show --project "exp1-sft-trackio-demo"
```

**What You'll See in the Dashboard:**

- **Loss Curves**: Each of your 4 parallel runs will show training loss over steps
- **Evaluation Metrics**: ROUGE-L and BLEU scores from your `compute_metrics` function
- **Run Comparison**: Select/deselect runs, zoom into step ranges, apply smoothing
- **Hyperparameters**: View all logged configuration for each run

**Pro Tips:**
- Open the dashboard in a separate browser tab while training runs
- Metrics update in real-time as training progresses
- Use the project filter if you have multiple experiments
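Dashboards like Trackio's offer smoothing to make noisy loss curves easier to compare. A common approach is an exponential moving average (EMA), where each smoothed point blends the previous smoothed value with the new raw value. This sketch shows the idea only. It omits the bias correction some tools apply, and it is not necessarily the exact algorithm Trackio uses.

```python
def ema_smooth(values, weight=0.6):
    """Exponential moving average: each output mixes the previous smoothed value
    (with the given weight) and the current raw value (with 1 - weight)."""
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1)")
    smoothed, last = [], None
    for value in values:
        last = value if last is None else weight * last + (1 - weight) * value
        smoothed.append(last)
    return smoothed


print(ema_smooth([1.0, 0.0, 1.0], weight=0.5))  # [1.0, 0.5, 0.75]
```

Higher weights give smoother curves that react more slowly, which can hide short spikes in a run's loss.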

In [None]:
# Uncomment the following lines to launch the Trackio dashboard
# import trackio
# trackio.show()

### Combining Trackio Insights with RapidFire's Interactive Control (IC Ops)

One of RapidFire AI's unique features is **Interactive Control Operations (IC Ops)**, which lets you control experiments in real-time. Combined with Trackio's observability, you get a powerful workflow:

**The Workflow:**

1. **Monitor in Trackio**: Watch your parallel runs in the Trackio dashboard as they train
2. **Identify patterns**: Spot runs with poor loss curves, diverging metrics, or slow convergence
3. **Take action with IC Ops**: 
   - **Stop** underperforming runs to free up GPU resources
   - **Clone** promising configurations and modify parameters
   - **Resume** stopped runs if you want to continue training

**Example Scenario:**

Suppose you're running the 4 configurations above in parallel and, in Trackio, you notice:
- Run 1 (lr=1e-3, r=8): Loss dropping quickly, good convergence
- Run 2 (lr=1e-3, r=32): Similar to Run 1, slightly better eval metrics
- Run 3 (lr=1e-4, r=8): Loss dropping slowly, may need more steps
- Run 4 (lr=1e-4, r=32): Loss barely moving; the learning rate is likely too low

**Actions you might take:**
- Stop Run 4 (not converging) to free GPU memory
- Clone Run 2 with a different scheduler to see if you can improve further
- Let Runs 1-3 continue to completion
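You can make "barely moving" more concrete by comparing early and late losses. The helper below flags runs whose average loss over the last few logged points has dropped by less than a chosen fraction of the average over the first few. The function names, window size, and 5% threshold are hypothetical choices, and the loss values are made-up toy numbers. Acting on the result, for example stopping a run, is still done through RapidFire's IC Ops, not by this code.

```python
def relative_improvement(losses, window):
    """Fractional drop from the mean of the first `window` losses to the mean of the last `window`."""
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    start = sum(losses[:window]) / window
    end = sum(losses[-window:]) / window
    return 0.0 if start == 0 else (start - end) / start


def stalled_runs(histories, window=3, min_improvement=0.05):
    """Names of runs whose loss improved by less than min_improvement (as a fraction)."""
    return sorted(
        run for run, losses in histories.items()
        if relative_improvement(losses, window) < min_improvement
    )


# Toy loss histories, made up for illustration
toy = {
    "run1": [2.0, 1.8, 1.5, 1.2, 1.0, 0.9],
    "run4": [2.0, 2.0, 1.99, 1.98, 1.98, 1.97],
}
print(stalled_runs(toy))  # ['run4']
```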

This turns experiment tracking from passive observation into **active experiment management**.

### End Current Experiment

In [None]:
experiment.end()

### Summary

In this tutorial, you learned how to:

1. **Configure Trackio** as the standalone tracking backend for RapidFire AI
2. **Understand what gets tracked** - training metrics, eval metrics, and hyperparameters
3. **Run multi-config training** with automatic metric logging to Trackio
4. **View and compare experiments** using the Trackio dashboard
5. **Combine Trackio insights with IC Ops** for active experiment management

**Why Trackio + RapidFire AI?**
- **Free and open source**: No usage limits, no vendor lock-in
- **Local-first**: No server setup, metrics stored locally and persist between sessions
- **Real-time visibility**: Compare parallel runs as they train
- **Actionable insights**: Use Trackio data to guide IC Ops decisions

**Learn More:**
- [Trackio GitHub Repository](https://github.com/gradio-app/trackio) - Full documentation and examples
- [Trackio Documentation](https://huggingface.co/docs/trackio/index) - API reference
- [RapidFire AI Documentation](https://oss-docs.rapidfire.ai/) - Getting started guide
- [RapidFire AI + Trackio Announcement](https://github.com/RapidFireAI/rapidfireai/blob/main/tutorial_notebooks/fine-tuning/Co-Announcement%20Blog%20Trackio%20and%20RapidFire%20AI.md) - Co-authored blog post with Hugging Face

<div align="center">
<a href="https://rapidfire.ai/"><img src="https://raw.githubusercontent.com/RapidFireAI/rapidfireai/main/docs/images/RapidFire - Blue bug -white text.svg" width="115"></a>
<a href="https://discord.gg/6vSTtncKNN"><img src="https://raw.githubusercontent.com/RapidFireAI/rapidfireai/main/docs/images/discord-button.svg" width="145"></a>
<a href="https://oss-docs.rapidfire.ai/"><img src="https://raw.githubusercontent.com/RapidFireAI/rapidfireai/main/docs/images/documentation-button.svg" width="125"></a>
<br/>
Thanks for trying RapidFire AI! ⭐ <i>Star us on <a href="https://github.com/RapidFireAI/rapidfireai">GitHub</a></i> ⭐
</div>