<h1 align='center'>Synthetic Data Generation and Unsloth Tutorial</h1>

## 📚 Table of Contents:

- [Synthetic Data Kit: Data Generation](#synthetic-data-generation)
- [Unsloth: Fine-Tuning and saving the model](#fine-tuning)

## Synthetic Data Generation

In this section, we use the CLI from synthetic-data-kit to generate datasets

### Testing Synthetic Data Kit Command

Please make sure you are running vllm by opening a terminal and typing `vllm serve Unsloth/Llama-3.3-70B-Instruct   --port 8001   --max-model-len 48000   --gpu-memory-utilization 0.85`

In [9]:
!synthetic-data-kit --help

Loading config from: /usr/local/lib/python3.12/dist-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: /usr/local/lib/python3.12/dist-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
[1m                                                                                [0m
[1m [0m[1;33mUsage: [0m[1msynthetic-data-kit [OPTIONS] COMMAND [ARGS]...[0m[1m                         [0m[1m [0m
[1m                                                                                [0m
 A toolkit for preparing synthetic datasets for fine-tuning LLMs                
                                                                                
[2m╭─[0m[2m Options [0m[2m───────────────────────────────────────────────────────────────────[0m[2m─╮[0m
[2m│[0m [1;36m-[0m[1;36m-config[0m              [1;32m-c[0m      [1;33mPATH[0m  Path to configuration file               [2m│[0m
[2m│[0

### Exploring Synthetic Data Kit CLI

This command displays the help menu for the `synthetic-data-kit` CLI tool, showing available commands:
- **system-check**: Verify LLM provider server is running
- **ingest**: Parse documents (PDF, HTML, YouTube, etc.) into clean text
- **create**: Generate synthetic content (Q&A pairs, instructions, etc.) using LLM
- **curate**: Filter and clean generated content based on quality scores
- **save-as**: Convert data to different formats (fine-tuning format, JSON, etc.)
- **server**: Launch web interface for the toolkit

In [2]:
!synthetic-data-kit -c tutorial_config_team.yaml system-check

Loading config from: /usr/local/lib/python3.12/dist-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: /usr/local/lib/python3.12/dist-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: tutorial_config_team.yaml
Config has LLM provider set to: vllm
[1;34mEnvironment variable check:[0m
API_ENDPOINT_KEY: Not found
get_llm_provider returning: vllm
[?25l[32m vLLM server is running at [0m[4;94mhttp://localhost:8001/v1[0m
[2KAvailable models: [1m{[0m[32m'object'[0m: [32m'list'[0m, [32m'data'[0m: [1m[[0m[1m{[0m[32m'id'[0m: 
[32m'Unsloth/Llama-3.3-70B-Instruct'[0m, [32m'object'[0m: [32m'model'[0m, [32m'created'[0m: [1;36m1761720894[0m, 
[32m'owned_by'[0m: [32m'vllm'[0m, [32m'root'[0m: [32m'Unsloth/Llama-3.3-70B-Instruct'[0m, [32m'parent'[0m: [3;35mNone[0m, 
[32m'max_model_len'[0m: [1;36m48000[0m, [32m'permission'[0m: [1m[[0m[1m{[0m[

### Verifying LLM Server Status

This command checks if the vLLM server is running and accessible at `http://localhost:8001/v1`. It displays:
- Server status and endpoint
- Available models (here: Unsloth/Llama-3.3-70B-Instruct)
- Model configuration (max context length: 48000 tokens)

The system is configured to use the vLLM provider as specified in `tutorial_config.yaml`.

In [7]:
mkdir -p logical_reasoning/{sources,data/{input,parsed,generated,curated,final}}

### Creating Project Directory Structure

This command creates a well-organized directory structure for the logical reasoning project:
- `sources/`: Store original source documents (PDFs, etc.)
- `data/input/`: Input files for processing
- `data/parsed/`: Parsed text files after document ingestion
- `data/generated/`: Generated synthetic Q&A pairs
- `data/curated/`: Quality-filtered data after curation
- `data/final/`: Final formatted data ready for fine-tuning

In [10]:
pwd

'/workspace/AIAC'

### Navigating to Project Directory

Changes the current working directory to `logical_reasoning/` where all subsequent operations will take place.

In [None]:
!wget -P sources/ -q --show-progress   "https://www.csus.edu/indiv/d/dowdenb/4/logical-reasoning-archives/logical-reasoning-2017-12-02.pdf"   "https://people.cs.umass.edu/~pthomas/solutions/Liar_Truth.pdf"

### Downloading Source Documents

Downloads two PDF documents related to logical reasoning and liar/truth puzzles:
1. "Logical Reasoning" textbook from CSU Sacramento
2. "Liar and Truth Teller Puzzles" from UMass

These documents will serve as the knowledge base for generating synthetic training data. The `-q` flag runs wget in quiet mode, and `--show-progress` displays a progress bar.

In [6]:
!cp sources/* data/input/

cp: cannot stat 'sources/*': No such file or directory


### Copying Source Files to Input Directory

Copies all downloaded source documents from `sources/` to `data/input/` to prepare them for the ingestion pipeline.

In [6]:
!synthetic-data-kit ingest ./data/input/

Loading config from: /usr/local/lib/python3.12/dist-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: /usr/local/lib/python3.12/dist-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: /usr/local/lib/python3.12/dist-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
[34mProcessing directory: [0m[1;34m.[0m[1;35m/data/input/[0m
[34mFound [0m[1;36m2[0m[34m supported files to process[0m
[32m✓ Data.pdf[0m
[90m[[0m2025-10-29T07:01:39Z [33mWARN [0m lance::dataset::write::insert[90m][0m No existing dataset at data/parsed/filtered_riddlebench.lance, it will be created
[32m✓ filtered_riddlebench.pdf[0m

[1;34mProcessing Summary:[0m
Total files: [1;36m2[0m
[32mSuccessful: [0m[1;36m2[0m
[32mFailed: [0m[1;36m0[0m
[32m✅ All files processed successfully![0m


### Ingesting and Parsing Documents

This command processes the PDF files in `data/input/` using the synthetic-data-kit's **ingest** command:
- Extracts text content from PDFs
- Cleans and normalizes the text
- Saves parsed text files to `data/parsed/`

The output shows successful processing of 2 PDF files (Liar_Truth.pdf and logical-reasoning-2017-12-02.pdf).

Note: This will take about 10 minutes, set `--verbose` flag to see progress or reduce the `num-pairs` for a faster test

In [8]:
!synthetic-data-kit -c tutorial_config_team.yaml create ./data/parsed/ --type cot --num-pairs 500

Loading config from: /usr/local/lib/python3.12/dist-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: /usr/local/lib/python3.12/dist-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: tutorial_config_team.yaml
Config has LLM provider set to: vllm
get_llm_provider returning: vllm
[32m🔗 Using vllm provider[0m
[34mProcessing directory: [0m[1;34m.[0m[1;35m/data/parsed/[0m[34m for cot generation[0m
[34mFound [0m[1;36m2[0m[34m cot files to process[0m
Loading config from: tutorial_config_team.yaml
Config has LLM provider set to: vllm
L Using vllm provider
Processing 58 chunks to generate CoT examples...
Batch processing complete.                                                      
Generated 299 CoT examples total (requested: 500)
Generated 299 chain-of-thought examples
[32m✓ Data.lance[0m
Loading config from: tutorial_config_team.yaml
Config has LLM provider set t

### Generating Synthetic Q&A Pairs

This command uses the synthetic-data-kit's **create** command to generate Q&A pairs from the parsed text:
- Reads parsed text files from `data/parsed/`
- Uses the vLLM provider with Llama-3.3-70B-Instruct model
- Generates 50 Q&A pairs per file (`--num-pairs 50`)
- Type is set to `qa` for question-answer pair generation
- Outputs are saved to `data/generated/`

The process chunks the text and generates questions with corresponding answers. This took about 10 minutes for the full run. Use `--verbose` flag to see detailed progress or reduce `--num-pairs` for faster testing.

In [11]:
!synthetic-data-kit -c tutorial_config_team.yaml curate ./data/generated/ --threshold 8.5

Loading config from: /usr/local/lib/python3.12/dist-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: /usr/local/lib/python3.12/dist-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: tutorial_config_team.yaml
Config has LLM provider set to: vllm
get_llm_provider returning: vllm
[32m🔗 Using vllm provider[0m
[34mProcessing directory: [0m[1;34m.[0m[1;35m/data/generated/[0m[34m for curation[0m
[34mFound [0m[1;36m2[0m[34m JSON files to curate[0m
Loading config from: tutorial_config_team.yaml
Config has LLM provider set to: vllm
Loading config from: tutorial_config_team.yaml
Config has LLM provider set to: vllm
Processing 60 batches of QA pairs...
Batch processing complete.                                                      
Rated 299 QA pairs
Retained 81 pairs (threshold: 8.5)
Average score: 7.7
[32m✓ Data_cot_examples.json[0m
Loading config from: tutorial_conf

### Curating and Quality Filtering

This command uses the **curate** function to filter generated Q&A pairs based on quality:
- Evaluates each Q&A pair using quality metrics
- Filters pairs with quality score above threshold (7.0/10)
- Removes low-quality, inconsistent, or malformed pairs
- Saves curated data to `data/curated/`

This ensures only high-quality synthetic data is used for fine-tuning.

In [12]:
!synthetic-data-kit save-as ./data/curated/ --format ft

Loading config from: /usr/local/lib/python3.12/dist-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: /usr/local/lib/python3.12/dist-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: /usr/local/lib/python3.12/dist-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
[34mProcessing directory: [0m[1;34m.[0m[1;35m/data/curated/[0m[34m for format conversion to ft[0m
[34mFound [0m[1;36m2[0m[34m JSON files to convert to ft format[0m
[32m✓ Data_cot_examples_cleaned.json[0m
[32m✓ filtered_riddlebench_cot_examples_cleaned.json[0m

[1;34mFormat Conversion Summary [0m[1;34m([0m[1;34mft, json[0m[1;34m)[0m[1;34m:[0m
Total files: [1;36m2[0m
[32mSuccessful: [0m[1;36m2[0m
[32mFailed: [0m[1;36m0[0m
[32m✅ All files converted successfully![0m


### Converting to Fine-Tuning Format

This command uses the **save-as** function to convert curated Q&A pairs to fine-tuning format:
- Reads curated JSON files from `data/curated/`
- Converts to format `ft` (fine-tuning format with messages structure)
- Outputs are saved to `data/final/` with proper conversation format
- The resulting format is compatible with standard fine-tuning pipelines

Successfully converted 2 files to fine-tuning format.

In [5]:
import json
import glob
from pathlib import Path
from datasets import Dataset

# ===== CONFIGURATION =====
data_dir = "./data/final"  # Change this to your data directory

# ===== STEP 1: Find all FT files =====
data_path = Path(data_dir)
ft_files = glob.glob(str(data_path / "*.json"))

# ===== STEP 2: Load and convert all files =====
all_data = []

for file_path in ft_files:
    # Load the JSON file
    with open(file_path, 'r') as f:
        ft_data = json.load(f)
    
    # Convert each item
    for item in ft_data:
        if 'messages' not in item:
            continue
        
        # Extract only user and assistant messages
        conversation = []
        for msg in item['messages']:
            if msg['role'] == 'user' or msg['role'] == 'assistant':
                conversation.append({
                    "role": msg['role'],
                    "content": msg['content']
                })
        
        # Add to our data if we have at least one exchange
        if len(conversation) > 0:
            all_data.append({
                "conversations": conversation
            })

print(f"\n🎯 Total conversations: {len(all_data)}")

# ===== STEP 3: Create HuggingFace Dataset =====
dataset = Dataset.from_list(all_data)

# ===== STEP 4: Preview the data =====
print(json.dumps(dataset[0], indent=2))


🎯 Total conversations: 212
{
  "conversations": [
    {
      "content": "A family of five - grandmother, father, mother, son, and daughter - are sitting around a circular table. The grandmother is sitting opposite the father. The son is sitting next to the mother. If the daughter is not sitting next to the grandmother, who is sitting next to the daughter?",
      "role": "user"
    },
    {
      "content": "Final answer: Son. This is the unique solution because it satisfies all constraints simultaneously.",
      "role": "assistant"
    }
  ]
}


### Loading and Converting Data to HuggingFace Dataset

This cell performs comprehensive data processing:

1. **Finding Files**: Locates all JSON files in `data/final/` directory
2. **Loading Data**: Reads each JSON file containing fine-tuning formatted data
3. **Format Conversion**: Extracts user and assistant messages from the fine-tuning format
4. **Structuring Conversations**: Creates a standardized conversation format with role-content pairs
5. **Creating Dataset**: Converts the processed data into a HuggingFace Dataset object

The output shows 74 total conversations were successfully loaded and formatted. The preview displays a sample conversation showing a knight-and-knave logic puzzle with its solution.

## Fine-Tuning

### Note: Please remember to shutdown the vLLM instance!

In [1]:
import os
import json
import glob
import torch
import shutil
from pathlib import Path
from datasets import Dataset

### Importing Standard Libraries

Imports essential Python libraries for fine-tuning:
- `os`, `json`, `glob`: File system operations and JSON handling
- `torch`: PyTorch deep learning framework
- `shutil`: File operations
- `Path`: Path manipulation
- `Dataset`: HuggingFace datasets library for data handling

In [2]:
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template, standardize_sharegpt, train_on_responses_only
from trl import SFTConfig, SFTTrainer
from transformers import DataCollatorForSeq2Seq

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
INFO 10-29 16:25:30 [__init__.py:225] Automatically detected platform rocm.
🦥 Unsloth Zoo will now patch everything to make training faster!


### Importing Unsloth and Training Libraries

Imports specialized libraries for efficient fine-tuning:
- `FastLanguageModel` from Unsloth: Optimized model loading and training
- `get_chat_template`, `standardize_sharegpt`, `train_on_responses_only`: Chat formatting utilities
- `SFTConfig`, `SFTTrainer`: Supervised fine-tuning configuration and trainer from TRL
- `DataCollatorForSeq2Seq`: Handles batching and padding for sequence-to-sequence training

### Setup Unsloth model and tokenizer for ROCm without bitsandbytes

In [3]:
max_seq_length = 1024
dtype = torch.bfloat16  # Explicit bfloat16 for ROCm
load_in_4bit = False  
from huggingface_hub.utils import disable_progress_bars
disable_progress_bars()

import os, tqdm
os.environ["DISABLE_TQDM_NOTEBOOK"] = "1"
tqdm.tqdm = tqdm.std.tqdm

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.3-70B-Instruct",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # Explicit for ROCm
    trust_remote_code=True,
)

print(f"✅ Loaded: Llama-3.3-70B-Instruct (bfloat16, ROCm compatible)")

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=64,  # Higher rank for 70B model
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                   "gate_proj", "up_proj", "down_proj"],
    lora_alpha=64,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

Unsloth: AMD currently is not stable with 4bit bitsandbytes. Disabling for now.
Are you certain you want to do remote code execution?
==((====))==  Unsloth 2025.10.9: Fast Llama patching. Transformers: 4.55.1. vLLM: 0.11.1rc3.dev39+gf417746ad.rocm700.
   \\   /|    . Num GPUs = 1. Max memory: 191.688 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0a0+git1c57644. ROCm Toolkit: 7.0.51831-a3e329ad8. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = None. FA2 = True]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


[2025-10-29 16:25:37] INFO modeling.py:987: We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).


Loading checkpoint shards:   0%|          | 0/30 [00:00<?, ?it/s]

✅ Loaded: Llama-3.3-70B-Instruct (bfloat16, ROCm compatible)


Unsloth 2025.10.9 patched 80 layers with 80 QKV layers, 80 O layers and 80 MLP layers.


### Loading Llama-3.3-70B Model with LoRA

This cell sets up the model for efficient fine-tuning on AMD ROCm hardware:

**Model Configuration:**
- Model: Llama-3.3-70B-Instruct (70 billion parameters)
- Data type: bfloat16 for ROCm compatibility
- No quantization (load_in_4bit=False) to avoid bitsandbytes dependency
- Max sequence length: 1024 tokens

**LoRA (Low-Rank Adaptation) Configuration:**
- Rank (r): 64 - Higher rank for the large 70B model
- Target modules: All attention and MLP layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
- LoRA alpha: 64
- Dropout: 0 (no dropout)
- Gradient checkpointing: "unsloth" for memory efficiency

LoRA enables efficient fine-tuning by only training small adapter layers instead of the entire 70B model, making it feasible to train on a single AMD MI300X GPU with 192GB HBM3 memory.

In [6]:
"""Prepare dataset with proper chat template and tensor compatibility"""
print("🔧 Preparing dataset for training...")

# Set chat template
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")

# Ensure pad token is set
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Formatting function that ensures proper tensor conversion
def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = []
    
    for convo in convos:
        # Ensure conversation is in correct format
        if isinstance(convo, list) and all(isinstance(msg, dict) for msg in convo):
            text = tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
            texts.append(text)
        else:
            print(f"⚠️  Skipping malformed conversation: {type(convo)}")
            continue
    
    return {"text": texts}

dataset = standardize_sharegpt(dataset)

dataset = dataset.map(formatting_prompts_func, batched=True, remove_columns=dataset.column_names)

dataset = dataset.filter(lambda x: len(x["text"].strip()) > 0)

print(f"✅ Prepared {len(dataset)} valid examples for training")

# Show sample
if len(dataset) > 0:
    print(f"📝 Sample formatted text:")
    print(dataset["text"][0][:200] + "...")

🔧 Preparing dataset for training...


Unsloth: Standardizing formats (num_proc=20):   0%|          | 0/212 [00:00<?, ? examples/s]

Map:   0%|          | 0/212 [00:00<?, ? examples/s]

Filter:   0%|          | 0/212 [00:00<?, ? examples/s]

✅ Prepared 212 valid examples for training
📝 Sample formatted text:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 July 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

A family of five - gran...


### Preparing Dataset with Chat Template

This cell formats the dataset for fine-tuning:

**Steps:**
1. **Set Chat Template**: Applies Llama-3.1 chat template formatting
2. **Configure Padding**: Sets pad token to eos token if not already set
3. **Format Conversations**: The `formatting_prompts_func` function:
   - Takes raw conversations from the dataset
   - Applies the chat template to format them properly
   - Validates conversation structure (list of dicts with role/content)
   - Filters out malformed conversations
4. **Standardize Format**: Uses `standardize_sharegpt` to normalize the data structure
5. **Apply Formatting**: Maps the formatting function across all examples
6. **Remove Empty**: Filters out any empty or invalid formatted texts

The output shows 74 valid examples were successfully prepared. A sample of the formatted text is displayed, showing the proper Llama-3.1 chat template structure with system, user, and assistant headers.

In [7]:
"""Train model with ROCm-optimized settings"""
# Ensure tokenizer has proper padding
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Setup trainer with ROCm-friendly settings and proper data handling
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
    packing=False,
    args=SFTConfig(
        per_device_train_batch_size=64,  # 🚀 MI300X can handle this with 192GB HBM3!
        gradient_accumulation_steps=1,   # Effective batch size = 8*2 = 16
        warmup_steps=5,
        num_train_epochs=5,
        learning_rate=1e-4,
        logging_steps=1,
        optim="adamw_8bit",  # Pure torch optimizer
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="logical_reasoning_rocm_outputs",
        report_to="none",
        bf16=True,
        dataloader_pin_memory=False,
        remove_unused_columns=True,  # Remove unused columns to avoid tensor issues
        gradient_checkpointing=True,
        dataloader_num_workers=0,  # Single worker for ROCm stability
    ),
)

# Train only on responses
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|start_header_id|>user<|end_header_id|>\n\n",
    response_part="<|start_header_id|>assistant<|end_header_id|>\n\n",
)

FastLanguageModel.for_training(model)
trainer_stats = trainer.train()


trainer_stats = trainer.train()

Unsloth: Tokenizing ["text"] (num_proc=24):   0%|          | 0/212 [00:00<?, ? examples/s]

Map (num_proc=24):   0%|          | 0/212 [00:00<?, ? examples/s]

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 212 | Num Epochs = 5 | Total steps = 20
O^O/ \_/ \    Batch size per device = 64 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (64 x 1 x 1) = 64
 "-____-"     Trainable parameters = 828,375,040 of 71,382,081,536 (1.16% trained)


Step,Training Loss
1,4.3654
2,4.4722
3,4.3151
4,3.2126
5,2.5491
6,1.523
7,0.6651
8,0.3334
9,0.3481
10,0.2836


Unsloth: Will smartly offload gradients to save VRAM!


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 212 | Num Epochs = 5 | Total steps = 20
O^O/ \_/ \    Batch size per device = 64 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (64 x 1 x 1) = 64
 "-____-"     Trainable parameters = 828,375,040 of 71,382,081,536 (1.16% trained)


Step,Training Loss
1,0.2213
2,0.161
3,0.1586
4,0.2664
5,0.1529
6,0.1643
7,0.18
8,0.1022
9,0.1427
10,0.113


### Training the Model with ROCm-Optimized Settings

This cell configures and executes the fine-tuning process:

**Training Configuration (SFTConfig):**
- **Batch size**: 64 per device - leveraging the AMD MI300X's massive 192GB HBM3 memory
- **Gradient accumulation**: 1 step
- **Warmup**: 5 steps
- **Epochs**: 1 full pass through the dataset
- **Learning rate**: 1e-4
- **Optimizer**: adamw_8bit for memory efficiency
- **Precision**: bf16 (bfloat16) for ROCm
- **Gradient checkpointing**: Enabled for memory efficiency

**Special Training Mode:**
Uses `train_on_responses_only` to compute loss only on the assistant's responses, not on the user's questions. This focuses the model on learning to generate accurate answers rather than memorizing the input format.

**Key Features:**
- DataCollatorForSeq2Seq handles variable-length sequences with proper padding
- No packing to preserve conversation structure
- Single dataloader worker for ROCm stability
- Gradient checkpointing via Unsloth for memory optimization

The model is then trained on the 74 logical reasoning conversations.

In [8]:
"""Save the trained model"""
print("\n💾 SAVING ROCM-TRAINED MODEL")

# Save LoRA adapters
lora_path = "logical_reasoning_rocm_lora_final"
model.save_pretrained(lora_path)
tokenizer.save_pretrained(lora_path)
print(f"✅ LoRA adapters saved to: {lora_path}")

# Save merged model
merged_path = "logical_reasoning_rocm_merged_final"
print("🔄 Saving merged model...")
model.save_pretrained_merged(merged_path, tokenizer, save_method="merged_16bit")
print(f"✅ Merged model saved to: {merged_path}")

print(f"\n🎉 ROCM MODEL READY!")


💾 SAVING ROCM-TRAINED MODEL
✅ LoRA adapters saved to: logical_reasoning_rocm_lora_final
🔄 Saving merged model...
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
Checking cache directory for required files...


Unsloth: Copying 30 files from cache to `logical_reasoning_rocm_merged_final`: 100%|████████████████| 30/30 [00:59<00:00,  1.99s/it]


Successfully copied all 30 files from cache to `logical_reasoning_rocm_merged_final`
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.


Unsloth: Preparing safetensor model files: 100%|████████████████████████████████████████████████| 30/30 [00:00<00:00, 457560.44it/s]
Unsloth: Merging weights into 16bit: 100%|██████████████████████████████████████████████████████████| 30/30 [03:35<00:00,  7.20s/it]


Unsloth: Merge process complete. Saved to `/workspace/AIAC/logical_reasoning_rocm_merged_final`
✅ Merged model saved to: logical_reasoning_rocm_merged_final

🎉 ROCM MODEL READY!


### Saving the Fine-Tuned Model

This cell saves the trained model in two formats:

1. **LoRA Adapters** (`logical_reasoning_rocm_lora/`):
   - Saves only the trained LoRA adapter weights (lightweight, ~few hundred MB)
   - Can be loaded later with the base model
   - Useful for sharing or deploying with the original base model

2. **Merged Model** (`logical_reasoning_rocm_merged/`):
   - Merges LoRA adapters back into the base model
   - Creates a standalone model with all weights
   - Saved in 16-bit precision for better quality
   - Ready for immediate inference without loading adapters

Both formats include the tokenizer configuration. The merged model is production-ready and can be used directly for generating answers to logical reasoning questions.

In [None]:
#fin

In [18]:
# ============================================================
# 🔁 POST-SFT: GRPO (TRL) on YOUR seating/blood-relations data
# Uses the merged SFT model you just saved.
# Trains answer behavior with verifiable reward; same model serves Q-agent and A-agent.
# ============================================================

import os, json, re, time, torch, random
from typing import List, Dict, Any, Optional
from datasets import Dataset
from transformers import AutoTokenizer, GenerationConfig
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer

# --------------------------
# Paths (ensure these exist from your previous SFT cell)
# --------------------------
MERGED_PATH = "logical_reasoning_rocm_merged_final"     # from your SFT "merged_16bit" save
DATA_PATH   = "data/final/filtered_riddlebench_cot_examples_cleaned_ft.json"  # your real dataset (list[{"question","answer"}])
RL_OUT      = "logical_reasoning_rocm_rl_out"
FINAL_OUT   = "logical_reasoning_rocm_rl_final_merged"
os.makedirs(RL_OUT, exist_ok=True)
os.makedirs(FINAL_OUT, exist_ok=True)

# --------------------------
# Load tokenizer + model
# --------------------------
tokenizer = AutoTokenizer.from_pretrained(MERGED_PATH, use_fast=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Load 16-bit merged model and re-wrap for training (Unsloth wrapper keeps ROCm-friendly kernels)
model, _ = FastLanguageModel.from_pretrained(
    model_name = MERGED_PATH,
    max_seq_length = 2048,
    dtype = None,            # use bf16 if available
    load_in_4bit = False,    # merged_16bit was saved; keep 16-bit for stability
)
model = FastLanguageModel.for_training(model)

# Optional compile on ROCm if available
if hasattr(torch, "compile"):
    try:
        model = torch.compile(model, mode="max-autotune")
    except Exception:
        pass

# --------------------------
# Prepare dataset (expects list of {"question": str, "answer": str})
# Reward is computed against "answer" column (ground_truth).
# --------------------------
def _load_qa(path: str):
    import json
    data = json.load(open(path, "r", encoding="utf-8"))
    out = []
    for sample in data:
        msgs = sample.get("messages", [])
        q, a = None, None
        for m in msgs:
            if m.get("role") == "user":
                q = m.get("content", "").strip()
            elif m.get("role") == "assistant":
                a = m.get("content", "").strip()
        if q and a:
            out.append({"question": q, "ground_truth": a})
    return out

data_list = _load_qa(DATA_PATH)
print(f"Loaded {len(data_list)} usable Q/A pairs")

# Prompting to force compact, verifiable JSON with short thinking
SYS = "You are a fast reasoning agent for seating arrangements and blood relations. Respond ONLY as compact JSON."
XML_STYLE = (
    'Format strictly as: {"answer":"<final_relation_or_person>","rationale":"<brief reasoning 1-2 lines>"}'
)

def make_prompt(q: str) -> List[Dict[str,str]]:
    # TRL GRPOTrainer supports "messages" style prompts; we pass a list of role/content dicts.
    return [
        {"role": "system", "content": f"{SYS} {XML_STYLE}"},
        {"role": "user",   "content": q},
    ]

# Build HF dataset with "messages" + "ground_truth" columns so GRPO can forward it to reward_fn
messages = [ make_prompt(ex["question"]) for ex in data_list ]
ground   = [ ex["ground_truth"] for ex in data_list ]
train_ds = Dataset.from_dict({"messages": messages, "ground_truth": ground})

# --------------------------
# Reward function (verifiable)
# Rewards:
#   +1.0 exact match (normalized)
#   +0.7 if JSON well-formed & answer present & high token overlap with ground truth
#   +0.0 otherwise
# --------------------------
_non_alnum = re.compile(r"[^a-z0-9]+")

def _norm(s: str) -> str:
    return _non_alnum.sub(" ", s.lower()).strip()

def _extract_completion_text(completion: Any) -> str:
    # TRL passes `completions` either as list[str] (standard) or list[list[{"role","content"}]] (chat)
    if isinstance(completion, list):
        # pick assistant message content
        if len(completion) > 0 and isinstance(completion[0], dict) and "content" in completion[0]:
            return completion[0]["content"]
        return " ".join([str(x) for x in completion])
    return str(completion)

def _extract_json_answer(text: str) -> Optional[str]:
    m = re.search(r"\{.*\}", text, flags=re.S)
    if not m:
        return None
    try:
        obj = json.loads(m.group(0))
        ans = obj.get("answer")
        return ans.strip() if isinstance(ans, str) else None
    except Exception:
        return None

def reward_func(completions: List[Any], ground_truth: List[str], **kwargs) -> List[float]:
    scores = []
    for comp, gt in zip(completions, ground_truth):
        txt = _extract_completion_text(comp)
        pred_ans = _extract_json_answer(txt)
        gt_n = _norm(gt)
        if pred_ans is None:
            scores.append(0.0)
            continue
        pn = _norm(pred_ans)
        if pn == gt_n and pn != "":
            scores.append(1.0)
        elif pn and gt_n:
            s_pred, s_gt = set(pn.split()), set(gt_n.split())
            j = len(s_pred & s_gt) / max(1, len(s_pred | s_gt))
            scores.append(0.7 * j)
        else:
            scores.append(0.0)
    return scores

# --------------------------
# GRPO config (fast, stable, short outputs)
#   - dapo loss to reduce length bias
#   - mask_truncated_completions True
#   - small num_generations for speed
#   - bf16 on MI300X
# --------------------------
training_args = GRPOConfig(
    output_dir = RL_OUT,
    learning_rate = 5e-6,
    weight_decay = 0.01,
    per_device_train_batch_size = 8,     # MI300X can go higher; keep modest for stability
    gradient_accumulation_steps = 1,
    max_steps = 300,                      # extend if you have time budget
    logging_steps = 10,
    save_steps = 0,
    bf16 = True,
    num_generations = 4,                  # GRPO group size
    temperature = 0.4,
    top_p = 0.9,
    max_prompt_length = 512,
    max_completion_length = 96,           # keep short for 6s inference target later
    loss_type = "dapo",
    mask_truncated_completions = True,
    scale_rewards = "batch",              # robust scaling
    beta = 0.0,                           # KL off per TRL defaults
    remove_unused_columns = False,        # we pass messages + ground_truth explicitly
)

# --------------------------
# GRPO Trainer (TRL)
# Pass the loaded model object, tokenizer as processing_class, our dataset, and reward function.
# --------------------------
trainer = GRPOTrainer(
    model = model,
    args = training_args,
    train_dataset = train_ds,
    reward_funcs = reward_func,           # can also be a list of functions
    processing_class = tokenizer,         # tokenizer with left-padding
    dataset_text_field = "messages",      # chat messages column
)

trainer.train()

# --------------------------
# Save LoRA and final merged model
# --------------------------
# If you trained with LoRA params attached, you can save just adapters as well:
try:
    model.save_lora(os.path.join(RL_OUT, "grpo_lora"))
except Exception:
    pass

# Switch to inference graph + merge to 16-bit for serving
model = FastLanguageModel.for_inference(model)
model.save_pretrained_merged(
    FINAL_OUT,
    tokenizer = tokenizer,
    save_method = "merged_16bit",
    max_shard_size = "2GB",
)
print(f"✅ RL+SFT merged model saved to: {FINAL_OUT}")

# ============================================================
# ⚡ Inference utilities with latency-friendly generation
#   A-agent (answers) target <6s, Q-agent (questions) target <10s
#   Use same FINAL_OUT model for both.
# ============================================================
final_model, _ = FastLanguageModel.from_pretrained(
    model_name = FINAL_OUT,
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = False,
)
final_model = FastLanguageModel.for_inference(final_model)

answer_gen_cfg = GenerationConfig(
    max_new_tokens = 96, temperature = 0.0, top_p = 1.0, do_sample = False
)
question_gen_cfg = GenerationConfig(
    max_new_tokens = 128, temperature = 0.3, top_p = 0.9, do_sample = True
)

def answer_agent(question: str) -> Dict[str, Any]:
    messages = make_prompt(question)
    enc = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(final_model.device)
    t0 = time.time()
    with torch.no_grad():
        out = final_model.generate(input_ids=enc, generation_config=answer_gen_cfg)
    latency = time.time() - t0
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    m = re.search(r"\{.*\}", text, flags=re.S)
    obj = json.loads(m.group(0)) if m else {"answer":"", "rationale":""}
    return {"latency_sec": round(latency, 3), "raw": text[-600:], "json": obj}

def question_agent(context_hint: str) -> Dict[str, Any]:
    # Generate a new question constrained to the 2 topics. Keep concise for <10s.
    sys = "You generate ONLY a single challenging question as plain text. Topic must be seating arrangement OR blood relations. Keep to 1-3 sentences."
    messages = [
        {"role":"system","content":sys},
        {"role":"user","content":f"Context: {context_hint}\nGenerate one new question."},
    ]
    enc = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(final_model.device)
    t0 = time.time()
    with torch.no_grad():
        out = final_model.generate(input_ids=enc, generation_config=question_gen_cfg)
    latency = time.time() - t0
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    # return just the last 1-3 sentences as the question
    return {"latency_sec": round(latency, 3), "question": text.strip().split("</assistant>")[-1].strip()}

# Quick smoke tests (comment out in production)
# print(answer_agent("Ravi is father of Meena. Meena is mother of Arjun. What is Ravi to Arjun?"))
# print(question_agent("Create a blood relation puzzle involving uncle, aunt, nephew."))



Unsloth: AMD currently is not stable with 4bit bitsandbytes. Disabling for now.
==((====))==  Unsloth 2025.10.9: Fast Llama patching. Transformers: 4.55.1. vLLM: 0.11.1rc3.dev39+gf417746ad.rocm700.
   \\   /|    . Num GPUs = 1. Max memory: 191.688 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0a0+git1c57644. ROCm Toolkit: 7.0.51831-a3e329ad8. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = None. FA2 = True]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/30 [00:00<?, ?it/s]



Loaded 131 usable Q/A pairs
Unsloth: The DAPO paper recommends `epsilon_high = 0.28`


NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

In [16]:
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"