# 🧠 Google Tunix Hack - Reasoning Model Training

**Author:** Om Borda (omborda2002)  
**Competition:** Google Tunix Hack  
**Model:** Gemma 2 2B IT (`google/gemma-2-2b-it`)  

## Output Format
```
<reasoning>step-by-step thinking</reasoning>
<answer>final answer</answer>
```
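
A minimal sketch of how a response in this format could be parsed downstream, for example when scoring generations. The helper name `parse_response` is hypothetical and not part of the notebook, and it assumes each tag pair appears at most once.

```python
import re

# Hypothetical helper: pull the two tagged sections out of a model response.
_REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def parse_response(text: str) -> dict:
    """Return the reasoning and answer text, or None for a missing section."""
    reasoning = _REASONING_RE.search(text)
    answer = _ANSWER_RE.search(text)
    return {
        "reasoning": reasoning.group(1).strip() if reasoning else None,
        "answer": answer.group(1).strip() if answer else None,
    }

example = "<reasoning>\n2+2=4\n</reasoning>\n<answer>4</answer>"
parsed = parse_response(example)
print(parsed["answer"])
```

Returning `None` for a missing section keeps malformed generations distinguishable from empty ones.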

## Datasets (~250k samples planned)
- GSM8K (7.4k) - Math word problems
- OpenThoughts-114k (100k) - R1 distilled reasoning
- Bespoke-Stratos-17k (17k) - High quality R1
- Medical-O1 (44k) - Medical reasoning
- MetaMathQA (80k) - Augmented math

## 1. Setup

In [13]:
!pip install -q transformers datasets accelerate bitsandbytes peft trl

In [14]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset, Dataset
import random
import re
import os

print(f"PyTorch: {torch.__version__}")
print(f"CUDA: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

PyTorch: 2.8.0+cu126
CUDA: True
GPU: Tesla P100-PCIE-16GB


## 2. Configuration

In [15]:
# Detect Kaggle environment
IS_KAGGLE = os.path.exists('/kaggle')

CONFIG = {
    # Model path (Kaggle hub or HuggingFace)
    "model_name": "/kaggle/input/gemma-2/transformers/gemma-2-2b-it/1" if IS_KAGGLE else "google/gemma-2-2b-it",
    "max_seq_length": 1024,
    
    # LoRA
    "lora_r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.05,
    
    # Training
    "batch_size": 4,
    "gradient_accumulation_steps": 8,
    "learning_rate": 2e-4,
    "num_epochs": 1,
    "warmup_ratio": 0.03,
    
    # Output
    "output_dir": "/kaggle/working/gemma-reasoning" if IS_KAGGLE else "./gemma-reasoning",
}

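# Dataset limits (None = no cap, use the full train split)
DATASET_LIMITS = {
    "gsm8k": None,          # All 7.4k
    "openthoughts": None,   # Full 114k; set 100_000 to match the 100k budget
    "stratos": None,        # All 17k
    "medical_o1": None,     # All ~44k (limit is applied to each of the two configs)
    "metamath": None,       # Full 395k; set 80_000 to match the 80k budget
}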

print(f"Running on: {'Kaggle' if IS_KAGGLE else 'Local'}")
print(f"Model: {CONFIG['model_name']}")

Running on: Kaggle
Model: /kaggle/input/gemma-2/transformers/gemma-2-2b-it/1
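
The dataset list budgets roughly 250k samples, while every entry in `DATASET_LIMITS` is `None`, so no cap is applied. The sketch below shows the arithmetic for both cases. The split sizes are the ones printed by the loading logs in this notebook; `PLANNED_LIMITS` and `capped_total` are hypothetical names introduced only for illustration.

```python
# Train-split sizes as printed by the loading logs in this notebook.
SPLIT_SIZES = {
    "gsm8k": 7_473,
    "openthoughts": 113_957,
    "stratos": 16_710,
    "medical_o1_en": 19_704,
    "medical_o1_mix": 24_887,
    "metamath": 395_000,
}

# Hypothetical caps that follow the budget in the dataset list.
PLANNED_LIMITS = {"openthoughts": 100_000, "metamath": 80_000}

def capped_total(sizes: dict, limits: dict) -> int:
    """Sum split sizes after applying an optional per-dataset cap."""
    total = 0
    for name, size in sizes.items():
        limit = limits.get(name)  # a missing or None limit means "no cap"
        total += size if limit is None else min(size, limit)
    return total

print(f"No caps:   {capped_total(SPLIT_SIZES, {}):,}")
print(f"With caps: {capped_total(SPLIT_SIZES, PLANNED_LIMITS):,}")
```

With the two caps in place the total lands close to the 250k budget; without them it is more than twice that.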


## 3. Data Formatters

In [16]:
def extract_think_answer(text):
    """Extract thinking from <think> tags."""
    think_match = re.search(r'<think>(.*?)</think>', text, re.DOTALL)
    thinking = think_match.group(1).strip() if think_match else ""
    if '</think>' in text:
        answer = text.split('</think>')[-1].strip()
    else:
        answer = text
    return thinking, answer

def format_gsm8k(example):
    """Format GSM8K math problems."""
    question = example.get('question', '')
    answer_text = example.get('answer', '')
    
    if '####' in answer_text:
        reasoning = answer_text.split('####')[0].strip()
        final = answer_text.split('####')[1].strip()
    else:
        reasoning = answer_text
        final = answer_text.split('\n')[-1]
    
    return {
        "instruction": question,
        "response": f"<reasoning>\n{reasoning}\n</reasoning>\n<answer>{final}</answer>"
    }

def format_openthoughts(example):
    """Format OpenThoughts-114k."""
    try:
        conversations = example.get('conversations', [])
        question, answer = "", ""
        
        for conv in conversations:
            role = conv.get('from', '')
            if role in ['human', 'user']:
                question = conv.get('value', '')
            elif role in ['gpt', 'assistant']:
                answer = conv.get('value', '')
        
        if not question or not answer:
            return None
        
        thinking, final = extract_think_answer(answer)
        if thinking:
            response = f"<reasoning>\n{thinking}\n</reasoning>\n<answer>{final}</answer>"
        else:
            response = f"<reasoning>\n{answer[:1500]}\n</reasoning>\n<answer>{answer[-300:]}</answer>"
        
        return {"instruction": question, "response": response}
    except Exception:  # malformed record: skip it
        return None

def format_stratos(example):
    """Format Bespoke-Stratos-17k (handled exactly like OpenThoughts)."""
    return format_openthoughts(example)

def format_medical_o1(example):
    """Format Medical O1 reasoning."""
    try:
        question = example.get('Question', '')
        cot = example.get('Complex_CoT', '')
        response_text = example.get('Response', '')
        
        if not question:
            return None
        
        if cot:
            response = f"<reasoning>\n{cot[:1500]}\n</reasoning>\n<answer>{response_text}</answer>"
        else:
            response = f"<reasoning>\nAnalyzing medical question.\n</reasoning>\n<answer>{response_text}</answer>"
        
        return {"instruction": question, "response": response}
    except Exception:  # malformed record: skip it
        return None

def format_metamath(example):
    """Format MetaMathQA."""
    query = example.get('query', '')
    response = example.get('response', '')
    
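    if 'The answer is' in response:
        # Split on the last marker. Responses typically end with "The answer is: 8",
        # so drop a leading colon as well as the trailing period.
        reasoning, _, final = response.rpartition('The answer is')
        reasoning = reasoning.strip()
        final = final.strip().lstrip(':').strip().rstrip('.')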
    else:
        reasoning = response
        final = response.split('\n')[-1]
    
    return {
        "instruction": query,
        "response": f"<reasoning>\n{reasoning[:1500]}\n</reasoning>\n<answer>{final}</answer>"
    }

print("✓ Formatters ready")

✓ Formatters ready
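
`extract_think_answer` only recognises `<think>...</think>`. Some R1-distilled datasets wrap the trace in `<|begin_of_thought|>` / `<|begin_of_solution|>` markers instead; whether the datasets used here do so is an assumption that should be checked against a few real rows. A hedged sketch of an extractor that accepts both styles (`extract_thought_solution` is a hypothetical name, not part of the notebook):

```python
import re

# Assumed marker pairs. Verify against actual dataset rows before relying on them.
_THOUGHT_RE = re.compile(r"<\|begin_of_thought\|>(.*?)<\|end_of_thought\|>", re.DOTALL)
_SOLUTION_RE = re.compile(r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>", re.DOTALL)
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def extract_thought_solution(text: str) -> tuple[str, str]:
    """Return (thinking, final_answer); thinking is '' when no markers are found."""
    thought = _THOUGHT_RE.search(text)
    if thought:
        solution = _SOLUTION_RE.search(text)
        # Fall back to everything after the thought block if no solution markers exist.
        final = solution.group(1) if solution else text[thought.end():]
        return thought.group(1).strip(), final.strip()
    think = _THINK_RE.search(text)
    if think:
        return think.group(1).strip(), text[think.end():].strip()
    return "", text.strip()

row = "<|begin_of_thought|>a+b<|end_of_thought|> <|begin_of_solution|>42<|end_of_solution|>"
print(extract_thought_solution(row))
```

When neither marker style matches, the function returns an empty thinking string, which lets the caller decide whether to drop the row instead of slicing the raw text.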


## 4. Gemma Chat Template

In [17]:
def create_prompt(instruction: str) -> str:
    """Gemma chat format."""
    return f"<start_of_turn>user\n{instruction}\n<end_of_turn>\n<start_of_turn>model\n"

def format_for_training(example: dict) -> dict:
    """Final training format."""
    if example is None:
        return None
    prompt = create_prompt(example["instruction"])
    return {"text": prompt + example["response"] + "<end_of_turn>"}

# Example
sample = format_for_training({"instruction": "What is 2+2?", "response": "<reasoning>\n2+2=4\n</reasoning>\n<answer>4</answer>"})
print("Sample format:")
print(sample["text"])

Sample format:
<start_of_turn>user
What is 2+2?
<end_of_turn>
<start_of_turn>model
<reasoning>
2+2=4
</reasoning>
<answer>4</answer><end_of_turn>
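
For comparison, a sketch of the turn layout as I understand Gemma's documented chat template: `<end_of_turn>` follows the message text directly and each turn ends with a newline. This is an assumption written from memory of the Gemma prompt-format documentation; `tokenizer.apply_chat_template` on the loaded tokenizer is the authoritative source. The helper names are hypothetical.

```python
def gemma_turn(role: str, content: str) -> str:
    """One complete turn: header line, message, end marker, trailing newline."""
    return f"<start_of_turn>{role}\n{content}<end_of_turn>\n"

def build_prompt(instruction: str) -> str:
    """User turn followed by an open model turn, ready for generation."""
    return gemma_turn("user", instruction) + "<start_of_turn>model\n"

def build_training_text(instruction: str, response: str) -> str:
    """Full user/model exchange for supervised fine-tuning."""
    return gemma_turn("user", instruction) + gemma_turn("model", response)

print(build_training_text("What is 2+2?", "<answer>4</answer>"))
```

The only differences from `create_prompt` above are the missing newline before `<end_of_turn>` and the newline after the final marker. The tokenizer normally prepends `<bos>` itself, so it is not added here.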


## 5. Load Datasets

In [18]:
def load_and_format_dataset(name, config, formatter, limit, desc):
    """Load and format a single dataset."""
    print(f"\n📊 Loading {desc}...")
    try:
        if config:
            ds = load_dataset(name, config, split="train")
        else:
            ds = load_dataset(name, split="train")
        
        if limit and len(ds) > limit:
            ds = ds.shuffle(seed=42).select(range(limit))
        
        formatted = [formatter(ex) for ex in ds]
        formatted = [f for f in formatted if f is not None]
        
        print(f"   ✓ {len(formatted):,} examples")
        return formatted
    except Exception as e:
        print(f"   ✗ Failed: {str(e)[:50]}")
        return []

# Load all datasets
all_examples = []

all_examples += load_and_format_dataset("gsm8k", "main", format_gsm8k, DATASET_LIMITS["gsm8k"], "GSM8K")
all_examples += load_and_format_dataset("open-thoughts/OpenThoughts-114k", None, format_openthoughts, DATASET_LIMITS["openthoughts"], "OpenThoughts")
all_examples += load_and_format_dataset("bespokelabs/Bespoke-Stratos-17k", None, format_stratos, DATASET_LIMITS["stratos"], "Stratos")
all_examples += load_and_format_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", format_medical_o1, DATASET_LIMITS["medical_o1"], "Medical-O1-en")
all_examples += load_and_format_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en_mix", format_medical_o1, DATASET_LIMITS["medical_o1"], "Medical-O1-mix")
all_examples += load_and_format_dataset("meta-math/MetaMathQA", None, format_metamath, DATASET_LIMITS["metamath"], "MetaMath")

print(f"\n📊 Total collected: {len(all_examples):,}")


📊 Loading GSM8K...
   ✓ 7,473 examples

📊 Loading OpenThoughts...
   ✓ 113,957 examples

📊 Loading Stratos...
   ✓ 16,710 examples

📊 Loading Medical-O1-en...
   ✓ 19,704 examples

📊 Loading Medical-O1-mix...
   ✓ 24,887 examples

📊 Loading MetaMath...
   ✓ 395,000 examples

📊 Total collected: 577,731


In [19]:
# Shuffle and prepare final dataset
random.seed(42)
random.shuffle(all_examples)

# Filter valid examples
valid = []
for ex in all_examples:
    if ex and len(ex.get("instruction", "")) > 10 and len(ex.get("response", "")) > 30:
        if "<reasoning>" in ex["response"] and "<answer>" in ex["response"]:
            valid.append(ex)

# Format for training
final_data = [format_for_training(ex) for ex in valid]
final_data = [f for f in final_data if f and len(f["text"]) < 4000]  # Skip very long

dataset = Dataset.from_list(final_data)
print(f"\n✅ Final training dataset: {len(dataset):,} samples")


✅ Final training dataset: 570,699 samples


In [20]:
# Preview sample
print("📝 Sample training example:")
print("="*60)
print(dataset[0]["text"][:800])
print("="*60)

📝 Sample training example:
<start_of_turn>user
What is $ 6 \div 3 - 2 - X + 2 \cdot 8$?
If we know the answer to the above question is 8, what is the value of unknown variable X?
<end_of_turn>
<start_of_turn>model
<reasoning>
We want to find the value of $X$ in the given expression.
Using the order of operations (PEMDAS), we can simplify the expression:
$6 \div 3 - 2 - X + 2 \cdot 8$
First, we perform the multiplication:
$6 \div 3 - 2 - X + 16$
Next, we perform the division:
$2 - 2 - X + 16$
Then, we perform the subtraction:
$0 - X + 16$
Finally, we perform the addition:
$-X + 16$
We are given that the value of the expression is 8, so we can write:
$-X + 16 = 8$
To solve for $X$, we can subtract 16 from both sides of the equation:
$-X = 8 - 16$
$-X = -8$
Dividing both sides of the equation by -1, we find:
$X = 8$
Th


## 6. Load Model

In [21]:
# Quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model_path = CONFIG['model_name']
print(f"Model path: {model_path}")

# Get the HuggingFace token from Kaggle secrets (kaggle_secrets only exists on Kaggle)
hf_token = None
if IS_KAGGLE:
    from kaggle_secrets import UserSecretsClient
    user_secrets = UserSecretsClient()
    for secret_name in ("HUGGINGFACE_TOKEN", "HF_TOKEN"):
        try:
            hf_token = user_secrets.get_secret(secret_name)
            print("✓ HuggingFace token found")
            break
        except Exception:
            continue
    if hf_token is None:
        raise ValueError("Please add HUGGINGFACE_TOKEN or HF_TOKEN to Kaggle Secrets")

    # On Kaggle, load from the HuggingFace Hub with authentication
    print("Loading from HuggingFace Hub with authentication...")
    model_name = "google/gemma-2-2b-it"
else:
    model_name = model_path

# Load tokenizer
print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(model_name, token=hf_token)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
print("✓ Tokenizer loaded")

# Load model
print("Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    token=hf_token,
)

model = prepare_model_for_kbit_training(model)
print("✓ Model loaded")

Model path: /kaggle/input/gemma-2/transformers/gemma-2-2b-it/1
✓ HuggingFace token found
Loading from HuggingFace Hub with authentication...
Loading tokenizer...
✓ Tokenizer loaded
Loading model...
`torch_dtype` is deprecated! Use `dtype` instead!
✓ Model loaded


## 7. Apply LoRA

In [22]:
lora_config = LoraConfig(
    r=CONFIG["lora_r"],
    lora_alpha=CONFIG["lora_alpha"],
    lora_dropout=CONFIG["lora_dropout"],
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", 
                   "gate_proj", "up_proj", "down_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

trainable params: 41,533,440 || all params: 2,655,875,328 || trainable%: 1.5638
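
The trainable-parameter count can be reproduced by hand: a LoRA adapter of rank `r` on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` weights. The sketch below assumes the Gemma 2 2B dimensions listed in the comments (hidden size 2304, 26 layers, 8 query heads, 4 key/value heads, head dimension 256, MLP width 9216). These values are not stated in the notebook; they are assumptions that happen to reproduce the printed figure.

```python
R = 32  # LoRA rank from CONFIG["lora_r"]

# Assumed Gemma 2 2B dimensions (not taken from the notebook).
HIDDEN, LAYERS = 2304, 26
Q_DIM = 8 * 256    # query heads * head dim
KV_DIM = 4 * 256   # key/value heads * head dim
MLP = 9216

# (d_in, d_out) for every targeted projection in one decoder layer.
SHAPES = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def lora_params(r: int, shapes: dict, layers: int) -> int:
    """Each adapter adds A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out)."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * layers

total = lora_params(R, SHAPES, LAYERS)
print(f"{total:,}")  # 41,533,440
```

Since the count scales linearly with `r`, halving the rank to 16 would halve the adapter size.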


## 8. Training

In [23]:
# Training config
training_args = SFTConfig(
    output_dir=CONFIG["output_dir"],
    num_train_epochs=CONFIG["num_epochs"],
    per_device_train_batch_size=CONFIG["batch_size"],
    gradient_accumulation_steps=CONFIG["gradient_accumulation_steps"],
    learning_rate=CONFIG["learning_rate"],
    warmup_ratio=CONFIG["warmup_ratio"],
    logging_steps=50,
    save_strategy="steps",
    save_steps=500,
    save_total_limit=2,
    bf16=True,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
    report_to="none",
    gradient_checkpointing=True,
    max_length=CONFIG["max_seq_length"],
)

# Initialize trainer
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    processing_class=tokenizer,
    args=training_args,
)

# Estimate
steps = len(dataset) // (CONFIG["batch_size"] * CONFIG["gradient_accumulation_steps"])
print("\n🚀 Training Plan:")
print(f"   Samples: {len(dataset):,}")
print(f"   Steps: ~{steps:,}")
print("   Estimated time: ~4-6 hours")


🚀 Training Plan:
   Samples: 570,699
   Steps: ~17,834
   Estimated time: ~4-6 hours
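
The step estimate and the time budget can be cross-checked with a little arithmetic. This is a rough sketch: `training_plan` and `seconds_per_step` are hypothetical helpers, and the trainer may round step and warmup counts slightly differently.

```python
import math

def training_plan(num_samples: int, batch_size: int, grad_accum: int, warmup_ratio: float) -> dict:
    """Approximate optimizer steps and warmup steps for one epoch."""
    effective_batch = batch_size * grad_accum
    steps = num_samples // effective_batch           # same floor division as the notebook
    warmup_steps = math.ceil(steps * warmup_ratio)   # rounded up
    return {"effective_batch": effective_batch, "steps": steps, "warmup_steps": warmup_steps}

def seconds_per_step(hours: float, steps: int) -> float:
    """Per-step time required to finish `steps` within `hours`."""
    return hours * 3600 / steps

plan = training_plan(570_699, batch_size=4, grad_accum=8, warmup_ratio=0.03)
print(plan)
for hours in (4, 6):
    print(f"{hours} h budget -> {seconds_per_step(hours, plan['steps']):.2f} s per optimizer step")
```

Each optimizer step processes 32 sequences of up to 1,024 tokens, so the per-step figure is a useful check on whether the 4-6 hour estimate is realistic for the available GPU.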


In [24]:
# Train!
print("\n" + "="*60)
print("🚀 STARTING TRAINING")
print("="*60)

trainer.train()

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 1, 'pad_token_id': 1}.



🚀 STARTING TRAINING


OutOfMemoryError: CUDA out of memory. Tried to allocate 2.25 GiB. GPU 0 has a total capacity of 15.89 GiB of which 1.92 GiB is free. Process 4591 has 13.97 GiB memory in use. Of the allocated memory 9.53 GiB is allocated by PyTorch, and 4.14 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
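
One plausible contributor to this failure is the logits tensor, which has one entry per batch item, token position, and vocabulary entry. The sketch below is a rough estimate only: it assumes a vocabulary of 256,000 tokens for Gemma 2 and fp32 logits, and it ignores weights, optimizer state, and every other activation. `logits_gib` is a hypothetical helper.

```python
def logits_gib(batch_size: int, seq_len: int, vocab_size: int = 256_000,
               bytes_per_value: int = 4) -> float:
    """Size of one (batch, seq, vocab) logits tensor in GiB."""
    return batch_size * seq_len * vocab_size * bytes_per_value / 2**30

# max_seq_length is 1024 in CONFIG; compare a few per-device batch sizes.
for bs in (4, 2, 1):
    print(f"batch_size={bs}: {logits_gib(bs, 1024):.2f} GiB")
```

Lowering `batch_size` while raising `gradient_accumulation_steps` by the same factor keeps the effective batch at 32 and shrinks per-step activations. Reducing `max_seq_length` has a similar effect.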

In [None]:
# Save model
print("\n💾 Saving model...")
trainer.save_model()
tokenizer.save_pretrained(CONFIG["output_dir"])
print(f"✅ Model saved to {CONFIG['output_dir']}")

## 9. Test Model

In [None]:
def generate_response(question, max_tokens=400):
    """Generate a response with reasoning."""
    prompt = create_prompt(question)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    model.eval()
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    
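    # Decode only the newly generated tokens. With skip_special_tokens=True, markers
    # such as <start_of_turn> are stripped, so splitting on them would not remove the prompt.
    prompt_len = inputs["input_ids"].shape[1]
    response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
    return response.strip()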

In [None]:
# Test questions
test_questions = [
    "What is 125 + 347?",
    "Solve: 2x + 5 = 13",
    "A train travels 240 km in 4 hours. What is its speed?",
    "What is the probability of rolling a 6 on a fair die?",
    "What are the symptoms of diabetes?",
]

print("="*60)
print("🧪 MODEL EVALUATION")
print("="*60)

for q in test_questions:
    print(f"\n📝 Question: {q}")
    print(f"🤖 Response:\n{generate_response(q)}")
    print("-"*60)

## 10. Summary

### Training Complete! ✅

**Model:** Gemma 2 2B IT (`google/gemma-2-2b-it`) fine-tuned with LoRA

**Output Format:**
```
<reasoning>step-by-step thinking</reasoning>
<answer>final answer</answer>
```

**Datasets Used:**
- GSM8K - Math word problems
- OpenThoughts - R1 distilled reasoning
- Bespoke-Stratos - High quality reasoning
- Medical-O1 - Medical reasoning
- MetaMathQA - Augmented math

**Hyperparameters:**
- LoRA rank: 32, alpha: 64
- Learning rate: 2e-4
- Batch size: 4 × 8 = 32 effective
- 1 epoch over ~250k samples (planned; the run above prepared 570,699 because no dataset limits were set)