# Alexthia: My Custom LLM for Reasoning

**What I'm Building:** I'm fine-tuning Qwen 2.5 7B (7 billion parameters) to create my own reasoning model as part of the Axiom Stack.

**My Approach:** I'm using QLoRA to efficiently fine-tune on ~7,300 examples from the MATH-Hard, GSM8K, and SciQ datasets.

**Hardware:** Optimized for Kaggle P100/T4 GPUs (16GB VRAM)

**Training Time:** ~6-8 hours on P100, ~10-12 hours on T4

---

## Setup Checklist
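1. Enable GPU: Settings (right sidebar) → Accelerator → GPU T4 x2
2. Enable Internet: Settings → Internet → On
3. Run the cells in order. The restart cell in Section 1 shuts down the kernel, so "Run All" will not carry through it; after the restart, continue from Section 2.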

## 1. Install Dependencies

In [None]:
!pip install transformers==4.45.2 datasets accelerate peft bitsandbytes trl==0.8.6 psutil

In [None]:
!rm -rf /kaggle/working/unsloth_compiled_cache

In [None]:
# Restart the kernel to ensure clean imports
import IPython
IPython.Application.instance().kernel.do_shutdown(True)

## 2. Verify GPU

In [None]:
!nvidia-smi

## 3. Load Qwen 2.5 7B with 4-bit Quantization

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

print(" Model and tokenizer loaded")
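As a rough sanity check on why a 7B model fits in 16 GB of VRAM, here is a minimal sketch that estimates weight storage for NF4 with and without double quantization (`bnb_4bit_use_double_quant=True` above). It uses the per-block accounting described in the QLoRA paper (64-weight blocks with a 32-bit absmax each; with double quantization the absmax values are stored in 8 bits plus one 32-bit constant per 256 blocks). The ~7.6B parameter count is an assumption, and the sketch treats every weight as quantized, which overstates the savings because bitsandbytes typically leaves layers such as the token embeddings and output head in 16-bit.

```python
def nf4_bits_per_param(double_quant: bool, block_size: int = 64, dq_block_size: int = 256) -> float:
    """Average storage bits per weight for 4-bit NF4, QLoRA-style accounting."""
    base_bits = 4.0
    if not double_quant:
        # One fp32 absmax constant per block of `block_size` weights.
        return base_bits + 32 / block_size
    # Absmax constants quantized to 8 bits, plus one fp32 constant
    # per `dq_block_size` first-level blocks.
    return base_bits + 8 / block_size + 32 / (block_size * dq_block_size)


def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Convert a parameter count and bit width to gigabytes (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7.6e9  # assumption: approximate size of Qwen2.5-7B

for label, bits in [
    ("fp16", 16.0),
    ("nf4", nf4_bits_per_param(double_quant=False)),
    ("nf4 + double quant", nf4_bits_per_param(double_quant=True)),
]:
    print(f"{label:>20}: {bits:.3f} bits/param -> {weight_gb(N_PARAMS, bits):.1f} GB")
```

Activations, LoRA weights, optimizer state, and CUDA overhead come on top of these numbers, so they are a lower bound on real usage.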

## 4. Add LoRA Adapters

In [None]:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# CRITICAL: Prepare model for k-bit training FIRST
model = prepare_model_for_kbit_training(model)

# Configure LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,  
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Apply LoRA
model = get_peft_model(model, lora_config)

# Verify setup
model.print_trainable_parameters()

# CRITICAL: Ensure model is in training mode
model.train()

print(" Model ready on", next(model.parameters()).device)
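The trainable-parameter count printed above can be reproduced by hand: LoRA adds two matrices per targeted linear layer, A (r × d_in) and B (d_out × r), so each layer contributes r · (d_in + d_out) weights, and the update is scaled by lora_alpha / r. The sketch below assumes Qwen2.5-7B's published dimensions (hidden size 3584, MLP size 18944, 28 layers, 28 query heads, 4 key/value heads); check them against `model.config` before relying on the result. If the assumptions hold, the total should match the trainable count from `print_trainable_parameters()`.

```python
# Assumed Qwen2.5-7B dimensions (verify against model.config).
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
NUM_HEADS = 28
NUM_KV_HEADS = 4
HEAD_DIM = HIDDEN // NUM_HEADS       # 128
KV_DIM = NUM_KV_HEADS * HEAD_DIM     # 512, smaller because of grouped-query attention

# (in_features, out_features) for each module in target_modules
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """A is r x d_in and B is d_out x r, so LoRA adds r * (d_in + d_out) weights."""
    return r * (d_in + d_out)


def total_lora_params(r: int) -> int:
    """Sum the adapter sizes over all target modules and decoder layers."""
    per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


r, lora_alpha = 16, 32
print(f"trainable LoRA params: {total_lora_params(r):,}")
print(f"adapter scaling alpha/r: {lora_alpha / r}")
```

Most of the adapter budget goes to the three MLP projections, because their 18944-wide side dominates r · (d_in + d_out).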

## 5. Load Datasets (MATH-Hard, GSM8K, SciQ)

In [None]:
from datasets import load_dataset, concatenate_datasets
print(" Loading datasets...")
# MATH-Hard (2,304 examples)
math_dataset = load_dataset("lighteval/MATH-Hard", split="train")
math_dataset = math_dataset.rename_column("solution", "answer")
math_dataset = math_dataset.rename_column("problem", "question")
print(f"✓ MATH-Hard: {len(math_dataset)} examples")
# GSM8K (3,000 examples)
gsm8k_dataset = load_dataset("openai/gsm8k", "main", split="train").select(range(3000))
# Already has 'question' and 'answer' columns
print(f"✓ GSM8K: {len(gsm8k_dataset)} examples")
# SciQ (2,000 examples)
sciq_dataset = load_dataset("allenai/sciq", split="train").select(range(2000))
# Build question from support + question, answer is correct_answer
def format_sciq(example):
    question = f"{example['support']} {example['question']}" if example['support'] else example['question']
    return {"question": question, "answer": example['correct_answer']}
sciq_dataset = sciq_dataset.map(format_sciq)
print(f"✓ SciQ: {len(sciq_dataset)} examples")
# Keep only 'question' and 'answer' columns
math_dataset = math_dataset.select_columns(['question', 'answer'])
gsm8k_dataset = gsm8k_dataset.select_columns(['question', 'answer'])
sciq_dataset = sciq_dataset.select_columns(['question', 'answer'])
# Combine all datasets
combined_dataset = concatenate_datasets([math_dataset, gsm8k_dataset, sciq_dataset])
print(f"\n Total: {len(combined_dataset)} examples")
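GSM8K answers are kept verbatim in the combined dataset. They end with a `#### <answer>` line and contain inline calculator traces like `<<48/2=24>>`, so the fine-tuned model will likely learn to imitate both. The helpers below are a minimal sketch (hypothetical names, not part of any library) for pulling out the final answer and, if preferred, stripping the calculator traces before training.

```python
import re
from typing import Optional


def gsm8k_final_answer(answer: str) -> Optional[str]:
    """Return the text after the last '####' marker, without thousands separators."""
    if "####" not in answer:
        return None
    return answer.rsplit("####", 1)[1].strip().replace(",", "")


def strip_calculator_annotations(answer: str) -> str:
    """Remove GSM8K's inline calculator traces such as '<<48/2=24>>'."""
    return re.sub(r"<<[^>]*>>", "", answer)


sample = (
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether.\n"
    "#### 72"
)
print(gsm8k_final_answer(sample))
print(strip_calculator_annotations(sample))
```

Whether to strip the annotations is a design choice: keeping them matches the original dataset, while removing them yields cleaner natural-language solutions.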

## 6. Format Dataset with a Prompt Template

In [None]:

print(" Formatting datasets for training...")

alpaca_prompt = """Below is a math problem. Write a solution that appropriately solves the problem.
### Problem:
{}
### Solution:
{}"""

EOS_TOKEN = tokenizer.eos_token

def format_and_tokenize(examples):
    # Format as text
    texts = []
    for q, a in zip(examples["question"], examples["answer"]):
        texts.append(alpaca_prompt.format(q, a) + EOS_TOKEN)
    
    # Tokenize WITHOUT padding (dynamic padding later)
    tokenized = tokenizer(
        texts,
        truncation=True,
        max_length=2048,
        padding=False, 
    )
    tokenized["labels"] = tokenized["input_ids"].copy()
    return tokenized

tokenized_dataset = combined_dataset.map(
    format_and_tokenize,
    batched=True,
    remove_columns=combined_dataset.column_names,
)

print(f" Tokenized {len(tokenized_dataset)} examples")

# Check token length distribution
lengths = [len(ex) for ex in tokenized_dataset["input_ids"]]
print(f" Token length stats:")
print(f"   Min: {min(lengths)}, Max: {max(lengths)}, Avg: {sum(lengths)//len(lengths)}")
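The formatting cell tokenizes without padding and copies `input_ids` into `labels`, leaving padding to the collator at batch time. This pure-Python sketch shows what per-batch right-padding should do: pad `input_ids` with the pad id, `attention_mask` with 0, and `labels` with -100 (the value PyTorch's cross-entropy ignores by default). Because this notebook sets `pad_token = eos_token`, deriving labels by masking every position equal to the pad id (which is what `DataCollatorForLanguageModeling(mlm=False)` does) also hides the real EOS target. The token ids are made up for illustration.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy by default


def pad_batch(features: List[Dict[str, List[int]]], pad_token_id: int) -> Dict[str, List[List[int]]]:
    """Right-pad a batch; labels are padded positionally with IGNORE_INDEX."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch: Dict[str, List[List[int]]] = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_pad)
        batch["labels"].append(f["labels"] + [IGNORE_INDEX] * n_pad)
    return batch


def mask_by_pad_id(input_ids: List[List[int]], pad_token_id: int) -> List[List[int]]:
    """Alternative: derive labels by masking every position equal to the pad id."""
    return [[IGNORE_INDEX if t == pad_token_id else t for t in row] for row in input_ids]


EOS = PAD = 0  # hypothetical ids; pad and EOS share one id, as in this notebook
features = [
    {"input_ids": [11, 12, 13, EOS], "labels": [11, 12, 13, EOS]},
    {"input_ids": [21, 22, EOS], "labels": [21, 22, EOS]},
]
batch = pad_batch(features, pad_token_id=PAD)
print(batch["labels"])                          # EOS targets kept
print(mask_by_pad_id(batch["input_ids"], PAD))  # EOS targets masked out
```

If the EOS target never contributes to the loss, the model has little signal for when to stop, which tends to show up as run-on generations.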

## 7. Configure Training

In [None]:
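from transformers import Trainer, TrainingArguments, DataCollatorForSeq2Seq
import torch

# Training arguments. gradient_checkpointing is deliberately not set here:
# prepare_model_for_kbit_training() in Section 4 already enabled it on the model
# (its use_gradient_checkpointing argument defaults to True).
training_args = TrainingArguments(
    output_dir="./outputs",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    save_strategy="epoch",
    warmup_ratio=0.03,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    remove_unused_columns=False,
    seed=3407,
    max_grad_norm=0.3,
    report_to="none",
)

# Data collator: pads input_ids/attention_mask per batch and pads the precomputed
# labels with -100. DataCollatorForLanguageModeling(mlm=False) would instead rebuild
# labels from input_ids and set every pad_token_id position to -100; because
# pad_token == eos_token here, that would also remove the EOS target from the loss.
data_collator = DataCollatorForSeq2Seq(
    tokenizer=tokenizer,
    padding=True,
    label_pad_token_id=-100,
)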

# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

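eff_batch = training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps
approx_steps = len(tokenized_dataset) // eff_batch * int(training_args.num_train_epochs)

print(f"\n{'='*60}")
print("Training Configuration:")
print(f"   • Total examples: {len(tokenized_dataset):,}")
print(f"   • Batch size per device: {training_args.per_device_train_batch_size}")
print(f"   • Gradient accumulation: {training_args.gradient_accumulation_steps}")
print(f"   • Effective batch size: {eff_batch}")
print(f"   • Epochs: {training_args.num_train_epochs}")
print(f"   • Total optimizer steps (approx.): {approx_steps:,}")
print(f"   • Learning rate: {training_args.learning_rate}")
print("   • Memory optimizations: 4-bit NF4 quantization + LoRA (QLoRA), gradient checkpointing")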
print(f"{'='*60}\n")
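The step count printed above, together with `warmup_ratio=0.03` and `lr_scheduler_type="linear"`, determines the learning-rate schedule. The sketch below approximates how the schedule is sized (one optimizer update per `gradient_accumulation_steps` batches, warmup rounded up) and reproduces the shape of a linear warmup-then-decay schedule. Treat the rounding as an assumption: Trainer's exact step accounting can differ slightly between transformers versions. The example size of 7,304 comes from the per-dataset counts in the Section 5 comments.

```python
import math


def optimizer_steps(n_examples: int, per_device_batch: int, grad_accum: int,
                    epochs: int, n_devices: int = 1) -> int:
    """Approximate optimizer steps: one update per grad_accum batches, per epoch."""
    batches_per_epoch = math.ceil(n_examples / (per_device_batch * n_devices))
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Warmup length derived from a ratio, rounded up."""
    return math.ceil(total_steps * warmup_ratio)


def linear_lr(step: int, total: int, warmup: int, peak_lr: float) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


n_examples = 2304 + 3000 + 2000  # counts from the dataset comments in Section 5
total = optimizer_steps(n_examples, per_device_batch=1, grad_accum=8, epochs=3)
warm = warmup_steps(total, 0.03)
for step in (0, warm // 2, warm, total // 2, total):
    print(f"step {step:>5}: lr = {linear_lr(step, total, warm, 2e-4):.2e}")
```

With only ~3% warmup, the learning rate reaches its 2e-4 peak within the first hundred or so updates and then decays for the rest of training.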



## 8. Start Training 

**Expected time:**
- P100: ~6-8 hours
- T4: ~10-12 hours



In [None]:
import time

print("Starting training...")
print(f"Start time: {time.strftime('%Y-%m-%d %H:%M:%S')}")
print("Estimated time: ~10-12 hours on T4, ~6-8 hours on P100")
print("\n" + "="*60)

# Train the model
trainer_stats = trainer.train()

print("\n" + "="*60)
print(" Training complete!")
print(f" End time: {time.strftime('%Y-%m-%d %H:%M:%S')}")
print(f" Final loss: {trainer_stats.training_loss:.4f}")
print(f"  Total time: {trainer_stats.metrics['train_runtime']/3600:.2f} hours")
print("="*60)


## 9. Test the Model

In [None]:
model.eval()

# Build the test prompt with the same template used in training (empty solution slot)
test_problem = alpaca_prompt.format(
    "If $x^2 + 2x - 15 = 0$, what are the possible values of $x$?", ""
)

inputs = tokenizer(test_problem, return_tensors="pt").to(model.device)

print("Test Problem:")
print(test_problem)
print("\nAlexthia's Solution:")
print("="*50)

# do_sample=True is required; without it, generate() ignores temperature/top_p
# and decodes greedily
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    use_cache=True,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response[len(test_problem):])
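The `temperature=0.7, top_p=0.9` settings only take effect when sampling is enabled. This stdlib sketch illustrates the idea behind them: temperature rescales the logits before the softmax (values below 1 sharpen the distribution), and top-p (nucleus) filtering keeps the smallest set of most likely tokens whose probability mass reaches `top_p`, renormalizes, and samples from that set. It is an illustration of the technique, not a copy of the transformers implementation, whose edge-case handling may differ; the logits are made up.

```python
import math
import random
from typing import Dict, List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Divide logits by temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: List[float], top_p: float) -> Dict[int, float]:
    """Keep the smallest high-probability set whose mass reaches top_p, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(logits: List[float], rng: random.Random,
                 temperature: float = 0.7, top_p: float = 0.9) -> int:
    """Temperature + nucleus sampling over a single step's logits."""
    probs = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]


logits = [2.0, 1.0, 0.1, -1.0]  # hypothetical next-token logits
rng = random.Random(0)
print([sample_token(logits, rng) for _ in range(10)])
```

For checking correctness rather than variety, greedy decoding (`do_sample=False`) is usually the better choice because it is deterministic.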

## 10. Save the Model

In [None]:
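# On a PEFT model, save_pretrained() writes only the LoRA adapter weights and config,
# not the full base model
model.save_pretrained("alexthia-v0.5")
tokenizer.save_pretrained("alexthia-v0.5")
print("✓ LoRA adapters saved to alexthia-v0.5/")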

## 11. (Optional) Save Full Merged Model

In [None]:
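# Merge LoRA with the base model and save in 16-bit (warning: ~15 GB).
# Only run this if you need a standalone model.
# save_pretrained_merged() is an Unsloth method and does not exist on a plain PEFT model;
# with PEFT, reload the base model in float16 and fold the adapters in with merge_and_unload().

# from peft import PeftModel
# base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B", torch_dtype=torch.float16)
# merged = PeftModel.from_pretrained(base, "alexthia-v0.5").merge_and_unload()
# merged.save_pretrained("alexthia-qwen-7b-merged")
# tokenizer.save_pretrained("alexthia-qwen-7b-merged")
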
# print(" Merged model saved!")
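Merging works because a LoRA layer computes W·x + (alpha/r)·B·(A·x), which equals W'·x for the folded weight W' = W + (alpha/r)·B·A. This pure-Python sketch checks that identity on small random matrices. B is random here rather than zero-initialized as in a fresh adapter, so the check is non-trivial; the shapes, seed, and alpha/r values are illustrative.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (rows x inner) @ (inner x cols) matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a: Matrix, b: Matrix, scale: float = 1.0) -> Matrix:
    """Elementwise a + scale * b."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_out, d_in, r, alpha = 6, 5, 2, 4
W = rand_matrix(d_out, d_in, rng)  # frozen base weight (d_out x d_in)
A = rand_matrix(r, d_in, rng)      # LoRA down-projection (r x d_in)
B = rand_matrix(d_out, r, rng)     # LoRA up-projection (d_out x r)
x = rand_matrix(d_in, 1, rng)      # one input column vector
scale = alpha / r

# Unmerged: base path plus the scaled low-rank path
y_adapter = add(matmul(W, x), matmul(B, matmul(A, x)), scale)
# Merged: fold the update into the weight once, then a single matmul
W_merged = add(W, matmul(B, A), scale)
y_merged = matmul(W_merged, x)

max_diff = max(abs(p[0] - q[0]) for p, q in zip(y_adapter, y_merged))
print(f"max |difference| = {max_diff:.2e}")
```

In full precision the two paths agree up to floating-point rounding; the merged model simply has no separate adapter to run at inference time.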

## 12. Evaluation & Benchmarking

In [None]:
# Load test set
test_dataset = load_dataset("lighteval/MATH", split="test").shuffle(seed=42).select(range(100))

print("Running evaluation on 100 test problems...")
total = 0

for example in test_dataset:
    problem = example["problem"]
    true_solution = example["solution"]

    # Generate a solution with greedy (deterministic) decoding
    prompt = alpaca_prompt.format(problem, "")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Decode only the newly generated tokens, not the prompt
    generated = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

    # No answer check is done here; scoring needs a MATH answer parser
    total += 1

    if total % 10 == 0:
        print(f"Progress: {total}/100")

print("\nEvaluation complete!")
print(f"Sample size: {total} problems")
print("\n For detailed accuracy, use a proper MATH benchmark evaluator")
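MATH reference solutions put the final answer inside `\boxed{...}`, so a first-pass accuracy number can come from comparing the last boxed expression in each generated solution with the one in the dataset's `solution` field. This is a minimal sketch with hypothetical helper names: it handles nested braces but only normalizes whitespace, `$`, and `\left`/`\right`, so mathematically equivalent forms such as `0.5` and `\frac{1}{2}` count as mismatches and the score will undercount.

```python
from typing import Optional


def last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
        i += 1
    return None  # unbalanced braces


def normalize(ans: str) -> str:
    """Very light normalization: drop '\\left', '\\right', '$', and spaces."""
    for tok in ("\\left", "\\right", "$", " "):
        ans = ans.replace(tok, "")
    return ans


def is_correct(generated: str, reference_solution: str) -> bool:
    """Exact match of the last boxed answers after light normalization."""
    pred, gold = last_boxed(generated), last_boxed(reference_solution)
    return pred is not None and gold is not None and normalize(pred) == normalize(gold)


print(is_correct(r"So $x = \boxed{3, -5}$.", r"The roots are $\boxed{3,-5}$."))
```

Inside the evaluation loop, accumulating `is_correct(generated, true_solution)` gives a rough accuracy; a dedicated MATH evaluator with symbolic equivalence checking remains the more reliable option.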

## 13. Export for Download

In [None]:

import shutil
import os

print(" Packaging model for download...")

# Adapter directory saved in Section 10, and the archive base name
model_dir = "alexthia-v0.5"
zip_name = "alexthia-v0.5"

# Zip the model
shutil.make_archive(zip_name, 'zip', model_dir)

# Get file size
zip_size = os.path.getsize(f"{zip_name}.zip") / (1024 * 1024)  # MB

print(f" Model packaged!")
print(f" File: {zip_name}.zip")
print(f" Size: {zip_size:.1f} MB")
print(f"\n To download:")
print(f"   1. Click 'Output' tab (right sidebar)")
print(f"   2. Find '{zip_name}.zip'")
print(f"   3. Click download icon")
print(f"\n Ready to deploy Alexthia v0.5!")

---

## 🎉 Summary

I've successfully fine-tuned Qwen 2.5 7B on math and science reasoning data (MATH-Hard, GSM8K, and SciQ).

### Next Steps:
1. Download the model weights (LoRA adapters)
2. Test on my own game theory problems
3. Integrate with Flux/tenet verifier (Phase 2)


### Pitch:
> "I fine-tuned Qwen 2.5 7B (7 billion parameters) on math and science reasoning tasks using QLoRA, demonstrating my ability to work with state-of-the-art LLMs. This serves as the foundation for Alexthia, which will integrate formal verification through my languages to reduce hallucination rates."

---

**Built by:** Fawaz  
**Project:** Alexthia (Axiom Stack)  
**Date:** December 2025