# 🧠 LMFast: Advanced Reasoning Agents

**Unlock "System 2" thinking in Small Language Models!**

## What You'll Learn
- Chain-of-Thought (CoT) prompting
- Test-Time Compute Scaling (Best-of-N)
- Self-Verification loops
- Solving math problems with a 135M-parameter model

## The Concept
By giving the model more time to "think" (generate multiple reasoning paths) and verify its own answers, we can significantly boost performance on logic and math tasks without retraining.
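
A back-of-the-envelope sketch of why extra samples can help, assuming each sample is independently correct with probability `p` (a simplification; real samples are correlated). The helper name `coverage` is illustrative and not part of LMFast. It gives the chance that at least one of `n` samples is correct, which is an upper bound on Best-of-N, since the selector still has to pick the right candidate.

```python
def coverage(p: float, n: int) -> float:
    """Probability that at least one of n independent samples is correct."""
    return 1.0 - (1.0 - p) ** n


# Assumed single-sample accuracy of 30%, chosen only for illustration.
for n in (1, 5, 10):
    print(f"N={n:>2}: {coverage(0.3, n):.3f}")
# N= 1: 0.300
# N= 5: 0.832
# N=10: 0.972
```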

**Time to complete:** ~15 minutes

## 1️⃣ Setup

In [None]:
!pip install -q lmfast[all]

import lmfast
lmfast.setup_colab_env()

import torch
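# get_device_name raises if no CUDA device is attached, so check first
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU detected; generation will be slow on CPU.")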

## 2️⃣ Load a Reasoner Model

For reasoning, instruction tuning is key. We'll use SmolLM-135M-Instruct.

In [None]:
from lmfast.inference import SLMServer

model = SLMServer("HuggingFaceTB/SmolLM-135M-Instruct")

# Wrapper for simple string-in/string-out
def generate_fn(prompt):
    return model.generate(prompt, max_new_tokens=256, temperature=0.7)

## 3️⃣ Chain-of-Thought (CoT)

Compare a direct-answer prompt with one that asks the model to reason step by step before answering.

In [None]:
problem = "If I have 3 apples, eat 1, and buy 5 more, how many do I have?"

print("🔴 Standard Generation:")
print(generate_fn(f"Question: {problem}\nAnswer:"))

print("\n🟢 Chain of Thought:")
# Put the cue after "Answer:" so the model continues with its reasoning
prompt_cot = f"Question: {problem}\nAnswer: Let's think step by step."
print(generate_fn(prompt_cot))
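
CoT output buries the answer inside the reasoning, so it helps to pull out the final number before comparing results. Below is a minimal sketch. `extract_final_number` is a hypothetical helper, not an LMFast function, and it assumes the answer is the last number the model writes.

```python
import re
from typing import Optional


def extract_final_number(text: str) -> Optional[str]:
    """Return the last integer or decimal in the text, or None if there is none."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else None


# Hand-written example of a CoT-style response (not real model output).
sample = "Start with 3 apples. Eat 1, leaving 2. Buy 5 more: 2 + 5 = 7."
print(extract_final_number(sample))  # 7
```

For the apple problem the expected result is 3 - 1 + 5 = 7.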

## 4️⃣ Using ThinkingAgent

LMFast's `ThinkingAgent` automates these reasoning strategies, so you don't have to write the prompts and sampling loops by hand.

In [None]:
from lmfast.reasoning import ThinkingAgent

# Create agent
agent = ThinkingAgent(generate_fn, n=5)  # n=5 candidates

hard_problem = """
A train leaves New York for Boston at 60 mph. At the same time, another leaves Boston for New York at 50 mph.
The distance between the cities is 220 miles. When they meet, how far is the NY train from New York?
"""

print("🧠 Best-of-N Reasoning (Sampling 5 paths)...")
answer = agent.reason(hard_problem, method="best_of_n")
print(f"\nAnswer: {answer}")
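
The source does not show how `best_of_n` picks a winner. One common selection rule is majority voting over the final answers (often called self-consistency), sketched below as an assumption rather than LMFast's actual implementation. `majority_vote`, `last_number`, and `fake_generate` are hypothetical names, and the stub replaces a real model so the example runs on its own.

```python
import re
from collections import Counter
from typing import Callable, Optional


def last_number(text: str) -> Optional[str]:
    """Return the last number in the text, or None if there is none."""
    found = re.findall(r"-?\d+(?:\.\d+)?", text)
    return found[-1] if found else None


def majority_vote(generate: Callable[[str], str], prompt: str, n: int = 5) -> Optional[str]:
    """Sample n responses and return the most frequent final number."""
    answers = [last_number(generate(prompt)) for _ in range(n)]
    answers = [a for a in answers if a is not None]  # drop responses with no number
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Stub standing in for a sampled model: five canned reasoning paths.
_canned = iter([
    "They meet after 2 hours, so 120",
    "The answer is 110",
    "60 * 2 = 120",
    "120",
    "About 100",
])


def fake_generate(prompt: str) -> str:
    return next(_canned)


print(majority_vote(fake_generate, "train problem", n=5))  # 120
```

Under the usual reading of the train problem (both trains leave at the same time and travel toward each other), the closing speed is 60 + 50 = 110 mph, they meet after 220 / 110 = 2 hours, and the New York train has covered 60 × 2 = 120 miles.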

## 5️⃣ Self-Verification

The agent generates an answer, then checks its work.

In [None]:
print("🔍 Self-Verification Mode...")
# Note: 135M models struggle with verification; models with 1B+ parameters tend to do much better

verified_answer = agent.reason(hard_problem, method="self_verify")
print(f"\nVerified Answer: {verified_answer}")
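
A generate-then-check loop can be sketched as follows. This is one plausible shape for such a loop, not the internals of `method="self_verify"`, which the source does not show. The prompts, the YES/NO convention, and the names `self_verify` and `scripted_generate` are all assumptions made for illustration.

```python
from typing import Callable


def self_verify(generate: Callable[[str], str], problem: str, max_rounds: int = 3) -> str:
    """Answer, ask the model to check the answer, and retry if it says NO."""
    answer = generate(f"Question: {problem}\nAnswer: Let's think step by step.")
    for _ in range(max_rounds):
        verdict = generate(
            f"Question: {problem}\nProposed answer: {answer}\n"
            "Is the proposed answer correct? Reply YES or NO, then explain."
        )
        if verdict.strip().upper().startswith("YES"):
            break  # the model accepted its own answer
        answer = generate(
            f"Question: {problem}\nA previous attempt was judged wrong:\n{answer}\n"
            "Try again, step by step.\nAnswer:"
        )
    # If max_rounds is exhausted, the last answer is returned unverified.
    return answer


# Scripted stub standing in for the model: wrong answer, rejection, fix, acceptance.
_script = iter(["110 miles", "NO, check the meeting time.", "120 miles", "YES"])


def scripted_generate(prompt: str) -> str:
    return next(_script)


print(self_verify(scripted_generate, "train problem"))  # 120 miles
```

The loop is only as good as the verifier: a model that accepts its own wrong answers gains nothing from the extra calls.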

## 6️⃣ One-Line Reasoning API

For a quick one-off call, you don't need to instantiate any classes.

In [None]:
from lmfast import reason

quick_ans = reason(
    model_fn=generate_fn,
    problem="What is 15% of 80?",
    method="adaptive"  # Automatically chooses strategy based on difficulty
)

print(f"Answer (expected 12): {quick_ans}")
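
How `method="adaptive"` estimates difficulty is not documented here. One simple way to spend compute adaptively is to use disagreement between a couple of cheap samples as a difficulty signal and escalate to a larger vote only when they disagree. The sketch below illustrates that idea under those assumptions. `adaptive_answer`, `probe`, and `budget` are hypothetical names, and the iterators stand in for sampled model answers.

```python
from collections import Counter
from typing import Callable


def adaptive_answer(sample_answer: Callable[[], str], probe: int = 2, budget: int = 8) -> str:
    """Draw a few samples; escalate to a majority vote only if they disagree."""
    votes = [sample_answer() for _ in range(probe)]
    # Agreement among the probe samples is treated as a sign of an easy problem.
    if len(set(votes)) == 1:
        return votes[0]
    # Otherwise spend the rest of the budget and take the most common answer.
    votes += [sample_answer() for _ in range(budget - probe)]
    return Counter(votes).most_common(1)[0][0]


# Canned answers standing in for model samples on "What is 15% of 80?".
easy = iter(["12", "12"])
hard = iter(["10", "12", "12", "12", "15", "12", "12", "11"])

print(adaptive_answer(lambda: next(easy)))  # 12, after 2 samples
print(adaptive_answer(lambda: next(hard)))  # 12, after all 8 samples
```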

## 7️⃣ Benchmarking Scale

Let's see whether spending more compute (a larger N) improves accuracy.

In [None]:
import re

math_questions = [
    ("23 + 45", "68"),
    ("12 * 8", "96"),
    ("100 / 4", "25")
]

def evaluate(n_samples):
    correct = 0
    print(f"\nTesting with N={n_samples}...")
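    for q, a in math_questions:
        resp = reason(generate_fn, q, n=n_samples, method="best_of_n")
        # Match the answer as a standalone number so "68" is not counted inside "168" or "2.68"
        if re.search(rf"(?<![\d.]){re.escape(a)}(?!\d)", str(resp)):
            correct += 1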
    return correct / len(math_questions)

print(f"Accuracy (N=1): {evaluate(1):.1%}")
print(f"Accuracy (N=5): {evaluate(5):.1%}")
print("Compare the two accuracies. With only 3 questions and random sampling, results will vary between runs.")
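
Because a single run over three questions is noisy, repeating the evaluation and reporting a mean and spread gives a fairer comparison. The sketch below shows the pattern. `repeat_eval` is a hypothetical helper, and `simulated_eval` is a random stand-in for `evaluate(n)` so the example runs without a model. Its 60% per-question accuracy is made up for illustration.

```python
import random
import statistics
from typing import Callable, Tuple


def repeat_eval(run_once: Callable[[], float], trials: int = 10) -> Tuple[float, float]:
    """Run an evaluation several times; return (mean, sample standard deviation)."""
    scores = [run_once() for _ in range(trials)]  # trials must be at least 2 for stdev
    return statistics.mean(scores), statistics.stdev(scores)


rng = random.Random(0)  # fixed seed so the simulation is repeatable


def simulated_eval(p: float, n_questions: int = 3) -> float:
    """Stand-in for evaluate(n): each question is correct with probability p."""
    return sum(rng.random() < p for _ in range(n_questions)) / n_questions


mean, std = repeat_eval(lambda: simulated_eval(0.6), trials=20)
print(f"accuracy: {mean:.1%} ± {std:.1%}")
```

In the notebook, the same helper could wrap the real benchmark, for example `repeat_eval(lambda: evaluate(5))`.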

## 🎉 Summary

You've learned how to:
- ✅ Implement "System 2" thinking with SLMs
- ✅ Use `ThinkingAgent` for difficult tasks
- ✅ Improve accuracy by scaling test-time compute

### Tip
- For best results, use models trained on math/code (e.g. Qwen2.5-Math).

### Next Steps
- `12_rag_agents.ipynb`: Combine reasoning with external knowledge.