# TinyLLM Demo

> **What if each neuron in a neural network were already intelligent?**

This notebook demonstrates TinyLLM - a system that treats small LLMs as intelligent neurons in a larger cognitive architecture.

**Requirements:**
- GPU runtime (T4 or better)
- ~6GB VRAM for qwen2.5:3b

**Note:** This notebook runs entirely in Colab - no local setup required!

## 1. Setup Environment

In [None]:
# Install Ollama
!curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama server in background
import subprocess
import time

subprocess.Popen(['ollama', 'serve'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
time.sleep(5)
print("Ollama server started!")
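
The fixed five-second sleep is only a guess at how long the server needs to come up. A more robust pattern is to poll until the server answers. The sketch below assumes the default Ollama address used later in this notebook (`http://localhost:11434/api/tags`); `wait_for_server` is a hypothetical helper, not part of Ollama or TinyLLM.

```python
import time
import urllib.request


def wait_for_server(url="http://localhost:11434/api/tags", timeout=30.0, interval=0.5):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds elapse.

    Hypothetical helper for illustration; returns True if the server responded.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, HTTP error, or timeout: not ready yet.
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_server()` in place of `time.sleep(5)`, and printing the success message only when it returns `True`, avoids reporting a server that never came up.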

In [None]:
# Pull the models we need
!ollama pull qwen2.5:0.5b  # Router (tiny, fast)
!ollama pull qwen2.5:3b     # Specialist
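
The comments above name two roles: a tiny router and a larger specialist. As a rough illustration of that division of labour (not TinyLLM's actual routing code), the sketch below uses plain Python callables in place of models. The keyword rules in `route`, the `SPECIALISTS` table, and the domain names are all invented for the example.

```python
from typing import Callable, Dict


def route(query: str) -> str:
    """Stand-in for the small router model: pick a domain label for a query."""
    lowered = query.lower()
    if any(word in lowered for word in ("python", "function", "code")):
        return "code"
    if any(ch.isdigit() for ch in lowered):
        return "math"
    return "general"


# Stand-ins for specialist models; a real system would call an LLM here.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "code": lambda q: f"[code specialist] {q}",
    "math": lambda q: f"[math specialist] {q}",
    "general": lambda q: f"[general specialist] {q}",
}


def answer(query: str) -> str:
    """Route the query, then hand it to the matching specialist."""
    return SPECIALISTS[route(query)](query)
```

The point of the split is cost: the router only has to emit a label, so it can be very small, while the more expensive specialist runs once per query.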

In [None]:
# Clone and install TinyLLM
!git clone https://github.com/ndjstn/tinyllm.git
%cd tinyllm
!pip install -e . -q

## 2. Verify Installation

In [None]:
# Check Ollama is running
!curl -s http://localhost:11434/api/tags | python -c "import sys,json; models=json.load(sys.stdin)['models']; print('Available models:', [m['name'] for m in models])"
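
The same check can be done in Python so that a missing model fails loudly before any query runs. This sketch assumes the `/api/tags` response has the shape the one-liner above relies on (a `models` list whose entries carry a `name`); `missing_models` is a hypothetical helper.

```python
import json


def missing_models(tags_json: str, required: list) -> list:
    """Return the entries of `required` absent from an /api/tags response body."""
    available = {m["name"] for m in json.loads(tags_json)["models"]}
    return [name for name in required if name not in available]


# Example with a hand-written response body (not real server output).
sample = '{"models": [{"name": "qwen2.5:0.5b"}]}'
print(missing_models(sample, ["qwen2.5:0.5b", "qwen2.5:3b"]))
```

An empty list means every required model has been pulled.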

In [None]:
# Test basic import
from tinyllm.core.builder import load_graph
from tinyllm.core.executor import Executor
from tinyllm.core.message import TaskPayload

print("TinyLLM imported successfully!")

## 3. Run Queries

In [None]:
import asyncio
from pathlib import Path

# Load the graph
graph = load_graph(Path("graphs/multi_domain.yaml"))
executor = Executor(graph)

async def query(text):
    """Run a query through TinyLLM."""
    task = TaskPayload(content=text)
    response = await executor.execute(task)
    return response

# Test query
response = await query("What is 15 + 27?")
print(f"Success: {response.success}")
content = response.content
print(f"Response: {content[:500]}{'...' if len(content) > 500 else ''}")
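
Local models can stall, so it can help to bound how long a query may take. The sketch below wraps any coroutine in `asyncio.wait_for`; `fake_query` is a stand-in for the `query` function above, and `with_timeout` is a hypothetical helper.

```python
import asyncio


async def with_timeout(coro, seconds: float, default=None):
    """Await `coro`, returning `default` if it takes longer than `seconds`."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        return default


async def fake_query(text: str, delay: float) -> str:
    """Stand-in for a model call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return f"echo: {text}"
```

In the notebook, `await with_timeout(query("What is 15 + 27?"), 60)` would return `None` instead of hanging; the 60-second budget is an arbitrary choice.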

In [None]:
# Try different query types
queries = [
    "Write a Python function to check if a number is prime",
    "What causes earthquakes?",
    "Calculate 15% of 240, then add 50",
]

for q in queries:
    print(f"\n{'='*60}")
    print(f"Query: {q}")
    print(f"{'='*60}")
    response = await query(q)
    print(f"Response ({len(response.content)} chars):")
    print(response.content[:800])

## 4. Run Benchmarks

In [None]:
# Run the benchmark suite
!python benchmarks/run_benchmarks.py

In [None]:
# Run adversarial tests to find weaknesses
!python benchmarks/adversarial_test.py

## 5. Visualize Results

In [None]:
# Generate visualizations
!python benchmarks/create_visuals.py

# Display the dashboard
from IPython.display import Image
Image(filename='benchmarks/results/performance_dashboard.png', width=800)

In [None]:
# Show adversarial test results
import json

with open('benchmarks/results/adversarial_test.json') as f:
    data = json.load(f)

print("Adversarial Test Summary")
print("="*40)
print(f"Overall Pass Rate: {data['summary']['pass_rate']:.1f}%")
print("\nBy Category:")
for cat, stats in data['summary']['by_category'].items():
    print(f"  {cat}: {stats['passed']}/{stats['total']} ({stats['rate']:.0f}%)")
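
For reference, a summary with the fields read above (`pass_rate`, plus `passed`, `total`, and `rate` per category) could be derived from per-test records roughly as follows. The record format, a list of dicts with `category` and `passed` keys, is an assumption for illustration and may not match what `adversarial_test.py` writes.

```python
from collections import defaultdict


def summarize(records):
    """Build a summary dict (rates in percent) from per-test records."""
    by_category = defaultdict(lambda: {"passed": 0, "total": 0})
    for rec in records:
        stats = by_category[rec["category"]]
        stats["total"] += 1
        stats["passed"] += int(rec["passed"])
    for stats in by_category.values():
        stats["rate"] = 100.0 * stats["passed"] / stats["total"]
    total = sum(s["total"] for s in by_category.values())
    passed = sum(s["passed"] for s in by_category.values())
    return {
        "pass_rate": 100.0 * passed / total if total else 0.0,
        "by_category": dict(by_category),
    }


# Hand-written records, not real benchmark output.
records = [
    {"category": "false_premise", "passed": False},
    {"category": "false_premise", "passed": True},
    {"category": "arithmetic", "passed": True},
]
summary = summarize(records)
```

Storing rates as percentages matches the `:.1f}%` and `:.0f}%` formatting used in the cell above.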

## 6. Interactive Chat

In [None]:
# Simple interactive chat
async def chat():
    """Interactive chat with TinyLLM."""
    print("TinyLLM Chat (type 'quit' to exit)")
    print("="*40)

    while True:
        try:
            user_input = input("You: ")
            if user_input.strip().lower() in ['quit', 'exit', 'q']:
                print("Goodbye!")
                break

            # Colab already runs an event loop, so await the coroutine directly;
            # run_until_complete() would raise "This event loop is already running".
            response = await query(user_input)
            print(f"TinyLLM: {response.content}\n")
        except KeyboardInterrupt:
            print("\nGoodbye!")
            break

# Uncomment to start chat:
# await chat()

## Resources

- **GitHub**: https://github.com/ndjstn/tinyllm
- **500 Task Roadmap**: [docs/TASK_ROADMAP.md](https://github.com/ndjstn/tinyllm/blob/master/docs/TASK_ROADMAP.md)
- **Benchmarks**: [benchmarks/README.md](https://github.com/ndjstn/tinyllm/blob/master/benchmarks/README.md)

### Key Findings

| Test | Pass Rate | Notes |
|------|-----------|-------|
| Standard | 100% | Basic queries work |
| Stress | 100% | Scales to extreme difficulty |
| Adversarial | ~52% | Weak on false premises; prone to hallucinations |

### Next Steps

1. Add chain-of-thought reasoning to improve adversarial performance
2. Implement self-morphing architecture
3. Add solution memory for learning