# Baseline Experiments: Single Model Evaluation

**Experiment:** Establish baseline performance for single-model strategies

**Date:** 2025-10-26

**Goals:**
- Test local MLX and Ollama models on seed tasks
- Measure baseline latency and quality
- Create reproducible eval pipeline
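For the latency goal, one timed call makes a noisy baseline. The first request to a local model can include one-off costs, such as loading weights into memory. The minimal sketch below assumes latencies are collected as plain floats (seconds) from repeated calls. It summarizes them with the mean, the median, and a nearest-rank 95th percentile. `summarize_latencies` is a hypothetical helper and is not part of `harness`.

```python
import math
import statistics
from typing import Sequence


def summarize_latencies(latencies_s: Sequence[float]) -> dict:
    """Summarize repeated per-call latencies (seconds) for one provider/model."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    # Nearest-rank 95th percentile: smallest sample with >= 95% of samples at or below it
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "n": len(ordered),
        "mean_s": statistics.fmean(ordered),
        "median_s": statistics.median(ordered),
        "p95_s": ordered[rank - 1],
        "max_s": ordered[-1],
    }
```

For example, `summarize_latencies([0.8, 1.1, 0.9, 2.4, 1.0])` reports a median of 1.0 s and a p95 of 2.4 s. A wide gap between the two quickly flags outliers such as a cold start.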

In [None]:
# 1. Add repository root to path to import harness
import sys
sys.path.append('../../../')  # Go up to repo root from notebooks/

# 2. Import harness core functions for LLM interaction
from harness import (
    llm_call,                    # Single LLM call wrapper
    single_model_strategy,       # Single-model strategy function
    ExperimentConfig,            # Configuration for experiment tracking
    ExperimentResult,            # Result logging structure
    get_tracker                  # Get experiment tracker instance
)
from harness.defaults import DEFAULT_MODEL, DEFAULT_PROVIDER

# 3. Import pandas for data analysis and matplotlib for visualization
import pandas as pd
import matplotlib.pyplot as plt

# 4. Set the active provider/model (later cells use these names)
PROVIDER = DEFAULT_PROVIDER
MODEL = DEFAULT_MODEL

# 5. Show current configuration
print("="*70)
print("🔧 NOTEBOOK CONFIGURATION")
print("="*70)
print(f"📍 Provider: {PROVIDER}")
print(f"🤖 Model: {MODEL or '(default for provider)'}")
print("="*70)
print("\n💡 TO CHANGE: Add a cell with:")
print("   PROVIDER = 'mlx'  # Options: 'mlx', 'ollama', 'anthropic', 'openai'")
print("   MODEL = 'your-model-name'  # Or None for default")
print("   Then use: llm_call(..., provider=PROVIDER, model=MODEL)")
print("="*70 + "\n")
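The harness provides `ExperimentConfig` for experiment tracking, but this notebook does not show its fields. One common way to make runs reproducible is to record every setting a run depends on in an immutable object and derive a stable ID from it. The sketch below uses a hypothetical `RunConfig` dataclass with illustrative field names and a SHA-256 fingerprint of its sorted JSON form. It is not the harness's own mechanism.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical record of the settings one baseline run depends on."""
    provider: str
    model: Optional[str]      # None = provider's default model
    task_set: str             # illustrative: name of the seed-task set
    n_repeats: int = 3        # illustrative: calls per task, for latency statistics


def config_fingerprint(cfg: RunConfig) -> str:
    """Short, stable hex ID derived from every field of the config."""
    # sort_keys=True makes the JSON text, and therefore the hash, deterministic
    payload = json.dumps(asdict(cfg), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
```

Two runs with identical settings get the same fingerprint, so results logged under it can be compared or cached. Changing any field, such as `model`, produces a new ID.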

## 1. Test Provider Connection

In [None]:
# 1. Make a simple test call to verify provider works
response = llm_call(
    'What is 2+2?',              # Simple arithmetic test
    provider=PROVIDER,           # Use configured provider
    model=MODEL                  # Use configured model
)

# 2. Print the response to verify it's working
print('✅ Provider working!')
print(f'Model used: {response.model}')
print(f'Response: {response.text}')
print(f'Latency: {response.latency_s:.2f}s')
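`response.latency_s` comes from the harness, and this notebook does not show how it is measured. To cross-check it, or to time a step the harness does not wrap, the minimal sketch below times any callable with `time.perf_counter`, a monotonic high-resolution clock. `timed_call` is a hypothetical helper. Its figure includes all client-side overhead, so it may differ slightly from `latency_s`.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution; unaffected by system clock changes
    result = fn(*args, **kwargs)
    elapsed_s = time.perf_counter() - start
    return result, elapsed_s
```

With the harness loaded, a call would look like `response, secs = timed_call(llm_call, 'What is 2+2?', provider=PROVIDER, model=MODEL)`.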

## ⚙️ Configure Provider (Optional)

By default, this notebook uses the auto-detected provider shown above.

**To use a different provider, run this cell, then re-run the connection test above:**

In [None]:
# Override the auto-detected provider/model for subsequent cells
PROVIDER = 'mlx'   # Options: 'mlx', 'ollama', 'anthropic', 'openai'
MODEL = None       # Or a model name string; None uses the provider's default
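The configuration printout lists four provider options: `'mlx'`, `'ollama'`, `'anthropic'`, and `'openai'`. A typo in a manual override would otherwise show up only as a failed call. A small resolver can validate the override up front and fall back to the detected defaults. This is a sketch under stated assumptions, and `resolve_provider` is hypothetical. It deliberately drops the default model when switching providers, because a model name for one backend generally is not valid for another.

```python
from typing import Optional, Tuple

# Provider options as listed in the notebook's configuration printout
SUPPORTED_PROVIDERS = ("mlx", "ollama", "anthropic", "openai")


def resolve_provider(
    provider: Optional[str],
    model: Optional[str],
    default_provider: str,
    default_model: Optional[str],
) -> Tuple[str, Optional[str]]:
    """Return the (provider, model) pair to use, validating any override."""
    if provider is None:
        # No provider override: keep the detected provider; an explicit model still wins
        return default_provider, model if model is not None else default_model
    name = provider.strip().lower()
    if name not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"unknown provider {provider!r}; expected one of {SUPPORTED_PROVIDERS}"
        )
    if model is None and name == default_provider:
        # Same backend as detected, so its default model still applies
        model = default_model
    # Switching backends with model=None leaves None so the provider uses its own default
    return name, model
```

For example, `PROVIDER, MODEL = resolve_provider('mlx', None, DEFAULT_PROVIDER, DEFAULT_MODEL)` selects MLX. Unless MLX is already the detected provider, `MODEL` becomes `None`, so MLX falls back to its own default model. Passing `None` as the provider keeps the detected configuration.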

In [None]:
# 1. Make a simple test call to Ollama to verify connection
response = llm_call(
    'What is 2+2?',              # Simple arithmetic test
    provider='ollama'            # Use Ollama local provider
)

# 2. Print the response to verify it's working
print(f'Response: {response.text}')

# 3. Print latency to measure performance
print(f'Latency: {response.latency_s:.2f}s')
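The goals cover both MLX and Ollama, so it can help to check every local backend in one pass rather than writing one cell per provider. The minimal sketch below assumes `llm_call` raises an exception when a provider is unreachable, which this notebook does not confirm. It records `ok` or the error for each provider, and one failure does not stop the sweep. `smoke_test` is a hypothetical helper. The provider call is passed in as an argument, so the sketch runs without the harness.

```python
from typing import Callable, Dict, Iterable


def smoke_test(
    call: Callable[[str, str], object],
    providers: Iterable[str],
    prompt: str = "What is 2+2?",
) -> Dict[str, str]:
    """Send one short prompt per provider and record 'ok' or the error."""
    status: Dict[str, str] = {}
    for provider in providers:
        try:
            call(prompt, provider)  # response content is ignored; only success matters
            status[provider] = "ok"
        except Exception as exc:  # keep going so one down provider doesn't hide the rest
            status[provider] = f"error: {type(exc).__name__}: {exc}"
    return status
```

With the harness loaded, the call would be `smoke_test(lambda prompt, provider: llm_call(prompt, provider=provider), ['mlx', 'ollama'])`.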