# Baseline Experiments: Single-Model Evaluation

**Experiment:** Establish baseline performance for single-model strategies

**Date:** 2025-10-26

**Goals:**
- Test local MLX and Ollama models on seed tasks
- Measure baseline latency and quality
- Create a reproducible eval pipeline

In [None]:
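# 1. Add the sibling ../code directory to the import path so `harness` can be imported
#    (the path is relative to the notebook's working directory)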
import sys
sys.path.append('../code')

# 2. Import harness core functions for LLM interaction
from harness import (
    llm_call,                    # Single LLM call wrapper
    single_model_strategy,       # Single-model strategy function
    ExperimentConfig,            # Configuration for experiment tracking
    ExperimentResult,            # Result logging structure
    get_tracker                  # Get experiment tracker instance
)

# 3. Import pandas for data analysis and matplotlib for visualization
import pandas as pd
import matplotlib.pyplot as plt

## 1. Test Ollama Connection

In [None]:
# 1. Make a simple test call to Ollama to verify connection
response = llm_call(
    'What is 2+2?',              # Simple arithmetic test
    provider='ollama'            # Use Ollama local provider
)

# 2. Print the response to verify it's working
print(f'Response: {response.text}')

# 3. Print latency to measure performance
print(f'Latency: {response.latency_s:.2f}s')
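
A single call gives only one latency sample, and the first call to a local model is often slower because the model may still be loading. As a rough sketch of how a baseline latency could be measured more robustly, the snippet below times several calls and reports the first, median, and maximum latency. The helpers `time_calls` and `summarize` and the stand-in `fake_call` are hypothetical names introduced for illustration; they are not part of `harness`.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(call: Callable[[str], str], prompt: str, n: int = 5) -> List[float]:
    """Call `call(prompt)` n times and return each wall-clock latency in seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        call(prompt)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Summarize latencies; the first sample is kept separate as a possible cold start."""
    return {
        "n": len(latencies),
        "first_s": latencies[0],
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }


# Hypothetical stand-in for a real model call, so the sketch runs without a server.
def fake_call(prompt: str) -> str:
    time.sleep(0.01)
    return "4"


summary = summarize(time_calls(fake_call, "What is 2+2?", n=3))
print(summary)
```

With the harness from this notebook, `fake_call` could presumably be swapped for something like `lambda p: llm_call(p, provider='ollama').text`, assuming `llm_call` behaves as in the cell above.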