# Model Testing and Comparison

This notebook helps you test and compare different Ollama models to find the best one for your use case.

**What you'll learn:**
- How to test different models
- Performance comparison techniques
- Model capability assessment
- Choosing the right model for your task

In [None]:
# Setup and imports
import sys
sys.path.append('..')

import time
import requests
from langchain_ollama import ChatOllama
from langchain_core.messages import HumanMessage  # current home of the message classes
from config import config

print("🧪 Model Testing and Comparison")
print(f"Base URL: {config.ollama_base_url}")

In [None]:
# Get available models
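def get_available_models():
    """Return the names of the installed Ollama models (empty list on failure)."""
    try:
        # A timeout keeps the cell from hanging if the Ollama server is unreachable
        response = requests.get(f"{config.ollama_base_url}/api/tags", timeout=10)
        # Report non-200 responses instead of silently returning an empty list
        response.raise_for_status()
        data = response.json()
        return [model['name'] for model in data.get('models', [])]
    except Exception as e:
        print(f"Error getting models: {e}")
    return []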

available_models = get_available_models()
print(f"Found {len(available_models)} available models:")
for model in available_models:
    print(f"  - {model}")

In [None]:
# Test basic functionality of each model
test_prompt = "Explain what Python is in one sentence."

def test_model(model_name, prompt, temperature=0.7):
    """Test a specific model with a prompt."""
    try:
        llm = ChatOllama(
            model=model_name,
            base_url=config.ollama_base_url,
            temperature=temperature
        )
        
        # perf_counter is monotonic, so it suits interval timing better than time.time()
        start_time = time.perf_counter()
        response = llm.invoke([HumanMessage(content=prompt)])
        end_time = time.perf_counter()
        
        return {
            'model': model_name,
            'response': response.content,
            'response_time': end_time - start_time,
            'success': True
        }
    except Exception as e:
        return {
            'model': model_name,
            'error': str(e),
            'success': False
        }

# Test all available models
print(f"Testing all models with prompt: '{test_prompt}'")
print("=" * 60)

results = []
for model in available_models:
    print(f"Testing {model}...")
    result = test_model(model, test_prompt)
    results.append(result)
    
    if result['success']:
        print(f"✅ Success ({result['response_time']:.2f}s)")
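        # Only show an ellipsis when the response was actually truncated
        preview = result['response'][:100]
        suffix = "..." if len(result['response']) > 100 else ""
        print(f"Response: {preview}{suffix}")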
    else:
        print(f"❌ Failed: {result['error']}")

In [None]:
# Performance comparison
successful_results = [r for r in results if r['success']]

if successful_results:
    print("📊 Performance Summary:")
    print("=" * 50)
    
    # Sort by response time
    sorted_results = sorted(successful_results, key=lambda x: x['response_time'])
    
    for i, result in enumerate(sorted_results, 1):
        print(f"{i}. {result['model']}: {result['response_time']:.2f}s")
    
    print(f"🏆 Fastest model: {sorted_results[0]['model']}")
else:
    print("❌ No models were successfully tested")

## Model Comparison Results

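Based on the tests above, you can see:
- Which models load and respond without errors
- The response time of each model for a single prompt (the first call to a model may include the time Ollama needs to load it, so treat one run as a rough indication)
- The quality of each response (only the first 100 characters are printed; inspect `results` for the full text)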
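
A single timed call can be noisy, so one way to get steadier numbers is to repeat the call and summarise the timings. The sketch below is an assumption about how that could be done, not part of the notebook: `benchmark` is a hypothetical helper that accepts any zero-argument callable, runs a few untimed warm-up calls, and reports the minimum, median, and maximum latency.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(call: Callable[[], object], runs: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Time `call` several times and summarise the latencies in seconds.

    `benchmark` is a hypothetical helper, not something the notebook defines.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")

    # Warm-up calls are executed but not timed (for example, to let a model load).
    for _ in range(warmup):
        call()

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        timings.append(time.perf_counter() - start)

    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }
```

In this notebook, `call` could be something like `lambda: test_model(model, test_prompt)`. Because `test_model` catches exceptions and returns a result dictionary, failed calls would be timed as well, so it is worth checking `success` separately.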

**Recommendations:**
- Use faster models for simple tasks
- Use more capable models for complex reasoning
- Consider the trade-off between speed and quality
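
The speed versus quality trade-off can be made explicit with a latency budget. The following is a minimal sketch under stated assumptions: `pick_model` is a hypothetical helper, and the `quality` field is a rating you would assign yourself after reading the responses (the notebook does not compute one). The model names and numbers are made-up placeholders, not measurements.

```python
from typing import Dict, List, Optional


def pick_model(candidates: List[Dict], max_seconds: float) -> Optional[str]:
    """Return the highest-rated model whose response time fits the budget.

    Hypothetical helper: each candidate needs 'model', 'response_time',
    and a manually assigned 'quality' rating.
    """
    within_budget = [c for c in candidates if c["response_time"] <= max_seconds]
    if not within_budget:
        return None
    # Prefer higher quality; break ties with the lower response time.
    best = max(within_budget, key=lambda c: (c["quality"], -c["response_time"]))
    return best["model"]


# Illustrative, made-up entries; real ones would come from `results`
# plus your own quality ratings.
example = [
    {"model": "model-a", "response_time": 1.2, "quality": 3},
    {"model": "model-b", "response_time": 4.8, "quality": 5},
    {"model": "model-c", "response_time": 2.5, "quality": 4},
]
print(pick_model(example, max_seconds=3.0))  # prints "model-c"
```

With a 3-second budget, the highest-rated entry is excluded for being too slow, and the best remaining one is chosen. Raising the budget changes the answer, which is the trade-off the recommendations describe.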