# Day 06: Output Consistency Tester

## 🤝 Objective
Evaluate the reliability of an AI model by measuring how consistent its answers are when the same prompt is run repeatedly.

## 🌡️ Temperature & Determinism
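AI language models are probabilistic: they sample each token from a probability distribution, so the same prompt can produce different outputs.
- **Temperature = 0**: Near-deterministic. The model almost always picks the most likely token, so outputs are mostly consistent.
- **Temperature > 0.7**: More creative. Sampling spreads across less likely tokens, so outputs are less consistent.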
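
To make the effect of temperature concrete, here is a minimal sketch of temperature-scaled sampling over a toy set of candidate answers. The token list, the logit values, and the helper names (`softmax_with_temperature`, `sample_answers`) are illustrative assumptions, not part of this project or of any real model API, and treating temperature 0 as a plain argmax is a common convention rather than a guarantee about any particular provider.

```python
import math
import random
from collections import Counter


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, dividing by temperature (> 0) first."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_answers(tokens, logits, temperature, n, seed=0):
    """Draw n answers; temperature 0 is treated as greedy (argmax) decoding."""
    if temperature == 0:
        best = tokens[logits.index(max(logits))]
        return [best] * n
    rng = random.Random(seed)  # seeded so the demo is repeatable
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=n)


# Hypothetical candidates and scores, chosen only for illustration.
tokens = ["Paris", "Lyon", "Marseille"]
logits = [4.0, 2.0, 1.0]

for t in (0, 0.2, 1.0, 2.0):
    counts = Counter(sample_answers(tokens, logits, t, n=1000))
    print(f"temperature={t}: {dict(counts)}")
```

As the temperature rises, the probability mass on the top answer shrinks and the sampled answers spread out, which is exactly the drift a consistency test is meant to detect.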

In [None]:
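import sys
import os

# Add the directory two levels above the current working directory to the
# import path so that the local `src` package can be imported.
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "../../")))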

from src.evaluators.consistency import ConsistencyTester

### Step 1: Simulate Output Drift
Rather than calling a real model, we hard-code two sets of responses to the same prompt: one where the answers largely agree and one where they diverge.

In [None]:
tester = ConsistencyTester()

high_consistency_outputs = [
    "The capital of France is Paris.",
    "The capital of France is Paris.",
    "Paris is the capital of France."
]

low_consistency_outputs = [
    "The capital is Paris.",
    "I think it might be Lyon?",
    "France has many cities."
]

score_high = tester.calculate_consistency(high_consistency_outputs)
score_low = tester.calculate_consistency(low_consistency_outputs)

print(f"High Consistency Score: {score_high:.2f}")
print(f"Low Consistency Score: {score_low:.2f}")

### Step 2: Semantic Similarity
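Notice that exact string matching would treat "The capital of France is Paris." and "Paris is the capital of France." as different answers, even though they say the same thing. Jaccard similarity compares the sets of words in each output (shared words divided by total distinct words), so it ignores word order and handles this case better. Keep in mind that this measures lexical overlap rather than true semantic similarity: paraphrases that use different words will still score low.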
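
The idea can be sketched as a mean pairwise Jaccard score over word sets. This is a minimal illustration under assumptions, not the actual `ConsistencyTester` implementation: the tokenization rule (lowercase, strip punctuation), the averaging over all pairs, and the function names `tokenize`, `jaccard`, and `mean_pairwise_jaccard` are all hypothetical, so the real `calculate_consistency` may return different numbers.

```python
import re
from itertools import combinations


def tokenize(text):
    """Lowercase the text and return its set of word tokens (punctuation dropped)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity: |A ∩ B| / |A ∪ B| over the two word sets."""
    set_a, set_b = tokenize(a), tokenize(b)
    if not set_a and not set_b:
        return 1.0  # two empty outputs are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def mean_pairwise_jaccard(outputs):
    """Average Jaccard similarity over every unordered pair of outputs."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 1.0  # a single output is trivially consistent with itself
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


agreeing = [
    "The capital of France is Paris.",
    "The capital of France is Paris.",
    "Paris is the capital of France.",
]
diverging = [
    "The capital is Paris.",
    "I think it might be Lyon?",
    "France has many cities.",
]

print(agreeing[0] == agreeing[2])        # False: exact matching sees two different strings
print(jaccard(agreeing[0], agreeing[2])) # 1.0: same word set, different order
print(mean_pairwise_jaccard(agreeing))   # 1.0
print(mean_pairwise_jaccard(diverging))  # 0.0: the three outputs share no words
```

Averaging over all pairs is one simple design choice; comparing each output against the most common one would be another reasonable option.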