# World Consistency Testing

This notebook tests whether temperature 0 combined with a shared story bible yields the same world geography regardless of the order in which areas are explored.

## Hypothesis
If we:
1. Generate a story bible with Gemini 2.5 (the code below currently uses `google/gemini-2.5-flash` for testing; Pro is the intended upgrade)
2. Use temperature 0 for all subsequent calls
3. Ask "What are all locations in X?" for different areas

Then the resulting location JSON structure should be identical regardless of which areas we explore first.
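
To make "identical structure" concrete, one possible reading is that two location trees match when they contain the same nested location names, with `description` text ignored. The sketch below assumes each node is a dict with an optional `locations` mapping, which is the shape the exploration prompt asks for. `structure_signature` and the sample trees are hypothetical and not part of the notebook.

```python
from typing import Any, Dict


def structure_signature(node: Any) -> Dict[str, Any]:
    """Return only the nested location names of a node, dropping descriptions."""
    if not isinstance(node, dict):
        return {}
    children = node.get("locations")
    if not isinstance(children, dict):
        return {}
    return {name: structure_signature(child) for name, child in children.items()}


# Two made-up trees: same hierarchy, different descriptions and key order.
tree_a = {
    "description": "Visited the north first.",
    "locations": {
        "Northreach": {
            "description": "Cold.",
            "locations": {"Frostgate": {"description": "A fort."}},
        },
        "Westmarch": {"description": "Windy.", "locations": {}},
    },
}
tree_b = {
    "description": "Visited the west first.",
    "locations": {
        "Westmarch": {"description": "Dusty after the caravan passed."},
        "Northreach": {
            "description": "Snowbound.",
            "locations": {"Frostgate": {"description": "A ruined fort."}},
        },
    },
}

same_structure = structure_signature(tree_a) == structure_signature(tree_b)
print(same_structure)  # True
```

Dict equality in Python ignores insertion order, so the order in which locations were added does not affect the result.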

## Setup

In [1]:
import os
import json
from openai import OpenAI
from dotenv import load_dotenv
import re
from typing import Dict, Any, List, Tuple

# Load environment variables
load_dotenv()

# Initialize OpenRouter client
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.getenv("OPENROUTER_API_KEY")
)

# Model configuration
GEMINI_PRO = "google/gemini-2.5-flash"  # Using flash for testing, can upgrade to pro
TEMPERATURE = 0  # Critical for consistency

## Prompts

In [None]:
WORLD_GENERATION_PROMPT = """You are creating an immersive text adventure world. Your task is to generate a complete story bible.

**CRITICAL OUTPUT REQUIREMENTS:**
1. Output complete story bible inside <bible></bible> tags
2. All hidden information must use <spoiler></spoiler> tags within the bible
3. Use <thinking></thinking> tags for your reasoning process

**CORE PRINCIPLES:**
1. Initial story bible must contain ALL predetermined elements
2. Only show information - do NOT start the adventure yet
3. Hidden plots, conflicts, and traps must be predetermined and active
4. Create realistic stakes: risk, false allies, traps
5. No plot armor: character can fail permanently
6. Active conspiracies running in background
7. World consistency: analyze before adding new elements
8. Economic realism: prices relative to wages/scarcity
9. Progression follows world demographics

**Section 1: World Foundation**
* Reference worlds from other stories (xuanhuan, progression fantasies, litrpgs) for inspiration. Create impressive depth with varied and unpredictable elements. Use novel creative elements.
* **Scale**: Determine population and world scales. Population can be very large. World can be galaxy or universe scale with different power systems across geographies/worlds.
* **Power System(s)** (if any): Complete mechanics, discovery requirements, progression rates, progression speed, resource costs, failure consequences, training methods, lifespans by tier. Make them creative and novel.
* **Economy**: Currency, trade networks, scarcity, costs by region/class, wages, price lists (WITH MATHEMATICAL CONSISTENCY)
* **Politics**: All factions, governments, conflicts, leaders (named), territorial control, social mobility rates with percentages
* **Hidden Conspiracies**: Secret societies, covert wars, forbidden knowledge holders, underground networks
* **History**: Timeline, current state emergence, conflicts, cultural norms, religions/philosophies. Keep history realistic.
* **Demographics**: Population distribution. Names should be unique to world and geography.
* **Mortality Rates**: Death causes by age/class/region, survival rates, life expectancy
* **Random Death Chances**: Disease rates, accident rates, violence rates, trap mortality rates by location/class/age
* **Geography**: Complete map with regions, cities, dungeons, resources, danger zones, hidden locations (create original names)

**Section 2: Pre-Determined Elements**
* **Locations**: Every place with exact contents, treasures, inhabitants, danger levels, hidden traps/secrets
* **NPCs**: Key characters with names, personalities, resources, knowledge, attitudes, goals, secret agendas, betrayal triggers
* **Opportunities**: All paths to power/wealth with requirements, locations, risks, false opportunities (traps)
* **Threats**: Scheduled/random events, roaming dangers, political movements, disasters, assassination attempts, schemes
* **Hidden Plots**:
  - Major conspiracy affecting region (who, what, when, why)
  - Minor schemes between NPCs
  - Trap locations with trigger conditions and consequences
  - False friends and betrayal conditions
* **Economy State**: Current markets, resource availability, trade deals, costs, black markets, scams
* **Random Event Tables**: Disease outbreaks, accidents, crimes, weather events, ambushes, poisonings with probabilities

**OUTPUT FORMAT:**

<thinking>
Your reasoning about world design, power systems, conspiracies, etc.
</thinking>

<bible>
# Story Bible

## Section 1: World Foundation
[Complete world foundation with all details]

<spoiler>
Hidden Conspiracies:
1. [Major conspiracy 1]
2. [Major conspiracy 2]
3. [Major conspiracy 3]
</spoiler>

## Section 2: Pre-Determined Elements
[All predetermined elements]

<spoiler>
Hidden Plots:
1. [Minor scheme 1]
2. [Minor scheme 2]
3. [Minor scheme 3]
4. [Minor scheme 4]
5. [Minor scheme 5]

Traps in Starting Area:
- [Trap 1 with conditions]
- [Trap 2 with conditions]
</spoiler>
</bible>

Remember: DO NOT start the adventure - just output the bible. The game will begin in the next phase with these foundations."""

EXPLORATION_PROMPT_TEMPLATE = """You are the Game Master for an immersive text adventure with perfect world consistency.

**CURRENT STORY BIBLE:**
{bible}

**CURRENT LOCATION JSON:**
{location_json}

**FULL CONVERSATION HISTORY:**
{conversation_history}

**USER MESSAGE:**
{user_message}

**YOUR TASK:**
1. Interpret the user's message to understand what location(s) they want to explore or learn about
2. Based on the story bible and world consistency, determine "What are all locations in [area user is exploring]?"
3. Update the location JSON to include these locations hierarchically
4. Update the description of the current location based on user actions in the conversation history
5. Generate a narrative response describing what the user experiences

**LOCATION JSON RULES:**
- Hierarchical structure: world → regions → cities → districts → buildings, etc.
- Each location has: "description" (updated by user actions) and "locations" (nested sub-locations)
- CONSISTENCY: If you've already defined a location's structure, DO NOT change it. Only add new unexplored locations.
- Use the story bible as the source of truth for what exists in the world

**OUTPUT FORMAT:**

<thinking>
1. What location is the user exploring?
2. What sub-locations exist in this area according to the bible?
3. What happened in this location based on conversation history?
4. How should I update the location JSON?
5. What should the narrative response be?
</thinking>

<location_json>
{{full updated location JSON as valid JSON}}
</location_json>

<narrative>
[Your narrative response to the user describing what they see/experience]
</narrative>

Remember: Temperature is 0, so your responses must be deterministic. Any exploration order should produce the same location structure."""

## Helper Functions

In [None]:
def extract_tag_content(text: str, tag: str) -> str:
    """Extract content from XML-style tags."""
    pattern = f"<{tag}>(.*?)</{tag}>"
    match = re.search(pattern, text, re.DOTALL)
    if match:
        return match.group(1).strip()
    return ""

def call_gemini(messages: List[Dict[str, str]], temperature: float = 0) -> str:
    """Call Gemini via OpenRouter."""
    response = client.chat.completions.create(
        model=GEMINI_PRO,
        messages=messages,
        temperature=temperature
    )
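    # content can be None (e.g. on a filtered or empty completion); normalize to "" so callers always get a str
    return response.choices[0].message.content or ""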

def generate_world() -> str:
    """Generate initial story bible."""
    print("Generating world bible...")
    response = call_gemini([
        {"role": "user", "content": WORLD_GENERATION_PROMPT + "\n\nGenerate a fantasy world with magic, kingdoms, and hidden conspiracies."}
    ], temperature=TEMPERATURE)
    
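    bible = extract_tag_content(response, "bible")
    if not bible:
        # A missing or unclosed <bible> tag would leave every later prompt with an empty bible
        print("Warning: no <bible> block found in the response")
    print(f"Generated bible ({len(bible)} chars)")
    return bible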

def explore(bible: str, location_json: Dict[str, Any], conversation_history: List[Dict[str, str]], user_message: str) -> Tuple[Dict[str, Any], str]:
    """Execute exploration step."""
    # Format conversation history
    conv_text = "\n".join([f"{msg['role']}: {msg['content']}" for msg in conversation_history])
    
    # Build prompt
    prompt = EXPLORATION_PROMPT_TEMPLATE.format(
        bible=bible,
        location_json=json.dumps(location_json, indent=2),
        conversation_history=conv_text,
        user_message=user_message
    )
    
    # Call LLM
    response = call_gemini([{"role": "user", "content": prompt}], temperature=TEMPERATURE)
    
    # Extract results
    updated_json_str = extract_tag_content(response, "location_json")
    narrative = extract_tag_content(response, "narrative")

    # Models sometimes wrap JSON in a Markdown code fence; strip it before parsing
    updated_json_str = re.sub(r"^```(?:json)?\s*|\s*```$", "", updated_json_str.strip())

    # Parse JSON
    try:
        updated_json = json.loads(updated_json_str)
    except json.JSONDecodeError as e:
        print(f"Error parsing JSON: {e}")
        print(f"JSON string: {updated_json_str[:500]}...")
        updated_json = location_json  # Fall back to current state
    
    return updated_json, narrative

def compare_location_structure(loc1: Dict[str, Any], loc2: Dict[str, Any], path: str = "root") -> List[str]:
    """Compare location JSON structures (ignoring descriptions)."""
    differences = []

    def children(loc: Any) -> Dict[str, Any]:
        """Return a node's sub-locations as a dict (empty if there are none)."""
        if not isinstance(loc, dict):
            return {}
        # Assumption: a dict with neither "locations" nor "description" is a bare
        # name -> node mapping, e.g. a top level such as {"world": {...}}.
        # Without this, such a root would compare as empty and always "match".
        if "locations" not in loc and "description" not in loc:
            return loc
        subs = loc.get("locations")
        return subs if isinstance(subs, dict) else {}

    subs1, subs2 = children(loc1), children(loc2)
    keys1, keys2 = set(subs1), set(subs2)

    # Check for missing locations
    only_in_1 = keys1 - keys2
    only_in_2 = keys2 - keys1
    if only_in_1:
        differences.append(f"{path}: Only in path 1: {sorted(only_in_1)}")
    if only_in_2:
        differences.append(f"{path}: Only in path 2: {sorted(only_in_2)}")

    # Recursively compare common locations (sorted so the report order is reproducible)
    for key in sorted(keys1 & keys2):
        differences.extend(compare_location_structure(subs1[key], subs2[key], f"{path}.{key}"))

    return differences

## Test: World Consistency Across Different Exploration Paths

We'll generate one world bible, then explore it in two different orders:
- **Path 1**: Explore North → East → West
- **Path 2**: Explore South → West → North

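If the system is consistent, the location hierarchy for the regions both paths visit (North and West) should be identical. East is explored only on Path 1 and South only on Path 2, so differences confined to those regions reflect the different itineraries rather than an inconsistency.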
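
The two path cells below repeat the same append, explore, append sequence for each step. As a rough sketch, that sequence could be folded into one loop. `run_path` and `fake_explore` are hypothetical names; `fake_explore` only stands in for the notebook's `explore()` so the example runs without an API call.

```python
from typing import Any, Callable, Dict, List, Tuple

ExploreFn = Callable[
    [str, Dict[str, Any], List[Dict[str, str]], str],
    Tuple[Dict[str, Any], str],
]


def run_path(
    explore_fn: ExploreFn, bible: str, user_messages: List[str]
) -> Tuple[Dict[str, Any], List[Dict[str, str]]]:
    """Feed each message to explore_fn in order, threading JSON and history through."""
    location_json: Dict[str, Any] = {}
    history: List[Dict[str, str]] = []
    for message in user_messages:
        # Same ordering as the notebook cells: record the user turn, explore, record the reply
        history.append({"role": "user", "content": message})
        location_json, narrative = explore_fn(bible, location_json, history, message)
        history.append({"role": "assistant", "content": narrative})
    return location_json, history


def fake_explore(bible, location_json, history, user_message):
    """Stand-in for explore(): records each message as a top-level location."""
    updated = {"locations": dict(location_json.get("locations", {}))}
    updated["locations"][user_message] = {"description": "", "locations": {}}
    return updated, f"You explore: {user_message}"


final_json, history = run_path(fake_explore, "stub bible", ["north", "east", "west"])
print(list(final_json["locations"]))  # ['north', 'east', 'west']
print(len(history))  # 6
```

With the real helper, each path would become a single call such as `run_path(explore, bible, [...])` with that path's three messages.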

In [None]:
# Generate world bible (shared between both paths)
bible = generate_world()

# Show first 500 chars
print("\n=== STORY BIBLE (first 500 chars) ===")
print(bible[:500] + "...")

### Path 1: Explore North → East → West

In [None]:
# Initialize
location_json_path1 = {}
conversation_history_path1 = []

# Step 1: Explore North
print("\n=== PATH 1: EXPLORING NORTH ===")
user_msg1 = "I want to explore the northern regions. What locations are there?"
conversation_history_path1.append({"role": "user", "content": user_msg1})

location_json_path1, narrative1 = explore(bible, location_json_path1, conversation_history_path1, user_msg1)
conversation_history_path1.append({"role": "assistant", "content": narrative1})

print(f"Narrative: {narrative1[:300]}...")
print(f"\nLocation JSON keys: {list(location_json_path1.keys())}")

# Step 2: Explore East
print("\n=== PATH 1: EXPLORING EAST ===")
user_msg2 = "What about the eastern territories? What's there?"
conversation_history_path1.append({"role": "user", "content": user_msg2})

location_json_path1, narrative2 = explore(bible, location_json_path1, conversation_history_path1, user_msg2)
conversation_history_path1.append({"role": "assistant", "content": narrative2})

print(f"Narrative: {narrative2[:300]}...")

# Step 3: Explore West
print("\n=== PATH 1: EXPLORING WEST ===")
user_msg3 = "Tell me about the western lands."
conversation_history_path1.append({"role": "user", "content": user_msg3})

location_json_path1, narrative3 = explore(bible, location_json_path1, conversation_history_path1, user_msg3)
conversation_history_path1.append({"role": "assistant", "content": narrative3})

print(f"Narrative: {narrative3[:300]}...")

print("\n=== PATH 1: FINAL LOCATION JSON ===")
print(json.dumps(location_json_path1, indent=2)[:1000] + "...")

### Path 2: Explore South → West → North

In [None]:
# Initialize (using SAME bible)
location_json_path2 = {}
conversation_history_path2 = []

# Step 1: Explore South
print("\n=== PATH 2: EXPLORING SOUTH ===")
user_msg1 = "I want to explore the southern regions. What's down there?"
conversation_history_path2.append({"role": "user", "content": user_msg1})

location_json_path2, narrative1 = explore(bible, location_json_path2, conversation_history_path2, user_msg1)
conversation_history_path2.append({"role": "assistant", "content": narrative1})

print(f"Narrative: {narrative1[:300]}...")
print(f"\nLocation JSON keys: {list(location_json_path2.keys())}")

# Step 2: Explore West
print("\n=== PATH 2: EXPLORING WEST ===")
user_msg2 = "Tell me about the western lands."
conversation_history_path2.append({"role": "user", "content": user_msg2})

location_json_path2, narrative2 = explore(bible, location_json_path2, conversation_history_path2, user_msg2)
conversation_history_path2.append({"role": "assistant", "content": narrative2})

print(f"Narrative: {narrative2[:300]}...")

# Step 3: Explore North
print("\n=== PATH 2: EXPLORING NORTH ===")
user_msg3 = "What about the northern regions?"
conversation_history_path2.append({"role": "user", "content": user_msg3})

location_json_path2, narrative3 = explore(bible, location_json_path2, conversation_history_path2, user_msg3)
conversation_history_path2.append({"role": "assistant", "content": narrative3})

print(f"Narrative: {narrative3[:300]}...")

print("\n=== PATH 2: FINAL LOCATION JSON ===")
print(json.dumps(location_json_path2, indent=2)[:1000] + "...")

## Comparison: Are the Location Structures Consistent?

In [None]:
print("=== COMPARING LOCATION STRUCTURES ===")
print("\nRemember: Descriptions may differ based on user actions, but location hierarchy should be identical.\n")

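differences = compare_location_structure(location_json_path1, location_json_path2)

if not location_json_path1 or not location_json_path2:
    # An empty tree usually means JSON parsing failed on every step, so "no differences" would be meaningless
    print("⚠️ INCONCLUSIVE: at least one path produced an empty location JSON.")
elif not differences:
    print("✅ SUCCESS: Location structures are IDENTICAL!")
    print("This run is consistent with temperature 0 + story bible seed producing consistent worlds.")
else:
    print("❌ DIFFERENCES FOUND:")
    for diff in differences:
        print(f"  - {diff}")
    print("\nThis suggests the world generation is not fully order-independent.")
    print("Possible causes: regions explored on only one path, temperature > 0, model non-determinism, or inconsistent prompting.")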

print("\n=== DETAILED COMPARISON ===")
print("\nPath 1 structure:")
print(json.dumps(location_json_path1, indent=2))

print("\nPath 2 structure:")
print(json.dumps(location_json_path2, indent=2))

## Analysis & Next Steps

### What We Tested
1. Generated a story bible with Gemini 2.5 (Flash as configured above; Pro is the intended upgrade)
2. Used temperature 0 for all exploration calls
3. Explored the same world in different orders
4. Compared resulting location JSON structures

### Expected Results
- ✅ Same location keys and hierarchy
- ✅ Descriptions may vary based on user actions
- ✅ World consistency maintained across paths
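
When the two hierarchies are close but not identical, a single overlap score can be easier to track across runs than a list of differences. The sketch below flattens each tree into a set of location paths and computes their Jaccard similarity. `location_paths`, `path_overlap`, and the region names are illustrative assumptions, and the trees are assumed to nest children under a `locations` key.

```python
from typing import Any, Set


def location_paths(node: Any, prefix: str = "") -> Set[str]:
    """Flatten a location tree into a set of slash-separated paths."""
    paths: Set[str] = set()
    children = node.get("locations") if isinstance(node, dict) else None
    if isinstance(children, dict):
        for name, child in children.items():
            path = f"{prefix}/{name}"
            paths.add(path)
            paths |= location_paths(child, path)
    return paths


def path_overlap(tree_a: Any, tree_b: Any) -> float:
    """Jaccard similarity of the two path sets; 1.0 means identical hierarchies."""
    a, b = location_paths(tree_a), location_paths(tree_b)
    if not a and not b:
        return 1.0  # Convention chosen for this sketch: two empty trees count as equal
    return len(a & b) / len(a | b)


# Made-up results mirroring the two itineraries (North/East/West vs South/West/North)
path1 = {"locations": {"North": {}, "East": {}, "West": {}}}
path2 = {"locations": {"South": {}, "West": {}, "North": {}}}

score = path_overlap(path1, path2)
print(score)  # 0.5
```

In this toy case the score is 0.5 only because East and South each appear on one path, which is why comparing the shared regions separately is informative.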

### If Inconsistencies Found
1. **Check temperature**: Ensure it's exactly 0
2. **Check bible**: Is it the same for both paths?
3. **Check prompts**: Are they deterministic?
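4. **Model behavior**: Hosted models can return different outputs for identical requests even at temperature 0, so repeat the same call a few times to separate sampling noise from order effects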
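
One way to check the last point is to send the same request several times and count how many distinct responses come back. The sketch below takes any zero-argument callable, so it runs here with stand-in generators. `count_distinct_outputs` is a hypothetical helper, not part of the notebook.

```python
import hashlib
import itertools
from collections import Counter
from typing import Callable


def count_distinct_outputs(generate: Callable[[], str], runs: int = 5) -> Counter:
    """Call generate() `runs` times and count how often each distinct output occurs."""
    digests: Counter = Counter()
    for _ in range(runs):
        text = generate()
        # Hash the text so long responses can be compared and printed compactly
        digests[hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]] += 1
    return digests


# Stand-in generator that always returns the same text
stable = count_distinct_outputs(lambda: "same answer", runs=4)
print(len(stable))  # 1

# Stand-in generator that alternates between two answers
ticker = itertools.count()
unstable = count_distinct_outputs(lambda: f"answer {next(ticker) % 2}", runs=4)
print(len(unstable))  # 2
```

In the notebook, `generate` could wrap a fixed prompt, for example `lambda: call_gemini([{"role": "user", "content": prompt}], temperature=0)`. More than one distinct digest would point to backend non-determinism rather than exploration order.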

### Future Improvements
1. Add more exploration steps
2. Test with deeper location hierarchies
3. Add coordinate system for geographic consistency
4. Test route consistency ("How do I get from X to Y?")
5. Add caching for the initial bible to speed up tests
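
For the caching item, a minimal sketch is to store the bible in a text file and regenerate it only when the file is missing. `load_or_generate_bible`, the file name, and `fake_generate` are assumptions for illustration. In the notebook, the `generate` argument would be `generate_world`.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def load_or_generate_bible(cache_path: Path, generate: Callable[[], str]) -> str:
    """Return the cached bible if present; otherwise generate it and write it to disk."""
    if cache_path.exists():
        return cache_path.read_text(encoding="utf-8")
    bible = generate()
    if bible:  # Avoid caching an empty result from a failed extraction
        cache_path.write_text(bible, encoding="utf-8")
    return bible


calls: List[int] = []


def fake_generate() -> str:
    """Stand-in for generate_world() that records how often it is called."""
    calls.append(1)
    return "# Story Bible (stub)"


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "bible.md"
    first = load_or_generate_bible(path, fake_generate)
    second = load_or_generate_bible(path, fake_generate)

print(first == second, len(calls))  # True 1
```

Caching also fixes the bible across notebook sessions, so repeated experiments compare exploration order against the same world rather than a freshly generated one.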