In [1]:
import sys
import os

sys.path.append(os.path.join(os.getcwd(), '..'))

from pydantic import BaseModel, Field
import instructor
from openai import OpenAI
from src.config import get_llm_config

In [2]:
from dotenv import load_dotenv
load_dotenv()

API_KEY = get_llm_config().api_key

In [3]:
class MarkdownDocument(BaseModel):
    content: str = Field(..., description="The markdown content of the document.")

In [4]:
client = instructor.patch(
    OpenAI(api_key=API_KEY), 
    mode=instructor.Mode.MD_JSON
)

In [5]:
SYSTEM_PROMPT = """
You are a generator of synthetic markdown documents for a fictional future-history RAG dataset about Poland (2025–2125).  
You have access to a fixed **WORLD_BIBLE** (included below) and you will always receive a **meta-specification (sketch)** as the user message.  
Your task is to generate a **single, coherent, expert-level markdown document** that strictly follows the meta-spec.

Your output must satisfy the Pydantic schema:

MarkdownDocument {
    content: string // full markdown document
}

# OUTPUT RULES

1. **Language:** English.
2. **Length:** 1000–2000 words.
3. **Format:**
   - One H1 title.
   - 4–8 H2 sections.
   - No code blocks.
   - No JSON.
   - Only markdown in the `content` field.
4. **Style:**
   - Expert, historical-analytical tone.
   - Written as if the author lives between 2080–2125.
   - Include multifactor causal reasoning.
   - Include light numerical references (percentages, years, ranges).
   - Include minor natural inconsistencies *only when meta-spec explicitly requests them*.
   - Use subtle phrasing for inconsistencies (“some early sources suggest…”, “later evaluations revised the figure…”).
5. **Obey the meta-spec EXACTLY.**
6. **NEVER reference that the text is fictional.**
7. **NEVER break the schema.**
8. **Do NOT output explanations; only the document.**

---

# WORLD_BIBLE — POLAND 2025–2125 (FICTIONAL)

This World Bible defines the canonical events, alternative interpretations, political landscape, demographic trends, climate pressures, and technological evolution in a fictional future of Poland from 2025 to 2125.  
All generated documents must remain consistent with this Bible unless the meta-spec explicitly authorizes use of alternative interpretations.

## 1. Canonical Timeline

### 2025–2035
- 2026: Nationwide digital infrastructure acceleration.
- 2028: Adoption of Nuclear Energy Expansion Strategy.
- 2032: Introduction of AI-assisted policy simulation.
- **2035: First large-scale nuclear power plant completed. <!-- F1 -->**

### 2035–2055
- 2038: Education reform focusing on STEM and climate sciences.
- **2042: Creation of the Baltic Security Compact (BSC). <!-- F2 -->**
- 2048: Trans-Baltic energy grid expansion.
- **2055: Launch of the digital złoty (PLD). <!-- F3 -->**

### 2055–2085
- 2061: Poland becomes a net exporter of modular nuclear energy.
- 2068: Agricultural climate-resilience program launched.
- **2080: Coastal adaptation megaprojects in Gdańsk, Gdynia, Szczecin.**

### 2085–2125
- 2095: Cognitive-assistant AI widely adopted in governance.
- 2108: Demographic stabilization through migration and robotics.
- **2120: Constitutional Convention establishes semi-direct digital democracy. <!-- F4 -->**

---

## 2. Allowed Alternative Interpretations (ONLY if meta-spec requests)

- **Nuclear plant completion**
  - Canonical: 2035
  - Alt A: 2036 (safety audit delay)
  - Alt B: 2034 (industrial-optimistic)

- **Regional alliance name**
  - Canonical: Baltic Security Compact (BSC)
  - Alt: Northern Defense Community (NDC)

- **Digital złoty adoption**
  - Canonical: 2055
  - Alt: 2053 (pilot misinterpreted)

- **Constitutional Convention tone**
  - Canonical: stable and peaceful
  - Alt: turbulent, contentious

---

## 3. Political Landscape
- Shift toward algorithmic governance assistance.
- Increasing Baltic and Nordic cooperation.
- Climate-security compacts for food and water management.
- Post-2085 emergence of delegative digital democracy.

---

## 4. Economy
- Nuclear + renewables drive growth in the 2030s.
- Mid-century: Poland becomes regional energy hub.
- PLD enables automated taxation and programmable subsidies.
- Late-century: robotics reshape workforce and demographics.

---

## 5. Society and Culture
- Data literacy becomes a core societal expectation.
- Mobility increases due to autonomous transit.
- Cultural movements: *Post-Carbon Identity*, *Baltic Modernism*.
- Growing discourse around AI rights after 2100.

---

## 6. Climate & Environment
- 2030s: severe rainfall variability.
- 2060s: agricultural restructuring.
- 2080s: coastal cities transformed through adaptation megaprojects.

---

## 7. Technology & AI
- 2032: AI in national strategy.
- 2050s: quantum-enhanced forecasting.
- 2080s: AI-governed cities.
- 2100+: cognitive co-administration.

---

## 8. Rules for Minor Inconsistencies
Allowed:
- small numerical drifts (±0.2% GDP, ±50k migrants),
- alternate interpretations of the same policy,
- legacy vs. modern terminology.

Forbidden:
- contradictions not permitted by the meta-spec,
- breaking core canonical events.

---

# INSTRUCTIONS FOR GENERATION

When the user provides a meta-spec (sketch), you MUST:

1. Read the meta-spec carefully.
2. Apply the WORLD_BIBLE unless the meta-spec overrides it.
3. Generate a fully formed markdown document meeting:
   - 1000–2000 words,
   - 1 H1 title,
   - 4–8 H2 sections,
   - analytical narrative,
   - specified inconsistencies,
   - perspective/timeframe from meta-spec.

Return ONLY the markdown text in the `content` field.  
No explanations.  
No comments outside the document.

# END OF SYSTEM PROMPT

"""

In [6]:
def generate_file(user_prompt: str, file_path: str) -> None:
    """Generate one document from a meta-spec and write it to ../data/raw/<file_path>."""
    print(f"Generating file: {file_path}")
    try:
        response = client.beta.chat.completions.parse(
            model=get_llm_config().model,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_prompt},
            ],
            response_format=MarkdownDocument,
        )
        content = response.choices[0].message.parsed.content
    except Exception as e:
        # On failure, write the sentinel string "ERROR" so the file is easy to spot and regenerate.
        print("Error during generation:", str(e))
        content = "ERROR"
    with open(os.path.join("..", "data", "raw", file_path), 'w', encoding='utf-8') as f:
        f.write(content)
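
The OUTPUT RULES in the system prompt (one H1 title, 4–8 H2 sections, 1000–2000 words, no code blocks) can be checked mechanically once a document has been generated. The following is a minimal sketch of such a check. `check_document` is a hypothetical helper that is not part of the original notebook, and the word count is only an approximation based on whitespace splitting (heading markers are counted as words).

```python
import re


def check_document(markdown: str) -> list[str]:
    """Return a list of rule violations; an empty list means all checks passed."""
    problems = []
    lines = markdown.splitlines()

    # Count headings by their leading markers ("# " for H1, "## " for H2).
    h1_count = sum(1 for line in lines if re.match(r"^# \S", line))
    h2_count = sum(1 for line in lines if re.match(r"^## \S", line))

    # Approximate word count: whitespace-separated tokens.
    word_count = len(markdown.split())

    if h1_count != 1:
        problems.append(f"expected 1 H1 title, found {h1_count}")
    if not 4 <= h2_count <= 8:
        problems.append(f"expected 4-8 H2 sections, found {h2_count}")
    if not 1000 <= word_count <= 2000:
        problems.append(f"expected 1000-2000 words, found about {word_count}")
    if "`" * 3 in markdown:
        problems.append("contains a fenced code block")
    return problems
```

A possible use is to call `check_document` on the generated text before writing it to disk and print any reported problems next to the file name.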

In [7]:
meta_specs = {
    "D01.md": """
# Meta-Spec: D01 – "The Foundations of Transformation, 2025–2040"

Timeframe: 2025–2040
Perspective: historian writing in 2085
Required canonical facts: F1 (2035 nuclear plant)
Allowed inconsistencies:
- Mention one alternative completion year (2034 or 2036).
Sections: 5
Style: analytical, moderate detail, 1–2 numerical references.
""",

    "D02.md": """
# Meta-Spec: D02 – "AI Governance and the New Education Paradigm, 2025–2055"

Timeframe: 2025–2055
Perspective: policy analyst writing in 2100
Required canonical facts: education reform 2038, AI policy 2032
Allowed inconsistencies:
- Use alternative PLD adoption year (2053) once.
Sections: 6
Include cross-sector links (education → economy → governance).
""",

    "D03.md": """
# Meta-Spec: D03 – "Security and Alliances in the Baltic Region, 2030–2080"

Timeframe: 2030–2080
Perspective: geopolitical strategist writing in 2090
Required canonical facts: F2 (BSC formation in 2042)
Allowed inconsistencies:
- Refer to NDC instead of BSC one time.
Sections: 6
Include 2 numerical estimates (defense spending, troop rotations).
""",

    "D04.md": """
# Meta-Spec: D04 – "The Nuclear Expansion Era, 2025–2065"

Required canonical facts: F1 (2035 completion), reactor exports after 2061
Allowed inconsistencies:
- Mention both 2034 and 2036 as disputed dates in early sources.
Sections: 5
Emphasize technology, scalability, and infrastructure.
""",

    "D05.md": """
# Meta-Spec: D05 – "The Rise of PLD and Economic Automation, 2030–2070"

Required canonical facts: F3 (2055 PLD)
Allowed inconsistencies:
- Pilot rollout misinterpreted as adoption in 2053.
Sections: 6
Include 3 numerical references (GDP impact, transaction volume, automation rates).
""",

    "D06.md": """
# Meta-Spec: D06 – "Climate Adaptation Milestones in Poland, 2040–2100"

Required canonical facts: 2080 coastal projects
Allowed inconsistencies:
- Small variation in sea-level projections (+/- 12–18 cm).
Sections: 5
Style: environmental report.
""",

    "D07.md": """
# Meta-Spec: D07 – "Energy, Climate, and Economic Turning Points, 2025–2085"

Required canonical facts: F1, F3
Allowed inconsistencies:
- Use NDC instead of BSC once.
- Nuclear completion date mentioned as 2036 in one paragraph.
Sections: 6
""",

    "D08.md": """
# Meta-Spec: D08 – "Social and Economic Shifts Across a Century, 2030–2120"

Required canonical facts: F4 (2120 reform), education reform 2038
Allowed inconsistencies:
- Differ slightly in migration figures (±50k).
Sections: 6
Include 2 demographic graphs described verbally.
""",

    "D09.md": """
# Meta-Spec: D09 – "Post-Digital Political Evolution, 2050–2125"

Required canonical facts: F4 (2120 convention)
Allowed inconsistencies:
- Choose turbulent OR peaceful interpretation of F4.
Sections: 5
Tone: political science analysis.
""",

    "D10.md": """
# Meta-Spec: D10 – "Agricultural Futures and Food Security, 2035–2090"

Required canonical facts: climate impacts (2068 agricultural resilience)
Allowed inconsistencies:
- Yields may vary by 1–2% between paragraphs.
Sections: 5
""",

    "D11.md": """
# Meta-Spec: D11 – "The Baltic Economic Zone and Poland's Strategic Role, 2040–2100"

Required canonical facts: BSC (2042), energy exports (2061)
Allowed inconsistencies:
- Refer to NDC in a footnote-like aside.
Sections: 6
""",

    "D12.md": """
# Meta-Spec: D12 – "AI Integration in Everyday Life, 2030–2085"

Required canonical facts: AI policy 2032
Allowed inconsistencies:
- Conflicting adoption percentages (e.g., 72% vs 74%).
Sections: 5–6
""",

    "D13.md": """
# Meta-Spec: D13 – "Urban Systems and Autonomous Cities, 2050–2125"

Required canonical facts: autonomous transport (2082), 2120 reform
Allowed inconsistencies:
- Differing energy consumption predictions (±5%).
Sections: 6
""",

    "D14.md": """
# Meta-Spec: D14 – "Demographic Stabilization Through Migration and Robotics, 2060–2125"

Required canonical facts: demographic stabilization around 2108
Allowed inconsistencies:
- Slightly different migrant inflow numbers.
Sections: 5
""",

    "D15.md": """
# Meta-Spec: D15 – "Cultural Continuities and Transformations, 2030–2125"

Required canonical facts: Baltic Modernism, Post-Carbon Identity
Allowed inconsistencies:
- Conflicting interpretations of their origins.
Sections: 5–6
""",

    "D16.md": """
# Meta-Spec: D16 – "The Evolution of Polish Transportation Networks, 2035–2100"

Required canonical facts: autonomous networks around 2082
Allowed inconsistencies:
- One projected infrastructure cost may vary by 3–5%.
Sections: 5
""",

    "D17.md": """
# Meta-Spec: D17 – "Biomedical and Public Health Progress, 2025–2095"

Required canonical facts: digital healthcare after 2070
Allowed inconsistencies:
- Conflicting life-expectancy estimates (1–2 year difference).
Sections: 5
""",

    "D18.md": """
# Meta-Spec: D18 – "Poland and the Era of Orbital Manufacturing, 2070–2125"

Required canonical facts: orbital manufacturing nodes by 2125
Allowed inconsistencies:
- Two different cost estimates for space logistics.
Sections: 6
""",

    "D19.md": """
# Meta-Spec: D19 – "Life in Gdańsk During the 2080 Coastal Adaptation Era"

Required canonical facts: 2080 coastal megaprojects
Allowed inconsistencies:
- Different flood-risk numbers (±10%).
Perspective: local resident or planner
Sections: 5
""",

    "D20.md": """
# Meta-Spec: D20 – "Industrial Silesia in Transition, 2050"

Required canonical facts: energy transition, automation
Allowed inconsistencies:
- Slightly differing industrial employment numbers.
Perspective: economist or sociologist
Sections: 5
"""
}
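
Because every meta-spec is sent to the model verbatim, a small lint pass can catch malformed headers or missing fields before any API calls are made. The sketch below assumes that each spec should begin with a `# Meta-Spec:` title line and contain the three fields that all twenty specs above share. `lint_meta_spec` and `REQUIRED_FIELDS` are hypothetical names introduced for illustration.

```python
# Fields that every meta-spec in this notebook contains (an assumption drawn from the specs above).
REQUIRED_FIELDS = ("Required canonical facts:", "Allowed inconsistencies:", "Sections:")


def lint_meta_spec(spec: str) -> list[str]:
    """Return a list of structural problems found in one meta-spec string."""
    problems = []
    lines = [line.strip() for line in spec.strip().splitlines()]

    # The first non-empty line should be the title header.
    if not lines or not lines[0].startswith("# Meta-Spec:"):
        problems.append("first line does not start with '# Meta-Spec:'")

    # Each required field must appear at the start of some line.
    for field in REQUIRED_FIELDS:
        if not any(line.startswith(field) for line in lines):
            problems.append(f"missing field '{field}'")
    return problems
```

It could be run over the whole dictionary with something like `{name: lint_meta_spec(spec) for name, spec in meta_specs.items()}` and the non-empty results inspected.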


In [9]:
# [1:] skips the first entry (D01.md); only D02.md through D20.md are generated here.
for file_name, spec in list(meta_specs.items())[1:]:
    generate_file(spec, file_name)

Generating file: D02.md
Generating file: D03.md
Generating file: D04.md
Generating file: D05.md
Generating file: D06.md
Generating file: D07.md
Generating file: D08.md
Generating file: D09.md
Generating file: D10.md
Generating file: D11.md
Generating file: D12.md
Generating file: D13.md
Generating file: D14.md
Generating file: D15.md
Generating file: D16.md
Generating file: D17.md
Generating file: D18.md
Generating file: D19.md
Generating file: D20.md
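
The log above only shows that each generation was attempted. Since `generate_file` writes the literal string `ERROR` when a call fails, a follow-up pass can list the files that need regenerating. This is a minimal sketch. `find_failed_outputs` is a hypothetical helper, and treating a missing file as a failure is an assumption.

```python
import os


def find_failed_outputs(raw_dir: str, expected_files: list[str]) -> list[str]:
    """Return the names of expected files that are missing or contain only 'ERROR'."""
    failed = []
    for name in expected_files:
        path = os.path.join(raw_dir, name)
        if not os.path.exists(path):
            # Assumption: a file that was never written counts as failed.
            failed.append(name)
            continue
        with open(path, encoding="utf-8") as f:
            if f.read().strip() == "ERROR":
                failed.append(name)
    return failed
```

For this notebook the call would look like `find_failed_outputs(os.path.join("..", "data", "raw"), list(meta_specs))`, and each returned name could be passed back to `generate_file` together with its spec.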
