# Tool 0 - Business Request Parser Demo

**Purpose:** Parse standardized Markdown business documents into structured JSON using LangGraph.

**Acceptance Criteria:**
- ✅ Load sample Markdown document
- ✅ Parse via LangGraph structured output (Pydantic schema)
- ✅ Display JSON under cell
- ✅ Save result + prompt to `data/tool0_samples/`
- ✅ Inline implementation (v1) for testing; module in `src/tool0/` exists for future reuse

**Note:** This implements the MVP version: a single LLM call without regex post-processing.

In [1]:
# Install required packages (run once)
# !pip install langgraph langchain langchain-openai langchain-anthropic pydantic python-dotenv

In [2]:
# Import required modules
from pydantic import BaseModel, Field, field_validator
from datetime import datetime
import json
from pathlib import Path

# Define Pydantic schemas inline
class ProjectMetadata(BaseModel):
    """Metadata about the business project request."""

    project_name: str = Field(
        description="Name of the project"
    )
    sponsor: str = Field(
        description="Name of the project sponsor"
    )
    submitted_at: str = Field(
        description="Date when the request was submitted, in ISO 8601 format (YYYY-MM-DD)"
    )
    extra: dict[str, str] = Field(
        default_factory=dict,
        description="Additional metadata fields as key-value pairs"
    )

    @field_validator('submitted_at')
    @classmethod
    def validate_iso_date(cls, v: str) -> str:
        """Validate that the date parses as ISO 8601."""
        try:
            # Note: fromisoformat() also accepts full timestamps, not only YYYY-MM-DD
            datetime.fromisoformat(v)
        except ValueError as exc:
            raise ValueError(f"Date must be in ISO 8601 format (YYYY-MM-DD), got: {v}") from exc
        return v


class BusinessRequest(BaseModel):
    """Structured representation of a parsed business request document."""

    project_metadata: ProjectMetadata = Field(
        description="Project metadata including name, sponsor, and submission date"
    )
    goal: str = Field(
        default="unknown",
        description="Main goal or objective of the project"
    )
    scope_in: str = Field(
        default="unknown",
        description="What is included in the project scope"
    )
    scope_out: str = Field(
        default="unknown",
        description="What is explicitly excluded from the project scope"
    )
    entities: list[str] = Field(
        default_factory=list,
        description="Key business entities involved in the project"
    )
    metrics: list[str] = Field(
        default_factory=list,
        description="Key metrics or KPIs to be tracked"
    )
    sources: list[str] = Field(
        default_factory=list,
        description="Expected data sources for the project"
    )
    constraints: list[str] = Field(
        default_factory=list,
        description="Constraints, limitations, or special requirements"
    )
    deliverables: list[str] = Field(
        default_factory=list,
        description="Required deliverables or artifacts from the project"
    )

print("✅ Schemas defined successfully")

✅ Schemas defined successfully
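
The `submitted_at` validator relies on `datetime.fromisoformat`, which also accepts full timestamps such as `2025-10-25T10:00:00`. If only a plain `YYYY-MM-DD` date should pass, a stricter check could look like the sketch below. `is_strict_iso_date` is a hypothetical helper, not part of the notebook or of `src/tool0/`.

```python
import re
from datetime import date

# Exactly four digits, two digits, two digits, separated by hyphens
_DATE_ONLY = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_strict_iso_date(value: str) -> bool:
    """Return True only for a real calendar date written as YYYY-MM-DD."""
    if not _DATE_ONLY.fullmatch(value):
        return False  # rejects timestamps, other separators, missing zero padding
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2025-02-30
    except ValueError:
        return False
    return True
```

Inside the Pydantic validator, this function could replace the `datetime.fromisoformat(v)` call if the stricter behavior is wanted.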


## 1. Load Sample Business Document

For this v1 demo the sample document is hardcoded in the cell below rather than read from `data/sample_business_request.md`.

In [3]:
# Hardcoded sample business document
business_document = """# Požadavek na datový projekt

## Projekt
**Název:** BA/BS Supplier Analytics Platform
**Sponzor:** Petra  Nováková
**Datum:** 2025-10-25
**Oddělení:** Business Analytics
**Priorita:** Vysoká

## Cíl
Vytvořit komplexní analytickou platformu pro reporting a analýzu dodavatelských dat v BA/BS datamartů. Platforma umožní business uživatelům sledovat výkonnost dodavatelů, identifikovat rizikové partnery a optimalizovat nákupní procesy na základě historických dat.

## Rozsah

### In Scope
- Analýza dodavatelů (suppliers) a jejich výkonnosti
- Reporting nákupních objednávek (purchase orders)
- Sledování kvality dodávek (delivery quality metrics)
- Dimenzionální model pro dodavatelské údaje
- Integrace s existujícími BA/BS datamartami

### Out of Scope
- HR data o zaměstnancích
- Finanční forecasting a budgetování
- Real-time monitoring dodávek
- Integrace s externí CRM systémy

## Klíčové entity & metriky

### Entity
- Suppliers (dodavatelé)
- Purchase Orders (nákupní objednávky)
- Products (produkty)
- Delivery Performance (výkonnost dodávek)

### Metriky
- On-time delivery rate (% včasných dodávek)
- Supplier reliability score (hodnocení spolehlivosti)
- Average order value (průměrná hodnota objednávky)
- Quality defect rate (% vadných dodávek)
- Lead time průměr

## Očekávané zdroje
- Databricks Unity Catalog (BA/BS datamart schemas)
- Collibra Data Catalog (metadata governance)
- SAP Tables (zdrojové transakční data)
- Historical supplier performance logs

## Omezení
- GDPR compliance - žádné osobní údaje bez souhlasu
- Data retention max 5 let
- Maximální response time pro dashboardy: 3 sekundy
- Read-only přístup k produkčním datám
- Row Level Security podle business unit

## Požadované artefakty
- ER diagram v Mermaid formátu
- Power Query M skripty pro data refresh
- Governance report (kvalita metadat, validace)
- Security report (RLS návrh, klasifikace)
- Dokumentace datového modelu
"""

print(f"📄 Business document loaded ({len(business_document)} characters)")
print("\nFirst 300 characters:")
print("=" * 60)
print(business_document[:300])
print("...")

📄 Business document loaded (1919 characters)

First 300 characters:
============================================================
# Požadavek na datový projekt

## Projekt
**Název:** BA/BS Supplier Analytics Platform
**Sponzor:** Petra  Nováková
**Datum:** 2025-10-25
**Oddělení:** Business Analytics
**Priorita:** Vysoká

## Cíl
Vytvořit komplexní analytickou platformu pro reporting a analýzu dodavatelských dat v BA/BS datamart
...
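
The document above is hardcoded. If it were instead read from `data/sample_business_request.md`, a minimal sketch might look like the following. `load_document` is a hypothetical helper, and the path in the usage comment is an assumption based on the directory layout used later in this notebook. The explicit `encoding="utf-8"` matters because the document contains Czech diacritics, which are easily corrupted when a platform default encoding is used.

```python
from pathlib import Path


def load_document(path: Path, fallback: str) -> str:
    """Read a Markdown document as UTF-8, or return `fallback` if the file is missing."""
    if path.is_file():
        return path.read_text(encoding="utf-8")
    return fallback


# Possible usage in this notebook (path is an assumption):
# sample_path = Path.cwd().parent / "data" / "sample_business_request.md"
# business_document = load_document(sample_path, fallback=business_document)
```

Falling back to the hardcoded string keeps the notebook runnable even when the sample file is absent.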


## 2. Parse Document Using LangGraph

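Instead of calling `parse_business_request()` from `src/tool0/`, this v1 cell builds the agent inline with `create_agent` and requests structured output that conforms to the `BusinessRequest` schema.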

In [4]:
# Parse the business document using LangGraph
from langchain.agents import create_agent

print("🔄 Parsing document with LangGraph...")

# System prompt for parsing
SYSTEM_PROMPT = """You are a business requirements parser. Your task is to extract structured information from business request documents.

Documents may contain a mix of Czech and English. Common section headers include:
- "Projekt" / "Project" - project metadata (name, sponsor, date)
- "Cíl" / "Goal" - main project objective
- "Rozsah" / "Scope" - what is in/out of scope
- "Klíčové entity & metriky" / "Key entities & metrics" - business entities and KPIs
- "Očekávané zdroje" / "Expected sources" - data sources
- "Omezení" / "Constraints" - limitations and requirements
- "Požadované artefakty" / "Required artifacts" - deliverables

IMPORTANT INSTRUCTIONS:
1. Extract information into the structured format exactly as specified
2. Use "unknown" for any missing sections
3. Ensure dates are in ISO 8601 format (YYYY-MM-DD)
4. Extract lists as arrays of strings, not concatenated text
5. For project metadata, look for project name, sponsor name, and submission date
6. Any additional metadata fields should go into the "extra" dictionary
7. Be thorough - extract all relevant information from the document
"""

# Create agent with structured output
agent = create_agent(
    model="openai:gpt-5-mini",
    response_format=BusinessRequest,  # Auto-selects best strategy
    system_prompt=SYSTEM_PROMPT
)

# Prepare user message
user_message = f"""Parse the following business request document:

{business_document}

Extract all information into the structured format."""

# Invoke agent
result = agent.invoke({
    "messages": [
        {"role": "user", "content": user_message}
    ]
})

# Extract structured response
structured_response = result.get("structured_response")

if not structured_response:
    raise ValueError("No structured response returned from agent")

# Convert to dict (Pydantic v2 models always provide model_dump())
parsed_json = structured_response.model_dump()

# Keep the last message as the raw response; fall back to the structured result
messages = result.get("messages") or []
raw_response = str(messages[-1] if messages else structured_response)

# Full prompt for audit
prompt_used = f"System: {SYSTEM_PROMPT}\n\nUser: {user_message}"

print("✅ Parsing complete!")

🔄 Parsing document with LangGraph...
✅ Parsing complete!
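
The system prompt asks the model to use "unknown" for missing sections, and the schema defaults list fields to empty lists. A quick way to see which fields were left unfilled is sketched below. `find_unfilled_fields` is a hypothetical helper, and the `sample` dictionary is made-up illustration data, not output from this notebook.

```python
def find_unfilled_fields(parsed: dict, prefix: str = "") -> list[str]:
    """List dotted paths whose value is "unknown", an empty string, or an empty list."""
    unfilled = []
    for key, value in parsed.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects such as project_metadata
            unfilled.extend(find_unfilled_fields(value, prefix=f"{path}."))
        elif value in ("unknown", "", []):
            unfilled.append(path)
    return unfilled


# Made-up example data for illustration only
sample = {
    "project_metadata": {
        "project_name": "Demo",
        "sponsor": "unknown",
        "submitted_at": "2025-10-25",
        "extra": {},
    },
    "goal": "unknown",
    "entities": [],
    "metrics": ["On-time delivery rate"],
}
print(find_unfilled_fields(sample))
# ['project_metadata.sponsor', 'goal', 'entities']
```

In this notebook the function could be called as `find_unfilled_fields(parsed_json)` after the parse step.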


## 3. Display Parsed JSON

Show the structured output directly under this cell.

In [5]:
# Display parsed JSON
print("📊 Parsed Business Request:")
print("=" * 60)
print(json.dumps(parsed_json, indent=2, ensure_ascii=False))

# Also show as Pydantic model
print("\n" + "=" * 60)
print("📋 Validation:")
try:
    validated = BusinessRequest.model_validate(parsed_json)
    print(f"✅ Schema valid: {validated.project_metadata.project_name}")
    print(f"   Sponsor: {validated.project_metadata.sponsor}")
    print(f"   Date: {validated.project_metadata.submitted_at}")
    print(f"   Entities: {len(validated.entities)} found")
    print(f"   Sources: {len(validated.sources)} found")
except Exception as e:
    print(f"❌ Validation error: {e}")

📊 Parsed Business Request:
============================================================
{
  "project_metadata": {
    "project_name": "BA/BS Supplier Analytics Platform",
    "sponsor": "Petra Nováková",
    "submitted_at": "2025-10-25",
    "extra": {
      "department": "Business Analytics",
      "priority": "Vysoká"
    }
  },
  "goal": "Vytvořit komplexní analytickou platformu pro reporting a analýzu dodavatelských dat v BA/BS datamartů. Platforma umožní business uživatelům sledovat výkonnost dodavatelů, identifikovat rizikové partnery a optimalizovat nákupní procesy na základě historických dat.",
  "scope_in": "Analýza dodavatelů (suppliers) a jejich výkonnosti; Reporting nákupních objednávek (purchase orders); Sledování kvality dodávek (delivery quality metrics); Dimenzionální model pro dodavatelské údaje; Integrace s existujícími BA/BS datamartami",
  "scope_out": "HR data o zaměstnancích; Finanční forecasting a budgetování; Real-time monitoring dodávek; Integrace s externí CRM systémy",
  "

## 4. Save Results to data/tool0_samples/

Save both JSON result and prompt for regression testing.
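
One way the saved JSON could later serve as a regression baseline is sketched below: load a stored result and list the fields that differ from a fresh parse. `diff_parsed` and `diff_against_baseline` are hypothetical helpers. Since LLM output can vary between runs, an exact comparison like this may be too strict for free-text fields such as `goal`, so treat it as a starting point.

```python
import json
from pathlib import Path

_MISSING = object()  # distinguishes an absent key from an explicit None


def diff_parsed(expected: dict, actual: dict, prefix: str = "") -> list[str]:
    """Return dotted paths of keys whose values differ between two parsed results."""
    changed = []
    for key in sorted(set(expected) | set(actual)):
        path = f"{prefix}{key}"
        old = expected.get(key, _MISSING)
        new = actual.get(key, _MISSING)
        if isinstance(old, dict) and isinstance(new, dict):
            changed.extend(diff_parsed(old, new, prefix=f"{path}."))
        elif old != new:
            changed.append(path)
    return changed


def diff_against_baseline(baseline_path: Path, actual: dict) -> list[str]:
    """Compare a fresh parse against a JSON file saved by an earlier run."""
    expected = json.loads(baseline_path.read_text(encoding="utf-8"))
    return diff_parsed(expected, actual)
```

An empty list means the fresh parse matches the stored baseline exactly.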

In [6]:
# Save results to data/tool0_samples/
timestamp = datetime.now().isoformat()
output_dir = Path.cwd().parent / 'data' / 'tool0_samples'
output_dir.mkdir(parents=True, exist_ok=True)

# Save JSON result
json_path = output_dir / f"{timestamp}.json"
with open(json_path, 'w', encoding='utf-8') as f:
    json.dump(parsed_json, f, indent=2, ensure_ascii=False)

# Save prompt
md_path = output_dir / f"{timestamp}.md"
with open(md_path, 'w', encoding='utf-8') as f:
    f.write(f"# Parse Request - {timestamp}\n\n")
    f.write(f"## Prompt Used\n\n```\n{prompt_used}\n```\n\n")
    f.write(f"## Raw Response\n\n```\n{raw_response}\n```\n\n")
    f.write(f"## Parsed JSON\n\n```json\n{json.dumps(parsed_json, indent=2, ensure_ascii=False)}\n```\n")

print("💾 Results saved:")
print(f"   JSON: {json_path}")
print(f"   Markdown: {md_path}")

💾 Results saved:
   JSON: /Users/marekminarovic/archi-agent/data/tool0_samples/2025-10-31T01:14:27.960789.json
   Markdown: /Users/marekminarovic/archi-agent/data/tool0_samples/2025-10-31T01:14:27.960789.md
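
The file names above contain colons because `datetime.now().isoformat()` is used as the file stem. Colons are not permitted in Windows file names, so if the notebook needs to run there, a colon-free timestamp could be used instead. `filename_timestamp` is a hypothetical helper shown only as a sketch.

```python
from datetime import datetime


def filename_timestamp(moment: datetime) -> str:
    """Format a datetime as a sortable, colon-free string for use in file names."""
    return moment.strftime("%Y%m%dT%H%M%S_%f")


print(filename_timestamp(datetime(2025, 10, 31, 1, 14, 27, 960789)))
# 20251031T011427_960789
```

The resulting strings still sort chronologically, so saved samples keep their order in a directory listing.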


## 5. Summary

✅ **Acceptance Criteria Met (v1 - Inline Approach):**
- [x] Jupyter notebook with sample business document (hardcoded)
- [x] Single LLM call (no regex) converts to valid JSON
- [x] Structured output via Pydantic schema (BusinessRequest)
- [x] JSON displayed under cell
- [x] Results saved to `data/tool0_samples/` (JSON + Markdown)
- [x] Inline implementation (no imports from `src/tool0/` for v1 testing)

**Implementation Details:**
- **Schemas:** Defined inline in Cell 3, shown as `In [2]` (ProjectMetadata, BusinessRequest)
- **Document:** Hardcoded in Cell 5, shown as `In [3]` (no file I/O)
- **Parser:** Inline LangGraph-based agent created with `create_agent` in Cell 7, shown as `In [4]`
- **Model:** OpenAI GPT-5-mini with structured output
- **Output:** parsed_json, raw_response, prompt_used for audit trail

**Next Steps:**
- Run compliance checker: `python3 .claude/skills/langchain/compliance-checker/check.py --file src/tool0/parser.py`
- Update story frontmatter: `skill_created: true`, `skill_status: ready_to_execute`
- Refactor to modular structure (optional - use src/tool0/parser.py after v1 validation)