# Notebook 2: Output Contract (JSON) + Validation + Repair (Production Approach)

This notebook brings LLM output closer to production quality:
- JSON-only contract
- Parse
- Schema validation (Pydantic)
- Repair + retry on invalid output



## Setup

Configure the Azure OpenAI client with an API key.

In [None]:
# Requirements
%pip -q install -U langchain-core langchain-openai langchain-google-genai tiktoken python-dotenv matplotlib pandas==2.2.2 pydantic==2.12.3


In [None]:
import os, json, re
from typing import Dict, Any
from langchain_openai import AzureChatOpenAI

# --- Azure OpenAI Configuration ---
AZURE_ENDPOINT = "https://sd-rg.cognitiveservices.azure.com/"
AZURE_API_KEY = ""  # api key
AZURE_DEPLOYMENT = "vodafone_rag_module"
API_VERSION = "2024-12-01-preview"

def get_llm():
    """
    Initializes the Azure OpenAI client using the provided configuration.

    Returns:
        tuple: (llm_instance, provider_name_string)

    Raises:
        RuntimeError: If API credentials are missing or connection fails.
    """
    # Check if essential credentials are present
    if AZURE_API_KEY and AZURE_ENDPOINT:
        try:
            llm = AzureChatOpenAI(
                azure_deployment=AZURE_DEPLOYMENT,
                api_version=API_VERSION,
                azure_endpoint=AZURE_ENDPOINT,
                api_key=AZURE_API_KEY,
                temperature=0.1,
                max_retries=2
            )
            return llm, f'Azure OpenAI ({AZURE_DEPLOYMENT})'
        except Exception as e:
            raise RuntimeError(f"Azure connection error: {e}")

    raise RuntimeError('Azure API credentials are missing!')

# --- Initialization ---
try:
    llm, provider = get_llm()
    print('✅ LLM ready:', provider)

    # Uncomment the line below to test the connection immediately
    # print("Test Response:", llm.invoke("Hello, are you active?").content)

except Exception as e:
    # If initialization fails, set llm to None to prevent subsequent NameErrors
    llm = None
    print(f"❌ Error occurred: {e}")

def llm_text(prompt: str) -> str:
    """
    Sends a prompt to the LLM and retrieves the text response.

    Args:
        prompt (str): The input string to send to the model.

    Returns:
        str: The clean text response from the model.
    """
    if llm is None:
        # Fail loudly: returning an error string would be mistaken for model output downstream
        raise RuntimeError("LLM is not initialized.")

    resp = llm.invoke(prompt)
    # Safely retrieve content whether it's an object or string
    return getattr(resp, 'content', str(resp)).strip()

def strip_fences(s: str) -> str:
    """
    Removes Markdown code fences (e.g., ```json ... ```) from a string.

    Args:
        s (str): The input string containing code fences.

    Returns:
        str: Cleaned string without the fences.
    """
    s = s.strip()
    # Remove starting ```json or ``` (case insensitive)
    s = re.sub(r'^```(json)?\s*', '', s, flags=re.IGNORECASE)
    # Remove ending ```
    s = re.sub(r'\s*```$', '', s)
    return s.strip()

✅ LLM ready: Azure OpenAI (vodafone_rag_module)
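
Models sometimes wrap JSON in Markdown fences even when told not to. The sketch below repeats `strip_fences` from the cell above so that it runs on its own, and applies it to two sample strings invented for illustration. It also shows a limit of the helper: only fences at the very start and end of the string are removed, so prose placed before the opening fence survives and still breaks parsing.

```python
import json
import re

def strip_fences(s: str) -> str:
    """Same logic as the helper defined above, repeated so this cell runs alone."""
    s = s.strip()
    s = re.sub(r'^```(json)?\s*', '', s, flags=re.IGNORECASE)
    s = re.sub(r'\s*```$', '', s)
    return s.strip()

# Hypothetical model outputs, invented for illustration.
fenced = '```json\n{"urgency": "high"}\n```'
with_preamble = 'Here is the JSON:\n```json\n{"urgency": "high"}\n```'

# Fences at both ends are removed, so the payload parses.
parsed = json.loads(strip_fences(fenced))

# The opening fence is not at position 0 here, so only the closing fence is removed.
leftover = strip_fences(with_preamble)
try:
    json.loads(leftover)
    preamble_parses = True
except json.JSONDecodeError:
    preamble_parses = False

print(parsed)
print(preamble_parses)
```

Cases like the second one are what the repair step later in the notebook is meant to catch.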


In [None]:
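# Mini dataset (emails/tickets) for the demo.
# Email texts stay in Turkish, exactly as sent to the model; English glosses are in comments.
EMAILS = [
    # E1: "My package still hasn't arrived. I've been waiting 7 days. I want an urgent solution!"
    {'id': 'E1', 'text': 'Kargom hâlâ gelmedi. 7 gündür bekliyorum. Acil çözüm istiyorum!', 'notes': 'Delay + high urgency'},
    # E2: "The product arrived broken. Can we arrange a replacement?"
    {'id': 'E2', 'text': 'Ürün kırık geldi. Değişim yapabilir miyiz?', 'notes': 'Damaged product'},
    # E3: "How can I start the return process? I threw away the box, but I still have the product."
    {'id': 'E3', 'text': 'İade sürecini nasıl başlatabilirim? Kutuyu attım ama ürün duruyor.', 'notes': 'Return + edge case (no box)'},
    # E4: "My card appears to have been charged twice. Please check immediately."
    {'id': 'E4', 'text': 'Kartımdan iki kez çekim yapılmış görünüyor. Lütfen hemen kontrol edin.', 'notes': 'Billing + high urgency'},
    # E5: "Could you share your product's user manual?"
    {'id': 'E5', 'text': 'Ürününüzün kullanım kılavuzunu paylaşır mısınız?', 'notes': 'Information request (low)'},
]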
len(EMAILS)

5

## 1) JSON Schema (Pydantic)

In [None]:
from pydantic import BaseModel, Field, ValidationError
from typing import Literal

class TriageOut(BaseModel):
    category: str = Field(min_length=1)
    urgency: Literal['low','medium','high']
    reason: str = Field(min_length=1, max_length=240)

## 2) Contract definition

In [None]:
SCHEMA = (
    'Return ONLY valid JSON with exactly these keys:\n'
    '{\n'
    '  "category": "string",\n'
    '  "urgency": "low|medium|high",\n'
    '  "reason": "string (max 1 sentence)"\n'
    '}\n'
    'No extra text, no markdown, JSON only.'
)

## 3) What happens without a contract?

The model usually returns free-form explanatory text, which is hard to parse reliably.

In [None]:
def prompt_no_contract(email_text: str) -> str:
    return (
        'Classify this email: category and urgency.\n'
        'Explain briefly.\n\n'
        'Email:\n' + email_text
    )

raw = llm_text(prompt_no_contract(EMAILS[0]['text']))
print(raw)

Category: Customer Service / Delivery Issue  
Urgency: High  

Explanation: The email expresses frustration about a shipment that has not arrived after 7 days and requests an urgent resolution, indicating a high level of urgency.
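
To make the parsing problem concrete, the sketch below feeds a shortened copy of the free-form answer shown above to `json.loads`. The regex fallback is a hypothetical workaround added for illustration, not part of the notebook. It works only as long as the model keeps using the same labels, and the value it recovers (`High`) does not match the lowercase `high` that the schema expects.

```python
import json
import re

# Shortened copy of the free-form reply printed above.
free_form = (
    "Category: Customer Service / Delivery Issue\n"
    "Urgency: High\n\n"
    "Explanation: The email expresses frustration about a late shipment."
)

# Free-form text is not JSON, so direct parsing fails.
try:
    json.loads(free_form)
    parsed_ok = True
except json.JSONDecodeError as err:
    parsed_ok = False
    print("json.loads failed:", err.msg)

# Hypothetical fallback: scrape "Label: value" lines with a regex.
# This depends on the exact labels and casing the model happened to use.
fields = dict(re.findall(r"^(\w+):\s*(.+)$", free_form, flags=re.MULTILINE))
print(fields["Urgency"])
```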


## 4) Retrying with the contract

In [None]:
def prompt_with_contract(email_text: str) -> str:
    return (
        'You are a customer support triage assistant.\n'
        'Classify the email into a category and urgency.\n\n'
        + SCHEMA + '\n\n'
        + 'Email:\n' + email_text
    )

raw = llm_text(prompt_with_contract(EMAILS[0]['text']))
print(raw)

{
  "category": "Shipping Delay",
  "urgency": "high",
  "reason": "Customer has been waiting 7 days for their shipment and requests urgent resolution."
}


## 5) Parse + Validate

In [None]:
import json
from typing import Dict, Any

def parse_json(raw: str) -> Dict[str, Any]:
    return json.loads(strip_fences(raw))

def validate(obj: Dict[str, Any]) -> TriageOut:
    return TriageOut.model_validate(obj)

raw = llm_text(prompt_with_contract(EMAILS[0]['text']))
obj = parse_json(raw)
validated = validate(obj)
validated

TriageOut(category='Shipping Delay', urgency='high', reason='Customer has been waiting 7 days for their shipment and requests urgent resolution.')

## 6) Repair / Retry

A common approach in production:
- Make a first attempt
- If parsing or validation fails: feed the **output** back to the model and force it into the JSON format only
- Stabilize the result with 1-2 retries

In [None]:
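# Literal braces are doubled ({{ and }}) so that str.format() fills only {bad_output};
# with single braces, .format() raises KeyError as soon as the repair path runs.
REPAIR_PROMPT = (
    'You are a strict JSON formatter.\n'
    'Fix the following output to match this JSON schema exactly and return JSON only.\n\n'
    'Schema:\n{{\n  "category": "string",\n  "urgency": "low|medium|high",\n  "reason": "string (max 1 sentence)"\n}}\n\n'
    'Bad output:\n{bad_output}'
)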

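def triage_with_retry(email_text: str, max_retries: int = 1) -> TriageOut:
    """Classify an email; if parsing or validation fails, ask the model to repair its output."""
    last_raw = llm_text(prompt_with_contract(email_text))
    for attempt in range(max_retries + 1):
        try:
            obj = parse_json(last_raw)
            return validate(obj)
        except (json.JSONDecodeError, ValidationError):
            # Only contract violations trigger a repair; other errors propagate immediately
            if attempt >= max_retries:
                print('❌ Failed. Last raw:\n', last_raw)
                raise
            # Feed the bad output back to the model and try again with its repaired text
            last_raw = llm_text(REPAIR_PROMPT.format(bad_output=last_raw))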

out = triage_with_retry(EMAILS[2]['text'], max_retries=1)
out

TriageOut(category='Return/Refund', urgency='medium', reason='Customer wants to start a return process but the product is still showing as undelivered.')
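
To exercise the repair branch without calling a model, the loop can be driven by a stub. The sketch below is a simplified, standard-library-only variant of the retry pattern. `fake_llm`, `check`, and `triage_with_repair` are hypothetical names introduced here, and `check` is only a stand-in for the Pydantic validation used in the notebook.

```python
import json
from typing import Any, Callable, Dict, List

ALLOWED_URGENCY = {"low", "medium", "high"}
REQUIRED_KEYS = {"category", "urgency", "reason"}

def check(obj: Any) -> Dict[str, Any]:
    """Stand-in for the Pydantic model: raise ValueError on contract violations."""
    if not isinstance(obj, dict) or set(obj) != REQUIRED_KEYS:
        raise ValueError(f"expected exactly the keys {sorted(REQUIRED_KEYS)}")
    if obj["urgency"] not in ALLOWED_URGENCY:
        raise ValueError(f"bad urgency: {obj['urgency']!r}")
    return obj

def triage_with_repair(first_raw: str, repair: Callable[[str], str],
                       max_retries: int = 1) -> Dict[str, Any]:
    """Parse and check; on failure, hand the bad output to `repair` and try again."""
    raw = first_raw
    for attempt in range(max_retries + 1):
        try:
            return check(json.loads(raw))
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            if attempt >= max_retries:
                raise
            raw = repair(raw)
    raise ValueError("max_retries must be >= 0")

calls: List[str] = []

def fake_llm(bad_output: str) -> str:
    """Hypothetical stub for the repair call: records its input, returns valid JSON."""
    calls.append(bad_output)
    return '{"category": "Return/Refund", "urgency": "medium", "reason": "Wants to return an item."}'

bad = 'Sure! Category: Return, urgency: MEDIUM'
result = triage_with_repair(bad, fake_llm, max_retries=1)
print(result["urgency"], "after", len(calls), "repair call(s)")
```

Passing the repair step in as a callable keeps the loop testable offline; in the notebook that role is played by `llm_text` with `REPAIR_PROMPT`.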

## 7) Batch: how many were recovered by a retry?

In [None]:
import pandas as pd

rows = []
for e in EMAILS:
    try:
        out = triage_with_retry(e['text'], max_retries=1)
        rows.append({'id': e['id'], 'ok': True, 'category': out.category, 'urgency': out.urgency})
    except Exception:
        rows.append({'id': e['id'], 'ok': False, 'category': None, 'urgency': None})

pd.DataFrame(rows)

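|   | id | ok   | category            | urgency |
|---|----|------|---------------------|---------|
| 0 | E1 | True | Shipping Delay      | high    |
| 1 | E2 | True | Product Issue       | high    |
| 2 | E3 | True | Return/Refund       | medium  |
| 3 | E4 | True | Billing Issue       | high    |
| 4 | E5 | True | Product Information | low     |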


## 8) Exercise (3-5 min)

1) Is `reason` a single sentence? Add a regex check.
2) Define an allow-list for `category` and standardize the values.
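
One possible starting point for both exercises, as a standard-library sketch. The sentence heuristic, the `ALLOWED_CATEGORIES` list, and the `ALIASES` mapping are assumptions chosen for illustration; the notebook does not define them, and a real category list would come from the support team's own taxonomy. The heuristic counts sentence terminators, so abbreviations such as "e.g." will produce false negatives.

```python
import re

def is_single_sentence(text: str) -> bool:
    """Heuristic: at most one sentence terminator, and only at the very end."""
    text = text.strip()
    # Match '.', '!' or '?' followed by whitespace or end of string.
    enders = re.findall(r"[.!?](?=\s|$)", text)
    return len(enders) <= 1 and (not enders or text[-1] in ".!?")

# Hypothetical allow-list and alias map (alias keys are lowercase).
ALLOWED_CATEGORIES = ["Shipping", "Product Issue", "Return/Refund", "Billing", "Information"]
ALIASES = {
    "shipping delay": "Shipping",
    "billing issue": "Billing",
    "product information": "Information",
}

def normalize_category(raw: str) -> str:
    """Map a free-form category onto the allow-list, or raise ValueError."""
    key = raw.strip().lower()
    for allowed in ALLOWED_CATEGORIES:
        if key == allowed.lower():
            return allowed
    if key in ALIASES:
        return ALIASES[key]
    raise ValueError(f"unknown category: {raw!r}")

print(is_single_sentence("Customer reports a double charge."))
print(is_single_sentence("Box is gone. Product is fine."))
print(normalize_category("Shipping Delay"))
```

Both checks could be wired into the Pydantic model as field validators, so that a violation triggers the same repair path as any other schema error.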

➡️ Next notebook: token/context cost + CoT quality/cost trade-off.