# **Court Judgment Summarizer — Legal Document to Structured Excel Report**

This notebook demonstrates how to build an end-to-end pipeline for extracting structured summaries from court judgments using Sarvam AI's suite of models.

### **Use Case**
Automate legal document analysis for law firms, legal-tech platforms, and judicial research. The pipeline has three steps:

1. **Extract:** Use **Sarvam Vision Document Intelligence** to OCR the judgment PDF or image.
2. **Parse:** Use **Sarvam-M** to convert raw OCR text into clean, structured JSON (case details, parties, judges, acts cited, key issues, decision, reasoning).
3. **Export:** Use **openpyxl** to write the summary into a formatted two-sheet Excel report.

### **Supported Formats**
- Images: `.jpg`, `.jpeg`, `.png`
- Documents: `.pdf`
- Languages: English, plus Hindi, Kannada, Tamil, Telugu, Bengali, Gujarati, Marathi, and more

> ⚠️ **Disclaimer:** This notebook is for **demo and educational purposes only**. Do **not** use it as a substitute for legal advice or professional legal analysis. Always consult a qualified legal professional.

In [None]:
# Pin sarvamai for reproducibility; the other packages install at their latest versions
!pip install -Uqq sarvamai==0.1.24 openpyxl python-dotenv Pillow

### **1. Setup & API Key**

Obtain your API key from the [Sarvam AI Dashboard](https://dashboard.sarvam.ai).
Create a `.env` file in this directory with `SARVAM_API_KEY=your_key_here`, or set the environment variable directly.

In [None]:
from __future__ import annotations

import os
import json
import re
import zipfile
import tempfile
from pathlib import Path

from dotenv import load_dotenv
from sarvamai import SarvamAI

load_dotenv()

SARVAM_API_KEY = os.environ.get("SARVAM_API_KEY", "YOUR_SARVAM_API_KEY")
client = SarvamAI(api_subscription_key=SARVAM_API_KEY)

print("Client initialised.")

### **2. Step 1 — EXTRACT: Document Intelligence Helper**

`extract_judgment_text` sends the judgment file to Sarvam Vision Document Intelligence and returns the extracted text as a Markdown string.

The API uses an async job workflow: create → upload → start → wait → download (ZIP) → unzip.

> **Note:** The API accepts `.pdf` or `.zip` only. PNG/JPG images are automatically wrapped in a ZIP before upload.

In [None]:
_IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png'}


def extract_judgment_text(file_path: str) -> str:
    """Extract text from a judgment image or PDF using Sarvam Document Intelligence.

    Images (.jpg, .jpeg, .png) are automatically wrapped in a ZIP archive before upload,
    as the API only accepts PDF or ZIP files directly.
    """
    path = Path(file_path)
    upload_path = file_path
    tmp_zip: str | None = None

    if path.suffix.lower() in _IMAGE_EXTENSIONS:
        # Wrap image in a flat ZIP — required by the Document Intelligence API
        with tempfile.NamedTemporaryFile(suffix='.zip', delete=False) as tmp:
            tmp_zip = tmp.name
        with zipfile.ZipFile(tmp_zip, 'w', zipfile.ZIP_DEFLATED) as zf:
            zf.write(file_path, arcname=path.name)
        upload_path = tmp_zip

    try:
        job = client.document_intelligence.create_job(
            language="en-IN",
            output_format="md"
        )
        job.upload_file(upload_path)
        job.start()

        status = job.wait_until_complete()
        if status.job_state != "Completed":
            raise RuntimeError(
                f"Document Intelligence job ended with state: {status.job_state}. "
                f"Details: {status}"
            )

        # Download ZIP output to a temp file, then extract markdown in-memory
        with tempfile.NamedTemporaryFile(suffix='.zip', delete=False) as tmp:
            out_zip = tmp.name

        try:
            job.download_output(out_zip)
            with zipfile.ZipFile(out_zip, 'r') as zf:
                md_files = [f for f in zf.namelist() if f.endswith('.md')]
                if not md_files:
                    raise RuntimeError(
                        "No markdown output found in Document Intelligence result. "
                        f"ZIP contents: {zf.namelist()}"
                    )
                with zf.open(md_files[0]) as f:
                    return f.read().decode('utf-8')
        finally:
            os.unlink(out_zip)

    finally:
        if tmp_zip:
            os.unlink(tmp_zip)


print("extract_judgment_text defined.")
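
Because the job runs asynchronously, an unsupported file may only surface as a failed job after upload. A local pre-flight check can catch this earlier. The sketch below assumes a hypothetical helper, `check_supported_file`, which is not part of the Sarvam SDK; the extension set is taken from the Supported Formats list plus `.zip`, which the API accepts directly.

```python
from pathlib import Path

# Extensions from the Supported Formats list; .zip is accepted by the API as-is.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf", ".zip"}


def check_supported_file(file_path: str) -> Path:
    """Hypothetical pre-flight check: return the path if it has a supported extension and exists."""
    path = Path(file_path)
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(
            f"Unsupported file type '{path.suffix}'. "
            f"Expected one of: {sorted(SUPPORTED_EXTENSIONS)}"
        )
    if not path.is_file():
        raise FileNotFoundError(f"File not found: {path}")
    return path
```

Calling `check_supported_file(file_path)` at the top of `extract_judgment_text` would be one way to wire it in.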

### **3. Step 2 — PARSE: Structured JSON Extraction**

`parse_judgment` sends the raw OCR markdown to **Sarvam-M** with a structured prompt, checks that the reply is valid JSON, and returns it as a Python dict.

A `confidence` score below **0.85** triggers a warning — flag those judgments for manual review before relying on the extracted data.

In [None]:
PARSE_SYSTEM_PROMPT = """You are a precise legal document analyser. Extract the following fields from the court judgment text and return ONLY valid JSON with no other text, no markdown fences, no explanation.

Required JSON schema:
{
  \"case_number\": \"<string or null>\",
  \"court_name\": \"<string or null>\",
  \"judgment_date\": \"<DD-MM-YYYY format or null>\",
  \"petitioner\": \"<string or null>\",
  \"respondent\": \"<string or null>\",
  \"judges\": [\"<name of judge>\", \"...\"],
  \"acts_cited\": [\"<Act name and section>\", \"...\"],
  \"key_issues\": [\"<legal issue>\", \"...\"],
  \"decision\": \"<outcome of the case>\",
  \"reasoning_summary\": \"<brief summary of the court's reasoning>\",
  \"relief_granted\": \"<relief awarded or null>\",
  \"language_detected\": \"<primary language of the judgment, e.g. English, Hindi, Kannada>\",
  \"confidence\": <float between 0.0 and 1.0>
}

Rules:
- Use null (not \"null\") for optional string fields not present in the judgment
- judges, acts_cited, and key_issues must always be arrays, even if empty
- decision and reasoning_summary are required; use an empty string if not found
- confidence reflects how completely all fields could be read (1.0 = perfect, 0.0 = unreadable)
- Return ONLY the JSON object"""


def parse_judgment(raw_text: str) -> dict:
    """Parse raw judgment text into structured JSON using Sarvam-M."""
    response = client.chat.completions(
        messages=[
            {"role": "system", "content": PARSE_SYSTEM_PROMPT},
            {"role": "user", "content": f"Extract data from this court judgment:\n\n{raw_text}"}
        ]
    )

    if not response or not response.choices:
        raise ValueError("Sarvam-M returned no response. Check your API quota.")

    raw_json = response.choices[0].message.content.strip()
    # Strip markdown code fences if the model wraps output anyway
    raw_json = re.sub(r'^```(?:json)?\s*|\s*```$', '', raw_json, flags=re.DOTALL).strip()

    try:
        parsed = json.loads(raw_json)
    except json.JSONDecodeError:
        print(f"ERROR: Could not parse JSON from model response:\n{raw_json}")
        raise

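    # The model may omit confidence or return null; treat that as 0.0 so it gets flagged
    confidence = parsed.get("confidence")
    if not isinstance(confidence, (int, float)):
        confidence = 0.0
    if confidence < 0.85: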
        print(
            f"WARNING: Low confidence ({confidence:.2f}) — review this judgment manually "
            "before relying on the extracted data."
        )

    return parsed


print("parse_judgment defined.")
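
`json.loads` only confirms that the reply is well-formed JSON; it does not check that the fields match the schema in the prompt. A minimal sketch of such a check follows, assuming a hypothetical helper named `validate_judgment` that reports problems instead of raising. The field names and rules come from `PARSE_SYSTEM_PROMPT`.

```python
from datetime import datetime

# Field groups taken from the rules in PARSE_SYSTEM_PROMPT.
REQUIRED_LIST_FIELDS = ("judges", "acts_cited", "key_issues")
REQUIRED_TEXT_FIELDS = ("decision", "reasoning_summary")


def validate_judgment(parsed: dict) -> list:
    """Hypothetical helper: return a list of schema problems (empty if none are found)."""
    problems = []
    for field in REQUIRED_LIST_FIELDS:
        if not isinstance(parsed.get(field), list):
            problems.append(f"{field} should be a list")
    for field in REQUIRED_TEXT_FIELDS:
        if not isinstance(parsed.get(field), str):
            problems.append(f"{field} should be a string")

    # judgment_date is optional, but when present it must be DD-MM-YYYY
    date = parsed.get("judgment_date")
    if date is not None:
        try:
            datetime.strptime(date, "%d-%m-%Y")
        except (TypeError, ValueError):
            problems.append("judgment_date should be DD-MM-YYYY or null")

    confidence = parsed.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        problems.append("confidence should be a number between 0.0 and 1.0")
    return problems
```

A non-empty result could be treated the same way as a low confidence score: flag the judgment for manual review.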

### **4. Step 3 — EXPORT: Excel Judgment Summary Writer**

`write_to_excel` takes a parsed judgment dict and writes it into a formatted `.xlsx` file with two sheets:

- **Case Summary** — case number, court, date, parties, judges, acts cited, language, confidence
- **Legal Analysis** — key issues, decision, reasoning summary, relief granted

In [None]:
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment


def _join(items: list, sep: str = "\n") -> str:
    """Join a list into a single string, returning empty string for empty lists."""
    return sep.join(str(i) for i in items) if items else ""


def write_to_excel(judgment: dict, output_path: str) -> None:
    """Write a parsed judgment dict to a two-sheet Excel workbook.

    Sheet 1 — Case Summary: case/party/court metadata.
    Sheet 2 — Legal Analysis: issues, decision, reasoning, relief.
    """
    wb = Workbook()
    header_fill = PatternFill(start_color="1F4E79", end_color="1F4E79", fill_type="solid")
    header_font = Font(color="FFFFFF", bold=True, size=14)
    label_font  = Font(bold=True)
    wrap_align  = Alignment(wrap_text=True, vertical="top")

    def _sheet_title(ws, title: str) -> None:
        """Add a full-width dark blue title row to a worksheet."""
        ws.merge_cells("A1:B1")
        cell = ws.cell(row=1, column=1, value=title)
        cell.font      = header_font
        cell.fill      = header_fill
        cell.alignment = Alignment(horizontal="center")

    def _fill_rows(ws, fields: list, start_row: int = 2) -> None:
        """Write label-value pairs into a worksheet starting at start_row."""
        for row_idx, (label, value) in enumerate(fields, start=start_row):
            label_cell = ws.cell(row=row_idx, column=1, value=label)
            label_cell.font = label_font
            val_cell = ws.cell(row=row_idx, column=2, value=value)
            val_cell.alignment = wrap_align

    # ── Sheet 1: Case Summary ─────────────────────────────────────────
    ws_case = wb.active
    ws_case.title = "Case Summary"
    _sheet_title(ws_case, "Case Summary")

    _fill_rows(ws_case, [
        ("Case Number",       judgment.get("case_number")),
        ("Court Name",        judgment.get("court_name")),
        ("Judgment Date",     judgment.get("judgment_date")),
        ("Petitioner",        judgment.get("petitioner")),
        ("Respondent",        judgment.get("respondent")),
        ("Judges",            _join(judgment.get("judges", []))),
        ("Acts Cited",        _join(judgment.get("acts_cited", []))),
        ("Language Detected", judgment.get("language_detected")),
        ("Confidence",        judgment.get("confidence")),
    ])

    ws_case.column_dimensions["A"].width = 22
    ws_case.column_dimensions["B"].width = 55

    # ── Sheet 2: Legal Analysis ───────────────────────────────────────
    ws_legal = wb.create_sheet(title="Legal Analysis")
    _sheet_title(ws_legal, "Legal Analysis")

    _fill_rows(ws_legal, [
        ("Key Issues",        _join(judgment.get("key_issues", []))),
        ("Decision",          judgment.get("decision")),
        ("Reasoning Summary", judgment.get("reasoning_summary")),
        ("Relief Granted",    judgment.get("relief_granted")),
    ])

    ws_legal.column_dimensions["A"].width = 22
    ws_legal.column_dimensions["B"].width = 70

    wb.save(output_path)
    print(f"Judgment summary saved to: {output_path}")


print("write_to_excel defined.")

### **5. End-to-End Pipeline**

`process_judgment` ties all three steps together. Pass a single file path (image or PDF) and an output Excel path.

In [None]:
def process_judgment(file_path: str, output_path: str = "judgment_summary.xlsx") -> dict | None:
    """
    End-to-end pipeline: extract -> parse -> export for a single court judgment file.

    Args:
        file_path:   Path to a judgment image (.jpg, .jpeg, .png) or PDF (.pdf).
        output_path: Path for the output Excel file.

    Returns:
        Parsed judgment dict, or None if processing failed.
    """
    print(f"Processing: {file_path}")
    try:
        print("  Step 1/3 — Extracting text via Document Intelligence...")
        raw_text = extract_judgment_text(file_path)

        print("  Step 2/3 — Parsing structured data with Sarvam-M...")
        parsed = parse_judgment(raw_text)

        print("  Step 3/3 — Writing judgment summary to Excel...")
        write_to_excel(parsed, output_path)

        decision_preview = str(parsed.get("decision", ""))[:60]
        print(
            f"\nCase: {parsed.get('case_number')} | "
            f"Court: {parsed.get('court_name')} | "
            f"Decision: {decision_preview} | "
            f"Confidence: {parsed.get('confidence') or 0:.2f}"
        )
        return parsed

    except Exception as e:
        print(f"ERROR: Failed to process {file_path}: {e}")
        return None


print("process_judgment defined.")
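
`process_judgment` handles one file at a time. To process a folder of judgments, a thin wrapper can loop over the supported files. This is a sketch under assumptions: `process_folder` and its `output_dir` default are hypothetical names, and the processing function is passed in as `process_fn` so the wrapper stays self-contained.

```python
from pathlib import Path

# Extensions from the Supported Formats list.
_BATCH_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf"}


def process_folder(folder: str, process_fn, output_dir: str = "summaries") -> dict:
    """Hypothetical batch wrapper: call process_fn(file_path, output_path) for each supported file.

    Returns a dict mapping each file name to whatever process_fn returned.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in _BATCH_EXTENSIONS:
            continue  # skip sub-folders and unsupported files
        # One workbook per input file, named after the input's stem
        output_path = out_dir / f"{path.stem}_summary.xlsx"
        results[path.name] = process_fn(str(path), str(output_path))
    return results
```

With the function defined above, `process_folder("judgments", process_judgment)` would map each file name to its parsed dict, or to `None` where processing failed. Two inputs that share a stem (for example `a.pdf` and `a.png`) would write to the same workbook, so this naming scheme assumes unique stems.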

### **6. Demo — Run the Pipeline**

The cell below creates a synthetic typed court judgment image using Pillow (no real document required), then runs the full pipeline. The image mimics a High Court judgment with a formal letterhead, parties section, acts cited, and decision.

In [None]:
from PIL import Image, ImageDraw, ImageFont


def _create_sample_judgment(output_path: str = "sample_data/sample_judgment.png") -> str:
    """Create a synthetic typed court judgment image for demo purposes."""
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)

    img  = Image.new("RGB", (820, 1120), color="white")
    draw = ImageDraw.Draw(img)

    try:
        # Helvetica.ttc is a macOS system font; other platforms fall back to Pillow's default font
        font_title = ImageFont.truetype("/System/Library/Fonts/Helvetica.ttc", 16)
        font_bold  = ImageFont.truetype("/System/Library/Fonts/Helvetica.ttc", 14)
        font_body  = ImageFont.truetype("/System/Library/Fonts/Helvetica.ttc", 13)
        font_small = ImageFont.truetype("/System/Library/Fonts/Helvetica.ttc", 12)
    except (IOError, OSError):
        font_title = font_bold = font_body = font_small = ImageFont.load_default()

    BLACK = "black"

    # Outer border
    draw.rectangle([(15, 15), (805, 1105)], outline=BLACK, width=2)

    y = 35

    # Court header
    draw.text((410, y), "IN THE HIGH COURT OF KARNATAKA", font=font_title, fill=BLACK, anchor="mm")
    y += 22
    draw.text((410, y), "AT BENGALURU", font=font_title, fill=BLACK, anchor="mm")
    y += 20
    draw.line([(30, y), (790, y)], fill=BLACK, width=1)
    y += 14
    draw.text((410, y), "WRIT PETITION NO. WP 12345/2024", font=font_bold, fill=BLACK, anchor="mm")
    y += 20
    draw.text((410, y), "(Under Article 226 of the Constitution of India)", font=font_small, fill=BLACK, anchor="mm")
    y += 22
    draw.line([(30, y), (790, y)], fill=BLACK, width=1)
    y += 14

    # Parties
    draw.text((35, y), "BETWEEN:", font=font_bold, fill=BLACK)
    y += 22
    draw.text((50, y), "Ramesh Kumar", font=font_body, fill=BLACK)
    draw.text((620, y), "... Petitioner", font=font_body, fill=BLACK)
    y += 20
    draw.text((50, y), "S/o Mohan Kumar, 45 MG Road, Bengaluru - 560001", font=font_small, fill=BLACK)
    y += 24
    draw.text((410, y), "AND", font=font_bold, fill=BLACK, anchor="mm")
    y += 22
    draw.text((50, y), "State of Karnataka", font=font_body, fill=BLACK)
    draw.text((620, y), "... Respondent", font=font_body, fill=BLACK)
    y += 20
    draw.text((50, y), "rep. by Chief Secretary, Vidhana Soudha, Bengaluru - 560001", font=font_small, fill=BLACK)
    y += 22
    draw.line([(30, y), (790, y)], fill=BLACK, width=1)
    y += 14

    # Judge and date
    draw.text((35, y), "BEFORE:", font=font_bold, fill=BLACK)
    y += 20
    draw.text((50, y), "HON'BLE JUSTICE S. SURESH KUMAR", font=font_body, fill=BLACK)
    y += 22
    draw.text((35, y), "Date of Judgment: 05-02-2025", font=font_bold, fill=BLACK)
    y += 22
    draw.line([(30, y), (790, y)], fill=BLACK, width=1)
    y += 18

    # Judgment heading
    draw.text((410, y), "J U D G M E N T", font=font_title, fill=BLACK, anchor="mm")
    y += 28

    # Acts cited
    draw.text((35, y), "ACTS CITED:", font=font_bold, fill=BLACK)
    y += 20
    draw.text((50, y), "1.  Constitution of India, Article 226", font=font_body, fill=BLACK)
    y += 18
    draw.text((50, y), "2.  Karnataka Land Revenue Act, 1964, Section 94", font=font_body, fill=BLACK)
    y += 26

    # Key issue
    draw.text((35, y), "KEY ISSUE:", font=font_bold, fill=BLACK)
    y += 20
    draw.text((50, y), "Whether the impugned order of demolition was passed without adequate", font=font_body, fill=BLACK)
    y += 18
    draw.text((50, y), "notice to the petitioner, violating principles of natural justice.", font=font_body, fill=BLACK)
    y += 26

    # Reasoning
    draw.text((35, y), "REASONING:", font=font_bold, fill=BLACK)
    y += 20
    for line in [
        "The respondent failed to issue mandatory notice under Section 94 of the",
        "Karnataka Land Revenue Act, 1964 before passing the demolition order,",
        "thereby violating the principles of natural justice. The petitioner was",
        "not given an opportunity to be heard before the impugned order was passed.",
    ]:
        draw.text((50, y), line, font=font_body, fill=BLACK)
        y += 18
    y += 18

    draw.line([(30, y), (790, y)], fill=BLACK, width=1)
    y += 18

    # Decision
    draw.text((35, y), "DECISION:", font=font_bold, fill=BLACK)
    y += 20
    draw.text((50, y), "Petition ALLOWED. The order of demolition dated 01-01-2025 is", font=font_body, fill=BLACK)
    y += 18
    draw.text((50, y), "hereby QUASHED.", font=font_body, fill=BLACK)
    y += 26

    # Relief
    draw.text((35, y), "RELIEF GRANTED:", font=font_bold, fill=BLACK)
    y += 20
    draw.text((50, y), "1.  Stay of demolition order with immediate effect.", font=font_body, fill=BLACK)
    y += 18
    draw.text((50, y), "2.  Respondent directed to issue fresh notice within 30 days.", font=font_body, fill=BLACK)
    y += 50

    draw.line([(30, y), (790, y)], fill=BLACK, width=1)
    y += 18

    # Signature
    draw.text((620, y), "Sd/-", font=font_bold, fill=BLACK)
    y += 20
    draw.text((580, y), "HON'BLE JUSTICE S. SURESH KUMAR", font=font_small, fill=BLACK)

    img.save(output_path)
    print(f"Sample judgment created: {output_path}")
    return output_path


# --- Run the demo ---
sample_path = _create_sample_judgment()
result = process_judgment(sample_path, output_path="judgment_summary.xlsx")

### **7. Results**

View the parsed JSON and download the generated Excel report.

In [None]:
from IPython.display import FileLink, display

if result:
    print("=== Parsed Judgment Data ===\n")
    print(json.dumps(result, indent=2, ensure_ascii=False))

    print("\n=== Download Judgment Summary ===")
    display(FileLink("judgment_summary.xlsx", result_html_prefix="Click to download: "))
else:
    print("No result to display. Check the error messages above.")

### **8. Error Reference**

| Error Code | HTTP Status | Cause | Solution |
| :--- | :--- | :--- | :--- |
| `invalid_api_key_error` | 403 | Invalid API key | Verify your key at [dashboard.sarvam.ai](https://dashboard.sarvam.ai). |
| `insufficient_quota_error` | 429 | Quota exceeded | Check your usage limits. |
| `internal_server_error` | 500 | Server-side issue | Wait and retry the request. |
| Job state not `Completed` | — | Doc Intelligence failure | Check file format; supported: `.pdf`, `.zip` (images auto-wrapped). |
| `JSONDecodeError` | — | Sarvam-M returned non-JSON | Usually transient; re-run the cell. |
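
For the rows marked as transient, the retry can be automated rather than re-running the cell by hand. A minimal sketch with exponential backoff follows; `with_retries` and its parameters are hypothetical names and not part of the Sarvam SDK.

```python
import time


def with_retries(fn, attempts: int = 3, base_delay: float = 1.0, retry_on=(Exception,)):
    """Hypothetical helper: call fn(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 1x, 2x, 4x, ... base_delay
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)
```

For example, `with_retries(lambda: parse_judgment(raw_text), retry_on=(json.JSONDecodeError,))` would re-send the prompt up to three times when the model returns non-JSON. Quota and API-key errors are not worth retrying, so it is best to keep `retry_on` narrow.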

### **9. Conclusion & Resources**

This recipe demonstrates how to chain **Sarvam Vision** and **Sarvam-M** into a practical legal document analysis workflow that summarises court judgments in English and the supported Indian languages with a single pipeline call.

* [Sarvam AI Docs](https://docs.sarvam.ai)
* [Document Intelligence API](https://docs.sarvam.ai/api-reference-docs/document-intelligence)
* [Sarvam-M Chat API](https://docs.sarvam.ai/api-reference-docs/chat)
* [Indic Language Support](https://docs.sarvam.ai/language-support)

> ⚠️ **Reminder:** This notebook is for demo and educational purposes only. Do not use it as a substitute for legal advice.

**Keep Building!**