# TODO: Recipe Title

TODO: One or two sentences describing what this recipe does and what problem it solves.

### Pipeline

1. **Extract:** TODO — describe Step 1 (e.g., read a PDF with Document Intelligence).
2. **Parse:** TODO — describe Step 2 (e.g., extract structured fields with Sarvam-M).
3. **Output:** TODO — describe Step 3 (e.g., write results to `outputs/result.xlsx`).

### Supported Inputs

TODO: list accepted file formats (e.g., PDF, PNG, JPG, WAV, MP3).

In [None]:
# TODO: add your packages. Mirror requirements.txt exactly.
!pip install -Uqq "sarvamai>=0.1.24" "python-dotenv>=1.0.0"

### **1. Setup & API Key**

Obtain your API key from the [Sarvam AI Dashboard](https://dashboard.sarvam.ai).
Create a `.env` file in this directory with `SARVAM_API_KEY=your_key_here`, or set
the environment variable directly before launching Jupyter.

In [None]:
from __future__ import annotations

import os
import traceback
from pathlib import Path

# TODO: add your imports here

from dotenv import load_dotenv
from sarvamai import SarvamAI

load_dotenv()

SARVAM_API_KEY = os.environ.get("SARVAM_API_KEY", "").strip()
# Reject an empty key and the placeholder values used in this template.
if SARVAM_API_KEY in {"", "YOUR_SARVAM_API_KEY", "your_key_here"}:
    raise RuntimeError(
        "SARVAM_API_KEY is not set. Add it to your .env file or set the environment variable."
    )

client = SarvamAI(api_subscription_key=SARVAM_API_KEY)

print("Client initialised.")

### **2. Step 1 — EXTRACT**

TODO: Describe what this step does — for example, reading a PDF with the Document
Intelligence API, or recording / loading an audio file for STT.

Accepted inputs: TODO (e.g., PDF, PNG, JPG — PNG/JPG must be ZIP-wrapped).
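
If your recipe accepts images that must be ZIP-wrapped, a small helper can do the wrapping before upload. The sketch below uses only the standard library. The name `zip_wrap` is hypothetical, and it assumes the API expects a flat archive with the image at the top level, so check that against the API documentation.

```python
import zipfile
from pathlib import Path


def zip_wrap(image_path: str) -> Path:
    """Wrap a single image in a ZIP archive stored next to the original file."""
    src = Path(image_path)
    if not src.is_file():
        raise FileNotFoundError(f"No such file: {src}")

    archive = src.with_suffix(".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname keeps only the file name, so the archive has no nested folders.
        # Assumption: the API expects the image at the top level of the archive.
        zf.write(src, arcname=src.name)
    return archive
```

Inside `extract`, you could call `zip_wrap(file_path)` whenever the suffix is `.png` or `.jpg` and pass the returned path on to the upload step.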

In [None]:
def extract(file_path: str) -> str:
    """TODO: Extract raw content from the input file.

    Args:
        file_path: Path to the input file.

    Returns:
        TODO: raw text, transcript, or bytes extracted from the file.
    """
    path = Path(file_path)

    # TODO: implement extraction.
    # Example — Document Intelligence async workflow:
    #   job = client.documents.create(...)
    #   ... poll until complete ...
    #   pages = client.documents.get_result(job.job_id)
    #   return " ".join(p.markdown for p in pages)

    raise NotImplementedError("Replace this with your extraction logic.")


print("extract defined.")

### **3. Step 2 — PARSE**

TODO: Describe what this step does — for example, sending extracted text to Sarvam-M
to pull out structured fields (name, date, amounts) as JSON.
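
When you ask a chat model for JSON, the reply sometimes arrives wrapped in a Markdown code fence, which makes a bare `json.loads` call fail. The sketch below shows one way to tolerate that. The helper name `parse_json_reply` is hypothetical, and the fence handling reflects an assumption about how the model might format its reply, not documented behaviour.

```python
import json

# Built at runtime so this cell contains no literal fence marker.
FENCE = "`" * 3


def parse_json_reply(content: str) -> dict:
    """Decode a model reply as a JSON object, tolerating a Markdown code fence."""
    text = content.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (it may carry a language tag such as "json"),
        # then cut everything from the closing fence onward, if one is present.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]

    data = json.loads(text)  # raises json.JSONDecodeError on malformed input
    if not isinstance(data, dict):
        raise ValueError(f"Expected a JSON object, got {type(data).__name__}.")
    return data
```

In `parse`, this could replace the final `json.loads(content)` call from the commented example.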

In [None]:
def parse(raw_text: str) -> dict | None:
    """TODO: Parse raw extracted text into structured data using Sarvam AI.

    Args:
        raw_text: The text returned by the extract step.

    Returns:
        A dict of parsed fields, or None if parsing fails.

    Raises:
        ValueError: If raw_text is empty or contains only whitespace.
    """
    if not raw_text.strip():
        raise ValueError("raw_text is empty — extraction may have failed.")

    # TODO: implement parsing.
    # Example (chat completion; requires `import json` and a SYSTEM_PROMPT string):
    #   response = client.chat.completions(
    #       messages=[
    #           {"role": "system", "content": SYSTEM_PROMPT},
    #           {"role": "user", "content": raw_text},
    #       ]
    #   )
    #   content = response.choices[0].message.content
    #   if content is None:
    #       raise ValueError("Model returned an empty response.")
    #   return json.loads(content)

    raise NotImplementedError("Replace this with your parsing logic.")


print("parse defined.")

### **4. Step 3 — OUTPUT**

TODO: Describe what this step produces — for example, writing structured data to an
Excel file, filling an HTML form, or synthesizing a TTS audio reply.

All generated files are saved to the `outputs/` folder.
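
If you do not need Excel, a CSV file is a dependency-free alternative. This is a minimal sketch using the standard-library `csv` module, not the openpyxl approach shown in the commented example. The function name `write_csv`, the file name `result.csv`, and the `field`/`value` header row are illustrative assumptions.

```python
import csv
from pathlib import Path


def write_csv(parsed: dict, output_dir: str = "outputs") -> str:
    """Write one row per parsed field to <output_dir>/result.csv and return its path."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "result.csv"

    # newline="" lets the csv module control line endings on every platform.
    with out_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["field", "value"])
        for key, value in parsed.items():
            writer.writerow([key, value])
    return str(out_path)
```

Nested values (lists or dicts) are written using their Python string form, so flatten them first if your parsed data is not a flat mapping.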

In [None]:
def output(parsed: dict, output_dir: str = "outputs") -> str:
    """TODO: Save the parsed results to a file.

    Args:
        parsed:     The structured data returned by the parse step.
        output_dir: Directory where the output file is saved.

    Returns:
        Path to the saved output file.
    """
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    # TODO: implement output.
    # Example (write to Excel; requires `import openpyxl` and openpyxl in requirements.txt):
    #   wb = openpyxl.Workbook()
    #   ws = wb.active
    #   ws.title = "Results"
    #   for key, value in parsed.items():
    #       ws.append([key, value])
    #   out_path = str(Path(output_dir) / "result.xlsx")
    #   wb.save(out_path)
    #   return out_path

    raise NotImplementedError("Replace this with your output logic.")


print("output defined.")

### **5. End-to-End Pipeline**

`process` ties all three steps together. Pass any supported input file path and
receive a dict with the keys `output_path` and `parsed`. If any step raises,
`process` prints the traceback and returns `None`.

In [None]:
def process(file_path: str, output_dir: str = "outputs") -> dict | None:
    """Full pipeline: extract -> parse -> output.

    Args:
        file_path:  Path to the input file.
        output_dir: Directory where the output file is saved.

    Returns:
        Dict summarising the result, or None if the pipeline fails.
    """
    print(f"Processing: {file_path}")
    try:
        print("  Step 1/3 — Extracting...")
        raw = extract(file_path)

        print("  Step 2/3 — Parsing...")
        parsed = parse(raw)
        if parsed is None:
            raise ValueError("parse() returned None — check the parsing step for errors.")

        print("  Step 3/3 — Writing output...")
        out_path = output(parsed, output_dir)

        print(f"Done. Output saved to: {out_path}")
        return {"output_path": out_path, "parsed": parsed}

    except Exception as e:
        traceback.print_exc()
        print(f"ERROR: Failed to process {file_path}: {e}")
        return None


print("process defined.")

### **6. Demo**

TODO: Replace the sample file creation below with your own synthetic input generator,
or point directly at a file in `sample_data/`.

If the cell generates its own synthetic input, running it validates the full pipeline end to end without any external files.

In [None]:
# TODO: create or point to a sample input file, then call process().
# Example:
#   sample_path = "sample_data/sample_input.pdf"
#   result = process(sample_path)

# Placeholder — remove once you have a real demo:
print("TODO: add your demo here.")
result = None

### **7. Results**

TODO: Inspect the output below. Depending on your recipe, display a table, play
audio, or show a download link.

In [None]:
if result:
    # TODO: display your results.
    # Examples:
    #   from IPython.display import Audio, FileLink, display
    #   display(Audio(filename=result['output_path']))          # for audio
    #   display(FileLink(result['output_path']))               # for files
    print("Result:", result)
else:
    print("No result to display. Run the demo cell and check for error messages above.")

### **8. Error Reference**

| Error | HTTP Status | Cause | Solution |
| :--- | :--- | :--- | :--- |
| `RuntimeError: SARVAM_API_KEY is not set` | — | Missing API key | Add key to `.env` |
| `invalid_api_key_error` | 403 | Invalid API key | Verify at [dashboard.sarvam.ai](https://dashboard.sarvam.ai) |
| `insufficient_quota_error` | 429 | Quota exceeded | Check usage limits |
| `internal_server_error` | 500 | Transient server issue | Wait and retry |
| TODO: add recipe-specific errors | | | |
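
For the "wait and retry" cases in the table, a small wrapper with exponential backoff is often enough. The sketch below is generic and makes no assumption about the SDK's exception classes: it retries on any `Exception`, which would also retry permanent failures such as an invalid key. In a real recipe, narrow the `except` clause to whatever transient error type your SDK version raises. The name `call_with_retries` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying failed calls with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1.")

    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            # Assumption: every error is treated as transient. Narrow this in practice.
            if attempt == attempts:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))

    raise AssertionError("unreachable")
```

Usage might look like `raw = call_with_retries(lambda: extract(sample_path))`. The `sleep` parameter exists so the delay can be replaced with a no-op in tests.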

### **9. Using Your Own Input**

```python
result = process("path/to/your/input_file")
```

TODO: note any file format requirements (resolution, duration, encoding).
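
Whatever requirements you list here, it can help to check them before calling `process`, so that a bad input fails with a clear message rather than an API error. The sketch below checks only the file extension and existence. `SUPPORTED_EXTENSIONS` and `check_input` are hypothetical names, and the extension set is a placeholder taken from the examples in this template, not a statement of what any API accepts.

```python
from pathlib import Path

# Hypothetical allow-list: replace it with the formats your recipe accepts.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".wav", ".mp3"}


def check_input(file_path: str) -> Path:
    """Return the path if its extension is supported and the file exists."""
    path = Path(file_path)
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        allowed = ", ".join(sorted(SUPPORTED_EXTENSIONS))
        raise ValueError(
            f"Unsupported file type '{path.suffix}'. Expected one of: {allowed}."
        )
    if not path.is_file():
        raise FileNotFoundError(f"Input file not found: {path}")
    return path
```

Calling `check_input("path/to/your/input_file")` before `process` raises early for an unsupported or missing file.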

### **10. Conclusion & Resources**

TODO: one or two sentences summarising what the recipe demonstrated.

* [Sarvam AI Docs](https://docs.sarvam.ai)
* [Sarvam AI Dashboard](https://dashboard.sarvam.ai)
* [Sarvam AI Discord](https://discord.com/invite/8ka56wQaT3)

**Keep Building!**