# **Document Intelligence API Tutorial**

This notebook demonstrates how to use **Sarvam's Document Intelligence API** to extract structured, machine-readable data from documents. Powered by [Sarvam Vision](https://docs.sarvam.ai/api-reference-docs/getting-started/models/sarvam-vision), the API supports:

- **High-fidelity text extraction** across 23 languages (22 Indian + English)
- **Layout & structure preservation** including reading order and hierarchies
- **Table parsing** into structured HTML or Markdown
- **Multiple output formats**: HTML, Markdown, or JSON (delivered as a ZIP file)
- **Batch processing** of multi-page documents and ZIP archives

### Supported Input Formats

| Format | Extension | Description |
|--------|-----------|-------------|
| PDF | `.pdf` | Multi-page PDF documents |
| PNG | `.png` | Document page images |
| JPEG | `.jpg`, `.jpeg` | Document page images |
| ZIP | `.zip` | Flat archive containing document page images (JPG/PNG) |
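Before creating a job, you can check locally that a file's extension matches one of the formats in the table above. This is a minimal sketch. The helper name `is_supported_input` is hypothetical and not part of the `sarvamai` SDK. It checks only the extension, so a corrupted file could still be rejected by the API.

```python
from pathlib import Path

# Extensions taken from the Supported Input Formats table
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".zip"}


def is_supported_input(path: str) -> bool:
    """Return True if the file extension matches a supported input format.

    Hypothetical helper (not part of the sarvamai SDK): it inspects only the
    extension, not the file contents.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_input("report.PDF"))  # True
print(is_supported_input("scan.tiff"))   # False
```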

## **1. Installation**

Install the Sarvam AI Python SDK:

In [None]:
!pip install -Uqq sarvamai

## **2. Setting Up the API Key**

To use the Document Intelligence API, you need an API subscription key:

1. **Obtain your API key**: Sign up on the [Sarvam AI Dashboard](https://dashboard.sarvam.ai/) to get one.
2. **Replace the placeholder key**: In the code below, replace `"YOUR_SARVAM_API_KEY"` with your actual API key.

In [None]:
SARVAM_API_KEY = "YOUR_SARVAM_API_KEY"
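Hard-coding the key works for a quick experiment, but keeping it out of the notebook is safer. The sketch below reads the key from an environment variable instead. The variable name `SARVAM_API_KEY` and the helper `load_api_key` are assumptions for illustration, not SDK conventions.

```python
import os


def load_api_key(var_name: str = "SARVAM_API_KEY") -> str:
    """Read the API key from an environment variable (hypothetical helper).

    Raises RuntimeError if the variable is unset or empty, so a missing key
    fails immediately instead of surfacing later as a 403 from the API.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage (assumes you exported SARVAM_API_KEY in your shell before starting Python):
# SARVAM_API_KEY = load_api_key()
```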

## **3. Initialize the Client**

Create a `SarvamAI` client instance with your API key.

In [None]:
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key=SARVAM_API_KEY)

## **4. Understanding the Parameters**

### Job Configuration

| Parameter | Type | Description | Options |
|-----------|------|-------------|---------|
| `language` | string | Target language in BCP-47 format | See supported languages below |
| `output_format` | string | Output format for processed documents | `"md"` (Markdown, default), `"html"`, `"json"` |

### Supported Languages (BCP-47 format)

| Language | Code | Language | Code |
|----------|------|----------|------|
| Hindi | `hi-IN` | English | `en-IN` |
| Bengali | `bn-IN` | Gujarati | `gu-IN` |
| Kannada | `kn-IN` | Malayalam | `ml-IN` |
| Marathi | `mr-IN` | Odia | `or-IN` |
| Punjabi | `pa-IN` | Tamil | `ta-IN` |
| Telugu | `te-IN` | Urdu | `ur-IN` |
| Assamese | `as-IN` | Bodo | `bodo-IN` |
| Dogri | `doi-IN` | Kashmiri | `ks-IN` |
| Konkani | `kok-IN` | Maithili | `mai-IN` |
| Manipuri | `mni-IN` | Nepali | `ne-IN` |
| Sanskrit | `sa-IN` | Santali | `sat-IN` |
| Sindhi | `sd-IN` | | |
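If you prefer to select a language by name, you can build a lookup table from the rows above and use it to produce the `language` code. This is a minimal sketch. `LANGUAGE_CODES` and `language_code` are hypothetical names, not part of the SDK, and the codes are copied verbatim from the table.

```python
# Language name -> BCP-47 code, copied from the Supported Languages table
LANGUAGE_CODES = {
    "Hindi": "hi-IN", "English": "en-IN", "Bengali": "bn-IN", "Gujarati": "gu-IN",
    "Kannada": "kn-IN", "Malayalam": "ml-IN", "Marathi": "mr-IN", "Odia": "or-IN",
    "Punjabi": "pa-IN", "Tamil": "ta-IN", "Telugu": "te-IN", "Urdu": "ur-IN",
    "Assamese": "as-IN", "Bodo": "bodo-IN", "Dogri": "doi-IN", "Kashmiri": "ks-IN",
    "Konkani": "kok-IN", "Maithili": "mai-IN", "Manipuri": "mni-IN", "Nepali": "ne-IN",
    "Sanskrit": "sa-IN", "Santali": "sat-IN", "Sindhi": "sd-IN",
}


def language_code(name: str) -> str:
    """Look up the BCP-47 code for a language name (case-insensitive)."""
    lookup = {lang.lower(): code for lang, code in LANGUAGE_CODES.items()}
    try:
        return lookup[name.strip().lower()]
    except KeyError:
        raise ValueError(f"Unsupported language: {name!r}") from None


print(language_code("tamil"))  # ta-IN
print(len(LANGUAGE_CODES))     # 23
```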

### Output Formats (delivered as ZIP file)

| Format | Description |
|--------|-------------|
| `md` | Markdown files (default) |
| `html` | Structured HTML files with layout preservation |
| `json` | Structured JSON files for programmatic processing |
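`output_format` accepts only the three values above, so a small check can catch a typo before a job is created. This is a minimal sketch. `normalize_output_format` is a hypothetical helper, not an SDK function. Lower-casing the input is a local convenience, not documented API behavior.

```python
VALID_OUTPUT_FORMATS = ("md", "html", "json")  # from the Output Formats table


def normalize_output_format(fmt: str = "md") -> str:
    """Return the lower-cased output format, rejecting unsupported values early.

    Hypothetical helper; "md" is the default, matching the table above.
    """
    normalized = fmt.strip().lower()
    if normalized not in VALID_OUTPUT_FORMATS:
        raise ValueError(
            f"output_format must be one of {VALID_OUTPUT_FORMATS}, got {fmt!r}"
        )
    return normalized


print(normalize_output_format("HTML"))  # html
```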

## **5. Process a Document**

The Document Intelligence API uses an asynchronous job-based workflow:

1. **Create a job** with your desired language and output format
2. **Upload your document** (PDF, image, or ZIP archive)
3. **Start the job** to begin processing
4. **Wait for completion** and monitor progress
5. **Download the output** (ZIP file with processed results)

### 5.1 Upload a Document

First, make sure you have a PDF, image, or ZIP file ready. Either set the file path directly or, in Google Colab, uncomment the upload code below to choose a file with the upload widget.

In [None]:
# Option 1: Set the path to your document directly
document_path = "document.pdf"  # Replace with your document path

# Option 2: Upload a file in Google Colab
# from google.colab import files
# uploaded = files.upload()
# document_path = list(uploaded.keys())[0]

### 5.2 Create a Job, Upload, and Process

In [None]:
# Step 1: Create a Document Intelligence job
job = client.document_intelligence.create_job(
    language="hi-IN",       # Target language (BCP-47 format)
    output_format="md"      # Output format: "html", "md", or "json" (delivered as ZIP)
)
print("Job created successfully!")

# Step 2: Upload your document
job.upload_file(document_path)
print(f"Document uploaded: {document_path}")

# Step 3: Start processing
job.start()
print("Processing started...")

### 5.3 Wait for Completion and Download Results

In [None]:
# Step 4: Wait for completion
status = job.wait_until_complete()
print(f"Job completed with state: {status.job_state}")

# Step 5: Get processing metrics
metrics = job.get_page_metrics()
print("\nProcessing Metrics:")
print(f"  Total pages: {metrics['total_pages']}")
print(f"  Pages processed: {metrics['pages_processed']}")
print(f"  Pages succeeded: {metrics['pages_succeeded']}")
print(f"  Pages failed: {metrics['pages_failed']}")

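# Step 6: Download the output (ZIP file), skipping the download if the job failed outright
output_path = "./output.zip"
if status.job_state in ("Completed", "PartiallyCompleted"):
    job.download_output(output_path)
    print(f"\nOutput saved to {output_path}")
else:
    print(f"\nNo output downloaded; job state is {status.job_state}")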
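You can condense the metrics printed above into a quick pass/fail summary. This is useful, for example, when deciding whether a partially completed job needs another run. This is a minimal sketch. It assumes `get_page_metrics()` returns a dict with the four keys used in the cell above, as that cell does. `summarize_page_metrics` is a hypothetical helper.

```python
def summarize_page_metrics(metrics: dict) -> str:
    """Build a one-line summary from a page-metrics dict.

    Assumes the keys used in the cell above: total_pages, pages_processed,
    pages_succeeded and pages_failed.
    """
    total = metrics["total_pages"]
    succeeded = metrics["pages_succeeded"]
    failed = metrics["pages_failed"]
    rate = (succeeded / total * 100) if total else 0.0  # avoid dividing by zero
    summary = f"{succeeded}/{total} pages succeeded ({rate:.1f}%)"
    if failed:
        summary += f", {failed} failed"
    return summary


# Example with made-up numbers (not real API output):
print(summarize_page_metrics(
    {"total_pages": 10, "pages_processed": 10, "pages_succeeded": 9, "pages_failed": 1}
))  # 9/10 pages succeeded (90.0%), 1 failed
```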

### 5.4 Extract and View Results

In [None]:
import os
import zipfile

# Extract the ZIP file
extract_dir = "./doc_intelligence_output"
with zipfile.ZipFile(output_path, "r") as zip_ref:
    zip_ref.extractall(extract_dir)

# Collect all extracted files in a single pass
output_files = []
for root, _dirs, files in os.walk(extract_dir):
    for file in sorted(files):
        output_files.append(os.path.join(root, file))

print("Extracted files:")
for filepath in output_files:
    print(f"  {filepath}")

# Display the first Markdown/HTML/JSON file (skips any non-text files in the archive)
text_files = [p for p in output_files if p.lower().endswith((".md", ".html", ".json"))]
if text_files:
    print(f"\n--- Content of {text_files[0]} ---\n")
    with open(text_files[0], "r", encoding="utf-8") as f:
        content = f.read()
    print(content[:2000])  # Print the first 2000 characters
    if len(content) > 2000:
        print("\n... (truncated)")
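To work with the extracted results programmatically instead of just printing them, you can load every file that matches the chosen `output_format`. This is a minimal sketch. It assumes each output file carries the matching extension (`.md`, `.html`, or `.json`), because the file layout inside the ZIP is not described here. `load_outputs` is a hypothetical helper.

```python
import json
import os


def load_outputs(extract_dir: str, output_format: str = "md") -> dict:
    """Map each output file path to its content, parsing JSON files.

    Assumes output files use the extension of the requested format;
    every other file in the extracted archive is skipped.
    """
    suffix = "." + output_format
    results = {}
    for root, _dirs, files in os.walk(extract_dir):
        for name in sorted(files):
            if not name.lower().endswith(suffix):
                continue
            path = os.path.join(root, name)
            with open(path, "r", encoding="utf-8") as f:
                results[path] = json.load(f) if output_format == "json" else f.read()
    return results


# Usage (after extracting the ZIP as above):
# outputs = load_outputs("./doc_intelligence_output", "md")
# print(f"Loaded {len(outputs)} file(s)")
```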

## **6. Error Handling**

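The example below runs the full workflow inside a `try`/`except` block. `ApiError` covers HTTP errors returned by the API, and `FileNotFoundError` covers a missing local file.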

In [None]:
from sarvamai import SarvamAI
from sarvamai.core.api_error import ApiError

client = SarvamAI(api_subscription_key=SARVAM_API_KEY)

try:
    job = client.document_intelligence.create_job(
        language="hi-IN",
        output_format="md"
    )
    job.upload_file("document.pdf")
    job.start()
    status = job.wait_until_complete()

    if status.job_state == "Completed":
        job.download_output("./output.zip")
        print("Output saved to ./output.zip")
    else:
        print(f"Job ended with state: {status.job_state}")
        print(f"Details: {status}")

except ApiError as e:
    if e.status_code == 400:
        print(f"Bad request: {e.body}")
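    elif e.status_code == 403:
        print("Invalid or missing API key")
    elif e.status_code == 422:
        print(f"Invalid file format or corrupted file: {e.body}")
    elif e.status_code == 429:
        print("Rate limit or quota exceeded")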
    else:
        print(f"Error {e.status_code}: {e.body}")
except FileNotFoundError:
    print("Document file not found")

## **7. Job States**

| State | Description |
|-------|-------------|
| `Accepted` | Job created, awaiting file upload |
| `Pending` | File uploaded, waiting to start |
| `Running` | Processing in progress |
| `Completed` | All pages processed successfully |
| `PartiallyCompleted` | Some pages succeeded, some failed |
| `Failed` | All pages failed or job-level error |
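`wait_until_complete()` handles the waiting for you. Conceptually, waiting means polling the job until it reaches one of the terminal states in the table above: `Completed`, `PartiallyCompleted`, or `Failed`. The sketch below illustrates that idea with a generic `get_state` callable. It is a hypothetical example, not the SDK's implementation. In practice, `get_state` would wrap whatever status call your SDK version provides.

```python
import time
from typing import Callable

# Terminal states from the Job States table; the others are still in progress
TERMINAL_STATES = {"Completed", "PartiallyCompleted", "Failed"}


def poll_until_terminal(
    get_state: Callable[[], str],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> str:
    """Call get_state() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still in state {state!r} after {timeout_s}s")
        time.sleep(interval_s)


# Usage with a fake state sequence (no API call involved):
states = iter(["Accepted", "Pending", "Running", "Completed"])
print(poll_until_terminal(lambda: next(states), interval_s=0.01))  # Completed
```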

## **8. Error Codes**

| HTTP Status | Error Code | Description |
|-------------|-----------|-------------|
| `400` | `invalid_request_error` | Invalid parameters or missing required fields |
| `403` | `invalid_api_key_error` | Invalid or missing API key |
| `404` | `not_found_error` | Job not found |
| `422` | `unprocessable_entity_error` | Invalid file format or corrupted file |
| `429` | `insufficient_quota_error` | Rate limit or quota exceeded |
| `500` | `internal_server_error` | Server error, retry the request |
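The table marks `500` as retryable, and a `429` caused by rate limiting may also clear after a short wait. The sketch below shows retry with exponential backoff. It assumes the raised exception exposes a `status_code` attribute, as `ApiError` does in the error-handling example above. `with_retries` and the set of retryable codes are assumptions, not SDK features. Wrap individual calls rather than the whole workflow, and note that retrying `create_job` could create a duplicate job.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

RETRYABLE_STATUS_CODES = {429, 500}  # assumption: treat these as transient


def with_retries(fn: Callable[[], T], max_attempts: int = 4, base_delay_s: float = 1.0) -> T:
    """Call fn(), retrying with exponential backoff on retryable HTTP errors.

    Exceptions whose status_code is not retryable, and the final failure
    after max_attempts, are re-raised unchanged.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as e:
            status = getattr(e, "status_code", None)
            if status not in RETRYABLE_STATUS_CODES or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay_s * 2 ** attempt)  # 1s, 2s, 4s, ... by default
    raise RuntimeError("unreachable")


# Usage (hypothetical): wrap a single SDK call
# status = with_retries(lambda: job.wait_until_complete())
```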

## **9. Best Practices**

- **Choose the right format**: Use Markdown for human-readable output, HTML for web rendering and rich formatting, and JSON for programmatic processing.
- **Specify language**: Always specify the correct language code for optimal text extraction accuracy, especially for Indian languages.
- **Handle large documents**: For large documents, call `job.get_page_metrics()` to see how many pages were processed, succeeded, and failed, and handle partial failures (the `PartiallyCompleted` state) gracefully.
- **Use HTML for tables**: Choose HTML output format when you need to preserve table structures and rich formatting.
- **ZIP input**: For ZIP files, include only JPG and PNG document pages in a flat structure (no nested folders).
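You can check the flat-ZIP rule locally before uploading. The sketch below uses Python's standard `zipfile` module. `check_zip_input` is a hypothetical helper. It accepts `.jpeg` alongside `.jpg` and `.png`, following the input formats table, and checks only entry names and extensions, not whether the images are readable.

```python
import zipfile

ALLOWED_IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def check_zip_input(path: str) -> list:
    """Return a list of problems found in a ZIP archive (an empty list means it looks OK).

    Flags entries inside folders (ZIP entry names use "/" as the separator)
    and files that are not JPG/PNG images.
    """
    problems = []
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if "/" in name:
                problems.append(f"nested entry: {name}")
            elif not name.lower().endswith(ALLOWED_IMAGE_EXTENSIONS):
                problems.append(f"unsupported file type: {name}")
    return problems


# Usage (hypothetical file name):
# issues = check_zip_input("pages.zip")
# print(issues or "ZIP looks OK")
```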

## **10. Conclusion**

This notebook demonstrated how to use **Sarvam's Document Intelligence API** to extract structured data from documents. The API provides:

1. Enterprise-grade document processing powered by Sarvam Vision
2. Support for 23 languages: English and all 22 constitutionally recognized Indian languages
3. Multiple output formats (Markdown, HTML, JSON)
4. Job-based async processing with progress tracking

---

### **Additional Resources**

- **Documentation**: [docs.sarvam.ai](https://docs.sarvam.ai)
- **API Reference**: [Document Intelligence API](https://docs.sarvam.ai/api-reference-docs/document-intelligence)
- **Sarvam Vision Model**: [Learn more](https://docs.sarvam.ai/api-reference-docs/getting-started/models/sarvam-vision)
- **Community**: [Join the Discord Community](https://discord.gg/hTuVuPNF)

---

### **Final Notes**

- Keep your API key secure.
- Specify the correct language for best results on Indian language documents.
- Monitor page metrics for large documents to track progress.

**Keep Building!**