# Running Docling as an API Service
*Using docling-serve for scalable document processing*

## Lab 4: Docling as a Service

In this lab, you'll learn how to deploy Docling as a REST API service using **docling-serve**. This enables:

- **Scalable Processing**: Handle multiple document conversion requests concurrently
- **Language Agnostic**: Use Docling from any programming language via HTTP
- **Microservices Architecture**: Integrate document processing into your existing infrastructure
- **Batch Processing**: Process large document collections efficiently

This is ideal for production deployments where you need to process documents from multiple applications or services.

## Prerequisites

Before we begin, ensure you have:
- Python 3.10 through 3.13 installed
- Completed Labs 1-3 (or equivalent Docling knowledge)
- Basic understanding of REST APIs

## Setting up the environment

Use Python 3.10 through 3.13 in a freshly created virtual environment. The cell below checks the interpreter version.

In [None]:
import sys
assert sys.version_info >= (3, 10) and sys.version_info < (3, 14), "Use Python 3.10, 3.11, 3.12, or 3.13 to run this notebook."

## Install Dependencies

In [None]:
! echo "::group::Install Dependencies"
%pip install uv
! uv pip install docling-serve httpx
! echo "::endgroup::"

## Understanding docling-serve

[docling-serve](https://github.com/docling-project/docling-serve) is the official REST API wrapper for Docling. It provides:

- **FastAPI-based API**: Modern, fast, and well-documented endpoints
- **OpenAPI Schema**: Auto-generated API documentation
- **Async Processing**: Non-blocking document conversion
- **Multiple Output Formats**: JSON, Markdown, HTML, and more
- **Chunking Endpoints**: Built-in support for hybrid and hierarchical chunking
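Because the service is built on FastAPI, a running server publishes its OpenAPI schema at `/openapi.json` (the FastAPI default). The following is a minimal sketch of how you might list the available routes from that schema. The helper names `list_endpoints` and `fetch_schema` and the hand-written sample schema are illustrative assumptions, not part of docling-serve.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def list_endpoints(schema: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI schema, sorted by path."""
    endpoints = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items may also hold non-method keys such as "parameters"
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return sorted(endpoints, key=lambda pair: (pair[1], pair[0]))


def fetch_schema(base_url: str) -> dict:
    """Download the schema from a running server (hypothetical helper)."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


# Offline demonstration with a hand-written schema fragment (illustrative only)
sample_schema = {
    "paths": {
        "/v1/convert/source": {"post": {}, "parameters": []},
        "/health": {"get": {}},
    }
}
for method, path in list_endpoints(sample_schema):
    print(f"{method:5} {path}")
```

Against a live server, pass the address it listens on to `fetch_schema` and feed the result to `list_endpoints`.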

## Starting the Server

To start the docling-serve server, run the following command in a terminal:

```bash
docling-serve run
```

By default, the server starts on `http://localhost:5001`. You can customize this:

```bash
# Custom host and port
docling-serve run --host 0.0.0.0 --port 8080

# Reload automatically on code changes (for development)
docling-serve run --reload
```

Once running, you can access the API documentation at `http://localhost:5001/docs`.
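The server may need some time after launch before it accepts requests, so scripts that start it and then call it immediately can fail. Here is a rough sketch of a readiness poll against the `/health` endpoint used later in this lab. The function names, retry count, and delay are assumptions chosen for illustration, and the probe uses only the standard library.

```python
import time
from typing import Callable
from urllib.request import urlopen


def probe_health(base_url: str) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urlopen(f"{base_url}/health", timeout=2) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts are all OSError
        return False


def wait_for_server(
    base_url: str,
    attempts: int = 10,
    delay: float = 1.0,
    probe: Callable[[str], bool] = probe_health,
) -> bool:
    """Poll the server until it is healthy or the attempts run out."""
    for attempt in range(attempts):
        if probe(base_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # Wait before the next attempt
    return False
```

Call `wait_for_server` with the address the server was started on. The `probe` parameter exists so the polling logic can be exercised without a live server.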

## Terminal Commands Reference

Copy and paste these commands directly into your terminal to interact with docling-serve.

### Start the Server

```bash
# Basic start (default port 5001)
docling-serve run --port 5001

# Start without OCR (faster, no easyocr dependency)
DOCLING_SERVE_ENGINE=DoclingParseV2DocumentBackend docling-serve run --port 5001

# Start with custom host for external access
docling-serve run --host 0.0.0.0 --port 5001
```

### Health Check

```bash
curl http://localhost:5001/health
```

### Convert a Document from URL

```bash
curl -X POST "http://localhost:5001/v1/convert/source" \
  -H "Content-Type: application/json" \
  -d '{
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}],
    "options": {"do_ocr": false, "pdf_backend": "dlparse_v2"}
  }'
```

### Convert with Markdown Output

```bash
curl -X POST "http://localhost:5001/v1/convert/source" \
  -H "Content-Type: application/json" \
  -d '{
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}],
    "options": {"to_formats": ["md"], "do_ocr": false}
  }' | jq -r '.document.md_content' | head -100
```
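The two conversion requests above share one JSON shape: a `sources` list plus an `options` object. As a sketch, the same body can be assembled in Python before sending it with any HTTP client. The helper name `build_convert_payload` is hypothetical, and it only covers the option keys that appear in the curl examples.

```python
import json
from typing import Optional, Sequence


def build_convert_payload(
    urls: Sequence[str],
    to_formats: Sequence[str] = ("md",),
    do_ocr: bool = False,
    pdf_backend: Optional[str] = None,
) -> dict:
    """Build the JSON body for POST /v1/convert/source."""
    if not urls:
        raise ValueError("at least one source URL is required")
    options = {"to_formats": list(to_formats), "do_ocr": do_ocr}
    if pdf_backend is not None:
        options["pdf_backend"] = pdf_backend
    return {
        "sources": [{"kind": "http", "url": url} for url in urls],
        "options": options,
    }


payload = build_convert_payload(
    ["https://arxiv.org/pdf/2501.17887"], pdf_backend="dlparse_v2"
)
print(json.dumps(payload, indent=2))
```

Keeping payload construction in one function makes it easier to reuse the same options across single and batch requests.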

### Convert a Local File

```bash
curl -X POST "http://localhost:5001/v1/convert/file" \
  -F "files=@/path/to/document.pdf" \
  -F "do_ocr=false"
```

### Chunk a Document (for RAG)

```bash
curl -X POST "http://localhost:5001/v1/chunk/hybrid/source" \
  -H "Content-Type: application/json" \
  -d '{
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}],
    "options": {"do_ocr": false}
  }'
```

### View API Documentation

Open in your browser: http://localhost:5001/docs

## Making API Requests

Let's demonstrate how to interact with the docling-serve API using Python's `httpx` library.

In [None]:
import httpx
import json

# Configure the API endpoint
BASE_URL = "http://localhost:5001"

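# Check if server is running
def check_server():
    """Return True only if the /health endpoint answers with HTTP 200."""
    try:
        response = httpx.get(f"{BASE_URL}/health", timeout=5.0)
    except httpx.RequestError:
        # Covers refused connections as well as timeouts
        print("Server not running. Please start docling-serve first:")
        print("  docling-serve run")
        return False
    if response.status_code == 200:
        print("docling-serve is running!")
        return True
    print(f"Unexpected health check status: {response.status_code}")
    return False

server_running = check_server()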

### Converting a Document via API

In [None]:
if server_running:
    # Example: Convert a PDF document
    document_url = "https://arxiv.org/pdf/2501.17887"
    
    # Make conversion request
    response = httpx.post(
        f"{BASE_URL}/v1/convert/source",
        json={
            "sources": [
                {"kind": "http", "url": document_url}
            ],
            "options": {
                "do_ocr": False,
                "pdf_backend": "dlparse_v2",
            }
        },
        timeout=120.0  # Document conversion can take time
    )
    
    if response.status_code == 200:
        result = response.json()
        print("Conversion successful!")
        print(f"Response keys: {list(result.keys())}")
        
        # Markdown is returned under document.md_content
        markdown = result.get("document", {}).get("md_content") or ""
        if markdown:
            print("\nMarkdown preview (first 500 chars):")
            print(markdown[:500])
    else:
        print(f"Conversion failed: {response.status_code}")
        print(response.text)
else:
    print("Skipping - server not running")
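
Once a conversion succeeds, you will usually want to keep the output rather than only preview it. The sketch below writes the Markdown from a response to disk. It assumes the response nests the text under `document.md_content` and the source name under `document.filename`; check the keys printed by the cell above against your server version. `save_markdown` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Optional


def save_markdown(result: dict, out_dir: Path) -> Optional[Path]:
    """Write the Markdown from a conversion response; return the file path."""
    document = result.get("document") or {}
    markdown = document.get("md_content")  # Assumed response field
    if not markdown:
        return None
    # Keep only the final path component so the name cannot escape out_dir.
    # The full name is kept because IDs like "2501.17887" contain a dot.
    name = Path(document.get("filename") or "document").name
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{name}.md"
    target.write_text(markdown, encoding="utf-8")
    return target


# Offline demonstration with a hand-written response (illustrative only)
sample_result = {"document": {"filename": "paper.pdf", "md_content": "# Title\n"}}
with tempfile.TemporaryDirectory() as tmp:
    saved = save_markdown(sample_result, Path(tmp))
    print(saved.name)
```

In the notebook you would call `save_markdown(result, Path("output"))` right after a successful request.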

### Uploading a Local File

In [None]:
if server_running:
    # Example: Upload a local file (uncomment and modify path)
    # from pathlib import Path
    
    # file_path = Path("path/to/your/document.pdf")
    
    # with open(file_path, "rb") as f:
    #     response = httpx.post(
    #         f"{BASE_URL}/v1/convert/file",
    #         files={"files": (file_path.name, f, "application/pdf")},
    #         data={"to_formats": "md", "do_ocr": "false"},
    #         timeout=120.0
    #     )
    
    # if response.status_code == 200:
    #     result = response.json()
    #     print("File converted successfully!")
    
    print("Local file upload example - uncomment and modify the code above to test")
else:
    print("Skipping - server not running")

## Batch Processing

For processing multiple documents, you can make concurrent requests:

In [None]:
import asyncio

async def convert_document_async(client, url):
    """Convert a single document; return (url, succeeded)."""
    try:
        response = await client.post(
            f"{BASE_URL}/v1/convert/source",
            json={
                "sources": [{"kind": "http", "url": url}],
                "options": {
                    "do_ocr": False,
                    "pdf_backend": "dlparse_v2",
                }
            },
            timeout=120.0
        )
    except httpx.HTTPError as exc:
        # Timeouts and connection errors count as a failed conversion
        print(f"Request for {url} failed: {exc!r}")
        return url, False
    return url, response.status_code == 200

async def batch_convert(urls):
    """Convert multiple documents concurrently"""
    async with httpx.AsyncClient() as client:
        tasks = [convert_document_async(client, url) for url in urls]
        # Each task handles its own errors, so every result is a (url, success) tuple
        return await asyncio.gather(*tasks)

if server_running:
    # Example batch of documents
    document_urls = [
        "https://arxiv.org/pdf/2501.17887",
        # Add more URLs as needed
    ]
    
    print(f"Processing {len(document_urls)} documents...")
    
    # In Jupyter, use 'await' directly instead of asyncio.run()
    results = await batch_convert(document_urls)
    
    for url, success in results:
        status = "✓ Success" if success else "✗ Failed"
        print(f"  {url.split('/')[-1]}: {status}")
else:
    print("Skipping - server not running")
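
Launching every request at once can overwhelm a single server when the batch is large. One common way to cap the number of requests in flight is an `asyncio.Semaphore`. The sketch below is generic: `run_bounded`, `fake_convert`, and the limit of 4 are illustrative assumptions, and the worker is a stand-in so the pattern can run without a server.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable


async def run_bounded(
    items: Iterable[str],
    worker: Callable[[str], Awaitable[Any]],
    limit: int = 4,
) -> list:
    """Run worker(item) for every item with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: str):
        async with semaphore:
            try:
                return item, await worker(item)
            except Exception as exc:
                # Return the error so one failure does not cancel the batch
                return item, exc

    # gather keeps results in the same order as the input items
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_convert(url: str) -> bool:
    """Stand-in for a real conversion request (illustrative only)."""
    await asyncio.sleep(0.01)
    return not url.endswith("bad")
```

In Jupyter, call it with `await run_bounded(document_urls, some_worker, limit=4)`, where `some_worker` wraps the conversion request. In a script, wrap the call in `asyncio.run(...)`.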

## Docker Deployment

For production deployments, you can run docling-serve in Docker:

```bash
# Pull the official image
docker pull quay.io/docling-project/docling-serve

# Run the container
docker run -p 5001:5001 quay.io/docling-project/docling-serve
```

Or with Docker Compose:

```yaml
version: '3.8'
services:
  docling:
    image: quay.io/docling-project/docling-serve
    ports:
      - "5001:5001"
    environment:
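      # Worker count for the local engine; verify the variable name in the configuration docs for your version
      - DOCLING_SERVE_ENG_LOC_NUM_WORKERS=4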
```
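When the service runs in a container or on another host, client code should not hard-code its address. A small sketch of one way to handle this is to read the base URL from an environment variable and fall back to the local address used in the Python cells of this lab. The variable name `DOCLING_SERVE_URL` and the helper `resolve_base_url` are hypothetical, chosen only for this example.

```python
import os
from typing import Mapping

DEFAULT_BASE_URL = "http://localhost:5001"


def resolve_base_url(environ: Mapping[str, str] = os.environ) -> str:
    """Pick the service address from the environment, else use the default."""
    # DOCLING_SERVE_URL is a hypothetical variable name for this sketch
    url = environ.get("DOCLING_SERVE_URL") or DEFAULT_BASE_URL
    # Drop trailing slashes so f"{base}/health" never yields a double slash
    return url.rstrip("/")


print(resolve_base_url({"DOCLING_SERVE_URL": "http://docling:5001/"}))
```

With this in place, the notebook could set `BASE_URL = resolve_base_url()` and work unchanged against a local or containerized server.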

## API Reference

### Key Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
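| `/v1/convert/source` | POST | Convert documents referenced by URL |
| `/v1/convert/file` | POST | Convert uploaded files |
| `/v1/chunk/hybrid/source` | POST | Convert and chunk documents referenced by URL (hybrid chunker) |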
| `/docs` | GET | OpenAPI documentation |

### Conversion Options

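- `to_formats`: Output formats to return, for example `["md"]`, `["json"]`, or `["html"]`
- `do_ocr`: Enable or disable OCR
- `force_ocr`: Replace any existing text with OCR output
- `ocr_engine`: OCR engine to use (for example `easyocr`)
- `pdf_backend`: PDF parsing backend (for example `dlparse_v2`)
- `table_mode`: Table structure mode (`fast` or `accurate`)

Chunking is exposed through dedicated endpoints such as `/v1/chunk/hybrid/source` rather than through conversion options.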

See the [docling-serve documentation](https://github.com/docling-project/docling-serve) for the complete API reference.

## Summary

In this lab, you learned how to:

1. **Deploy docling-serve**: Run Docling as a REST API service
2. **Make API requests**: Convert documents via HTTP endpoints
3. **Batch process**: Handle multiple documents concurrently
4. **Deploy with Docker**: Run in production environments

### When to Use docling-serve

- Building microservices that need document processing
- Creating web applications with document upload features
- Processing documents from non-Python applications
- Scaling document processing across multiple workers

### Resources

- [docling-serve GitHub](https://github.com/docling-project/docling-serve)
- [Docling Documentation](https://docling-project.github.io/docling/)
- [FastAPI Documentation](https://fastapi.tiangolo.com/)