# Asynchronous Inference with Ray Serve

**⏱️ Time to complete**: 30 minutes

This template demonstrates how to build scalable asynchronous inference services using Ray Serve. Learn how to handle long-running PDF processing tasks without blocking HTTP responses, using Celery task queues and Redis as a message broker.

## Overview

Traditional synchronous APIs block until processing completes, so long-running tasks can exceed client or proxy timeouts. Ray Serve's asynchronous inference pattern decouples request lifetime from compute time by:

1. Accepting HTTP requests and immediately returning a task ID
2. Enqueuing work to background processors (Celery workers)
3. Allowing clients to poll for status and retrieve results
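
The three steps above can be sketched in-process, without Ray Serve, Celery, or Redis. This is only an illustration of the pattern: a thread pool stands in for the Celery workers, a dictionary stands in for the result backend, and the names `submit`, `get_status`, and `count_words` are hypothetical.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict

# Stand-ins (assumptions): a thread pool plays the background workers and
# a dict plays the result backend. The real service uses Celery and Redis.
executor = ThreadPoolExecutor(max_workers=2)
tasks: Dict[str, Future] = {}


def count_words(text: str) -> dict:
    """Hypothetical long-running job."""
    time.sleep(0.2)  # simulate slow processing
    return {"word_count": len(text.split())}


def submit(text: str) -> str:
    """Steps 1 and 2: enqueue the work and return a task ID right away."""
    task_id = str(uuid.uuid4())
    tasks[task_id] = executor.submit(count_words, text)
    return task_id


def get_status(task_id: str) -> dict:
    """Step 3: report the current state without blocking."""
    future = tasks[task_id]
    if not future.done():
        return {"status": "PENDING"}
    error = future.exception()
    if error is not None:
        return {"status": "FAILURE", "error": str(error)}
    return {"status": "SUCCESS", "result": future.result()}


if __name__ == "__main__":
    task_id = submit("hello async world")
    while get_status(task_id)["status"] == "PENDING":
        time.sleep(0.05)
    print(get_status(task_id))
```

The caller never waits inside `submit`; all waiting happens in the polling loop, which is the property the HTTP service relies on.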

This example implements a **PDF processing service** that extracts text and generates summaries from PDF documents.

## Prerequisites

- Python 3.9+
- Ray 2.50.0+
- Redis (for message broker and result backend)

## Step 1: Setup Redis

Redis serves as both the message broker (task queue) and result backend.

**Docker (Recommended for local testing)**

In [None]:
# Run Redis in Docker
!docker run -d -p 6379:6379 redis:latest

**Alternative: Install Redis locally**

- macOS: `brew install redis && brew services start redis`
- Ubuntu: `sudo apt-get install redis-server && sudo systemctl start redis`
- [Official Redis Installation Guide](https://redis.io/docs/getting-started/installation/)
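
Before starting the service, it can help to confirm that Redis is accepting connections. The following is a minimal sketch using only the standard library, assuming Redis listens on `localhost:6379` without authentication; `redis_is_reachable` is a hypothetical helper name. It sends an inline `PING` command and checks for the `+PONG` reply.

```python
import socket


def redis_is_reachable(
    host: str = "localhost", port: int = 6379, timeout: float = 2.0
) -> bool:
    """Return True if a server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")  # inline command form of PING
            return sock.recv(16).startswith(b"+PONG")
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts
        return False


if __name__ == "__main__":
    print("Redis reachable:", redis_is_reachable())
```

If the instance requires a password, the reply is an error rather than `+PONG`, so this check reports `False` even though the server is running.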

## Step 2: Install Dependencies

In [None]:
!pip install -q "ray[serve-async-inference]>=2.50.0" "requests>=2.31.0" "PyPDF2>=3.0.0"
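
To confirm that the installed packages satisfy the minimum versions, a rough check like the one below can be run in the same environment. It is a sketch, not a full version comparison: `version_tuple` and `meets_minimum` are hypothetical helpers that compare only the leading numeric components, so pre-release suffixes are ignored.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def version_tuple(text: str) -> Tuple[int, ...]:
    """Turn '2.50.0' into (2, 50, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def meets_minimum(package: str, minimum: str) -> bool:
    found = installed_version(package)
    return found is not None and version_tuple(found) >= version_tuple(minimum)


if __name__ == "__main__":
    for name, minimum in [("ray", "2.50.0"), ("requests", "2.31.0"), ("PyPDF2", "3.0.0")]:
        print(f"{name}: installed={installed_version(name)}, ok={meets_minimum(name, minimum)}")
```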

## Step 3: Start the Ray Serve Application

Open a terminal and run:

```bash
serve run server:app
```

The service will be available at `http://localhost:8000`.

## Step 4: Test the Service

First, let's run the complete example to see it in action:

In [None]:
# Run the complete example client
!python client.py

### Understanding the Client Code

Now let's break down how the async workflow works. We'll use the client methods interactively:

In [None]:
import requests
import time

BASE_URL = "http://localhost:8000"

#### 1. Submit a PDF Processing Task

The `POST /process` request returns immediately with a task ID, without waiting for processing to finish:

In [None]:
# Submit a task
pdf_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"

response = requests.post(
    f"{BASE_URL}/process",
    json={
        "pdf_url": pdf_url,
        "max_summary_paragraphs": 2
    }
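)
response.raise_for_status()  # Stop here if the server rejected the request

task_data = response.json()
# The task ID is nested under "task_id" in the response body
task_id = task_data["task_id"]["id"]

print("✓ Task submitted!")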
print(f"  Task ID: {task_id}")
print(f"  Status: {task_data['status']}")

#### 2. Poll for Task Status

Check the task status to see if it's complete:

In [None]:
# Check status (may need to run this cell multiple times)
response = requests.get(f"{BASE_URL}/status/{task_id}")
status_data = response.json()

print(f"Task status: {status_data['status']}")

if status_data['status'] == 'SUCCESS':
    result = status_data['result']
    print(f"\n✓ Complete!")
    print(f"  Pages: {result['page_count']}")
    print(f"  Words: {result['word_count']}")
    print(f"  Time: {result['processing_time_seconds']}s")
elif status_data['status'] == 'FAILURE':
    print(f"\n✗ Failed: {status_data.get('error')}")
else:
    print(f"  Still processing... (Status: {status_data['status']})")

#### 3. Wait for Completion

Alternatively, use a polling loop to wait for completion automatically:

In [None]:
# Submit a new task and wait for completion
pdf_url = "https://arxiv.org/pdf/1706.03762.pdf"

response = requests.post(
    f"{BASE_URL}/process",
    json={"pdf_url": pdf_url, "max_summary_paragraphs": 3}
)

task_id = response.json()["task_id"]["id"]
print(f"Task submitted: {task_id}\n")

# Poll until complete
max_attempts = 40
for attempt in range(max_attempts):
    response = requests.get(f"{BASE_URL}/status/{task_id}")
    status_data = response.json()
    
    if status_data['status'] == 'SUCCESS':
        result = status_data['result']
        print(f"\n✓ Complete!")
        print(f"  Pages: {result['page_count']}")
        print(f"  Words: {result['word_count']}")
        print(f"  Time: {result['processing_time_seconds']}s")
        print(f"\n  Summary preview:")
        print(f"  {result['summary'][:200]}...")
        break
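    elif status_data['status'] == 'FAILURE':
        print(f"✗ Failed: {status_data.get('error')}")
        break
    elif attempt % 5 == 0:
        print(f"  Still processing... ({status_data['status']})")

    time.sleep(3)
else:
    # The loop ended without a break, so the task never reached a final state
    print(f"✗ Timed out after {max_attempts} attempts")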
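
A fixed three-second sleep is simple but either polls too often for slow tasks or reacts slowly for fast ones. One possible refinement is a helper that backs off between polls and enforces an overall deadline. The sketch below is an assumption about how a client might do this, not part of the template: `wait_for_task` and its parameters are hypothetical, and the status fetch is passed in as a function so the helper does not depend on any HTTP library.

```python
import time
from typing import Callable, Dict


def wait_for_task(
    fetch_status: Callable[[], Dict],
    timeout_s: float = 120.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 10.0,
) -> Dict:
    """Poll fetch_status() until SUCCESS or FAILURE, doubling the delay each time."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status_data = fetch_status()
        if status_data.get("status") in ("SUCCESS", "FAILURE"):
            return status_data
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"task not finished after {timeout_s}s")
        # Never sleep past the deadline
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)


# Example use with the client variables defined earlier:
# status_data = wait_for_task(
#     lambda: requests.get(f"{BASE_URL}/status/{task_id}").json()
# )
```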

## Architecture Overview

```
┌─────────────┐
│   Client    │
└──────┬──────┘
       │ HTTP POST /process
       ▼
┌─────────────────────┐
│   AsyncPDFAPI       │ ← Ingress Deployment
│ (HTTP Endpoints)    │
└──────┬──────────────┘
       │ enqueue_task()
       ▼
┌─────────────────────┐
│   Redis Queue       │ ← Message Broker
│ (Celery Backend)    │
└──────┬──────────────┘
       │ consume tasks
       ▼
┌─────────────────────┐
│   PDFProcessor      │ ← Task Consumer Deployment
│ @task_consumer      │   (Scaled to N replicas)
│ - process_pdf       │
└─────────────────────┘
```

## Key Concepts

### Task Consumer

The `@task_consumer` decorator transforms a Ray Serve deployment into a Celery worker that processes tasks from a queue:

```python
@serve.deployment(num_replicas=2, max_ongoing_requests=5)
@task_consumer(
    TaskProcessorConfig(
        queue_name="pdf_processing_queue",
        adapter_config=CeleryAdapterConfig(...),
        max_retries=3,
    )
)
class PDFProcessor:
    ...
```

### Task Handler

The `@task_handler` decorator marks a method that processes a specific task type:

```python
@task_handler(name="process_pdf")
def process_pdf(self, pdf_url: str, max_summary_paragraphs: int = 3):
    # Download PDF, extract text, generate summary
    return {"status": "success", ...}
```
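
The handler's return value is what clients later read from the `result` field of the status response. As a rough sketch of the step that follows text extraction, the function below builds such a result from already-extracted page text. It is an assumption, not the template's actual implementation: `build_result` is a hypothetical helper, the "summary" is simply the first few paragraphs, and only the field names `page_count`, `word_count`, `summary`, and `processing_time_seconds` are taken from the client code shown earlier.

```python
import time
from typing import Dict, List


def build_result(page_texts: List[str], max_summary_paragraphs: int = 3) -> Dict:
    """Assemble a result dict from a list of per-page text strings."""
    started = time.perf_counter()
    full_text = "\n\n".join(page_texts)
    # Treat blank-line-separated chunks as paragraphs
    paragraphs = [p.strip() for p in full_text.split("\n\n") if p.strip()]
    # Naive placeholder summary: keep the first N paragraphs
    summary = "\n\n".join(paragraphs[:max_summary_paragraphs])
    return {
        "status": "success",
        "page_count": len(page_texts),
        "word_count": len(full_text.split()),
        "summary": summary,
        "processing_time_seconds": round(time.perf_counter() - started, 3),
    }


if __name__ == "__main__":
    pages = ["First paragraph.\n\nSecond paragraph.", "Third paragraph."]
    print(build_result(pages, max_summary_paragraphs=2))
```

Returning plain dictionaries, strings, and numbers keeps the result easy to serialize into the result backend and back out through the status endpoint.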

### Task Adapter

The adapter provides methods to interact with the task queue:

```python
# Enqueue a task; the returned object carries the task ID in its `id` field
task = adapter.enqueue_task_sync(
    task_name="process_pdf",
    kwargs={"pdf_url": url}
)

# Check status using the task ID
status = adapter.get_task_status_sync(task.id)
```

## Deploy to Anyscale

1. Update Redis configuration in `server.py` with your production Redis instance
2. Deploy using the Anyscale CLI:

```bash
anyscale service deploy -f service.yaml
```

3. Get your service URL:

```bash
anyscale service status
```
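
For step 1, one common approach is to read the Redis location from the environment rather than hard-coding it in `server.py`. The sketch below assumes a single `REDIS_URL` environment variable and separate logical databases for the queue and the results; the variable name, the `redis_urls` helper, and the returned key names are all hypothetical, and how the values are passed into the adapter configuration depends on your `server.py`.

```python
import os
from typing import Dict, Mapping
from urllib.parse import urlsplit, urlunsplit


def redis_urls(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Derive broker and result-backend URLs from REDIS_URL (hypothetical variable)."""
    base = env.get("REDIS_URL", "redis://localhost:6379")
    parts = urlsplit(base)
    if parts.scheme not in ("redis", "rediss"):
        raise ValueError(f"unsupported scheme in REDIS_URL: {parts.scheme!r}")
    # Database 0 holds the task queue, database 1 holds the results
    broker = urlunsplit((parts.scheme, parts.netloc, "/0", "", ""))
    result_backend = urlunsplit((parts.scheme, parts.netloc, "/1", "", ""))
    return {"broker": broker, "result_backend": result_backend}


if __name__ == "__main__":
    print(redis_urls())
```

Any database index already present in `REDIS_URL` is replaced, so the same variable can be shared by other tools without affecting which databases this service uses.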

## Learn More

- [Ray Serve Documentation](https://docs.ray.io/en/latest/serve/index.html)
- [Asynchronous Inference Guide](https://docs.ray.io/en/master/serve/asynchronous-inference.html)
- [Celery Documentation](https://docs.celeryq.dev/)
- [Redis Documentation](https://redis.io/docs/)
- [PyPDF2 Documentation](https://pypdf2.readthedocs.io/)
- [Anyscale Platform](https://docs.anyscale.com/)