DeepSeek-OCR-2 OpenAI-compatible API Server

Production Docker deployment of DeepSeek-OCR-2 with an OpenAI-compatible API.

Files

.
├── Dockerfile          # Production Docker image
├── openai_server.py    # Custom OpenAI-compatible API server
└── README.md           # This file

Why openai_server.py?

DeepSeek-OCR-2 includes a native vLLM model implementation (deepseek_ocr2.py), but it cannot be used directly with vLLM's built-in OpenAI server. The issue is image preprocessing:

  1. vLLM's OpenAI server passes raw PIL images to the model's processor
  2. DeepSeek's processor expects images pre-processed by tokenize_with_images() — a custom method that handles dynamic resolution cropping, tiling, and feature extraction

When you try to use vllm serve directly, you get errors like:

TypeError: cannot unpack non-iterable Image object
TypeError: 'Image' object is not subscriptable

Our openai_server.py solves this by:

  • Extracting base64 images from OpenAI-format requests
  • Preprocessing them using DeepSeek's tokenize_with_images() method (same as their run_dpsk_ocr2_image.py)
  • Passing the processed features to vLLM's AsyncLLMEngine
  • Returning OpenAI-compatible responses with streaming support

This approach uses the same preprocessing pipeline as DeepSeek's official scripts, so the server's output matches the reference implementation.
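
In outline, the request path looks like the sketch below. This is illustrative only: the handler name, the processor wiring, and the exact tokenize_with_images() signature are assumptions, not the actual openai_server.py code.

import base64, io, uuid
from PIL import Image
from vllm import SamplingParams
from vllm.engine.async_llm_engine import AsyncLLMEngine

async def handle_chat(body: dict, engine: AsyncLLMEngine, processor) -> str:
    # Hypothetical handler; illustrates the flow, not the real implementation.
    content = body["messages"][-1]["content"]
    data_url = next(p["image_url"]["url"] for p in content if p["type"] == "image_url")
    prompt = next(p["text"] for p in content if p["type"] == "text")

    # 1. Decode the base64 payload into a PIL image.
    image = Image.open(io.BytesIO(base64.b64decode(data_url.split(",", 1)[1])))

    # 2. Run DeepSeek's own preprocessing (cropping, tiling, feature extraction)
    #    instead of letting vLLM hand the raw PIL image to the model's
    #    processor. Signature assumed here.
    model_inputs = processor.tokenize_with_images(prompt, [image])

    # 3. Hand the preprocessed inputs to vLLM and wait for the final output.
    params = SamplingParams(max_tokens=body.get("max_tokens", 8192))
    final = None
    async for out in engine.generate(model_inputs, params, request_id=str(uuid.uuid4())):
        final = out
    return final.outputs[0].text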

Requirements

  • Docker with NVIDIA Container Toolkit
  • NVIDIA GPU with CUDA 11.8+ support
  • 8GB+ VRAM (the model itself uses ~6.3GB)

Quick Start

Build

docker build -t deepseek-ocr2 .

Run

docker run --gpus all -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  deepseek-ocr2

API Usage

List Models

curl http://localhost:8000/v1/models

OCR with Layout Detection

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-OCR-2",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,'$(base64 -w0 image.jpg)'"}},
        {"type": "text", "text": "<|grounding|>Convert the document to markdown."}
      ]
    }],
    "max_tokens": 8192
  }'
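
The same request from Python, using the official openai client (a sketch; the base URL assumes the docker run mapping above, and the server is assumed not to validate the API key):

import base64
from openai import OpenAI

# Placeholder key: the openai client requires one even if the server ignores it.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

messages = [{
    "role": "user",
    "content": [
        {"type": "image_url",
         "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        {"type": "text", "text": "<|grounding|>Convert the document to markdown."},
    ],
}]

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-OCR-2",
    messages=messages,
    max_tokens=8192,
)
print(response.choices[0].message.content)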

OCR without Layout (Text Only)

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-OCR-2",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,'$(base64 -w0 image.jpg)'"}},
        {"type": "text", "text": "Free OCR."}
      ]
    }],
    "max_tokens": 8192
  }'

Streaming

Add "stream": true to the request body for streaming responses.

Environment Variables

Variable                 Default  Description
GPU_MEMORY_UTILIZATION   0.90     GPU memory fraction to use
MAX_MODEL_LEN            8192     Maximum sequence length (input + output)
TENSOR_PARALLEL_SIZE     1        Number of GPUs for tensor parallelism
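
These are ordinary container environment variables, so they can be overridden at startup, for example to leave more GPU memory for other processes:

docker run --gpus all -p 8000:8000 \
  -e GPU_MEMORY_UTILIZATION=0.80 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  deepseek-ocr2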

Token Limits

  • Total context window: 8192 tokens (input + output combined)
  • Visual tokens: up to 1120 tokens per image ((0-6 crops) × 144 + 256)
  • Default max_tokens: 8192 (effectively capped by the context remaining after the input)
  • Practical output limit: ~7000 tokens for typical images
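
For example, an image tiled into the maximum 6 crops consumes 6 × 144 + 256 = 1120 visual tokens; with a short text prompt, that leaves roughly 8192 − 1120 ≈ 7000 tokens of context for the output, which is where the ~7000-token practical limit comes from.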

Prompts

Prompt                                          Description
<|grounding|>Convert the document to markdown.  OCR with layout detection (bounding boxes)
Free OCR.                                       Plain text extraction without layout

Health Check

curl http://localhost:8000/health

Notes

  • The first request is slow while the model loads (~30-40 seconds)
  • Model weights are cached in ~/.cache/huggingface
  • Supports base64-encoded JPEG/PNG images
  • Returns markdown with HTML tables for tabular content
