A Go docling client library and CLI. Docling is a deep learning document analysis and conversion project, which can also be run as a service. This project decouples document processing, which may benefit from a GPU, from the client, which may be a lower-spec machine.
$ go install github.com/miku/doclingclient/cmd/docli@latest
Packages (deb, rpm), cf. releases. Quick start:
$ docli --server http://docling.city:5001 convert https://arxiv.org/pdf/2110.06595
Docling serve supplies an OpenAPI spec, currently using version 3.1.0 of the standard.
$ jq -rc '.paths | keys[]' openapi.json
/health
/openapi-3.0.json
/ready
/v1/chunk/hierarchical/file
/v1/chunk/hierarchical/file/async
/v1/chunk/hierarchical/source
/v1/chunk/hierarchical/source/async
/v1/chunk/hybrid/file
/v1/chunk/hybrid/file/async
/v1/chunk/hybrid/source
/v1/chunk/hybrid/source/async
/v1/clear/converters
/v1/clear/results
/v1/convert/file
/v1/convert/file/async
/v1/convert/source
/v1/convert/source/async
/v1/memory/counts
/v1/memory/stats
/v1/result/{task_id}
/v1/status/poll/{task_id}
/version
Unfortunately, an SDK generated from a spec can be quite large and may have downsides; cf. also this comparison.
Hence, we decided on a more manual approach: use an LLM to build a simple, mostly idiomatic client for the core functionality first. For docling this may be just "/v1/convert/file" and "/v1/convert/source", which would already serve most use cases.
The plan is to create a minimal Go library, then wrap a CLI around it, so that the docling service is easy to use from shell scripts and in ad-hoc human (and maybe agentic) terminal sessions.
Status: Library and CLI cover synchronous conversion
(/v1/convert/{source,file}), synchronous chunking
(/v1/chunk/{hybrid,hierarchical}/{source,file}), and the /health, /ready,
and /version routes. Async conversion and async chunking are not yet wrapped.
Requirements: Go 1.24+. A running docling-serve instance (defaults to http://localhost:5001).
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/miku/doclingclient"
)

func main() {
	ctx := context.Background()
	c := doclingclient.New("http://localhost:5001",
		doclingclient.WithAPIKey("sk-..."),
		doclingclient.WithTimeout(10*time.Minute),
	)
	// Convert a URL.
	resp, err := c.ConvertURL(ctx, "https://arxiv.org/pdf/2206.01062", nil)
	if err != nil {
		log.Fatal(err)
	}
	// Convert a local file (streamed multipart upload); resp and err are reused.
	resp, err = c.ConvertPath(ctx, "paper.pdf", &doclingclient.Options{
		ToFormats: []doclingclient.OutputFormat{
			doclingclient.FormatMD,
			doclingclient.FormatJSON,
		},
		DoOCR:    doclingclient.Ptr(true),
		Pipeline: doclingclient.PipelineStandard,
	})
	if err != nil {
		log.Fatal(err)
	}
	// A 200 response can still describe a conversion failure, so check it.
	if err := resp.Err(false); err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp.Document.MDContent)
}
The library covers /v1/convert/source (URL or base64 in-body), /v1/convert/file
(streamed multipart upload), and the /health, /ready, /version routes.
The struct in types.go is a deliberate subset of ConvertDocumentsOptions;
extend it if you need full coverage.
Note on output formats: the docling-serve OutputFormat enum also defines
yaml, html_split_page, and vtt, but the ExportDocumentResponse object
does not carry corresponding content fields, so this library and CLI do not
surface them. The five exposed formats (md, json, html, text,
doctags) match what the server actually returns.
A minimal command, docli, wraps the library. It is named to avoid collision
with the upstream docling CLI.
go install github.com/miku/doclingclient/cmd/docli@latest
# Convert a URL (default output: markdown to stdout).
docli convert https://arxiv.org/pdf/2206.01062 > paper.md
# Convert a local file as JSON.
docli convert --to json paper.pdf > paper.json
# Produce several formats at once and write them to a directory.
docli convert --to md,json,html --output ./out paper.pdf
# => ./out/paper.md, ./out/paper.json, ./out/paper.html
# Talk to a remote docling-serve, with auth.
DOCLING_SERVER=https://docling.example.org \
DOCLING_API_KEY=sk-... \
docli convert paper.pdf
# Server checks.
docli health
docli ready
docli version
docli chunk converts a document and splits it into chunks suitable for
feeding into an embedding model. Output is JSONL on stdout, one chunk per
line, which composes naturally with jq.
# Default hybrid chunker (tokenization-aware).
docli chunk paper.pdf > chunks.jsonl
# Pick a tokenizer and cap chunks to 512 tokens.
docli chunk --max-tokens 512 \
--tokenizer Qwen/Qwen3-Embedding-0.6B \
paper.pdf > chunks.jsonl
# Structural chunks (one per document element, no tokenizer).
docli chunk --chunker hierarchical paper.pdf > chunks.jsonl
# Inspect chunk lengths.
jq -r '.num_tokens // (.text | length)' < chunks.jsonl | sort -n | uniq -c
Each chunk carries text (with headings/captions inlined for context),
optional raw_text (with --include-raw-text), num_tokens, headings,
captions, page_numbers, and doc_items references into the source
document.
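For consumers that prefer Go over jq, the JSONL output can be read with the standard library alone. The following is a minimal sketch, not part of the library: the `Chunk` struct, `parseJSONL`, and `overBudget` are hypothetical names, and the Go types chosen for each field are assumptions based on the field list above (doc_items is kept as raw JSON because its shape is not described here).

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
)

// Chunk mirrors the fields listed above. The Go types are assumptions.
type Chunk struct {
	Text        string          `json:"text"`
	RawText     string          `json:"raw_text"`
	NumTokens   *int            `json:"num_tokens"` // nil when the chunker reports no count
	Headings    []string        `json:"headings"`
	Captions    []string        `json:"captions"`
	PageNumbers []int           `json:"page_numbers"`
	DocItems    json.RawMessage `json:"doc_items"`
}

// parseJSONL decodes one JSON object per line and skips blank lines.
func parseJSONL(data []byte) ([]Chunk, error) {
	var chunks []Chunk
	sc := bufio.NewScanner(bytes.NewReader(data))
	// A single chunk line can exceed the scanner's 64 KiB default.
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)
	for line := 1; sc.Scan(); line++ {
		raw := bytes.TrimSpace(sc.Bytes())
		if len(raw) == 0 {
			continue
		}
		var c Chunk
		if err := json.Unmarshal(raw, &c); err != nil {
			return nil, fmt.Errorf("line %d: %w", line, err)
		}
		chunks = append(chunks, c)
	}
	return chunks, sc.Err()
}

// overBudget returns the indexes of chunks whose token count exceeds limit.
// Chunks without num_tokens are skipped, since their size is unknown.
func overBudget(chunks []Chunk, limit int) []int {
	var idx []int
	for i, c := range chunks {
		if c.NumTokens != nil && *c.NumTokens > limit {
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	// Two made-up chunk lines standing in for chunks.jsonl.
	sample := []byte(`{"text":"Intro\nFirst paragraph.","num_tokens":6,"headings":["Intro"],"page_numbers":[1]}
{"text":"Methods\nA longer passage.","num_tokens":700,"headings":["Methods"],"page_numbers":[2,3]}
`)
	chunks, err := parseJSONL(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, i := range overBudget(chunks, 512) {
		fmt.Printf("chunk %d has %d tokens, headings %v\n", i, *chunks[i].NumTokens, chunks[i].Headings)
	}
}
```

A check like `overBudget` is a cheap way to spot chunks that would be truncated by an embedder with a smaller context window than the tokenizer used for chunking.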
The hybrid chunker counts tokens to keep each chunk within a budget. That budget is meaningful only relative to a specific tokenizer — and you almost always want the tokenizer to match the embedding model you'll feed the chunks into downstream, so chunk sizes line up with the embedder's context window.
docling-serve accepts any HuggingFace tokenizer identifier as --tokenizer
(OpenAI/tiktoken tokenizers are not reachable through the server). The default
is sentence-transformers/all-MiniLM-L6-v2. If you don't pass --max-tokens,
the cap is derived from the tokenizer's model_max_length.
A few common picks, biased toward what shows up in docling's own examples and typical RAG stacks:
| Tokenizer (HuggingFace ID) | Max tokens | Notes |
|---|---|---|
| sentence-transformers/all-MiniLM-L6-v2 | 256 | Default. Tiny, fast, English-only. Good baseline. |
| sentence-transformers/all-mpnet-base-v2 | 384 | Higher-quality English embeddings, still small. |
| BAAI/bge-small-en-v1.5 | 512 | Strong small English model, widely used in RAG. |
| BAAI/bge-m3 | 8192 | Multilingual, long-context. Good general-purpose pick. |
| intfloat/multilingual-e5-large | 512 | Multilingual, balanced quality/size. |
| nomic-ai/nomic-embed-text-v1.5 | 8192 | Long-context English. |
| Qwen/Qwen3-Embedding-0.6B | 32768 | Long-context, multilingual, newer. |
Rule of thumb: pick the tokenizer that ships with the embedding model you
plan to call after docli chunk. Mixing them silently misaligns the token
count and leads to chunks that overflow (or underfill) the real embedder.
The server needs to fetch the tokenizer the first time it sees it. In air-gapped deployments only models already cached on the server will work.
These flags tune the underlying document conversion. They apply identically
to docli convert and docli chunk. Numeric and boolean defaults marked
(auto) are sent only when you set them explicitly, so docling-serve's own
defaults stay authoritative on bare invocations.
| Flag | Default | Description |
|---|---|---|
| --from | (auto) | Input formats, e.g. pdf,docx; server autodetects if empty. |
| --ocr | true | Enable OCR. |
| --force-ocr | false | Force OCR over existing text. |
| --ocr-lang | (auto) | Comma-separated OCR languages, e.g. en,de. |
| --table-mode | (auto) | fast or accurate; server default if empty. |
| --tables | (auto) | Extract table structure. Sent only when explicitly set. |
| --pages | (all) | Page range, e.g. 1-10 or 3. |
| --image-export-mode | (auto) | placeholder, embedded, or referenced. Server default if empty. |
| --include-images | (auto) | Include extracted images. Sent only when explicitly set. |
| --images-scale | (auto) | Scale factor for extracted images (server default ~2.0). |
| --abort-on-error | false | Abort on first error. Sent only when explicitly set. |
| --document-timeout | (none) | Per-document timeout in seconds. |
| --pdf-backend | (auto) | pypdfium2, docling_parse, dlparse_v1, dlparse_v2, dlparse_v4. |
| --pipeline | (auto) | legacy, standard, vlm, or asr. Server default if empty. |
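One common way to get the "sent only when explicitly set" behaviour of the (auto) flags in Go is to use pointer fields with `omitempty`, so an unset option is left out of the request body entirely. The sketch below illustrates that pattern only; the `options` struct, the `ptr` helper, and the JSON field names are assumptions for illustration and are not copied from types.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// options is a hypothetical subset of conversion options.
// The JSON names are assumed, not taken from the library.
type options struct {
	Tables      *bool    `json:"do_table_structure,omitempty"` // nil: let the server decide
	TableMode   string   `json:"table_mode,omitempty"`         // "": let the server decide
	ImagesScale *float64 `json:"images_scale,omitempty"`       // nil: let the server decide
}

// ptr returns a pointer to v, so literals can populate optional fields.
func ptr[T any](v T) *T { return &v }

// encode renders the request options as JSON.
func encode(o options) string {
	b, err := json.Marshal(o)
	if err != nil {
		// Marshal cannot fail for this struct; treat failure as a programming error.
		panic(err)
	}
	return string(b)
}

func main() {
	// Bare invocation: nothing is sent, server defaults stay authoritative.
	fmt.Println(encode(options{}))
	// An explicit false is still sent, because the pointer is non-nil.
	fmt.Println(encode(options{Tables: ptr(false)}))
	// Several explicit settings.
	fmt.Println(encode(options{Tables: ptr(true), TableMode: "fast", ImagesScale: ptr(1.5)}))
}
```

A plain `bool` with `omitempty` would drop an explicit `false`, which is why the pointer matters for boolean options.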

| Flag | Default | Description |
|---|---|---|
| --to, -t | md | Output formats: md, json, html, text, doctags. |
| --output, -o | (none) | Directory to write all requested formats as <basename>.<ext>; stdout is silent. |
| --status | false | Emit one status line/object to stderr after the conversion. |
| --status-format | text | text or json (see Caching below). |
| --cache-dir | (XDG) | Override the on-disk cache directory. Env: DOCLING_CACHE_DIR. |
| --no-cache | false | Disable the on-disk result cache. |

| Flag | Default | Description |
|---|---|---|
| --chunker | hybrid | Chunker strategy: hybrid or hierarchical. |
| --max-tokens | (auto) | Hybrid only. Max tokens per chunk; derived from the tokenizer if unset. |
| --tokenizer | sentence-transformers/all-MiniLM-L6-v2 | Hybrid only. HuggingFace tokenizer ID. See "Tokenizer choice" above. |
| --merge-peers | true | Hybrid only. Merge undersized successive chunks with the same headings. |
| --markdown-tables | false | Serialize tables as Markdown instead of triplets. |
| --include-raw-text | false | Populate raw_text on each chunk alongside the contextualized text. |
| --pretty | false | Emit the full response as indented JSON instead of one chunk per line. |
Note: docli chunk does not cache results; each invocation re-runs the
conversion server-side. Only docli convert uses the on-disk cache.
Global flags (any subcommand): --server/-s (env DOCLING_SERVER),
--api-key/-K (env DOCLING_API_KEY), --tenant/-T (env
DOCLING_TENANT_ID).
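For readers wiring similar flags into their own tools, the flag-plus-environment pattern can be sketched with the standard `flag` package. This is an illustration under assumptions, not docli's actual implementation: `parseGlobals` and `globals` are hypothetical names, the precedence (flag over environment over built-in default) is the conventional one and is assumed here, and unlike docli this sketch only accepts global flags before the subcommand.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// defaultServer matches the documented default docling-serve address.
const defaultServer = "http://localhost:5001"

// globals holds the connection settings shared by all subcommands.
type globals struct {
	Server, APIKey, Tenant string
}

// parseGlobals resolves each setting as: flag, else environment, else default.
// getenv is injected so the function can be exercised without touching os.Environ.
func parseGlobals(args []string, getenv func(string) string) (globals, []string, error) {
	var g globals
	fs := flag.NewFlagSet("docli", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // the caller decides how to report errors

	server := getenv("DOCLING_SERVER")
	if server == "" {
		server = defaultServer
	}
	// Long and short names share one variable, so either spelling works.
	fs.StringVar(&g.Server, "server", server, "docling-serve base URL")
	fs.StringVar(&g.Server, "s", server, "shorthand for --server")
	fs.StringVar(&g.APIKey, "api-key", getenv("DOCLING_API_KEY"), "API key")
	fs.StringVar(&g.APIKey, "K", getenv("DOCLING_API_KEY"), "shorthand for --api-key")
	fs.StringVar(&g.Tenant, "tenant", getenv("DOCLING_TENANT_ID"), "tenant ID")
	fs.StringVar(&g.Tenant, "T", getenv("DOCLING_TENANT_ID"), "shorthand for --tenant")

	if err := fs.Parse(args); err != nil {
		return globals{}, nil, err
	}
	// Remaining arguments are the subcommand and its own arguments.
	return g, fs.Args(), nil
}

func main() {
	g, rest, err := parseGlobals(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(2)
	}
	fmt.Printf("server=%s tenant=%q subcommand=%v\n", g.Server, g.Tenant, rest)
}
```

Passing `getenv` as a parameter keeps the resolution logic testable without modifying the process environment.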
docli convert caches results on disk by default, so repeating a request is
near-instant. The cache uses the XDG spec, typically
~/.cache/doclingclient/, overridable with --cache-dir or
DOCLING_CACHE_DIR. Disable with --no-cache.
Layout:
~/.cache/doclingclient/
├── server_version.json # /version response, refreshed every 24 h
└── <12-char-server-hash>/
├── server_info.json # full server version map for this namespace
└── <input-hash>.json.zst # zstd-compressed ConvertResponse JSON
Cache key fingerprints everything that affects output: source URL or local
file content (SHA-256), to_formats, OCR settings, table mode, page range,
etc. The server-version directory namespaces cached results, so an upstream
docling-serve upgrade naturally falls into a fresh namespace — old results
stay around for diffing or can be pruned with rm -rf ~/.cache/doclingclient/<hash>/.
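The two-level layout can be illustrated with a small sketch. It is not the library's actual key derivation: the `request` struct, `hashJSON`, and `cacheKey` are hypothetical, the version map contents are made up, and hashing a JSON encoding is just one reasonable way to fingerprint a set of options. The idea it shows is the one described above: one hash namespaces by server version, a second hash identifies the input plus every option that affects output.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"path/filepath"
)

// request is a hypothetical stand-in for everything that affects output.
type request struct {
	Source    string   `json:"source"` // URL, or the SHA-256 of a local file's content
	ToFormats []string `json:"to_formats"`
	DoOCR     bool     `json:"do_ocr"`
	TableMode string   `json:"table_mode"`
	Pages     string   `json:"pages"`
}

// hashJSON returns the hex SHA-256 of v's JSON encoding. encoding/json writes
// struct fields in declaration order and sorts map keys, so equal values
// always produce the same digest.
func hashJSON(v any) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// cacheKey returns the 12-character server namespace and the per-input key.
func cacheKey(serverVersion map[string]string, req request) (ns, key string, err error) {
	full, err := hashJSON(serverVersion)
	if err != nil {
		return "", "", err
	}
	key, err = hashJSON(req)
	if err != nil {
		return "", "", err
	}
	return full[:12], key, nil
}

func main() {
	// Version values are invented for the example.
	version := map[string]string{"docling-serve": "1.0.0", "docling": "2.0.0"}
	req := request{Source: "https://arxiv.org/pdf/2206.01062", ToFormats: []string{"md"}, DoOCR: true}
	ns, key, err := cacheKey(version, req)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(filepath.Join("~/.cache/doclingclient", ns, key+".json.zst"))
}
```

With this scheme, changing any option yields a new file inside the same namespace, while a server upgrade changes the namespace and leaves older entries untouched.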
Use --status to see whether a request was served fresh or from cache:
$ docli convert --status paper.pdf > /dev/null
status=success processing_time=12.43s source=fresh
$ docli convert --status paper.pdf > /dev/null
status=success processing_time=12.43s source=cached
For ad-hoc post-processing, add --status-format json to emit a single JSON
object per run to stderr (one line, suitable for jq or appending to a log):
$ docli convert --status --status-format json paper.pdf > paper.md
{"status":"success","processing_time":12.43,"source":"fresh","filename":"paper.pdf","errors":[]}
$ docli convert --status --status-format json paper.pdf 2> status.jsonl > paper.md
$ jq -r '.processing_time' < status.jsonl
12.43
go test ./...
go test -cover ./...
The library exercises its HTTP client against httptest.Server; no live
docling-serve instance is required.
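The same approach works for code built on top of the client. The sketch below shows the general httptest pattern with the standard library only; `checkHealth` and `probe` are hypothetical helpers, not the library's API, and the `{"status":"ok"}` body is an assumed response shape defined by the fake server itself.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// checkHealth calls GET /health on baseURL and returns the reported status.
func checkHealth(ctx context.Context, hc *http.Client, baseURL string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/health", nil)
	if err != nil {
		return "", err
	}
	resp, err := hc.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("health: unexpected status %s", resp.Status)
	}
	var body struct {
		Status string `json:"status"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return "", fmt.Errorf("health: decode: %w", err)
	}
	return body.Status, nil
}

// probe starts a fake server that answers /health with the given status code
// and body, runs checkHealth against it, and shuts the server down again.
func probe(status int, body string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/health" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(status)
		fmt.Fprint(w, body)
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return checkHealth(ctx, srv.Client(), srv.URL)
}

func main() {
	status, err := probe(http.StatusOK, `{"status":"ok"}`)
	fmt.Println(status, err)

	_, err = probe(http.StatusServiceUnavailable, `{}`)
	fmt.Println(err)
}
```

Because the fake server controls both status code and body, failure paths such as a 503 or malformed JSON are as easy to cover as the happy path.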
OpenAPI was very helpful for getting this client started, since the LLM could consult the openapi.json file for the spec. However, we did not need any of the OpenAPI generators, of which there are quite a few. A more systematic feature comparison of the various libraries is still outstanding, but one could imagine a client SDK generator built from an LLM, a prompt, and an openapi.json file.
