hisham-stack/docling-ui

Docling UI

A local Streamlit GUI that exposes Docling's conversion pipeline settings through a sidebar-driven interface, with no CLI required.


Overview

Docling UI wraps IBM Research's Docling document-intelligence library with a point-and-click frontend. Upload a file, tune every pipeline knob, click Convert, and get structured output (Markdown, JSON, HTML, plain text, or document tokens) ready to download — all without touching the command line.

Built for personal productivity on WSL/Linux; runs entirely on localhost with no external API calls.


Features

Category        Capability
Input           PDF, DOCX, PPTX, HTML, PNG, JPG, JPEG, TIFF, BMP
Pipelines       StandardPdfPipeline (full ML) · SimplePipeline (lightweight)
OCR             RapidOCR · Tesseract · force full-page mode · confidence threshold
Tables          TableFormer fast or accurate mode
Enrichments     Code · Formula/MathML · image extraction
Page selection  All pages or a custom range (e.g. 1-3, 5, 7)
Output formats  Markdown · JSON · Plain Text · HTML · Document Tokens
Export          One-click download button for every format
Observability   Real-time conversion log viewer (captures the docling logger)
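
The custom page range format shown above (e.g. 1-3, 5, 7) can be expanded into explicit page numbers with a small parser. The following is a minimal sketch, not the code in app.py; the function name parse_page_range and its validation rules are assumptions made for illustration.

```python
def parse_page_range(spec: str) -> list[int]:
    """Expand a spec such as "1-3, 5, 7" into a sorted list of 1-based page numbers.

    Hypothetical helper: the real app may parse and validate ranges differently.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas and repeated separators
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


if __name__ == "__main__":
    print(parse_page_range("1-3, 5, 7"))
```

Overlapping or repeated entries collapse into a single page number because the pages are collected in a set before sorting.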

Requirements

  • Python 3.10 or later
  • System package: tesseract-ocr (only required when using the Tesseract OCR engine)

All Python dependencies are declared in requirements.txt and installed by the setup commands below.


Setup

# Clone the repository (skip this if you already have the project directory)
git clone https://github.com/hisham-stack/docling-ui.git
cd docling-ui

# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install Python dependencies
# Note: docling pulls in PyTorch and HuggingFace models (~3–5 GB on first run)
pip install -r requirements.txt

# (Optional) Install the Tesseract binary for the Tesseract OCR engine
sudo apt update && sudo apt install -y tesseract-ocr

First-run note: Docling downloads ML models (layout detection, TableFormer, etc.) on first use and caches them locally. Subsequent runs are significantly faster.


Running

source .venv/bin/activate   # if not already active
streamlit run app.py

Open http://localhost:8501 in your browser.


Usage

  1. Upload a document via the sidebar file uploader. File name, size, and detected type are shown immediately.
  2. Configure the pipeline in the sidebar:
    • Choose Standard (full ML, best quality) or Simple (fast, no ML).
    • Enable/disable OCR and select the engine.
    • Toggle table recognition and pick fast vs. accurate mode.
    • Activate code, formula, or image enrichments as needed.
    • Optionally restrict conversion to a custom page range.
  3. Select an output format (Markdown, JSON, Plain Text, HTML, or Document Tokens).
  4. Click Convert.
  5. The Preview tab renders the output in the appropriate viewer; the Logs tab shows the full Docling log for the run.
  6. Use the Download button to save the result, or copy from the code block.
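
The sidebar choices from step 2 and the output format from step 3 can be thought of as one settings object that is handed to the conversion step. The sketch below illustrates that idea with a dataclass; every field name, default, and allowed value here is an assumption for illustration and may not match the ConversionSettings class in app.py.

```python
from dataclasses import asdict, dataclass

# Allowed values are assumptions derived from the options listed in this README.
PIPELINES = {"standard", "simple"}
OCR_ENGINES = {"rapidocr", "tesseract"}
TABLE_MODES = {"fast", "accurate"}
OUTPUT_FORMATS = {"markdown", "json", "text", "html", "doctokens"}


@dataclass(frozen=True)
class ConversionSettings:
    """Hypothetical container for the sidebar values."""

    pipeline: str = "standard"
    do_ocr: bool = True
    ocr_engine: str = "rapidocr"
    force_full_page_ocr: bool = False
    do_tables: bool = True
    table_mode: str = "accurate"
    enrich_code: bool = False
    enrich_formula: bool = False
    extract_images: bool = False
    page_spec: str = ""  # empty string means "all pages"
    output_format: str = "markdown"

    def __post_init__(self) -> None:
        # Reject values that the sidebar would never offer.
        checks = [
            ("pipeline", self.pipeline, PIPELINES),
            ("ocr_engine", self.ocr_engine, OCR_ENGINES),
            ("table_mode", self.table_mode, TABLE_MODES),
            ("output_format", self.output_format, OUTPUT_FORMATS),
        ]
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")


if __name__ == "__main__":
    settings = ConversionSettings(ocr_engine="tesseract", page_spec="1-3, 5, 7")
    print(asdict(settings))
```

Freezing the dataclass keeps a settings object from changing between the moment Convert is clicked and the moment the conversion reads it.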

Project Structure

docling-ui/
├── app.py            # Streamlit application (single file)
├── requirements.txt  # Python dependencies
└── README.md

app.py is organised into clearly separated layers:

  • UILogHandler — custom logging.Handler that captures the docling logger into a StringIO buffer so logs appear in the UI rather than only the terminal.
  • ConversionSettings — typed dataclass holding every sidebar value; passed as a single argument to convert_document().
  • convert_document() — pure conversion function; writes a temp file, builds DocumentConverter with the configured options, runs conversion, and returns a plain dict.
  • render_sidebar() — all 9 control sections; returns the uploaded file, settings, and the button state.
  • main() — orchestrates rendering, conversion trigger, error display, and tab switching.
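
The log-capture layer described above can be approximated with the standard library alone. This is a minimal sketch of a logging.Handler that writes into a StringIO buffer, assuming Docling logs under the docling logger name; the helper attach_ui_log_handler, the text() method, and the format string are hypothetical and are not taken from app.py.

```python
import io
import logging


class UILogHandler(logging.Handler):
    """Collects formatted log records in memory so a UI can display them."""

    def __init__(self, level: int = logging.INFO) -> None:
        super().__init__(level)
        self.buffer = io.StringIO()
        self.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

    def emit(self, record: logging.LogRecord) -> None:
        # One line per record; format() applies the formatter set above.
        self.buffer.write(self.format(record) + "\n")

    def text(self) -> str:
        return self.buffer.getvalue()


def attach_ui_log_handler(logger_name: str = "docling") -> UILogHandler:
    """Attach a fresh handler to the named logger and return it (hypothetical helper)."""
    handler = UILogHandler()
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return handler


if __name__ == "__main__":
    handler = attach_ui_log_handler()
    # Child loggers such as "docling.demo" propagate to the "docling" logger.
    logging.getLogger("docling.demo").info("converted %d pages", 3)
    print(handler.text(), end="")
```

In a Streamlit app the captured text could then be passed to a display element in the Logs tab; removing the handler after each run avoids accumulating duplicate handlers across reruns.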

Dependency Notes

Package               Purpose
streamlit             Web UI framework
docling               Document conversion engine
docling-core          Core data models
rapidocr-onnxruntime  RapidOCR engine (Python-only, no system binary needed)
pytesseract           Tesseract Python wrapper (requires system tesseract-ocr)
Pillow                Image handling
watchdog              Streamlit file-watcher backend

Troubleshooting

Tesseract not found

sudo apt install tesseract-ocr
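
To confirm from Python that the binary is actually on the PATH after installing it, a check along these lines may help. It is only a sketch: the function name tesseract_available is hypothetical, and the app may detect Tesseract differently.

```python
import shutil


def tesseract_available(binary: str = "tesseract") -> bool:
    """Return True if the named executable can be found on the PATH."""
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if tesseract_available():
        print("tesseract found")
    else:
        print("tesseract not found; install tesseract-ocr or use RapidOCR instead")
```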

RapidOCR import error

pip install rapidocr-onnxruntime

Out of memory on large PDFs

Switch to SimplePipeline in the sidebar, or narrow the page range.

export_to_document_tokens not available

pip install --upgrade docling

License

MIT — see LICENSE.
