Curates up-to-date information on the vision-language model (OCR-focused LLM) ecosystem to simplify capability comparison, track open- and closed-source roadmaps, and accelerate experimentation.
- Curates state-of-the-art OCR-centric multimodal LLMs, from academic labs to commercial providers.
- Provides one-click access to official weights, evaluation reports, and papers for each model.
- Documents the in-house benchmark snapshot so results are reproducible and comparable.
- Ideal for research, engineering, and product teams that need to locate OCR LLM solutions quickly.
- `README.md` – the living catalog you are reading; updated as new OCR LLMs are released.
- `img/benchmark.png` – benchmark snapshot referenced below (replace with new figures when re-running evaluations).
| Models | Project / Weights | Paper / Notes | Affiliation |
|---|---|---|---|
| Qwen3-VL | https://github.com/QwenLM/Qwen3-VL | - (technical report coming soon) | Tongyi Team / Alibaba |
| Gemini 2.5 Pro | https://gemini.google.com/app | - (service docs) | Google DeepMind |
| GPT-4o | https://chatgpt.com | - (OpenAI system card) | OpenAI |
Descriptions: All listed models focus on end-to-end OCR, document understanding, or multimodal visual reasoning. Some (e.g., GPT-4o, Gemini 2.5 Pro) are closed-source APIs but remain key baselines.
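For the closed-source baselines, evaluation typically means calling a hosted API with an image and a transcription prompt. As a minimal sketch, the snippet below builds a request body in OpenAI's public chat-completions schema (used by GPT-4o); the prompt wording is illustrative, no network call is made, and other vendors (e.g., Gemini) use a different API shape.

```python
import base64
import json


def build_ocr_request(image_bytes: bytes, model: str = "gpt-4o") -> dict:
    """Build a chat-completions request body asking a vision model to do OCR.

    Follows OpenAI's public chat-completions schema for image inputs
    (base64-encoded data URL). The transcription prompt is an example;
    tune it to your benchmark's output format.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Transcribe all text in this image, preserving layout.",
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
    }


# Example: inspect the payload built for a (dummy) image.
payload = build_ocr_request(b"\x89PNG...")
print(json.dumps(payload, indent=2)[:80])
```

Sending this body to the vendor's chat-completions endpoint (with an API key) returns the transcription in the assistant message, which can then be scored against the benchmark ground truth.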
This catalog aggregates public information from the respective research groups and vendors. Please cite the original papers or product pages when using the models in academic or commercial projects.
