This repository lists major OCR LLMs with project links, papers, affiliations, plus a benchmark chart for quick comparison.

polarisZhao/OCR-LLM

OCR-LLM Model Hub

A curated catalog of the latest OCR-oriented vision-language models (LLMs for OCR), intended to simplify capability comparison, track open- and closed-source roadmaps, and accelerate experimentation.

Overview

  • Curates state-of-the-art OCR-centric multimodal LLMs, from academic labs to commercial providers.
  • Provides one-click access to official weights, evaluation reports, and papers for each model.
  • Documents the in-house benchmark snapshot so results are reproducible and comparable.
  • Ideal for research, engineering, and product teams that need to locate OCR LLM solutions quickly.

Repository Layout

  • README.md – the living catalog you are reading; updated as new OCR LLMs are released.
  • img/benchmark.png – benchmark snapshot referenced below (replace with new figures when re-running evaluations).

Model Catalog

| Model | Project / Weights | Paper / Notes | Affiliation |
| --- | --- | --- | --- |
| GOT-OCR 2.0 | https://github.com/Ucas-HaoranWei/GOT-OCR2.0 | https://arxiv.org/abs/2409.01704 | StepFun |
| MonkeyOCR | https://huggingface.co/echo840/MonkeyOCR | https://arxiv.org/pdf/2506.05218 | HUST & Kingsoft Office |
| SmolDocling | https://huggingface.co/ds4sd/SmolDocling-256M-preview | https://arxiv.org/pdf/2503.11576 | IBM Research & Hugging Face |
| olmOCR | https://huggingface.co/allenai/olmOCR-7B-0825 | https://olmocr.allenai.org/papers/olmocr.pdf | Allen Institute for AI |
| OCRFlux | https://huggingface.co/ChatDOC/OCRFlux-3B | - (model card) | ChatDOC Team |
| dots.ocr | https://huggingface.co/rednote-hilab/dots.ocr | - | Rednote HiLab |
| DeepSeekOCR | https://github.com/deepseek-ai/DeepSeek-OCR | https://arxiv.org/abs/2510.18234 | DeepSeek-AI |
| Nanonets-OCR (DocStrange) | https://github.com/NanoNets/docstrange | - | Nanonets |
| MinerU 2.5 | https://opendatalab.github.io/MinerU/ / https://github.com/opendatalab/MinerU | https://arxiv.org/pdf/2509.22186 | Shanghai AI Laboratory |
| PaddleOCR-VL | https://www.paddleocr.ai/latest/version3.x/algorithm/PaddleOCR-VL/PaddleOCR-VL.html | https://arxiv.org/pdf/2510.14528 | Baidu |
| Mistral OCR | https://mistral.ai/news/mistral-ocr | - (product brief) | Mistral AI |

| Model | Project / Weights | Paper / Notes | Affiliation |
| --- | --- | --- | --- |
| Qwen3-VL | https://github.com/QwenLM/Qwen3-VL | - (technical report coming soon) | Tongyi Team / Alibaba |
| Gemini 2.5 Pro | https://gemini.google.com/app | - (service docs) | Google DeepMind |
| GPT-4o | https://chatgpt.com | - (OpenAI system card) | OpenAI |

Note: All listed models focus on end-to-end OCR, document understanding, or multimodal visual reasoning. Some (e.g., GPT-4o, Gemini 2.5 Pro) are closed-source APIs but remain key baselines.
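Since the catalog mixes Hugging Face weight pages, GitHub projects, and product pages, it can help to sort the links by host before trying to fetch anything. The following is a minimal sketch, not part of this repository: the names `CATALOG`, `classify_link`, and `group_by_kind` are hypothetical, and the list holds only a few rows copied from the catalog above. It uses the Python standard library only and makes no network calls.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlparse

# A few (model name, project / weights URL) rows copied from the catalog above.
# This list is illustrative; extend it with the remaining rows as needed.
CATALOG: List[Tuple[str, str]] = [
    ("GOT-OCR 2.0", "https://github.com/Ucas-HaoranWei/GOT-OCR2.0"),
    ("MonkeyOCR", "https://huggingface.co/echo840/MonkeyOCR"),
    ("olmOCR", "https://huggingface.co/allenai/olmOCR-7B-0825"),
    ("dots.ocr", "https://huggingface.co/rednote-hilab/dots.ocr"),
    ("Mistral OCR", "https://mistral.ai/news/mistral-ocr"),
    ("GPT-4o", "https://chatgpt.com"),
]


def classify_link(url: str) -> Tuple[str, Optional[str]]:
    """Return (kind, repo_id); kind is 'huggingface', 'github', or 'other'.

    repo_id is the 'owner/name' pair taken from the URL path, or None when
    the link does not point at a Hugging Face or GitHub repository.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    parts = [p for p in parsed.path.split("/") if p]
    if host == "huggingface.co" and len(parts) >= 2:
        return "huggingface", f"{parts[0]}/{parts[1]}"
    if host == "github.com" and len(parts) >= 2:
        return "github", f"{parts[0]}/{parts[1]}"
    return "other", None


def group_by_kind(catalog: List[Tuple[str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Group catalog rows by link kind, keeping the repo id (or the raw URL)."""
    groups: Dict[str, List[Tuple[str, str]]] = {
        "huggingface": [],
        "github": [],
        "other": [],
    }
    for name, url in catalog:
        kind, repo_id = classify_link(url)
        groups[kind].append((name, repo_id or url))
    return groups


if __name__ == "__main__":
    for kind, entries in group_by_kind(CATALOG).items():
        print(kind, entries)
```

The `owner/name` strings in the `huggingface` group have the shape that tools such as `huggingface_hub.snapshot_download(repo_id=...)` expect, so they could be handed to such a tool directly. Entries in the `other` group are product or service pages and need to be handled by hand.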

Benchmark Snapshot

[Figure: benchmark snapshot, img/benchmark.png]

Acknowledgements

This catalog aggregates public information from the respective research groups and vendors. Please cite the original papers or product pages when using the models in academic or commercial projects.
