Why does the VLM backend parse Chinese so poorly? #4033
Unanswered
Xiaochaomeng
asked this question in
Q&A
Replies: 1 comment
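Hi @Xiaochaomeng! I'm Dosu and I'm helping the MinerU team.

The VLM backend parses Chinese poorly, and large Chinese paragraphs are left unboxed in layout.pdf, mainly because the layout detection model (e.g. DocLayoutYOLO/YOLOv10) is purely visual and does not distinguish between languages. Its ability to segment complex Chinese layouts (multi-column pages, long paragraphs, dense text) is limited, so it tends to miss regions or merge them into oversized blocks. English documents usually have simpler layouts, so the model segments them better (related discussion).

In addition, although the OCR stage supports Chinese (e.g. PaddleOCR's ch/ch_lite), OCR cannot compensate if the layout detection stage failed to box the Chinese paragraphs correctly. Some models produce spans that are too tall when detecting Chinese lines, which causes lines to run together and boxes to be wrong (related case). Missing fonts (e.g. fonts-noto-cjk) or abnormal PDF font encoding can also affect segmentation and recognition (FAQ note).

Suggestion: you can try the pipeline backend (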
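One way to quantify "paragraphs that were not boxed" is to compare text-line boxes against the detected layout boxes and list the lines that no layout block covers. The following is only a minimal sketch in plain Python: the box format `(x0, y0, x1, y1)`, the function names, and the `min_cover` threshold are all assumptions made for illustration, not part of MinerU's API, and the input boxes would have to be extracted from your own output files.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in page coordinates.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def uncovered_lines(text_lines: List[Box],
                    layout_boxes: List[Box],
                    min_cover: float = 0.5) -> List[Box]:
    """Return text lines whose best single layout block covers
    less than `min_cover` of the line's area (assumed threshold)."""
    missed = []
    for line in text_lines:
        line_area = area(line)
        if line_area == 0:
            continue  # skip degenerate lines
        best = max((intersection(line, blk) / line_area
                    for blk in layout_boxes), default=0.0)
        if best < min_cover:
            missed.append(line)
    return missed


# Illustrative data: two text lines, one layout block covering only the first.
lines = [(0, 0, 100, 10), (0, 20, 100, 30)]
blocks = [(0, 0, 100, 12)]
missed = uncovered_lines(lines, blocks)
```

Running the same check on a Chinese page and an English page would show whether the missed lines cluster in long, dense paragraphs, as described above.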
In layout.pdf, many large paragraphs of Chinese text are not boxed, but the layout.pdf for English documents has no such problem.