The text content was not recognized #304

@neverlatetolearn0

Section / Chapter headers are recognized based on font sizes only. There is no semantic recognition here!

Therefore (your example), if headers have no larger font size than body text, they cannot be detected.
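To make the quoted behaviour concrete, here is a minimal sketch of what font-size-only header detection can look like. It is not pymupdf4llm's actual implementation: `header_levels` and `tag` are hypothetical helpers, and the `(text, font_size)` span tuples are an assumed simplification of real PDF span data.

```python
from collections import Counter


def header_levels(spans, max_levels=6):
    """Map font sizes larger than the body size to markdown header prefixes.

    `spans` is a list of (text, font_size) tuples (an assumed, simplified
    shape). The body size is taken to be the size covering the most characters.
    """
    counts = Counter()
    for text, size in spans:
        counts[round(size)] += len(text.strip())
    if not counts:
        return {}
    body_size = counts.most_common(1)[0][0]
    # Only sizes strictly larger than the body size become headers.
    larger = sorted((s for s in counts if s > body_size), reverse=True)
    return {s: "#" * (i + 1) + " " for i, s in enumerate(larger[:max_levels])}


def tag(spans):
    """Prefix each span with its header marker, or nothing for body-sized text."""
    levels = header_levels(spans)
    return [levels.get(round(size), "") + text for text, size in spans]
```

Under this scheme a line set in the same size as the body text never receives a header prefix, but it should still appear in the output as ordinary text.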

The text content was not recognized either

As shown in the following figure

Image

The text "表1 基础指标" ("Table 1: Basic Indicators") was not extracted at all.

It is ordinary body text, not a heading!

pymupdf4llm == 0.0.27

file:

test.pdf
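A minimal reproduction sketch for the report above, assuming `test.pdf` is passed on the command line. `pymupdf4llm.to_markdown` is the library's documented entry point; `missing_snippets` is a hypothetical helper written for this check, and comparing with whitespace removed is an assumption meant to tolerate line breaks in the output.

```python
import sys


def missing_snippets(expected, extracted):
    """Return the expected snippets that do not occur in the extracted text.

    Whitespace is removed on both sides before comparing (an assumption,
    so that line breaks or extra spaces do not cause false alarms).
    """
    squashed = "".join(extracted.split())
    return [s for s in expected if "".join(s.split()) not in squashed]


if __name__ == "__main__" and len(sys.argv) > 1:
    import pymupdf4llm  # third-party; the report uses version 0.0.27

    md_text = pymupdf4llm.to_markdown(sys.argv[1])
    # Prints the snippets that did not make it into the markdown output.
    print(missing_snippets(["表1 基础指标"], md_text))
```

If the printed list contains the snippet, the text was dropped from the output entirely, which is a different problem from it not being marked as a header.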

@JorjMcKie
