This repository has been archived by the owner on Apr 15, 2024. It is now read-only.

The parsing sequence of tables is messy, sometimes horizontal, and sometimes vertical #194

Open
YellowDong opened this issue Sep 18, 2017 · 3 comments

Comments

@YellowDong

The table text to be parsed is shown in the figure below:
[Image: 1505725201 1]

The following screenshot shows the output I get when I parse the table part with pdfminer:
[Image: 1505725266 1]

@YellowDong
Author

@euske

@cadu-leite

The problem is how the PDF was built or written. Some documents are built with, for instance, two-column pages, and for those documents this reading order is the desirable behavior -

@984958198

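You need to handle this case separately. There are functions that recognize the ruling lines; you can use the line coordinates to separate the text into cells.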
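
A minimal sketch of that idea, in plain Python so it runs without pdfminer installed. It assumes you have already collected the x positions of the vertical ruling lines, the y positions of the horizontal ruling lines, and the bounding box of each text fragment (in pdfminer these could come from the `bbox` of `LTLine`/`LTRect` and `LTTextLine` layout objects). The function name `assign_to_cells` and the tuple layout are hypothetical, chosen only for illustration, and the sketch assumes a regular grid with no merged cells.

```python
from bisect import bisect_right
from collections import defaultdict


def assign_to_cells(texts, x_lines, y_lines):
    """Group text fragments into table cells using ruling-line coordinates.

    texts:   iterable of (x0, y0, x1, y1, string), PDF coordinates
             (origin at the bottom-left, y grows upward).
    x_lines: x coordinates of the vertical ruling lines.
    y_lines: y coordinates of the horizontal ruling lines.
    Returns a list of rows (top to bottom), each a list of cell strings
    (left to right). Fragments outside the grid are ignored.
    """
    xs = sorted(x_lines)
    ys = sorted(y_lines)
    n_cols = len(xs) - 1
    n_rows = len(ys) - 1
    cells = defaultdict(list)

    for x0, y0, x1, y1, s in texts:
        # Use the centre of the fragment to decide which cell it belongs to.
        cx = (x0 + x1) / 2
        cy = (y0 + y1) / 2
        col = bisect_right(xs, cx) - 1
        row = bisect_right(ys, cy) - 1
        if 0 <= col < n_cols and 0 <= row < n_rows:
            # Sort key: top-to-bottom, then left-to-right inside a cell.
            cells[(row, col)].append((-cy, cx, s))

    table = []
    # The row with the largest y is the top row of the table.
    for row in range(n_rows - 1, -1, -1):
        table.append([
            " ".join(s for _, _, s in sorted(cells[(row, col)]))
            for col in range(n_cols)
        ])
    return table


# Fragments arrive column by column (the "vertical" order from the issue).
fragments = [
    (10, 25, 40, 35, "Name"),
    (10, 5, 45, 15, "Alice"),
    (110, 25, 130, 35, "Age"),
    (110, 5, 120, 15, "30"),
]
table = assign_to_cells(fragments, x_lines=[0, 100, 200], y_lines=[0, 20, 40])
for row in table:
    print(row)
```

Because every fragment is placed by its coordinates rather than by the order in which the parser emitted it, the result is row-major regardless of whether the extracted sequence was horizontal or vertical.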
