This repository has been archived by the owner on Apr 15, 2024. It is now read-only.

scanned pdf or native pdf #272

Open
longbowking opened this issue Dec 2, 2019 · 2 comments

Comments

@longbowking

Given a PDF file, how can I tell whether it is a native PDF or a scanned PDF using pdfminer?
Any suggestions?

@himanshugarg

You could extract all the text and consider the PDF "scanned" if there is too little of it. Of course, this would fail for "native" PDFs that contain no text.
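A minimal sketch of that text-volume heuristic, assuming pdfminer.six, whose `pdfminer.high_level.extract_pages` yields one layout per page and whose `pdfminer.layout.LTTextContainer` elements expose `get_text()`. The helper names `page_texts_from_pdf` and `looks_scanned`, and the threshold of 20 non-whitespace characters per page, are hypothetical choices for illustration, not part of pdfminer; the threshold would need tuning on your own documents.

```python
from typing import List, Sequence


def page_texts_from_pdf(path: str) -> List[str]:
    """Hypothetical helper: return the extractable text of each page (assumes pdfminer.six)."""
    # Imported lazily so looks_scanned() can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    texts = []
    for page_layout in extract_pages(path):
        # Only top-level text containers are collected; characters nested
        # inside figures are ignored, which is acceptable for a rough heuristic.
        texts.append("".join(
            element.get_text()
            for element in page_layout
            if isinstance(element, LTTextContainer)
        ))
    return texts


def looks_scanned(page_texts: Sequence[str], min_chars_per_page: int = 20) -> bool:
    """Hypothetical heuristic: guess "scanned" when the average page has fewer
    than min_chars_per_page non-whitespace characters of extractable text.

    Known failure (from this thread): a native PDF that contains no text
    is misclassified as scanned.
    """
    if not page_texts:
        raise ValueError("PDF has no pages")
    # Count only non-whitespace characters so blank lines and spaces are not "text".
    total_chars = sum(len("".join(text.split())) for text in page_texts)
    # Normalize by page count so the threshold does not depend on document length.
    return total_chars / len(page_texts) < min_chars_per_page
```

For example, `looks_scanned(page_texts_from_pdf("input.pdf"))` (with `input.pdf` a placeholder path) returns `True` when the file yields almost no text. For documents that mix scanned and native pages, applying the same threshold to each entry of `page_texts` individually gives a per-page verdict instead of a whole-file one.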

@himanshugarg

Yes, that's another case where this heuristic will fail.
