Automatically choose flavor based on type of table in PDF #211

Closed
vinayak-mehta opened this issue Dec 1, 2018 · 4 comments

vinayak-mehta commented Dec 1, 2018

Continuing the conversation from #102.

imri:

When you say that lattice should work perfectly: I want a generic way to detect and extract tables without having to know which detection method (lattice / stream) is best for a given document. I want to decouple them as much as possible.

vinayak-mehta:

I get your use case, and it is not currently possible through the library itself. But I see two possibilities that could be implemented (both are heuristics):

  1. As far as I can tell from NurminenDetectionAlgorithm.java, Tabula first filters out all Lattice-type tables from the document and then looks for Stream-type tables until it cannot find any more. Similarly, we could couple both flavors into a single one inside Camelot.
  2. We can create a flavor called guess that automatically chooses between Lattice and Stream.
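
A rough sketch of what the second heuristic could look like from user code today: try lattice first and fall back to stream only when lattice finds nothing. The helper name `extract_lattice_then_stream` and its `read_pdf` parameter are hypothetical and not part of Camelot; only `camelot.read_pdf(path, pages=..., flavor=...)` is the library's own call. Treat it as an illustration under those assumptions, not a proposed implementation.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


def extract_lattice_then_stream(
    path: str,
    pages: str = "1",
    read_pdf: Optional[Callable[..., Sequence[Any]]] = None,
) -> Tuple[str, Sequence[Any]]:
    """Return (flavor_used, tables), preferring lattice over stream.

    Hypothetical helper: `read_pdf` is injectable so the logic can be
    exercised without a PDF; by default it resolves to camelot.read_pdf.
    """
    if read_pdf is None:
        import camelot  # imported lazily; only needed for real extraction

        read_pdf = camelot.read_pdf

    # Lattice relies on ruling lines, so a non-empty result is taken as a
    # sign that the tables are line-delimited.
    tables = read_pdf(path, pages=pages, flavor="lattice")
    if len(tables) > 0:
        return "lattice", tables

    # No ruled tables found: fall back to whitespace-based detection.
    return "stream", read_pdf(path, pages=pages, flavor="stream")
```

Calling it once per page (for example `pages="3"`) would let each page pick its own flavor. Unlike the Tabula approach in the first item, this sketch does not look for Stream-type tables on pages where Lattice already found something, so a page that mixes both kinds would lose its Stream-type tables.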
vinayak-mehta self-assigned this Dec 1, 2018
vinayak-mehta added this to the v0.7.0 milestone Dec 2, 2018
vinayak-mehta commented

Needs more discussion, removing from this release.

vinayak-mehta removed this from the v0.7.0 milestone Jan 2, 2019
vinayak-mehta commented

Moved to #19.

sanjayishah commented

I get a runtime error telling me to make sure Ghostscript is installed (line 229 in _gsprint.py), but it is installed. Why does this error occur?

dpranav1988 commented

Hi @vinayak-mehta,

Could you please let me know how I can automatically detect when to use lattice and when to use stream? I don't want to check manually; I need an automated way to decide.

Thanks in advance.
