
Options for Lattice / Stream Mode ? #65

Closed
psychemedia opened this issue Sep 10, 2017 · 8 comments

Comments

@psychemedia

psychemedia commented Sep 10, 2017

The tabulapdf/tabula-java packages and API seem to have moved on compared to ropensci/tabulizer, for example in terms of the lattice and stream options.

Is support for these planned?

I think these options make it easier to extract multiple tables from a single page, e.g. as queried in #22?

@leeper
Member

leeper commented Oct 9, 2017

lattice is the new name for spreadsheet. stream is not supported in tabulizer (yet).

@tpaskhalis
Contributor

The stream option in Tabula is essentially equivalent to spreadsheet = FALSE in tabulizer, while lattice corresponds to spreadsheet = TRUE. I restructured the code to be consistent with Tabula nomenclature. It's available on the development branch tabulizer/pdfbox2.0, which works together with the development branch tabulizerjars/tabula1.0.0. So something like this would work:

tab <- extract_tables(f, method = "stream")

The only key difference is that the default option, decide, is applied to each page individually, as in the current version of Tabula. As you suggested, this makes it somewhat easier to extract multiple inconsistent tables from one page.
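
To make the three methods concrete, here is a minimal sketch that runs each of them on the same page and compares how many tables come back. It assumes a tabulizer build in which extract_tables() accepts the method argument described above, and it uses the example PDF path from the package README; the variable names are only illustrative.

```r
library(tabulizer)

# Example PDF shipped with the package (path as used in the tabulizer README).
f <- system.file("examples", "data.pdf", package = "tabulizer")

# Force a single extraction method for the page.
tab_lattice <- extract_tables(f, pages = 1, method = "lattice")
tab_stream  <- extract_tables(f, pages = 1, method = "stream")

# Let the method be chosen for the page ("decide" is the default).
tab_decide  <- extract_tables(f, pages = 1, method = "decide")

# Compare how many tables each method found on the page.
sapply(
  list(lattice = tab_lattice, stream = tab_stream, decide = tab_decide),
  length
)
```

If the counts differ, inspecting the individual list elements should show which method splits the page into tables the way you expect.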

@leeper
Member

leeper commented Apr 6, 2018

@maelle I apparently don't have the ability to add contributors to this repo. Can you add @tpaskhalis, giving him edit/push rights here and on tabulizerjars?

@maelle
Member

maelle commented Apr 6, 2018

Ok so I

  • Created a team and invited @tpaskhalis to it. The team has write access to this repo.

  • Made you an admin @leeper

Should the team also get write access to tabulizerjars?

@maelle
Member

maelle commented Apr 6, 2018

Oops, I should have read more carefully. I'm going to add the second repo for the team, sorry.

@tpaskhalis
Contributor

Many thanks, @maelle!

@maelle
Member

maelle commented Apr 6, 2018

Thank you!

@tpaskhalis
Contributor

The options are now available on the master branch (#83). Please update and reinstall the package, then check ?extract_tables for further details.
