Multiple table in 1 page #22

leeper · 2016-09-22T12:12:49Z

Migrated from ropensci/tabulizerjars#1 (@khun84)

Is there a parameter I can pass in to extract more than one table per page?

I have a pdf page with 2 tables:

Table 1 has 2 columns and multiple rows.
Table 2 has 2 columns and multiple rows, but some of the cells are merged.

I use the extract_tables() function with the default parameters, and the output contains only one table (table 1).

The only option I can think of is to set method = 'asis', but I do not know how to proceed with the resulting Java object. Is there any documentation I can refer to?

leeper · 2016-09-22T12:14:53Z

@khun84 Yes, you can specify the page number twice, along with the area (or use the extract_areas() function to specify those areas interactively).

So something like extract_areas(file, pages = c(1,1)). This will give you the chance to extract two different areas from a given page.

You can pursue the Java approach, but it's really only useful if you know the underlying tabula Java library well, and that library is not well documented anywhere.
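The repeated-page approach described above might look roughly like the sketch below when the areas are supplied directly rather than selected interactively. The file name "report.pdf", the helper areas_on_page(), and the coordinates are all hypothetical placeholders, not values from this thread; coordinates are assumed to be c(top, left, bottom, right) in PDF points, and the call is guarded so the snippet does nothing if the package or file is missing.

```r
# Sketch only: file name, helper, and coordinates are illustrative assumptions.

# Hypothetical helper: repeat the page number once per area so that
# `pages` and `area` have the same length.
areas_on_page <- function(page, areas) {
  list(pages = rep(page, length(areas)), area = areas)
}

args <- areas_on_page(1, list(
  c(50, 30, 300, 560),   # assumed bounds of table 1: top, left, bottom, right
  c(320, 30, 600, 560)   # assumed bounds of table 2
))

pdf_file <- "report.pdf"  # placeholder path

# Only attempt extraction when tabulizer is installed and the file exists.
if (requireNamespace("tabulizer", quietly = TRUE) && file.exists(pdf_file)) {
  # guess = FALSE so the supplied areas are used instead of auto-detection
  tables <- tabulizer::extract_tables(
    pdf_file,
    pages = args$pages,
    area  = args$area,
    guess = FALSE
  )
  str(tables)  # expect one list element per requested area
}
```

Hard-coded areas like these will stop matching if the tables move, so this only suits documents with a stable layout.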

khun84 · 2016-09-22T17:26:22Z

Thanks for the clarification. I've tried extract_areas(file, c(1, 1)), but it returns the same table twice. If I have to explicitly define the area for both tables, then my code will break when the position of the tables changes.

Is there any function that can return the entire content of the PDF in a DOM-like format? In that case, I could traverse the DOM tree and extract what I want.

SteveLane · 2016-12-21T04:05:25Z

Hi @leeper - I've recently run into similar issues, but with multi-page documents and a varying number of tables per page. I found that the 'spreadsheet' method, on the command line and/or via Tabula's interface, will pull them all out. The write_csv function outputs them all correctly (at least in the cases I've tested), but the list_matrices function doesn't.

I've edited the list_matrices function. Would you be happy for me to submit a pull request?

leeper · 2016-12-21T07:22:59Z

Yes, please send a PR!

leeper added the question label Sep 22, 2016

leeper mentioned this issue Sep 22, 2016

Multiple table in 1 page ropensci/tabulizerjars#1

Closed

psychemedia mentioned this issue Sep 10, 2017

Options for Lattice / Stream Mode ? #65

Closed
