Not able to read multiple pages using pages='1,2,3' or pages='all' #245

ghost · 2019-01-08T06:37:15Z

When I read a PDF file with multiple pages, I only get the first page by default.

When I use read_pdf('sample.pdf', pages='1-2'), read_pdf('sample.pdf', pages='all'), or read_pdf('sample.pdf', pages='1,2'), I get the following error:

AttributeError: 'PDFHandler' object has no attribute 'password'

Can you please fix this issue?

vinayak-mehta · 2019-01-08T08:42:11Z

@kallol-oto Thanks for reporting, looking into it.

vinayak-mehta · 2019-01-09T17:12:17Z

@kallol-oto Which version are you using? I'm able to extract tables using both pages='1-2' and pages='all'. I could try to reproduce the bug if you post a link to the PDF you're using.

ghost · 2019-01-10T12:42:00Z

@vinayak-mehta I am using Python 3 in a Jupyter notebook. I installed this package using pip3 install camelot-py.
I am reading a bank statement from HDFC Bank:
harsh1.pdf

vinayak-mehta · 2019-01-10T14:33:23Z

@kallol-oto Thanks for the info, I was able to reproduce the error, created a PR.

ghost · 2019-01-11T12:03:44Z

@vinayak-mehta Thanks. Let me know when it moves to production.

vinayak-mehta · 2019-01-22T07:13:39Z

@kallol-oto This has been fixed in v0.7.2.

aperna1 · 2019-02-09T20:41:02Z

@vinayak-mehta I was getting the same error, but when I tried to upgrade to v0.7.2 using pip install --upgrade camelot I started getting this error: AttributeError: 'module' object has no attribute 'read_pdf'

EDIT: I followed the directions you had here (#142) and that solved the read_pdf error, but I'm still unable to read tables on any page other than the first. Has v0.7.2 been released yet?

vinayak-mehta · 2019-02-10T10:54:00Z

@aperna1 Try upgrading again, it has been released.

aperna1 · 2019-02-11T00:37:58Z

@vinayak-mehta When I use pip install camelot-py[cv] I still get v0.7.1
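When pip appears to install an older release than expected, it can help to confirm which distribution and version the running interpreter actually sees. The following is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); installed_version is a hypothetical helper name, and the two distribution names are simply the ones that appear in the install commands in this thread.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report both names used in the pip commands in this thread.
for name in ("camelot-py", "camelot"):
    print(name, "->", installed_version(name))
```

Running this inside the same Jupyter kernel that raises the error shows whether the notebook is using the environment that pip upgraded.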

richajak · 2019-06-23T07:58:54Z

I am still encountering the same problem: I can only read one page at a time, e.g. pages='1' or pages='2'. If I try pages='1-2', pages='1,2', pages='1-end', or pages='all', these do not work at all.
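For reference, the page strings mentioned above are meant to select the following pages. This is only an illustrative sketch of how such a specification can be expanded, not camelot's own parsing code; expand_pages and page_count are hypothetical names introduced here.

```python
def expand_pages(spec, page_count):
    """Expand a spec such as '1,3-4', '2-end' or 'all' into sorted page numbers."""
    if spec == "all":
        return list(range(1, page_count + 1))
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as '2-4' or '2-end'.
            start, end = part.split("-", 1)
            last = page_count if end.strip() == "end" else int(end)
            pages.update(range(int(start), last + 1))
        else:
            pages.add(int(part))
    # Drop anything outside the document.
    return sorted(p for p in pages if 1 <= p <= page_count)
```

Comparing the expanded list with the pages that actually yield tables helps separate a page-selection problem from a table-detection problem.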

srivatsshankar · 2020-02-17T09:58:54Z

This problem persists in version 0.7.3.

DenDen047 · 2020-11-09T02:55:19Z

I have the same problem. In my case, the program could load the second page of my PDF, but it could not extract the table on that page.

CharlieDixon · 2021-02-12T09:39:19Z

I am getting the same problem with camelot-py 0.8.2. I installed using pip install camelot-py[cv] and have the necessary dependencies installed. Like richajak, when I specify a page range it'll read the table from the first page in that range but won't return the remaining selection. Here's the pdf I'm trying to extract tables from: huaneng2020q1.pdf.

import camelot

tables = camelot.read_pdf('huaneng2020q1.pdf', pages='2,3,4,5')
q1 = tables[0].df  # DataFrame of the first table only (on page 2)
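Note that index [0] selects only the first table in the returned list, so tables from later pages, if any were found, sit at higher indices. A small sketch for inspecting everything that came back, grouped by page, assuming each returned table exposes the page and df attributes; tables_by_page is a hypothetical helper name.

```python
from collections import defaultdict


def tables_by_page(tables):
    """Group the DataFrames of extracted tables by the page they came from."""
    grouped = defaultdict(list)
    for table in tables:
        grouped[table.page].append(table.df)
    return dict(grouped)
```

Calling tables_by_page on the result of camelot.read_pdf(..., pages='2,3,4,5') and printing the keys shows which of the requested pages produced at least one table.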

KornbotDevUltimatorKraton · 2021-06-30T15:55:12Z

Hello, it's 2021 and this problem is still unsolved. Can anyone tell me how to solve it in Camelot? Even with an older version I am unable to get past it.

vinayak-mehta · 2021-07-11T17:23:15Z

richajak wrote: "I am still encountering the same problem: I can only read one page at a time, e.g. pages='1' or pages='2'. If I try pages='1-2', pages='1,2', pages='1-end', or pages='all', these do not work at all."

A PDF and output logs would help debug this. It's possible that camelot is not able to find tables on some of those pages. You can try out some of the settings described at https://camelot-py.readthedocs.io/en/master/user/advanced.html to extract tables from those pages.
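One way to check whether tables are simply not being detected on some pages is to read each page separately and count what comes back. This is a rough sketch rather than an official recipe: count_tables_per_page and compare_flavors are hypothetical helper names, and the reader is passed in as an argument so the counting logic can be tried without a PDF. The flavor values 'lattice' and 'stream' are the two parsing modes accepted by camelot.read_pdf.

```python
def count_tables_per_page(read_pdf, path, pages, **kwargs):
    """Call the reader once per page and record how many tables it finds."""
    counts = {}
    for page in pages:
        tables = read_pdf(path, pages=str(page), **kwargs)
        counts[page] = len(tables)
    return counts


def compare_flavors(path, pages):
    """Run the per-page count with both flavors (requires camelot-py)."""
    import camelot  # imported lazily so the helper above has no dependency

    return {
        flavor: count_tables_per_page(camelot.read_pdf, path, pages, flavor=flavor)
        for flavor in ("lattice", "stream")
    }
```

For example, compare_flavors('huaneng2020q1.pdf', [2, 3, 4, 5]) returns a count per page for each flavor. A page that shows 0 under one flavor but not the other points to a detection setting rather than to the pages argument.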

CharlieDixon wrote: "I am getting the same problem with camelot-py 0.8.2. I installed using pip install camelot-py[cv] and have the necessary dependencies installed. Like richajak, when I specify a page range it'll read the table from the first page in that range but won't return the remaining selection. Here's the pdf I'm trying to extract tables from: huaneng2020q1.pdf."

@CharlieDixon I was able to extract tables on the specified pages from the PDF you attached:

vinayak-mehta mentioned this issue Jan 8, 2019

Unable to extract tables using Camelot from page 1 and 2 of the attached pdf statement of the bank #242

Closed

vinayak-mehta added the bug label Jan 8, 2019

vinayak-mehta mentioned this issue Jan 10, 2019

[MRG] Fix AttributeError for encrypted files #251

Closed

yatintaluja closed this as completed in 6c4b468 Jan 16, 2019
