Not able to read multiple pages using pages='1,2,3' or pages='all' #245

Closed
ghost opened this issue Jan 8, 2019 · 15 comments
@ghost

ghost commented Jan 8, 2019

When I read a PDF file with multiple pages, I only get the first page by default.

When I use read_pdf('sample.pdf', pages='1-2'), read_pdf('sample.pdf', pages='all'), or read_pdf('sample.pdf', pages='1,2'), I get the following error:

AttributeError: 'PDFHandler' object has no attribute 'password'

Can you please fix this issue?

@vinayak-mehta
Contributor

@kallol-oto Thanks for reporting, looking into it.

@vinayak-mehta
Contributor

@kallol-oto Which version are you using? I'm able to extract tables using both pages='1-2' and pages='all'. I can reproduce the bug if you can post a link to the PDF you're using.

@ghost
Author

ghost commented Jan 10, 2019

@vinayak-mehta I am using Python 3 in a Jupyter notebook. I installed this package using pip3 install camelot-py.
I am reading a bank statement from HDFC Bank:
harsh1.pdf

@vinayak-mehta
Contributor

@kallol-oto Thanks for the info. I was able to reproduce the error and have created a PR.

@ghost
Author

ghost commented Jan 11, 2019

@vinayak-mehta Thanks, let me know when it moves to production.

@vinayak-mehta
Contributor

@kallol-oto This has been fixed in v0.7.2.

@aperna1

aperna1 commented Feb 9, 2019

@vinayak-mehta I was getting the same error, but when I tried to upgrade to v0.7.2 using pip install --upgrade camelot I started getting this error: AttributeError: 'module' object has no attribute 'read_pdf'

EDIT: I followed the directions you had here (#142) and that solved the read_pdf error, but I'm still unable to read tables on any page other than the first. Has v0.7.2 been released yet?

@vinayak-mehta
Contributor

@aperna1 Try upgrading again, it has been released.

@aperna1

aperna1 commented Feb 11, 2019

@vinayak-mehta When I use pip install camelot-py[cv] I still get v0.7.1
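
For anyone checking whether the fix is actually installed, here is a minimal sketch that compares the installed version against v0.7.2, the release named above as containing the fix. It assumes the distribution name camelot-py from the install command in this thread, Python 3.8+ for importlib.metadata, and plain numeric version strings such as 0.7.1. The helper names parse_version, has_fix and installed_version are hypothetical, not part of Camelot.

```python
from importlib import metadata

FIXED_IN = "0.7.2"  # release named in this thread as containing the fix


def parse_version(text):
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


def has_fix(version, fixed_in=FIXED_IN):
    """Return True if `version` is at least `fixed_in`."""
    return parse_version(version) >= parse_version(fixed_in)


def installed_version(dist="camelot-py"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("camelot-py is not installed in this environment")
    else:
        print(version, "has the fix" if has_fix(version) else "predates the fix")
```

Running this inside the same environment (for example the same Jupyter kernel) that imports camelot shows which version that environment actually sees.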

@richajak

I am still encountering the same problem. I can only read one page at a time, e.g. pages='1' or pages='2'. If I try pages='1-2', pages='1,2', pages='1-end', or pages='all', none of them work.
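
To make the page strings discussed here concrete, the following is a minimal sketch of how a spec such as '1,2', '1-2', '1-end' or 'all' could expand into page numbers. It is an illustration only, not Camelot's actual parsing code; the function name expand_pages and the page_count argument are assumptions made for this example.

```python
def expand_pages(spec, page_count):
    """Expand a page spec such as '1,3-4', '1-end' or 'all' into page numbers.

    `page_count` is the total number of pages in the document.
    """
    if spec.strip() == "all":
        return list(range(1, page_count + 1))
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-")
            # 'end' stands for the last page of the document
            last = page_count if end == "end" else int(end)
            pages.extend(range(int(start), last + 1))
        else:
            pages.append(int(part))
    # Drop duplicates and return the pages in ascending order
    return sorted(set(pages))
```

With a five-page document, '1-2' and '1,2' both expand to pages 1 and 2, while '1-end' and 'all' both cover pages 1 through 5.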

@srivatsshankar

srivatsshankar commented Feb 17, 2020

This problem persists in version 0.7.3

@DenDen047

I have the same problem. In my case, the program could load the second page of my PDF, but it could not extract the table on that page.

@CharlieDixon

I am getting the same problem with camelot-py 0.8.2. I installed using pip install camelot-py[cv] and have the necessary dependencies installed. Like richajak, when I specify a page range it'll read the table from the first page in that range but won't return the remaining selection. Here's the pdf I'm trying to extract tables from: huaneng2020q1.pdf.

import camelot
q1 = camelot.read_pdf('huaneng2020q1.pdf', pages='2,3,4,5')  # returns a TableList
q1 = q1[0].df  # DataFrame of the first table only (on page 2)
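
Indexing the result with [0] selects only the first table, so it cannot show whether later pages produced tables. A minimal sketch for checking every returned table follows. It assumes each table object exposes a page attribute, as Camelot's Table objects do; the helper names tables_by_page and missing_pages are hypothetical.

```python
from collections import defaultdict


def tables_by_page(tables):
    """Group table objects by the page number they were found on."""
    grouped = defaultdict(list)
    for table in tables:
        grouped[int(table.page)].append(table)
    return dict(grouped)


def missing_pages(tables, requested):
    """Return the requested pages for which no table was returned."""
    found = tables_by_page(tables)
    return [page for page in requested if page not in found]


# Usage with camelot (not run here, needs camelot-py and the PDF):
#   import camelot
#   tables = camelot.read_pdf('huaneng2020q1.pdf', pages='2,3,4,5')
#   print(len(tables), missing_pages(tables, [2, 3, 4, 5]))
```

If missing_pages returns an empty list, every requested page yielded at least one table, and the remaining tables are available at indexes 1, 2 and so on of the returned list.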

@KornbotDevUltimatorKraton

Hello, it is 2021 and I still cannot solve this problem. Can anyone tell me how to solve it in Camelot? Even with an older version, I am still unable to get it working.

@vinayak-mehta
Contributor

> I am still encountering the same problem. I can only read one page at a time, e.g. pages='1' or pages='2'. If I try pages='1-2', pages='1,2', pages='1-end', or pages='all', none of them work.

A pdf and output logs would help debug this in that case. It's possible that camelot is not able to find tables on some of those pages. You can try out some settings from here: https://camelot-py.readthedocs.io/en/master/user/advanced.html to extract tables from those pages.
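
One way to gather the kind of output that helps with debugging is to read each page separately with both parsing flavors and record how many tables come back. This is a minimal sketch, assuming the flavor values 'lattice' and 'stream' accepted by camelot.read_pdf; the function name count_tables is hypothetical, and the reader is passed in as an argument so the logic can be tried without a PDF.

```python
def count_tables(read_pdf, path, pages, flavors=("lattice", "stream")):
    """Read each page separately with each flavor and count the tables found.

    `read_pdf` is any callable with the same keyword arguments as
    camelot.read_pdf (path, pages=..., flavor=...).
    """
    counts = {}
    for page in pages:
        for flavor in flavors:
            tables = read_pdf(path, pages=str(page), flavor=flavor)
            counts[(page, flavor)] = len(tables)
    return counts


# Usage with camelot (not run here, needs camelot-py and the PDF):
#   import camelot
#   print(count_tables(camelot.read_pdf, 'huaneng2020q1.pdf', [2, 3, 4, 5]))
```

A page that shows zero tables for one flavor but some for the other suggests trying that flavor, or the other settings on the linked page, for that page.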

> I am getting the same problem with camelot-py 0.8.2. I installed using pip install camelot-py[cv] and have the necessary dependencies installed. Like richajak, when I specify a page range it'll read the table from the first page in that range but won't return the remaining selection. Here's the pdf I'm trying to extract tables from: huaneng2020q1.pdf.

@CharlieDixon I was able to extract tables on the specified pages from the PDF you attached:
[image]
