Python 3 doesn't support negative seeks #489

Closed
webisteme opened this issue Mar 6, 2019 · 13 comments
Labels
is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF

Comments

@webisteme

webisteme commented Mar 6, 2019

When opening a pdf file in Python 3.7 I get the following error:

>> PdfFileReader(open('/file.pdf'))
>> UnsupportedOperation: can't do nonzero end-relative seeks

The reason is that Python 3 files opened in text mode (the default) do not support nonzero end-relative seeks; files opened in binary mode still do. To fix the issue I had to set the mode to 'rb':

>> PdfFileReader(open('/file.pdf', mode="rb"))

This may be worth noting in the documentation.
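
The difference between the two modes can be reproduced without PyPDF2. The following is a minimal sketch using only the standard library; the helper name `end_relative_seek_works` and the throwaway sample file are assumptions made for illustration, not part of PyPDF2.

```python
import io
import os
import tempfile


def end_relative_seek_works(path, mode):
    """Return True if seeking one byte back from the end succeeds in this mode."""
    with open(path, mode) as fh:
        try:
            fh.seek(-1, os.SEEK_END)
        except io.UnsupportedOperation:
            # Text-mode files reject nonzero end-relative seeks.
            return False
        return True


# Write a tiny stand-in file; its contents do not need to be a valid PDF.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4\n%%EOF\n")
    sample_path = tmp.name

text_ok = end_relative_seek_works(sample_path, "r")     # text mode
binary_ok = end_relative_seek_works(sample_path, "rb")  # binary mode
os.remove(sample_path)

print("text mode:", text_ok)
print("binary mode:", binary_ok)
```

A PDF parser has to look near the end of the file for the trailer, which is why it needs this kind of seek and therefore a binary-mode handle.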

@Dakotapa

Thank you! This worked

@snehal1989

thanks this worked

@basawanayya

Thanks

@SheriffNabil

Hello,
I can't read a PDF that is in Arabic.
file = PyPDF2.PdfFileReader(open('P07.pdf',mode='rb', encoding="UTF-8"))
I get the following message: binary mode doesn't take an encoding argument
Do you have any idea how to fix this?
Thanks :)

@basawanayya

basawanayya commented Apr 27, 2020 via email

@gothill

gothill commented Apr 27, 2020

@SherifBassiouni You're opening the file in binary mode, which reads raw bytes without decoding them, so open() does not accept an encoding argument. Try removing the encoding argument.
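
A minimal sketch of that behavior, using only the standard library. The sample file is a throwaway stand-in created for illustration, not a real Arabic PDF.

```python
import os
import tempfile

# Write a tiny stand-in file; its contents do not need to be a valid PDF.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4\n%%EOF\n")
    sample_path = tmp.name

# Combining binary mode with an encoding is rejected by open() itself.
try:
    open(sample_path, mode="rb", encoding="UTF-8")
    encoding_error = None
except ValueError as exc:
    encoding_error = str(exc)

# Without the encoding argument, binary mode returns raw bytes.
with open(sample_path, mode="rb") as fh:
    header = fh.read(5)

os.remove(sample_path)

print(encoding_error)
print(header)
```

The encoding argument of open() only controls how a text-mode file is decoded, so dropping it does not by itself change how the text inside the PDF is interpreted.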

@SheriffNabil

Thanks for your reply.
I tried your suggestion and got the following error:
UnsupportedOperation: can't do nonzero end-relative seeks

@faisalaltqiqi faisalaltqiqi mentioned this issue Apr 29, 2020
@faisalaltqiqi

Dear
I have the same problem
Did you come up with a solution?

@SheriffNabil

Hello Faisal,

There is another Python library, PyMuPDF (imported as fitz), that lets you extract text from PDF files.
Here is the code:

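import fitz  # PyMuPDF is imported under the name "fitz"
doc = fitz.open("your_file.pdf")
txt = doc.getPageText(0)  # text of the first page (index 0); newer PyMuPDF releases name this get_page_text
txt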

The only problem is that the text comes out reversed. For example, if your PDF file has the word "فيصل" in it, it will be printed as "لصيف".

To get around this problem you can add the following code:

txt2 = txt.split(" ")
for word in txt2:
    print(word)

Good luck

@faisalaltqiqi

faisalaltqiqi commented Apr 29, 2020 via email

@faisalaltqiqi

Can you contact me?
(faisalsrpg@gmail.com)

@nduprincekc

nduprincekc commented Feb 2, 2022

PdfFileReader(open('/file.pdf', mode="rb"))

This is the correct code to fix the bug.



MartinThoma added the is-bug label Apr 7, 2022
@MartinThoma
Member

MartinThoma commented Apr 16, 2022

I think this is just incorrect usage of PyPDF2.

Wrong

reader = PdfFileReader(open("example.pdf"))

Correct - Option 1

reader = PdfFileReader("example.pdf")  # just pass the path

Correct - Option 2

fh = open("example.pdf", "rb")   # mind the "rb"!
reader = PdfFileReader(fh)
# do your stuff
fh.close()
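
Option 2 can also be written with a `with` block so the handle is closed even if an exception occurs. The following is a minimal sketch, not PyPDF2 API: `with_binary_handle` and `read_tail` are hypothetical names, and `read_tail` is only a stand-in for a PDF reader so that the example runs without PyPDF2 installed.

```python
import os
import tempfile


def with_binary_handle(path, use_reader):
    """Open `path` in binary mode, pass the handle to `use_reader`, then close it."""
    with open(path, "rb") as fh:  # mind the "rb"!
        return use_reader(fh)


def read_tail(fh, size=5):
    # Stand-in for a PDF reader: seek backwards from the end of the file,
    # as a parser does when it looks for the trailer, then read what is left.
    fh.seek(-size, os.SEEK_END)
    return fh.read()


# Write a tiny stand-in file; its contents do not need to be a valid PDF.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4\n%%EOF")
    demo_path = tmp.name

tail = with_binary_handle(demo_path, read_tail)
os.remove(demo_path)

print(tail)
```

With PyPDF2, the callable passed as `use_reader` would build `PdfFileReader(fh)` and finish its work before returning, since the handle is closed as soon as the `with` block exits.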
