Python 3 doesn't support negative seeks #489

webisteme · 2019-03-06T11:22:02Z

When opening a pdf file in Python 3.7 I get the following error:

>> PdfFileReader(open('/file.pdf'))
>> UnsupportedOperation: can't do nonzero end-relative seeks

The reason is that in Python 3 a file opened in text mode (the default) does not support nonzero end-relative seeks. To fix the issue I had to open the file in binary mode by setting the mode to 'rb':

>> PdfFileReader(open('/file.pdf', mode="rb"))

This may be worth noting in the documentation.
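
The difference between the two modes can be reproduced with the standard library alone. The sketch below is only an illustration, not PyPDF2 code: the file name `demo.pdf` and the helper `can_seek_from_end` are made up for this example. It tries the same kind of end-relative seek on a text-mode and a binary-mode handle.

```python
import io
import os
import tempfile


def can_seek_from_end(path, mode):
    """Return True if a handle opened with `mode` accepts seek(-1, os.SEEK_END)."""
    with open(path, mode) as fh:
        try:
            fh.seek(-1, os.SEEK_END)
        except io.UnsupportedOperation:
            # Text-mode handles reject nonzero end-relative seeks.
            return False
        return True


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in file; a real PDF would behave the same way here.
    path = os.path.join(tmp, "demo.pdf")
    with open(path, "wb") as fh:
        fh.write(b"%PDF-1.4\n%%EOF\n")

    text_ok = can_seek_from_end(path, "r")     # default text mode
    binary_ok = can_seek_from_end(path, "rb")  # binary mode
    print(text_ok, binary_ok)
```

Only the binary handle accepts the seek, which matches the error message reported above.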

Dakotapa · 2019-10-27T08:19:28Z

Thank you! This worked

snehal1989 · 2019-12-07T07:23:54Z

thanks this worked

basawanayya · 2020-04-26T18:12:55Z

Thanks

SheriffNabil · 2020-04-26T21:25:22Z

Hello,
I can't read a PDF that is in Arabic.
file = PyPDF2.PdfFileReader(open('P07.pdf',mode='rb', encoding="UTF-8"))
I get the following message: binary mode doesn't take an encoding argument
Do you have any idea how to fix this?
Thanks :)

basawanayya · 2020-04-27T09:44:44Z

sorry, i don't know

gothill · 2020-04-27T09:55:44Z

@SherifBassiouni You're opening the file in binary mode, which reads raw bytes without decoding them, so the encoding argument does not apply. Try removing the encoding argument.
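
A small standard-library sketch of this point, assuming a throwaway file named `sample.bin` (a made-up name): combining binary mode with an encoding is rejected by `open()` itself, while binary mode alone returns bytes that can be decoded in a separate step. The bytes of a real PDF are not plain UTF-8 text, so the decode step applies only to this stand-in file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")  # hypothetical stand-in file
    with open(path, "wb") as fh:
        fh.write("فيصل".encode("utf-8"))

    # Binary mode plus an encoding argument raises ValueError.
    try:
        open(path, mode="rb", encoding="utf-8")
        error_message = None
    except ValueError as exc:
        error_message = str(exc)

    # Without the encoding argument, binary mode yields raw bytes;
    # decoding is a separate, explicit step.
    with open(path, mode="rb") as fh:
        raw = fh.read()
    decoded = raw.decode("utf-8")

print(error_message)
```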

SheriffNabil · 2020-04-27T17:47:22Z

Thanks for your reply.
I tried your suggestion and got the following error:
UnsupportedOperation: can't do nonzero end-relative seeks

faisalaltqiqi · 2020-04-29T19:55:21Z

Dear
I have the same problem
Did you come up with a solution?

SheriffNabil · 2020-04-29T20:26:54Z

Hello Faisal,

There is another Python library, fitz (PyMuPDF), that allows you to extract text from PDF files.
Here is the code:

import fitz
doc = fitz.open("your_file.pdf")
txt = doc.getPageText(0)
txt

The only problem is that it prints the letters of each word in reverse order. For example, if your PDF file has the word "فيصل" in it, it will be printed as "لصيف".

To get around this problem you can add the following code:

txt2 = txt.split(" ")
for word in txt2:
    print(word)
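
The loop above prints each word as it was extracted. If the characters inside each word really do come back reversed, one naive way to undo that is sketched below. This is an assumption-laden illustration: `reverse_each_word` is a hypothetical helper, it treats every space-separated token as reversed, and it is not a substitute for a proper bidirectional-text algorithm.

```python
def reverse_each_word(text):
    """Reverse the characters inside every space-separated word."""
    return " ".join(word[::-1] for word in text.split(" "))


# Stand-in for text returned by the extraction step (assumed reversed).
extracted = "لصيف"
restored = reverse_each_word(extracted)
print(restored)
```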

Good luck

faisalaltqiqi · 2020-04-29T20:30:27Z

packet = io.BytesIO()
text=("12345فيصل")
can = canvas.Canvas(packet, pagesize=letter )
can.drawString(250, 603, text)
can.save()
packet.seek(0)
new_pdf = PdfFileReader(packet)
#(open('P07.pdf',mode='rb', encoding="UTF-8"))
with open("test.pdf",mode='rd',encoding='utf-8') as op:
    output = PdfFileWriter()
    page = existing_pdf.getPage(0)
    page2 = new_pdf.getPage(0)
    page.mergePage(page2)
    output.addPage(page)
    with open("New.pdf" ,'wb',encoding='utf-8') as out:
        output.write(out)
        #outputStream = open("New.pdf" , "wb")
        #output.write(outputStream)
        out.close()

faisalaltqiqi · 2020-04-29T23:39:20Z

Can you contact me?
(faisalsrpg@gmail.com)

nduprincekc · 2022-02-02T20:20:27Z

PdfFileReader(open('/file.pdf', mode="rb"))

This is the correct code for the bug


MartinThoma · 2022-04-16T10:49:30Z

I think this is just incorrect usage of PyPDF2: the file has to be passed as a path or as a stream opened in binary mode.

Wrong

reader = PdfFileReader(open("example.pdf"))

Correct - Option 1

reader = PdfFileReader("example.pdf")  # just pass the path

Correct - Option 2

fh = open("example.pdf", "rb")   # mind the "rb"!
reader = PdfFileReader(fh)
# do your stuff
fh.close()
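
Option 2 can also be written with a `with` block so the handle is closed even if reading fails. The sketch below uses only the standard library so it runs without PyPDF2; `require_binary_stream` is a hypothetical guard, not part of PyPDF2, and `example.pdf` here is a tiny stand-in file created for the demonstration.

```python
import io
import os
import tempfile


def require_binary_stream(stream):
    """Raise TypeError if `stream` is a text-mode handle (hypothetical guard)."""
    if isinstance(stream, io.TextIOBase):
        raise TypeError("open the PDF with mode='rb', not text mode")
    return stream


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.pdf")
    with open(path, "wb") as fh:
        fh.write(b"%PDF-1.4\n%%EOF\n")

    # The with-block closes the handle even if reading raises.
    with open(path, "rb") as fh:
        stream = require_binary_stream(fh)
        # reader = PdfFileReader(stream) would go here.
        stream.seek(-6, os.SEEK_END)  # end-relative seek works in binary mode
        tail = stream.read()

    closed_after_block = fh.closed

    # A text-mode handle is rejected by the guard before any seek is attempted.
    with open(path) as text_fh:
        try:
            require_binary_stream(text_fh)
            rejected = False
        except TypeError:
            rejected = True
```

The guard only turns the late seek error into an earlier, more descriptive one; passing the path directly, as in Option 1, avoids the question altogether.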

faisalaltqiqi mentioned this issue Apr 29, 2020

Hello, #548

Closed

MartinThoma added the "is-bug" label (From a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF) Apr 7, 2022

MartinThoma closed this as completed Apr 16, 2022
