Python 3 doesn't support negative seeks #489
Thank you! This worked.
Thanks, this worked.
Thanks.
Hello,
Sorry, I don't know.
On Mon, Apr 27, 2020 at 2:55 AM Sherif Bassiouni ***@***.***> wrote:
Hello,
I can't read a PDF that is in Arabic.
file = PyPDF2.PdfFileReader(open('P07.pdf',mode='rb', encoding="UTF-8"))
I get the following message: binary mode doesn't take an encoding argument
Do you have any idea how to fix this?
Thanks :)
@SherifBassiouni You're opening the file in binary mode ('rb'), which reads raw bytes without decoding them as text, so an encoding argument doesn't apply. Remove it: PyPDF2.PdfFileReader(open('P07.pdf', mode='rb'))
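A minimal standard-library sketch of the difference (the file name `example.bin` and its contents are hypothetical placeholders): combining a binary mode with `encoding` fails before the file is even opened, while `'rb'` alone returns the raw `bytes` that a PDF reader expects.

```python
import os
import tempfile

# Create a small stand-in file; a real PDF would be opened the same way.
path = os.path.join(tempfile.mkdtemp(), "example.bin")  # hypothetical file name
with open(path, "wb") as f:
    f.write("%PDF-1.4 فيصل".encode("utf-8"))

# Passing an encoding together with a binary mode raises ValueError.
error_message = None
try:
    open(path, mode="rb", encoding="UTF-8")
except ValueError as exc:
    error_message = str(exc)

# Binary mode without an encoding returns the file's bytes unchanged.
with open(path, mode="rb") as f:
    data = f.read()

print(error_message)
print(type(data).__name__, data[:8])
```

Any decoding of Arabic text happens later, inside the PDF library, not at `open()` time.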
Thanks for your reply.
Dear
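import io

# Legacy PyPDF2 1.x class and method names, as used throughout this thread.
from PyPDF2 import PdfFileReader, PdfFileWriter
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

# Draw the text onto a one-page PDF held in memory.
# Note: ReportLab's default Helvetica font has no Arabic glyphs; a TTF font
# with Arabic support would have to be registered for the letters to render.
packet = io.BytesIO()
text = "12345فيصل"
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(250, 603, text)
can.save()
packet.seek(0)  # rewind so the reader starts at the beginning of the buffer
new_pdf = PdfFileReader(packet)

# PDFs are binary files: use "rb"/"wb" and no encoding argument.
with open("test.pdf", "rb") as inp:
    existing_pdf = PdfFileReader(inp)
    output = PdfFileWriter()
    page = existing_pdf.getPage(0)
    page2 = new_pdf.getPage(0)
    page.mergePage(page2)  # overlay the new text on the existing page
    output.addPage(page)
    # Write while test.pdf is still open, since the reader loads pages lazily.
    with open("New.pdf", "wb") as out:
        output.write(out)  # the with blocks close both files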
Dear
I have the same problem.
Did you come up with a solution?
Hello Faisal,
There is another Python library called fitz (PyMuPDF) that lets you extract text from PDF files. Here is the code:
import fitz

doc = fitz.open("your_file.pdf")
txt = doc.get_page_text(0)  # text of the first page; releases from 2020 named this getPageText
print(txt)
The only problem is that the words come out with their letters in reverse order. For example, if your PDF file contains the word "فيصل", it will be printed as "لصيف".
To get around this problem, you can add the following code:
txt2 = txt.split(" ")
for word in txt2:
    print(word)
Good luck
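If the extracted words really do have their letters reversed, as in the "فيصل" / "لصيف" example, reversing each word's characters is one way to restore them. This is a hedged sketch and not part of PyMuPDF: `fix_reversed_words` is a hypothetical helper, and it assumes only the letters inside each word are reversed (not the word order) and that the text uses ordinary Arabic letters rather than presentation-form glyphs.

```python
def fix_reversed_words(text: str) -> str:
    """Reverse the characters of each space-separated word, keeping word order.

    Assumes (hypothetically) that extraction reversed the letters inside each
    word but left the words themselves in their original order.
    """
    return " ".join(word[::-1] for word in text.split(" "))


# Usage with the example word from the comment above.
extracted = "لصيف"
print(fix_reversed_words(extracted))
```

Splitting on a single space mirrors the `txt.split(" ")` call above, so runs of spaces are preserved as empty words.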
Can you contact me?
This is the correct code for the bug:
PdfFileReader(open('/file.pdf', mode="rb"))
I think this is just a wrong usage of PyPDF2.

Wrong:
reader = PdfFileReader(open("example.pdf"))

Correct - Option 1:
reader = PdfFileReader("example.pdf")  # just pass the path

Correct - Option 2:
fh = open("example.pdf", "rb")  # mind the "rb"!
reader = PdfFileReader(fh)
# do your stuff
fh.close()
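When opening a PDF file in Python 3.7, I get the following error:
The reason is that in Python 3, files opened in text mode don't support nonzero seeks relative to the end of the file (negative seeks), which PyPDF2 uses to locate the end of the PDF. To fix the issue, I had to open the file in binary mode ('rb'):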
This may be worth noting in the documentation.
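A minimal standard-library sketch of the underlying error, assuming the reader finds the end of the file with an end-relative seek such as `seek(-1, 2)` (the file name `sample.pdf` and its contents are hypothetical): text mode raises `io.UnsupportedOperation`, while binary mode performs the same seek without complaint.

```python
import io
import os
import tempfile

# A tiny hypothetical file ending with the PDF end-of-file marker.
path = os.path.join(tempfile.mkdtemp(), "sample.pdf")
with open(path, "wb") as f:
    f.write(b"%PDF-1.4\n...\n%%EOF\n")

# Text mode: nonzero seeks relative to the end (whence=SEEK_END) are rejected.
text_mode_error = None
with open(path, "r") as f:
    try:
        f.seek(-1, io.SEEK_END)
    except io.UnsupportedOperation as exc:
        text_mode_error = exc

# Binary mode: the same negative seek works and lands on the last byte.
with open(path, "rb") as f:
    f.seek(-1, io.SEEK_END)
    last_byte = f.read(1)

print("text mode:", text_mode_error)
print("binary mode last byte:", last_byte)
```

Text-mode files only accept opaque positions returned by `tell()` (or end-relative offsets of zero), because decoding makes byte offsets ambiguous; binary files have no such restriction.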