Python 3.3 support #4

Closed
wants to merge 4 commits into
from

cedeon commented Feb 18, 2013

Hi,
I ported this over to Python 3. I have tested for backwards compatibility with Python 2 and all seems well, although I haven't tested it extensively, so it may need some more investigation.

I chose to use importlib for 'relative' style imports. Forgive my naivety if this was a bad choice; it's the first time I've run into the differences between Python 2 and Python 3 module import handling.
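
For readers unfamiliar with the approach, here is a minimal sketch of an importlib-based relative import on Python 3. The package `demo_pkg` and its `filters` module are hypothetical and are created in a temporary directory purely for the demonstration; this is not the code from the pull request.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk (hypothetical names, not PyPDF2's layout).
pkg_root = tempfile.mkdtemp()
pkg_dir = os.path.join(pkg_root, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "filters.py"), "w") as f:
    f.write("def decode(data):\n    return data.upper()\n")

# Make the new package importable and refresh the import system's caches.
sys.path.insert(0, pkg_root)
importlib.invalidate_caches()

# A leading dot makes the module name relative to the `package` argument.
filters = importlib.import_module(".filters", package="demo_pkg")
result = filters.decode("abc")
print(result)
```

Inside a real package, the explicit relative form `from . import filters` achieves the same binding and is valid on both Python 2.7 and Python 3.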

@cedeon cedeon commented on the diff Feb 18, 2013

PyPDF2/generic.py
@@ -344,8 +349,14 @@ def readStringFromStream(stream):
# Then don't add anything to the actual string, since this
# line break was escaped:
tok = b_('')
+ elif tok == b_(" "):
cedeon Feb 18, 2013

Sorry, I nearly forgot about this addition. This one is a totally blind duck-punch that I added because one of my PDF files was falling through this code all the way to the exception. I haven't read the PDF spec, so forgive my ignorance. My PDF worked after the change, so that's all I cared about at the time. If I made a bad call and you want me to remove and rebase this, let me know.

TWAC commented Jun 19, 2013

The py3-3Fix branch causes this code to fail on both 2.7 and 3.3.
With master, it works on 2.7 but not on 3.3.

from PyPDF2 import PdfFileReader
pdf = PdfFileReader(open("test.pdf", "rb"))
print(pdf.getPage(0).extractText())
PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will not be corrected. [pdf.py:990]
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    print(pdf.getPage(0).extractText())
  File "C:\Users\TWAC\python\PyPDF2\PyPDF2\pdf.py", line 1711, in extractText
    content = ContentStream(content, self.pdf)
  File "C:\Users\TWAC\python\PyPDF2\PyPDF2\pdf.py", line 1793, in __init__
    stream = StringIO(stream.getData())
  File "C:\Users\TWAC\python\PyPDF2\PyPDF2\generic.py", line 818, in getData
    decoded._data = filters.decodeStreamData(self)
NameError: global name 'filters' is not defined
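
The traceback says that no module-level name `filters` was bound in generic.py when `getData` ran. The thread does not show why, so the following is only an illustration of one general way this kind of error can arise: an import performed inside one function binds the name locally and leaves it invisible to other functions. The function names are hypothetical, and `zlib` simply stands in for the missing module.

```python
# Illustration only: hypothetical functions, not PyPDF2 code.
def load_helpers():
    # This import binds `zlib` as a local name of load_helpers only.
    import zlib
    return zlib

def get_data():
    # `zlib` was never bound at module level, so this lookup raises NameError.
    return zlib.crc32(b"x")

load_helpers()
try:
    get_data()
    error_message = None
except NameError as exc:
    error_message = str(exc)
print(error_message)
```

Binding the name at module level, for example with an import at the top of the file, makes it visible to every function in that module.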

As a side note, perhaps the bugtracker could be enabled, now that this is the officially blessed fork of pyPdf?

Collaborator

claird commented Aug 15, 2013

Cross-version support does matter to us. Our current estimate is that PyPDF2 should be good with 3.3 by September--maybe sooner!

Owner

mstamy2 commented Jan 8, 2014

Closing this since the branches were merged.

@mstamy2 mstamy2 closed this Jan 8, 2014
