python3 - TypeError: ord() expected string of length 1, but int found #254

LeoFCardoso · 2016-03-27T14:41:25Z

I am getting this error when using python3 and this simple code:

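import sys
from PyPDF2 import PdfFileReader, PdfFileWriter

imagepdf = PdfFileReader(open(sys.argv[1], 'rb'), strict=False)
textpdf = PdfFileReader(open(sys.argv[2], 'rb'), strict=False)
output = PdfFileWriter()  # output was not defined in the original snippet
for i in range(imagepdf.getNumPages()):
    imagepage = imagepdf.getPage(i)
    textpage = textpdf.getPage(i)
    # scale the image page to the size of the matching text page
    factor_x = textpage.mediaBox.upperRight[0] / imagepage.mediaBox.upperRight[0]
    factor_y = textpage.mediaBox.upperRight[1] / imagepage.mediaBox.upperRight[1]
    imagepage.scale(float(factor_x), float(factor_y))
    textpage.mergePage(imagepage)  # imagepage stays on top
    textpage.compressContentStreams()
    output.addPage(textpage)

with open(sys.argv[3], 'wb') as f:  # third argument is the result file
    output.write(f)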

Trace:

Traceback (most recent call last):
  File "...", line 34, in <module>
    imagepage.scale(float(factor_x), float(factor_y))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/PyPDF2/pdf.py", line 2493, in scale
    0, 0])
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/PyPDF2/pdf.py", line 2479, in addTransformation
    originalContent, self.pdf, ctm)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/PyPDF2/pdf.py", line 2180, in _addTransformationMatrix
    contents = ContentStream(contents, pdf)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/PyPDF2/pdf.py", line 2641, in __init__
    data += s.getObject().getData()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/PyPDF2/generic.py", line 837, in getData
    decoded._data = filters.decodeStreamData(self)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/PyPDF2/filters.py", line 350, in decodeStreamData
    data = LZWDecode.decode(data, stream.get("/DecodeParms"))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/PyPDF2/filters.py", line 255, in decode
    return LZWDecode.decoder(data).decode()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/PyPDF2/filters.py", line 228, in decode
    cW = self.nextCode();
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/PyPDF2/filters.py", line 205, in nextCode
    nextbits=ord(self.data[self.bytepos])
TypeError: ord() expected string of length 1, but int found

Am I doing something wrong?

mstamy2 · 2016-05-19T19:13:29Z

Any chance you can post/send me the PDF(s) you're working with?

Most likely this is a Python 3 type-handling issue in the LZW decoding algorithm, in which case it is easily fixable.
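For context, here is a minimal reproduction of the suspected Python 3 type issue: indexing a `bytes` object returns an `int` in Python 3 (in Python 2 it returned a one-character `str`), so calling `ord()` on it fails with exactly the message in the traceback. The helper `byte_at` is hypothetical, not part of PyPDF2; it is only a sketch of one way to read a byte value that works for both `bytes` and `str` input.

```python
# In Python 3, indexing bytes yields an int, so ord() on it raises TypeError.
data = b"\x80\x0b\x60"

try:
    ord(data[0])
except TypeError as exc:
    message = str(exc)  # "ord() expected string of length 1, but int found"


def byte_at(buf, pos):
    """Return the integer value of buf[pos] for bytes or str input.

    Hypothetical helper for illustration; not a PyPDF2 function.
    """
    value = buf[pos]
    # bytes indexing already gives an int; str indexing gives a 1-char str
    return value if isinstance(value, int) else ord(value)


print(byte_at(data, 0))        # 128
print(byte_at("\x80\x0b", 1))  # 11
```

Checking the type before calling `ord()` keeps the same code path working on both Python versions; on Python 3 alone, `data[pos]` can simply be used directly.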

LeoFCardoso · 2016-05-21T13:25:06Z

My script is
https://github.com/LeoFCardoso/pdf2pdfocr/blob/master/pdf2pdfocr_multibackground.py

Here is the call:

python3.4 pdf2pdfocr_multibackground.py first.pdf second.pdf result.pdf

second.pdf
first.pdf

Thanks!

mstamy2 · 2016-05-23T19:47:27Z

5bbd5af should take care of these type issues
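The failing line in the traceback is in `nextCode`, which pulls variable-width codes out of the compressed byte string. The contents of 5bbd5af are not reproduced in this thread, so the following is only a sketch of how an MSB-first code reader can work directly on Python 3 `bytes`, where `data[pos]` is already an integer and no `ord()` call is needed. `CodeReader` and `pack_codes` are hypothetical names; the usage reads 9-bit codes, the initial code width in PDF's LZWDecode, where 256 is the clear-table code and 257 is end-of-data.

```python
class CodeReader:
    """Read fixed-width codes, most significant bit first, from a bytes object.

    Hypothetical illustration of the bit-reading step; not PyPDF2's API.
    """

    def __init__(self, data: bytes):
        self.data = data
        self.bitpos = 0  # absolute bit offset into data

    def next_code(self, width: int):
        """Return the next `width`-bit code, or None if the data is exhausted."""
        if self.bitpos + width > len(self.data) * 8:
            return None
        code = 0
        for _ in range(width):
            byte = self.data[self.bitpos // 8]  # already an int in Python 3
            bit = (byte >> (7 - self.bitpos % 8)) & 1
            code = (code << 1) | bit
            self.bitpos += 1
        return code


def pack_codes(codes, width):
    """Pack codes MSB-first into bytes, zero-padding the final byte."""
    bits = "".join(format(c, f"0{width}b") for c in codes)
    bits += "0" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# 256 and 257 are the clear-table and end-of-data codes in PDF LZW.
packed = pack_codes([256, 65, 66, 257], 9)
reader = CodeReader(packed)
print([reader.next_code(9) for _ in range(4)])  # [256, 65, 66, 257]
```

Reading one bit at a time is slow but keeps the sketch easy to check; a real decoder would also grow the width from 9 up to 12 bits as its table fills.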

mstamy2 · 2016-05-24T15:56:16Z

Let me know of any further issues!

LeoFCardoso · 2016-05-25T01:39:25Z

It works! Thanks!

LeoFCardoso · 2016-11-18T00:52:43Z

When will a new version be available at https://pypi.python.org/pypi/PyPDF2?
Thanks!

jguram · 2017-11-15T15:25:18Z

199709222.pdf
I am getting the same issue. Can you please help me? @mstamy2

Here is my code:

import os

import PyPDF2
import xlsxwriter

# Read the search words, one per line; strip the trailing newline,
# otherwise pdf_content.count(word) looks for "word\n" and rarely matches.
search_words = []
os.chdir(r'C:\Users\jui\Documents\Parabole')
with open('allwords.txt') as f:
    for line in f:
        word = line.strip()
        if word:
            search_words.append(word)
print(len(search_words))

# Raw string: in 'C:\data\pdfContainer\test' the '\t' would become a tab.
p1 = r'C:\data\pdfContainer\test'
file_name = []
for filename in os.listdir(p1):
    file_name.append(filename.lstrip())
print(file_name)

os.chdir(p1)
for i in file_name:
    print(i)
    # Open the current file i (the original opened the leftover `filename`)
    with open(i, 'rb') as pdf_file:
        print('opening pdf file')
        read_pdf = PyPDF2.PdfFileReader(pdf_file)
        print(read_pdf.isEncrypted)

        number_of_pages = read_pdf.getNumPages()
        print(number_of_pages)
        pdf_content = ' '
        for j in range(number_of_pages):  # range() advances j; no manual increment
            page = read_pdf.getPage(j)
            print(page)
            page_content = page.extractText()
            print(page_content)
            pdf_content = pdf_content + page_content
    #print(pdf_content)

    new_dict = {}
    for word in search_words:
        new_dict[word] = pdf_content.count(word)
    #print(new_dict)

    # One workbook per PDF, named after the PDF
    p = i + '.xlsx'
    print(p)
    workbook = xlsxwriter.Workbook(p)
    worksheet = workbook.add_worksheet()
    row = 0
    col = 0
    for key in new_dict.keys():
        row += 1  # once per word; the original incremented twice and skipped rows
        worksheet.write(row, col, key)
        worksheet.write(row, col + 1, new_dict[key])

    workbook.close()

allwords.txt

jguram · 2017-11-15T15:30:26Z

@mstamy2 I want to count how often each word in allwords.txt occurs in the attached PDF file and write the counts to an Excel file.
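One thing to watch with this approach: `str.count` counts substrings, so a search word like `art` is also counted inside `part`. If whole-word counts are wanted, a regular expression with word boundaries is one option. This is a sketch, and `count_whole_words` is a hypothetical helper, not part of PyPDF2 or xlsxwriter; its result is a plain dict that could replace `new_dict` in the script.

```python
import re


def count_whole_words(text, words, ignore_case=True):
    """Count whole-word occurrences of each search word in text (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    counts = {}
    for word in words:
        word = word.strip()  # lines read from a file keep their trailing "\n"
        if not word:
            continue
        # word boundaries on both sides, so "art" does not match inside "part"
        pattern = r"\b" + re.escape(word) + r"\b"
        counts[word] = len(re.findall(pattern, text, flags))
    return counts


text = "Art is part of the art market. ART!"
print(count_whole_words(text, ["art\n", "market", "part"]))
# {'art': 3, 'market': 1, 'part': 1}
```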

rameessahlu · 2019-02-05T19:45:47Z

"5bbd5af should take care of these type issues"

These fixes are still not merged.

Jeff-Winchell · 2019-04-19T23:41:15Z

I have multiple files that trigger this error.
I am currently running a loop over 177,000 PDF files, and it blows up with this error in the extractText() function. Yes, I am running Python 3, which was shipped A DECADE AGO.

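from PyPDF2 import PdfFileReader

# Files (presumably a pandas DataFrame), tokenizer and stopWords
# come from the rest of the script (not shown).
for index, row in Files.iterrows():
    filename = row.Filename
    pagenum = None  # so the except clause can report it even if opening fails
    try:
        pdffile = PdfFileReader(filename)
        for pagenum in range(pdffile.numPages):
            text = pdffile.getPage(pagenum).extractText()
            foo = [word.lower() for word in tokenizer.tokenize(text)
                   if word.lower() not in stopWords and not word.isdigit()]
    except Exception:  # narrower than a bare except, which also traps Ctrl-C
        print(index, filename, pagenum)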

The bug has been triggered 6 times in the first 1,255 files, so I'm guessing the error rate is about 0.5%.
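The estimate works out as follows. The projection to all 177,000 files is only a rough extrapolation that assumes the failure rate in the first 1,255 files is representative of the rest.

```python
failures = 6
checked = 1255
total_files = 177_000

rate = failures / checked
print(f"{rate:.2%}")              # 0.48%
print(round(rate * total_files))  # 846 files expected to fail, if the rate holds
```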

misilot · 2020-09-24T14:17:39Z

Can a release be made that includes this fix?

mstamy2 closed this as completed May 24, 2016
