Encoding issue: PyPDF2.utils.PdfReadError: Illegal character in Name Object #438

gitzjm · 2018-06-23T06:33:50Z

(Resolved) While adding a watermark to a PDF, I ran into the following error, which says my Name Object contains an illegal character:

Traceback (most recent call last):
  File "E:/test/水印/PDF水印.py", line 66, in <module>
    add_watermark("111111.pdf",r"F:\SVN代码\repository\back\ninstar_demo1\static\watermark\logo_watermark.pdf","output")
  File "E:/test/水印/PDF水印.py", line 61, in add_watermark
    pdf_output.write(output_stream)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
    retval = readObject(self.stream, self)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\PyPDF2\generic.py", line 579, in readFromStream
    value = readObject(stream, pdf)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\PyPDF2\generic.py", line 60, in readObject
    return NameObject.readFromStream(stream, pdf)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\PyPDF2\generic.py", line 492, in readFromStream
    raise utils.PdfReadError("Illegal character in Name Object")
PyPDF2.utils.PdfReadError: Illegal character in Name Object

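From the code I could see that the file streams had already been merged, so in theory the watermark had been applied, but the exception was thrown while writing to the output file.
I traced the exception to line 484 of generic.py:
return NameObject(name.decode('utf-8'))
Because my PDF is in Chinese, I suspected an encoding problem, so I changed utf-8 to GBK,
but then a different exception appeared:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 8-9: ordinal not in range(256)
I found that this exception comes from line 238 of utils:
r = s.encode('latin-1')
Another encoding problem. At first I replaced latin-1 with utf-8. The file could then be written, but the text layout was garbled and much of the text was missing. I guessed that the PDF contains text in different encodings, so I changed the code at that point to: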

            try:
                r = s.encode('latin-1')
                if len(s) < 2:
                    bc[s] = r
                return r
            except Exception as e:
                print(s)
                r = s.encode('utf-8')
                if len(s) < 2:
                    bc[s] = r
                return r

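That solved the problem, but I suspect other similar exceptions will still occur. I hope the maintainers will look into compatibility with the different character encodings used in PDFs.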
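The two failures described above can be reproduced with plain Python, without PyPDF2. This is a minimal sketch: the name `/宋体` is a hypothetical example (the actual name in the failing PDF is not known), and `try_decode` / `try_encode` are illustrative helpers, not PyPDF2 functions.

```python
# Hypothetical name containing Chinese characters; the real one is unknown.
name = "/宋体"
raw = name.encode("gbk")  # bytes as a GBK-based producer might have written them


def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


def try_encode(text: str, encoding: str):
    """Return the encoded bytes, or None if the text cannot be represented."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


utf8_result = try_decode(raw, "utf-8")       # GBK bytes are not valid UTF-8 here
gbk_result = try_decode(raw, "gbk")          # round-trips back to the original name
latin1_result = try_encode(name, "latin-1")  # CJK characters are outside latin-1

print(utf8_result, gbk_result, latin1_result)
```

The first call corresponds to the decode step in generic.py and the last to the encode step in utils, which is why changing only one of the two encodings moved the error rather than removing it.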


zhangsanfu · 2018-08-04T04:37:47Z

the same problem

brchiu · 2018-09-28T01:21:36Z

I have the same problem and your fix works for me.
Could you send a pull request to the author?

gitzjm · 2018-10-05T04:49:53Z

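I have the same problem and your fix works for me.
Could you send a pull request to the author?

Hello to my friend in Taiwan:
I have submitted a pull request. The specific fix is as follows.
In generic.py, replace the code on line 486:
return NameObject(name.decode('utf-8'))
with: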

        try:
            ret=name.decode('utf-8')
        except (UnicodeEncodeError, UnicodeDecodeError) as e:
            ret=name.decode('gbk')
        return NameObject(ret)

and replace lines 238-241 of utils.py:

            r = s.encode('latin-1')
            if len(s) < 2:
                bc[s] = r
            return r

with:

            try:
                r = s.encode('latin-1')
                if len(s) < 2:
                    bc[s] = r
                return r
            except Exception as e:
                print(s)
                r = s.encode('utf-8')
                if len(s) < 2:
                    bc[s] = r
                return r

That is all.

eagleoflqj · 2018-11-23T09:00:35Z

try:
    r = s.encode('latin-1')
except UnicodeEncodeError:
    # s contains characters outside latin-1, so fall back to UTF-8
    r = s.encode('utf-8')
if len(s) < 2:
    bc[s] = r
return r
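One caveat with this kind of fallback, shown here as a standalone sketch rather than as a statement about how PyPDF2 behaves in practice: once latin-1 and UTF-8 output are mixed, the resulting bytes no longer say which encoding produced them. `encode_with_fallback` is a hypothetical name for the logic above without the cache.

```python
def encode_with_fallback(s: str) -> bytes:
    """Same logic as the snippet above, minus the cache."""
    try:
        return s.encode("latin-1")
    except UnicodeEncodeError:
        return s.encode("utf-8")


# Three latin-1 characters (a-umlaut, cedilla, soft hyphen) ...
latin_bytes = encode_with_fallback("\u00e4\u00b8\u00ad")
# ... and one CJK character that needs the UTF-8 fallback.
cjk_bytes = encode_with_fallback("\u4e2d")

# Both produce the same three bytes, so a reader of the output
# cannot tell the two inputs apart.
print(latin_bytes == cjk_bytes)
```

So the fallback avoids the exception, but whether the written text displays correctly still depends on how the consumer interprets those bytes.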

zuiyuewentian · 2021-06-18T09:04:38Z

I ran into the same problem, so I rebuilt the package and published it here:
https://github.com/zuiyuewentian/PyPDF2/releases/tag/v1.26.1

MartinThoma · 2022-07-29T06:44:44Z

Do you still get the same issue with the latest PyPDF2?

Can somebody share a pdf that causes it?

MartinThoma · 2022-08-06T06:29:33Z

I'm closing this issue now as it might have been solved with the latest improvements. Please let me know if it wasn't solved by the latest PyPDF2 version.

Also, please share a PDF which causes issues!

michelle-chou25 · 2023-04-24T07:42:22Z

I met the same problem again, same as the author

zuiyuewentian · 2023-04-24T08:43:15Z

You can see my fork of the project.


pubpub-zz · 2023-04-24T21:15:58Z

I met the same problem again, same as the author

@michelle-chou25
Without the PDF and the code we cannot do a complete analysis. Please open a new issue and provide the data.

lwdsw · 2023-10-30T11:22:41Z

Do you still get the same issue with the latest PyPDF2?

Can somebody share a pdf that causes it?

Here, use mine:
工艺流程图.pdf

stefan6419846 · 2023-10-30T11:27:01Z

@lwdsw Please open a new issue for it with your code and the PDF file as well as an English description. Note that PyPDF2 is deprecated and should be migrated to pypdf.

FFengIll mentioned this issue Mar 18, 2019

UnicodeEncodeError: 'latin-1' codec can't encode characters FFengIll/pdf-cut-white#1

Closed

cbbing added a commit to cbbing/PyPDF2 that referenced this issue Oct 30, 2019

Adjusted according to py-pdf#438

f527e22

cbbing added a commit to cbbing/PyPDF2 that referenced this issue Oct 30, 2019

Adjusted according to py-pdf#438

ab60c0f

Vimos mentioned this issue Nov 21, 2019

encoding error #260

Closed

MartinThoma added the is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF label Apr 8, 2022

MartinThoma changed the title ~~编码问题: PyPDF2.utils.PdfReadError: Illegal character in Name Object~~ Encoding issue: PyPDF2.utils.PdfReadError: Illegal character in Name Object Jun 27, 2022

MartinThoma added the needs-pdf The issue needs a PDF file to show the problem label Jul 29, 2022

MartinThoma closed this as completed Aug 6, 2022
