"KeyError: 0" when merging PDF page that has content-stream-inline images #196

yourcelf · 2015-04-25T17:22:20Z

Attempting to merge in a PDF page which has an image stored inline in the content stream raises an error. Here's an example which generates such a PDF with reportlab:

import PyPDF2
from reportlab.pdfgen import canvas

pdf1 = PyPDF2.PdfFileReader(open("test.pdf", 'rb'))

c = canvas.Canvas("watermark.pdf")
c.drawInlineImage("watermark.png", 200, 300, 100, 100)
c.showPage()
c.save()

watermark = PyPDF2.PdfFileReader(open("watermark.pdf", 'rb'))
pdf1.getPage(0).mergePage(watermark.getPage(0))

Error:

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    pdf1.getPage(0).mergePage(watermark.getPage(0))
  File "/venv/local/lib/python2.7/site-packages/PyPDF2/pdf.py", line 2013, in mergePage
    self._mergePage(page2)
  File "/venv/local/lib/python2.7/site-packages/PyPDF2/pdf.py", line 2058, in _mergePage
    page2Content, rename, self.pdf)
  File "/venv/local/lib/python2.7/site-packages/PyPDF2/pdf.py", line 1963, in _contentStreamRename
    op = operands[i]
KeyError: 0

It looks like _contentStreamRename doesn't expect to see a data object in the content stream.
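
To illustrate the failure mode, here is a minimal sketch that uses plain Python stand-ins rather than PyPDF2 itself. It assumes, as the traceback suggests, that the parsed content stream is a sequence of (operands, operator) pairs and that an inline image arrives as a dict of operands instead of a list. The key names "settings" and "data" and the helper first_failure are illustrative assumptions, not confirmed library API.

```python
# Stand-in for a parsed content stream: a list of (operands, operator) pairs.
# Assumption: an inline image is represented by a dict of operands, not a list.
operations = [
    (["/Im0"], "Do"),
    ({"settings": {"/W": 1, "/H": 1}, "data": b"\x00"}, "INLINE IMAGE"),
]


def first_failure(ops):
    """Index every operand collection by position, like the loop in the traceback.

    Returns (operator, missing_key) for the first KeyError, or None if all
    operand collections can be indexed positionally.
    """
    for operands, operator in ops:
        for i in range(len(operands)):
            try:
                operands[i]
            except KeyError as exc:
                return operator, exc.args[0]
    return None


result = first_failure(operations)
print(result)
```

The list operands index cleanly; the dict has a length of 2 but no key 0, so the positional lookup is what raises KeyError: 0.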

It's easy to work around this by replacing canvas.drawInlineImage with canvas.drawImage -- but the inline variant produces a valid PDF that may occur in the wild.

schurlix · 2016-01-19T12:01:18Z

Hi there, it seems I have a similar problem, and it also occurs
in _contentStreamRename in pdf.py with a KeyError. My script
takes a bunch of input files and always places 6 input pages on
one page of the output file. There is no problem if I use only
one input file with many pages, but more than one input file
throws the error below.

In the case of exactly one input file, operands in
_contentStreamRename is a list and there is no problem. In
the case of more than one input file, operands is a dict, and
my patch iterates over the entries of the dict.

At the very bottom of my post you'll find the patch that fixed
the problem for me, but I am really not sure whether it breaks
other things. Anyway, here you go:

the call and the traceback:

$ python ../../bin/mypdf.py C150334053445EUR2015* urxn.pdf
Traceback (most recent call last):
  File "../../bin/mypdf.py", line 42, in <module>
    out.schurlimerge (page)
  File "../../bin/mypdf.py", line 36, in schurlimerge
    self.newpage.mergeRotatedScaledTranslatedPage (page, 90, 2/3.0, offset_x,offset_y)
  File "/usr/local/lib/python2.7/dist-packages/PyPDF2/pdf.py", line 2462, in mergeRotatedScaledTranslatedPage
    ctm[2][0], ctm[2][1]], expand)
  File "/usr/local/lib/python2.7/dist-packages/PyPDF2/pdf.py", line 2299, in mergeTransformedPage
    PageObject._addTransformationMatrix(page2Content, page2.pdf, ctm), ctm, expand)
  File "/usr/local/lib/python2.7/dist-packages/PyPDF2/pdf.py", line 2255, in _mergePage
    page2Content, rename, self.pdf)
  File "/usr/local/lib/python2.7/dist-packages/PyPDF2/pdf.py", line 2160, in _contentStreamRename
    op = operands[i]
KeyError: 0

the script:

#!/usr/bin/env python

import PyPDF2
import sys 

ppmm = 2.83465
a4xmm = 210 
a4ymm = 297 
a4xp = a4xmm * ppmm
a4yp = a4ymm * ppmm

def pages (filenames):
   for filename in filenames:
      inpdf = PyPDF2.PdfFileReader(file(filename,"rb"))
      for i in range (inpdf.numPages):
         yield (inpdf.getPage (i))

class Writer:

   def __init__ (self, outfile):
      self.outfile = outfile
      self.curpagenum = 0 
      self.writer = PyPDF2.pdf.PdfFileWriter ()
      self.newpage = None

   def write (self):
      self.writer.write (file (self.outfile, "wb"))

   def schurlimerge (self, page):
      if self.curpagenum % 6 == 0:
         if self.newpage: self.newpage.update ()
         self.newpage = self.writer.addBlankPage(a4xp, a4yp)
      if self.curpagenum % 2 == 0: offset_y = 0 
      else: offset_y = a4yp / 2 
      offset_x = a4xp / 3.0 * ((self.curpagenum / 2) % 3) + a4xp / 3.0 
      self.newpage.mergeRotatedScaledTranslatedPage (page, 90, 2/3.0, offset_x,offset_y)
      self.curpagenum += 1

out = Writer (sys.argv [-1])

for page in pages (sys.argv [1:-1]):
   out.schurlimerge (page)

out.write ()

my patch:

Index: usr/local/lib/python2.7/dist-packages/PyPDF2/pdf.py
===================================================================
--- usr/local/lib/python2.7/dist-packages/PyPDF2/pdf.py (revision 2224)
+++ usr/local/lib/python2.7/dist-packages/PyPDF2/pdf.py (revision 2223)
@@ -43,6 +43,7 @@
 __maintainer_email = "PyPDF2@phaseit.net"

 import string
+import types
 import math
 import struct
 import sys
@@ -2156,10 +2157,18 @@
             return stream
         stream = ContentStream(stream, pdf)
         for operands, operator in stream.operations:
-            for i in range(len(operands)):
-                op = operands[i]
-                if isinstance(op, NameObject):
-                    operands[i] = rename.get(op,op)
+            if type (operands) == types.ListType:
+                for i in range(len(operands)):
+                    op = operands[i]
+                    if isinstance(op, NameObject):
+                        operands[i] = rename.get(op,op)
+            elif type (operands) == types.DictType:
+                for i in operands:
+                    op = operands[i]
+                    if isinstance(op, NameObject):
+                        operands[i] = rename.get(op,op)
+            else:
+                raise KeyError ("type of operands is %s" % type (operands))
         return stream
     _contentStreamRename = staticmethod(_contentStreamRename)
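
For readers on Python 3, where types.ListType and types.DictType no longer exist, the same idea can be sketched with isinstance checks. This is only an illustration under stated assumptions, not the code that was eventually merged: NameObject here is a hypothetical str subclass standing in for PyPDF2's class, rename_operands is an invented helper name, and the sample operations are made up to exercise both branches.

```python
class NameObject(str):
    """Hypothetical stand-in for PyPDF2's NameObject, used only in this sketch."""


def rename_operands(operations, rename):
    """Apply a rename mapping to NameObject operands held in lists or dicts."""
    for operands, _operator in operations:
        if isinstance(operands, list):
            keys = range(len(operands))
        elif isinstance(operands, dict):
            # Copy the keys first; values are replaced in place below.
            keys = list(operands)
        else:
            # The patch above raises KeyError here; TypeError is this sketch's choice.
            raise TypeError("unexpected operands type: %s" % type(operands).__name__)
        for key in keys:
            op = operands[key]
            if isinstance(op, NameObject):
                operands[key] = rename.get(op, op)
    return operations


# Made-up sample data: one list-style and one dict-style operand collection.
ops = [
    ([NameObject("/Im0"), 42], "Do"),
    ({"settings": NameObject("/Im0"), "data": b"\x00"}, "INLINE IMAGE"),
]
rename_operands(ops, {NameObject("/Im0"): NameObject("/Im0-renamed")})
print(ops)
```

Collecting the keys up front lets one loop body serve both container types, which avoids the duplicated branch bodies in the patch.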

josephernest · 2018-05-20T14:58:54Z

@mstamy2 Please include @schurlix's patch; it works and it solves an annoying problem ;)

At the time of writing (20180520_1659), pip install pypdf2 on Python 2.7 (64-bit) didn't include it.

MartinThoma added the is-bug (From a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF) and PdfReader (The PdfReader component is affected) labels Apr 7, 2022

py-pdf deleted a comment from claird Apr 7, 2022

MartinThoma added the PdfMerger (The PdfMerger component is affected) label and removed the PdfReader (The PdfReader component is affected) label Apr 7, 2022

MartinThoma added a commit that referenced this issue Apr 7, 2022

BUG: Stream operations can be List or Dict

73688e0

Appeared when merging PDFs that have content-stream-inline images This patch was provided by Georg Graf : #196 (comment) Thank you! Closes #196

MartinThoma mentioned this issue Apr 7, 2022

BUG: Stream operations can be List or Dict #665

Merged

MartinThoma closed this as completed in #665 Apr 7, 2022

MartinThoma added a commit that referenced this issue Apr 7, 2022

BUG: Stream operations can be List or Dict (#665)

3eadff0

MartinThoma added the key-error (Could be a bug, but also a robustness issue) label Aug 19, 2023
