Image causes integer overflow in docx export #6880

Closed
alecgibson opened this issue Nov 23, 2020 · 6 comments

Comments

@alecgibson

I've got an image that, for some reason, causes an invalid <wp:extent cy="..."> value in the docx output. See the attached image, input JSON (remember to update the path to point to your image download location), and output docx.

The resulting file fails to open in MS Word. Running it through the Open XML SDK tool reports an error on wp:extent cy: it's "not a valid 'Int64' value". Inspecting the attribute shows a very, very large negative number (potentially integer overflow?).

Pandoc version

❯ pandoc -v
pandoc 2.11.0.4
Compiled with pandoc-types 1.22, texmath 0.12.0.3, skylighting 0.10.0.3,
citeproc 0.1.0.3, ipynb 0.1

test.json

{
  "blocks": [
    {
      "t": "Para",
      "c": [
        {
          "t": "Image",
          "c": [
            [
              "",
              [],
              []
            ],
            [],
            [
              "/path/to/bad.jpg",
              ""
            ]
          ]
        }
      ]
    }
  ],
  "meta": {},
  "pandoc-api-version": [
    1,
    22
  ]
}

Command

pandoc --from=json --to=docx --output=out.docx test.json

bad.jpg

Output

out.docx

@jgm
Owner

jgm commented Nov 23, 2020

Sure enough, I get

<wp:extent cx="5334000" cy="-269653970229347386159395778618353710042696546841345985910145121736599013708251444699062715983611304031680170819807090036488184653221624933739271145959211186566651840137298227914453329401869141179179624428127508653257226023513694322210869665811240855745025766026879447359920868907719574457253034494436336205824" />

@jgm
Owner

jgm commented Nov 23, 2020

Relevant code:

        (xpt,ypt) = desiredSizeInPoints opts attr
               (either (const def) id (imageSize opts img))
        -- 12700 emu = 1 pt
        (xemu,yemu) = fitToPage (xpt * 12700, ypt * 12700)
                                (pageWidth * 12700)

yemu is the thing that is overflowing. Trace reveals that (xpt, ypt) is (Infinity, Infinity).
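
A minimal sketch of how an infinite size can turn into an enormous integer. `fitToPageSketch` is a hypothetical stand-in, not pandoc's actual `fitToPage`; it assumes the helper scales the height by `pageWidth / x` when the image is wider than the page. Under that assumption, an infinite width gives a scale factor of 0, 0 * Infinity is NaN, and GHC's `floor` on a non-finite `Double` returns a very large `Integer` instead of failing.

```haskell
-- Hypothetical stand-in for a fit-to-page helper; not pandoc's actual code.
-- All sizes are in EMU (12700 EMU = 1 pt).
fitToPageSketch :: (Double, Double) -> Integer -> (Integer, Integer)
fitToPageSketch (x, y) pageWidth
  -- Too wide: pin the width to the page and scale the height to match.
  | x > fromIntegral pageWidth =
      (pageWidth, floor ((fromIntegral pageWidth / x) * y))
  | otherwise = (floor x, floor y)

-- The size in points reported by the trace above.
infinity :: Double
infinity = 1 / 0

-- pageWidth / Infinity = 0 and 0 * Infinity = NaN, so the height is floor NaN.
-- 5334000 is the cx value from the generated XML, used here as the page width.
badExtent :: (Integer, Integer)
badExtent = fitToPageSketch (infinity * 12700, infinity * 12700) 5334000
```

Here `fst badExtent` is the page width, while `snd badExtent` is far outside the Int64 range, which matches the shape of the `wp:extent` element in the broken file.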

@jgm
Owner

jgm commented Nov 23, 2020

I see the issue. Our imageSize function returns the following: ImageSize {pxX = 960, pxY = 612, dpiX = 0, dpiY = 0}. The dpi == 0 causes an infinite size in points to be calculated. Still investigating why. macOS Preview says the image has dpi 72.
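
To illustrate why a zero dpi produces an infinite size, here is a small sketch of a pixels-to-points conversion. `ImageSizeSketch` and `sizeInPointsSketch` are hypothetical names that only mirror the fields printed above; the formula (pixels * 72 / dpi) is the usual one but is an assumption about what pandoc computes.

```haskell
-- Hypothetical record mirroring the fields shown above; not pandoc's type.
data ImageSizeSketch = ImageSizeSketch
  { pxX, pxY, dpiX, dpiY :: Integer }

-- Pixels to points, assuming 72 points per inch.
sizeInPointsSketch :: ImageSizeSketch -> (Double, Double)
sizeInPointsSketch s = (toPt (pxX s) (dpiX s), toPt (pxY s) (dpiY s))
  where
    -- Floating-point division by zero yields Infinity rather than an error.
    toPt px dpi = fromIntegral px * 72 / fromIntegral dpi

-- The values reported for bad.jpg.
badSize :: (Double, Double)
badSize = sizeInPointsSketch (ImageSizeSketch 960 612 0 0)
```

Both components of `badSize` satisfy `isInfinite`, because dividing a positive `Double` by zero does not raise an exception in Haskell.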

@jgm
Owner

jgm commented Nov 23, 2020

JFIF header:

typedef struct _JFIFHeader
{
  BYTE SOI[2];          /* 00h  Start of Image Marker     */
  BYTE APP0[2];         /* 02h  Application Use Marker    */
  BYTE Length[2];       /* 04h  Length of APP0 Field      */
  BYTE Identifier[5];   /* 06h  "JFIF" (zero terminated) Id String */
  BYTE Version[2];      /* 0Bh  JFIF Format Revision      */
  BYTE Units;           /* 0Dh  Units used for Resolution */
  BYTE Xdensity[2];     /* 0Eh  Horizontal Resolution     */
  BYTE Ydensity[2];     /* 10h  Vertical Resolution       */
  BYTE XThumbnail;      /* 12h  Thumbnail Horizontal Pixel Count */
  BYTE YThumbnail;      /* 13h  Thumbnail Vertical Pixel Count   */
} JFIFHEAD;

bad.jpg begins

0000000 ff d8 ff e0 00 10 4a 46 49 46 00 01 01 01 00 00
0000010 00 00 00 00 ff db 00 43 00 05 03 04 04 04 03 05

You can see in the first line the JFIF id string is 4A 46 49 46 00.
After that we have the JFIF version (01 01 = version 1.01).
Then the units = 01 (dots per inch).
Then the horizontal resolution (00 00 = 0).
Then the vertical resolution (00 00 = 0).
Then the thumbnail's horizontal pixel count (00 = 0).
Then the thumbnail's vertical pixel count (00 = 0).
Well, we can see why the dpi is 0.
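
The byte-by-byte reading above can be checked with a small parser. This is only a sketch: `JfifDensity`, `be16` and `parseJfifDensity` are hypothetical names, the parser handles just the case where the APP0 segment directly follows the start-of-image marker, and it is not how pandoc reads image metadata.

```haskell
import Data.Word (Word8)

-- Version and density fields of a JFIF APP0 segment.
data JfifDensity = JfifDensity
  { jfifVersion  :: (Word8, Word8) -- (major, minor)
  , jfifUnits    :: Word8          -- 0 = aspect ratio only, 1 = dots/inch, 2 = dots/cm
  , jfifXDensity :: Int
  , jfifYDensity :: Int
  } deriving (Eq, Show)

-- Combine two big-endian bytes into one number.
be16 :: Word8 -> Word8 -> Int
be16 hi lo = fromIntegral hi * 256 + fromIntegral lo

-- Succeeds only when the data starts with SOI, APP0 and the "JFIF\0" id string.
parseJfifDensity :: [Word8] -> Maybe JfifDensity
parseJfifDensity (0xff:0xd8:0xff:0xe0:_:_:0x4a:0x46:0x49:0x46:0x00
                  :major:minor:units:xh:xl:yh:yl:_) =
  Just (JfifDensity (major, minor) units (be16 xh xl) (be16 yh yl))
parseJfifDensity _ = Nothing

-- First 20 bytes of bad.jpg, copied from the dump above.
badHeader :: [Word8]
badHeader =
  [ 0xff, 0xd8, 0xff, 0xe0, 0x00, 0x10, 0x4a, 0x46, 0x49, 0x46, 0x00
  , 0x01, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 ]
```

Applied to `badHeader`, the parser returns version (1, 1), units 1, and both densities 0.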

@jgm
Owner

jgm commented Nov 23, 2020

The spec for JFIF says that x density and y density should always be nonzero, so I think this is a malformed jpg.
Nonetheless, pandoc should not blow up -- I will switch to defaulting to 72 dpi when a 0 value is given.
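
A sketch of what such a fallback could look like. `dpiOrDefault` and `pixelsToPoints` are hypothetical helpers written for illustration; this is not the actual pandoc patch.

```haskell
-- Hypothetical helper: substitute 72 dpi when the file reports a density of 0.
dpiOrDefault :: Integer -> Integer
dpiOrDefault dpi
  | dpi == 0  = 72
  | otherwise = dpi

-- Pixels to points (72 points per inch), using the guarded density.
pixelsToPoints :: Integer -> Integer -> Double
pixelsToPoints px dpi = fromIntegral px * 72 / fromIntegral (dpiOrDefault dpi)
```

With this guard, `pixelsToPoints 960 0` is 960.0 instead of Infinity, and a nonzero density is used unchanged, so `pixelsToPoints 600 300` is 144.0.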

@alecgibson
Author

👏 Rapid work! Thanks so much for looking into this!
