If Markdown input contains image, then \includepdf{some.pdf} output scales down far too much. #2856

Closed
KurtPfeifle opened this issue Apr 16, 2016 · 10 comments

Comments

@KurtPfeifle

KurtPfeifle commented Apr 16, 2016

Steps to reproduce:

  1. Select a simple, 1-page PDF from your stock for testing.

    If you don't have one handy, use this Ghostscript command to create one:

    gs -o mwe-in.pdf              \
        -sDEVICE=pdfwrite         \
        -g5000x6000               \
        -c "1 0.7 0.7 setrgbcolor \
            0 0 500 600 rectfill  \
            0 setgray             \
            /Helvetica findfont 104 scalefont setfont \
            30 390 moveto         \
            (This is a) show      \
            /Helvetica findfont 160 scalefont setfont \
            80 200 moveto         \
            (PDF) show            \
            showpage"
    

    This should have created a PDF, mwe-in.pdf.

  2. Select a simple PNG or JPEG image from your stock for testing.

    If you don't have one handy, use this ImageMagick command to create one:

    convert                             \
           -size 500x471 xc:"#ff000080" \
           -background '#ff000080' -font helvetica -size 500 -fill black label:" IMAGE " \
           -append mwe-in.png
    

    This should have created a PNG image, mwe-in.png.

  3. Create a first simple Markdown file, mwe1.md, with this content:

    ![](mwe-in.png)
    

    You can achieve this by running this command in a terminal:

    echo -e '\n\n![](mwe-in.png)\n\n' > mwe1.md
    
  4. Create a second simple Markdown file, mwe2.md with this content:

    \includepdf{mwe-in.pdf}
    

    You can achieve this by running this command in a terminal:

    echo -e '\n\n\includepdf{mwe-in.pdf}\n\n' > mwe2.md
    
  5. Last, run these commands to create different PDF files from the two Markdown input files:

    pandoc        \
          mwe1.md \
         -o mwe-image-only.pdf
    
    pandoc        \
          mwe2.md \
         -V header-includes="\usepackage{pdfpages}" \
         -o mwe-pdfinclude-only.pdf 
    
    pandoc                \
          mwe1.md mwe2.md \
         -V header-includes="\usepackage{pdfpages}" \
         -o mwe-image-plus-pdfinclude-is-buggy.pdf
    

Results:

  1. Output mwe-image-only.pdf is a one page PDF showing the image mwe-in.png.
    This output is as expected.
  2. Output mwe-pdfinclude-only.pdf is a one page PDF which included the PDF mwe-in.pdf as a page.
    This output is as expected.
  3. Output mwe-image-plus-pdfinclude-is-buggy.pdf is a two page PDF where one page shows the image mwe-in.png, while the other page shows the included PDF mwe-in.pdf. However, the included PDF is scaled down to an extremely small size (almost invisible).
    The scaling-down of the included PDF is totally unexpected.

More info

My Pandoc version is 1.17.0.3 (on OS X).

It doesn't help to add pdfpages options to the header-includes, such as "\usepackage[noautoscale=true]{pdfpages}" or "\usepackage[fitpaper=true]{pdfpages}".

I'm aware that this may be a LaTeX bug, but I'm not sure. Hence I report it here first.

@mb21
Collaborator

mb21 commented Apr 17, 2016

This seems more like a LaTeX issue. Pandoc is really just generating a .tex file, then running that through LaTeX. You can generate the tex and have a look:

pandoc                \
      mwe1.md mwe2.md \
     -V header-includes="\usepackage{pdfpages}" \
     -o mwe-image-plus-pdfinclude-is-buggy.tex

If you can tell us what exactly is wrong with the tex file generated, we might be able to fix it in pandoc...

@KurtPfeifle
Author

> Pandoc is really just generating a .tex file, then running that through LaTeX. You can generate the tex and have a look:

Thanks for the tip, mb21.

However, I did this already, but couldn't see what may be causing it. Hence I supplied step-by-step instructions with an MWE so somebody more knowledgeable than me may look into the Pandoc and the LaTeX side of the process. (My PDF source code reading skills are good enough, however, to analyze what's going on in there -- see further below.)

My bet still is on the LaTeX code Pandoc generates. There may be a $something in Pandoc's LaTeX template that needs modification -- like order of loading for \usepackage{...}, or some additional [$option] for a package to be added, or some different provision made in the code that scales images to avoid overlapping margins, etc. -- all of which are beyond my level of LaTeX skillz.

Because I cannot be the first person to ever use this function, and I remember it used to work for me. So it must be some new bug introduced alongside the recent improvements/changes to the LaTeX template or the added image attributes...

I cannot complete the analysis, but I can give you these hints:

LaTeX code

Here are the contents of the "only PDF figure included" LaTeX document code:

\includepdf{mwe-in.pdf}

Here are the contents of the "PDF figure plus pixel image included" LaTeX document code:

\begin{figure}[htbp]
\centering
\includegraphics{mwe-in.png}
\caption{}
\end{figure}

\includepdf{mwe-in.pdf}
_Another question aside:_
Can someone please tell me why, with the above LaTeX code, the PDF output shows the PNG image last (on page 2) and the included PDF figure first (on page 1)?
After all, the PNG-inserting \includegraphics{...} command comes first, while the PDF-inserting \includepdf{...} comes second.

And here is what the "PDF plus image" LaTeX pre-amble has added when compared to the "PDF-include only" LaTeX preamble. It has added, in lines 29-37:

    29 \usepackage{graphicx,grffile}                                           
    30 \makeatletter
    31 \def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
    32 \def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
    33 \makeatother
    34 % Scale images if necessary, so that they will not overflow the page
    35 % margins by default, and it is still possible to overwrite the defaults
    36 % using explicit options in \includegraphics[width, height, ...]{}      
    37 \setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}

There is no other difference in the pre-amble. If I just comment out or remove lines 30-37 and compile this modified LaTeX to PDF, both pages look "saner" (however, both images no longer honor the page margins).

If you just comment out or delete line 37 (WARNING: I don't know what I'm doing!), so that it reads:

 37 % \setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
  • then this improves PDF output for the pages:
  1. Placement and appearance of the raster image (the \includegraphics{...} part) is still looking the same.
  2. Appearance of the PDF figure (the \includepdf{...} part) is no longer scaled down -- but it is now scaled to the full width of the paper (ignoring margins).

Might this problem be related to "Is it ImageMagick's fault or pdflatex's that some JPEGs aren't working?"

PDF code

Now for the PDF code... I'll pinpoint the differences which cause mwe-image-plus-pdfinclude-is-buggy.pdf to render differently compared to mwe-pdfinclude-only.pdf.

First, unpack the compressed streams with the help of the qpdf command line tool:

qpdf --qdf --object-streams=disable mwe-image-plus-pdfinclude-is-buggy.pdf scaleddown-output.pdf

Then look at the PDF code in an editor. Line numbers 169-183 read:

   %% Contents for page 1
   %% Original object ID: 11 0
   13 0 obj
   <<
     /Length 14 0 R
   >>
   stream
   1 0 0 1 301.85 391.02 cm
   q
   .00932 0 0 .00932 0 0 cm
   q
   1.78056 0 0 1.78056 0 0 cm
   q
   1 0 0 1 0 0 cm
   /Im5 Do
  1. Line 176 says 1 0 0 1 301.85 391.02 cm:
    • The _cm_ there is an operator which concatenates the given matrix (here 1 0 0 1 301.85 391.02) with the current transformation matrix of the page.
    • In effect, this shifts the origin for the spot where the later /Im5 Do on line 183 draws the image to 301.85 391.02 (which is about the center of the page), where the mini-mini-mini PDF figure indeed is placed.
    • _Change the two values_ to 1/100th of the original value (like 3.0185 3.9102) and you'll see that _the image shifts to the lower left corner_ of the page.
  2. Line 178 says .00932 0 0 .00932 0 0 cm:
    • Again, the _cm_ operator manipulates the current transformation matrix.

    • In effect, the two identical parameters of .00932 scale the coordinate system down by a factor of more than 100 (by about 107.3, to be more exact).

    • _Multiply the two values by 100 again_, so the line now reads:

      000.932 0 0 000.932 0 0 cm
      

      The inserted PDF figure no longer displays as "mini-mini-mini" but rather near its "natural" size.

  3. For some reason Line 180 says 1.78056 0 0 1.78056 0 0 cm, which scales the coordinates up again by a factor of about 1.78 before finally displaying the figure with the /Im5 Do statement in line 183.
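
The net effect of the three cm lines can be worked out by multiplying them together. The following is only a sketch of that arithmetic. It assumes the usual PDF convention that the operands a b c d e f stand for the 3x3 matrix shown first and that each cm pre-multiplies the current transformation matrix (CTM). It also assumes that the page starts with an identity CTM and that the form XObject /Im5 carries no additional scaling of its own. The identity matrix on line 182 is left out because it changes nothing.

```latex
\[
M(a,b,c,d,e,f) =
\begin{pmatrix} a & b & 0 \\ c & d & 0 \\ e & f & 1 \end{pmatrix},
\qquad
\mathrm{CTM}' = M \cdot \mathrm{CTM}
\]
\[
\underbrace{\begin{pmatrix} 1.78056 & 0 & 0 \\ 0 & 1.78056 & 0 \\ 0 & 0 & 1 \end{pmatrix}}_{\text{line 180}}
\underbrace{\begin{pmatrix} 0.00932 & 0 & 0 \\ 0 & 0.00932 & 0 \\ 0 & 0 & 1 \end{pmatrix}}_{\text{line 178}}
\underbrace{\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 301.85 & 391.02 & 1 \end{pmatrix}}_{\text{line 176}}
\approx
\begin{pmatrix} 0.0166 & 0 & 0 \\ 0 & 0.0166 & 0 \\ 301.85 & 391.02 & 1 \end{pmatrix}
\]
```

Under these assumptions the figure is drawn at roughly 1.66% of its natural size (0.00932 x 1.78056 is about 0.0166), while the translation to 301.85 391.02 is unchanged. That would be consistent with a tiny figure sitting near the center of the page.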

Fixing it on the PDF side

So, you can do the following:

  1. Replace all these "a b c d 0 0 cm"-type lines (on 176, 178 and 180) by identical 1 0 0 1 0 0 cm ones.

  2. Fix the PDF's xref table to avoid PDF readers complaining about it being corrupt (due to the editing which may have changed byte offsets of objects within the file) by running:

    qpdf scaleddown-output.pdf sane-looking.pdf
    

    and your sane-looking.pdf will now show the PDF figure in the lower left corner of the page (with no margins).

So the question remains:

Why are pdftex, lualatex and xelatex (all of them!) scaling down the PDF figure in the presence of another (image) illustration in the same PDF (even if it's on a different page)?

@jgm
Owner

jgm commented Apr 17, 2016

The default latex template contains:

% Scale images if necessary, so that they will not overflow
% the page
% margins by default, and it is still possible to overwrite
% the defaults
% using explicit options in \includegraphics[width, height,
% ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}

I'd suggest using a custom template without these lines. See
if that makes a difference. If it does, that tells us where
the issue lies.

@KurtPfeifle
Author

@jgm: Due to what I described in my comment from April 17 13:00 UTC, I'm using a different template already (well, it's the default one for 1.17.0.3, but with line 131 commented out).

That's for now good enough, because it shows me roughly the state of my draft document. However, as I wrote in the comment above...

...this improves PDF output for the pages:

  1. Placement and appearance of the raster image (the \includegraphics{...} part) is still looking the same.
  2. Appearance of the PDF figure (the \includepdf{...} part) is no longer scaled down -- but it is now scaled to the full width of the paper (ignoring margins).

So, yes, _this line does make the difference_, as I said above already...

I have to correct point 1. above, though. The _appearance of the raster images also changes_. I just didn't notice it initially with the image I mainly tested with. When I use a different (esp. larger) test image, the difference is more visible.

@AndreasMatthias

\includepdf is more or less just a wrapper of \includegraphics. Thus, if you change the default options for \includegraphics globally, like you do with \setkeys{Gin}{...}, then these settings impact \includepdf as well.

So far the only thing you can do is to reset these settings for each \includepdf call:

\includepdf[width=!, height=!]{mwe-in.pdf}

(Hope this works. I cannot test it, since I do not have a recent pandoc install.)
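
For reference, here is a minimal standalone sketch of that workaround. It assumes the preamble lines that pandoc adds (quoted earlier in this thread) and the mwe-in.png and mwe-in.pdf test files from the first comment. Like the suggestion itself it is untested here, and its effect may depend on the installed pdfpages version.

```latex
\documentclass{article}
\usepackage{graphicx}
\usepackage{pdfpages}

% Global image defaults, copied from pandoc's generated preamble
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}

\begin{document}

% Raster image: still limited to the text area by the global defaults
\includegraphics{mwe-in.png}

% Included PDF page: width and height are reset for this call only
\includepdf[width=!, height=!]{mwe-in.pdf}

\end{document}
```

The reset is given per call, so every \includepdf in the document would need the same two options.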

@KurtPfeifle
Author

@AndreasMatthias: Thanks for your input.

When I looked in the pdfpages manual, it didn't mention width or height parameters. (I tried them nevertheless, just on the chance, but it didn't work.) I only discovered [fitpaper] and [noautoscale] options.

What does the exclamation mark do in [width=!, height=!]? I've never seen it...

Also, I cannot follow your argument (but then, I know next to nothing about LaTeX or TeX): "if you change the default options for \includegraphics globally, like you do with \setkeys{Gin}{...}, then these settings impact \includepdf as well."

You may be right in theory -- but to me it looks like the default options for \includegraphics did work fine for the PNG/JPEG images, but they did _NOT_ transfer to the \includepdf calls: The images are OK and do not overflow borders, but the PDFs are much too small to overflow ANY border....

@AndreasMatthias

@KurtPfeifle

> When I looked in the pdfpages manual, it didn't mention width or height parameters.

width and height are parameters of \includegraphics and not of \includepdf. But \includepdf is a wrapper of \includegraphics, and all parameters not handled by \includepdf are forwarded to \includegraphics. This is the reason why \setkeys{Gin}{...} had an impact on \includepdf as well.

> What does the exclamation mark do in [width=!, height=!]? I've never seen it...

The exclamation mark is the default value, meaning that the picture should not be scaled in this direction.
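
The behavior of the exclamation mark can be seen with graphicx alone, without pdfpages. The following is a small sketch, not taken from the thread: the fixed width of 3cm is an arbitrary stand-in for pandoc's \maxwidth, and mwe-in.png is the test image from the first comment.

```latex
\documentclass{article}
\usepackage{graphicx}
\begin{document}

% Set a global default for all later \includegraphics calls
\setkeys{Gin}{width=3cm}

% No options given: the global width=3cm applies
\includegraphics{mwe-in.png}

% width=! puts the width back to "not set", so the image
% appears at its natural size again
\includegraphics[width=!]{mwe-in.png}

\end{document}
```

The second image ignores the global default because an option passed to an individual call takes precedence over a value set with \setkeys{Gin}{...}.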

Anyway, after pondering this issue I think that width and height parameters do not have a reasonable meaning for \includepdf and therefore should not have any impact on \includepdf. I will try to fix it (I'm the author of pdfpages) and will report back.

@KurtPfeifle
Author

@AndreasMatthias:

Thank you for your patient explanation.

When I (selectively) studied the pdfpages manual (which I didn't fully comprehend anyway -- I'm a LaTeX user only by way of using Pandoc) I didn't notice \includegraphics being mentioned. Now that I have revisited it, I did indeed find this info at the bottom of page 8.

Also thanks for your "exclamation mark" -- this might be useful for me in the future. I have to test it. 👍

Also thank you very much for your awesome work on pdfpages too :-) 👍

@AndreasMatthias

Version 0.5f of pdfpages fixes this issue. The new version is already installed on CTAN and should be available on other CTAN mirrors soon.

@KurtPfeifle
Author

Great, @AndreasMatthias !
I really appreciate your fast, fast, fast reaction and fix.

I'm looking forward to taking it for a test drive with my current document creation project.

@jgm jgm closed this as completed Dec 7, 2016