Ipe 7.2.24 produces significantly larger files than 7.1.7 #412

Open
ByteHamster opened this issue Jan 7, 2022 · 5 comments

@ByteHamster
Contributor

ByteHamster commented Jan 7, 2022

I have some PDF files that were created with Ipe 7.1.7. When I open these files in Ipe 7.2.24 and save them without modifications, their file size gets significantly larger. This file (removed, see other file below), for example, grows by a factor of more than 5: from 115 KB to 630 KB. The behavior is the same on Arch Linux and macOS.

When I then include that file in a LaTeX beamer presentation, the increased size becomes a rather big problem. I noticed this because my presentation suddenly went from 3 MB to 70 MB, just by saving one Ipe figure (different slides of the beamer presentation show different PDF pages of the Ipe figure).

My workaround is to run ghostscript on the presentation after compiling: gs -q -sDEVICE=pdfwrite -o presentation-size-fix.pdf presentation.pdf. With that 70 MB file, running the ghostscript command takes about 1-2 minutes (compared to about 2 seconds with the old Ipe image), making it rather hard to work with the presentation.
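
For anyone who wants to put this workaround into a build step, here is a minimal Python sketch that wraps the exact `gs` call quoted above and reports how much the file shrank. The function names `gs_command` and `shrink` are made up for illustration, and the script assumes that `gs` is on the `PATH`.

```python
import os
import subprocess
import sys


def gs_command(src: str, dst: str) -> list[str]:
    """Build the Ghostscript call quoted in the comment above."""
    return ["gs", "-q", "-sDEVICE=pdfwrite", "-o", dst, src]


def shrink(src: str, dst: str) -> float:
    """Run Ghostscript on src, write dst, and return size(dst) / size(src)."""
    subprocess.run(gs_command(src, dst), check=True)
    return os.path.getsize(dst) / os.path.getsize(src)


# Only runs when called as: python shrink.py presentation.pdf presentation-size-fix.pdf
if __name__ == "__main__" and len(sys.argv) == 3:
    source, target = sys.argv[1], sys.argv[2]
    ratio = shrink(source, target)
    print(f"{target} is {ratio:.0%} of the size of {source}")
```

This does nothing about the 1-2 minute run time; it only makes the size change visible after each build.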

Do you have an idea why saving the Ipe file with a more recent version increases the file size that much?

@otfried
Owner

otfried commented Jan 7, 2022

Between 7.1.7 and 7.2.24, Ipe switched to a much more general method for including the PDF resources from the pdflatex output. And for some reason, pdflatex already produces a rather large file (278 kB, larger than the Ipe 7.1.7 version of the entire document).

How do you include the pages in your beamer document? pdflatex should be smart enough that when you include various pages using \includegraphics[page=xx], then it should not make duplicates of the PDF resources from the included file for each page. On the other hand, if you export individual pages and then include those, you very quickly blow up the file size (related to issue #193).

@ByteHamster
Contributor Author

ByteHamster commented Jan 7, 2022

Thank you for your reply. I do use \includegraphics<xx>[page=xx]{filename.pdf}, all from a single pdf file. Apparently, pdflatex sometimes does produce duplicates of pdf resources. I have another file for you where the effect is even more extreme.

Steps to reproduce using this zip: https://drive.google.com/file/d/1XT9WGbGXYAelK3188hV7Mk1k0kS--yW9/view?usp=sharing

  • Note that image.pdf is 200 kB
  • Run pdflatex demo.tex
  • demo.pdf is 215 kB

Clean up the files LaTeX generated. Now, open image.pdf in Ipe and save it without modification.

  • Note that image.pdf is now 4 MB (that's 20 times larger)
  • Run pdflatex demo.tex
  • demo.pdf is 41 MB (that's 190 times larger)
$ pdflatex --version
pdfTeX 3.141592653-2.6-1.40.22 (TeX Live 2021/Arch Linux)
kpathsea version 6.3.3
Copyright 2021 Han The Thanh (pdfTeX) et al.
There is NO warranty.  Redistribution of this software is
covered by the terms of both the pdfTeX copyright and
the Lesser GNU General Public License.
For more information about these matters, see the file
named COPYING and the pdfTeX source.
Primary author of pdfTeX: Han The Thanh (pdfTeX) et al.
Compiled with libpng 1.6.37; using libpng 1.6.37
Compiled with zlib 1.2.11; using zlib 1.2.11
Compiled with xpdf version 4.03

@ByteHamster
Contributor Author

Would it be possible to have Ipe execute that ghostscript command¹ after building the document, but before embedding its own data? Then the "more general method for including the PDF resources from the pdflatex output" can be kept while still producing output files with a more reasonable size.

¹ gs -q -sDEVICE=pdfwrite -o presentation-size-fix.pdf presentation.pdf

@otfried otfried self-assigned this Jan 22, 2022
@otfried
Owner

otfried commented Jan 24, 2022

I was convinced that pdflatex is smart enough not to duplicate resources when you include multiple pages from the same document, the reason being that I wrote that code for pdflatex in 2001. It turns out that this does not work anymore, at least not when used the standard way through \includegraphics. That explains why demo.pdf is so gigantic: it has all the fonts and all the XForm objects from image.pdf 28 times.

However, there is a simple trick:

If you modify your file demo.tex to start like this:

\documentclass{beamer}
\pdfximage{image.pdf}   %% this is the new line
\begin{document}
\begin{frame}{Test}
\includegraphics<1>[page=1,width=0.9\textwidth]{image.pdf}%
\includegraphics<2>[page=2,width=0.9\textwidth]{image.pdf}%
\includegraphics<3>[page=3,width=0.9\textwidth]{image.pdf}%
...

then the duplication of resources does not happen. You can easily check that the result has each font only once.
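
One way to do that check is with the `pdffonts` tool (shipped with Poppler and Xpdf), which prints one row per embedded font. The sketch below counts how often each font name shows up; the helper names `count_fonts` and `duplicated_fonts` are hypothetical, and it assumes that `pdffonts` is installed, that its output starts with a header line and a dashed separator, and that font names contain no whitespace.

```python
import subprocess
import sys
from collections import Counter


def count_fonts(pdffonts_output: str) -> Counter:
    """Count how often each font name appears in the output of `pdffonts`."""
    counts = Counter()
    # The first two lines are the column header and a dashed separator.
    for line in pdffonts_output.splitlines()[2:]:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1  # first column is the font name
    return counts


def duplicated_fonts(pdf_path: str) -> dict[str, int]:
    """Return the fonts that are listed more than once for pdf_path."""
    result = subprocess.run(
        ["pdffonts", pdf_path], check=True, capture_output=True, text=True
    )
    return {name: n for name, n in count_fonts(result.stdout).items() if n > 1}


# Only runs when called as: python fontcheck.py demo.pdf
if __name__ == "__main__" and len(sys.argv) == 2:
    for name, n in sorted(duplicated_fonts(sys.argv[1]).items()):
        print(f"{name}: embedded {n} times")
```

If the \pdfximage trick works as described, running this on the new demo.pdf should print nothing.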

This doesn't change the fact that image.pdf is very large. You have used about 4600 separate text objects, most of them containing only a single letter, and Ipe makes a PDF XForm object for each of these. That's a lot of overhead for what should be a single letter. The reason the file was much smaller in earlier versions of Ipe is that Ipe then simply included the PDF stream for the XForm inside the page stream. That works only for simple text; it would make it impossible to use TikZ, \includegraphics, or other interesting stuff inside text objects. I will have to take a closer look at what I can do to improve this. Perhaps Ipe can detect when it is safe to include the form literally and optimize the output then. It should also detect when text objects are identical and reuse the XForm then.
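
To illustrate the last idea only: reusing an XForm for identical text objects amounts to keying each form by its content. The Python sketch below is not Ipe's code; the function `assign_xforms` and the form names `Fm1`, `Fm2`, ... are invented for the example, and a real implementation would presumably also have to include attributes such as size, color, and style in the key.

```python
import hashlib


def assign_xforms(text_objects: list[str]) -> tuple[dict[str, str], list[str]]:
    """Give each distinct text content one form name and map every object to it."""
    forms: dict[str, str] = {}  # content hash -> form name
    refs: list[str] = []        # per text object: the form it would reference
    for source in text_objects:
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in forms:
            forms[key] = f"Fm{len(forms) + 1}"  # first occurrence creates the form
        refs.append(forms[key])                 # later occurrences reuse it
    return forms, refs


forms, refs = assign_xforms(["a", "b", "a", "$x^2$", "b"])
print(len(forms), refs)  # 3 ['Fm1', 'Fm2', 'Fm1', 'Fm3', 'Fm2']
```

Five text objects end up sharing three forms here; with thousands of single-letter labels the saving would depend on how many distinct letters and styles are in use.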

Running Ghostscript basically parses the entire document and renders it into a PDF writer. In this case, that eliminates the overhead of the PDF XForms, but it's not always the appropriate thing to do (in other cases you would duplicate the contents of XForms, leading to files that are actually larger), and I'm not sure whether it can actually handle documents with links and named objects.

@ByteHamster
Contributor Author

Thank you for looking into this!

> If you modify your file demo.tex to start like this [...] then the duplication of resources does not happen.

I can confirm that this reduces the file size of my "production" files (not the test file) from 70 MB to 8 MB. The Ghostscript command from above still brings it down to about 3 MB, so I will probably leave that one in the makefile. (Also, I don't think my colleagues will remember to add the \pdfximage "import" for every changed image.) Unfortunately, the Ghostscript command still takes a very long time even with the \pdfximage workaround, so it would still be great if neither workaround were necessary :)
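
If remembering the \pdfximage line is the main obstacle, the lines could be generated from the .tex source. The following Python sketch is only an assumption about how that might look: the function `pdfximage_lines` is hypothetical, it does not skip commented-out lines, it expects file names to be written with their extension, and it assumes the trick confirmed above for demo.tex carries over to every PDF that is included with a page= option.

```python
import re
import sys

# \includegraphics, an optional beamer overlay <...>, optional [options], then {file}.
INCLUDE_RE = re.compile(
    r"\\includegraphics\s*(?:<[^>]*>)?\s*(?:\[([^\]]*)\])?\s*\{([^}]+)\}"
)


def pdfximage_lines(tex_source: str) -> list[str]:
    """Return one \\pdfximage line per file that is included with a page= option."""
    files = set()
    for options, filename in INCLUDE_RE.findall(tex_source):
        if re.search(r"\bpage\s*=", options):
            files.add(filename)
    return [f"\\pdfximage{{{name}}}" for name in sorted(files)]


# Only runs when called as: python pdfximage.py demo.tex
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as handle:
        print("\n".join(pdfximage_lines(handle.read())))
```

The printed lines would go into the preamble, before \begin{document}, as in the demo.tex example above.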

> Perhaps Ipe can detect when it is safe to include the form literally and optimize the output then. It should also detect when text objects are identical and reuse the XForm then.

That sounds awesome! Thank you!
