Pandoc runs out of memory, unless trace logging is enabled #8762

Closed
erakadjiev opened this issue Apr 7, 2023 · 7 comments

Comments

@erakadjiev

Hello,

We rely on pandoc to convert Markdown to HTML. We ran into an out-of-memory issue, similar to ones that have been reported several times before, and found that enabling trace logging helps avoid the problem (as mentioned in, e.g., #3169).

The input Markdown was ~18 MB (with inlined images), and the conversion failed because pandoc exhausted the available memory and the OS (Ubuntu 20.04) killed the process. After we increased the assigned memory to over 3 GB, the conversion completed successfully.

We also ran the same conversion with --trace, and it completed successfully using only 100-200 MB of memory, roughly 15-20 times less than without trace logging.
We are now forced to always run pandoc with the --trace flag to avoid excessive memory usage and having the process killed.

As a side note, while investigating this issue we started with pandoc 2.16.2, then upgraded to 2.19.2, and finally tried 3.1.2. Version 3.1.2 did reduce the default memory usage (without --trace), but only marginally.

It would be good to fix pandoc's default memory usage, so that we don't need to rely on trace logging for it to handle moderately large inputs.

Thank you!

@erakadjiev erakadjiev added the bug label Apr 7, 2023
@jgm
Owner

jgm commented Apr 7, 2023

That's quite interesting, and it seems to point to some kind of laziness-related issue (tracing probably forces evaluation of thunks that otherwise accumulate).
Can you share a sample (or create one with nonsense data) that helps reproduce the issue?
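Pandoc is written in Haskell, where unevaluated expressions (thunks) can pile up in memory until something demands their values. The Python snippet below is only a toy model of the hypothesis in the comment above, not pandoc's code: `Thunk`, `accumulate`, and `retained` are hypothetical names, and the `trace=True` branch merely stands in for trace output that happens to force each intermediate result.

```python
class Thunk:
    """A deferred computation, evaluated at most once when its value is demanded."""

    def __init__(self, compute, parent=None):
        self.compute = compute  # zero-argument function producing the value
        self.parent = parent    # the thunk this one depends on (kept alive until forced)
        self.value = None
        self.forced = False

    def force(self):
        if not self.forced:
            self.value = self.compute()
            self.forced = True
            # Once evaluated, drop the closure and the dependency so the
            # rest of the chain becomes garbage-collectable.
            self.compute = None
            self.parent = None
        return self.value


def accumulate(xs, trace=False):
    """Lazily sum xs; with trace=True, force every intermediate result."""
    acc = Thunk(lambda: 0)
    for x in xs:
        prev = acc
        acc = Thunk(lambda p=prev, x=x: p.force() + x, parent=prev)
        if trace:
            # Hypothetical stand-in for trace output: reporting the
            # intermediate state demands its value, so no chain builds up.
            acc.force()
    return acc


def retained(t):
    """Count unevaluated thunks still reachable from t."""
    count = 0
    while t is not None and not t.forced:
        count += 1
        t = t.parent
    return count


if __name__ == "__main__":
    lazy = accumulate(range(100_000))
    print(retained(lazy))    # 100001 pending thunks held in memory
    traced = accumulate(range(100_000), trace=True)
    print(retained(traced))  # 0
    print(traced.force())    # 4999950000
```

Both versions do the same arithmetic; the difference is only in how much deferred work stays alive at once. In this model, forcing `lazy` itself would recurse once per pending thunk, which is why the example only counts them.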

@erakadjiev
Author

Sure, we will put together a minimal example that demonstrates the issue.

In the meantime, with the same input, we ran into a related issue where the --trace flag unfortunately didn't help. After the Markdown-to-HTML conversion, we run a post-processing step (HTML to HTML) that adds a footer based on a template. This step ran out of memory, and --trace didn't help. We will try to provide an example for this as well.

@erakadjiev
Author

Please find below an example that demonstrates the issue. The markdown source (with embedded, base64-encoded images), the HTML template, and the command to call pandoc are included.

convert_example_one_step.zip

If you run the command as included, the memory usage will be high. However, if you add --trace, it will be much lower.

Please ignore the additional issue with the post-processing step I mentioned in my previous comment. We improved that command, which helped reduce the memory usage.

@jgm
Owner

jgm commented Apr 13, 2023

My test, without --trace:

<<ghc: 189025903488 bytes, 23080 GCs, 73773113/1245890024 avg/max bytes residency (164 samples), 3324M in use, 0.000 INIT (0.005 elapsed), 18.951 MUT (19.689 elapsed), 12.751 GC (13.748 elapsed) :ghc>>

With --trace:

<<ghc: 192495558328 bytes, 23487 GCs, 51084730/84005144 avg/max bytes residency (232 samples), 188M in use, 0.000 INIT (0.006 elapsed), 17.635 MUT (17.859 elapsed), 11.156 GC (11.460 elapsed) :ghc>>

Big difference in memory in use!
3324MB vs 188MB
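The two summary lines above appear to be one-line GHC runtime-system statistics. As a hedged sketch, the Python snippet below pulls a few fields out of them and compares the runs; the regexes and field names are our own assumptions about that format, not a documented parser. It makes the arithmetic explicit: without --trace, memory in use is about 17.7x higher and peak residency about 14.8x higher, while total allocation is roughly the same (ratio about 0.98). That pattern fits the laziness hypothesis: about the same amount of work, but much more of it retained at once.

```python
import re

# The two summary lines quoted above, copied verbatim.
WITHOUT_TRACE = ("<<ghc: 189025903488 bytes, 23080 GCs, 73773113/1245890024 avg/max bytes residency "
                 "(164 samples), 3324M in use, 0.000 INIT (0.005 elapsed), 18.951 MUT (19.689 elapsed), "
                 "12.751 GC (13.748 elapsed) :ghc>>")
WITH_TRACE = ("<<ghc: 192495558328 bytes, 23487 GCs, 51084730/84005144 avg/max bytes residency "
              "(232 samples), 188M in use, 0.000 INIT (0.006 elapsed), 17.635 MUT (17.859 elapsed), "
              "11.156 GC (11.460 elapsed) :ghc>>")

# Assumed field layout, inferred from the two lines above.
PATTERNS = {
    "allocated_bytes": r"(\d+) bytes, \d+ GCs",
    "gcs": r"(\d+) GCs",
    "avg_residency": r"(\d+)/\d+ avg/max bytes residency",
    "max_residency": r"\d+/(\d+) avg/max bytes residency",
    "in_use_mb": r"(\d+)M in use",
}


def parse_ghc_summary(line):
    """Extract a few integer fields from a one-line GHC RTS summary."""
    stats = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, line)
        if match is None:
            raise ValueError(f"field {name!r} not found in summary line")
        stats[name] = int(match.group(1))
    return stats


def compare(before, after):
    """Ratio of each field: run without --trace divided by run with --trace."""
    return {name: before[name] / after[name] for name in before}


if __name__ == "__main__":
    ratios = compare(parse_ghc_summary(WITHOUT_TRACE), parse_ghc_summary(WITH_TRACE))
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.2f}x")
```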

@erakadjiev
Author

Yes, our observations were very similar to yours.
Initially, we even ran out of memory when running without --trace, as we had less than 3GB allocated.

@erakadjiev
Author

Thank you for fixing this issue! Just wanted to confirm that after upgrading to Pandoc 3.1.5 and removing the --trace argument, everything seems to work well and there's no more excessive memory usage.

@jgm
Owner

jgm commented Jul 25, 2023

Fantastic!
