Very slow performance with some markdown options #2730
Comments
+++ Winston Chang [Feb 19 16 09:37 ]:
Does it contain |
Yes, many of them.
Thanks for the excellent, detailed bug report. I think this commit fixes the problem (I tested on your files). But let me know if it doesn't.
jgm added a commit that referenced this issue on Feb 21, 2016:
This should give better performance. See #2730.
Great, thanks for the quick fix!
c-forster pushed a commit to c-forster/pandoc that referenced this issue on Mar 4, 2016:
This version avoids an exponential performance problem with `<script>` tags, and it should be faster in general. Closes jgm#2730.
c-forster pushed a commit to c-forster/pandoc that referenced this issue on Mar 4, 2016:
This should give better performance. See jgm#2730.
When converting some files from Markdown to HTML, performance can be very slow, depending on the markdown variant and options selected. The time grows exponentially with input size, as shown in the graph below.
For this example, I have a very basic input: it's just raw HTML with some JSON content embedded in a `<script>` tag. (We're using markdown as an input format because sometimes the HTML is intermingled with markdown. But in this example, the actual content is just HTML.)
index.html:
This is paired with a minimal template file:
And it's run through pandoc with:
The problem is that, with the content I have (the `"blah blah blah"` is replaced with a bunch of R code in a string), pandoc is extremely slow. Here's a graph of time, with 50KB, 100KB, and 150KB of text in the `<script></script>` tags, with various flavors of markdown. Note the log y scale:
For `markdown_strict`, the time for 50KB is 0.37 seconds; for 100KB, it's 3.1 seconds; and for 150KB, it's 25.5 seconds. If the input is a megabyte in size, the conversion time with this exponential growth rate would be about 30,000,000,000,000,000 seconds. My actual data is over two megabytes, so there would be many more zeros on there. :)
In the graph, I've also compared it to `markdown` and `commonmark`, which are much faster, as well as `markdown-markdown_in_html_blocks` and `markdown+markdown_attribute`, which are just as slow as `markdown_strict`. I would have expected the `markdown-markdown_in_html_blocks` and `markdown+markdown_attribute` options to be faster than `markdown`, but the opposite appears to be true.
The example input files are in https://github.com/wch/pandoc-hang, with a subdirectory for each input file size. For example, the 100KB input file is in:
https://github.com/wch/pandoc-hang/tree/master/simplified-100kb
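For anyone who wants to check the extrapolation from the three `markdown_strict` timings, here is a rough sketch in Python. It assumes the growth is purely exponential in input size and that 1 MB means 1000 KB. The helper name `fit_log_linear` is made up for this illustration. With only three data points, the result is an order-of-magnitude estimate at best.

```python
import math

# Timings reported for markdown_strict: input size in KB -> seconds.
timings = {50: 0.37, 100: 3.1, 150: 25.5}


def fit_log_linear(points):
    """Least-squares fit of log(seconds) = a + b * size_kb; returns (a, b)."""
    sizes = list(points)
    logs = [math.log(points[s]) for s in sizes]
    mean_x = sum(sizes) / len(sizes)
    mean_y = sum(logs) / len(logs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, logs)) / sum(
        (x - mean_x) ** 2 for x in sizes
    )
    return mean_y - slope * mean_x, slope


a, b = fit_log_linear(timings)
growth_per_50kb = math.exp(50 * b)      # multiplicative slowdown per extra 50KB
estimate_1mb = math.exp(a + b * 1000)   # assumption: 1 MB = 1000 KB

print(f"slowdown per extra 50KB: about {growth_per_50kb:.1f}x")
print(f"extrapolated time for 1 MB: about {estimate_1mb:.1e} seconds")
```

The fitted slowdown is roughly eightfold per extra 50KB, which is why the one-megabyte figure is astronomically large. The exact number of zeros depends on how the fit is done and how a megabyte is counted.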
I also tried changing the specific content in the `<script>` tags, and that makes a big difference in speed. In my use case, it's R code in a string, but when I replace it with just blank spaces, the conversion is fast for all of those settings. So there's something about that particular content that slows it down.
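To probe which content triggers the slowdown, something like the following Python sketch could generate inputs of a fixed size with different fillers and time each format. This is not the original command or input. It skips the template, reads from stdin, and relies only on pandoc's `-f` and `-t` options. The filler strings and the helper names `make_input` and `time_pandoc` are hypothetical, and the R-like filler is only a stand-in, so it may not reproduce the problem.

```python
import shutil
import subprocess
import time

# Formats compared in the report above.
FORMATS = ["markdown_strict", "markdown", "commonmark",
           "markdown-markdown_in_html_blocks", "markdown+markdown_attribute"]

# Hypothetical fillers: a stand-in for "R code in a string", and plain blanks.
FILLERS = {"r_like": "x <- c(1, 2, 3); y <- x[x > 1]; ", "blank": " "}


def make_input(filler, size_bytes):
    """Return raw HTML: one <script> tag holding size_bytes of repeated filler."""
    repeats = size_bytes // len(filler) + 1
    payload = (filler * repeats)[:size_bytes]
    return '<script type="application/json">{"code": "' + payload + '"}</script>\n'


def time_pandoc(text, input_format):
    """Time one pandoc run that reads text from stdin and discards the HTML."""
    start = time.perf_counter()
    subprocess.run(["pandoc", "-f", input_format, "-t", "html"],
                   input=text.encode("utf-8"),
                   stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


if __name__ == "__main__" and shutil.which("pandoc"):
    for name, filler in FILLERS.items():
        text = make_input(filler, 20_000)  # small on purpose; raise with care
        for fmt in FORMATS:
            print(f"{name:8s} {fmt:36s} {time_pandoc(text, fmt):.2f}s")
```

The payload size starts small because, on an affected pandoc version, each extra 50KB may multiply the run time several times over.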