Extreme slowdown with fenced code blocks extension on big files #148

andersarpi · 2015-02-11T00:43:26Z

It seems that enabling EXTENSION_FENCED_CODE slows down Markdown parsing significantly. I noticed this while parsing about 300 x 50k files with mixed content. Parsing was slower than I expected, so I started toggling the extensions and found that EXTENSION_FENCED_CODE had an extreme effect on performance. The numbers I got are below.

Bench output

benchmark	iterations (b.N)	time per op
BenchmarkNoFenceCode10k	3000	471782 ns/op
BenchmarkFenceCode10k	500	3092759 ns/op
BenchmarkNoFenceCode100k	500	3678906 ns/op
BenchmarkFenceCode100k	10	196948480 ns/op
BenchmarkNoFenceCode1000k	50	33028148 ns/op
BenchmarkFenceCode1000k	1	7491844000 ns/op

Note that the slowdown is not just a linear increase with input size.
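To make the non-linear growth concrete, here is a small sketch that divides each ns/op figure by the one for the next smaller file. The values are copied from the bench output above; the helper name `growth` is only illustrative.

```go
package main

import "fmt"

// growth reports how many times slower the larger input is than the smaller one.
// The name is illustrative; it is not part of blackfriday.
func growth(smallNs, largeNs float64) float64 {
	return largeNs / smallNs
}

func main() {
	// ns/op values copied from the bench output in this issue.
	sizes := []string{"10k", "100k", "1000k"}
	noFence := []float64{471782, 3678906, 33028148}
	fence := []float64{3092759, 196948480, 7491844000}

	for i := 1; i < len(sizes); i++ {
		fmt.Printf("%s -> %s: without fenced code x%.1f, with fenced code x%.1f\n",
			sizes[i-1], sizes[i],
			growth(noFence[i-1], noFence[i]),
			growth(fence[i-1], fence[i]))
	}
}
```

For each tenfold increase in file size, this gives growth factors of roughly 7.8x and 9.0x without the extension, versus roughly 63.7x and 38.0x with it.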

Code

func BenchmarkNoFenceCode10k(b *testing.B) {
    for i := 0; i < b.N; i++ {
        content, _ := ioutil.ReadFile(`content\pages\10k.md`)

        htmlFlags := blackfriday.HTML_USE_XHTML
        renderer := blackfriday.HtmlRenderer(htmlFlags, "", "")
        _ = blackfriday.Markdown(content, renderer, 0)
    }
}

func BenchmarkFenceCode10k(b *testing.B) {
    for i := 0; i < b.N; i++ {
        content, _ := ioutil.ReadFile(`content\pages\10k.md`)

        htmlFlags := blackfriday.HTML_USE_XHTML
        renderer := blackfriday.HtmlRenderer(htmlFlags, "", "")
        _ = blackfriday.Markdown(content, renderer, blackfriday.EXTENSION_FENCED_CODE)
    }
}

func BenchmarkNoFenceCode100k(b *testing.B) {
    for i := 0; i < b.N; i++ {
        content, _ := ioutil.ReadFile(`content\pages\100k.md`)

        htmlFlags := blackfriday.HTML_USE_XHTML
        renderer := blackfriday.HtmlRenderer(htmlFlags, "", "")
        _ = blackfriday.Markdown(content, renderer, 0)
    }
}

func BenchmarkFenceCode100k(b *testing.B) {
    for i := 0; i < b.N; i++ {
        content, _ := ioutil.ReadFile(`content\pages\100k.md`)

        htmlFlags := blackfriday.HTML_USE_XHTML
        renderer := blackfriday.HtmlRenderer(htmlFlags, "", "")
        _ = blackfriday.Markdown(content, renderer, blackfriday.EXTENSION_FENCED_CODE)
    }
}

func BenchmarkNoFenceCode1000k(b *testing.B) {
    for i := 0; i < b.N; i++ {
        content, _ := ioutil.ReadFile(`content\pages\1000k.md`)

        htmlFlags := blackfriday.HTML_USE_XHTML
        renderer := blackfriday.HtmlRenderer(htmlFlags, "", "")
        _ = blackfriday.Markdown(content, renderer, 0)
    }
}

func BenchmarkFenceCode1000k(b *testing.B) {
    for i := 0; i < b.N; i++ {
        content, _ := ioutil.ReadFile(`content\pages\1000k.md`)

        htmlFlags := blackfriday.HTML_USE_XHTML
        renderer := blackfriday.HtmlRenderer(htmlFlags, "", "")
        _ = blackfriday.Markdown(content, renderer, blackfriday.EXTENSION_FENCED_CODE)
    }
}
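One caveat about the benchmarks above: they read the file inside the timed loop, so file I/O is part of every measurement. Below is a minimal sketch of a variant that loads the input once and resets the timer before the loop. The helper names `benchRender` and `runSketch`, the placeholder input, and the use of `bytes.ToUpper` as a stand-in renderer are all assumptions so that the sketch runs without blackfriday; in real use, the function passed in would wrap the `blackfriday.Markdown` call shown above.

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
)

// benchRender times render on input that was prepared before the timed loop.
// render is a stand-in parameter (hypothetical helper, not a blackfriday API).
func benchRender(b *testing.B, input []byte, render func([]byte) []byte) {
	b.SetBytes(int64(len(input))) // also report throughput
	b.ReportAllocs()              // report allocations per op
	b.ResetTimer()                // exclude setup from the measurement
	for i := 0; i < b.N; i++ {
		_ = render(input)
	}
}

// runSketch runs the benchmark outside of "go test" using testing.Benchmark.
func runSketch() testing.BenchmarkResult {
	// Placeholder input; a real benchmark would read the .md file here, once.
	input := bytes.Repeat([]byte("some *markdown* text\n"), 500)
	return testing.Benchmark(func(b *testing.B) {
		benchRender(b, input, bytes.ToUpper) // stand-in for the Markdown call
	})
}

func main() {
	res := runSketch()
	fmt.Println(res.String(), res.MemString())
}
```

Inside a `_test.go` file the same helper could be called from each `BenchmarkXxx` function, with one call per file size and extension setting.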

dmitshur · 2015-02-11T04:47:11Z

The current implementation is simple, but quite inefficient.

I expect a large part of the inefficiency is due to the simple fix for issue #45 (see PRs #56 and #60). That fix works by making additional passes over the input, and possibly rewriting it so that later passes work correctly, which uses more memory and creates additional work for the GC.

I think an optimization effort that preserves correct behavior would be welcome. Unfortunately, I don't have spare cycles for this now, but I can help review any PRs.
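As a rough illustration of why extra copies of the input can hurt this much, the sketch below counts the bytes copied by a scan that duplicates the remaining input at every line start. This is a simplified model, not blackfriday's code; whether the parser copied exactly this way is an assumption, and the names `makeInput` and `bytesCopiedPerLine` are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// makeInput builds a buffer of n identical 10-byte lines (hypothetical helper).
func makeInput(n int) []byte {
	return bytes.Repeat([]byte("line text\n"), n)
}

// bytesCopiedPerLine models a scan that copies the rest of the input into a
// temporary buffer at each line start, and returns the total bytes copied.
// A scan that works on sub-slices of input instead would copy nothing.
func bytesCopiedPerLine(input []byte) int {
	total := 0
	for beg := 0; beg < len(input); {
		tmp := make([]byte, len(input)-beg)
		total += copy(tmp, input[beg:])

		nl := bytes.IndexByte(input[beg:], '\n')
		if nl < 0 {
			break // last line has no trailing newline
		}
		beg += nl + 1
	}
	return total
}

func main() {
	for _, n := range []int{100, 1000} {
		in := makeInput(n)
		fmt.Printf("%d lines, %d bytes of input: %d bytes copied\n",
			n, len(in), bytesCopiedPerLine(in))
	}
}
```

In this model, 100 lines (1,000 bytes) cause 50,500 bytes of copying and 1,000 lines (10,000 bytes) cause 5,005,000, so ten times the input means roughly a hundred times the copying, plus the matching garbage for the GC.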

tw4452852 · 2015-02-11T07:19:33Z

I tried to fix this issue in PR #149. Could you help review it? Thanks.

dmitshur · 2016-07-15T15:39:16Z

@holmstrom, can you share the content of content/pages/10k.md, content/pages/100k.md and content/pages/1000k.md files please?

I'd like to add these benchmarks to blackfriday so it's possible to avoid regressions (like when fixing #279).

In the first pass, there may not be a trailing newline after a fenced code block yet. Make the newline optional in isFenceLine when calling fencedCodeBlock so that the fenced code block is detected anyway. This is more complex, but it avoids creating temporary buffers or modifying the input, in order to maintain performance (see #148). Document and rename fencedCode to fencedCodeBlock. Add regression tests. Fixes #279.
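The idea of an optional trailing newline can be sketched as follows. This is a simplified, hypothetical detector written for illustration; it is not blackfriday's isFenceLine, and the name `fenceLine`, its signature, and its rules (up to three leading spaces, at least three fence characters, info string ignored) are assumptions.

```go
package main

import "fmt"

// fenceLine is a hypothetical, simplified fence detector. It returns the index
// just past the fence line and the fence marker, or (0, "") if data does not
// start with a fence line. With newlineOptional set, a fence that runs to the
// end of data without a trailing '\n' is still accepted, so the caller does
// not need to append a newline to a copy of the input.
func fenceLine(data []byte, newlineOptional bool) (end int, marker string) {
	i := 0
	// Allow up to three leading spaces.
	for i < len(data) && i < 3 && data[i] == ' ' {
		i++
	}
	if i >= len(data) || (data[i] != '`' && data[i] != '~') {
		return 0, ""
	}
	c := data[i]
	start := i
	for i < len(data) && data[i] == c {
		i++
	}
	if i-start < 3 {
		return 0, ""
	}
	marker = string(data[start:i])
	// Skip the rest of the line; an info string such as "go" is ignored here.
	for i < len(data) && data[i] != '\n' {
		i++
	}
	if i == len(data) {
		if newlineOptional {
			return i, marker
		}
		return 0, ""
	}
	return i + 1, marker
}

func main() {
	closing := []byte("~~~") // closing fence at end of input, no newline yet

	end, m := fenceLine(closing, false)
	fmt.Printf("%d %q\n", end, m) // 0 ""

	end, m = fenceLine(closing, true)
	fmt.Printf("%d %q\n", end, m) // 3 "~~~"

	end, m = fenceLine([]byte("~~~ go\ncode\n"), false)
	fmt.Printf("%d %q\n", end, m) // 7 "~~~"
}
```

The point of the flag is that the strict form can still be used where a newline is guaranteed, while the relaxed form handles a fence at the very end of the buffer without touching the input.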

tw4452852 mentioned this issue Feb 11, 2015

Delete unnecessary copy of input when enable fenced code extension #149

Merged

rtfb closed this as completed in #149 Feb 11, 2015
