
Extreme slowdown with fenced code blocks extension on big files #148

Closed
andersarpi opened this issue Feb 11, 2015 · 3 comments · Fixed by #149

@andersarpi

It seems that using EXTENSION_FENCED_CODE slows down Markdown parsing significantly. I noticed this while parsing about 300 files of roughly 50k each, with mixed content. Parsing was slower than I expected, so I started toggling extensions and found that EXTENSION_FENCED_CODE had an extreme effect on performance. The numbers I got are below.

Bench output

benchmark                   iterations   time per op
BenchmarkNoFenceCode10k     3000         471782 ns/op
BenchmarkFenceCode10k       500          3092759 ns/op
BenchmarkNoFenceCode100k    500          3678906 ns/op
BenchmarkFenceCode100k      10           196948480 ns/op
BenchmarkNoFenceCode1000k   50           33028148 ns/op
BenchmarkFenceCode1000k     1            7491844000 ns/op

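Note that the growth is not linear in input size. Without the extension, each 10x increase in input size costs roughly 8 to 9x more time per op; with it, time per op grows about 64x from 10k to 100k and about 38x from 100k to 1000k.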
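The growth factors can be checked with a small Go program that divides each ns/op figure by the one for the next-smaller input. This is only arithmetic over the numbers reported above; the function name growthFactors is illustrative and not part of blackfriday.

```go
package main

import "fmt"

// growthFactors returns the ratio between each consecutive pair of timings,
// i.e. how much slower each step is than the previous, smaller input.
func growthFactors(nsPerOp []float64) []float64 {
	factors := make([]float64, 0, len(nsPerOp))
	for i := 1; i < len(nsPerOp); i++ {
		factors = append(factors, nsPerOp[i]/nsPerOp[i-1])
	}
	return factors
}

func main() {
	// ns/op values from the benchmark output, ordered 10k, 100k, 1000k.
	noFence := []float64{471782, 3678906, 33028148}
	fence := []float64{3092759, 196948480, 7491844000}

	fmt.Printf("no fenced code: %.1f\n", growthFactors(noFence)) // prints [7.8 9.0]
	fmt.Printf("fenced code:    %.1f\n", growthFactors(fence))   // prints [63.7 38.0]
}
```

A roughly constant factor near 10 per 10x step would indicate linear scaling; the fenced-code factors are far above that.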

Code

package bench // illustrative package name; the original snippet omitted it

import (
	"io/ioutil"
	"path/filepath"
	"testing"

	"github.com/russross/blackfriday"
)

// benchmarkMarkdown renders the file at path with an XHTML renderer and the
// given blackfriday extension flags. The file is read on every iteration, so
// the reported times include file I/O as well as parsing and rendering.
func benchmarkMarkdown(b *testing.B, path string, extensions int) {
	for i := 0; i < b.N; i++ {
		content, err := ioutil.ReadFile(path)
		if err != nil {
			b.Fatal(err)
		}

		htmlFlags := blackfriday.HTML_USE_XHTML
		renderer := blackfriday.HtmlRenderer(htmlFlags, "", "")
		_ = blackfriday.Markdown(content, renderer, extensions)
	}
}

func BenchmarkNoFenceCode10k(b *testing.B) {
	benchmarkMarkdown(b, filepath.Join("content", "pages", "10k.md"), 0)
}

func BenchmarkFenceCode10k(b *testing.B) {
	benchmarkMarkdown(b, filepath.Join("content", "pages", "10k.md"), blackfriday.EXTENSION_FENCED_CODE)
}

func BenchmarkNoFenceCode100k(b *testing.B) {
	benchmarkMarkdown(b, filepath.Join("content", "pages", "100k.md"), 0)
}

func BenchmarkFenceCode100k(b *testing.B) {
	benchmarkMarkdown(b, filepath.Join("content", "pages", "100k.md"), blackfriday.EXTENSION_FENCED_CODE)
}

func BenchmarkNoFenceCode1000k(b *testing.B) {
	benchmarkMarkdown(b, filepath.Join("content", "pages", "1000k.md"), 0)
}

func BenchmarkFenceCode1000k(b *testing.B) {
	benchmarkMarkdown(b, filepath.Join("content", "pages", "1000k.md"), blackfriday.EXTENSION_FENCED_CODE)
}
@dmitshur
Collaborator

The current implementation is simple, but quite inefficient.

I expect a large part of the inefficiency is due to the simple fix for issue #45 (see PRs #56 and #60). That fix works by making additional passes over the input and possibly rewriting it so that later passes work correctly, which uses more memory and creates additional work for the GC.

I think an optimization effort that preserves correct behavior would be welcome. Unfortunately, I don't have spare cycles for this now, but I can help review any PRs.

@tw4452852
Contributor

I've tried to fix this issue in PR #149; could you help review it? Thanks.

rtfb closed this as completed in #149 on Feb 11, 2015
@dmitshur
Collaborator

@holmstrom, can you share the contents of content/pages/10k.md, content/pages/100k.md and content/pages/1000k.md, please?

I'd like to add these benchmarks to blackfriday so that performance regressions (like the one risked when fixing #279) can be caught.

dmitshur added a commit that referenced this issue Jul 15, 2016
In the first pass, there may not yet be a trailing newline after a fenced
code block. Make the newline optional in isFenceLine when it is called from
fencedCodeBlock, so that the fenced code block is still detected. This is
more complex, but it avoids creating temporary buffers or modifying the
input, in order to maintain performance (see #148).

Document and rename fencedCode to fencedCodeBlock.

Add regression tests.

Fixes #279.
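The idea in this commit, accepting a fence line whose newline may be missing instead of copying the input to append one, can be illustrated with a simplified Go sketch. fenceLine below is hypothetical and is not blackfriday's isFenceLine: its name, signature and rules (up to three spaces of indentation, at least three '`' or '~' characters, anything after them treated as the info string) are assumptions made for illustration.

```go
package main

import "fmt"

// fenceLine is a simplified, hypothetical fence-line detector (not
// blackfriday's isFenceLine). It reports whether data starts with a fence
// line and returns the number of bytes that line occupies, including its
// newline. With newlineOptional set, a fence line that runs to the end of
// data without a '\n' is also accepted, so no copy of the input is needed.
func fenceLine(data []byte, newlineOptional bool) (end int, ok bool) {
	i := 0
	// Allow up to three spaces of indentation.
	for i < 3 && i < len(data) && data[i] == ' ' {
		i++
	}
	if i >= len(data) || (data[i] != '`' && data[i] != '~') {
		return 0, false
	}
	marker := data[i]
	n := 0
	for i < len(data) && data[i] == marker {
		i++
		n++
	}
	if n < 3 {
		return 0, false
	}
	// Skip the rest of the line (the info string, if any).
	for i < len(data) && data[i] != '\n' {
		i++
	}
	if i == len(data) {
		// End of input without a newline: accept only if allowed.
		if !newlineOptional {
			return 0, false
		}
		return i, true
	}
	return i + 1, true
}

func main() {
	fmt.Println(fenceLine([]byte("```go\ncode"), false)) // 6 true
	fmt.Println(fenceLine([]byte("```"), false))         // 0 false
	fmt.Println(fenceLine([]byte("```"), true))          // 3 true
}
```

The trade-off matches the commit message: the check gains a flag and an extra branch, but the caller never allocates a newline-terminated copy of the input.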