Review usage of Rayon & improve performance #420
Keats added the enhancement, help wanted, and good first issue labels on Sep 11, 2018
|
Heh. Spent a few minutes on this. Before:
After:
This call to
|
|
That's interesting :o Just to be sure though: you are not running it in Can you push the branch with timings on? Looks like a good base to experiment on. From what I've seen with more pages (1000-10000), ~95% of the time is spent in https://github.com/Keats/gutenberg/blob/ae7a65b51f3dda4d6789483e930574437c6651e6/components/site/src/lib.rs#L850-L855 writing the pages to disk, which should be pretty fast in theory. |
|
That is release mode, and it's generating my own site. (The actual This is on a dual hex-core Westmere Xeon, so the default thread count is 24. A quick once-over of syscall activity suggests a lot of contention and yielding with higher thread counts, but cutting it with |
|
I'll push a test branch in a bit, got soup to make and eat first :) |
|
I think https://docs.rs/rayon/1.0.2/rayon/iter/trait.IndexedParallelIterator.html#method.with_min_len should help to avoid wasting time parallelizing small things. Rendering 10k pages from a single section however should be done concurrently. |
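The `with_min_len` suggestion is about not splitting work into pieces so small that scheduling overhead dominates. In rayon itself that would look something like `pages.par_iter().with_min_len(64).map(render).collect::<Vec<_>>()` (64 is an arbitrary illustrative value, not a tuned one). The sketch below shows the same idea using only the standard library (`std::thread::scope`, Rust 1.63+) so it runs without extra crates; `render_page` and `render_chunked` are hypothetical names, not Gutenberg's code.

```rust
use std::thread;

/// Hypothetical stand-in for rendering one page to HTML.
fn render_page(source: &str) -> String {
    format!("<article>{}</article>", source)
}

/// Renders `pages` on scoped threads. Chunks are at least `min_len` pages long
/// (except possibly the last one), which is the role `with_min_len` plays in
/// rayon, and at most `max_threads` threads are spawned. Output keeps input order.
fn render_chunked(pages: &[String], min_len: usize, max_threads: usize) -> Vec<String> {
    let max_threads = max_threads.max(1);
    // Ceiling division spreads the pages over at most `max_threads` chunks...
    let even_split = (pages.len() + max_threads - 1) / max_threads;
    // ...but a chunk is never smaller than `min_len`, and never zero-sized.
    let chunk_size = even_split.max(min_len).max(1);

    thread::scope(|s| {
        let handles: Vec<_> = pages
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(|p| render_page(p)).collect::<Vec<_>>()))
            .collect();
        // Joining in spawn order preserves the original page order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("render thread panicked"))
            .collect()
    })
}
```

For tiny inputs this degrades to a single thread doing everything, which is the behaviour you want for sections with a handful of pages, while a 10k-page section still gets spread over all the threads.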
|
medium-kb (1000 pages):
The scaling is just pants. |
|
Something feels wrong: surely with n threads it should be faster than with a single one, since they don't do any locking... Maybe I'm using rayon wrongly.
added a commit to Freaky/gutenberg that referenced this issue on Sep 12, 2018
|
Tried a quick hack with crossbeam scope and channel and see basically the same thing, with scaling for rendering stopping around 4 and going negative soon after. So whatever the problem it doesn't seem rayon-specific. |
|
Dear me.
A build on medium-kb spends about half its time just checking orphans by repeatedly searching a vec. Twice. I dread to think how long it would take on a huge site :/ After replacing with a HashSet:
The improvement increases with larger sites. |
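To make the orphan-check fix concrete: `Vec::contains` is a linear scan, so checking every page against a Vec of section pages costs roughly pages × section_pages comparisons, while a `HashSet` built once makes each lookup O(1) on average. A minimal sketch, assuming orphans are simply pages whose path does not appear in any section's page list (`find_orphans_vec` and `find_orphans_set` are hypothetical helpers, not the code from #424):

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Slow version: `Vec::contains` scans the whole Vec for every page,
/// so the cost grows with pages × section pages.
fn find_orphans_vec(pages: &[PathBuf], in_sections: &[PathBuf]) -> Vec<PathBuf> {
    pages
        .iter()
        .filter(|p| !in_sections.contains(*p))
        .cloned()
        .collect()
}

/// Faster version: build the set once, then each lookup is O(1) on average.
fn find_orphans_set(pages: &[PathBuf], in_sections: &[PathBuf]) -> Vec<PathBuf> {
    let known: HashSet<&Path> = in_sections.iter().map(|p| p.as_path()).collect();
    pages
        .iter()
        .filter(|p| !known.contains(p.as_path()))
        .cloned()
        .collect()
}
```

Both return the same result; only the lookup structure changes, which is why the gain grows with site size.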
|
Heh, huge-kb goes from 150 seconds to 9 :) |
|
Pull request in #424. |
|
I guess this can be closed now :o |
|
Just saw Freaky@986fda2, do you want to do a PR with it as well? |
|
Might be worth using dedicated thread pools for IO and rendering, rather than just throwing everything on the global rayon pool. From what I've seen, the IO-bound stuff scales fairly well (at least with an SSD or from cache; HDDs might disagree), while rendering bottlenecks quite quickly, at least on my machine.
|
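One way to read the dedicated-pools suggestion: give CPU-bound rendering and IO-bound writing their own, separately sized worker sets so they don't compete for the same threads. With rayon this could be two pools built with `rayon::ThreadPoolBuilder::new().num_threads(n).build()` and entered via `ThreadPool::install`. The std-only sketch below models the same split with two channels; `render_then_write` is a hypothetical function, and "writing" is simulated by collecting into a Vec rather than touching the filesystem.

```rust
use std::sync::{mpsc, Mutex};
use std::thread;

/// Hypothetical two-stage pipeline: `render_threads` workers do the CPU-bound
/// rendering and a separate set of `io_threads` workers handles output.
/// The returned pages are in no particular order.
fn render_then_write(pages: Vec<String>, render_threads: usize, io_threads: usize) -> Vec<String> {
    let (job_tx, job_rx) = mpsc::channel::<String>();
    let (out_tx, out_rx) = mpsc::channel::<String>();
    // mpsc receivers have a single consumer, so each pool shares its queue behind a Mutex.
    let job_rx = Mutex::new(job_rx);
    let out_rx = Mutex::new(out_rx);
    let written = Mutex::new(Vec::new());

    for page in pages {
        job_tx.send(page).unwrap();
    }
    drop(job_tx); // render workers stop once the job queue is drained

    thread::scope(|s| {
        for _ in 0..render_threads.max(1) {
            let (job_rx, out_tx) = (&job_rx, out_tx.clone());
            s.spawn(move || loop {
                // The lock guard is a temporary released at the end of this
                // statement, so rendering itself runs without holding the lock.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok(src) => out_tx.send(format!("<html>{}</html>", src)).unwrap(),
                    Err(_) => break,
                }
            });
        }
        drop(out_tx); // IO workers stop once every render worker has dropped its sender

        for _ in 0..io_threads.max(1) {
            let (out_rx, written) = (&out_rx, &written);
            s.spawn(move || loop {
                let next = out_rx.lock().unwrap().recv();
                match next {
                    // Stand-in for something like `fs::write(path, html)`.
                    Ok(html) => written.lock().unwrap().push(html),
                    Err(_) => break,
                }
            });
        }
    });

    written.into_inner().unwrap()
}
```

Sizing the two stages independently is the point: per the observations in this thread, rendering stops scaling after a few threads while IO keeps benefiting from more.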
|
PR #427 for the fold/reduce → collect tweak. |
|
I did a few more tests and while huge-kb is now very fast to render, big-blog is still slow: 44s on my machine. |
Keats changed the title from "Review usage of Rayon" to "Review usage of Rayon & improve performances" on Sep 18, 2018
|
I also had a look at replacing some clone() by using |
Keats changed the title from "Review usage of Rayon & improve performances" to "Review usage of Rayon & improve performance" on Sep 18, 2018
|
So, erm. huge-blog.
It's up and down like a yo-yo: peaking at 24GB, dropping to 5GB, then peaking back at 24GB, over and over. 44 seconds? I killed it after 6 minutes and nearly 2 hours of CPU time, burnt roughly equally between user and system. |
|
What on earth.
It takes over 6 seconds just to work out what name the template should have? |
|
Right. So. This bit:

    let template_name = match self.root {
        PaginationRoot::Section(s) => {
            context.insert("section", &s);
            s.get_template_name()
        }
        // ... remaining match arms not shown ...

If I comment that one line out, huge-blog builds in 39 seconds and peaks at 3.9GB instead of 24GB. |
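Why one `context.insert` can cost that much: Tera's `Context::insert` converts the value it is given into a JSON `Value` at insertion time, so inserting the whole section (with all of its pages) into every pager's context presumably repeats that work once per pager, making the total roughly pagers × pages. The hypothetical sketch below only counts that work; `Section`, `serialize_section`, `pagers_naive` and `pagers_shared` are illustrative names, and sharing one serialized copy is a generic mitigation, not what 6903975 does.

```rust
use std::rc::Rc;

/// Hypothetical, heavily simplified section: a title and its page bodies.
struct Section {
    title: String,
    pages: Vec<String>,
}

/// Stand-in for serializing a section into a template context; `work`
/// counts how many pages get copied so the two strategies can be compared.
fn serialize_section(section: &Section, work: &mut usize) -> String {
    *work += section.pages.len();
    format!("{}: [{}]", section.title, section.pages.join(", "))
}

/// Naive: every pager serializes the whole section again,
/// so total work is roughly pagers × pages.
fn pagers_naive(section: &Section, paginate_by: usize) -> usize {
    let mut work = 0;
    for _pager in section.pages.chunks(paginate_by.max(1)) {
        let _context = serialize_section(section, &mut work);
    }
    work
}

/// Shared: serialize once and hand every pager a cheap reference-counted handle.
fn pagers_shared(section: &Section, paginate_by: usize) -> usize {
    let mut work = 0;
    let shared = Rc::new(serialize_section(section, &mut work));
    for _pager in section.pages.chunks(paginate_by.max(1)) {
        let _context = Rc::clone(&shared);
    }
    work
}
```

With 100 pages and `paginate_by` of 10, `pagers_naive` does 1000 units of work against 100 for `pagers_shared`, and the gap grows quadratically with section size. Real sharing would need Tera to accept borrowed or shared data, which later comments in this thread get into.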
|
I somehow missed that line yesterday |
|
After doing that (6903975), |
|
Looking at https://forestry.io/blog/hugo-vs-jekyll-benchmark/ it still seems Gutenberg is about 5-10x slower than Hugo but it is at least in the same ballpark now. |
|
There's a lot of noise, but this flame graph looks... interesting. A lot of time seems to be going into generating backtraces. |
|
I removed Paginator::pagers in the next branch. @Freaky, how fast is it to run on your beefy machine? |
|
huge-blog vs RAYON_NUM_THREADS:
Still runs into diminishing returns long before I run out of cores, but the negative scaling seems to have mostly gone. Peak memory use is now down to 2.7GB even with full 24-thread concurrency: still fairly high, but much better than the roughly 24GB we started with. |
|
Still seeing negative scaling on my own site:
Might be worth grabbing other real-world examples and seeing if this is a common pattern. |
|
Whoa those are some big differences.
My blog shows roughly the same thing as the docs site: it is fastest at 2 threads and gets worse and worse after that. |
|
On big-site and huge-kb, RAYON_NUM_THREADS=3 is the best for me. I don't really know what to do there. |
|
Docs:
|
|
https://hur.st/flame/gutenberg-docs-rayon-24-99215.svg syntect occurs in nearly 62% of the sampled stacks. Nearly 30% in |
|
40% of the total samples are in Instrumenting the
|
|
39.2% in:

    pub fn new(s: &str) -> Result<Scope, ParseScopeError> {
        let mut repo = SCOPE_REPO.lock().unwrap();
        repo.build(s.trim())
    }

|
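The snippet above takes the global `SCOPE_REPO` mutex on every `Scope::new` call, so with 24 threads most of them end up queuing on one lock. A generic mitigation is a per-thread memo in front of the shared repository, sketched below with hypothetical names (`ScopeId`, `REPO`, `build_global`, `scope_new_cached`; the `const` `Mutex::new` needs Rust 1.63+). syntect 3 may well have solved this differently, so treat it as an illustration of the contention pattern rather than syntect's fix.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical interned scope id, standing in for syntect's `Scope`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ScopeId(u32);

/// Global repository in the spirit of `SCOPE_REPO`: every call to
/// `build_global` takes this lock, so busy threads queue up on it.
static REPO: Mutex<Vec<String>> = Mutex::new(Vec::new());

fn build_global(s: &str) -> ScopeId {
    let mut repo = REPO.lock().unwrap();
    if let Some(i) = repo.iter().position(|x| x == s) {
        return ScopeId(i as u32);
    }
    repo.push(s.to_string());
    ScopeId((repo.len() - 1) as u32)
}

thread_local! {
    /// Per-thread memo: repeated scope strings resolve without touching the global lock.
    static CACHE: RefCell<HashMap<String, ScopeId>> = RefCell::new(HashMap::new());
}

fn scope_new_cached(s: &str) -> ScopeId {
    let s = s.trim();
    CACHE.with(|cache| {
        if let Some(&id) = cache.borrow().get(s) {
            return id;
        }
        // Cache miss: take the global lock once, then remember the answer locally.
        let id = build_global(s);
        cache.borrow_mut().insert(s.to_string(), id);
        id
    })
}
```

Since highlighting resolves the same few scope strings over and over, after warm-up almost every lookup stays thread-local and the lock is only hit on genuinely new scopes.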
|
A quick hack using syntect master looks promising:
My personal site is much better too. Before:
After:
|
added a commit to Freaky/gutenberg that referenced this issue on Oct 2, 2018
|
Updated flame graph for comparison purposes: https://hur.st/flame/gutenberg-docs-rayon-24-syntect3.svg |
|
That's some great news! I have been following the work on syntect and v3 should be released soon-ish |
|
Wow these speedups look amazing <3 |
|
So I tried to remove the It is .... actually slower than I would appreciate another pair of eyes on that PR to spot what I am doing wrong. The code is still very raw and has some pretty bad parts, but it builds sites correctly and passes all the tests except the ones from the I believe it would be possible to remove a good chunk of clones by passing borrowed |
|
Ooh, neat. huge-blog builds here, but with nearly 2x the memory use and 2x the runtime. Flamegraph: https://hur.st/flame/gutenberg-huge-blog-slotmap-10aba2.svg Nearly half of the runtime is in serde. |
|
Yep, I get the same results with valgrind: it's all spent cloning Values. I don't really understand why it is copying twice as much as before, though; I expected it to be the same :/ |
|
Well, now instead of going from Without Tera being able to borrow it, I don't think it's really going to help. |
|
@Freaky I've pushed a commit that uses this commit Keats/tera@efb8af8 and we're back to reasonable speed o/ Next step is to clean/document the code and rewrite the rebuild component as it is now ~30x faster to call |
|
Yes, that's much better: half the runtime and nearly half the memory use, though the latter is slightly higher than the previous baseline (2.7GB -> 5GB -> 2.9GB). That's only 1.5% of my main machine, but it's probably worth considering that it's also about 75% of a lot of systems, particularly cheap VPSes. |
To be fair, you probably don't build a 10k-page site on a VPS. Smashing Magazine moved to Hugo and only had 7500 pages: https://discourse.gohugo.io/t/smashing-magazine-s-redesign-powered-by-hugo-jamstack/5826/8 |
|
Turns out the caching layer is actually completely useless and therefore so is Tera Caching would only help if it was somehow possible to pass |
|
#459 is ready for reviews if anyone has time! |
|
Excellent. I'm seeing maybe ~5% performance bump on huge-blog, but also a ~13% reduction in memory use. Syntect savings are as seen in my experimental branch, which is probably the biggest thing for me - 1.4s to 0.2s is quite noticeable :) |
|
Latest idea: #480. This would avoid some of the repeated serializations we're doing, which very often aren't even used at all, but it is slightly worse UX, so I'm a bit conflicted. |
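The trade-off described here is between serializing data up front and only paying for it when a template actually uses it. A minimal sketch of the lazy side, assuming a hypothetical `LazyPage` type and a toy JSON-ish format (it needs `std::cell::OnceCell`, Rust 1.70+); it is not taken from #480:

```rust
use std::cell::{Cell, OnceCell};

/// Hypothetical page whose serialized form is computed only on first use.
struct LazyPage {
    content: String,
    serialized: OnceCell<String>,
    /// Counts how often serialization actually ran, for illustration.
    serializations: Cell<usize>,
}

impl LazyPage {
    fn new(content: &str) -> Self {
        LazyPage {
            content: content.to_string(),
            serialized: OnceCell::new(),
            serializations: Cell::new(0),
        }
    }

    /// Serializes on the first call and returns the cached string afterwards.
    fn serialized(&self) -> &str {
        self.serialized.get_or_init(|| {
            self.serializations.set(self.serializations.get() + 1);
            format!("{{\"content\":\"{}\"}}", self.content)
        })
    }
}
```

Pages that no template asks for are never serialized, and pages that are asked for repeatedly are serialized once; the cost moves to wherever the value is first requested.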
|
Memory usage is still way too high, but that's good enough for the 0.5.0 release tomorrow (hopefully). |
tshepang commented on Nov 16, 2018
|
What figures are you seeing? |
|
To render a blog with 10k pages with taxonomies + pagination + syntax highlighting: around 15s and 3GB of ram. |

Keats commented on Sep 11, 2018 (edited)
It looks like the site loading is done in parallel but the rendering is not: threads are spawned but only one seems to be used.
TODOs:
- Site::render_section to ensure it uses as many threads as possible efficiently; might be a case of using https://docs.rs/rayon/1.0.2/rayon/iter/trait.IndexedParallelIterator.html#method.with_min_len
There are some benches in site/benches, but you will need to run gen.py first to generate some. Given the current speed, medium-blog and medium-kb are probably the best ones to run.