Architecture for less iterations over the library #1112
We can probably trim that number down, but I don't think it will make a significant change in speed. Of course, if we can improve it without making the code harder to understand, I would be very interested.
What's the best way to get some good data on this? I tried running the benchmarks, but they don't really help me understand the bottlenecks. So I ran flamegraph on building medium-blog and this was the result. I've never done profiling before, so I'm not entirely sure what's happening here, but rayon appears to take up quite some space in the graph.
It's entirely possible, but it will take some time. Let's not hurry too much ;). In the meantime, are benchmarks run periodically? It would be nice to have benchmarks on every commit/PR. Is that easy to achieve with the current pipelines?
Try disabling rayon (RAYON_NUM_THREADS=1 as an env var should work) before profiling, to get rid of all the rayon stuff in the profile. I haven't profiled Zola for a while though; I'll need to do a small write-up of how I do it for reference.
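The env-var tip above has to take effect before rayon initializes its global thread pool, which is why setting it on the child process (or exporting it in the shell before running) is the reliable way. A minimal sketch of the pattern — `printenv` stands in for the `zola` binary you would actually profile:

```rust
use std::process::Command;

fn main() {
    // Equivalent of running `RAYON_NUM_THREADS=1 zola build` in a shell:
    // the child process sees the variable before rayon spins up its pool,
    // so all parallel iterators run on a single worker thread.
    let out = Command::new("printenv")
        .arg("RAYON_NUM_THREADS")
        .env("RAYON_NUM_THREADS", "1")
        .output()
        .expect("failed to spawn child process");
    assert_eq!(String::from_utf8_lossy(&out.stdout).trim(), "1");
    println!("child saw RAYON_NUM_THREADS=1");
}
```

With the pool reduced to one thread, the rayon worker frames mostly disappear from the flamegraph, leaving the actual rendering and library work visible.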
Not really. The generated benchmarks are just there as a sanity check before making big changes, and to give a rough idea of the speed. It would be nice to have them run on CI though.
With 1 thread, for medium-blog: [flamegraph] For huge-blog (still 1 thread): [flamegraph] As we can see, the bigger the site, the bigger the chunk of the pie taken by rayon/Vec operations. This is confirmed by timing the build of both sites and computing the average number of pages rendered per second: the medium site built 250 pages in 0.56s (446 pages/s), while the huge site built 10000 pages in 265s (38 pages/s), which is more than 10 times slower. I may be wrong, but I don't think serialization is the bottleneck here, because it should be linear. AFAIK good map/vector algorithms operate in O(n log n), but by running many such operations over the whole library we may be closer to O(n² log n). I'll investigate this some more.
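The throughput numbers quoted above can be checked directly; a quick arithmetic sketch, using only the timings given in the comment:

```rust
fn main() {
    // Throughput from the quoted timings.
    let medium = 250.0_f64 / 0.56; // medium-blog: ~446 pages/s
    let huge = 10_000.0_f64 / 265.0; // huge-blog: ~38 pages/s
    println!("medium: {medium:.0} pages/s, huge: {huge:.0} pages/s");

    // Purely linear O(n) work would keep pages/s roughly constant as the
    // site grows; the observed >10x drop in throughput is the kind of
    // behaviour a super-linear pattern like O(n^2 log n) would predict.
    assert!(medium / huge > 10.0);
}
```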
It's possible! I haven't been benchmarking for a long time and maybe it changed since I did. |
So after running some more benchmarks, I believe the numbers I had were completely wrong, skewed by swapping; disabling swapping gave consistent results. I wrote a simple script which runs 3 passes for each number of sections between MIN and MAX (incrementing by INC) and prints the average build time over those 3 passes. Each section contains 10 pages. I'm happy with it as a benchmarking tool; I'm just not sure it should be included in the repo, because it would add bash as a dependency for running benchmarks (there are a few bashisms in the script). Maybe gen.py and grow.sh should be rewritten in Rust? Or in plain, portable shell? Here's the benchmark data for further reference. Times are in seconds:
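On the "rewritten in Rust" question: the inner loop of such a script is small. A hypothetical sketch of the 3-pass averaging (the closure stands in for invoking `zola build` via `std::process::Command`; `average_build_time` is an invented name, not part of any existing tool):

```rust
use std::time::{Duration, Instant};

/// Time `passes` runs of `build` and return the average wall-clock duration.
fn average_build_time(passes: u32, mut build: impl FnMut()) -> Duration {
    let mut total = Duration::ZERO;
    for _ in 0..passes {
        let start = Instant::now();
        build();
        total += start.elapsed();
    }
    total / passes
}

fn main() {
    // Stand-in workload; in the real script this would shell out to
    // `zola build` on a generated site of N sections x 10 pages.
    let avg = average_build_time(3, || {
        std::thread::sleep(Duration::from_millis(10));
    });
    println!("average over 3 passes: {avg:?}");
    assert!(avg >= Duration::from_millis(10));
}
```

The outer MIN..MAX/INC loop over section counts would just wrap this in a `for` over the sizes, regenerating the test site each time.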
Some bash scripts would be fine, but I would rely on …
Already addressed somewhat in #1218. Looking through the code, I think it would be non-trivial but fairly doable to pull the path collision checks and taxonomy population into the bigger loop to save a few iterations. This would mean moving some of the checking work to the insertion stage, which may or may not be faster; I'm not sure. Everything needs to be populated before we can start rendering, so that will almost certainly have to be a separate loop, though I'm willing to be corrected on that. Since rendering needs its own loop, the best we can do after that is to combine internal and external link checking. That would bring us down to 3 loops in the code, though I'm not sure how many iterations it would actually save. I could look at it if I have some time in the next few weeks, but I wanted to check first how much demand there still is for this, and whether there are any objections to anything I mentioned.
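As a sketch of what "moving checking work to the insertion stage" could look like: keep a path index next to the page store and reject collisions the moment the second claimant is inserted, instead of a separate pass over the finished library. All names and types here are hypothetical, not Zola's actual ones:

```rust
use std::collections::HashMap;

/// Minimal stand-in for the library: pages plus an output-path index.
#[derive(Default)]
struct Library {
    by_path: HashMap<String, usize>, // output path -> index into `pages`
    pages: Vec<String>,              // source file of each page
}

impl Library {
    /// Insert a page, failing immediately on an output-path collision,
    /// so no later whole-library pass is needed for this check.
    fn insert(&mut self, path: &str, source: String) -> Result<usize, String> {
        if let Some(&existing) = self.by_path.get(path) {
            return Err(format!(
                "{} collides with {} (both output to {})",
                source, self.pages[existing], path
            ));
        }
        self.pages.push(source);
        let idx = self.pages.len() - 1;
        self.by_path.insert(path.to_string(), idx);
        Ok(idx)
    }
}

fn main() {
    let mut lib = Library::default();
    assert!(lib.insert("/blog/a/", "content/a.md".into()).is_ok());
    // Second page claiming the same output path is rejected at insertion.
    assert!(lib.insert("/blog/a/", "content/a2.md".into()).is_err());
}
```

The trade-off is exactly the one discussed below: the index must live for the whole load, costing memory in exchange for one less traversal.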
Performance improvements are always welcome!
So I have a few things about this issue that are worth discussing:
I'm not 100% sure, since the program flow is a bit hard to follow at times, but if we are willing to keep a few more index-like objects in the library (like …)
The thing is that these will require more memory, since we have to keep those indexes alive during the whole build. Personally, I think this is fine, but I didn't want to put time into a PR that might be rejected for that reason, so I want to check here first. One upside of this idea is that I don't think it is possible to unify beyond that, unless we find a better way of lazily loading page & section …
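The memory-for-iterations trade can be made concrete with a taxonomy example: an index from term to pages, updated at insertion time instead of rebuilt by a later full pass over the library. The names are invented for illustration:

```rust
use std::collections::HashMap;

/// Hypothetical index kept alive for the whole load: term -> page indices.
/// Costs O(total tags) extra memory, saves one full pass over the pages.
#[derive(Default)]
struct TaxonomyIndex {
    terms: HashMap<String, Vec<usize>>,
}

impl TaxonomyIndex {
    /// Called once per page at insertion, instead of iterating all pages
    /// again after the library is fully populated.
    fn insert_page(&mut self, page_idx: usize, tags: &[&str]) {
        for tag in tags {
            self.terms.entry((*tag).to_string()).or_default().push(page_idx);
        }
    }
}

fn main() {
    let mut index = TaxonomyIndex::default();
    index.insert_page(0, &["rust", "perf"]);
    index.insert_page(1, &["rust"]);
    assert_eq!(index.terms["rust"], vec![0, 1]);
    assert_eq!(index.terms["perf"], vec![0]);
}
```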
@Keats thoughts? Would that be an acceptable tradeoff?
I think 1 can be done, but I believe rendering is by far the biggest part of the runtime, so while it's nice to optimise the insertion, it's not going to be as impactful as making taxonomy rendering faster, for example (currently the slowest part of Zola by far, I think).
This is not exactly a bug report, but when I was investigating site::load() I realized the pages/sections collections get iterated over many times. For a simple build, it iterates over:
Without check mode, that's a total of 9 iterations over sections and 8 iterations over pages, and that is only counting iterations after the library is built. There may be some compiler magic behind the scenes to "reunite" some iterations, but my intuition is there are a lot of potential performance improvements to be investigated here.
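The kind of fusion hinted at here can be sketched: two passes that each touch every page (say, internal and external link collection) folded into one traversal. The `Page` fields and the link convention used below are invented for illustration, not Zola's actual representation:

```rust
/// Invented stand-in for a page with its extracted links.
struct Page {
    internal_links: Vec<String>,
    external_links: Vec<String>,
}

/// One traversal doing the work of two: count internal links that do not
/// use the (assumed) `@/...` convention, and gather external links, in
/// the same loop body instead of two separate passes over `pages`.
fn check_links(pages: &[Page]) -> (usize, Vec<&str>) {
    let mut suspicious = 0;
    let mut external = Vec::new();
    for page in pages {
        suspicious += page.internal_links.iter().filter(|l| !l.starts_with("@/")).count();
        external.extend(page.external_links.iter().map(String::as_str));
    }
    (suspicious, external)
}

fn main() {
    let pages = vec![Page {
        internal_links: vec!["@/a.md".to_string(), "/missing".to_string()],
        external_links: vec!["https://example.com".to_string()],
    }];
    let (suspicious, external) = check_links(&pages);
    assert_eq!(suspicious, 1);
    assert_eq!(external, vec!["https://example.com"]);
}
```

Fusing passes this way doesn't change asymptotic complexity, but it halves traversal overhead and improves cache behaviour on large libraries.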
I think some of the useless iterations are due to the absence of library structure caused by the glob pattern (reuniting child pages with their sections, and building the content hierarchy). Maybe a recursive descent into the content folder could help here?
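A recursive descent along those lines can be sketched with only the standard library: walking the content tree directory-by-directory means each entry is seen together with its parent, so no later pass is needed to reunite children with sections. This is just a shape sketch, not Zola's loader:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively walk `dir`, recording each entry with its depth. Because a
/// directory's children are visited while we are "inside" it, parent/child
/// relationships come for free during the walk instead of being rebuilt
/// from flat glob results afterwards.
fn walk(dir: &Path, depth: usize, out: &mut Vec<(usize, String)>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        out.push((depth, path.display().to_string()));
        if path.is_dir() {
            walk(&path, depth + 1, out)?; // children linked while descending
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut tree = Vec::new();
    walk(Path::new("."), 0, &mut tree)?;
    println!("visited {} entries", tree.len());
    Ok(())
}
```

One caveat: a parallel glob-then-sort approach can be faster on huge flat trees, so this would need benchmarking against the current loader before concluding anything.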