Profiling site rendering performance #1140

Closed
dhcole opened this Issue May 23, 2013 · 21 comments

Contributor
dhcole commented May 23, 2013

Just a general discussion topic:

I'm looking for some guidance on how to find out what parts of a site are causing long rendering times. Does anyone have ideas on how to profile the rendering process? I'm hoping to see where certain templates or includes are slow, likely because of complex liquid templates.

Owner
parkr commented May 23, 2013

That's a splendid idea - this would be especially helpful for people whose sites take a while to compile. I imagine loops are time-consuming but nothing else really jumps out at me as culprits. Would definitely be an amazing feature to have. @mattr- would probably know more, but I think there are stdlib or gem profilers that work quite well and could be fine-tuned to this use-case.
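
As a rough illustration of the stdlib route mentioned here, Ruby's `Benchmark` module can time coarse build phases. This is only a sketch: `time_phases` and the three placeholder phases are hypothetical stand-ins, not Jekyll internals, and would need to be replaced with the real read, render and write steps.

```ruby
require "benchmark"

# Hypothetical helper: run each named phase and record its wall-clock time.
def time_phases(phases)
  phases.each_with_object({}) do |(name, work), timings|
    timings[name] = Benchmark.realtime { work.call }
  end
end

# Stand-in workloads (assumptions for illustration only).
PHASES = {
  read:   -> { 1_000.times.map { |i| "post #{i}" } },
  render: -> { 1_000.times.map { |i| "<p>post #{i}</p>" * 10 } },
  write:  -> { sleep 0.01 }
}

# Print the slowest phase first.
time_phases(PHASES).sort_by { |_, secs| -secs }.each do |name, secs|
  puts format("%-8s %.4fs", name, secs)
end
```

Timing whole phases like this only narrows down where to look; finding a slow template or include would need finer-grained measurement inside the render step.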

amitaibu commented Jul 2, 2013

We are also trying to figure out how big a Jekyll site can be (we have several large sites that we are considering moving to static files).

We've started with https://github.com/niryariv/bigjekyll/tree/gh-pages that has 100K posts -- That was naive ;)

We've deleted the files and started with 150 files, using time jekyll build:

150 files - 20sec
300 files - 40sec
600 files - 130sec

Is it in the scope of Jekyll to deal with massive sites, or are we abusing it?

btw, here's some info about Dekyll, the tool that should allow us to manage big static sites.

Owner
parkr commented Jul 2, 2013

Jekyll can definitely handle large sites, but it takes a bit longer, as you'd expect. Every time the site is built, the entire site is read in, processed and written to your _site directory. We've been thinking about a caching system but it's lower priority than other functionality and bug fixes.

It took git on my MBA w/SSD a couple minutes just to git checkout gh-pages on your bigjekyll repo. Most of Jekyll is O(n), so I'd expect a proportionate increase in time for each added post.

How many times did you run those tests?

amitaibu commented Jul 2, 2013

@parkr ,
I'll re-execute the tests and report.

One of our sites does indeed have around 100K pages. Is there a way for us to speed up the build process?
btw, GitHub Pages didn't like it either:

The page build failed with the following error:

POSIX::Spawn::TimeoutExceeded

Owner
parkr commented Jul 2, 2013

It looks like Jekyll is slightly IO-bound so I'll see if we can limit reads and writes. Even with my SSD, CPU usage was maxing out at 80%. Try using Rubinius or JRuby. I have no idea if Jekyll works with either of them (we only test against MRI) but they might have some performance gains that would be helpful.

@benbalter would be able to tell you more about the GH Pages timeouts. I'd imagine the workers aren't happy with generating such massive sites so it quits after a couple minutes of generation time.

amitaibu commented Jul 3, 2013

Tested on my MacBook, so it should be slower:

Tested on 100 files - a copy of the default post from Jekyll-bootstrap

time jekyll build

#1
real    0m18.725s
user    0m18.256s
sys     0m0.407s

#2
real    0m18.578s
user    0m18.124s
sys     0m0.423s

#3
real    0m18.467s
user    0m17.986s
sys     0m0.434s

So a 100K-file site should take around 5 hours to build. Not ideal ;)
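
The 5-hour figure can be reproduced from the three runs above. This sketch assumes build time scales linearly with file count, which is an assumption (the 150/300/600-file timings earlier in the thread suggest it may be optimistic); `extrapolate_hours` is a hypothetical helper name.

```ruby
# Wall-clock ("real") timings from the three runs above, for 100 files.
RUNS_SECONDS = [18.725, 18.578, 18.467]
FILES_MEASURED = 100

# Linear extrapolation: average seconds per file times the target file count.
def extrapolate_hours(runs, files_measured, target_files)
  seconds_per_file = runs.sum / runs.size / files_measured
  seconds_per_file * target_files / 3600.0
end

puts extrapolate_hours(RUNS_SECONDS, FILES_MEASURED, 100_000).round(2)  # => 5.16
```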

@dhcole do you have some new insights?

I like the idea too... Hope to see this in Jekyll someday -

Contributor
dhcole commented Oct 2, 2013

I don't think I have any specific ideas on how to optimize for performance in Jekyll's codebase, but the single most effective thing we've done to help speed up rendering is to limit or avoid for loops. A for loop in a template file can be a disproportionate performance hit: a loop over every post that is rendered on every page does work proportional to the square of the number of posts. Also, break out of for loops when you can, for instance if you only need the first X iterations.

Here's some testing I did on a few samples of the jekyll new site with v1.0.2:

posts    seconds    time (min:sec)
1          0.824    0:0.824
100        2.644    0:2.644
1000      25.071    0:25.071
5000     186.715    3:6.715
10000    536.904    8:56.904

[Chart: render-time]
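
One way to read these numbers is as cost per post. The sketch below only divides the measured seconds by the post count; `ms_per_post` is a hypothetical helper, and the interpretation afterwards is a reading of this one data set rather than a general result.

```ruby
# Measurements copied from the table above: posts => seconds.
MEASUREMENTS = {
  1      => 0.824,
  100    => 2.644,
  1_000  => 25.071,
  5_000  => 186.715,
  10_000 => 536.904
}

# Milliseconds spent per post at each site size, rounded to one decimal.
def ms_per_post(measurements)
  measurements.map { |posts, secs| [posts, (secs / posts * 1000).round(1)] }.to_h
end

ms_per_post(MEASUREMENTS).each do |posts, ms|
  puts format("%6d posts  %8.1f ms/post", posts, ms)
end
```

The per-post cost is about 25 ms at 1,000 posts and about 54 ms at 10,000, so growth in this sample looks worse than linear. The single-post figure is presumably dominated by fixed startup cost.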

Perhaps some sort of async rendering of pages would be helpful. I'm not sure what's possible in Ruby, but I don't think reading the site is the slow part. So some way to read all of the site's files into memory and then render pages and posts in a non-blocking process could allow for much greater scale and perhaps parallelization across CPU cores. Just some basic thoughts. Not sure how to make this actionable at this time.

Owner
parkr commented Oct 2, 2013

@dhcole Thanks for the analysis! In terms of your recommendation for non-blocking IO, the only option in MRI is to fork the process for a batch of writes over and over and let each individual process run the blocking IO "concurrently". In JRuby and Rubinius, we can use true threads, but we would love to be able to support MRI forever and for always.
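
A minimal sketch of the fork-per-batch idea, assuming pages have already been rendered to strings. `write_in_forks`, the `workers` parameter and the sample pages are hypothetical and not part of Jekyll; `Process.fork` is not available on Windows.

```ruby
require "tmpdir"

# Split [filename, html] pairs into batches and write each batch in a child process.
def write_in_forks(pages, dest, workers: 4)
  batch_size = [(pages.size / workers.to_f).ceil, 1].max
  pids = pages.each_slice(batch_size).map do |batch|
    Process.fork do
      batch.each { |name, html| File.write(File.join(dest, name), html) }
      exit!(0) # leave the child without running handlers inherited from the parent
    end
  end
  pids.each { |pid| Process.wait(pid) } # block until every child has finished
  pids.size
end

# Stand-in content; real pages would come from the render step.
PAGES = (1..20).map { |i| ["post-#{i}.html", "<p>post #{i}</p>"] }

Dir.mktmpdir do |dest|
  forks = write_in_forks(PAGES, dest)
  puts "#{forks} processes wrote #{Dir.children(dest).size} files"
end
```

Whether this helps in practice depends on where the time actually goes, which is the question this issue is about.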

Owner
parkr commented Oct 2, 2013

That said, if we're I/O-bound, CPU-concurrent writes won't matter. The HDD/SSD can only write so quickly.

Props @benbalter ^^

@parkr I'm trying to understand how Jekyll does all of its processing and would be curious to look into a caching system that could speed up build times for the Jekyll site we use at Shopify.

You mention that the thought of a caching system has come up but you don't have time/priority to actually work on it. Would it make sense to open another issue so we can talk about the implications/ideas that you have for the system?

Owner
mattr- commented Oct 15, 2013

I've been working on an incremental regeneration enhancement for jekyll for a little bit now. I suppose I should hurry up and finish it. 😉 This will mean that the first build might take some time but subsequent builds should take less time provided we don't hit a condition where the whole site needs to be rebuilt.

Well, it wouldn't be any longer than the current build system, correct?

What's preventing a PR? Do you need a hand with something, or is there any way I can help out?

Owner
mattr- commented Oct 15, 2013

It might end up being slightly longer since we'd have to store metadata about the build on the first run.

I only have two failing cucumber features at the moment. I've spent much of my time on this looking for something usable to store the metadata in, and refactoring parts of Jekyll's rendering pipeline so that there is a clean place in the build process to insert this change.
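
To make the metadata idea concrete, here is one possible shape for it: a JSON file mapping each source path to a content digest, compared on the next build. This is purely an illustrative assumption, not the design being worked on here; `changed_sources` and the file names are hypothetical.

```ruby
require "digest"
require "json"
require "tmpdir"

# Return the source files whose content changed since the last recorded build,
# then record the current digests. Keep metadata_path outside source_dir.
def changed_sources(source_dir, metadata_path)
  previous = File.exist?(metadata_path) ? JSON.parse(File.read(metadata_path)) : {}
  files = Dir.glob(File.join(source_dir, "**", "*")).select { |path| File.file?(path) }
  current = files.sort.to_h { |path| [path, Digest::SHA256.file(path).hexdigest] }
  File.write(metadata_path, JSON.generate(current))
  current.reject { |path, digest| previous[path] == digest }.keys
end

Dir.mktmpdir do |dir|
  src = File.join(dir, "src")
  Dir.mkdir(src)
  meta = File.join(dir, "metadata.json")
  File.write(File.join(src, "a.md"), "first")
  File.write(File.join(src, "b.md"), "second")
  p changed_sources(src, meta).size                         # first build: 2, everything is new
  File.write(File.join(src, "b.md"), "second, edited")
  p changed_sources(src, meta).map { |f| File.basename(f) } # ["b.md"]
end
```

The sketch ignores deleted files and dependencies between files. A changed layout or include would still force every page that uses it to be rebuilt.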

Is your branch available somewhere? We're making some changes to Jekyll for performance reasons and I'd like to compare.

Owner
mattr- commented Oct 18, 2013

Nope. I don't have much in the way of actual code. (which you can read as "I got nothin")

Hmmm... alright. I'm going to open a new issue and pull you into it so we can talk about a caching system since it isn't completely related to this ticket. Cool?

Owner
mattr- commented Oct 18, 2013

There are already several other tickets, so I wouldn't open a new one. :-)

Other than that, it's totally cool.

Cool! #380 looked like the right place to bring this.

amitaibu referenced this issue in robwierzbowski/generator-jekyllrb Nov 23, 2013: Provide "incremental build" using grunt #38 (Closed)

Owner
parkr commented Jul 31, 2014

This is a great idea. We're starting from the ground up with Jekyll 3.0, and can add this in if necessary. Ruby provides great profiling tools, so this could also be done as a separate tool. Our algorithm for reading in and writing is mostly linear, so build time grows in proportion to the number of posts.

If you build something, please post it here so others can test it out!

parkr closed this Jul 31, 2014
jekyllbot locked and limited conversation to collaborators Feb 27, 2017