Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two massive performance improvements for large sites #6730

Merged
merged 2 commits into from
Jan 31, 2018

Conversation

parkr
Copy link
Member

@parkr parkr commented Jan 31, 2018

  1. Don't calculate site.related_posts unless we request it.
  2. Calculate site.related_posts without LSI much more efficiently.

I've been using @jvns's new rbspy program. I generated a random site of 10,000 posts (no layouts) to test this. When I got back the flamegraph, it was obvious that most_recent_posts was the culprit for a significant amount of time. After I applied this patch, regeneration went from 150s to 50s. We saved about 100s (1m40s) of time just by not generating related posts unless we needed it!

…d of all the documents

When you have 10,000 posts, this saves a _ton_ of time.
Instead, assign the current document in the SiteDrop and generate the related_posts if requested by Liquid.
@DirtyF DirtyF requested a review from a team January 31, 2018 02:39
# recent posts.
# We should remove this in 4.0 and switch to `{{ post.related_posts }}`.
def related_posts
return nil unless @current_document.is_a?(Jekyll::Document)
Copy link
Member

@ashmaroli ashmaroli Jan 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also return early if the current document is not a Post..?
Like for a site with two posts and another collection with 1000 documents..?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also return early if the current document is not a Post..?

I don't think so. A site could be using site.related_posts on a Document to display the most recent blog posts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point.. 👍

@DirtyF
Copy link
Member

DirtyF commented Jan 31, 2018

FYI: no perceptual difference for build time on real websites — tested with Jekyll's own website (200+ documents) and https://github.com/mmistakes/front-matter-defaults (800+ documents).

@ashmaroli
Copy link
Member

no perceptual difference for build time on real websites

I second Frank's opinion.. I too couldn't see any huge difference with local tests..
(I just assumed it was not perceptible enough on Windows)

@pathawks
Copy link
Member

Not too much difference in build time on a site with a couple hundred posts, but I can see the value here.

Copy link
Member

@pathawks pathawks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even without a measurable performance improvement, most of these changes make sense to me.

As long as we are confident that this won't have unintended side effects, I am 👍

@pathawks pathawks requested a review from a team January 31, 2018 03:34
Copy link
Member

@ashmaroli ashmaroli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the changes look good to me.., something doesn't feel right.. (irrespective of discernible perf gains)

Definitely requires some more testing.. or new tests to our suite..

Blocking this for the time-being..

@parkr
Copy link
Member Author

parkr commented Jan 31, 2018

@ashmaroli Generate a site with 10,000 posts and you'll notice the difference here. Why are you interested in blocking this PR? What doesn't feel right?

I used this script to generate 10,000 posts:

~$ cat script/generate
#!/bin/bash

NUM_TO_WRITE=1000
if [ "$#" -ne "0" ]; then
    NUM_TO_WRITE="$1"
fi

rm -rf _posts
mkdir -p _posts

date_to_write=$(date +%s --utc)

for i in `seq 1 $NUM_TO_WRITE`; do
    yy_mm_dd=$(date --date="@$date_to_write" "+%Y-%m-%d")
    yy_mm_dd_hh_mm_ss_z=$(date --date="@$date_to_write" "+%Y-%m-%dT%H-%M-%S%z")
    post_filename="_posts/$yy_mm_dd-post-$i.md"
    cat > $post_filename <<EOS
---
title: "Post $i"
date: "$yy_mm_dd_hh_mm_ss_z"
---
Hello dear friends!

It is a pleasure to see you all again.
EOS
    echo "Wrote $post_filename"
    date_to_write=$(expr "$date_to_write" - 3600)
done

I called script/generate 10000 then rbspy record jekyll build and received impressive results.

Before: https://rbspy.parkermakes.tools/rbspy-2018-01-30-o4qFcKDwSo.svg
After: https://rbspy.parkermakes.tools/rbspy-2018-01-31-mD30LCP2L5.svg

@ashmaroli ashmaroli dismissed their stale review January 31, 2018 07:01

Tested locally on Windows.. no issues..

@ashmaroli
Copy link
Member

Why are you interested in blocking this PR?

@parkr Since https://github.com/rbspy/rbspy doesn't run on Windows yet, I needed to test things out manually.. and that is a huge time-consumer because Jekyll is 3-4x slower on Windows.. and build times fluctuate over a range of 1-4s for the same Jekyll version.. so it calls for averaging across 3-4 builds..

Anyways, finally walked through each change here and groked through the implications..
The following change being my favorite..

    def most_recent_posts
-     @most_recent_posts ||= (site.posts.docs.reverse - [post]).first(10)
+     @most_recent_posts ||= (site.posts.docs.last(11).reverse - [post]).first(10)
    end

@ashmaroli
Copy link
Member

I set up a fresh new test workspace and generated 5500 posts using Parker's script above..
The following are build times on Windows using jekyll-3.7.2 and this PR

jekyll-3.7.2 related-posts-perf-improvements
74.15 seconds 63.99 seconds
81.706 seconds 62.164 seconds
79.597 seconds 62.966 seconds

@DirtyF DirtyF added this to the v3.8.0 milestone Jan 31, 2018
@DirtyF
Copy link
Member

DirtyF commented Jan 31, 2018

🔥 🚒

@jekyllbot: merge +minor

@jekyllbot jekyllbot merged commit b4985d1 into master Jan 31, 2018
@jekyllbot jekyllbot deleted the related-posts-perf-improvements branch January 31, 2018 20:33
jekyllbot added a commit that referenced this pull request Jan 31, 2018
@pathawks
Copy link
Member

pathawks commented Feb 5, 2018

This is a HUGE performance improvement for a large site! We should release this soon.

@mattr-
Copy link
Member

mattr- commented Feb 6, 2018

If we can, we should push this down to a stable 3.7 release or just release 3.8 now with the massive performance gains.

@jekyll jekyll locked and limited conversation to collaborators Jul 12, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants