Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two massive performance improvements for large sites #6730

Merged
merged 2 commits into from Jan 31, 2018

Conversation

Projects
None yet
7 participants
@parkr
Copy link
Member

commented Jan 31, 2018

  1. Don't calculate site.related_posts unless we request it.
  2. Calculate site.related_posts without LSI much more efficiently.

I've been using @jvns's new rbspy program. I generated a random site of 10,000 posts (no layouts) to test this. When I got back the flamegraph, it was obvious that most_recent_posts was the culprit for a significant amount of time. After I applied this patch, regeneration went from 150s to 50s. We saved about 100s (1m40s) of time just by not generating related posts unless we needed it!

parkr added some commits Jan 31, 2018

RelatedPosts#most_recent_posts: only reverse the last 11 posts instea…
…d of all the documents

When you have 10,000 posts, this saves a _ton_ of time.
Do not assign site.related_posts unless it's requested.
Instead, assign the current document in the SiteDrop and generate the related_posts if requested by Liquid.

@DirtyF DirtyF requested a review from jekyll/performance Jan 31, 2018

@DirtyF DirtyF added the enhancement label Jan 31, 2018

# recent posts.
# We should remove this in 4.0 and switch to `{{ post.related_posts }}`.
def related_posts
return nil unless @current_document.is_a?(Jekyll::Document)

This comment has been minimized.

Copy link
@ashmaroli

ashmaroli Jan 31, 2018

Member

can we also return early if the current document is not a Post..?
Like for a site with two posts and another collection with 1000 documents..?

This comment has been minimized.

Copy link
@pathawks

pathawks Jan 31, 2018

Member

can we also return early if the current document is not a Post..?

I don't think so. A site could be using site.related_posts on a Document to display the most recent blog posts.

This comment has been minimized.

Copy link
@ashmaroli

ashmaroli Jan 31, 2018

Member

good point.. 👍

@DirtyF

This comment has been minimized.

Copy link
Member

commented Jan 31, 2018

FYI: no perceptual difference for build time on real websites — tested with Jekyll's own website (200+ documents) and https://github.com/mmistakes/front-matter-defaults (800+ documents).

@ashmaroli

This comment has been minimized.

Copy link
Member

commented Jan 31, 2018

no perceptual difference for build time on real websites

I second Frank's opinion.. I too couldn't see any huge difference with local tests..
(I just assumed it was not perceptible enough on Windows)

@pathawks

This comment has been minimized.

Copy link
Member

commented Jan 31, 2018

Not too much difference in build time on a site with a couple hundred posts, but I can see the value here.

@pathawks
Copy link
Member

left a comment

Even without a measurable performance improvement, most of these changes make sense to me.

As long as we are confident that this won't have unintended side effects, I am 👍

@pathawks pathawks requested a review from jekyll/stability Jan 31, 2018

@ashmaroli
Copy link
Member

left a comment

While the changes look good to me.., something doesn't feel right.. (irrespective of discernible perf gains)

Definitely requires some more testing.. or new tests to our suite..

Blocking this for the time-being..

@parkr

This comment has been minimized.

Copy link
Member Author

commented Jan 31, 2018

@ashmaroli Generate a site with 10,000 posts and you'll notice the difference here. Why are you interested in blocking this PR? What doesn't feel right?

I used this script to generate 10,000 posts:

~$ cat script/generate
#!/bin/bash

NUM_TO_WRITE=1000
if [ "$#" -ne "0" ]; then
    NUM_TO_WRITE="$1"
fi

rm -rf _posts
mkdir -p _posts

date_to_write=$(date +%s --utc)

for i in `seq 1 $NUM_TO_WRITE`; do
    yy_mm_dd=$(date --date="@$date_to_write" "+%Y-%m-%d")
    yy_mm_dd_hh_mm_ss_z=$(date --date="@$date_to_write" "+%Y-%m-%dT%H-%M-%S%z")
    post_filename="_posts/$yy_mm_dd-post-$i.md"
    cat > $post_filename <<EOS
---
title: "Post $i"
date: "$yy_mm_dd_hh_mm_ss_z"
---
Hello dear friends!

It is a pleasure to see you all again.
EOS
    echo "Wrote $post_filename"
    date_to_write=$(expr "$date_to_write" - 3600)
done

I called script/generate 10000 then rbspy record jekyll build and received impressive results.

Before: https://rbspy.parkermakes.tools/rbspy-2018-01-30-o4qFcKDwSo.svg
After: https://rbspy.parkermakes.tools/rbspy-2018-01-31-mD30LCP2L5.svg

Tested locally on Windows.. no issues..

@ashmaroli

This comment has been minimized.

Copy link
Member

commented Jan 31, 2018

Why are you interested in blocking this PR?

@parkr Since https://github.com/rbspy/rbspy doesn't run on Windows yet, I needed to test things out manually.. and that is a huge time-consumer because Jekyll is 3-4x slower on Windows.. and build times fluctuate over a range of 1-4s for the same Jekyll version.. so it calls for averaging across 3-4 builds..

Anyways, finally walked through each change here and groked through the implications..
The following change being my favorite..

    def most_recent_posts
-     @most_recent_posts ||= (site.posts.docs.reverse - [post]).first(10)
+     @most_recent_posts ||= (site.posts.docs.last(11).reverse - [post]).first(10)
    end
@ashmaroli

This comment has been minimized.

Copy link
Member

commented Jan 31, 2018

I set up a fresh new test workspace and generated 5500 posts using Parker's script above..
The following are build times on Windows using jekyll-3.7.2 and this PR

jekyll-3.7.2 related-posts-perf-improvements
74.15 seconds 63.99 seconds
81.706 seconds 62.164 seconds
79.597 seconds 62.966 seconds
@ghost

ghost approved these changes Jan 31, 2018

@DirtyF DirtyF added this to the v3.8.0 milestone Jan 31, 2018

@DirtyF

This comment has been minimized.

Copy link
Member

commented Jan 31, 2018

🔥 🚒

@jekyllbot: merge +minor

@jekyllbot jekyllbot merged commit b4985d1 into master Jan 31, 2018

3 checks passed

WIP ready for review
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@jekyllbot jekyllbot deleted the related-posts-perf-improvements branch Jan 31, 2018

jekyllbot added a commit that referenced this pull request Jan 31, 2018

@parkr parkr referenced this pull request Jan 31, 2018

Open

Collect success stories #62

@pathawks

This comment has been minimized.

Copy link
Member

commented Feb 5, 2018

This is a HUGE performance improvement for a large site! We should release this soon.

@mattr-

This comment has been minimized.

Copy link
Member

commented Feb 6, 2018

If we can, we should push this down to a stable 3.7 release or just release 3.8 now with the massive performance gains.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.