[wip] Smartly Split page metadata and lazy load in client #6651

Closed
wants to merge 1 commit into from

Conversation

KyleAMathews
Contributor

Currently we put all page metadata into one bundle. This is fine for sites under
1,000 pages, but by ~4,000 pages the bundle is ~275kb (and growing), which is
obviously not good enough.

We want Gatsby v2 to be 100% production ready for sites of 100k+ pages so
getting smart page splitting working is critical.

We should minimize how much code we load upfront while still ensuring that
navigating around the site is incredibly fast.

We also want to maximize long-term cachability of all files.

This PR is currently very prototype-y, but I've been thinking about the ideas for
a while and the algorithm is feeling solid.

The algorithm exploits the fact that most pathname structure is meaningful, e.g.
all blog posts are under /blog/.

The path bundling algorithm works as follows:

  1. Create a "top-level" bundle of the home page and any other top-level paths. It's
    assumed that on most sites, paths like /blog/, /tutorial/, /about/ are either
    index pages or much more likely to be linked to than a random page.
  2. Create a bundle for each lower-level path segment, e.g. products/mens, products/womens, etc.
  3. Select the lower-level bundles that contain between 37 and 500 pages. The
    numbers are fairly magic but turn into bundles of metadata between ~2 and ~30kb gzipped,
    which are safe sizes for all devices to pull in without a) causing blips in the UI thread
    while processing the JS or b) delaying a page load when a user navigates to a page
    for which we first have to fetch the metadata and then fetch the page resources.
  4. Some lower-level bundles will be too small so the algorithm collects those together
    in a "misc" bundle.
  5. Some bundles will be too large, so the algorithm splits these into smaller buckets by
    hashing the last path segment. The number of buckets varies depending on how many paths
    are in the bundle. The larger the bundle, the less likely it is that, when loading a page
    of links, any of the links will fall in the same metadata bucket (Poisson distribution, baby),
    so we create smaller and smaller buckets, e.g. 25-75 pages I'm thinking.
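
The five steps above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function names, the bundle names ("top-level", "misc", "bucket-N"), the djb2 hash, and the bucket target of 50 pages are all assumptions, and it simplifies step 4 to a single site-wide "misc" bundle.

```typescript
// Thresholds from the description above; BUCKET_TARGET is an assumed
// midpoint of the "25-75 pages" idea, not a value from the PR.
const MIN_BUNDLE = 37;
const MAX_BUNDLE = 500;
const BUCKET_TARGET = 50;

// Deterministic string hash (djb2 variant). The PR does not name a hash
// function, so this choice is purely illustrative.
function hashString(s: string): number {
  let h = 5381;
  for (let i = 0; i < s.length; i++) {
    h = ((h * 33) ^ s.charCodeAt(i)) >>> 0;
  }
  return h;
}

function segments(path: string): string[] {
  return path.split("/").filter(Boolean);
}

function pushTo(map: Record<string, string[]>, key: string, value: string): void {
  if (!Object.prototype.hasOwnProperty.call(map, key)) map[key] = [];
  map[key].push(value);
}

// Returns a map of bundle name -> page paths (hypothetical output shape).
function bundlePaths(paths: string[]): Record<string, string[]> {
  const bundles: Record<string, string[]> = {};
  const groups: Record<string, string[]> = {};

  for (const p of paths) {
    const segs = segments(p);
    if (segs.length <= 1) {
      // Step 1: home page and single-segment paths.
      pushTo(bundles, "top-level", p);
    } else {
      // Step 2: group every other path by its parent directory.
      pushTo(groups, segs.slice(0, -1).join("/"), p);
    }
  }

  for (const parent of Object.keys(groups)) {
    const group = groups[parent];
    if (group.length < MIN_BUNDLE) {
      // Step 4: too small, collect into the shared "misc" bundle.
      for (const p of group) pushTo(bundles, "misc", p);
    } else if (group.length <= MAX_BUNDLE) {
      // Step 3: the directory is a good size, keep it as one bundle.
      bundles[parent] = group;
    } else {
      // Step 5: too large, hash the last segment into buckets.
      const bucketCount = Math.ceil(group.length / BUCKET_TARGET);
      for (const p of group) {
        const last = segments(p).pop() as string;
        pushTo(bundles, `${parent}/bucket-${hashString(last) % bucketCount}`, p);
      }
    }
  }
  return bundles;
}
```

With this sketch, a directory of 600 pages is spread over at most 12 buckets, while a directory of 40 pages stays in one bundle named after the directory.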

In the browser (this code isn't written yet), the Gatsby runtime loads page
metadata for links when they mount, running the algorithm in reverse. If the pathname
is top-level, the runtime loads the top-level bundle. Otherwise it splits the pathname
and follows it down to the right bundle. For example, /products/mens/red-fleece
is not top-level, so the runtime goes to products, then mens, finds nothing for red-fleece,
and loads the products/mens/misc bundle. Or, if there were 500+ mens products,
it'd load the bucket that the hash of red-fleece directs the runtime towards.
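
A rough sketch of that reverse lookup, under stated assumptions: the runtime is given a small manifest listing which directories were kept as whole bundles and which were split into hash buckets. The `BundleManifest` shape, the function names, the djb2 hash, and the bundle names are all hypothetical. The example above names the fallback products/mens/misc, whereas this sketch uses a single "misc" bundle for simplicity.

```typescript
// Hypothetical manifest shipped to the client.
interface BundleManifest {
  // Directories kept as a single bundle, e.g. "blog".
  directBundles: string[];
  // Directories split into hash buckets, mapped to their bucket count.
  bucketCounts: Record<string, number>;
}

// Deterministic string hash (djb2 variant); must match whatever the build
// step used. The choice of hash here is an assumption.
function hashString(s: string): number {
  let h = 5381;
  for (let i = 0; i < s.length; i++) {
    h = ((h * 33) ^ s.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Maps a pathname to the name of the metadata bundle that should hold it.
function resolveBundle(pathname: string, manifest: BundleManifest): string {
  const segs = pathname.split("/").filter(Boolean);

  // The home page and single-segment paths live in the top-level bundle.
  if (segs.length <= 1) return "top-level";

  const parent = segs.slice(0, -1).join("/");
  const last = segs[segs.length - 1];

  // Large directories: the hash of the last segment picks the bucket.
  if (Object.prototype.hasOwnProperty.call(manifest.bucketCounts, parent)) {
    const count = manifest.bucketCounts[parent];
    return `${parent}/bucket-${hashString(last) % count}`;
  }

  // Mid-sized directories: one bundle named after the directory.
  if (manifest.directBundles.indexOf(parent) !== -1) return parent;

  // Everything else falls through to the shared "misc" bundle.
  return "misc";
}
```

Because the lookup depends only on the pathname and the manifest, the runtime can decide which bundle to fetch without loading any page metadata first.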

@KyleAMathews
Contributor Author

Deploy preview for using-postcss-sass failed.

Built with commit a65726d

https://app.netlify.com/sites/using-postcss-sass/deploys/5b53faafdd28ef57cbf7c87f

@gatsbybot
Collaborator

Deploy preview for gatsbygram ready!

Built with commit a65726d

https://deploy-preview-6651--gatsbygram.netlify.com

@gatsbybot
Collaborator

Deploy preview for using-drupal ready!

Built with commit a65726d

https://deploy-preview-6651--using-drupal.netlify.com

@pieh pieh changed the title [wip] Smartly Split page metadata and lazy load in client [EPIC] Smartly Split page metadata and lazy load in client Aug 24, 2018
@pieh pieh changed the title [EPIC] Smartly Split page metadata and lazy load in client [wip Smartly Split page metadata and lazy load in client Aug 24, 2018
@coxom

coxom commented Nov 26, 2018

Hi,

Any progress on this? Our app's bundle analyzer output points to this as one of the biggest chunks of JS:

image

@KyleAMathews KyleAMathews changed the title [wip Smartly Split page metadata and lazy load in client [wip] Smartly Split page metadata and lazy load in client Jan 11, 2019
@KyleAMathews
Contributor Author

One refinement that occurred to me recently: it could make sense (and make this PR a lot simpler) to split page code metadata from page data metadata. The code metadata is much smaller and could perhaps be bundled together even for much larger sites. The page data metadata could then be written out as one file per page.

We want to separate builds that require a full webpack build from those that are purely data-derived. If only data has changed, we wouldn't touch anything code related, and having a deterministic way to update page data metadata would be much simpler.

The page data metadata filename would need to be created deterministically, e.g. /public/pmd/about.json or whatever. That would mean the files couldn't be cached in the client, but they'd be tiny files and very quick to load over HTTP/2, so that could be fine.
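
As a minimal sketch of what "deterministically created" could look like, the filename can be derived from the page pathname alone. The function name, the `index` name for the home page, and the nested layout for deeper paths are assumptions; only the /public/pmd/about.json example comes from the comment above.

```typescript
// Hypothetical helper: derives the page data metadata file path purely
// from the page's pathname, so the same page always maps to the same file.
function pageDataMetadataPath(pathname: string): string {
  const segs = pathname.split("/").filter(Boolean);
  // Assumption: the home page ("/") is stored under the name "index".
  const name = segs.length === 0 ? "index" : segs.join("/");
  return `/public/pmd/${name}.json`;
}
```

Since the path contains no content hash, a data-only rebuild can overwrite a page's file in place without touching any code bundles.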

@KyleAMathews
Contributor Author

Closing this as we're going to go in a different direction.

@bobaaaaa

@KyleAMathews can you share maybe some more information? :)

@KyleAMathews
Contributor Author

Will soon :-)

It's just that this approach is too complicated and has downsides, e.g. it isn't deterministic, so we have to analyze every page every time.
