1.0 improving performance and scalability #431
1.0 is going to have a number of improvements to Gatsby's frontend performance. This issue provides some details about my plans and the reasoning behind them.
What Gatsby gets right already
Gatsby 0.x is very fast. We've worked really hard at this and so far have added:
What needs improving?
In frontend performance parlance, Gatsby has an excellent TTFB (Time To First Byte) and TTFP (Time To First Paint). Gatsby sites load fast and are remarkably quick when clicking around a site.
Changes for Gatsby 1.0 are focused on improving TTI (Time To Interaction) and our ability to scale to larger sites.
Many of these changes are inspired by the fine work of engineers at Google (and elsewhere) who've been researching patterns for improving web performance and building these into the web platform.
Particularly helpful is the PRPL pattern.
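PRPL stands for: push critical resources for the initial route, render the initial route, pre-cache the remaining routes, and lazy-load and create the remaining routes on demand.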
Time to interaction
The slower the hardware the more noticeable this is.
For the first category, I think there are a few automated, lint-like checks we could run to point out code people could eliminate. Tracking the size of each page and how it changes over time would also be helpful.
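As a rough sketch of the size-tracking idea, the check could compare per-page sizes between two builds and flag pages that grew past a threshold. The `PageSizes` shape, the function name, and the 10% default are all assumptions made for illustration; nothing like this exists in Gatsby yet.

```typescript
// Hypothetical shape: bytes of JS and data shipped per page for one build.
type PageSizes = Record<string, number>;

interface SizeRegression {
  path: string;
  before: number;
  after: number;
  growth: number; // fractional growth, e.g. 0.25 means 25% larger
}

// Report pages whose size grew by more than `threshold` since the last build.
function findSizeRegressions(
  previous: PageSizes,
  current: PageSizes,
  threshold: number = 0.1
): SizeRegression[] {
  const regressions: SizeRegression[] = [];
  for (const path of Object.keys(current)) {
    const before = previous[path];
    const after = current[path];
    // New pages have no baseline to compare against, so skip them.
    if (before === undefined || before === 0) continue;
    const growth = (after - before) / before;
    if (growth > threshold) {
      regressions.push({ path, before, after, growth });
    }
  }
  return regressions;
}

const lastBuild: PageSizes = { "/": 40000, "/blog/": 52000 };
const thisBuild: PageSizes = { "/": 41000, "/blog/": 78000, "/about/": 12000 };
const regressions = findSizeRegressions(lastBuild, thisBuild);
```

Here only `/blog/` would be reported: the home page grew by less than the threshold and `/about/` is new.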
There's a close analogy to just-in-time manufacturing ideas. Companies found that the way to be the most responsive to customers is to actually avoid doing work ahead of time. When they did do work ahead of time this would paradoxically slow them down as the speculative work would get in the way of getting the work done that's actually necessary (resource contention).
For both manufacturing and web apps there's high inventory cost (unused code takes up memory) and a premium on responsiveness. The car customer wants their new car yesterday and the web app consumer wants their app running immediately. Any work you do ahead of time because "they might need it" gets in the way of the app being responsive to the user.
With both you want to wait until the user asks for something and then work overtime to get it to them as fast as possible.
The PRPL pattern says push the initial page as fast as possible and then let a service worker cache the raw ingredients for remaining pages in the browser so they can be quickly assembled when the user asks for them.
That's part of what makes service workers so valuable over previous precaching solutions — they don't evaluate the JS, just load and cache it.
By limiting the work a browser does to what's needed for the current page, Gatsby can scale to almost any sized site as the site only pays the cost for the pages a user visits.
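The "cache now, evaluate later" idea can be sketched without any browser APIs. This is only a stand-in: `LazyBundleStore` is a hypothetical class, not the Service Worker API, and each "bundle" is modelled as a factory function rather than fetched JS. It shows that precaching costs no evaluation work until a page is actually visited.

```typescript
// Stand-in for a precache: remembers bundles cheaply and only evaluates a
// bundle the first time a page actually asks for it.
class LazyBundleStore {
  // Null-prototype objects so bundle names never collide with Object methods.
  private sources: Record<string, () => unknown> = Object.create(null);
  private evaluated: Record<string, unknown> = Object.create(null);
  evaluationCount = 0;

  // Cheap step: store the bundle, do no work on it.
  precache(bundleName: string, factory: () => unknown): void {
    this.sources[bundleName] = factory;
  }

  // Expensive step, deferred until the user navigates to the page.
  load(bundleName: string): unknown {
    if (!(bundleName in this.evaluated)) {
      const factory = this.sources[bundleName];
      if (factory === undefined) {
        throw new Error(`Bundle not cached: ${bundleName}`);
      }
      this.evaluated[bundleName] = factory();
      this.evaluationCount += 1;
    }
    return this.evaluated[bundleName];
  }
}

const store = new LazyBundleStore();
store.precache("page-about", () => ({ title: "About" }));
store.precache("page-blog", () => ({ title: "Blog" }));
// Nothing has been evaluated yet; only the visited page pays the cost.
const aboutPage = store.load("page-about");
```

In a real browser the cached items would be HTTP responses held by a service worker, but the cost model is the same: the site pays only for the pages a user visits.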
Plans for improving TTI in Gatsby
Loading only critical resources upfront is a fairly obvious idea. The devil of course is in the details. How can Gatsby identify the critical resources for a page without swamping developers with tedious bookkeeping?
A website is made up of roughly four types of things: styles, code, data, and images. Each requires different strategies. Let's take a look.
Identifying and loading critical styles
Gatsby 0.x inlines all CSS for a site in
I like to think in terms of global and component styles. Typically a site will have a set of global stylesheets e.g. for reset/normalize, typography, and various other global concerns. These set the overall look and feel for the site. Then there are styles for individual components. Ideally components are responsible for their own styles one way or another.
Ideally we inline in
Handling global styles is fairly easy. With traditional CSS you could compile the global styles to their own file to be inlined or you could use something like Typography.js.
Pulling out a page's component styles can be trickier.
Identifying and loading critical code and data
Note: I'm calling data the data that is passed client-side into your React.js components. A Gatsby site runs on both the server and client so page data has to be loaded into the client along with component code.
In Gatsby 0.x all data for a site is loaded into the client at boot. This was the easiest way I found to build version 0.x and has proved convenient to use.
The difficulty with this is that every page then pays the cumulative cost of every page on the site. One massive visualization with heavy JS libraries and 1000s of rows of data is loaded on every page.
This isn't ideal obviously.
Ideally, every page would specify exactly the data it needs, and only that data would be loaded with the page.
Luckily some teams at Facebook have already been thinking hard about this problem and have come up with GraphQL and Relay. GraphQL is an elegant query language that lets client code specify its data requirements. Relay provides a beautiful and simple integration with React: each route specifies its data requirements with GraphQL, and Relay handles the behind-the-scenes work of fetching the data and caching it locally. I used them for close to a year building a product and they are fantastic. Colocating your data query with your component makes it simple to see what data is available on each page and to make quick modifications.
I wrote more in another issue about how Gatsby 1.0 will use GraphQL and a Relay-like pattern, but in short, each page can now specify exactly the critical data it needs to render, which gets written out to a JSON file and loaded along with the page component code. I'm also exploring patterns for a page to lazy-load data.
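A minimal sketch of the "one JSON file per page" idea follows. It assumes each page's query has already been resolved to a plain object; the `PageData` shape and the `page-data--` file naming are hypothetical and chosen only for illustration.

```typescript
// Hypothetical page description: a path plus the data its query resolved to.
interface PageData {
  path: string;
  data: Record<string, unknown>;
}

// Turn "/blog/hello-world/" into a flat, hypothetical bundle file name.
function dataFileName(path: string): string {
  const slug = path.replace(/^\/+|\/+$/g, "").replace(/\//g, "-");
  return `page-data--${slug === "" ? "index" : slug}.json`;
}

// One JSON file per page, so a page only downloads its own data.
function buildDataBundles(pages: PageData[]): Record<string, string> {
  const files: Record<string, string> = {};
  for (const page of pages) {
    files[dataFileName(page.path)] = JSON.stringify(page.data);
  }
  return files;
}

const dataBundles = buildDataBundles([
  { path: "/", data: { posts: ["Hello world"] } },
  { path: "/blog/hello-world/", data: { title: "Hello world", body: "..." } },
]);
```

The returned map stands in for files written to disk; a heavy page's data lands in its own file and no longer travels with every other page.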
Code splitting is an area that's been thoroughly explored by the Webpack and React communities. There's a wide variety of options available, most of which I've explored. I spent two days fiddling with custom Webpack configs and plugins, working through options and tradeoffs. I even dreamed one night about a code splitting problem (I solved it) :-)
Similar to styles, there are global JS modules (used on every page) and route-specific modules. Global JS should be loaded on the first page load along with the modules for that page; other JS is then fetched in the background and evaluated on route transitions.
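To make the global versus route-specific distinction concrete, here is a small sketch that treats a module as global when every route imports it. The input shape and module names are made up for the example, and a real bundler would work from its dependency graph rather than hand-written lists.

```typescript
// Hypothetical input: the modules each route imports.
type RouteModules = Record<string, string[]>;

interface SplitResult {
  shared: string[];                   // imported by every route
  perRoute: Record<string, string[]>; // everything else, keyed by route
}

function splitModules(routes: RouteModules): SplitResult {
  const routeNames = Object.keys(routes);
  const first = routeNames.length > 0 ? routes[routeNames[0]] : [];
  // A module is shared only if every single route imports it.
  const shared = first.filter((mod) =>
    routeNames.every((route) => routes[route].indexOf(mod) !== -1)
  );
  const perRoute: Record<string, string[]> = {};
  for (const route of routeNames) {
    perRoute[route] = routes[route].filter((mod) => shared.indexOf(mod) === -1);
  }
  return { shared, perRoute };
}

const moduleSplit = splitModules({
  "blog-post": ["react", "react-router", "markdown-renderer"],
  "index": ["react", "react-router", "post-list"],
  "charts": ["react", "react-router", "d3"],
});
```

With this input, `react` and `react-router` end up shared, while `d3` is only paid for by visitors to the charts route.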
Another consideration is improving long-term caching. Ideally we should split code in such a way that limits how many bundles are affected by common changes.
This feels quite similar to database normalization. And like database normalization, there are tradeoffs between levels of normalization. The JS bundle equivalent of a fully normalized database is where the browser loads each JS module individually.
Khan Academy explored doing this and found that it was significantly slower (even with HTTP/2).
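The long-term caching concern can be illustrated by assuming bundles are served under content-hashed file names, a common technique that this issue does not itself specify. The sketch below uses a toy string hash and hypothetical bundle names to list which files a returning visitor must download again after a change.

```typescript
type Bundles = Record<string, string>; // bundle name -> bundle contents

// Toy polynomial string hash for illustration only; a real build would
// likely use a cryptographic hash of the file contents.
function simpleHash(text: string): string {
  let hash = 7;
  for (let i = 0; i < text.length; i++) {
    hash = (hash * 31 + text.charCodeAt(i)) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// File names change only when the contents change.
function hashedNames(bundles: Bundles): string[] {
  return Object.keys(bundles).map(
    (bundle) => `${bundle}.${simpleHash(bundles[bundle])}.js`
  );
}

// Files from the new build that are not already in the browser's cache.
function filesToRedownload(oldBuild: Bundles, newBuild: Bundles): string[] {
  const cached = hashedNames(oldBuild);
  return hashedNames(newBuild).filter((file) => cached.indexOf(file) === -1);
}

const buildOne: Bundles = {
  "commons": "react + router",
  "route-blog-post": "BlogPost component",
  "data-hello-world": '{"title":"Hello"}',
};
const buildTwo: Bundles = {
  "commons": "react + router",
  "route-blog-post": "BlogPost component",
  "data-hello-world": '{"title":"Hallo"}',
};
const stale = filesToRedownload(buildOne, buildTwo);
```

Editing one page's data invalidates a single small file here. With one monolithic bundle, the same edit would invalidate everything.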
Each page loads a
When loading subsequent pages, if moving to a different route type (e.g. from a blog post to an index page), load the new route component and the page's data bundle.
This makes for very quick page transitions as the data bundles are often only a few KB and the route components often under 15 KB.
With a service worker, these bundles will be cached and ready to be used, further dropping page transition times.
Editing a page changes just one bundle: either a data bundle or a route component bundle.
All this can happen at the framework level as routes and data requirements are declared programmatically. Using this information we write out a custom routes file (for React Router) that has code splitting with named bundles built-in. Using the named bundles, we specify on each statically rendered HTML page which JS bundles to load.
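A sketch of how named bundles could drive both the initial load and page transitions is below. The `Page` shape, the `commons` bundle, and the `route-component---` / `page-data---` naming scheme are assumptions for illustration, not the names Gatsby will necessarily use.

```typescript
// Hypothetical page record produced at build time.
interface Page {
  path: string;           // e.g. "/blog/first-post/"
  routeComponent: string; // e.g. "blog-post"
}

// Named bundles a statically rendered page should load.
function bundlesForPage(page: Page): string[] {
  const slug =
    page.path.replace(/^\/+|\/+$/g, "").replace(/\//g, "-") || "index";
  return [
    "commons",
    `route-component---${page.routeComponent}`,
    `page-data---${slug}`,
  ];
}

// Bundles still missing when navigating client-side between two pages.
function bundlesForTransition(from: Page, to: Page): string[] {
  const loaded = bundlesForPage(from);
  return bundlesForPage(to).filter((bundle) => loaded.indexOf(bundle) === -1);
}

const postA: Page = { path: "/blog/first-post/", routeComponent: "blog-post" };
const postB: Page = { path: "/blog/second-post/", routeComponent: "blog-post" };
const home: Page = { path: "/", routeComponent: "index" };

// Same route type: only the new page's data bundle is needed.
const postToPost = bundlesForTransition(postA, postB);
// Different route type: the new route component plus its data bundle.
const postToHome = bundlesForTransition(postA, home);
```

Because routes and data requirements are declared programmatically, lists like these can be derived at build time without any bookkeeping by the site author.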
Reducing impact of images
I'd love to make several image loading techniques nearly automatic: responsive images, lazy loading images when they enter the viewport, and loading placeholder images before the actual image.
Issue #285 discusses some of those ideas.
The new GraphQL data layer should make some of these ideas fairly straightforward to implement. For example, Gatsby could provide a custom React image component that exports a standard GraphQL query for getting responsive image links plus the placeholder (which would be inlined), and that has built-in awareness of the viewport so it knows when to load its image.
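The two pieces such an image component needs can be sketched as plain functions: building the value of the standard `srcset` attribute, and deciding when an image is close enough to the viewport to load. The `ResponsiveImage` shape, the file naming, and the 200px margin are assumptions for the example.

```typescript
// Hypothetical result of an image query.
interface ResponsiveImage {
  src: string;         // fallback URL (the largest size)
  srcSet: string;      // value for the <img srcset> attribute
  placeholder: string; // tiny inlined data URI shown first
}

function buildResponsiveImage(
  basePath: string,
  widths: number[],
  placeholder: string
): ResponsiveImage {
  if (widths.length === 0) {
    throw new Error("at least one width is required");
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = widths.slice().sort((a, b) => a - b);
  const srcSet = sorted.map((w) => `${basePath}-${w}w.jpg ${w}w`).join(", ");
  const largest = sorted[sorted.length - 1];
  return { src: `${basePath}-${largest}w.jpg`, srcSet, placeholder };
}

// True once the image's top edge is above the viewport's bottom edge
// extended downward by `margin` pixels, so loading starts slightly early.
function shouldLoad(
  imageTop: number,
  scrollY: number,
  viewportHeight: number,
  margin: number = 200
): boolean {
  return imageTop < scrollY + viewportHeight + margin;
}

const hero = buildResponsiveImage(
  "/img/cat",
  [800, 400],
  "data:image/gif;base64,..."
);
```

A component would render `placeholder` immediately and swap in `src` and `srcSet` once `shouldLoad` returns true for its position.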
Other performance ideas
Many hosts need custom configuration to unlock the performance options they offer. I can see host-specific Gatsby plugins being really useful to set up caching, server push (as it becomes available), etc.