question - incremental builds support = part II #5002

Open
ekdev2 opened this issue Apr 16, 2018 · 41 comments
Comments

@ekdev2 commented Apr 16, 2018

#4981
I think @LekoArts is right. What I mean is: if you generate a site with 2000 pages and deploy it to AWS, and then one of those content pages changes in the CMS, can you regenerate just that one page and deploy it?

@m-allanson (Member) commented Apr 17, 2018

It's not something Gatsby does at the moment, but it is something people have asked for. There's been work in version 2 to improve performance on larger sites, but there's no release date for that yet.

@mattferderer (Contributor) commented Apr 17, 2018

@m-allanson is there a discussion/issue on how to handle this? I didn't see it in the link you listed. I'm curious to hear conversations about doing this on a host like Netlify and with a CMS like WordPress/Drupal, which currently requires a lot of HTTP requests during the build.

@pieh (Contributor) commented Apr 17, 2018

AFAIK you wouldn't be able to use incremental builds on Netlify, because the .cache and public directories are not preserved between builds, so it will always do a clean build.

@mattferderer (Contributor) commented Apr 17, 2018

That's good to know. I'm tossing around a ton of ideas that aren't well thought out. So even if we could eliminate the need for HTTP requests, we'd still need to make sure the .cache and public directories can be referenced by the build tool, which rules out many of the hosts that lower the barrier to entry.

@robertschneiderman commented May 8, 2018

Another use case for incremental building is when you have a very large site that you want to build in parts. I was getting a "heap out of memory" error when building ~5k pages at once.

We expect our site to get very large, so we're testing Gatsby at larger scales. We've tried something like this path: './src/pages/${subPath}', where subPath is process.argv[3]. This works nicely when we serve parts of our site with gatsby develop. It also avoids the heap memory problems when running gatsby build for a 5k+ page site. For it to really be a solution, though, it would probably depend on the ability to specify an output subdirectory within the public folder: #4756
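
A minimal sketch of the sub-path idea, assuming each page to create is described by an object with a `path` field and that the sub-path arrives as `process.argv[3]`, as described above. `PageSpec` and `selectPagesForSubPath` are hypothetical names; in gatsby-node you would run the filter inside `createPages` before calling `createPage`.

```typescript
// Hypothetical page descriptor; in gatsby-node these would come from your data source.
export interface PageSpec {
  path: string;
}

// Keep only pages under the requested sub-path, e.g. "blog" keeps "/blog/a" but not "/blogroll".
// With no sub-path, every page is kept (a normal full build).
export function selectPagesForSubPath(pages: PageSpec[], subPath?: string): PageSpec[] {
  if (!subPath) return pages;
  const prefix = "/" + subPath.replace(/^\/+|\/+$/g, "");
  return pages.filter((p) => p.path === prefix || p.path.startsWith(prefix + "/"));
}

// Assumption: the command is run as `gatsby build <subPath>`, so argv[3] holds the sub-path.
export const requestedSubPath: string | undefined = process.argv[3];
```

Matching on a `/` boundary keeps `blog` from also selecting `/blogroll`.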

@ekdev2 (Author) commented Jun 20, 2018

What if another approach were used to achieve the same goal? I wanted to run an idea by you and see what people think. Let's say you have a 5k-page website. The pages would initially be generated statically, but each page would also have a subcomponent that loads on top of the static content and renders the same content, read from a static JSON file. That way, if a user updates one page in the CMS in the middle of the day, only that page's JSON file needs to be regenerated and deployed to a CDN. The whole site could then be regenerated once a day as a nightly process. The static SEO content might not be fully up to date during the day, but I don't see that as a big deal; it will just get updated by the nightly process.
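
The client-side half of this idea could look roughly like the sketch below, assuming the CMS stamps content with an ISO-8601 `updatedAt` timestamp and that each page's JSON is served at a URL you choose. `PageContent`, `pickFresherContent` and `loadOverlay` are hypothetical; nothing here is a Gatsby API.

```typescript
// Hypothetical shape of the content baked into the HTML and stored in the per-page JSON file.
export interface PageContent {
  body: string;
  updatedAt: string; // ISO-8601 timestamp set by the CMS (assumption)
}

// Prefer the JSON snapshot only if it is strictly newer than what the static build rendered.
export function pickFresherContent(staticContent: PageContent, jsonContent: PageContent | null): PageContent {
  if (!jsonContent) return staticContent;
  return Date.parse(jsonContent.updatedAt) > Date.parse(staticContent.updatedAt) ? jsonContent : staticContent;
}

// Client-side overlay: fetch the page's JSON and fall back to the static content on any failure.
export async function loadOverlay(staticContent: PageContent, jsonUrl: string): Promise<PageContent> {
  try {
    const res = await fetch(jsonUrl);
    if (!res.ok) return staticContent;
    return pickFresherContent(staticContent, (await res.json()) as PageContent);
  } catch {
    return staticContent;
  }
}
```

Comparing timestamps means a stale copy of the JSON never overrides newer HTML from the nightly build, and a failed fetch simply leaves the static content in place.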

@ekdev2 reopened this Jun 20, 2018

@tsimons (Contributor) commented Jul 18, 2018

@robertschneiderman we've run into the memory issue as well. We're closer to 1500 pages, but with an insane number of images (it's a design blog). We've turned off source maps and stopped the build from downloading image files, but ultimately had to edit the build command to increase the memory allocated to the Node instance via the --max_old_space_size flag.

One thing that worries me about this feature is schema building. If we don't have every post available for Gatsby to build a schema from, our queries will throw errors. It would be really nice if there were a way to pass schemas to Gatsby, or at least to provide dummy entities during the build that demonstrate the different shapes they may take.
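
To illustrate the concern: when a schema is inferred only from the nodes that happen to be present, a partial build that sees a subset of posts can lose fields entirely, and queries that reference them fail. The toy inference below (not Gatsby's actual inference code) shows how adding a placeholder node restores the full field set; `inferFieldTypes`, `dummyPost` and the field names are hypothetical. Later Gatsby 2.x releases added explicit type definitions via `actions.createTypes`, which addresses this more directly.

```typescript
type FieldType = "string" | "number" | "boolean" | "object";

// Toy inference: the union of keys seen across nodes, typed by the first non-null value.
export function inferFieldTypes(nodes: Record<string, unknown>[]): Map<string, FieldType> {
  const fields = new Map<string, FieldType>();
  for (const node of nodes) {
    for (const [key, value] of Object.entries(node)) {
      if (value === null || value === undefined || fields.has(key)) continue;
      const t = typeof value;
      fields.set(key, t === "string" || t === "number" || t === "boolean" ? t : "object");
    }
  }
  return fields;
}

// Placeholder node demonstrating every shape a post may take (assumption: never turned into a page).
export const dummyPost = { id: "dummy", title: "", heroImage: { src: "" }, featured: false, readingTime: 0 };
```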

@agonsalves commented Jul 18, 2018

I am considering using Gatsby to build the UI for a content site with over 5000 items, most with interconnected relationships to each other. The data will come from a database-driven CMS.

The benefit to using Gatsby over a standard API-driven React site is that I would spend a fraction of the time building and maintaining the data API and state management system that loads the remote data and stores it. (Since I plan on deploying this application for multiple sites of similar size, this seems like a very valuable benefit.)

The downside to using Gatsby in this case would be the fact that the entire site would need to be rebuilt for even the most insignificant content update. Forgot to add a comma? Rebuild all 5000 pages! Who even knows how long that would take? This is even more of an issue when considering the experience of the CMS users - they're used to seeing changes appear on the site immediately after they save them. With Gatsby, we're looking at a few minutes' wait (at least) before the change appears.

If there were a way to trigger builds for a subset of pages, it would make Gatsby the clear, definitive choice. At this moment, though, it's a tough sell.

@KyleAMathews (Contributor) commented Jul 18, 2018

BTW, I've been working a lot on improving build speeds for larger sites in v2. On the latest v2 beta, you might be able to build 5000 pages in < 1:30. There'll be more speed improvements coming.

@tsimons (Contributor) commented Jul 19, 2018

That's amazing @KyleAMathews! I definitely look forward to that! Let me know if you want to test against an image-heavy blog.

@brod-ie commented Oct 16, 2018

@KyleAMathews 5K is nice but we need 1M 😉

@Tawfiqh (Contributor) commented Oct 16, 2018

If we want to compile parts of the site separately, we can set flags on build so that gatsby-node knows to generate only the specified parts of the site. We can then add back the previously generated static files. This works for us as long as we link to the previously generated files with a basic <a href> rather than a <Link to>.

I'm wondering if we can make <Link to> work when linking to previously generated files if we merge in some of the previous data.json at build time. Looking into that a bit more at the moment.
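
One way to picture merging a partial build with earlier output: keep the page manifest from the last full build, replace the entries for the section that was just rebuilt, and carry everything else over unchanged. This sketch assumes a simple `path -> data file` manifest; it is not Gatsby's actual data.json format, and `Manifest` and `mergeManifests` are hypothetical.

```typescript
// Hypothetical manifest: page path -> location of that page's data file.
export type Manifest = Record<string, string>;

// Merge a partial build into the previous full manifest.
// Pages inside `rebuiltPrefix` come only from the partial build, so pages deleted there disappear;
// everything outside that section is carried over from the previous build.
export function mergeManifests(previous: Manifest, partial: Manifest, rebuiltPrefix: string): Manifest {
  const inSection = (path: string) => path === rebuiltPrefix || path.startsWith(rebuiltPrefix + "/");
  const merged: Manifest = {};
  for (const [path, file] of Object.entries(previous)) {
    if (!inSection(path)) merged[path] = file;
  }
  for (const [path, file] of Object.entries(partial)) {
    merged[path] = file;
  }
  return merged;
}
```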

@rbmedia commented Nov 7, 2018

I'm not worried about the build time so much as the volume of static files I need to upload for every update. We launched a large visual portfolio with Gatsby, and the static site to upload is over 150 MB, mostly images. This makes the site unavailable for around 40 minutes during an update.
The ability to rebuild only part of the site is definitely a feature that would boost Gatsby.
I plan to use Gatsby for a new site, but I will divide the site into a static part and a dynamic part, using a traditional PHP CMS for the news section.
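
If upload volume is the bottleneck, one general technique (not something Gatsby does for you) is to hash every output file and upload only what changed since the previous deploy. A sketch using Node's built-in `crypto.createHash`; the in-memory `BuildFiles` map is a hypothetical stand-in for walking the public folder, which is left out for brevity.

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory view of a build: relative file path -> file contents.
export type BuildFiles = Map<string, string | Buffer>;

// SHA-256 of every file, keyed by relative path.
export function hashFiles(files: BuildFiles): Map<string, string> {
  const hashes = new Map<string, string>();
  for (const [path, contents] of files) {
    hashes.set(path, createHash("sha256").update(contents).digest("hex"));
  }
  return hashes;
}

// Upload new or changed files; delete files that no longer exist in the new build.
export function diffBuilds(previous: Map<string, string>, next: Map<string, string>) {
  const upload: string[] = [];
  const remove: string[] = [];
  for (const [path, hash] of next) {
    if (previous.get(path) !== hash) upload.push(path);
  }
  for (const path of previous.keys()) {
    if (!next.has(path)) remove.push(path);
  }
  return { upload, remove };
}
```

Unchanged files, such as images that were not touched, produce identical hashes and are skipped, so the upload shrinks to whatever actually differs.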

@mattferderer (Contributor) commented Nov 7, 2018

@rbmedia you might want to consider a host like Netlify that does deployment switching, so your current site stays up until your new version is ready.
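
Deployment switching is usually an atomic deploy: upload the complete new build to a fresh location while the old one keeps serving, then flip a single pointer. A sketch for a self-managed server whose web root is a symlink named `current`; the directory layout and `activateRelease` are assumptions, and this is not how Netlify implements it.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Point `<root>/current` at `<root>/releases/<releaseId>` in one step.
// The release must already be fully uploaded; the old release keeps serving until the rename.
export function activateRelease(root: string, releaseId: string): void {
  const target = path.join(root, "releases", releaseId);
  if (!fs.existsSync(target)) throw new Error(`release not found: ${target}`);
  const tmpLink = path.join(root, `current.tmp-${process.pid}`);
  fs.rmSync(tmpLink, { force: true }); // clear a leftover temp link from an interrupted run
  fs.symlinkSync(target, tmpLink);
  // On POSIX filesystems rename(2) replaces the existing `current` link atomically.
  fs.renameSync(tmpLink, path.join(root, "current"));
}
```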

@rbmedia commented Nov 7, 2018

Thanks Matt, I will consider it!
I have built some news websites with Drupal in the past, where every update had to be online within a short time (less than 2 minutes). I would love to use Gatsby for that kind of site in the future.

@KnisterPeter commented Jan 17, 2019

Any news on this issue? We plan a site with around 100k pages and incremental builds would be awesome.

@Leocn commented Jan 25, 2019

Use another path as the default static page folder, not '/public'.
After running gatsby build, copy ../public/* into that default path.
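
A sketch of that copy step, assuming `public/` holds the fresh build output and a separate directory of your choosing is what the web server serves. `mergeInto` is a hypothetical helper built on Node's `fs` module; it overwrites files with the same name and never deletes anything, which is what lets pages from earlier partial builds survive.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Recursively copy everything from `src` into `dest`, overwriting files with the same name
// but leaving files that exist only in `dest` untouched.
export function mergeInto(src: string, dest: string): void {
  fs.mkdirSync(dest, { recursive: true });
  for (const entry of fs.readdirSync(src, { withFileTypes: true })) {
    const from = path.join(src, entry.name);
    const to = path.join(dest, entry.name);
    if (entry.isDirectory()) {
      mergeInto(from, to);
    } else if (entry.isFile()) {
      fs.copyFileSync(from, to);
    }
  }
}
```

Because nothing is ever deleted, removed pages and old hashed bundles accumulate over time; an occasional full build into an empty folder would clean that up.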

@gatsbot (bot) commented Feb 17, 2019

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

Thanks for being a part of the Gatsby community! 💪💜

@gatsbot added the stale? label Feb 17, 2019

@Tawfiqh (Contributor) commented Feb 19, 2019

I still don't think this is fixed/supported in Gatsby. Any news @ TeamGatsby?

@wardpeet (Member) commented Feb 22, 2019

It's a long-standing issue because it's really hard to fix without thinking it through carefully. @Moocar has an issue open that at least gets us a step in the right direction.

@coreyward (Member) commented Feb 24, 2019

Does Gatsby currently track which GraphQL nodes are retrieved on a given page? If so, it would seem viable to add incremental rebuilds based on changes to the data. That’s half of the work, no?

The other chunk of work is providing source plugins with a cache and encouraging plugin developers to only fetch changed data where possible. In many instances this is trivial.
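
On the source-plugin side, "only fetch changed data" usually means remembering when the last sync happened and asking the CMS for a delta. A sketch assuming the CMS offers some changed-since query (represented by the injected `fetchChangedSince` callback) and reports deletions with a `deleted` flag; `CmsRecord`, `SyncState` and `incrementalSync` are hypothetical names.

```typescript
// Hypothetical record shape returned by a CMS delta endpoint.
export interface CmsRecord {
  id: string;
  updatedAt: number; // epoch milliseconds
  deleted?: boolean;
}

// What a source plugin might persist between builds in its cache (assumption).
export interface SyncState {
  lastSync: number;
  records: Map<string, CmsRecord>;
}

// Fetch only records changed since the last sync and fold them into a copy of the cached set.
export async function incrementalSync(
  state: SyncState,
  fetchChangedSince: (since: number) => Promise<CmsRecord[]>,
): Promise<{ state: SyncState; changedIds: string[] }> {
  const changed = await fetchChangedSince(state.lastSync);
  const records = new Map(state.records);
  let lastSync = state.lastSync;
  for (const rec of changed) {
    if (rec.deleted) records.delete(rec.id);
    else records.set(rec.id, rec);
    lastSync = Math.max(lastSync, rec.updatedAt);
  }
  return { state: { lastSync, records }, changedIds: changed.map((r) => r.id) };
}
```

`changedIds` is the input an incremental rebuild needs to find affected pages. Taking the watermark from the records' own timestamps avoids depending on the build machine's clock.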

@Moocar (Contributor) commented Feb 24, 2019

@coreyward Yes, Gatsby tracks every node that is returned for a query (via page-dependency-resolver.js). That's what powers gatsby develop's ability to rerun queries only for changed data. We don't currently save that information to disk, so it's not used by gatsby build yet, but that's definitely the plan.
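
To make the "save it to disk" step concrete, here is a sketch of a persisted page-to-node map and the lookup it enables: given changed node IDs, which pages need their queries rerun. The format and all names are assumptions for illustration, not what page-dependency-resolver.js stores internally.

```typescript
// Hypothetical persisted form: page path -> IDs of the nodes its query returned.
export type PageDependencies = Record<string, string[]>;

// Invert the map so a changed node ID leads straight to the pages that used it.
export function indexByNode(deps: PageDependencies): Map<string, Set<string>> {
  const byNode = new Map<string, Set<string>>();
  for (const [page, nodeIds] of Object.entries(deps)) {
    for (const id of nodeIds) {
      if (!byNode.has(id)) byNode.set(id, new Set());
      byNode.get(id)!.add(page);
    }
  }
  return byNode;
}

// Pages whose queries must be rerun after the given nodes changed, sorted for stable output.
export function dirtyPages(deps: PageDependencies, changedNodeIds: string[]): string[] {
  const byNode = indexByNode(deps);
  const dirty = new Set<string>();
  for (const id of changedNodeIds) {
    for (const page of byNode.get(id) ?? []) dirty.add(page);
  }
  return [...dirty].sort();
}

// JSON round-trip: written at the end of one build, read at the start of the next.
export const serialize = (deps: PageDependencies): string => JSON.stringify(deps);
export const deserialize = (json: string): PageDependencies => JSON.parse(json) as PageDependencies;
```

A per-node map alone misses newly created nodes that would show up in list queries (a new post on an index page, for instance), so a real implementation would also need to track dependencies on whole node types.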

@mattbloomfield commented Apr 9, 2019

I know that for our team this will be the go/no-go factor in deciding whether to use Gatsby for the 2019 rebuild of our flagship site. I'm really hoping it can be released, or at least be on the horizon, as we start building. We support hundreds of web authors editing various pieces of the site throughout the working day. When they hit save, they pretty much expect the content to be updated. It's not uncommon for them to go back just to fix a comma or change the date on a post.

@wardpeet (Member) commented Apr 11, 2019

@mattbloomfield we have more customers interested in this so we have this high up on the priority list.

@realgt commented Apr 19, 2019

We're implementing Gatsby with a Drupal 8 backend using the gatsby-source-graphql plugin, and performance hasn't been an issue so far, with ~4000 pages built in < 30 seconds. We're pulling all the data in gatsby-node rather than running thousands of StaticQuery queries, and we're bypassing image processing for now.

success run graphql queries — 3.088 s — 4008/4008 1311.56 queries/second
success write out page data — 0.070 s
success write out redirect data — 0.001 s
success Build manifest and related icons — 0.117 s
success onPostBootstrap — 0.127 s

info bootstrap finished - 15.751 s

success Building production JavaScript and CSS bundles — 3.361 s
success Building static HTML for pages — 6.906 s — 4006/4006 609.25 pages/second
info Done building in 26.047 sec
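
The "pull everything once in gatsby-node" pattern amounts to one bulk query whose result is split into per-page context objects in memory, instead of one query per page. A sketch with hypothetical field names (`Article`, `relatedIds`, `planPages` are all assumptions about your data, not part of any plugin).

```typescript
// Hypothetical shape of one article in the single bulk GraphQL result.
export interface Article {
  id: string;
  path: string;
  title: string;
  body: string;
  relatedIds: string[];
}

export interface PagePlan {
  path: string;
  context: { title: string; body: string; related: { path: string; title: string }[] };
}

// Turn one bulk result into per-page contexts, resolving relationships in memory
// and silently skipping related IDs that are not in the result.
export function planPages(articles: Article[]): PagePlan[] {
  const byId = new Map(articles.map((a) => [a.id, a] as const));
  return articles.map((a) => ({
    path: a.path,
    context: {
      title: a.title,
      body: a.body,
      related: a.relatedIds
        .map((id) => byId.get(id))
        .filter((r): r is Article => r !== undefined)
        .map((r) => ({ path: r.path, title: r.title })),
    },
  }));
}
```

Each `PagePlan` maps onto a `createPage({ path, component, context })` call, and the page template reads its data from `pageContext` instead of running its own query.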

@mpoisot commented Apr 23, 2019

I'm currently evaluating using Gatsby to speed up an old Heroku-hosted Rails 3.x site that's slow as molasses. It has about 1 million pages so incremental builds are the only way it would work. Most pages don't change so making them static feels like a huge win, but new pages are constantly added and some old pages get edited. Users expect to see the changes within seconds. My hope was to add just enough code to the Rails app to make it a JSON API server, and generate a new frontend with Gatsby, with static assets hosted somewhere like Netlify or S3.

I was thinking I would be able to do something like run an incremental Gatsby build via a job queue worker. The Rails API server knows when a page gets updated, so it would create an 'update page job' using the page_id (a key in the postgres DB), and the worker would pass that to Gatsby with an ENV var with something like PAGE_ID=1235 gatsby build. I'd use that ENV var within createPages() to look up just what's needed for that one page and build it. The resulting output file(s) would get transferred to the static host (I'm hoping there's a build hook for that). If no PAGE_ID var is set it would build all pages as usual.
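
A sketch of the PAGE_ID switch, assuming pages arrive from the Rails API as objects with a numeric `id`. `ApiPage` and `pagesToBuild` are hypothetical; inside `createPages` you would call `createPage` only for the returned pages. Taking `env` as a parameter keeps the function testable.

```typescript
// Hypothetical page record from the Rails JSON API.
export interface ApiPage {
  id: number;
  slug: string;
}

// PAGE_ID=1235 -> only that page; unset or empty -> every page (a normal full build).
export function pagesToBuild(all: ApiPage[], env: Record<string, string | undefined> = process.env): ApiPage[] {
  const raw = env.PAGE_ID;
  if (raw === undefined || raw === "") return all;
  const id = Number(raw);
  if (!Number.isInteger(id)) throw new Error(`PAGE_ID must be an integer, got "${raw}"`);
  return all.filter((p) => p.id === id);
}
```

In practice you would ask the API for just that one record rather than fetching everything and filtering, and the single-page output would still have to be placed alongside the files from earlier builds.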

If a page is deleted, the Rails API would create a job that either deletes the assets directly from the static host, or maybe there's something needed from Gatsby so I'd still run that with a different ENV variable. (I'm thinking I'd need the page's path at the minimum).
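
For deletions, the page's path is indeed the key piece of information. A sketch that maps a page path to the per-page files under public/, assuming the layout from the per-page manifest work (`<path>/index.html` plus `page-data/<path>/page-data.json`, with the root page's data under `page-data/index/`). `filesForPage` is hypothetical; check the layout against your own build output before deleting anything, and note that shared JS bundles are not covered.

```typescript
// Relative paths (under public/) belonging to one page, under the assumed per-page layout.
export function filesForPage(pagePath: string): string[] {
  const trimmed = pagePath.replace(/^\/+|\/+$/g, "");
  if (trimmed === "") return ["index.html", "page-data/index/page-data.json"];
  return [`${trimmed}/index.html`, `page-data/${trimmed}/page-data.json`];
}
```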

Am I barking up the wrong tree thinking that Gatsby is compatible with this kind of project? Thanks for any help.

@wardpeet (Member) commented May 28, 2019

We have an alpha version up. It's not incremental builds yet, but it's at least the path forward.
You can use it by installing it: npm install --save gatsby@per-page-manifest

More info:
#13004

@mpoisot per-page building isn't working yet. I'm not sure what timeframe you're looking at for this project. If the queries are light, Gatsby might be an okay fit for your site even without incremental builds.

cc @KyleAMathews @Moocar to give a better explanation of this.

@scandeezy commented Aug 21, 2019

Pinging this, as it's been a few months since the last update and this seems to be where the action is. I see that the split of page-data.json has landed, and I've been using it.

Is there a more concrete set of requirements and tasks driving this forward? I understand that it's a big problem, but it always helps if it's visibly broken down into smaller problems that can show progress and traction.

@scandeezy commented Sep 4, 2019

@wardpeet @Moocar I'm unsure who's the most appropriate person/list to ping on this, but you two were the last project members active here. Any updates on the primary goal of this ticket?

@dominicfallows (Contributor) commented Sep 4, 2019

Having a good convo with @KyleAMathews about incremental builds and how they might get delivered https://twitter.com/dominicfallows/status/1169152367964643328?s=19

@dominicfallows (Contributor) commented Sep 16, 2019

Having a good convo with @KyleAMathews about incremental builds and how they might get delivered https://twitter.com/dominicfallows/status/1169152367964643328?s=19

TL;DR:

@KyleAMathews confirmed that Gatsby is working on the hosted incremental build features for Gatsby Cloud.

A self-hosted/on-premises "Gatsby Enterprise" version with incremental builds is possible, but they are not working on it yet.

Dominic Fallows - Sep 4 - Most vendors we choose offer a self-managed/on-premises option, as Gatsby OSS does. We happily pay for those, as we would an on-premises Gatsby Enterprise Cloud solution from you.

Kyle Mathews - Sep 4 - yeah for sure — we have a pretty clear path for supporting onprem versions of what we're doing — it's all Kubernetes so it should be possible — but onprem adds a lot of overhead when we're initially just working on shipping something that works 😅

Dominic Fallows - Sep 4 - Now that is great news to hear! Sorry if I've missed that discussed elsewhere, but that onprem roadmap would be super useful for businesses and developers alike to have sight of.

Kyle Mathews - Sep 4 - It's far enough away right now that I couldn't give a timeline. Definitely not this year and wouldn't want to promise next year either. Depends on how fast we can scale revenue and our engineering team

@sielay (Contributor) commented Oct 2, 2019

It's a pity, as it blocks using Gatsby as a tool for publishers, where we're talking about millions of canonical pages plus as many index pages again.

Wouldn't it make sense to "eject" such a use case into a separate project that uses the same concepts/core?

@bob-obringer commented Oct 26, 2019

Make or break feature for 2020 decisions. Seems to be a good place to invest all that VC money 😀

@LekoArts removed the not stale label Nov 18, 2019

@AdamZaczek commented Dec 1, 2019

Gatsby does a lot of things right, but long build times make it absolutely unusable in larger projects :/ We discussed moving away from the framework this week just because of that.
Please make faster builds happen!

@mattbloomfield commented Dec 1, 2019

Agree with above! Gatsby either gets niched into a quick and easy blogging solution or implements incremental/faster builds and becomes enterprise ready.

@Vacilando commented Dec 1, 2019

Absolutely correct; bumping against this over and over on larger projects. Without incremental builds Gatsby is not an option.

@xaviemirmon (Contributor) commented Dec 1, 2019

Incremental builds on Gatsby Cloud fix these issues. You can sign up for the private beta here: https://www.gatsbyjs.com/builds-beta/

@dwightwatson commented Dec 1, 2019

Nothing about that seems to suggest it supports incremental builds though - just that it has the "fastest build times for Gatsby sites".

I'd be concerned about the implication that incremental builds would only be available on a hosted Gatsby service rather than available to be used standalone.

@xaviemirmon (Contributor) commented Dec 2, 2019

I see what you mean, @dwightwatson; there's nothing on the website that says it's "incremental." At Gatsby Days London they demoed builds, and they were definitely incremental builds. I'm not sure how it's done, though, or whether it will be a part of the Gatsby package or just a service they provide.

@agonsalves commented Dec 2, 2019

Investors gotta make their money back somehow. 🙄

@gomflo commented Dec 10, 2019

Trying to build a very large website with 140k+ pages.

gatsby build is somewhat good… but the deployment is painful (zeit.co)
