ci/pr-deploy: cache node modules and gatsby output #1182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There's a version of the docs published here: https://mdr-ci.staging.k6.io/docs/refs/pull/1182/merge. It will be deleted automatically in 30 days.
If you find this line interesting, some other things I've observed:
I've frankly resigned myself to the fact that anything involving Gatsby is slow as molasses, but I don't see a reason not to speed up the things we have control over. Using GH's cache seems like a small change for such a large improvement, so 👍 from me.
Re: your other suggestions, it makes sense to remove the upload/download of the build artifact, but realistically I see two issues here:
> The raw size of all the files that were specified for upload is 350143080 bytes
> The size of all the files that were uploaded is 96326272 bytes. This takes into account any gzip compression used to reduce the upload size, time and storage
- How is the total size of `public/` 350MB?? I get that we have many images, but this seems way too excessive. Are we sure that Gatsby's build process is properly optimized?
- The total uploaded size was just 96MB, so a lot of it is text that compresses well. Why does this take 2m42s to upload, and 5m23s(!) to download?
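On the second point, the ratio itself is unsurprising for text-heavy output: 96326272 / 350143080 ≈ 27.5%. A quick self-contained illustration (synthetic data, not the actual build output) of how well repetitive generated HTML compresses:

```python
import gzip

# Highly repetitive text, like generated HTML pages, compresses very well.
html_like = ('<div class="doc-page"><p>example paragraph</p></div>\n' * 10_000).encode()

compressed = gzip.compress(html_like)
ratio = len(compressed) / len(html_like)
print(f"raw: {len(html_like)} bytes, gzipped: {len(compressed)} bytes, ratio: {ratio:.3f}")

# The artifact upload saw a milder but similar overall effect,
# since images in public/ are already compressed:
artifact_ratio = 96326272 / 350143080
print(f"artifact compression ratio: {artifact_ratio:.3f}")
```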
Considering that there were a couple of these in the upload step:

> A 503 status code has been received, will attempt to retry the upload
> Exponential backoff for retry #1. Waiting for 6748 milliseconds before continuing the upload at offset 0

...I would place the blame for the slowness entirely on GH's infrastructure. I mean, downloading at ~300KB/s is ridiculous, considering this is all happening on GH's servers.
So yes, we should probably still make the change of avoiding uploading and downloading the artifact, but there's only so much we can do to work around infrastructure issues.
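For reference, the "exponential backoff" in that log line is the standard retry pattern: double the wait after each failed attempt, plus random jitter so concurrent clients don't retry in lockstep. A generic sketch (the base and cap values here are made up, not what the upload action actually uses):

```python
import random

def backoff_delay_ms(attempt: int, base_ms: int = 3000, cap_ms: int = 60_000) -> float:
    """Exponential backoff with jitter: wait longer after each failed attempt."""
    # Double the base delay for every retry, but never exceed the cap.
    exp = min(cap_ms, base_ms * (2 ** attempt))
    # Add up to 50% random jitter to spread out concurrent retries.
    return exp + random.uniform(0, exp / 2)

# With a 3s base, retry #1 lands somewhere in the 6000-9000ms range,
# consistent with the "Waiting for 6748 milliseconds" log line above.
for attempt in range(1, 4):
    print(f"retry #{attempt}: ~{backoff_delay_ms(attempt):.0f} ms")
```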
As for the rclone suggestion: makes sense 👍. I'd still like us to look into why Gatsby's `public/` directory is so large, though.
This is certainly strange. If I build the site locally (with the script in #1181, which AFAICT should be the same) I don't get 350MB, I get 136.7M. The next culprit, `javascript-api`, I have no idea what it is.
I definitely agree, but I would read that as another point in favor of dropping the upload/download flow altogether. Either way, I'm happy to see you find this useful! I'll merge this for now and see if I can spare an hour merging the two jobs, so we can get rid of the artifact up/down overhead.
This will create a different cache for each PR, and restore the most recent cache (that still matches the Gatsby config files) if there is no cache entry for a particular PR yet. This keeps the cache fresh: the cache is not re-uploaded when `key` hits, but _is_ re-uploaded when the restore falls back to `restore-keys`. Since the most recent matching cache is fetched via `restore-keys`, we keep refreshing the cache instead of always reusing one that could have been generated a long time ago.
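That key/restore-keys behavior maps onto an `actions/cache` step roughly like the following sketch (the paths and key names are illustrative, not necessarily what this PR's workflow uses):

```yaml
- uses: actions/cache@v3
  with:
    path: |
      node_modules
      .cache
      public
    # Exact hit on `key`: the cache is restored and NOT re-uploaded.
    key: gatsby-${{ hashFiles('package.json', 'gatsby-config.js', 'gatsby-node.js') }}-${{ github.ref }}
    # Prefix fallback: the most recent cache built from the same config
    # files is restored, and a fresh cache IS uploaded under the full key.
    restore-keys: |
      gatsby-${{ hashFiles('package.json', 'gatsby-config.js', 'gatsby-node.js') }}-
```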
I have added the GitHub ref to the cache ID to allow better cache recycling; see the full rationale in the commit message 6259aea. This still allows the cache to be used across PRs.
LGTM 👍 Thanks for taking the initiative to look into this 🙇
> I don't get 350MB, I get 136.7M.
I'm not able to build the project locally, because of some obscure GraphQL error I have no intention to dig into now, but yeah, this is strange.
> The next culprit, `javascript-api`, I have no idea what it is.
Ah, yes, this is the way we version JS API documentation. 😞 I mentioned some drawbacks in #966, but apparently it doesn't bother anyone else...
In any case, don't worry too much about it. If we proceed with #1183, then the size of `public/` shouldn't be a major issue anymore. I still think someone knowledgeable with Gatsby should look into whether we can optimize both the build speed and the final size, but if it were up to me I'd get rid of Gatsby altogether. It's a painful project to work with in many ways, and we'd be better off without it.
@imiric RE:
I also faced a weird GraphQL error, and the culprit was (🥁) not setting
Ah, right, thanks for the tip. I see that we set these in the CI job, otherwise it fails. So it built after 6m22s, and
Unsurprisingly,
This looks fishy. Even 350MB shouldn't take over 5 minutes to download.
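A quick sanity check on those numbers, using the figures quoted earlier in the thread (96326272 bytes downloaded in 5m23s):

```python
# Figures quoted earlier in the thread.
transferred_bytes = 96_326_272
duration_s = 5 * 60 + 23  # 5m23s = 323 seconds

rate_kb_s = transferred_bytes / duration_s / 1000
print(f"effective download rate: ~{rate_kb_s:.0f} KB/s")
```

That works out to roughly 300 KB/s, matching the estimate above.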
While working on #1180 I noticed PR builds took more than 10 minutes, which was a bit of an annoying wait. It should be possible to reduce this time by leveraging https://github.com/actions/cache to restore the Gatsby build cache across builds.
The current key should make this cache reusable across all PRs in the repo that do not modify `package.json`, `gatsby-config.js`, or `gatsby-node.js`.

Before
After
Summary
This PR cuts build time by ~10 minutes, and that saving should apply even the first time a PR is opened.