
Conversation

nadiamoe (Member) commented May 18, 2023

While working on #1180 I noticed the PR build took more than 10 minutes, which was a bit of an annoying wait. It should be possible to reduce this time by leveraging https://github.com/actions/cache to restore the gatsby build cache across builds.

The current key should make this cache reusable across all PRs in the repo that do not modify package.json, gatsby-config.js or gatsby-node.js.
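A minimal sketch of such a caching step (the paths and key name here are assumptions for illustration, not the PR's actual workflow):

```yaml
- name: Restore Gatsby build cache
  uses: actions/cache@v3
  with:
    # Gatsby keeps its incremental build state in .cache/ and public/.
    path: |
      .cache
      public
    # Invalidate the cache whenever the build configuration changes.
    key: gatsby-${{ hashFiles('package.json', 'gatsby-config.js', 'gatsby-node.js') }}
```

With a key like this, any PR that leaves those three files untouched restores the same cache entry.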

Before

[screenshot: workflow run duration before the cache]

After

[screenshot: workflow run duration after the cache]

Summary

This PR cuts build time by ~10 minutes, and that saving should apply even to the first build of a newly opened PR.

github-actions (bot) commented:

There's a version of the docs published here:

https://mdr-ci.staging.k6.io/docs/refs/pull/1182/merge

It will be deleted automatically in 30 days.

nadiamoe (Member, Author) commented May 18, 2023

If you find this line of work interesting, here are some other things I've observed:

  • Build and deploy being done in two different jobs causes the whole site to be uploaded to GH artifacts just to be downloaded again, wasting a 7-minute round-trip.
    • Those 7 minutes could be saved relatively easily by doing build and deploy in the same job.
  • When the site is uploaded to S3 for preview, the whole site is rewritten on subsequent builds of the same PR, as all the files have a different timestamp.
    • It should be possible to either leverage the cache so the build output does not change, or use rclone with checksumming to better detect which files have changed. This could save maybe 1m30s out of the 2m spent uploading to S3.
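The two ideas above, combined, might look roughly like this single workflow job (job names, bucket, and rclone remote are hypothetical; rclone's `--checksum` flag compares file contents rather than timestamps, so unchanged files are skipped):

```yaml
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # Build and deploy in the same job: no artifact upload/download round-trip.
      - run: npm ci && npm run build
      # Sync by checksum so files that did not change between builds of the
      # same PR are not re-uploaded despite their fresh timestamps.
      - run: rclone sync --checksum public/ s3:docs-preview-bucket/pr-preview
```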

@nadiamoe nadiamoe marked this pull request as ready for review May 18, 2023 16:48
imiric (Contributor) left a comment


I've frankly resigned myself to the fact that anything with Gatsby is slow as molasses, but I don't see a reason not to speed up the things we have control over. Using GH's cache seems like a small change for such a large improvement, so 👍 from me.

Re: your other suggestions, it makes sense to remove the upload/download of the build artifact, but realistically I see two issues here:

The raw size of all the files that were specified for upload is 350143080 bytes
The size of all the files that were uploaded is 96326272 bytes. This takes into account any gzip compression used to reduce the upload size, time and storage
  1. How is the total size of public/ 350MB?? I get that we have many images, but this seems way too excessive. Are we sure that Gatsby's build process is properly optimized?

  2. The total uploaded size was just 96MB, so a lot of it is text that compresses well. Why does this take 2m42s to upload, and 5m23s(!) to download??

Considering that there were a couple of these in the upload step:

A 503 status code has been received, will attempt to retry the upload
Exponential backoff for retry #1. Waiting for 6748 milliseconds before continuing the upload at offset 0

... I would place the blame for the slowness entirely on GH's infrastructure. I mean, downloading at ~300KBps is ridiculous, considering this is all happening on GH's servers.
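As a quick sanity check of that figure, using the 96 MB compressed size and the 5m23s download time quoted above:

```shell
# 96 MiB downloaded in 5m23s (323 s), integer KiB/s:
echo "$((96 * 1024 / 323)) KiB/s"   # prints "304 KiB/s"
```

which indeed works out to roughly 300 KiB/s.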

So yes, we should probably still make the change of avoiding uploading and downloading the artifact, but there's only so much we can do to work around infrastructure issues.

As for the rclone suggestion: makes sense 👍. I'd still like us to look into why Gatsby's public/ directory is so large, though.

nadiamoe (Member, Author) commented:

How is the total size of public/ 350MB?? I get that we have many images, but this seems way too excessive. Are we sure that Gatsby's build process is properly optimized?

This is certainly strange. If I build the site locally (with the script in #1181, which AFAICT should be the same) I don't get 350MB, I get 136.7M. The distribution looks like the following:

136.7M	.
52.6M	./static
29.1M	./javascript-api
17.0M	./v0.43
12.4M	./page-data
5.0M	./es
4.1M	./cloud
2.9M	./using-k6
1.5M	./examples
1.4M	./results-output
1.3M	./446d23555d09c86812ebe7234e206c08
1007.0K	./extensions
556.0K	./integrations
548.0K	./test-types
494.5K	./images
479.0K	./misc
425.0K	./testing-guides
398.0K	./get-started
349.0K	./using-k6-browser
337.0K	./test-authoring
154.5K	./~partytown
85.0K	./404
80.5K	./icons
29.5K	./sitemap

/static consists mostly of images in (as far as I can see) WebP format. Perhaps there is a way to compress them further, e.g. 95%-quality JPEG, or resize them to a maximum size, but that may degrade the experience, so it should be done with care.

The next culprit, javascript-api, I have no idea what it is. But it seems that inside v0.43 there is another javascript-api with similar, but not identical, contents. Perhaps someone with more gatsby knowledge than I have may be able to make sense out of that 😅
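For reference, a breakdown like the one above can be produced with `du` (a hypothetical invocation; the exact command used isn't stated in the thread):

```shell
# Per-directory sizes of the Gatsby output, one level deep, largest first.
du -h --max-depth=1 public | sort -rh
```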

... I would place the blame for the slowness entirely on GH's infrastructure. I mean, downloading at ~300KBps is ridiculous, considering this is all happening on GH's servers.

I definitely agree, but I would read that as another point in favor of dropping the upload/download flow altogether.

Either way, I'm happy to see you find this useful! I'll merge this for now and see if I can spare one hour merging the two jobs, so we can get rid of the artifact up/down overhead. rclone seems tempting but may require more trial and error, so I'd say we can tackle that if we see there's a big opportunity for time savings.

nadiamoe added 2 commits May 18, 2023 21:11
This will create a different cache for each PR, and restore the most recent cache (that still matches the gatsby config files) if there is no cache entry for a particular PR.
This helps keep the cache fresh: the cache is not re-uploaded when `key` hits, but it _is_ re-uploaded when only a `restore-keys` entry matches. As the most recent matching cache is the one fetched via `restore-keys`, this ensures we keep refreshing the cache instead of always reusing one that could have been generated a long time ago.
nadiamoe (Member, Author) commented:

I have added the github ref to the cache ID to allow better cache recycling, see the full rationale in the commit message 6259aea. This still allows for the cache to be used across PRs.
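The resulting cache configuration might look roughly like this (a sketch of the idea; the actual key format in the workflow may differ):

```yaml
- uses: actions/cache@v3
  with:
    path: |
      .cache
      public
    # The exact key includes the ref, so each PR saves its own refreshed cache.
    key: gatsby-${{ hashFiles('package.json', 'gatsby-config.js', 'gatsby-node.js') }}-${{ github.ref }}
    # On a cache miss, fall back to the most recent cache from any ref that
    # was built with the same gatsby config files.
    restore-keys: |
      gatsby-${{ hashFiles('package.json', 'gatsby-config.js', 'gatsby-node.js') }}-
```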

imiric (Contributor) left a comment


LGTM 👍 Thanks for taking the initiative to look into this 🙇

I don't get 350MB, I get 136.7M.

I'm not able to build the project locally, because of some obscure GraphQL error I have no intention to dig into now, but yeah, this is strange.

The next culprit, javascript-api, I have no idea what it is.

Ah, yes, this is the way we version JS API documentation. 😞 I mentioned some drawbacks in #966, but apparently it doesn't bother anyone else...

In any case, don't worry too much about it. If we proceed with #1183, then the size of public/ shouldn't be a major issue anymore. I still think someone knowledgeable with Gatsby should look into this and see if we can optimize both the build speed and final size, but if it was up to me I'd get rid of Gatsby altogether. It's a painful project to work with in many ways, and we'd be better off without it.

@nadiamoe nadiamoe merged commit 5bc509d into main May 22, 2023
@nadiamoe nadiamoe deleted the gha-cache branch May 22, 2023 09:32
nadiamoe (Member, Author) commented:

@imiric RE:

I'm not able to build the project locally, because of some obscure GraphQL error I have no intention to dig into now, but yeah, this is strange.

I also faced a weird graphql error and the culprit was (🥁) not setting GATSBY_DEFAULT_MAIN_URL and GATSBY_DEFAULT_DOC_URL, in case it helps. Don't ask me why because I don't have the slightest clue 😅
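In other words, something like this before building (the URL values below are placeholders, not the real ones):

```shell
# Gatsby derives siteUrl from these; leaving them unset triggers the
# (confusing) GraphQL error. Values are placeholders.
export GATSBY_DEFAULT_MAIN_URL="https://example.com"
export GATSBY_DEFAULT_DOC_URL="https://example.com/docs"
# ...then build as usual, e.g. npm run build
```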

imiric (Contributor) commented May 22, 2023

Ah, right, thanks for the tip. I see that we set these in the CI job, otherwise siteUrl would be empty, which is what Gatsby complains about.

So it built after 6m22s, and public/ does take 357MB on my machine. Here are the top 20 largest directories:

11M     ./v0.43/javascript-api/k6-experimental/redis
11M     ./v0.43/javascript-api/k6-experimental/redis/client
12M     ./page-data/javascript-api/k6-experimental/browser
14M     ./cloud
15M     ./javascript-api/k6-experimental/browser/page
15M     ./v0.43/javascript-api/k6-experimental/browser
16M     ./javascript-api/jslib
17M     ./es
19M     ./page-data/javascript-api/k6-experimental
21M     ./page-data/v0.43
21M     ./page-data/v0.43/javascript-api
30M     ./v0.43/javascript-api/k6-experimental
31M     ./javascript-api/k6-experimental/browser
36M     ./page-data/javascript-api
51M     ./javascript-api/k6-experimental
57M     ./static
61M     ./v0.43
61M     ./v0.43/javascript-api
66M     ./page-data
103M    ./javascript-api
357M    .

Unsurprisingly, javascript-api is the largest.

pablochacin (Contributor) commented:

The total uploaded size was just 96MB, so a lot of it is text that compresses well. Why does this take 2m42s to upload, and 5m23s(!) to download??

This looks fishy. Even 350 MB shouldn't take over 5 minutes to download.

Labels: Area: browser (The browser module)
4 participants