
Higher fidelity GitHub Actions cache server fake#232

Merged
aomarks merged 9 commits into main from github-cache-testing on May 15, 2022


@aomarks aomarks commented May 14, 2022

I'm working on a clean implementation of our GitHub Actions caching client. It will be much simpler because we'll have full control over cache keys and tarball generation, and it will significantly reduce our install size and dependency count.

As part of this, I wanted to improve the fidelity of the GitHub Actions cache server fake that we use in testing, and add an extra test. The fake now:

  • Returns 201 and 204 status codes in some cases
  • Has a unique identifier in the base path
  • Requires that you provide an expected total size when you commit a cache entry and validates it
  • Handles splitting up upload requests into multiple chunks
  • Validates that each upload chunk is <= 32MB
  • Validates Content-Range headers when tarball chunks are uploaded
  • Validates Content-Type, Accept, Transfer-Encoding, and User-Agent headers
  • Runs on HTTPS
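
To illustrate the chunk-size and Content-Range checks listed above, here is a minimal sketch of how a fake could validate one upload chunk. The helper name `validateChunk`, the return shape, and the `bytes <start>-<end>/*` header format are assumptions for illustration; the actual fake in this PR may be structured differently.

```typescript
// Hypothetical helper for a fake cache server; names are illustrative only.
const MAX_CHUNK_BYTES = 32 * 1024 * 1024; // 32MB limit, per the list above

interface ByteRange {
  start: number;
  end: number; // inclusive
}

/**
 * Parse a header assumed to look like "bytes <start>-<end>/*" and check that
 * it agrees with the number of bytes actually received. Returns the parsed
 * range, or an error message describing why the chunk should be rejected.
 */
function validateChunk(
  contentRange: string | undefined,
  bodyLength: number
): ByteRange | {error: string} {
  if (bodyLength > MAX_CHUNK_BYTES) {
    return {error: `chunk of ${bodyLength} bytes exceeds the 32MB limit`};
  }
  const match = /^bytes (\d+)-(\d+)\/\*$/.exec(contentRange ?? '');
  if (match === null) {
    return {error: `malformed Content-Range: ${String(contentRange)}`};
  }
  const start = Number(match[1]);
  const end = Number(match[2]);
  if (end < start || end - start + 1 !== bodyLength) {
    return {error: 'Content-Range does not match the body length'};
  }
  return {start, end};
}
```

A fake could respond with a 4xx status whenever an `error` comes back, so that a client sending a bad range fails in tests rather than against the live service.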

Part of #107

@aomarks aomarks requested a review from rictic May 14, 2022 18:23
@aomarks aomarks force-pushed the github-cache-testing branch from bd2c0e5 to 9164f75 Compare May 14, 2022 18:34
@aomarks aomarks force-pushed the github-cache-testing branch from 543eea6 to c9b1706 Compare May 14, 2022 20:31
@aomarks aomarks merged commit 607c8ed into main May 15, 2022
@aomarks aomarks deleted the github-cache-testing branch May 15, 2022 15:59
aomarks added a commit that referenced this pull request May 15, 2022
Replaces [`@actions/cache`](https://github.com/actions/toolkit/tree/main/packages/cache) with our own implementation for interacting with the GitHub Actions cache service.

### API mismatch with @actions/cache

`@actions/cache` takes a key and a set of glob patterns, and the literal glob patterns themselves are automatically used to generate a cache "version". The key + version then serve as a compound key for the cache entry.
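
As a rough sketch of the compound-key idea, the snippet below derives a "version" by hashing the literal pattern strings. The function names, the SHA-256 choice, and the `|` separator are assumptions for illustration; the exact inputs `@actions/cache` hashes may differ.

```typescript
import {createHash} from 'node:crypto';

/**
 * Hypothetical: derive a cache "version" from the literal pattern strings.
 * Any change to the patterns yields a different version.
 */
function versionFromPatterns(patterns: readonly string[]): string {
  return createHash('sha256').update(patterns.join('|')).digest('hex');
}

/** The key and version together identify a cache entry. */
function compoundKey(
  key: string,
  patterns: readonly string[]
): {key: string; version: string} {
  return {key, version: versionFromPatterns(patterns)};
}
```

Under this scheme, an entry saved with one list of patterns can only be restored by a caller that passes exactly the same list, even when the key is identical.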

This didn't work well for us because the glob implementation it uses doesn't support exclusions the way we need, so it is incompatible with our glob behavior, and may also be different in other subtle ways. We really just want to provide a list of files/directories for the tarball, and set the key independently.

But if we list all the files directly, we can never get a cache hit: the full list of files would become part of the cache key, and we can't know that list without running the script, which is exactly what restoring from cache is meant to avoid!

### The workaround we had before

The workaround we had before was to reach into the `internal/` directory of `@actions/cache` and call some of the lower-level functions directly, which let us control exactly what the cache key + version were, while still passing an explicit list of files to `tar`.

This worked ok, but had a high risk of breakage, because it uses non-public APIs. #227 is an example of us breaking because of that. It also required us to write some of our own typings, and to turn off `skipLibCheck` in our `tsconfig.json`. It was still quite a lot of code, too.

In addition, we still had to deal with an edge case relating to empty directories. By default, `tar` includes all recursive contents of a directory, so if we wanted to cache an empty directory that wasn't actually empty on disk, we had to resort to a weird "empty directories manifest" hack. But now we can just use the `tar --no-recursion` flag, which avoids that whole issue.
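
A minimal sketch of the `--no-recursion` approach, assuming a `tar` binary on the `PATH` that supports the long options shown. The helper names `tarArgs` and `createTarball` are hypothetical, and the flags this project actually passes may differ.

```typescript
import {execFile} from 'node:child_process';
import {promisify} from 'node:util';

const execFileAsync = promisify(execFile);

/**
 * Build tar arguments for an explicit list of paths. --no-recursion is placed
 * before the paths so that a listed directory is archived as a single entry,
 * without whatever it happens to contain on disk.
 */
function tarArgs(tarballPath: string, paths: readonly string[]): string[] {
  return [
    '--create',
    '--gzip',
    '--file',
    tarballPath,
    '--no-recursion',
    ...paths,
  ];
}

/** Run tar with the arguments above; rejects if tar exits non-zero. */
async function createTarball(
  tarballPath: string,
  paths: readonly string[]
): Promise<void> {
  await execFileAsync('tar', tarArgs(tarballPath, paths));
}
```

With recursion disabled, every file and directory to be cached has to appear in `paths` explicitly, which is what makes it possible to store a directory as empty while ignoring its on-disk contents.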

### This PR

This PR completely replaces `@actions/cache` with a custom implementation that uses `https.request` and `execFile('tar', ...)`.

As a big bonus, **our install size decreased from 25MB to 2.4MB** and our **transitive dependency count decreased from 93 to 29**.

### Testing

- In #232 I improved our fake in a number of ways to make it higher fidelity, such as validating headers, and handling multiple upload chunks.

- I read through the implementation details in https://github.com/actions/toolkit/tree/main/packages/cache to make sure we are compatible, picking the same chunk sizes, etc.

- https://github.com/google/wireit/actions/runs/2325580687 is a run that tests this PR against the live servers by installing from an `npm pack`ed version. https://github.com/google/wireit/actions/runs/2325616423 is a follow-up re-run, to show that everything got restored from cache.
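
The multi-chunk upload mentioned in the first bullet above can be sketched on the client side as follows. The `chunkRanges` helper and the `bytes <start>-<end>/*` header format are assumptions for illustration; the 32MB default mirrors the limit the fake enforces.

```typescript
interface ChunkRange {
  start: number;
  end: number; // inclusive
  header: string; // value for the Content-Range request header
}

/**
 * Hypothetical: split a tarball of totalBytes into consecutive ranges of at
 * most maxChunkBytes each, in upload order.
 */
function chunkRanges(
  totalBytes: number,
  maxChunkBytes: number = 32 * 1024 * 1024
): ChunkRange[] {
  if (!Number.isInteger(maxChunkBytes) || maxChunkBytes <= 0) {
    throw new RangeError('maxChunkBytes must be a positive integer');
  }
  const ranges: ChunkRange[] = [];
  for (let start = 0; start < totalBytes; start += maxChunkBytes) {
    const end = Math.min(start + maxChunkBytes, totalBytes) - 1;
    ranges.push({start, end, header: `bytes ${start}-${end}/*`});
  }
  return ranges;
}
```

For example, `chunkRanges(10, 4)` yields the ranges 0-3, 4-7, and 8-9. After sending every chunk, the client would commit the entry with the total size so the server can validate it.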

Fixes #107