Remove quickstart as base image and TARGET_NETWORK for local embedded network #147
Conversation
… uses an external rpc server url via TARGET_NETWORK_RPC_URL
.github/workflows/test-workflow.yml
Outdated
if: needs.prepare-config.outputs.registry_allowed == 'true'
run: |
  docker tag $SYSTEM_TEST_IMAGE ${{ needs.prepare-config.outputs.system-test-image-cache-registry }}/$SYSTEM_TEST_IMAGE
  docker push ${{ needs.prepare-config.outputs.system-test-image-cache-registry }}/$SYSTEM_TEST_IMAGE
Do we get enough benefit to ship system test to dockerhub? I see there's also code in here for using artifacts. Would using the GitHub cache suffice instead of dockerhub? That's what we do in quickstart.
The cache doesn't work on forked repos, same as ghcr.io. The public dockerhub allows forked and non-forked PRs to access the same cache, but only the non-forked PRs can push to the cache since they have the GH token.
See below. Forks can access the cache. It's important to note that GitHub is pretty protective of the cache, in a good way, and so new PRs only access cache entries from their own branch and the base branch (e.g. main). So cache entries from one PR don't affect other PRs.
Additionally, forks of a repository can create pull requests on the base branch and access caches on the base branch.
https://docs.github.com/en/actions/reference/workflows-and-actions/dependency-caching
Another thing to note is that when the quickstart workflow runs as part of this repo's workflows, the caches it creates are all stored inside the system-test repo.
These limitations are all good things though: sharing caches across PRs and across repos is a recipe for cache poisoning, exploits, and surprising behavior.
Ok, I need to understand the 10 GB cache quota per repo. In the case of nested callable workflows, is that enforced on the original initiating repo or on the nearest calling repo? As in this case: stellar-rpc -> system-test-workflow -> quickstart-workflow.
If the quota is enforced on one single repo in this case, and both the quickstart and system-test flows are now storing images in the cache, it seems like we may hit the 10 GB quota quickly and/or evictions will negate some of the gains. The system-test image is about 3 GB.
I will try pushing up a change for caching and see how it runs, as the advantages mentioned and the simplicity are nice.
The cache is stored in the calling repo, not the called repo.
For example, look at the cache on the stellar/rpc repo, it has images there created by the quickstart workflow: https://github.com/stellar/stellar-rpc/actions/caches
This means first use of the same images will result in a rebuild, but it's not that big of a deal imo. If that's a problem, there are some safer things we can do with dockerhub than pushing everything to it. For example, we could push all the intermediary images to dockerhub from the quickstart main branch only, and then pull those for layer reuse. That would be a bit safer than using dockerhub to reuse the images by name.
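A rough sketch of what that alternative could look like, assuming a buildx registry cache ref on dockerhub; the buildcache ref, image name, and tag below are placeholders, not anything defined in quickstart or this PR. PR builds would only read the cache, while a main-branch-only job with dockerhub credentials would refresh it.

```yaml
# Hypothetical sketch: the buildcache ref and tags are placeholders.
- uses: docker/setup-buildx-action@v3
- name: Build, reusing layers previously cached from the main branch
  run: |
    docker buildx build \
      --cache-from type=registry,ref=docker.io/stellar/quickstart:buildcache \
      --load \
      -t stellar/quickstart:dev \
      .
# A main-branch-only step (after docker login) would additionally pass:
#   --cache-to type=registry,ref=docker.io/stellar/quickstart:buildcache,mode=max
```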
I can compress the docker tar file before caching to gha, which helps:
$ docker save stellar/system-test:cache_b54d49fa4159ec38 | zstd -19 -T0 -o myimage.tar.zstd
/*stdin*\ : 18.81% ( 3.08 GiB => 593 MiB, myimage.tar.zstd)
Cache capacity needed depends on how many relevant cache versions are present. For the system-test workflow, the cache is keyed on permutations of the system-test, stellar-cli, and js-stellar-sdk refs. Usage from stellar-rpc wouldn't change those much, as most PRs don't change the e2e.yml.
> we could push all the intermediary images to dockerhub from the quickstart main branch only, and then pull those for layer reuse. That would be a bit safer than using dockerhub to reuse the images by name.
I think I could do docker buildx with --cache-from=type=gha and --cache-to=type=gha right here in the workflow and not do any explicit cache key checks; instead, just run the system-test image build every time in the workflow, and the duration will depend on how many layers it finds in the cache. This approach would achieve the same obfuscation and not incur conflating changes to quickstart?
I'll try this out and start with compression also, --cache-to=type=gha,mode=max,compression=zstd, and hopefully see a similar reduction to what I noted above. Later, if we see the cache quota maxing out, we could revisit this and consider type=registry,ref=docker.io/stellar/system-test:cache.
> I think I could do docker buildx with --cache-from=type=gha and --cache-to=type=gha right here in workflow
I tried using the gha cache on the quickstart repo itself in stellar/quickstart#815, but it's not a good fit for that repo's build process. It resulted in inefficient cache usage that wasn't only a size problem but also slowed the build down.
It might work well for this repo, but just something to watch out for: you won't have any control over what gets cached, and inefficient cache usage can slow a build down a lot.
Yes, upon consecutive reruns of this workflow from stellar-rpc/520, wherein no source code versions are changed, the docker builds appear to still be building layers that I would have expected to be present in the gha cache from prior runs. I will look into verbose logging on docker and debug for a root cause on this.
@leighmcculloch, I found the immediate issue with type=gha not working here: it was due to not including the crazy-max plugin to expose the GH runtime settings to buildkit. After using that, I was able to see some export-to-cache activity logged during builds via --cache-to, but never imports from cache via --cache-from. In case it was just an issue with gha or the builder, I tried type=local and manually running the buildkit daemon as a container using the moby:latest image instead of using docker/setup-buildx-action, and still saw the same one-way cache behavior. It seems like there's a lower-level issue between the buildkitd process, builders using that daemon, and the docker daemon on the runner, as caching doesn't work between them.
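For reference, a minimal sketch of the setup described above, assuming the "crazy-max plugin" refers to the crazy-max/ghaction-github-runtime action; the version pins, image name, and tag are illustrative, not this workflow's actual values.

```yaml
# Exposes the ACTIONS_* runtime env vars so BuildKit's gha cache backend can
# reach the Actions cache service from a plain `docker buildx build` run step.
- uses: crazy-max/ghaction-github-runtime@v3
- uses: docker/setup-buildx-action@v3
- name: Build with the GHA BuildKit layer cache (illustrative tag)
  run: |
    docker buildx build \
      --cache-from type=gha \
      --cache-to type=gha,mode=max,compression=zstd \
      --load \
      -t stellar/system-test:dev \
      .
```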
So, I reverted back to caching the system-test and cli images individually, and this callable workflow now has the expected cache-hit behavior, verified by invoking the workflow and seeing cache hits on a stellar-rpc/e2e run.
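A simplified sketch of that per-image caching pattern, assuming actions/cache keyed on the input refs and a zstd-compressed image tar; the input names, cache key, paths, and tag are illustrative only, not the workflow's actual values.

```yaml
- name: Cache system-test image tar
  id: image-cache
  uses: actions/cache@v4
  with:
    path: /tmp/image-cache/system-test.tar.zstd
    # hypothetical key built from the system-test, stellar-cli, and js-stellar-sdk refs
    key: system-test-image-${{ inputs.system-test-ref }}-${{ inputs.stellar-cli-ref }}-${{ inputs.js-stellar-sdk-ref }}
- name: Build and save the image on a cache miss
  if: steps.image-cache.outputs.cache-hit != 'true'
  run: |
    mkdir -p /tmp/image-cache
    docker build -t stellar/system-test:dev .
    docker save stellar/system-test:dev | zstd -T0 -o /tmp/image-cache/system-test.tar.zstd
- name: Load the image on a cache hit
  if: steps.image-cache.outputs.cache-hit == 'true'
  run: zstd -d -c /tmp/image-cache/system-test.tar.zstd | docker load
```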
If you can re-review this PR for the final cache setup in here, then once it merges I'll get stellar-rpc/520 pointed at system-test:master and move that up for review also. Thanks!
@leighmcculloch, bumping this in case it was lost in the emails; I was asking for a re-review of the final caching strategy I used, thanks!
…n image build when running tests
…ner host cache dir
…-test image cache key
leighmcculloch
left a comment
Looks good. Looks like this reduces the infra in here a lot, which is a good thing.
In terms of the long term vision, system-test is less clear to me. It depends how the tests get built out, how the tests evolve here, and what gets put here to test. Looking forward to seeing how that plays out.
Remove quickstart as a base image and all other references to it. This also means removing TARGET_NETWORK and the option of running the standalone network locally in the system-test image; instead, tests must now provide TARGET_NETWORK_RPC_URL pointing at a remote rpc url.
Add a callable workflow for system-test build/run.
Closes #92
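For illustration, a hypothetical example of how a downstream repo such as stellar-rpc might call the callable workflow and point it at a remote rpc; this assumes the test-workflow.yml touched in this PR is the reusable entry point, and the input name, ref, and URL are placeholders.

```yaml
jobs:
  system-test:
    # hypothetical invocation; input name and URL are placeholders
    uses: stellar/system-test/.github/workflows/test-workflow.yml@master
    with:
      target-network-rpc-url: https://rpc.example.org
```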