Conversation

@sreuland (Contributor)

Remove quickstart as a base image, along with all other references to it. This also means removing the TARGET_NETWORK option for running the standalone network locally in the system-test image; instead, tests must now provide TARGET_NETWORK_RPC_URL with a remote RPC URL.

Add a callable workflow for system-test build/run.

Closes #92
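For illustration only, a hypothetical invocation against a remote RPC endpoint; the image tag, URL, and the exact way the image consumes the variable are placeholders, not an interface defined by this PR:

    $ docker run --rm \
        -e TARGET_NETWORK_RPC_URL="https://rpc.example.org" \
        stellar/system-test:dev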

@sreuland marked this pull request as ready for review on October 15, 2025 at 17:38.
Review thread on the workflow step that tags and pushes the system-test image to the cache registry:

      if: needs.prepare-config.outputs.registry_allowed == 'true'
      run: |
        docker tag $SYSTEM_TEST_IMAGE ${{ needs.prepare-config.outputs.system-test-image-cache-registry }}/$SYSTEM_TEST_IMAGE
        docker push ${{ needs.prepare-config.outputs.system-test-image-cache-registry }}/$SYSTEM_TEST_IMAGE
Member

Do we get enough benefit from shipping the system-test image to Docker Hub? I see there's also code in here for using artifacts. Would using the GitHub cache suffice instead of Docker Hub? That's what we do in quickstart.

Contributor Author (@sreuland)

The cache doesn't work on forked repos, same as ghcr.io. The public Docker Hub lets forked and non-forked PRs read the same cache, but only the non-forked PRs can push to it, since they have the GitHub token.
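A rough sketch of that pattern, with illustrative step names, image tags, and fork check (not the exact steps in this PR): every PR can pull the public cache image, but only runs holding repository credentials push an update back.

    - name: Pull shared cache image (works for forked and non-forked PRs)
      run: docker pull docker.io/stellar/system-test:cache || true

    - name: Push refreshed cache image (only runs that have registry credentials)
      if: github.event.pull_request.head.repo.fork == false
      run: |
        docker tag stellar/system-test:dev docker.io/stellar/system-test:cache
        docker push docker.io/stellar/system-test:cache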

Member @leighmcculloch (Oct 17, 2025)

See below. Forks can access the cache. It's important to note that GitHub is pretty protective of the cache, in a good way: new PRs only access cache entries from their own branch and from the base branch (e.g. main), so cache entries from one PR don't affect other PRs.

> Additionally, forks of a repository can create pull requests on the base branch and access caches on the base branch.

https://docs.github.com/en/actions/reference/workflows-and-actions/dependency-caching

Another thing to note: when the quickstart workflow runs in the system-test repo, the caches it creates are stored inside the system-test repo.

These limitations are all good things, though. Sharing caches across PRs and across repos is a recipe for cache poisoning, exploits, and surprising behavior.

Contributor Author (@sreuland)

OK, I need to understand the 10 GB cache quota per repo. In the case of nested callable workflows, is it enforced on the original initiating repo or on the nearest calling repo? The chain here is stellar-rpc -> system-test workflow -> quickstart workflow.

If the quota is enforced on a single repo in this case, and both the quickstart and now the system-test flows are storing images in the cache, it seems like we may hit the 10 GB quota quickly, and/or evictions will negate some of the gains. The system-test image is about 3 GB.

I will try pushing up a change for caching and see how it runs, since the advantages mentioned and the simplicity are nice.

Member

The cache is stored in the calling repo, not the called repo.

For example, look at the cache on the stellar/stellar-rpc repo; it has images there created by the quickstart workflow: https://github.com/stellar/stellar-rpc/actions/caches

This means the first use of the same images will result in a rebuild, but that's not a big deal imo. If it becomes a problem, there are other, safer things we can do with Docker Hub than pushing everything to it. For example, we could push all the intermediary images to Docker Hub from the quickstart main branch only, and then pull those for layer reuse. That would be a bit safer than using Docker Hub to reuse the images by name.
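A sketch of that alternative, assuming a registry-type layer cache and illustrative image names; this is not what the PR ended up implementing:

    # On the default branch only: publish the layer cache to Docker Hub.
    - name: Build and export layer cache
      if: github.ref == 'refs/heads/main'   # adjust to the repo's default branch
      run: |
        docker buildx build . -t stellar/quickstart:dev \
          --cache-to=type=registry,ref=docker.io/stellar/quickstart:buildcache,mode=max

    # On pull requests (forked or not): reuse those published layers read-only.
    - name: Build reusing published layers
      run: |
        docker buildx build . -t stellar/quickstart:dev \
          --cache-from=type=registry,ref=docker.io/stellar/quickstart:buildcache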

Contributor Author (@sreuland)

I can compress the docker tar file before caching it to GHA, which helps:

$ docker save stellar/system-test:cache_b54d49fa4159ec38 | zstd -19 -T0 -o myimage.tar.zstd
/*stdin*\            : 18.81%   (  3.08 GiB =>    593 MiB, myimage.tar.zstd)   
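As a sketch of the full round trip with GHA caching (paths, keys, and tags here are illustrative), the compressed tarball would be restored and loaded before the build, then saved again afterwards:

    - uses: actions/cache@v4
      with:
        path: /tmp/system-test-image.tar.zstd
        key: system-test-image-${{ hashFiles('Dockerfile') }}

    - name: Load cached image if present
      run: |
        if [ -f /tmp/system-test-image.tar.zstd ]; then
          zstd -d --stdout /tmp/system-test-image.tar.zstd | docker load
        fi

    - name: Save compressed image for the cache
      run: docker save stellar/system-test:dev | zstd -19 -T0 -o /tmp/system-test-image.tar.zstd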

The cache capacity needed depends on how many relevant cache versions are present; for the system-test workflow the cache is keyed on permutations of the system-test, stellar-cli, and js-stellar-sdk refs. Usage from stellar-rpc wouldn't change those much, as most PRs don't change e2e.yml.

> we could push all the intermediary images to Docker Hub from the quickstart main branch only, and then pull those for layer reuse. That would be a bit safer than using Docker Hub to reuse the images by name.

I think I could do docker buildx with --cache-from=type=gha and --cache-to=type=gha right here in the workflow, without any explicit cache-key checks: just run the system-test image build every time, and the duration will depend on how many layers it finds in the cache. This approach would achieve the same obfuscation without pulling conflating changes into quickstart?

I'll try this out, starting with compression as well (--cache-to=type=gha,mode=max,compression=zstd), and hopefully see a reduction similar to the one I noted above. Later, if we start maxing out the cache quota, we could revisit this and consider type=registry,ref=docker.io/stellar/system-test:cache.
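A sketch of the build invocation being described (the image tag is illustrative):

    $ docker buildx build . -t stellar/system-test:dev --load \
        --cache-from=type=gha \
        --cache-to=type=gha,mode=max,compression=zstd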

Member @leighmcculloch (Oct 29, 2025)

> I think I could do docker buildx with --cache-from=type=gha and --cache-to=type=gha right here in the workflow

I tried using the GHA cache on the quickstart repo itself in stellar/quickstart#815, but it's not a good fit for that repo's build process. It resulted in inefficient cache usage that wasn't only a size problem but also slowed the build down.

It might work well for this repo, but something to watch out for is that you won't have any control over what gets cached, and inefficient cache usage can slow a build down a lot.

Contributor Author (@sreuland)

Yes, on consecutive reruns of this workflow from stellar-rpc/520, where no source code versions change, the docker builds still appear to be rebuilding layers I would have expected to be present in the GHA cache from prior runs. I will look into verbose docker logging and debug for a root cause.

Contributor Author (@sreuland)

@leighmcculloch, I found the immediate issue with type=gha not working here: it was due to not including the crazy-max action that exposes the GitHub runtime settings to BuildKit. After adding that, I was able to see some export-to-cache activity logged during builds via --cache-to, but nothing was ever imported from the cache via --cache-from. In case it was just an issue with gha or the builder, I tried type=local and manually ran the BuildKit daemon as a container using the moby:latest image instead of using docker/setup-buildx-action, and still saw the same one-way cache behavior. It seems like there's a lower-level issue between the buildkitd process, the builders using that daemon, and the docker daemon on the runner, as caching doesn't work between them.
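For context, the setup being referred to looks roughly like this, assuming the action meant is crazy-max/ghaction-github-runtime (which exposes the ACTIONS_* runtime variables that the type=gha cache backend needs); version pins and the image tag are illustrative:

    - uses: docker/setup-buildx-action@v3

    # Expose ACTIONS_RUNTIME_TOKEN / ACTIONS_CACHE_URL to later steps so that
    # docker buildx build can use --cache-from/--cache-to with type=gha.
    - uses: crazy-max/ghaction-github-runtime@v3

    - name: Build with the GitHub Actions cache backend
      run: |
        docker buildx build . -t stellar/system-test:dev --load \
          --cache-from=type=gha \
          --cache-to=type=gha,mode=max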

So, I reverted back to caching the system-test and cli images individually, and this callable workflow now has the expected cache-hit behavior, verified by invoking the workflow and seeing cache hits on a stellar-rpc/e2e run.

If you can re-review this PR for the final cache setup, then once it merges I'll point stellar-rpc/520 at system-test:master and move that up for review as well. Thanks!

Contributor Author (@sreuland)

@leighmcculloch, bumping this in case it was lost in the emails; I was asking for a re-review of the final caching strategy I used, thanks!

Member @leighmcculloch left a comment

Looks good. It looks like this reduces the infra in here a lot, which is a good thing.

In terms of the long-term vision, system-test is less clear to me. It depends on how the tests get built out, how the tests evolve here, and what gets put here to test. Looking forward to seeing how that plays out.

@sreuland merged commit b140302 into master on Nov 25, 2025; 5 checks passed.
@sreuland deleted the no_quickstart branch on November 25, 2025 at 18:42.
Linked issue (may be closed by this pull request): Poc: refactor to pure tests, no build or embed of quickstart image