Move dev server image build into main CI workflow #141

mark-idleman · 2023-04-12T21:01:20Z

https://replicahq.atlassian.net/browse/RAD-6212

Currently, our CI setup doesn’t build a docker image that can be used to spin up routing servers in the model repo - this step happens only when you kick off a functional test CI flow manually. In theory, this separation is intended, as we technically don’t want to “stamp” a runnable server image with dev before we’ve even run unit tests.

However, in practice, I basically never sit and wait for all unit tests to pass when I’m iterating quickly and need a runnable image to use ASAP. Instead, I wait until the “unit test” image finishes building (the first part of the docker build process), then right away kick off a functional test build to get a working server image I can plug into the model repo.

This change moves that server dev image building step to the main CI flow. The change technically goes against the intended pattern of our existing CI/functional test flow, but in practice it would save me loads of time that’s currently wasted waiting for CI to pass a particular step and then manually kicking off another flow.

bryanwilliams1025

Hey, had a couple more thoughts after chewing on this for a bit. I'm all for easier iteration with the model repo, but I'm wondering if we can keep some of the protections we have in place and only work around them when we have to. For instance, with this change CI would now build server-dev and server-sandbox on every push to a branch, right? Usually we only need to use the image when running the FTs after merging to master, so we wouldn't need all the intermediate builds. This could waste some compute (and docker storage? are we charged for that?). Also, CI does these new builds before running the unit tests, so every push to branch would take longer to get basic correctness feedback even when not trying to quickly deploy to model...and we could potentially build buggy images.

Is there a way to keep our builds fast and cheap in the usual case, but still provide a convenient break-glass solution when a dev needs to quickly deploy to the model repo? GitHub supports automatically running a given workflow if a branch name matches some pattern, and elsewhere in industry I've seen that used to enable specific DevX workflows like this. Maybe we could have a build-server-breakglass.yaml workflow or something that's triggered if the branch name contains breakglass or buildme or something? We could also keep the new builds in build_test_push.yaml with a branch name condition or something, but then we could potentially try to build and publish the same image twice. In a separate workflow, we could just use different suffixes entirely so we avoid conflicts (e.g. breakglass-server-dev instead of server-dev).

Anyways, lemme know what you think. Not trying to unnecessarily complicate things here, just want to discuss some of the potential downsides to the standard case where we're not trying to quickly deploy.
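
For illustration, the break-glass idea above could look roughly like the workflow below. This is a minimal sketch, not something from this PR: the image path `gcr.io/example-project/routing`, the `Dockerfile.server` file name, and the tag format are hypothetical, and registry authentication is left out. Note that GitHub's branch filters are glob patterns rather than full regexes.

```yaml
# .github/workflows/build-server-breakglass.yaml
# (file name taken from the suggestion above; everything else is an assumption)
name: Build server image (break-glass)

on:
  push:
    branches:
      # Glob patterns: run only for branches whose name contains one of these words.
      - '**breakglass**'
      - '**buildme**'

jobs:
  build-server:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      # A registry login step would be needed here before pushing to GCR.

      - name: Build and push break-glass server image
        uses: docker/build-push-action@v2
        with:
          context: .
          file: Dockerfile.server   # hypothetical Dockerfile name
          push: true
          # A distinct suffix avoids colliding with the regular server-dev tags.
          tags: gcr.io/example-project/routing:breakglass-server-dev-${{ github.sha }}
```

Keeping this in its own workflow means ordinary branches never pay for the extra build, and the separate tag suffix avoids publishing the same image twice.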

mark-idleman · 2023-04-20T19:56:43Z

@bryanwilliams1025 great points, thanks for taking another look at this! I messed around with a different approach over the last week and think I found something that checks all of the boxes 🤞

You're correct that the previous approach in this PR would be wasting a lot of disk space by forcing server + sandbox image pushes on every commit, vs. just when the func test ran. One minor clarification on compute time though - only the first step (building the "base" image) takes a long time. The other build phases are essentially just adding a docker CMD to make executable images that start routing servers, so those build steps only take a few seconds.

This got me thinking about how I could accomplish what I wanted (getting server images I can test with built as quickly as possible in the main CI flow) while also not pushing any more images to GCR than we do today. Here's what I came up with:

1. Build the base image as usual, but don't push to our remote GCR repo; instead, push to the local docker repo that's been spun up in earlier steps in the github action
2. Build the server image right away, using the locally-stored base image as BASE_IMAGE; push the server image to GCR
3. Run unit tests as we normally would, using the locally-stored base image
4. Still build the sandbox image in the functional test workflow, but use the server-dev image as BASE_IMAGE, instead of the base image (as we used to). This isn't an issue because the sandbox build (Dockerfile.sandbox) just copies a few files and adds a CMD to its base image, and CMDs in child dockerfiles override CMDs declared in their parent dockerfile

This actually saves us ~~1/3~~ [some] disk space vs. our old build setup, because we now only push 2 images to GCR (server-dev every time and server-sandbox in func test) vs. 3 before (base image every time, server-dev + server-sandbox in func test), and still gives us reasonable unit test/functional test behavior.
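
The steps above can be sketched as workflow fragments like the following. This is an illustrative sketch under stated assumptions, not the actual contents of build_test_push.yaml: the image names (`local/routing-base:ci`, `gcr.io/example-project/routing:...`), the `Dockerfile.server` file name, and the `mvn test` command are hypothetical. `BASE_IMAGE`, `Dockerfile.sandbox`, `load: true`, the separate push step, and the docker buildx driver come from this PR's discussion and commit titles.

```yaml
# Main CI workflow (steps only)
steps:
  - name: Set up Docker Buildx
    uses: docker/setup-buildx-action@v2
    with:
      # Assumption based on the commit "try setting buildx driver to docker":
      # the docker driver builds against the runner's own image store, so a
      # later build can use a locally built image in its FROM line.
      driver: docker

  - name: Build base image (kept local, not pushed)
    uses: docker/build-push-action@v2
    with:
      context: .
      push: false
      load: true
      tags: local/routing-base:ci          # hypothetical local tag

  - name: Build server-dev image on top of the local base image
    uses: docker/build-push-action@v2
    with:
      context: .
      file: Dockerfile.server              # hypothetical Dockerfile name
      build-args: |
        BASE_IMAGE=local/routing-base:ci
      push: false
      load: true
      tags: gcr.io/example-project/routing:server-dev-${{ github.sha }}

  - name: Push server-dev image
    # Explicit push, since push and load cannot be combined in one build step.
    run: docker push gcr.io/example-project/routing:server-dev-${{ github.sha }}

  - name: Run unit tests in the base image
    # Hypothetical test command; the base image is used because it has mvn.
    run: docker run --rm local/routing-base:ci mvn test
```

In the functional test workflow, the sandbox build would then start from the already-pushed server-dev image instead of the base image:

```yaml
# Functional test workflow (single step)
- name: Build and push server-sandbox image
  uses: docker/build-push-action@v2
  with:
    context: .
    file: Dockerfile.sandbox
    build-args: |
      BASE_IMAGE=gcr.io/example-project/routing:server-dev-${{ github.sha }}
    push: true
    tags: gcr.io/example-project/routing:server-sandbox-${{ github.sha }}
```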

How I tested:

- Successfully ran functional test
- Used resulting sandbox image to successfully build sandbox router (hasn't quite finished running, 🤞 )

How does this sound to you?

mark-idleman · 2023-04-20T19:57:36Z

.github/workflows/build_test_push.yaml

        uses: docker/build-push-action@v2
        with:
          context: .
-          push: true
+          push: false
+          load: true


load: true is what tells docker to load the built image into the local docker image store instead of pushing it to the remote registry

mark-idleman · 2023-04-20T19:58:08Z

.github/workflows/build_test_push.yaml

@@ -54,6 +54,8 @@ jobs:
      - name: Set up Docker Buildx
        id: buildx
        uses: docker/setup-buildx-action@master
+        with:


This was needed due to a weird detail of how load: true works. Some context here. It failed without this and then worked when I added it, so I didn't dig too much deeper 🤷

bryanwilliams1025

beautiful, i like the new approach! strikes a great balance between unblocking deploying to model and keeping storage and build times minimal. appreciate you iterating here!

bryanwilliams1025 · 2023-04-20T22:08:56Z

.github/workflows/build_test_push.yaml

+        uses: docker/build-push-action@v2
+        with:
+          context: .
+          push: false


can we specify push: true here and get rid of the following Push server-dev image step?

mark-idleman · 2023-04-21

Hmm great question, I vaguely recall hitting some issue with this but I can't find the relevant stack overflow post now lol, so let's try it 🤞

mark-idleman · 2023-04-21

Aha, at least the error is explicit 😀

Error: buildx failed with: ERROR: push and load may not be set together at the moment

Move dev server + sandbox image builds into main CI workflow

5b088b8

mark-idleman requested review from danielhfrank and bryanwilliams1025 April 12, 2023 21:01

bryanwilliams1025 reviewed Apr 12, 2023

mark-idleman added 9 commits April 14, 2023 15:23

don't push main build image; try running unit tests with server-dev

d2095dd

move sandbox build back to func test flow

dd4598f

make sure sandbox build uses server-dev as base image (it's been push…

e930626

…ed to gcr)

try building base image with load: true

82089b2

use load: true for both build steps, push to gcr separately

bb868a9

try setting buildx driver to docker

422d2c6

use base build image in unit test (access to mvn needed)

0a0364e

fix comment

cfd1e53

fix up comment + job name in func test flow

b0716a2

mark-idleman commented Apr 20, 2023

mark-idleman changed the title ~~Move dev server + sandbox image builds into main CI workflow~~ Move dev server image build into main CI workflow Apr 20, 2023

bryanwilliams1025 approved these changes Apr 20, 2023

mark-idleman added 3 commits April 21, 2023 11:40

comment

83fee45

try removing explicit docker push step

b47f09c

Revert "try removing explicit docker push step"

7b239e0

This reverts commit b47f09c.

mark-idleman merged commit 9dbf0c6 into original-direction Apr 21, 2023

mark-idleman deleted the ci_image_build_earlier branch April 21, 2023 19:24
