Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move dev server image build into main CI workflow #141

Merged
merged 13 commits into from
Apr 21, 2023

Conversation

mark-idleman
Copy link

@mark-idleman mark-idleman commented Apr 12, 2023

https://replicahq.atlassian.net/browse/RAD-6212

Currently, our CI setup doesn’t build a docker image that can be used to spin up routing servers in the model repo - this step happens only when you kick off a functional test CI flow manually. In theory, this separation is intended, as we technically don’t want to “stamp” a runnable server image with dev before we’ve even run unit tests.

However, in practice, I basically never sit and wait for all unit tests to pass when I’m iterating quickly and need a runnable image to use ASAP. Instead, I wait until the “unit test” image finished building (the first part of the docker build process), then right away kick off a functional test build to get a working server image I can plug into the model repo.

This change moves that server dev image building step to the main CI flow. The change technically goes against the intended pattern of our existing CI/functional test flow, but in practice would save me loads of time that’s currently wasted waiting for CI to pass a particular step and then manually kicking off another flow

Copy link

@bryanwilliams1025 bryanwilliams1025 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, had a couple more thoughts after chewing on this for a bit. I'm all for easier iteration with the model repo, but I'm wondering if we can keep some of the protections we have in place and only work around them when we have to. For instance, with this change CI would now build server-dev and server-sandbox on every push to a branch, right? Usually we only need to use the image when running the FTs after merging to master, so we wouldn't need all the intermediate builds. This could waste some compute (and docker storage? are we charged for that?). Also, CI does these new builds before running the unit tests, so every push to branch would take longer to get basic correctness feedback even when not trying to quickly deploy to model...and we could potentially build buggy images.

Is there a way to keep our builds fast and cheap in the usual case, but still provide a convenient break-glass solution when a dev needs to quickly deploy to the model repo? Github supports automatically running a given workflow if a branch matches some regex, and elsewhere in industry I've seen that used to enable specific DevX workflows like this. Maybe we could have a build-server-breakglass.yaml workflow or something that's triggered if the branch name contains breakglass or buildme or something? We could also keep the new builds in build_test_push.yaml with a branch name condition or something, but then we could potentially try to build and publish the same image twice. In a separate workflow, we could just use different suffixes entirely so we avoid conflicts (e.g. breakglass-server-dev instead of server-dev).

Anyways, lemme know what you think. Not trying to unnecessarily complicate things here, just want to discuss some of the potential downsides to the standard case where we're not trying to quickly deploy.

@mark-idleman
Copy link
Author

mark-idleman commented Apr 20, 2023

@bryanwilliams1025 great points, thanks for taking another look at this! I messed around with a different approach over the last week and think I found something that checks all of the boxes 🤞

You're correct that previous approach in this PR would be wasting a lot of disk space by forcing server + sandbox image pushes on every commit, vs. just when the func test ran. One minor clarification on compute time though - only the first step (building the "base" image) takes a long time. The other build phases are essentially just adding a docker CMD to make executable images that start routing servers, so those build steps only take a few seconds.

This got me thinking about how I could accomplish what I wanted (getting server images I can test with built as quickly as possible in the main CI flow) while also not pushing any more images to GCR than we do today. Here's what I came up with:

  • Build the base image as usual, but don't push to our remote GCR repo; instead, push to the local docker repo that's been spun up in earlier steps in the github action
  • Build the server image right away, using the locally-stored base image as BASE_IMAGE; push the server image to GCR
  • Run unit tests as we normally would, using the locally-stored base image
  • Still build the sandbox image in the functional test workflow, but use the server-dev image as BASE_IMAGE, instead of the base image (as we used to). This isn't an issue because the sandbox build (Dockerfile.sandbox) just copies a few files and adds a CMD to it's base image, and CMDs in child dockerfiles overwrite CDMs declared in their parent dockerfile

This actually saves us 1/3 [some] disk space vs. our old build setup, because we now only push 2 images to GCR (server-dev every time and server-sandbox in func test) vs. 3 before (base image every time, server-dev + server-sandbox in func test), and still gives us reasonable unit test/functional test behavior

How I tested:

How does this sound to you?

uses: docker/build-push-action@v2
with:
context: .
push: true
push: false
load: true
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load: true is what tells docker to push/pull from the local docker repo. vs remote

@@ -54,6 +54,8 @@ jobs:
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@master
with:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was needed due to a weird detail of how load: true works. Some context here, it failed without this and then worked when I added it, so I didn't dig too much deeper 🤷

@mark-idleman mark-idleman changed the title Move dev server + sandbox image builds into main CI workflow Move dev server image build into main CI workflow Apr 20, 2023
Copy link

@bryanwilliams1025 bryanwilliams1025 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beautiful, i like the new approach! strikes a great balance between unblocking deploying to model and keeping storage and build times minimal. appreciate you iterating here!

uses: docker/build-push-action@v2
with:
context: .
push: false

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we specify push: true here and get rid of the following Push server-dev image step?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm great question, I vaguely recall hitting some issue with this but I can't find the relevant stack overflow post now lol, so let's try it 🤞

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, at least the error is explicit 😀

Error: buildx failed with: ERROR: push and load may not be set together at the moment

.github/workflows/build_test_push.yaml Outdated Show resolved Hide resolved
@mark-idleman mark-idleman merged commit 9dbf0c6 into original-direction Apr 21, 2023
@mark-idleman mark-idleman deleted the ci_image_build_earlier branch April 21, 2023 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants