campaigns: add and use volume mounts by default on Intel macOS #412

LawnGnome · 2020-12-25T01:59:05Z

Background

As we now know all too well, the default bind mount behaviour we use to manage workspaces when executing campaigns is slow on macOS: any file I/O in the workspace has to go through the loopback interface, and this causes problems with I/O heavy operations such as pretty much anything to do with npm or yarn.

The steps we perform within a workspace are extremely simple, however: we unzip the repository archive and run a variety of git commands to generate the diff. It's handy for this to be done on the host filesystem, since it allows the user to inspect what happens to their repository if a campaign step fails, but it's not a technical requirement.

What's in the box PR

This PR generalises our existing concept of a workspace, and expands it significantly. Workspace modes are now defined by implementing a WorkspaceCreator interface, which knows how to create Workspace implementations which implement the common operations we need to set up, diff, and tear down workspaces. This is used to add a new workspace mode that uses a Docker volume to contain the workspace, rather than bind mounting the workspace from the host. This is controlled by the new -workspace flag, and the default for Intel macOS has been switched to the new volume mode.

For this to work, we have to perform the workspace management commands — the aforementioned unzipping and git malarkey — within a container, since that's the only way to access the volume. As such, part of this PR adds a Docker image that extends alpine to always have git available and configured appropriately, along with unzip and curl. A GitHub Action has been added that will rebuild and push this image on src release.

There's also a lot of new testing stuff: the volume workspace code is copiously unit tested. In order to support this, there's a (very) minimal implementation of a mock API client, and an expanded ~~ripoff~~ implementation of the method Go's os/exec test suite uses to mock external commands, which means the new tests don't need dummydocker and work on Windows.

Adam's guess at obvious questions

Why is CI failing?

The os/exec method for mocking external commands basically redirects exec.Cmd calls back into the test binary, which gains a little bit of logic in its TestMain to handle those correctly.

This works fine cross-platform, but something about AppVeyor specifically makes this not work on Windows. The errors are fairly inscrutable. However, the approach doesn't have any systemic issue on Windows: it works fine both for me locally, and in GitHub Actions, which now has Windows runner support.

I've opened #415, which is included in this PR as well, to remove our AppVeyor support and only use GitHub Actions for CI. If we decide to go ahead with that, then we can remove the AppVeyor integration and this PR will start passing.

How much faster is this on macOS?

It depends.

For campaigns that generate small-to-medium sized diffs and do little I/O, there isn't much difference: maybe a few percent here and there. For campaigns that generate large diffs and do little I/O, this can actually be slower. (That feels like a really weird case, though.)

For campaigns that perform lots of I/O, this is a big win. A test campaign that upgrades TypeScript in Sourcegraph completes in approximately a quarter of the time in volume mode compared to bind mode. (~11 minutes compared to ~42.)

Why change the default on macOS only?

Volume mode isn't universally better. On Linux (assuming nothing weird like a remote Docker server), there's no reason not to use bind mount mode: the performance is the same, and you have the advantage that you can inspect what happened in your workspace if a step fails.

Why change the default on Intel macOS only?

Conservatism. I've put in a bunch of effort to make multi-architecture builds of the Docker image needed by volume mode work, but I don't have an M1 Mac sitting around to test this. It should work, but I'd prefer to find that out by testing it using -workspace volume explicitly, rather than finding out that it doesn't work after making it the default.

What about Windows?

My suspicion is that this will provide similar performance improvements on Windows, but I don't have good numbers on this right now, don't have an environment ready to go to prove that out, and I'd prefer to focus on macOS for now. It's easy enough to change the default later.

That said, I think there's another reason we should consider this for Windows sooner rather than later: it effectively removes the requirement to have git in your PATH, and removes any potential for Windows-specific git weirdness.

What future work could we do to improve this further?

Glad you asked! I had more ideas, but this PR was more than big enough already.

Bind mode tests: now that we have comprehensive tests for volume workspaces, we could retrofit the same techniques onto the bind workspace tests.
Volume mode debugging: right now, if a step fails, the volume is removed, which means the user would have to do some hackery using something like sleep to exec into the container to inspect the workspace if that was necessary. We should probably provide some sort of interactive debugging mode where you get dropped into a container with the workspace already set up and running on error. (I mean, this is probably a good idea for bind mode, too.)
Volume container communication: I have a suspicion that the large diff slowness noted above might be more to do with extracting the diff from the Docker container via stdout than anything endemic to the approach. Another way of doing this would be to have the utility container running the whole time while executing a campaign on a repository, and provide some sort of RPC interface to perform setup and teardown and (most importantly) get diffs without having to go through Docker.
Exec output: to allow mocking external commands, the volume workspace uses the new internal/exec package instead of os/exec directly. If we migrate other modules to use this, then we could use that down the track to provide verbose logging of every command that's run, since command execution will go through a central place.

PR links

Skeletal end user documentation is provided by sourcegraph/sourcegraph#16979.

Fixes sourcegraph/sourcegraph#16809.

This mode is added by sourcegraph/src-cli#412.

mrnugget

Fantastic, thorough work - docs, tests, scripts, code. And things can get up to 4 times faster. Good start into the new year :)

CHANGELOG.md

DEVELOPMENT.md

internal/api/mock/mock.go

internal/campaigns/bind_workspace.go

internal/campaigns/bind_workspace_test.go

internal/campaigns/volume_workspace.go

internal/campaigns/volume_workspace_test.go

internal/exec/expect/expect.go

The tests don't pass, but this is getting closer.

LawnGnome · 2021-01-06T01:16:37Z

OK, so this has changed around a bit in response to @mrnugget's review. Most notably: there's now a RepoFetcher type that is responsible for fetching repos (gasp!) and is called by runSteps(), which means that repo fetching logic is no longer within the workspace types (and, as a side benefit, means that the volume workspace uses the archive cache correctly).

PTAL!

eseliger

Very nice work 🌟

.github/workflows/docker.yml

CHANGELOG.md

cmd/src/campaigns_common.go

internal/campaigns/repo_fetcher_test.go

internal/campaigns/run_steps.go

internal/campaigns/volume_workspace.go

internal/campaigns/workspace.go

Co-authored-by: Thorsten Ball <mrnugget@gmail.com>

This mode is added by sourcegraph/src-cli#412.

Co-authored-by: Thorsten Ball <mrnugget@gmail.com>

LawnGnome added the team/code-search label Dec 25, 2020

LawnGnome added this to the Campaigns Sprint 7 milestone Dec 25, 2020

LawnGnome self-assigned this Dec 25, 2020

sourcegraph-bot mentioned this pull request Dec 25, 2020

Campaigns Sprint 7 Tracking issue sourcegraph/sourcegraph-public-snapshot#16732

Closed

11 tasks

LawnGnome mentioned this pull request Dec 30, 2020

ci: migrate all CI to GitHub Actions #415

Merged

LawnGnome added a commit to sourcegraph/sourcegraph-public-snapshot that referenced this pull request Jan 1, 2021

doc: describe the new volume mount mode

4afe62a

This mode is added by sourcegraph/src-cli#412.

LawnGnome mentioned this pull request Jan 1, 2021

doc: describe the new volume mount mode sourcegraph/sourcegraph-public-snapshot#16979

Merged

LawnGnome changed the title ~~campaigns: add and use volume mounts by default on macOS~~ campaigns: add and use volume mounts by default on Intel macOS Jan 1, 2021

LawnGnome requested a review from a team January 1, 2021 02:25

LawnGnome marked this pull request as ready for review January 1, 2021 02:25

mrnugget approved these changes Jan 4, 2021

View reviewed changes

LawnGnome added 19 commits January 5, 2021 15:20

Initial refactor of workspace concerns.

f32ea10

More WIP.

fe2798c

More WIP.

c989ec0

The tests don't pass, but this is getting closer.

Add workflow for building the Docker image.

fadf6f3

First crack at a CHANGELOG.

dcde796

Fix test issue.

29bf3f6

Handle potential nil pointer in Files.

6bad0e4

Switch default on macOS.

e8e127e

Add initial workspace tests.

946d9bb

Flesh out volume workspace tests.

d937be1

Fix comments.

524e7e8

Temporary enable build on all pushes for testing.

c57bbeb

Fix username.

7ab154b

Replace the canned Action with a script.

e74781f

Run on Ubuntu 20.04.

59ee60b

Use text mode when logging in.

201b120

Only build the Docker image on tag push.

2779612

Multiplatform experimentation, part one.

cd7d70f

Multiplatform experimentation, part _n_.

baf2e90

Unify repo fetching into a single type.

155e213

LawnGnome force-pushed the aharvey/in-docker-workspace branch from 211151e to 155e213 Compare January 6, 2021 01:14

LawnGnome added 6 commits January 5, 2021 17:17

Remove unused API mock functionality.

0c2fb66

Fix Windows compatibility.

f1a5666

temp ci hackery

9e330c1

more ci fuckery

17f7c2c

revert ci fuckery

4c81744

Add PR to CHANGELOG.

df126e1

sourcegraph-bot mentioned this pull request Jan 6, 2021

Campaigns Sprint 8 Tracking issue sourcegraph/sourcegraph-public-snapshot#17030

Closed

18 tasks

LawnGnome modified the milestones: Campaigns Sprint 7, sprint 8, Campaigns Sprint 8 Jan 6, 2021

LawnGnome added the planned/CampaignsSprint7 label Jan 6, 2021

eseliger approved these changes Jan 6, 2021

View reviewed changes

.github/workflows/docker.yml Show resolved Hide resolved

Remove unused dependency.

605b0f1

mrnugget approved these changes Jan 7, 2021

View reviewed changes

LawnGnome and others added 5 commits January 7, 2021 13:29

Update CHANGELOG.md

7479a24

Co-authored-by: Thorsten Ball <mrnugget@gmail.com>

Update cmd/src/campaigns_common.go

ffe31df

Co-authored-by: Thorsten Ball <mrnugget@gmail.com>

Use filepath.Join.

b7d5505

Add documentation.

203868d

Migrate fetch utility functions to the right place.

13b0eed

LawnGnome merged commit 7ce5b18 into main Jan 7, 2021

LawnGnome deleted the aharvey/in-docker-workspace branch January 7, 2021 21:43

LawnGnome added a commit to sourcegraph/sourcegraph-public-snapshot that referenced this pull request Jan 7, 2021

doc: describe the new volume mount mode

86c2eff

This mode is added by sourcegraph/src-cli#412.

This was referenced Jan 8, 2021

campaigns: use a persistent RPC container in volume mode #418

Closed

campaigns: fall back to bind workspaces when running steps with mixed users #422

Merged

mrnugget mentioned this pull request Jan 19, 2021

pin-docker-images2 failing with src version 3.23.3 #432

Closed

scjohns pushed a commit that referenced this pull request Apr 24, 2023

campaigns: add and use volume mounts by default on Intel macOS (#412)

2e6bf37

Co-authored-by: Thorsten Ball <mrnugget@gmail.com>

campaigns: add and use volume mounts by default on Intel macOS #412

campaigns: add and use volume mounts by default on Intel macOS #412

Uh oh!

Conversation

LawnGnome commented Dec 25, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Background

What's in the box PR

Adam's guess at obvious questions

Why is CI failing?

How much faster is this on macOS?

Why change the default on macOS only?

Why change the default on Intel macOS only?

What about Windows?

What future work could we do to improve this further?

PR links

Uh oh!

mrnugget left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

LawnGnome commented Jan 6, 2021

Uh oh!

eseliger left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

LawnGnome commented Dec 25, 2020 •

edited

Loading