Switch to larger Helios instances for CI by emilyalbini · Pull Request #10018 · oxidecomputer/omicron

emilyalbini · 2026-03-10T12:19:32Z

This PR switches the slowest Helios jobs to run on larger instance sizes, significantly speeding up CI times. Along with this I already deployed a Buildomat configuration change to run all Helios jobs on Zen 4 AWS instances, instead of Zen 3 instances either on AWS or lab Gimlets. Together, these two changes should bring CI times down considerably.

Unfortunately we cannot use Zen 5 AWS instances (like we did on Linux) until oxidecomputer/stlouis#938 is fixed.

build-and-test (helios)

	Architecture	CPU cores	RAM	Execution time	Price per build
Old:	Zen 3	8	32 GB	135 minutes	$1.04
	Zen 4	8	32 GB	110 minutes	$0.85
	Zen 4	16	128 GB	58 minutes	$1.24
New:	Zen 4	32	256 GB	45 minutes	$1.87

This job was being slowed down by a nondeterministic amount because it ran out of memory and was forced to aggressively page memory to disk. It turns out it was using around 150% of the RAM allocated to the VM. Switching to memory-optimized AWS instances (2x the RAM) fixed the problem.

The switch from 16 cores to 32 cores is fairly expensive and has diminishing returns, as it did for the Linux instance, but it is still a win of about 13 minutes (58 to 45 minutes in the table above). When we switch to Zen 5 it might be worth going back to 16 cores.
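To put the diminishing returns in numbers, here is a minimal sketch that takes the rows of the table above and derives the implied hourly rate and the extra cost per minute saved at each step. The `Row` type and both helper functions are hypothetical names used only for illustration; they are not part of Buildomat or of this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    label: str
    minutes: float  # execution time from the table
    price: float    # USD per build from the table


# Figures copied from the build-and-test (helios) table.
ROWS = [
    Row("Zen 3, 8 cores, 32 GB", 135, 1.04),
    Row("Zen 4, 8 cores, 32 GB", 110, 0.85),
    Row("Zen 4, 16 cores, 128 GB", 58, 1.24),
    Row("Zen 4, 32 cores, 256 GB", 45, 1.87),
]


def hourly_rate(row: Row) -> float:
    """Implied instance price per hour, derived from price and duration."""
    return row.price / (row.minutes / 60)


def cost_per_minute_saved(slow: Row, fast: Row) -> float:
    """Extra USD per build for each minute saved going from slow to fast."""
    return (fast.price - slow.price) / (slow.minutes - fast.minutes)


if __name__ == "__main__":
    for prev, cur in zip(ROWS, ROWS[1:]):
        print(
            f"{prev.label} -> {cur.label}: "
            f"${cost_per_minute_saved(prev, cur):.4f} per minute saved, "
            f"~${hourly_rate(cur):.2f}/hour"
        )
```

With these figures, the step from 8 to 16 cores costs about $0.0075 per minute saved, while the step from 16 to 32 cores costs about $0.048 per minute saved, more than six times as much.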

omicron-common

	Architecture	CPU cores	RAM	Execution time	Price per build
Old:	Zen 3	8	32 GB	6 minutes	$0.05
New:	Zen 4	8	32 GB	5 minutes	$0.04

The switch has a negligible impact on a job this short, but it's not worth creating a dedicated target just to keep this job on Zen 3, so it moves to Zen 4 as a side effect.

helios / package

	Architecture	CPU cores	RAM	Execution time	Price per build
Old:	Zen 3	8	32 GB	46 minutes	$0.35
	Zen 4	8	32 GB	30 minutes	$0.23
New:	Zen 4	16	64 GB	24 minutes	$0.37

The gains from going from 8 to 16 cores are not that impressive, but this job is a dependency of the "deploy" job, which we cannot really speed up (it needs to run on a lab Gimlet, and we can't shard it as far as I'm aware), so any time we can shave off is worth it.

helios / build TUF repo

	Architecture	CPU cores	RAM	Execution time	Price per build
Old:	Zen 3	8	32 GB	80 minutes	$0.62
New:	Zen 4	16	128 GB	43 minutes	$0.92

As with the build-and-test job, this one was paging memory to disk because the VM did not have enough memory (though to a lesser extent). After the size increase, most of the remaining time was single-threaded CPU work, so I didn't bother testing more cores.

check-features (helios)

	Architecture	CPU cores	RAM	Execution time	Price per build
Old:	Zen 3	8	32 GB	38 minutes	$0.29
	Zen 4	1	4 GB	55 minutes	$0.04
	Zen 4	2	8 GB	38 minutes	$0.07
New:	Zen 4	8	32 GB	24 minutes	$0.19

This job is mostly single-threaded, so I tried aggressively reducing the VM size, but with mixed results. In the end I decided to keep it on the standard target, which is now Zen 4.
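As a rough way to read the table above, the sketch below fits Amdahl's law to the Zen 4 rows to estimate what fraction of the 1-core runtime does not parallelize. This is an illustrative model only, not something measured in this PR: it assumes the 1-core run is a fair baseline and ignores that RAM also changes between rows. The function names are hypothetical.

```python
def serial_fraction(t1: float, tn: float, n: int) -> float:
    """Solve Amdahl's law, tn = t1 * (s + (1 - s) / n), for the serial share s."""
    if n < 2:
        raise ValueError("need at least 2 cores to estimate a serial fraction")
    ratio = tn / t1
    return (ratio - 1 / n) / (1 - 1 / n)


def predicted_minutes(t1: float, s: float, n: int) -> float:
    """Runtime on n cores predicted by Amdahl's law for serial share s."""
    return t1 * (s + (1 - s) / n)


if __name__ == "__main__":
    t1 = 55  # minutes on 1 core (Zen 4), from the table
    for cores, minutes in [(2, 38), (8, 24)]:
        s = serial_fraction(t1, minutes, cores)
        print(f"{cores} cores: estimated serial fraction {s:.2f}")
```

Under this model both data points give a serial fraction of roughly 0.36 to 0.38, which fits the limited gains from adding cores to this job.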

clippy (helios)

	Architecture	CPU cores	RAM	Execution time	Price per build
Old:	Zen 3	8	32 GB	26 minutes	$0.20
New:	Zen 4	8	32 GB	18 minutes	$0.14
	Zen 4	16	64 GB	17 minutes	$0.26

It turns out there was practically no benefit in going from 8 to 16 cores for this job: one minute saved (18 to 17) for nearly double the price.
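For an overall picture of the cost side, this short sketch sums the "Old" and "New" prices from the six tables above. It assumes each job runs exactly once per CI run, which is an assumption made here for illustration; the `PRICES` list and `totals` helper are hypothetical names.

```python
# (job, old price, new price) in USD per build, copied from the tables above.
PRICES = [
    ("build-and-test (helios)", 1.04, 1.87),
    ("omicron-common", 0.05, 0.04),
    ("helios / package", 0.35, 0.37),
    ("helios / build TUF repo", 0.62, 0.92),
    ("check-features (helios)", 0.29, 0.19),
    ("clippy (helios)", 0.20, 0.14),
]


def totals(prices):
    """Return (old_total, new_total) in USD, rounded to cents."""
    old = sum(old_price for _, old_price, _ in prices)
    new = sum(new_price for _, _, new_price in prices)
    return round(old, 2), round(new, 2)


if __name__ == "__main__":
    old_total, new_total = totals(PRICES)
    increase = (new_total - old_total) / old_total
    print(f"old: ${old_total:.2f}, new: ${new_total:.2f} ({increase:+.0%})")
```

Under that assumption, the six Helios jobs go from $2.55 to $3.53 per run, an increase of about 38%, almost all of it from build-and-test and the TUF repo build.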

davepacheco

Looks great (once you rebase onto #10001).

emilyalbini requested review from smklein and sunshowers March 10, 2026 12:19

davepacheco approved these changes Mar 10, 2026

switch some helios jobs to larger VMs

1a2568e

emilyalbini force-pushed the ea-large-helios branch from 2a54172 to 1a2568e March 10, 2026 18:00

emilyalbini enabled auto-merge (squash) March 10, 2026 18:01

sunshowers approved these changes Mar 10, 2026

emilyalbini merged commit a4f3f9d into main Mar 10, 2026
16 checks passed

emilyalbini deleted the ea-large-helios branch March 10, 2026 19:20

sruggier mentioned this pull request Mar 12, 2026

More cleanup related to error logging #9942

Merged
