Add an AmdTurinV2 CPU platform by iximeow · Pull Request #10560 · oxidecomputer/omicron

iximeow · 2026-06-06T00:51:22Z

As chronicled in oxidecomputer/propolis#1152, the hopes and dreams we had in RFD 314 of avoiding reporting cache information to guests have been dashed by a (since-fixed!) glibc bug.

So, in support of guest software, add cache information into the AmdTurin CPU platform series.

The older AmdMilan platform reflects what guests happened to see when moving to CPU platforms, meaning every AmdMilan guest believes itself to have a 256MiB L3. Given the Turin V2 strategy for describing L3 here, we probably should go back and add a new rev of the Milan platform that does similar.

Note that since Propolis#940 is still an open issue (until I've got the broader virtual platform work up and landed), a VM with an AmdTurinV2 CPU platform will think itself to have vCPU-many more-nicely-sized L3s, much like how an AmdMilan guest also believes itself to have vCPU-many 256MiB L3 caches. There isn't anything to do to address this in the Nexus-side CPUID leaf generation, since the topology that's incorrect is specialized by Propolis per-cpu when setting up the VM, so we can't help that issue along the way here.

As chronicled in oxidecomputer/propolis#1152, the hopes and dreams we had in RFD 314 of avoiding reporting cache information to guests have been dashed by a (since-fixed!) glibc bug. So, in support of guest software, add cache information into the AmdTurin CPU platform series. The older AmdMilan platform reflects what guests happened to see when moving to CPU platforms, meaning every AmdMilan guest believes itself to have a 256MiB L3. Given the Turin V2 strategy for describing L3 here, we probably should go back and add a new rev of the Milan platform that does similar. Note that since Propolis#940 is still an open issue (until I've got the broader virtual platform work up and landed), a VM with an AmdTurinV2 CPU platform will think itself to have vCPU-many more-nicely-sized L3s, much like how an AmdMilan guest also believes itself to have vCPU-many 256MiB L3 caches. There isn't anything to do to address this in the Nexus-side CPUID leaf generation, since the topology that's incorrect is specialized by Propolis per-cpu when setting up the VM, so we can't help that issue along the way here.

iximeow · 2026-06-06T00:59:24Z

since I'm in here fiddling with API docs: docs issue 773 gets into this more but:

RFD 314 is not published, so this is just confusing in the API docs

RFD 314 really only talks about how we went from "all the hardware features" to "AmdMilan".

cpu_platform.rs is (intended) to be more of a living document where we can describe why we did or didn't set up features in new platforms. that's also why there's so much elaboration in turin_v2 for a "small" change. so, strike the RFD mention and in the future we'll want something more user-facing for CPU platforms than the long-form "why it's this way" in both 314 and the docs here.

iximeow · 2026-06-06T01:00:50Z

in parallel with review here I should be able to boot an Ubuntu 22.04 guest with this profile and see it boot (whereas udevadmd and others segfault in libc on boot under amd_turin today). I'll follow up here with some testing notes before looking to merge.

hawkw · 2026-06-06T01:02:17Z

+    // We're free to switch this instance from the initial AmdTurin to AmdTurinV2, which has
+    // identical placement constraints and should land on the same simulated Turin sled.


thanks for this!

…n-v2

david-crespo · 2026-06-09T15:01:49Z

          },
          {
-            "description": "An AMD Turin-like CPU platform.",
+            "description": "An AMD Turin-like CPU platform. Prefer `AmdTurinV2` over this; this CPU platform is retained for instances that specifically requested it before `AmdTurinV2` was added.\n\nThis initial version of the Turin CPU platform includes no cache or TLB information in CPUID leaf `8000_0006`. While this was intentional, Oxide later discovered some guest software interprets the zeroed leaves as reporting cache sizes of 0 bytes and behaves incorrectly and unpredictably as a result (see [Propolis#1152](https://github.com/oxidecomputer/propolis/issues/1152)).",


Since the API string is amd_turin_v2, it might be marginally better to refer to that in the docstring instead of AmdTurinV2, even though it will look weird in the Rust code.

oh yeah I was wondering how the identifier shows up in generated clients, if it's typical to get the wire-format amd_turin_v2 versus AmdTurinV2. definitely if the snake-case form is what you usually see, I'll switch that.

david-crespo

API and docs LGTM

hawkw

oh, whoops, looks like the branch merged while i was in the midst of the review, my bad

hawkw · 2026-06-09T16:06:46Z

 use crate::SledCpuFamily;

 use super::impl_enum_type;
+use nexus_types::external_api::instance as instance_api;


style nit, not a big deal: I don't think we're particularly consistent about this, but elsewhere, I feel like I've often seen these imports renamed to external, instead, so that you can reference it as external::InstanceCpuPlatform. but instance_api is fine too, and doesn't particularly bother me.

I'd looked for use nexus_types::external_api:: in nexus/db-model/src and saw this pattern, though using external does seem like a nice name to use for the reason you describe.. either way, I think you might be thinking about a different part of Nexus maybe?

hawkw · 2026-06-09T16:07:50Z

+        method = GET,
+        path = "/v1/instances/{instance}",
+        tags = ["instances"],
+        versions = VERSION_EXTERNAL_JUMBO_FRAMES..VERSION_INSTANCE_CPU_TYPE_TURIN_V2,


sigh i wish there was a way to not have this line be so long but there isn't. sigh.

hawkw · 2026-06-09T16:08:53Z

+    // actual hardware a guest is placed on. We *must not* try to populate this
+    // based on an actual sled selection: this would either limit migration to
+    // same-or-better caches *or* make migration liable to having CPUID lie
+    // w.r.t actual hardware. Instead, we'll assume that future caches are at


super-duper ultra nitpick (sorry)

Suggested change

// w.r.t actual hardware. Instead, we'll assume that future caches are at

// w.r.t. actual hardware. Instead, we'll assume that future caches are at

hawkw · 2026-06-09T16:09:15Z


+pub fn turin_v2(vcpu_count: u16) -> Result<CpuIdDump, CpuPlatformError> {
+    // Turin V2 is a result of discovering that some guest software does depend
+    // on cache size information in AMD CPUID leaf 8000_0006. See


nit, sorry: maybe should make sure to be explicit that this is hex?

Suggested change

// on cache size information in AMD CPUID leaf 8000_0006. See

// on cache size information in AMD CPUID leaf 8000_0006h. See

hawkw · 2026-06-09T16:09:38Z

+pub fn turin_v2(vcpu_count: u16) -> Result<CpuIdDump, CpuPlatformError> {
+    // Turin V2 is a result of discovering that some guest software does depend


up to you, but i feel like this whole comment could be a doc comment on the turin_v2 function?

hawkw · 2026-06-09T16:16:52Z

    source
 }

+pub fn turin_v2(vcpu_count: u16) -> Result<CpuIdDump, CpuPlatformError> {


This should probably include a doc comment saying that vcpu_count must be <= 4096, since it will currently accept any u16? Though, again, in practice this won't matter because we will reject anything over 254 vCPUs several layers above actually making the CPUID values.

(also, it occurs to me that this could be NonZeroU16 if you also wanted to ensure we weren't trying to make a nonsensical 0-vCPU VM?)

hawkw · 2026-06-09T16:18:03Z

+    // Gee, it sure is curious that these sizes are *lower* than Milan, huh?
+    // The AMD APM as of 3.37 (2025-07-02) describes the {I,D}TLB sizes as "The
+    // value returned is for the number of entries available for the 2 MB page
+    // size", but Zen 5 hardware reports 0x40 and 0x80, below. Surely the TLB
+    // sizes are not actually 64/128 entries? Correct! AMD marketing material
+    // describes the L2 TLB as having 2048 entries. That means the hardware
+    // values are scaled by 32 on Zen 5. (And PPRs agree with this.)


thanks for writing this down <3

hawkw · 2026-06-09T16:19:11Z

+    // L3 size is the really tricky one. The big comment at the top of this
+    // function talks about why *this* math.
+    const L3_KB_PER_CCD: u32 = 32 * 1024;
+    const VCPU_PER_CCD: u32 = 16;


this is where we assume that everything is "normal Turin" and over/under-report on the dense and high-cache F SKUs respectively. Maybe worth a note here?

hawkw · 2026-06-09T16:19:47Z

+
+    cpuid
+        .set_l2_l3_cache_and_tlb_info(Some(leaf))
+        .expect("can set leaf 8000_0006h");


take it or leave it: since we now return an error from this function, maybe this could also return an error?

hawkw · 2026-06-09T16:20:29Z

+    // effectively a limit of 4096 vCPUs.
+    if leaf_edx_l3_size_scaled >= (1 << 14) {
+        return Err(CpuPlatformError::OutOfRange {
+            leaf: 8000_0006,


oh, hey, this is actually a bug (though not a particularly bad one): we are reporting the wrong leaf in the error here, since this is the decimal number:

Suggested change

leaf: 8000_0006,

leaf: 0x8000_0006,

iximeow added 2 commits June 6, 2026 00:31

important typo

f4e1481

iximeow added the virtualization Propolis Integration & VM Management label Jun 6, 2026

iximeow commented Jun 6, 2026

View reviewed changes

hawkw reviewed Jun 6, 2026

View reviewed changes

schema version does need to bump in a migration + openapi docs

0503153

iximeow force-pushed the ixi/cpu-platform-turin-v2 branch from 643f537 to 0503153 Compare June 8, 2026 15:08

morlandi7 added this to the 20 milestone Jun 8, 2026

citrus-it approved these changes Jun 8, 2026

View reviewed changes

Comment thread nexus/src/app/instance_platform/cpu_platform.rs Outdated

external api version bump too

a27450b

hawkw reviewed Jun 8, 2026

View reviewed changes

Comment thread common/src/api/external/mod.rs Outdated

Comment thread nexus/src/app/instance_platform/cpu_platform.rs Outdated

Comment thread nexus/src/app/instance_platform/cpu_platform.rs

Comment thread nexus/src/app/instance_platform/cpu_platform.rs

hawkw approved these changes Jun 8, 2026

View reviewed changes

iximeow added 8 commits June 8, 2026 18:02

working at external API bits

cbbf85f

Merge remote-tracking branch 'github/main' into ixi/cpu-platform-turi…

92f4022

…n-v2

external api :3

a7b9a84

and openapi

087bd6c

rewording comments

851fbd3

tests should build too, yes

e31fa2d

more wordsmithing (and dead code elim)

7167ca0

hey guys! guess who just got yellllled aaat! (by openapi-check)

07578fd

iximeow mentioned this pull request Jun 9, 2026

helios/deploy failed in CI: collections.len() != 1 #10573

Open

thought i got that but i guess i did not got that

5866801

david-crespo reviewed Jun 9, 2026

View reviewed changes

AmdTurinV2 -> amd_turin_v2 in docs

0385a5c

david-crespo approved these changes Jun 9, 2026

View reviewed changes

iximeow merged commit 456b293 into main Jun 9, 2026
20 checks passed

iximeow deleted the ixi/cpu-platform-turin-v2 branch June 9, 2026 16:17

hawkw reviewed Jun 9, 2026

View reviewed changes

		// We're free to switch this instance from the initial AmdTurin to AmdTurinV2, which has
		// identical placement constraints and should land on the same simulated Turin sled.

	// w.r.t actual hardware. Instead, we'll assume that future caches are at
	// w.r.t. actual hardware. Instead, we'll assume that future caches are at

	// on cache size information in AMD CPUID leaf 8000_0006. See
	// on cache size information in AMD CPUID leaf 8000_0006h. See

		pub fn turin_v2(vcpu_count: u16) -> Result<CpuIdDump, CpuPlatformError> {
		// Turin V2 is a result of discovering that some guest software does depend

Conversation

iximeow commented Jun 6, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

iximeow commented Jun 6, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

david-crespo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

hawkw left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants