Add an AmdTurinV2 CPU platform#10560
Conversation
As chronicled in oxidecomputer/propolis#1152, the hopes and dreams we had in RFD 314 of avoiding reporting cache information to guests have been dashed by a (since-fixed!) glibc bug. So, in support of guest software, add cache information into the AmdTurin CPU platform series. The older AmdMilan platform reflects what guests happened to see when moving to CPU platforms, meaning every AmdMilan guest believes itself to have a 256MiB L3. Given the Turin V2 strategy for describing L3 here, we probably should go back and add a new rev of the Milan platform that does similar. Note that since Propolis#940 is still an open issue (until I've got the broader virtual platform work up and landed), a VM with an AmdTurinV2 CPU platform will think itself to have vCPU-many more-nicely-sized L3s, much like how an AmdMilan guest also believes itself to have vCPU-many 256MiB L3 caches. There isn't anything to do to address this in the Nexus-side CPUID leaf generation, since the topology that's incorrect is specialized by Propolis per-cpu when setting up the VM, so we can't help that issue along the way here.
There was a problem hiding this comment.
since I'm in here fiddling with API docs: docs issue 773 gets into this more but:
- RFD 314 is not published, so this is just confusing in the API docs
- RFD 314 really only talks about how we went from "all the hardware features" to "
AmdMilan".
cpu_platform.rs is (intended) to be more of a living document where we can describe why we did or didn't set up features in new platforms. that's also why there's so much elaboration in turin_v2 for a "small" change. so, strike the RFD mention and in the future we'll want something more user-facing for CPU platforms than the long-form "why it's this way" in both 314 and the docs here.
|
in parallel with review here I should be able to boot an Ubuntu 22.04 guest with this profile and see it boot (whereas udevadmd and others segfault in libc on boot under |
| // We're free to switch this instance from the initial AmdTurin to AmdTurinV2, which has | ||
| // identical placement constraints and should land on the same simulated Turin sled. |
643f537 to
0503153
Compare
| }, | ||
| { | ||
| "description": "An AMD Turin-like CPU platform.", | ||
| "description": "An AMD Turin-like CPU platform. Prefer `AmdTurinV2` over this; this CPU platform is retained for instances that specifically requested it before `AmdTurinV2` was added.\n\nThis initial version of the Turin CPU platform includes no cache or TLB information in CPUID leaf `8000_0006`. While this was intentional, Oxide later discovered some guest software interprets the zeroed leaves as reporting cache sizes of 0 bytes and behaves incorrectly and unpredictably as a result (see [Propolis#1152](https://github.com/oxidecomputer/propolis/issues/1152)).", |
There was a problem hiding this comment.
Since the API string is amd_turin_v2, it might be marginally better to refer to that in the docstring instead of AmdTurinV2, even though it will look weird in the Rust code.
There was a problem hiding this comment.
oh yeah I was wondering how the identifier shows up in generated clients, if it's typical to get the wire-format amd_turin_v2 versus AmdTurinV2. definitely if the snake-case form is what you usually see, I'll switch that.
hawkw
left a comment
There was a problem hiding this comment.
oh, whoops, looks like the branch merged while i was in the midst of the review, my bad
| use crate::SledCpuFamily; | ||
|
|
||
| use super::impl_enum_type; | ||
| use nexus_types::external_api::instance as instance_api; |
There was a problem hiding this comment.
style nit, not a big deal: I don't think we're particularly consistent about this, but elsewhere, I feel like I've often seen these imports renamed to external, instead, so that you can reference it as external::InstanceCpuPlatform. but instance_api is fine too, and doesn't particularly bother me.
There was a problem hiding this comment.
I'd looked for use nexus_types::external_api:: in nexus/db-model/src and saw this pattern, though using external does seem like a nice name to use for the reason you describe.. either way, I think you might be thinking about a different part of Nexus maybe?
| method = GET, | ||
| path = "/v1/instances/{instance}", | ||
| tags = ["instances"], | ||
| versions = VERSION_EXTERNAL_JUMBO_FRAMES..VERSION_INSTANCE_CPU_TYPE_TURIN_V2, |
There was a problem hiding this comment.
sigh i wish there was a way to not have this line be so long but there isn't. sigh.
| // actual hardware a guest is placed on. We *must not* try to populate this | ||
| // based on an actual sled selection: this would either limit migration to | ||
| // same-or-better caches *or* make migration liable to having CPUID lie | ||
| // w.r.t actual hardware. Instead, we'll assume that future caches are at |
There was a problem hiding this comment.
super-duper ultra nitpick (sorry)
| // w.r.t actual hardware. Instead, we'll assume that future caches are at | |
| // w.r.t. actual hardware. Instead, we'll assume that future caches are at |
|
|
||
| pub fn turin_v2(vcpu_count: u16) -> Result<CpuIdDump, CpuPlatformError> { | ||
| // Turin V2 is a result of discovering that some guest software does depend | ||
| // on cache size information in AMD CPUID leaf 8000_0006. See |
There was a problem hiding this comment.
nit, sorry: maybe should make sure to be explicit that this is hex?
| // on cache size information in AMD CPUID leaf 8000_0006. See | |
| // on cache size information in AMD CPUID leaf 8000_0006h. See |
| pub fn turin_v2(vcpu_count: u16) -> Result<CpuIdDump, CpuPlatformError> { | ||
| // Turin V2 is a result of discovering that some guest software does depend |
There was a problem hiding this comment.
up to you, but i feel like this whole comment could be a doc comment on the turin_v2 function?
| source | ||
| } | ||
|
|
||
| pub fn turin_v2(vcpu_count: u16) -> Result<CpuIdDump, CpuPlatformError> { |
There was a problem hiding this comment.
This should probably include a doc comment saying that vcpu_count must be <= 4096, since it will currently accept any u16? Though, again, in practice this won't matter because we will reject anything over 254 vCPUs several layers above actually making the CPUID values.
There was a problem hiding this comment.
(also, it occurs to me that this could be NonZeroU16 if you also wanted to ensure we weren't trying to make a nonsensical 0-vCPU VM?)
| // Gee, it sure is curious that these sizes are *lower* than Milan, huh? | ||
| // The AMD APM as of 3.37 (2025-07-02) describes the {I,D}TLB sizes as "The | ||
| // value returned is for the number of entries available for the 2 MB page | ||
| // size", but Zen 5 hardware reports 0x40 and 0x80, below. Surely the TLB | ||
| // sizes are not actually 64/128 entries? Correct! AMD marketing material | ||
| // describes the L2 TLB as having 2048 entries. That means the hardware | ||
| // values are scaled by 32 on Zen 5. (And PPRs agree with this.) |
| // L3 size is the really tricky one. The big comment at the top of this | ||
| // function talks about why *this* math. | ||
| const L3_KB_PER_CCD: u32 = 32 * 1024; | ||
| const VCPU_PER_CCD: u32 = 16; |
There was a problem hiding this comment.
this is where we assume that everything is "normal Turin" and over/under-report on the dense and high-cache F SKUs respectively. Maybe worth a note here?
|
|
||
| cpuid | ||
| .set_l2_l3_cache_and_tlb_info(Some(leaf)) | ||
| .expect("can set leaf 8000_0006h"); |
There was a problem hiding this comment.
take it or leave it: since we now return an error from this function, maybe this could also return an error?
| // effectively a limit of 4096 vCPUs. | ||
| if leaf_edx_l3_size_scaled >= (1 << 14) { | ||
| return Err(CpuPlatformError::OutOfRange { | ||
| leaf: 8000_0006, |
There was a problem hiding this comment.
oh, hey, this is actually a bug (though not a particularly bad one): we are reporting the wrong leaf in the error here, since this is the decimal number:
| leaf: 8000_0006, | |
| leaf: 0x8000_0006, |
As chronicled in oxidecomputer/propolis#1152, the hopes and dreams we had in RFD 314 of avoiding reporting cache information to guests have been dashed by a (since-fixed!) glibc bug.
So, in support of guest software, add cache information into the AmdTurin CPU platform series.
The older AmdMilan platform reflects what guests happened to see when moving to CPU platforms, meaning every AmdMilan guest believes itself to have a 256MiB L3. Given the Turin V2 strategy for describing L3 here, we probably should go back and add a new rev of the Milan platform that does similar.
Note that since Propolis#940 is still an open issue (until I've got the broader virtual platform work up and landed), a VM with an AmdTurinV2 CPU platform will think itself to have vCPU-many more-nicely-sized L3s, much like how an AmdMilan guest also believes itself to have vCPU-many 256MiB L3 caches. There isn't anything to do to address this in the Nexus-side CPUID leaf generation, since the topology that's incorrect is specialized by Propolis per-cpu when setting up the VM, so we can't help that issue along the way here.