
glibc 2.34-2.36 need CPUID leaf 8000_0006 #1152

@iximeow

Description

for glibc in this range, leaf 8000_0006 is consulted to get the processor's L3 cache size, which in turn informs __x86_shared_non_temporal_threshold (it seems to me this is set to 3/4 of the processor's total L3). in oxidecomputer/omicron#9043, for turin_v1 we report an L3 size of 0 by virtue of having that leaf zeroed, which leaves the threshold at 0 as well. glibc then ends up treating copies that are not actually large as very wide regions: AFAICT there is an assumption that if the memmove() size is larger than the non-temporal threshold, it is also larger than at least one page. so when memmove() is called for a particular range of sizes, glibc gets into the multi-page form of large_memcpy_2x, which reads and writes two pages in parallel to try to pipeline memory access, and faults (around here) because the second page's "pipeline" acts more like a probe for an unmapped page.
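To make the arithmetic concrete, here is a small Rust model of how a zeroed leaf collapses the threshold. It is a sketch, not glibc's code: the helper names are hypothetical, the 3/4 factor is the hedged reading above, the `>=` comparison is an assumption about glibc's dispatch, and the EDX[31:18] layout (L3 size in 512 KiB units) is AMD's.

```rust
// Model of the failure mode: a zeroed CPUID leaf 0x8000_0006 gives an L3
// size of 0, so a threshold derived from it is 0 and even sub-page copies
// clear it. All helper names here are hypothetical.

const PAGE_SIZE: u64 = 4096;

/// AMD layout: EDX[31:18] of leaf 0x8000_0006 is the L3 size in 512 KiB units.
fn l3_bytes_from_edx(edx: u32) -> u64 {
    u64::from(edx >> 18) * 512 * 1024
}

/// Threshold as described in this issue (assumed): 3/4 of the reported L3.
fn non_temporal_threshold(l3_bytes: u64) -> u64 {
    l3_bytes * 3 / 4
}

/// Assumed dispatch rule: copies at or above the threshold take the
/// non-temporal (large_memcpy_2x-style) path.
fn takes_non_temporal_path(len: u64, threshold: u64) -> bool {
    len >= threshold
}

fn main() {
    let hardware_edx: u32 = 64 << 18; // 64 * 512 KiB = 32 MiB of L3
    let zeroed_edx: u32 = 0; // leaf punched out / zeroed

    for (name, edx) in [("hardware", hardware_edx), ("zeroed", zeroed_edx)] {
        let threshold = non_temporal_threshold(l3_bytes_from_edx(edx));
        let len: u64 = 1024; // smaller than one page
        println!(
            "{name}: threshold = {threshold} B; {len} B copy non-temporal: {}, at least a page: {}",
            takes_non_temporal_path(len, threshold),
            len >= PAGE_SIZE
        );
    }
}
```

With the zeroed leaf, the 1 KiB copy is routed to the large-copy path while being smaller than a page, which is exactly the combination the page-at-a-time loop does not expect.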

this is kind of a bug in glibc, tracked and fixed at http://sourceware.org/bugzilla/show_bug.cgi?id=30428, plus https://bugzilla.redhat.com/show_bug.cgi?id=2196271, Red Hat's report from the user who had a zeroed 8000_0005 leaf (XCP-ng, which iiuc is a Xen distribution?). in the Milan CPU profile we populated this leaf with whatever came from the hardware, for compatibility when we moved to explicit profiles. in propolis-standalone, the issue reproduces easily by punching out leaf 8000_0006 and booting an Ubuntu 22.04 VM.
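When reproducing, it can help to confirm from inside the guest what the leaf actually reports. A minimal sketch using Rust's `std::arch::x86_64::__cpuid`, assuming an AMD guest (on Intel, EDX of this leaf is reserved, so a zero there says nothing); `read_l3_kib` and `l3_kib_from_edx` are hypothetical names.

```rust
// Guest-side check of what CPUID leaf 0x8000_0006 reports for L3.
// Assumes the AMD register layout; helper names are hypothetical.

/// AMD layout: EDX[31:18] is the L3 size in 512 KiB units.
fn l3_kib_from_edx(edx: u32) -> u32 {
    (edx >> 18) * 512
}

#[cfg(target_arch = "x86_64")]
fn read_l3_kib() -> Option<u32> {
    use std::arch::x86_64::__cpuid;
    // EAX of leaf 0x8000_0000 is the highest supported extended leaf.
    let max = unsafe { __cpuid(0x8000_0000) };
    if max.eax < 0x8000_0006 {
        return None;
    }
    let leaf = unsafe { __cpuid(0x8000_0006) };
    Some(l3_kib_from_edx(leaf.edx))
}

#[cfg(not(target_arch = "x86_64"))]
fn read_l3_kib() -> Option<u32> {
    None // CPUID is only meaningful on x86_64 here
}

fn main() {
    match read_l3_kib() {
        Some(0) => println!("leaf 0x8000_0006 reports no L3 (zeroed?)"),
        Some(kib) => println!("leaf 0x8000_0006 reports {kib} KiB of L3"),
        None => println!("leaf 0x8000_0006 not available"),
    }
}
```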

per the comments towards the bottom of the glibc bug, the patch was backported to 2.37 and 2.38, in addition to landing in 2.39. so my inference is that in addition to 2.35 in Ubuntu 22.04 having this issue, 2.34 and 2.36 probably readily hit it too.
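The version reasoning above can be encoded as a small classifier. This is a hypothetical helper that restates the inference, not an authoritative check: distro builds may carry their own backports, so 2.37/2.38 in particular depend on what a given build includes.

```rust
// Hypothetical classifier restating this issue's inference about which
// glibc releases are exposed to a zeroed L3 size. A hint, not a verdict.

#[derive(Debug, PartialEq)]
enum Exposure {
    /// 2.34-2.36: inferred affected; not among the backport targets mentioned.
    LikelyAffected,
    /// 2.37-2.38: fixed only if this build includes the backport.
    DependsOnBackport,
    /// 2.39 and later: the fix landed upstream.
    Fixed,
    /// Anything else: outside the range discussed here.
    Unknown,
}

fn classify(version: &str) -> Exposure {
    let mut parts = version.split('.');
    let major = parts.next().and_then(|p| p.parse::<u32>().ok());
    let minor = parts.next().and_then(|p| p.parse::<u32>().ok());
    match (major, minor) {
        (Some(2), Some(34..=36)) => Exposure::LikelyAffected,
        (Some(2), Some(37..=38)) => Exposure::DependsOnBackport,
        (Some(2), Some(m)) if m >= 39 => Exposure::Fixed,
        _ => Exposure::Unknown,
    }
}

fn main() {
    for v in ["2.35", "2.37", "2.39", "2.31"] {
        println!("glibc {v}: {:?}", classify(v));
    }
}
```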

the goal here was to avoid lying about the actual cache topology of whichever host a VM has landed on, but I was a bit pessimistic that this would survive reality. Ubuntu 22.04 is in LTS until April 2027, so it seems like we should at least keep this leaf around for that long. that leaves us with all the questions of "what is the least-lying L3 cache topology we should report?" that we were hoping to avoid.
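Whatever size gets chosen, it has to be written into EDX[31:18] of the leaf in 512 KiB units. A minimal sketch of that encoding under the AMD layout; `with_l3_size` is a hypothetical helper, the 32 MiB value is a placeholder rather than a recommendation, and bits 17:0 (associativity, lines per tag, line size) are carried over unchanged and still need sensible values of their own.

```rust
// Hypothetical helpers for placing a chosen L3 size into the EDX of CPUID
// leaf 0x8000_0006 (AMD layout). Values in main are placeholders.

/// Replace EDX[31:18] (L3 size, 512 KiB units), keeping bits 17:0 as given.
/// Returns None if the size is not a whole number of units or too large.
fn with_l3_size(edx: u32, l3_kib: u32) -> Option<u32> {
    if l3_kib % 512 != 0 {
        return None;
    }
    let units = l3_kib / 512;
    if units > 0x3FFF {
        return None; // the size field is 14 bits wide
    }
    Some((edx & 0x0003_FFFF) | (units << 18))
}

/// Decode the L3 size back out, in KiB.
fn l3_kib(edx: u32) -> u32 {
    (edx >> 18) * 512
}

fn main() {
    // Placeholder low bits: only the line size (64 B) in bits 7:0 is set.
    let base: u32 = 0x0000_0040;
    let edx = with_l3_size(base, 32 * 1024).expect("representable size");
    println!(
        "EDX = {edx:#010x}, L3 = {} KiB, line size = {} B",
        l3_kib(edx),
        edx & 0xFF
    );
}
```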

this was an unfortunate miss in testing oxidecomputer/omicron#9043 a while back: I am certain I'd only tested Ubuntu back to 24.04 (which has glibc 2.39, happily skipping this whole problem). I'm not sure where else glibc 2.34-2.36 might be.
