>4GB offsets in MCDataFragment in llvm-21 (llvm-dwp OOM) #168923

@MatzeB

With llvm-21, llvm-dwp is running out of memory on big builds that produce giant dwp files in which some sections are larger than 4GB. The size makes creating a reproducer a bit tricky, but I'll describe what we are seeing:

It looks like recent MC layer rewrites, such as 9beb467, introduced fields like uint32_t ContentStart in MCEncodedFragment. My understanding is that in our case llvm-dwp ends up creating a single fragment bigger than 4GB after repeated emitBytes calls here: https://github.com/llvm/llvm-project/blob/main/llvm/lib/DWP/DWP.cpp#L860
(The OOM itself seems to be accidental: ContentEnd overflows, which causes MCEncodedFragment::getContentsForAppending to wrongly grow the contents vector to an arbitrary size.)
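
To illustrate the suspected wraparound, here is a minimal standalone sketch. It is not the real MCEncodedFragment: the struct Fragment32 and the helper recordedSizeAfter are hypothetical names, and only the 32-bit offset bookkeeping is modelled, not the contents vector or the exact path that leads to the runaway allocation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a fragment that tracks its content range with
// 32-bit offsets. This is NOT the real MCEncodedFragment.
struct Fragment32 {
  uint32_t ContentStart = 0;
  uint32_t ContentEnd = 0;

  // Record that Num more bytes were appended (no storage is modelled).
  void append(uint64_t Num) {
    // The sum is computed in 64 bits, then truncated modulo 2^32.
    ContentEnd = static_cast<uint32_t>(ContentEnd + Num);
  }

  // Size as derived from the 32-bit offsets.
  uint64_t size() const {
    return static_cast<uint32_t>(ContentEnd - ContentStart);
  }
};

// Returns the size the fragment records after NumChunks appends of
// ChunkSize bytes each (hypothetical helper for illustration).
inline uint64_t recordedSizeAfter(uint64_t ChunkSize, unsigned NumChunks) {
  Fragment32 F;
  for (unsigned I = 0; I != NumChunks; ++I)
    F.append(ChunkSize);
  return F.size();
}
```

In this model, three 1 GiB appends are recorded correctly, but the fourth wraps ContentEnd back to 0, so any later size computation based on these offsets no longer matches the bytes actually written.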

The code seems to have changed lately, but I am still seeing various 32-bit offsets in the fragment code. Should we widen those fields to 64 bits, or better find a way to avoid writing everything into a single fragment in this use case?
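
For the second option, a rough sketch of what "not writing everything into a single fragment" could look like, assuming the per-fragment offsets stay 32-bit. SizedFragment and FragmentList are hypothetical names for illustration only; they model sizes, not actual contents, and do not correspond to existing MC classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical model of a fragment whose size must fit in 32 bits.
struct SizedFragment {
  uint32_t Size = 0;
};

// Hypothetical container that starts a new fragment whenever the current
// one cannot hold more bytes, so no single size ever overflows.
class FragmentList {
  std::vector<SizedFragment> Frags;

public:
  FragmentList() { Frags.emplace_back(); }

  void append(uint64_t Num) {
    constexpr uint64_t Max = std::numeric_limits<uint32_t>::max();
    while (Num != 0) {
      uint64_t Room = Max - Frags.back().Size;
      if (Room == 0) {
        Frags.emplace_back(); // current fragment is full, start a new one
        continue;
      }
      uint64_t Take = Num < Room ? Num : Room;
      Frags.back().Size += static_cast<uint32_t>(Take);
      Num -= Take;
    }
  }

  std::size_t numFragments() const { return Frags.size(); }

  // Total is accumulated in 64 bits across all fragments.
  uint64_t totalSize() const {
    uint64_t Total = 0;
    for (const SizedFragment &F : Frags)
      Total += F.Size;
    return Total;
  }
};
```

With five 1 GiB appends, this model ends up with two fragments whose sizes sum to 5 GiB. Whether splitting like this is cheaper or cleaner than widening the fields to 64 bits in the real code is exactly the open question above.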

CC @MaskRay

Metadata

Labels

llvm:mc (Machine (object) code)

Projects

Status

Needs Triage
