Emit DIE's size in bits when size is not a multiple of 8 #69741

augusto2112 · 2023-10-20T17:24:21Z

The existing code will always round down the size in bits when
calculating the size in bytes (for example, a types with 1-7 bits will
be emitted as 0 bytes long). Fix this by emitting the size in bits if
the size is not aligned to a byte.

llvmbot · 2023-10-20T17:25:33Z

@llvm/pr-subscribers-lldb

@llvm/pr-subscribers-debuginfo

Author: Augusto Noronha (augusto2112)

Changes

The existing code will always round down the size in bits when calculating the size in bytes (for example, a types with 1-7 bits will be emitted as 0 bytes long). Fix this by always rounding up, which would be the minimum amount of bytes to fit the size in bits.

Full diff: https://github.com/llvm/llvm-project/pull/69741.diff

1 Files Affected:

(modified) llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp (+1-1)

diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
index d30f0ef7af348af..b61660b189236fb 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
@@ -874,7 +874,7 @@ void DwarfUnit::constructTypeDIE(DIE &Buffer, const DICompositeType *CTy) {
   // Add name if not anonymous or intermediate type.
   StringRef Name = CTy->getName();
 
-  uint64_t Size = CTy->getSizeInBits() >> 3;
+  uint64_t Size = (CTy->getSizeInBits() + 7) >> 3;
   uint16_t Tag = Buffer.getTag();
 
   switch (Tag) {

dwblaikie · 2023-10-20T17:58:07Z

Please include a test case (& perhaps soem motivation - are there any targets that actually have non-byte-size types? Perhaps we should disallow that in the verifier instead?)

augusto2112 · 2023-10-20T18:10:40Z

Just added some tests!

Motivation wise: Swift has builtin types which are non-byte-size. For example, the builtin that the standard boolean wraps is 1 bit long.

dwblaikie · 2023-10-20T18:41:24Z

Just added some tests!

Thanks!

Motivation wise: Swift has builtin types which are non-byte-size. For example, the builtin that the standard boolean wraps is 1 bit long.

Hmm, I'm still not sure about this - if the DWARF is describing it as being 1 byte long, presumably that ABI has to actually allocate a byte for this type? So in what sense is the type only 1 bit long?

adrian-prantl · 2023-10-23T20:26:37Z

Just added some tests!

Thanks!

Motivation wise: Swift has builtin types which are non-byte-size. For example, the builtin that the standard boolean wraps is 1 bit long.

Hmm, I'm still not sure about this - if the DWARF is describing it as being 1 byte long, presumably that ABI has to actually allocate a byte for this type? So in what sense is the type only 1 bit long?

Swift allows packing values in aggregates under certain circumstances. For example the type Bool? a.k.a. Optional<Bool> stores the discriminator bit of the Optional enum type in bit #1 and the Bool in bit #0 of the same byte. For these applications the bit size is relevant.

dwblaikie · 2023-11-04T17:19:18Z

Just added some tests!

Thanks!

Motivation wise: Swift has builtin types which are non-byte-size. For example, the builtin that the standard boolean wraps is 1 bit long.

Hmm, I'm still not sure about this - if the DWARF is describing it as being 1 byte long, presumably that ABI has to actually allocate a byte for this type? So in what sense is the type only 1 bit long?

Swift allows packing values in aggregates under certain circumstances. For example the type Bool? a.k.a. Optional<Bool> stores the discriminator bit of the Optional enum type in bit #1 and the Bool in bit #0 of the same byte. For these applications the bit size is relevant.

Ah - this sounds like a different thing though - like structs with tail padding that derived classes can use, we don't describe the base class with tail padding as being smaller.

I guess maybe a different direction to discuss: I think it's useful for the DWARF output to match the IR metadata. The fact that we encode the size in bits instead of bytes is perhaps a mistake/certainly opens things up to potential variation between the bit size encoding and the byte size encoding..

But what does the non-rounded-up/storage bit size being in the IR metadata provide, if the result in the DWARF is rounded up anyway? Are we going to encode both the storage size and the bit size separately? Using what attributes? and/or Is the IR metadata consumed by something else that uses the bit-size?

adrian-prantl · 2023-11-07T22:04:11Z

@augusto2112 Would emitting a DW_AT_bit_size be a reasonable thing to do here?

augusto2112 · 2023-11-08T00:17:31Z

@adrian-prantl would we always emit bit sizes in that case?

adrian-prantl · 2023-11-08T17:53:57Z

I'm thinking: if (sizeInBits % 8) emitBitSize(sizeInBits) else emitSize(sizeInBits/8)

dwblaikie · 2023-11-08T18:34:46Z

I'm still a bit confused - it seems like the actual size (like what happens if you create an array of these things? I assume it's size is actually in bytes, right?) is still in bytes. But like tail padding in C++ - sometimes derived objects can use some of that padding/size for other stuff too. It doesn't mean the base class is smaller - it's got to be a whole byte for alignment reasons, etc, but the padding bits can be reused.

I think it'd be suitable to model tihs situation more like that situation - where the size is the storage size, but there's some way to describe the padding bits and allow reuse of those.

augusto2112 · 2023-11-08T20:36:19Z

I'm still a bit confused - it seems like the actual size (like what happens if you create an array of these things? I assume it's size is actually in bytes, right?) is still in bytes. But like tail padding in C++ - sometimes derived objects can use some of that padding/size for other stuff too. It doesn't mean the base class is smaller - it's got to be a whole byte for alignment reasons, etc, but the padding bits can be reused.

Yes, the actual size is still in bytes. I think I will emit the correct size in the front end instead, but I think the current implementation that silently rounds down the bits to the bytes is wrong, so we should do something about that separately.

dwblaikie · 2023-11-09T16:53:44Z

I'm still a bit confused - it seems like the actual size (like what happens if you create an array of these things? I assume it's size is actually in bytes, right?) is still in bytes. But like tail padding in C++ - sometimes derived objects can use some of that padding/size for other stuff too. It doesn't mean the base class is smaller - it's got to be a whole byte for alignment reasons, etc, but the padding bits can be reused.

Yes, the actual size is still in bytes. I think I will emit the correct size in the front end instead, but I think the current implementation that silently rounds down the bits to the bytes is wrong, so we should do something about that separately.

Yeah - not sure why we even encode it as bits. Maybe we use the bit size for bitfields? A verifier check that the size is a multiple of 8 in cases other than bitfields might be suitable?

augusto2112 · 2023-11-10T00:40:48Z

@dwblaikie I talked with Adrian, his idea is to emit the size in bits whenever it does not cleanly fit in a byte. His point (and I agree) is that there may be tools that can use this bit size, and if we always round up when emitting the DWARF we will lose this information, with no way to recover it. I updated the patch to implement this behavior.

github-actions · 2023-11-10T00:48:27Z

✅ With the latest revision this PR passed the C/C++ code formatter.

lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp

Michael137 · 2023-11-10T05:56:13Z

lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp

+    case DW_AT_bit_size:
+      // Convert the bit size to byte size, and round it up to the minimum about
+      // of bytes that will fit the bits.
+      byte_size = (form_value.Unsigned() + 7) / 8;


Completely out of scope of the PR but wonder if we should introduce something in llvm/Support/MathExtras.h that does this for us.

clang-tools-extra/test/clang-tidy/checkers/bugprone/branch-clone.cpp:969: int n = (count + 7) / 8; clang/test/SemaCXX/constexpr-duffs-device.cpp:6: unsigned long n = (count + 7) / 8; clang/lib/CodeGen/Targets/X86.cpp:2988: uint64_t SizeInBytes = (CGF.getContext().getTypeSize(Ty) + 7) / 8; flang/runtime/edit-input.cpp:84: auto significantBytes{static_cast<std::size_t>(digits * LOG2_BASE + 7) / 8}; lldb/source/Core/ValueObjectChild.cpp:172: (bitfield_end - *type_bit_size + 7) / 8; lldb/source/Core/DumpDataExtractor.cpp:674: (llvm::APFloat::getSizeInBits(semantics) + 7) / 8; lldb/source/Plugins/ExpressionParser/Clang/IRForTarget.cpp:1044: lldb::offset_t value_alignment = (*opt_alignment + 7ull) / 8ull; lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.h:816: return (*bit_size + 7) / 8; lldb/source/Symbol/CompilerType.cpp:561: return (*bit_size + 7) / 8; lldb/source/Expression/Materializer.cpp:554: size_t byte_align = (*opt_bit_align + 7) / 8; lldb/source/Expression/Materializer.cpp:952: size_t byte_align = (*opt_bit_align + 7) / 8; lldb/source/Utility/Scalar.cpp:118: StoreIntToMemory(val, storage.data(), (val.getBitWidth() + 7) / 8); llvm/include/llvm/CodeGen/MachineValueType.h:354: return {(BaseSize.getKnownMinValue() + 7) / 8, BaseSize.isScalable()}; llvm/include/llvm/CodeGen/TargetLowering.h:3466: uint32_t CondCodeActions[ISD::SETCC_INVALID][(MVT::VALUETYPE_SIZE + 7) / 8]; llvm/include/llvm/CodeGen/ValueTypes.h:375: return {(BaseSize.getKnownMinValue() + 7) / 8, BaseSize.isScalable()}; llvm/include/llvm/CodeGen/LowLevelType.h:187: return {(BaseSize.getKnownMinValue() + 7) / 8, BaseSize.isScalable()}; llvm/utils/TableGen/AsmWriterEmitter.cpp:407: unsigned BytesNeeded = ((OpcodeInfoBits - BitsLeft) + 7) / 8; llvm/lib/Analysis/ConstantFolding.cpp:598: unsigned BytesLoaded = (IntType->getBitWidth() + 7) / 8; llvm/lib/Target/M68k/M68kISelLowering.cpp:736: uint32_t OpSize = (VA.getLocVT().getSizeInBits() + 7) / 8; llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp:1834: size_t NumBytes = (Val.getBitWidth() + 7) / 8; llvm/lib/Target/Lanai/MCTargetDesc/LanaiAsmBackend.cpp:101: unsigned NumBytes = (getFixupKindInfo(Kind).TargetSize + 7) / 8; llvm/lib/Target/PowerPC/PPCISelLowering.cpp:5129: uint32_t OpSize = (Arg.getValueSizeInBits() + 7) / 8; llvm/lib/Target/Mips/MCTargetDesc/MipsAsmBackend.cpp:258: unsigned NumBytes = (getFixupKindInfo(Kind).TargetSize + 7) / 8; llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:1360: const unsigned MemSize = (Ty.getSizeInBits() + 7) / 8; llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:6441: const unsigned MemSize = (Size + 7) / 8; llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCAsmBackend.cpp:183: unsigned Size = (BitSize + 7) / 8; llvm/lib/Target/AArch64/AArch64FastISel.cpp:3087: unsigned ArgSize = (ArgVT.getSizeInBits() + 7) / 8; llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:6537: unsigned NumRegs = (Size + 7) / 8; llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:7700: OpSize = (OpSize + 7) / 8; llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:7451: unsigned SizeInBytes = (Ty.getScalarSizeInBits() + 7) / 8; llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:19493: PtrOff = (BitWidth + 7 - NewBW) / 8 - PtrOff; llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:28001: Intervals.insert(0, (St->getMemoryVT().getSizeInBits() + 7) / 8, llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:28021: int64_t Length = (Chain->getMemoryVT().getSizeInBits() + 7) / 8; llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp:284: OffsetByte = -((AllocBefore + 7) / 8 + (BitWidth + 7) / 8); llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp:291: Target.setBeforeBytes(AllocBefore, (BitWidth + 7) / 8); llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp:301: OffsetByte = (AllocAfter + 7) / 8; llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp:308: Target.setAfterBytes(AllocAfter, (BitWidth + 7) / 8); llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp:1859: (AllocBefore + 7) / 8 - Target.allocatedBeforeBytes() - 1, 0); llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp:1861: (AllocAfter + 7) / 8 - Target.allocatedAfterBytes() - 1, 0); llvm/lib/Transforms/Utils/VNCoercion.cpp:308: (DL.getTypeSizeInBits(SrcVal->getType()).getFixedValue() + 7) / 8; llvm/lib/Transforms/Utils/VNCoercion.cpp:309: uint64_t LoadSize = (DL.getTypeSizeInBits(LoadTy).getFixedValue() + 7) / 8; llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:1112: return Log2_32_Ceil((TypeSizeFixed + 7) / 8); mlir/lib/IR/Operation.cpp:160: propertiesStorageSize((fullPropertiesStorageSize + 7) / 8), name(name) {

Oh kind-of-related discourse post from last year: https://discourse.llvm.org/t/rfc-proper-low-level-abstractions-to-llvm-bits-bytes-addresses-and-masks/63081

Looks like it was never resolved

lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp

Michael137 · 2023-11-10T06:33:15Z

lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp

+    parent_bit_size = parent_byte_size * 8;
+  else
+    parent_bit_size =
+        parent_die.GetAttributeValueAsUnsigned(DW_AT_bit_size, UINT64_MAX);


Do we already have tests in LLDB that cover this?

Hmm, I'll definitely have a downstream test that exercises this behavior, but I don't think on the clang side this would be easy to test, do you have any suggestions on how to do that?

DWARFASTParserClang has unittests where we sometimes use Yaml2Obj and directly invoke the parser. But it's quite finicky. Let me know if you run into troubles with it

The existing code will always round down the size in bits when calculating the size in bytes (for example, a types with 1-7 bits will be emitted as 0 bytes long). Fix this by emitting the size in bits if the size is not aligned to a byte.

dwblaikie · 2023-11-14T01:20:38Z

@dwblaikie I talked with Adrian, his idea is to emit the size in bits whenever it does not cleanly fit in a byte. His point (and I agree) is that there may be tools that can use this bit size, and if we always round up when emitting the DWARF we will lose this information, with no way to recover it. I updated the patch to implement this behavior.

Yeah - ultimately it's pretty much up to you folks how you encode Swift and ObjC things - so take anything I'm saying as advice but not requirement.

But it feels a bit at-odds with precedent in the way other things that seem similar to me are encoded in DWARF already, and precedent is pretty much most/all we have to go with in DWARF.

I guess one question that might be relevant - does Swift have something like sizeof and what result does it give for these sort of types with bits to spare?

But like I said - it seems like structs with tail padding are similar to this situation - we still describe the whole size of the struct, because that's used for creating arrays of instances, ABI passing, etc. But the tail padding can still be used, in certain circumstances, when laying out a derived class. We encode this as the POD-ness of the type, and so if you wanted to create a class that derived from one described in DWARF you could do so & would know whether or not to put the derived class's members into the tail padding of the base or not.

augusto2112 · 2023-11-14T23:31:47Z

I guess one question that might be relevant - does Swift have something like sizeof and what result does it give for these sort of types with bits to spare?

You can't actually use that with these types as these are special compiler builtin types which aren't actually accessible in source code.

But like I said - it seems like structs with tail padding are similar to this situation - we still describe the whole size of the struct, because that's used for creating arrays of instances, ABI passing, etc. But the tail padding can still be used, in certain circumstances, when laying out a derived class. We encode this as the POD-ness of the type, and so if you wanted to create a class that derived from one described in DWARF you could do so & would know whether or not to put the derived class's members into the tail padding of the base or not.

I understand the rationale of basing this on precedent, but in this case in this case we should break from it for two reasons:

DW_AT_BIT_SIZE is already a standardized attribute in Dwarf that fits this use case.
Round up to the nearest byte would lose information, which can be kept with fairly minimal downsides in my opinion.

dwblaikie · 2023-11-14T23:50:09Z

I guess one question that might be relevant - does Swift have something like sizeof and what result does it give for these sort of types with bits to spare?

You can't actually use that with these types as these are special compiler builtin types which aren't actually accessible in source code.

Perhaps observable indirectly?

But like I said - it seems like structs with tail padding are similar to this situation - we still describe the whole size of the struct, because that's used for creating arrays of instances, ABI passing, etc. But the tail padding can still be used, in certain circumstances, when laying out a derived class. We encode this as the POD-ness of the type, and so if you wanted to create a class that derived from one described in DWARF you could do so & would know whether or not to put the derived class's members into the tail padding of the base or not.

I understand the rationale of basing this on precedent, but in this case in this case we should break from it for two reasons:

DW_AT_BIT_SIZE is already a standardized attribute in Dwarf that fits this use case.

I'm arguing it doesn't fit it particularly well. We use it for bit fields - which are pretty special, for instance, but it seems like this thing isn't quite like that - it does have a whole byte size (if you allocated an array of them, for instance, I'm guessing they're a byte each, right?) but then has some padding bits that can be reused in some circumstances? That's why I'm saying it seems like it fits more closely to the struct padding representation.

Round up to the nearest byte would lose information, which can be kept with fairly minimal downsides in my opinion.

Seems like it'd still need to be special cased, right? The consumer would go "oh, this has a bit size, but if we want an array of them, or to allocate them for ABI purposes, etc, I have to round it up to the nearest byte"? or something like that.

Some pointers to documentation about these types, and the range of uses/instances there are might be handy (like is this a general concept? Or is it only one type that uses this (bool equivalent, with 7 padding bytes unused) or a class of types (a small finite list of them? Unbounded (like if I put a bool in my custom struct - does my custom struct end up with a bit size too?))

adrian-prantl · 2023-11-16T01:33:11Z

I'm arguing it doesn't fit it particularly well. We use it for bit fields - which are pretty special, for instance, but it seems like this thing isn't quite like that - it does have a whole byte size (if you allocated an array of them, for instance, I'm guessing they're a byte each, right?) but then has some padding bits that can be reused in some circumstances? That's why I'm saying it seems like it fits more closely to the struct padding representation.

Swift is really clever at packing at packing aggregate types. For example, the discriminator bits for enums are always stored in unused bits of the payload type. For a contrived example, the type Optional<Optional> has a size of 3 bits.

dwblaikie · 2023-11-16T19:41:04Z

I'm arguing it doesn't fit it particularly well. We use it for bit fields - which are pretty special, for instance, but it seems like this thing isn't quite like that - it does have a whole byte size (if you allocated an array of them, for instance, I'm guessing they're a byte each, right?) but then has some padding bits that can be reused in some circumstances? That's why I'm saying it seems like it fits more closely to the struct padding representation.

Swift is really clever at packing at packing aggregate types. For example, the discriminator bits for enums are always stored in unused bits of the payload type. For a contrived example, the type Optional has a size of 3 bits.

Sure enough - C++ tail padding does similar things: https://godbolt.org/z/4jsYWK6ev - where t2's member is packed into the tail padding of t1, but still leaves more tail padding for t3. (it's not identical, of course - it's somewhere between C++ tail padding and the ad-hoc stuff we have in LLVM for PointerIntPair, where multiple of those can be nested together and use the remaining bits for another element)

llvmbot added the debuginfo label Oct 20, 2023

augusto2112 force-pushed the fix-size-in-bytes branch from 0582dc1 to 426fe25 Compare October 20, 2023 18:06

augusto2112 requested a review from dwblaikie November 3, 2023 23:10

augusto2112 force-pushed the fix-size-in-bytes branch from 426fe25 to b34b8f1 Compare November 10, 2023 00:38

augusto2112 requested a review from JDevlieghere as a code owner November 10, 2023 00:38

llvmbot added the lldb label Nov 10, 2023

Michael137 reviewed Nov 10, 2023

View reviewed changes

augusto2112 force-pushed the fix-size-in-bytes branch from b34b8f1 to 997da62 Compare November 10, 2023 18:41

augusto2112 changed the title ~~Fix size in bytes of type DIEs when size in bits is not a multiple of 8~~ Emit DIE's size in bits when size is not a multiple of 8 Nov 10, 2023

augusto2112 requested a review from Michael137 November 10, 2023 18:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Emit DIE's size in bits when size is not a multiple of 8 #69741

Emit DIE's size in bits when size is not a multiple of 8 #69741

augusto2112 commented Oct 20, 2023 •

edited

llvmbot commented Oct 20, 2023 •

edited

dwblaikie commented Oct 20, 2023

augusto2112 commented Oct 20, 2023

dwblaikie commented Oct 20, 2023

adrian-prantl commented Oct 23, 2023 •

edited

dwblaikie commented Nov 4, 2023

adrian-prantl commented Nov 7, 2023

augusto2112 commented Nov 8, 2023

adrian-prantl commented Nov 8, 2023

dwblaikie commented Nov 8, 2023

augusto2112 commented Nov 8, 2023

dwblaikie commented Nov 9, 2023

augusto2112 commented Nov 10, 2023

github-actions bot commented Nov 10, 2023 •

edited

Michael137 Nov 10, 2023

Michael137 Nov 10, 2023

Michael137 Nov 10, 2023

augusto2112 Nov 10, 2023

Michael137 Nov 10, 2023

dwblaikie commented Nov 14, 2023

augusto2112 commented Nov 14, 2023

dwblaikie commented Nov 14, 2023

adrian-prantl commented Nov 16, 2023

dwblaikie commented Nov 16, 2023

Emit DIE's size in bits when size is not a multiple of 8 #69741

Are you sure you want to change the base?

Emit DIE's size in bits when size is not a multiple of 8 #69741

Conversation

augusto2112 commented Oct 20, 2023 • edited

llvmbot commented Oct 20, 2023 • edited

dwblaikie commented Oct 20, 2023

augusto2112 commented Oct 20, 2023

dwblaikie commented Oct 20, 2023

adrian-prantl commented Oct 23, 2023 • edited

dwblaikie commented Nov 4, 2023

adrian-prantl commented Nov 7, 2023

augusto2112 commented Nov 8, 2023

adrian-prantl commented Nov 8, 2023

dwblaikie commented Nov 8, 2023

augusto2112 commented Nov 8, 2023

dwblaikie commented Nov 9, 2023

augusto2112 commented Nov 10, 2023

github-actions bot commented Nov 10, 2023 • edited

Michael137 Nov 10, 2023

Choose a reason for hiding this comment

Michael137 Nov 10, 2023

Choose a reason for hiding this comment

Michael137 Nov 10, 2023

Choose a reason for hiding this comment

augusto2112 Nov 10, 2023

Choose a reason for hiding this comment

Michael137 Nov 10, 2023

Choose a reason for hiding this comment

dwblaikie commented Nov 14, 2023

augusto2112 commented Nov 14, 2023

dwblaikie commented Nov 14, 2023

adrian-prantl commented Nov 16, 2023

dwblaikie commented Nov 16, 2023

augusto2112 commented Oct 20, 2023 •

edited

llvmbot commented Oct 20, 2023 •

edited

adrian-prantl commented Oct 23, 2023 •

edited

github-actions bot commented Nov 10, 2023 •

edited