Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit DIE's size in bits when size is not a multiple of 8 #69741

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

augusto2112
Copy link
Contributor

@augusto2112 augusto2112 commented Oct 20, 2023

The existing code will always round down the size in bits when
calculating the size in bytes (for example, a types with 1-7 bits will
be emitted as 0 bytes long). Fix this by emitting the size in bits if
the size is not aligned to a byte.

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 20, 2023

@llvm/pr-subscribers-lldb

@llvm/pr-subscribers-debuginfo

Author: Augusto Noronha (augusto2112)

Changes

The existing code will always round down the size in bits when calculating the size in bytes (for example, a types with 1-7 bits will be emitted as 0 bytes long). Fix this by always rounding up, which would be the minimum amount of bytes to fit the size in bits.


Full diff: https://github.com/llvm/llvm-project/pull/69741.diff

1 Files Affected:

  • (modified) llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp (+1-1)
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
index d30f0ef7af348af..b61660b189236fb 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
@@ -874,7 +874,7 @@ void DwarfUnit::constructTypeDIE(DIE &Buffer, const DICompositeType *CTy) {
   // Add name if not anonymous or intermediate type.
   StringRef Name = CTy->getName();
 
-  uint64_t Size = CTy->getSizeInBits() >> 3;
+  uint64_t Size = (CTy->getSizeInBits() + 7) >> 3;
   uint16_t Tag = Buffer.getTag();
 
   switch (Tag) {

@dwblaikie
Copy link
Collaborator

Please include a test case (& perhaps soem motivation - are there any targets that actually have non-byte-size types? Perhaps we should disallow that in the verifier instead?)

@augusto2112
Copy link
Contributor Author

Just added some tests!

Motivation wise: Swift has builtin types which are non-byte-size. For example, the builtin that the standard boolean wraps is 1 bit long.

@dwblaikie
Copy link
Collaborator

Just added some tests!

Thanks!

Motivation wise: Swift has builtin types which are non-byte-size. For example, the builtin that the standard boolean wraps is 1 bit long.

Hmm, I'm still not sure about this - if the DWARF is describing it as being 1 byte long, presumably that ABI has to actually allocate a byte for this type? So in what sense is the type only 1 bit long?

@adrian-prantl
Copy link
Collaborator

adrian-prantl commented Oct 23, 2023

Just added some tests!

Thanks!

Motivation wise: Swift has builtin types which are non-byte-size. For example, the builtin that the standard boolean wraps is 1 bit long.

Hmm, I'm still not sure about this - if the DWARF is describing it as being 1 byte long, presumably that ABI has to actually allocate a byte for this type? So in what sense is the type only 1 bit long?

Swift allows packing values in aggregates under certain circumstances. For example the type Bool? a.k.a. Optional<Bool> stores the discriminator bit of the Optional enum type in bit #1 and the Bool in bit #0 of the same byte. For these applications the bit size is relevant.

@dwblaikie
Copy link
Collaborator

Just added some tests!

Thanks!

Motivation wise: Swift has builtin types which are non-byte-size. For example, the builtin that the standard boolean wraps is 1 bit long.

Hmm, I'm still not sure about this - if the DWARF is describing it as being 1 byte long, presumably that ABI has to actually allocate a byte for this type? So in what sense is the type only 1 bit long?

Swift allows packing values in aggregates under certain circumstances. For example the type Bool? a.k.a. Optional<Bool> stores the discriminator bit of the Optional enum type in bit #1 and the Bool in bit #0 of the same byte. For these applications the bit size is relevant.

Ah - this sounds like a different thing though - like structs with tail padding that derived classes can use, we don't describe the base class with tail padding as being smaller.

I guess maybe a different direction to discuss: I think it's useful for the DWARF output to match the IR metadata. The fact that we encode the size in bits instead of bytes is perhaps a mistake/certainly opens things up to potential variation between the bit size encoding and the byte size encoding..

But what does the non-rounded-up/storage bit size being in the IR metadata provide, if the result in the DWARF is rounded up anyway? Are we going to encode both the storage size and the bit size separately? Using what attributes? and/or Is the IR metadata consumed by something else that uses the bit-size?

@adrian-prantl
Copy link
Collaborator

@augusto2112 Would emitting a DW_AT_bit_size be a reasonable thing to do here?

@augusto2112
Copy link
Contributor Author

@adrian-prantl would we always emit bit sizes in that case?

@adrian-prantl
Copy link
Collaborator

I'm thinking: if (sizeInBits % 8) emitBitSize(sizeInBits) else emitSize(sizeInBits/8)

@dwblaikie
Copy link
Collaborator

I'm still a bit confused - it seems like the actual size (like what happens if you create an array of these things? I assume it's size is actually in bytes, right?) is still in bytes. But like tail padding in C++ - sometimes derived objects can use some of that padding/size for other stuff too. It doesn't mean the base class is smaller - it's got to be a whole byte for alignment reasons, etc, but the padding bits can be reused.

I think it'd be suitable to model tihs situation more like that situation - where the size is the storage size, but there's some way to describe the padding bits and allow reuse of those.

@augusto2112
Copy link
Contributor Author

I'm still a bit confused - it seems like the actual size (like what happens if you create an array of these things? I assume it's size is actually in bytes, right?) is still in bytes. But like tail padding in C++ - sometimes derived objects can use some of that padding/size for other stuff too. It doesn't mean the base class is smaller - it's got to be a whole byte for alignment reasons, etc, but the padding bits can be reused.

Yes, the actual size is still in bytes. I think I will emit the correct size in the front end instead, but I think the current implementation that silently rounds down the bits to the bytes is wrong, so we should do something about that separately.

@dwblaikie
Copy link
Collaborator

I'm still a bit confused - it seems like the actual size (like what happens if you create an array of these things? I assume it's size is actually in bytes, right?) is still in bytes. But like tail padding in C++ - sometimes derived objects can use some of that padding/size for other stuff too. It doesn't mean the base class is smaller - it's got to be a whole byte for alignment reasons, etc, but the padding bits can be reused.

Yes, the actual size is still in bytes. I think I will emit the correct size in the front end instead, but I think the current implementation that silently rounds down the bits to the bytes is wrong, so we should do something about that separately.

Yeah - not sure why we even encode it as bits. Maybe we use the bit size for bitfields? A verifier check that the size is a multiple of 8 in cases other than bitfields might be suitable?

@augusto2112
Copy link
Contributor Author

@dwblaikie I talked with Adrian, his idea is to emit the size in bits whenever it does not cleanly fit in a byte. His point (and I agree) is that there may be tools that can use this bit size, and if we always round up when emitting the DWARF we will lose this information, with no way to recover it. I updated the patch to implement this behavior.

Copy link

github-actions bot commented Nov 10, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

case DW_AT_bit_size:
// Convert the bit size to byte size, and round it up to the minimum about
// of bytes that will fit the bits.
byte_size = (form_value.Unsigned() + 7) / 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely out of scope of the PR but wonder if we should introduce something in llvm/Support/MathExtras.h that does this for us.

clang-tools-extra/test/clang-tidy/checkers/bugprone/branch-clone.cpp:969:    int n = (count + 7) / 8;                                                                                                                 
clang/test/SemaCXX/constexpr-duffs-device.cpp:6:        unsigned long n = (count + 7) / 8;                                                                                                                            
clang/lib/CodeGen/Targets/X86.cpp:2988:  uint64_t SizeInBytes = (CGF.getContext().getTypeSize(Ty) + 7) / 8;                                                                                                           
flang/runtime/edit-input.cpp:84:  auto significantBytes{static_cast<std::size_t>(digits * LOG2_BASE + 7) / 8};                                                                                                        
lldb/source/Core/ValueObjectChild.cpp:172:                    (bitfield_end - *type_bit_size + 7) / 8;                                                                                                                
lldb/source/Core/DumpDataExtractor.cpp:674:          (llvm::APFloat::getSizeInBits(semantics) + 7) / 8;                                                                                                               
lldb/source/Plugins/ExpressionParser/Clang/IRForTarget.cpp:1044:    lldb::offset_t value_alignment = (*opt_alignment + 7ull) / 8ull;                                  
lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.h:816:      return (*bit_size + 7) / 8;                                                                                                                          
lldb/source/Symbol/CompilerType.cpp:561:    return (*bit_size + 7) / 8;                                                                                                                                               
lldb/source/Expression/Materializer.cpp:554:        size_t byte_align = (*opt_bit_align + 7) / 8;                                                                                                                     
lldb/source/Expression/Materializer.cpp:952:      size_t byte_align = (*opt_bit_align + 7) / 8;                                                                                                                       
lldb/source/Utility/Scalar.cpp:118:    StoreIntToMemory(val, storage.data(), (val.getBitWidth() + 7) / 8);                                                                                                            
llvm/include/llvm/CodeGen/MachineValueType.h:354:      return {(BaseSize.getKnownMinValue() + 7) / 8, BaseSize.isScalable()};                                 
llvm/include/llvm/CodeGen/TargetLowering.h:3466:  uint32_t CondCodeActions[ISD::SETCC_INVALID][(MVT::VALUETYPE_SIZE + 7) / 8];                                                                                     
llvm/include/llvm/CodeGen/ValueTypes.h:375:      return {(BaseSize.getKnownMinValue() + 7) / 8, BaseSize.isScalable()};                                                                                               
llvm/include/llvm/CodeGen/LowLevelType.h:187:    return {(BaseSize.getKnownMinValue() + 7) / 8, BaseSize.isScalable()};                                                                                               
llvm/utils/TableGen/AsmWriterEmitter.cpp:407:  unsigned BytesNeeded = ((OpcodeInfoBits - BitsLeft) + 7) / 8;                                                                                                          
llvm/lib/Analysis/ConstantFolding.cpp:598:  unsigned BytesLoaded = (IntType->getBitWidth() + 7) / 8;                                                                                                                  
llvm/lib/Target/M68k/M68kISelLowering.cpp:736:      uint32_t OpSize = (VA.getLocVT().getSizeInBits() + 7) / 8;                                                                        
llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp:1834:    size_t NumBytes = (Val.getBitWidth() + 7) / 8;                                                                                                                     
llvm/lib/Target/Lanai/MCTargetDesc/LanaiAsmBackend.cpp:101:  unsigned NumBytes = (getFixupKindInfo(Kind).TargetSize + 7) / 8;                                                                                         
llvm/lib/Target/PowerPC/PPCISelLowering.cpp:5129:  uint32_t OpSize = (Arg.getValueSizeInBits() + 7) / 8;                                                                                                              
llvm/lib/Target/Mips/MCTargetDesc/MipsAsmBackend.cpp:258:  unsigned NumBytes = (getFixupKindInfo(Kind).TargetSize + 7) / 8;                                                                                           
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:1360:  const unsigned MemSize = (Ty.getSizeInBits() + 7) / 8;                                                                                                       
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:6441:  const unsigned MemSize = (Size + 7) / 8;                                                                                                                        
llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCAsmBackend.cpp:183:  unsigned Size = (BitSize + 7) / 8;                                                                                                                 
llvm/lib/Target/AArch64/AArch64FastISel.cpp:3087:      unsigned ArgSize = (ArgVT.getSizeInBits() + 7) / 8;                                                                                                            
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:6537:      unsigned NumRegs = (Size + 7) / 8;                                                                                                                         
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:7700:      OpSize = (OpSize + 7) / 8;                                                                                                                                 
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:7451:  unsigned SizeInBytes = (Ty.getScalarSizeInBits() + 7) / 8;                                                                       
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:19493:        PtrOff = (BitWidth + 7 - NewBW) / 8 - PtrOff;
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:28001:  Intervals.insert(0, (St->getMemoryVT().getSizeInBits() + 7) / 8,
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:28021:    int64_t Length = (Chain->getMemoryVT().getSizeInBits() + 7) / 8;
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp:284:    OffsetByte = -((AllocBefore + 7) / 8 + (BitWidth + 7) / 8);
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp:291:      Target.setBeforeBytes(AllocBefore, (BitWidth + 7) / 8);
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp:301:    OffsetByte = (AllocAfter + 7) / 8;
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp:308:      Target.setAfterBytes(AllocAfter, (BitWidth + 7) / 8);
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp:1859:          (AllocBefore + 7) / 8 - Target.allocatedBeforeBytes() - 1, 0);
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp:1861:          (AllocAfter + 7) / 8 - Target.allocatedAfterBytes() - 1, 0);
llvm/lib/Transforms/Utils/VNCoercion.cpp:308:      (DL.getTypeSizeInBits(SrcVal->getType()).getFixedValue() + 7) / 8;
llvm/lib/Transforms/Utils/VNCoercion.cpp:309:  uint64_t LoadSize = (DL.getTypeSizeInBits(LoadTy).getFixedValue() + 7) / 8;
llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:1112:  return Log2_32_Ceil((TypeSizeFixed + 7) / 8);
mlir/lib/IR/Operation.cpp:160:      propertiesStorageSize((fullPropertiesStorageSize + 7) / 8), name(name) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh kind-of-related discourse post from last year: https://discourse.llvm.org/t/rfc-proper-low-level-abstractions-to-llvm-bits-bytes-addresses-and-masks/63081

Looks like it was never resolved

parent_bit_size = parent_byte_size * 8;
else
parent_bit_size =
parent_die.GetAttributeValueAsUnsigned(DW_AT_bit_size, UINT64_MAX);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we already have tests in LLDB that cover this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'll definitely have a downstream test that exercises this behavior, but I don't think on the clang side this would be easy to test, do you have any suggestions on how to do that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DWARFASTParserClang has unittests where we sometimes use Yaml2Obj and directly invoke the parser. But it's quite finicky. Let me know if you run into troubles with it

The existing code will always round down the size in bits when
calculating the size in bytes (for example, a types with 1-7 bits will
be emitted as 0 bytes long). Fix this by emitting the size in bits if
the size is not aligned to a byte.
@augusto2112 augusto2112 changed the title Fix size in bytes of type DIEs when size in bits is not a multiple of 8 Emit DIE's size in bits when size is not a multiple of 8 Nov 10, 2023
@dwblaikie
Copy link
Collaborator

@dwblaikie I talked with Adrian, his idea is to emit the size in bits whenever it does not cleanly fit in a byte. His point (and I agree) is that there may be tools that can use this bit size, and if we always round up when emitting the DWARF we will lose this information, with no way to recover it. I updated the patch to implement this behavior.

Yeah - ultimately it's pretty much up to you folks how you encode Swift and ObjC things - so take anything I'm saying as advice but not requirement.

But it feels a bit at-odds with precedent in the way other things that seem similar to me are encoded in DWARF already, and precedent is pretty much most/all we have to go with in DWARF.

I guess one question that might be relevant - does Swift have something like sizeof and what result does it give for these sort of types with bits to spare?

But like I said - it seems like structs with tail padding are similar to this situation - we still describe the whole size of the struct, because that's used for creating arrays of instances, ABI passing, etc. But the tail padding can still be used, in certain circumstances, when laying out a derived class. We encode this as the POD-ness of the type, and so if you wanted to create a class that derived from one described in DWARF you could do so & would know whether or not to put the derived class's members into the tail padding of the base or not.

@augusto2112
Copy link
Contributor Author

I guess one question that might be relevant - does Swift have something like sizeof and what result does it give for these sort of types with bits to spare?

You can't actually use that with these types as these are special compiler builtin types which aren't actually accessible in source code.

But like I said - it seems like structs with tail padding are similar to this situation - we still describe the whole size of the struct, because that's used for creating arrays of instances, ABI passing, etc. But the tail padding can still be used, in certain circumstances, when laying out a derived class. We encode this as the POD-ness of the type, and so if you wanted to create a class that derived from one described in DWARF you could do so & would know whether or not to put the derived class's members into the tail padding of the base or not.

I understand the rationale of basing this on precedent, but in this case in this case we should break from it for two reasons:

  • DW_AT_BIT_SIZE is already a standardized attribute in Dwarf that fits this use case.
  • Round up to the nearest byte would lose information, which can be kept with fairly minimal downsides in my opinion.

@dwblaikie
Copy link
Collaborator

I guess one question that might be relevant - does Swift have something like sizeof and what result does it give for these sort of types with bits to spare?

You can't actually use that with these types as these are special compiler builtin types which aren't actually accessible in source code.

Perhaps observable indirectly?

But like I said - it seems like structs with tail padding are similar to this situation - we still describe the whole size of the struct, because that's used for creating arrays of instances, ABI passing, etc. But the tail padding can still be used, in certain circumstances, when laying out a derived class. We encode this as the POD-ness of the type, and so if you wanted to create a class that derived from one described in DWARF you could do so & would know whether or not to put the derived class's members into the tail padding of the base or not.

I understand the rationale of basing this on precedent, but in this case in this case we should break from it for two reasons:

  • DW_AT_BIT_SIZE is already a standardized attribute in Dwarf that fits this use case.

I'm arguing it doesn't fit it particularly well. We use it for bit fields - which are pretty special, for instance, but it seems like this thing isn't quite like that - it does have a whole byte size (if you allocated an array of them, for instance, I'm guessing they're a byte each, right?) but then has some padding bits that can be reused in some circumstances? That's why I'm saying it seems like it fits more closely to the struct padding representation.

  • Round up to the nearest byte would lose information, which can be kept with fairly minimal downsides in my opinion.

Seems like it'd still need to be special cased, right? The consumer would go "oh, this has a bit size, but if we want an array of them, or to allocate them for ABI purposes, etc, I have to round it up to the nearest byte"? or something like that.

Some pointers to documentation about these types, and the range of uses/instances there are might be handy (like is this a general concept? Or is it only one type that uses this (bool equivalent, with 7 padding bytes unused) or a class of types (a small finite list of them? Unbounded (like if I put a bool in my custom struct - does my custom struct end up with a bit size too?))

@adrian-prantl
Copy link
Collaborator

I'm arguing it doesn't fit it particularly well. We use it for bit fields - which are pretty special, for instance, but it seems like this thing isn't quite like that - it does have a whole byte size (if you allocated an array of them, for instance, I'm guessing they're a byte each, right?) but then has some padding bits that can be reused in some circumstances? That's why I'm saying it seems like it fits more closely to the struct padding representation.

Swift is really clever at packing at packing aggregate types. For example, the discriminator bits for enums are always stored in unused bits of the payload type. For a contrived example, the type Optional<Optional> has a size of 3 bits.

@dwblaikie
Copy link
Collaborator

I'm arguing it doesn't fit it particularly well. We use it for bit fields - which are pretty special, for instance, but it seems like this thing isn't quite like that - it does have a whole byte size (if you allocated an array of them, for instance, I'm guessing they're a byte each, right?) but then has some padding bits that can be reused in some circumstances? That's why I'm saying it seems like it fits more closely to the struct padding representation.

Swift is really clever at packing at packing aggregate types. For example, the discriminator bits for enums are always stored in unused bits of the payload type. For a contrived example, the type Optional has a size of 3 bits.

Sure enough - C++ tail padding does similar things: https://godbolt.org/z/4jsYWK6ev - where t2's member is packed into the tail padding of t1, but still leaves more tail padding for t3. (it's not identical, of course - it's somewhere between C++ tail padding and the ad-hoc stuff we have in LLVM for PointerIntPair, where multiple of those can be nested together and use the remaining bits for another element)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants