Conversation

@vortex73 (Contributor) commented Sep 12, 2025

This PR implements the first part of the LLVM ABI lowering library, proposed in this RFC. It is split out of #140112, which demonstrates how this is going to be used.

The ABI type system is intended to represent all the type information that is necessary to make call lowering decisions. As such, it contains less information than Clang QualTypes, but more information than LLVM IR types. The current type system has enough information to implement the x86_64 SysV ABI, but some extensions will likely be needed in the future for other targets (e.g. unadjusted alignment).

The type system expects layout information (like size, offset and alignment) to already be computed by the frontend.

The types are constructed using TypeBuilder, which uses a BumpPtrAllocator. The types themselves are not uniqued -- instead we cache the QualType -> ABI type translation (in future patches).

@vortex73 vortex73 marked this pull request as ready for review September 12, 2025 17:00
@nikic nikic changed the title [LLVMABI] The ABI Typesystem [LLVMABI] Implement the ABI Typesystem Sep 13, 2025
@rjmccall (Contributor) commented:

I appreciate the elegance of having a type system that eliminates differences that don't really matter at the ABI level, but having to allocate a bunch of memory to translate types to a new representation that's used for a few cycles while we call this portable library seems like a pretty high amount of overhead.

@vortex73 (Contributor, Author) commented Sep 16, 2025

> I appreciate the elegance of having a type system that eliminates differences that don't really matter at the ABI level, but having to allocate a bunch of memory to translate types to a new representation that's used for a few cycles while we call this portable library seems like a pretty high amount of overhead.

The prototype shows compile-time impact of <0.1% at -O0, even without any performance optimization work. Isn't that pretty negligible in practice?

@rjmccall (Contributor) commented Sep 16, 2025

>> I appreciate the elegance of having a type system that eliminates differences that don't really matter at the ABI level, but having to allocate a bunch of memory to translate types to a new representation that's used for a few cycles while we call this portable library seems like a pretty high amount of overhead.

> The prototype shows compile-time impact of <0.1% at -O0, even without any performance optimization work. Isn't that pretty negligible in practice?

Those measurements show a .6% compile time impact. Are you looking at the file sizes?

Also, this change is not supposed to actually change the emitted instructions, so something's pretty clearly up there.

@vortex73 (Contributor, Author) commented Sep 16, 2025

>>> I appreciate the elegance of having a type system that eliminates differences that don't really matter at the ABI level, but having to allocate a bunch of memory to translate types to a new representation that's used for a few cycles while we call this portable library seems like a pretty high amount of overhead.

>> The prototype shows compile-time impact of <0.1% at -O0, even without any performance optimization work. Isn't that pretty negligible in practice?

> Those measurements show a .6% compile time impact. Are you looking at the file sizes?

> Also, this change is not supposed to actually change the emitted instructions, so something's pretty clearly up there.

I'm referring to instructions.
Also isn't the increase in instruction count mainly due to additional code being compiled, new code paths etc?

@nikic (Contributor) commented Sep 16, 2025

>>> I appreciate the elegance of having a type system that eliminates differences that don't really matter at the ABI level, but having to allocate a bunch of memory to translate types to a new representation that's used for a few cycles while we call this portable library seems like a pretty high amount of overhead.

>> The prototype shows compile-time impact of <0.1% at -O0, even without any performance optimization work. Isn't that pretty negligible in practice?

> Those measurements show a .6% compile time impact. Are you looking at the file sizes?

> Also, this change is not supposed to actually change the emitted instructions, so something's pretty clearly up there.

Ah, there is some confusion about the metric here. "instructions" here is referring to the number of instructions retired during compilation. This is a compile-time metric. I think the .6% you are referring to is the wall-time metric on the clang build, which is too noisy to resolve such small differences. (The instructions metric for the clang build is also dominated by new code being compiled, not by the compile-time impact of the change itself. The only relevant metrics here are the instruction counts for stage[12]-O0-g.)
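For context, an "instructions retired" compile-time measurement of this kind is typically collected with perf. A hypothetical invocation (the clang path, input file, and repeat count are placeholders, not the tracker's exact setup):

```shell
# Count userspace instructions retired while compiling one file at -O0,
# averaged over 5 runs. Binary path and input file are placeholders.
perf stat -e instructions:u -r 5 ./clang -c -O0 -g sqlite3.c -o /dev/null
```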

@rjmccall (Contributor) commented:

Okay, if that's total instructions retired rather than emitted, that's an interesting compile-time measurement, but it's also one we wouldn't expect to reflect the performance impact of significant changes in memory use patterns. I understand that wall-clock measurements are noisy on the level of 1-2%, but that means measuring them requires more attention, not that we should just absorb arbitrary accumulated regressions because they're all within the noise. IRGen is generally a pretty small contributor to the overall compile time, so a .6% increase in overall compile time actually means a very large relative increase in IRGen time.

@nikic (Contributor) commented Sep 17, 2025

> Okay, if that's total instructions retired rather than emitted, that's an interesting compile-time measurement, but it's also one we wouldn't expect to reflect the performance impact of significant changes in memory use patterns.

Instructions retired is our standard compile-time metric, because it allows us to resolve even small regressions at the sub-0.1% level. For most changes, I have found it to be a good proxy for wall-time as far as compiler workloads are concerned.

I can see how the additional allocations here might cause effects that are not well reflected in instruction count metrics, so I ran some local wall-time measurements with large run counts. These are targeting the -O0 configuration (I've excluded -g to remove more backend time).

The results for two files with different characteristics are:

sqlite3.c:
Old: Time (mean ± σ):     208.9 ms ±   1.6 ms
New: Time (mean ± σ):     208.6 ms ±   1.7 ms

tramp3d-v4.cpp:
Old: Time (mean ± σ):      1.432 s ±  0.008 s
New: Time (mean ± σ):      1.433 s ±  0.006 s 

There is no statistically significant difference here, and I have not been able to get the noise floor below this level.
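The "Time (mean ± σ)" lines above match hyperfine's output format; a hypothetical reproduction of such a measurement (the binary names, flags, and input files here are assumptions, not the actual setup used):

```shell
# Benchmark two compiler builds on the same input with warmup runs.
# './clang-old' and './clang-new' are placeholder binary names.
hyperfine --warmup 5 \
  './clang-old -c -O0 sqlite3.c -o /dev/null' \
  './clang-new -c -O0 sqlite3.c -o /dev/null'
```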

> I understand that wall-clock measurements are noisy on the level of 1-2%, but that means measuring them requires more attention, not that we should just absorb arbitrary accumulated regressions because they're all within the noise.

I'm glad to hear that you care about accumulated regressions in Clang! I do as well, which is why I run the compile-time tracker. This must be the first time somebody has higher standards for compile-time regressions than I do :) My usual experience on the Clang side is that "small" regressions in the 0.1-0.3% range are not even worth investigating.

> IRGen is generally a pretty small contributor to the overall compile time, so a .6% increase in overall compile time actually means a very large relative increase in IRGen time.

To be clear, this .6% number you've picked out is meaningless. It's not statistically significant and might as well be -1.5% for all we know. (In the interface, the statistically significant results are colored.)

@rjmccall (Contributor) commented:

.6% was the best number we had. You are right that it wasn't statistically significant, but that just means it was weak evidence, not that it should be dismissed; a 50% increase on one sample also wouldn't be statistically significant, but it's obviously evidence of a problem.

Anyway, your new numbers are good evidence that this isn't a problem. Thank you for running them.

@vortex73 (Contributor, Author) commented:

@rnk @efriedma-quic ping?

Comment on lines 10 to 12
/// This file defines the Types and related helper methods concerned to the
/// LLVMABI library which mirrors ABI related type information from
/// the LLVM frontend.
A reviewer (Contributor) commented:

This whole comment reads a bit strangely, e.g. "concerned to" is not normal English phrasing. Maybe rewrite to something like:

The file defines the type system of the LLVMABI library, which frontends can use to describe the aspects of their high level types that affect the ABI.

(I'm sure this can still be improved!)

@vortex73 (Contributor, Author) replied:

Done, Thanks!

bool CXXRecord = false) {
FieldInfo *FieldArray = Allocator.Allocate<FieldInfo>(Fields.size());

for (size_t I = 0; I < Fields.size(); ++I) {
A reviewer (Contributor) commented:

Suggested change:
-  for (size_t I = 0; I < Fields.size(); ++I) {
+  for (size_t I = 0, E = Fields.size(); I != E; ++I) {

}

for (const auto &VBase : RT->getVirtualBaseClasses()) {
if (!isEmptyForABI(VBase.FieldType))
A reviewer (Contributor) commented:

should this whole loop be if (!RT->getVirtualBaseClasses().empty()) return false; instead, or am I missing some nuance?

Comment on lines +434 to +438
uint64_t ElementSize = ElementType->getSizeInBits().getFixedValue();
uint64_t ComplexSize = ElementSize * 2;

return new (Allocator.Allocate<ComplexType>())
ComplexType(ElementType, ComplexSize, Align);
A reviewer (Contributor) commented:

Suggested change:
-  uint64_t ElementSize = ElementType->getSizeInBits().getFixedValue();
-  uint64_t ComplexSize = ElementSize * 2;
-  return new (Allocator.Allocate<ComplexType>())
-      ComplexType(ElementType, ComplexSize, Align);
+  uint64_t ElementSizeInBits = ElementType->getSizeInBits().getFixedValue();
+  uint64_t ComplexSizeInBits = ElementSizeInBits * 2;
+  return new (Allocator.Allocate<ComplexType>())
+      ComplexType(ElementType, ComplexSizeInBits, Align);
