Fix cycleclock::Now for RISC-V and PPC #955
Hmm, ok, after staring at snippets made from that function on godbolt,
I have to agree that this appears to be sane in general.
Might be good to hear from RISC-V folk, but other than that LG.
This seems overall decent to me, though I wouldn't mind seeing some PPC/RISC-V benchmark output before and after the change. I don't have access to either platform. Maybe you do, @luismarques?
@luismarques showed me why my contribution was both wrong (variable naming) and bad (GCC and Clang can eliminate some of the assembly if it isn't in a single asm block, leading to miscompiles), as he has explained. I haven't had a moment to check or benchmark this on a RISC-V system, and I don't have PowerPC systems to check it on. I agree that the RISC-V LLVM backend almost certainly needs improvements in how it tunes sequences like this to avoid branches, so that we can avoid the inline assembly here. Still, this change improves things for versions of Clang that have already been released, and also for RISC-V GCC. With the caveat that I haven't managed to compile and test this, I'm happy for it to be committed.
Well, no need to be so negative about your contribution :)
I have two QEMU VMs, a 32-bit and a 64-bit RISC-V one. But they are quite finicky, especially the 32-bit one. Just compiling software and its dependencies without running out of memory on the RV32 VM is extremely difficult. This code is going to be vendored into the LLVM copy, so that might provide some testing. But my plan was to get this approved upstream before porting the changes to LLVM. Perhaps the manual inspection might suffice for now, unless somebody can test the PPC changes?
Fixes the following issues with the implementation of `cycleclock::Now`:
- The RISC-V implementation wouldn't compile due to a typo;
- Both the PPC and RISC-V implementations' asm statements lacked the volatile keyword. This resulted in the repeated read of the counter's high part being optimized away, so overflow wasn't handled at all. Multiple counter reads could also be misoptimized, especially in LTO scenarios.
- The code relied on the zero/sign-extension of inline asm operands, which isn't guaranteed to occur and differs between compilers, namely GCC and Clang.

The PowerPC64 implementation was improved to do a single 64-bit read of the time-base counter. The RISC-V implementation was improved to do the overflow handling in assembly, since Clang would generate a branch, defeating the purpose of the non-branching counter reading approach.
I've not tested this, but at least by inspection it seems correct. LGTM.
Thank you all!
Cherrypick the upstream fix commit a77d5f7 onto llvm/utils/benchmark and libcxx/utils/google-benchmark. This fixes LLVM's 32-bit RISC-V compilation, and the issues mentioned in google/benchmark#955. An additional cherrypick of ecc1685 fixes some minor formatting issues introduced by the preceding commit. Differential Revision: https://reviews.llvm.org/D78084
This is a cherrypick of the upstream fix commit a77d5f7 onto the llvm-test-suite's `MicroBenchmarks/libs/benchmark-1.3.0`, to match the same cherrypick in the LLVM monorepo. This fixes 32-bit RISC-V compilation, and the issues mentioned in google/benchmark#955. An additional cherrypick of ecc1685 fixes some minor formatting issues introduced by the preceding commit. Differential Revision: https://reviews.llvm.org/D78456
Fixes the following issues with the implementation of `cycleclock::Now`:
- The RISC-V implementation wouldn't compile due to a typo;
- Both the PPC and RISC-V implementations' asm statements lacked the
  volatile keyword. This resulted in the repeated read of the counter's
  high part being optimized away, so overflow wasn't handled at all.
  Multiple counter reads could also be misoptimized, especially in LTO
  scenarios.
- The code relied on the zero/sign-extension of inline asm operands, which
  isn't guaranteed to occur and differs between compilers, namely GCC and Clang.

The PowerPC64 implementation was improved to do a single 64-bit read of
the time-base counter.
The RISC-V implementation was improved to do the overflow handling in
assembly, since Clang would generate a branch, defeating the purpose
of the non-branching counter reading approach.
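For readers unfamiliar with the non-branching approach, here is a minimal, portable sketch of the arithmetic only. `CombineCounterReads` is a hypothetical helper, not a function from the library. It assumes the three 32-bit samples (high, low, high again) were already obtained, which on real hardware would come from a single `asm volatile` block. A compiler may still turn this C++ into a branch, which is the reason given above for doing the step in assembly.

```cpp
#include <cstdint>

// Hypothetical helper for illustration; not part of the benchmark library.
// Combines three 32-bit samples of a 64-bit counter (high half, low half,
// high half again) without an explicit branch.
constexpr uint64_t CombineCounterReads(uint32_t hi0, uint32_t lo,
                                       uint32_t hi1) {
  // mask is 0xFFFFFFFF when both high reads agree, and 0 otherwise.
  const uint32_t mask = 0u - static_cast<uint32_t>(hi0 == hi1);
  // If the high half changed, the low half wrapped between the reads, so the
  // low sample is discarded and the result is the second high half with a
  // zero low half.
  return (static_cast<uint64_t>(hi1) << 32) | (lo & mask);
}
```

When the two high samples differ, the counter must have passed through `hi1 << 32` somewhere between the first and last read, so returning that value stays within the sampling window.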
I dug into the history of the project and the API usage, and it's not quite
clear that branching would actually be problematic. For instance, the gperftools
repo inherited the same code and eventually changed the PPC implementation to a
branching one (before dropping the code completely). Still, for the considered
use case the chosen approach seems sensible (if more complex and error-prone to
implement), so I have kept non-branching implementations.
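For comparison, the branching alternative mentioned above could look roughly like the sketch below. It is not the gperftools code; `FakeCounter` and `ReadCounterWithRetry` are hypothetical names, and the fake counter only simulates a hardware counter so that the example runs on any platform.

```cpp
#include <cstdint>

// Hypothetical stand-in for a hardware counter: every read advances the
// value by `step`, mimicking time passing between instructions.
struct FakeCounter {
  uint64_t value;
  uint64_t step;
  uint32_t ReadHigh() {
    value += step;
    return static_cast<uint32_t>(value >> 32);
  }
  uint32_t ReadLow() {
    value += step;
    return static_cast<uint32_t>(value);
  }
};

// Branching variant: re-read until the high half is the same before and
// after the low half was sampled.
template <typename Counter>
uint64_t ReadCounterWithRetry(Counter& counter) {
  uint32_t hi0, lo, hi1;
  do {
    hi0 = counter.ReadHigh();
    lo = counter.ReadLow();
    hi1 = counter.ReadHigh();
  } while (hi0 != hi1);
  return (static_cast<uint64_t>(hi1) << 32) | lo;
}
```

The loop keeps the exact low half at the cost of a data-dependent branch and, rarely, a second round of reads. The non-branching form trades that precision for a fixed instruction sequence.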