Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cycleclock::Now for RISC-V and PPC #955

Merged
merged 1 commit into from
Apr 10, 2020

Conversation

luismarques
Copy link
Contributor

Fixes the following issues with the implementation of cycleclock::Now:

  • The RISC-V implementation wouldn't compile due to a typo;

  • Both the PPC and RISC-V implementation's asm statements lacked the
    volatile keyword. This resulted in the repeated read of the counter's
    high part being optimized away, so overflow wasn't handled at all.
    Multiple counter reads could also be misoptimized, especially in LTO
    scenarios.

  • Relied on the zero/sign-extension of inline asm operands, which isn't
    guaranteed to occur and differs between compilers, namely GCC and Clang.

The PowerPC64 implementation was improved to do a single 64-bit read of
the time-base counter.

The RISC-V implementation was improved to do the overflow handing in
assembly, since Clang would generate a branch, defeating the purpose
of the non-branching counter reading approach.

I dug into the history of the project and the API usage, and it's not quite
clear that branching would actually be problematic. For instance, the gperftools
repo inherited the same code and eventually changed the PPC implementation to a
branching one (before dropping the code completely). Still, for the considered
use case the chosen approach seems sensible (if more complex and error-prone to
implement), so I have kept non-branching implementations.

@luismarques luismarques force-pushed the fix-cycleclock-now branch 2 times, most recently from e79df62 to ee15d9d Compare April 8, 2020 22:52
@LebedevRI
Copy link
Collaborator

@lenary @asb thoughts on RISC-V changes?

src/cycleclock.h Outdated Show resolved Hide resolved
Copy link
Collaborator

@LebedevRI LebedevRI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, ok, after staring at snippets made from that function on godbolt
i have to agree that this appears to be sane in general.
Might be good to hear from RISC-V folk, but other than that LG.

@dmah42
Copy link
Member

dmah42 commented Apr 9, 2020

This seems overall decent to me, though I wouldn't mind seeing some ppc/risc-v benchmark output before and after the change. i don't have access to either platform. maybe you do @luismarques ?

@lenary
Copy link
Contributor

lenary commented Apr 9, 2020

@luismarques showed me why my contribution was both wrong (variable naming), and bad (GCC and clang can eliminate some of the assembly if you don't have it in a single ASM block), leading to miscompiles, as he has explained.

I haven't had a moment to check or benchmark this on a risc-v system, and don't have powerpc systems to check this on.

I agree there are almost certainly improvements needing to be made to the risc-v LLVM backend around tuning sequences like this to avoid branches so we can avoid the inline assembly here, but reasonably this change improves things for versions of clang that have already been released, and also for RISC-V gcc.

With the caveat that I haven't managed to compile and test this, I'm happy for it to be committed.

@luismarques
Copy link
Contributor Author

@luismarques showed me why my contribution was both wrong (variable naming), and bad (GCC and clang can eliminate some of the assembly if you don't have it in a single ASM block), leading to miscompiles, as he has explained.

Well, no need to be so negative about your contribution :)
You based your implementation on the PPC one, so it was fairly reasonable to assume that the things you hadn't changed were OK to rely on. The broader issue is that the software ecosystem was still bootstrapping, so it was difficult to test the changes.

@dominichamon

This seems overall decent to me, though I wouldn't mind seeing some ppc/risc-v benchmark output before and after the change. i don't have access to either platform. maybe you do @luismarques ?

I have two QEMU VMs, 32- and a 64-bit RISC-V ones. But they are quite finicky, especially the 32-bit one. Just compiling software and their dependencies without running out of memory on the RV32 VM is extremely difficult. This code is going to be vended into the LLVM copy, so that might provide some testing. But my plan was to get this approved upstream before porting the changes to LLVM. Perhaps the manual inspection might suffice for now, unless somebody can test the PPC changes?

Fixes the following issues with the implementation of `cycleclock::Now`:

- The RISC-V implementation wouldn't compile due to a typo;

- Both the PPC and RISC-V implementation's asm statements lacked the
  volatile keyword. This resulted in the repeated read of the counter's
  high part being optimized away, so overflow wasn't handled at all.
  Multiple counter reads could also be misoptimized, especially in LTO
  scenarios.

- Relied on the zero/sign-extension of inline asm operands, which isn't
  guaranteed to occur and differs between compilers, namely GCC and Clang.

The PowerPC64 implementation was improved to do a single 64-bit read of
the time-base counter.

The RISC-V implementation was improved to do the overflow handing in
assembly, since Clang would generate a branch, defeating the purpose
of the non-branching counter reading approach.
@asb
Copy link

asb commented Apr 10, 2020

I've not tested this, but at least by inspection it seems correct. LGTM.

@dmah42 dmah42 merged commit a77d5f7 into google:master Apr 10, 2020
@dmah42
Copy link
Member

dmah42 commented Apr 10, 2020

Thank you all!

luismarques added a commit to llvm/llvm-project that referenced this pull request Apr 18, 2020
Cherrypick the upstream fix commit a77d5f7 onto llvm/utils/benchmark
and libcxx/utils/google-benchmark.
This fixes LLVM's 32-bit RISC-V compilation, and the issues
mentioned in google/benchmark#955
An additional cherrypick of ecc1685 fixes some minor formatting
issues introduced by the preceding commit.

Differential Revision: https://reviews.llvm.org/D78084
luismarques added a commit to llvm/llvm-test-suite that referenced this pull request Apr 25, 2020
This is a cherrypick of the upstream fix commit a77d5f7 onto
the llvm-test-suite's `MicroBenchmarks/libs/benchmark-1.3.0`,
to match the same cherrypick in the LLVM monorepo.
This fixes 32-bit RISC-V compilation, and the issues
mentioned in google/benchmark#955
An additional cherrypick of ecc1685 fixes some minor formatting
issues introduced by the preceding commit.

Differential Revision: https://reviews.llvm.org/D78456
arichardson pushed a commit to arichardson/llvm-project that referenced this pull request Jul 2, 2020
Cherrypick the upstream fix commit a77d5f7 onto llvm/utils/benchmark
and libcxx/utils/google-benchmark.
This fixes LLVM's 32-bit RISC-V compilation, and the issues
mentioned in google/benchmark#955
An additional cherrypick of ecc1685 fixes some minor formatting
issues introduced by the preceding commit.

Differential Revision: https://reviews.llvm.org/D78084
JBakamovic pushed a commit to JBakamovic/benchmark that referenced this pull request Sep 11, 2020
Fixes the following issues with the implementation of `cycleclock::Now`:

- The RISC-V implementation wouldn't compile due to a typo;

- Both the PPC and RISC-V implementation's asm statements lacked the
  volatile keyword. This resulted in the repeated read of the counter's
  high part being optimized away, so overflow wasn't handled at all.
  Multiple counter reads could also be misoptimized, especially in LTO
  scenarios.

- Relied on the zero/sign-extension of inline asm operands, which isn't
  guaranteed to occur and differs between compilers, namely GCC and Clang.

The PowerPC64 implementation was improved to do a single 64-bit read of
the time-base counter.

The RISC-V implementation was improved to do the overflow handing in
assembly, since Clang would generate a branch, defeating the purpose
of the non-branching counter reading approach.
arichardson pushed a commit to CTSRD-CHERI/libcxx that referenced this pull request Nov 26, 2020
Cherrypick the upstream fix commit a77d5f7 onto llvm/utils/benchmark
and libcxx/utils/google-benchmark.
This fixes LLVM's 32-bit RISC-V compilation, and the issues
mentioned in google/benchmark#955
An additional cherrypick of ecc1685 fixes some minor formatting
issues introduced by the preceding commit.

Differential Revision: https://reviews.llvm.org/D78084
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this pull request Oct 7, 2022
Cherrypick the upstream fix commit a77d5f7 onto llvm/utils/benchmark
and libcxx/utils/google-benchmark.
This fixes LLVM's 32-bit RISC-V compilation, and the issues
mentioned in google/benchmark#955
An additional cherrypick of ecc1685 fixes some minor formatting
issues introduced by the preceding commit.

Differential Revision: https://reviews.llvm.org/D78084
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants