Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLVM][ConstantFolding] missed fold for libcalls with long double constants #19916

Open
llvmbot opened this issue Apr 24, 2014 · 20 comments
Open
Labels
bugzilla Issues migrated from bugzilla confirmed Verified by a second party constant-folding Problems related to constant folding in the optimizer floating-point Floating-point math missed-optimization

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 24, 2014

Bugzilla Link 19542
Version 3.4
OS Linux
Reporter LLVM Bugzilla Contributor
CC @DougGregor,@irishrover

Extended Description

When using mt19937 or mt19937_64 in combination with a uniform_real_distribution, the performance is about an order of magnitude worse than gcc's. But it only happens with that particular combination. Those generators are fine with uniform_int_distribution, and another generator like default_random_engine works fine with uniform_real_distribution. A program to reproduce this and my results:

#include <iostream>
#include <vector>
#include <chrono>
#include <random>

template< typename T_rng, typename T_dist>
double timer(T_rng& rng, T_dist& dist, int n){
    std::vector< typename T_dist::result_type > vec(n, 0);
    auto t1 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < n; ++i)
        vec[i] = dist(rng);
    auto t2 = std::chrono::high_resolution_clock::now();
    auto runtime = std::chrono::duration_cast<std::chrono::microseconds>(t2-t1).count()/1000.0;
    return runtime;
}

int main(){
    const int n = 10000000;
    std::default_random_engine rng_default(1);
    std::mt19937 rng_mt (1);
    std::mt19937_64 rng_mt_64 (1);
    std::uniform_int_distribution<int> dist_int(0,1000);
    std::uniform_real_distribution<float> dist_float(0.0, 1.0);

    std::cout << "int_default: " << timer(rng_default, dist_int, n) << std::endl;
    std::cout << "int_mt: " << timer(rng_mt_64, dist_int, n) << std::endl;
    std::cout << "int_mt_64: " << timer(rng_mt_64, dist_int, n) << std::endl;
    std::cout << "float_default: " << timer(rng_default, dist_float, n) << std::endl;
    std::cout << "float_mt: " << timer(rng_mt, dist_float, n) << std::endl;
    std::cout << "float_mt_64: " << timer(rng_mt_64, dist_float, n) << std::endl;
}
result for my linux 64bit:
// g++ 4.8.1
int_default: 184.687
int_mt: 176.092
int_mt_64: 176.076
float_default: 45.478
float_mt: 58.018
float_mt_64: 90.073

// clang 3.4
int_default: 207.594
int_mt: 195.68
int_mt_64: 195.702
float_default: 54.535
float_mt: 747.402 <----- slow
float_mt_64: 781.237 <-- slow
@irishrover
Copy link
Contributor

Please provide your command-lines for both compilers.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 25, 2014

clang++ -O3 -std=c++11 random.cpp

and

g++ -O3 -std=c++11 random.cpp

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 9, 2021
@cor3ntin cor3ntin added the confirmed Verified by a second party label May 1, 2024
@cor3ntin
Copy link
Contributor

cor3ntin commented May 1, 2024

https://compiler-explorer.com/z/v4rjbzd8T

GCC 13: float_mt_64: 100ms
Clang trunk: float_mt_64: 2098ms
Clang trunk + libc++: float_mt_64: 73ms

so std::uniform_real_distribution<float> + std::mt19937_64 as implemented in libstdc++ seem to hit
some optimization issue

@erichkeane erichkeane added ipo Interprocedural optimizations ipa Interprocedural analysis labels May 1, 2024
@nickdesaulniers
Copy link
Member

nickdesaulniers commented May 22, 2024

The IR here is full of

%32 = tail call noundef x86_fp80 @logl(x86_fp80 noundef 0xK401F8000000000000000) #8
%33 = tail call noundef x86_fp80 @logl(x86_fp80 noundef 0xK40008000000000000000) #8

are we failing to fold these constants?

@nickdesaulniers
Copy link
Member

A simpler reproducer: https://compiler-explorer.com/z/b4hjW7x5s

(not sure that's the only issue here)

@nickdesaulniers
Copy link
Member

If we just call log rather than logl, this gets folded by EarlyCSE. I'm guessing EarlyCSE is invoking some utility related to constant folding that maybe doesn't recognize logl.

@nickdesaulniers nickdesaulniers added missed-optimization constant-folding Problems related to constant folding in the optimizer labels May 22, 2024
@nickdesaulniers
Copy link
Member

simplifyInstruction is called from EarlyCSE, which eventually calls llvm::simplifyCall. I wonder if llvm::canConstantFoldCallTo needs an entry for logl (the blessed long double variant)? cc @efriedma-quic

@efriedma-quic
Copy link
Collaborator

Constant-folding long double math is hard because we currently use the host libc to fold transcendental functions... and which forms of "long double" libc supports depends on the host. That said, there's ongoing work; see #90611.

Eventually, I think we want to add native support to APFloat for these operations, but correctly rounded floating-point ops are hard to write, so nobody has attempted it. That said, since llvm-libc needs them anyway, maybe we can reuse that code once it's written.

@nickdesaulniers
Copy link
Member

That said, since llvm-libc needs them anyway, maybe we can reuse that code once it's written.

cc @lntue @michaelrj-google

@nickdesaulniers
Copy link
Member

nickdesaulniers commented May 22, 2024

Seems like similar issues exist wrt long double and constant folding, for example log2l, tanl, etc. also demonstrates this issue. cc @jcranmer-intel

@nickdesaulniers nickdesaulniers changed the title Odd performance drop with mt19937 and uniform_real_distribution [LLVM][ConstantFolding] missed fold for libcalls with long double constants May 22, 2024
@nickdesaulniers nickdesaulniers removed ipo Interprocedural optimizations ipa Interprocedural analysis labels May 22, 2024
@jcranmer-intel jcranmer-intel added the floating-point Floating-point math label May 22, 2024
@michaelrj-google
Copy link
Contributor

From the code reuse side it should be doable eventually. I'm currently working on Project Hand in Hand to share code between libc and libc++. Once we get that working we can look into expanding it to more projects inside of LLVM.

As for the long double functions, I don't think they're currently a priority. There's been a good amount of work done on float128 functions, but the special intel 80 bit long double versions will require special handling for correct rounding.

@jcranmer-intel
Copy link
Contributor

Constant folding of libm functions is controversial (cf. https://discourse.llvm.org/t/fp-constant-folding-of-floating-point-operations/73138), although I don't think we've reached a consensus other than "it's sometimes acceptable and sometimes not acceptable".

(As a side note, using long double in general is something I'd avoid in part because it maps to pretty different types on different systems, and on many systems, the types it does map to tend to be wonkier and somewhat less performant than double.)

@nickdesaulniers
Copy link
Member

Perhaps worth filing a bug report to libstdc++ (for the original issue) noting the current issue in LLVM.

@jcranmer-intel
Copy link
Contributor

There does seem to be some precedent for libstdc++ being willing to use double over long double here for speed: gcc-mirror/gcc@60d9f25

@nickdesaulniers
Copy link
Member

Question: 🙋 , I know it's nice for APFloat to be able to handle Arbitrary Precision, but surely if we knew the constant was a float with the same precision as the platform's long double, could we not just invoke logl (or whatever)?

@efriedma-quic
Copy link
Collaborator

could we not just invoke logl (or whatever)?

Yes, but you need a bunch of complicated ifdefs to make sure you don't break cross-compiling. This is what #90611 does.

@efriedma-quic
Copy link
Collaborator

Also, just invoking the host's logl isn't feasible for some uses. For example, for C++ constexpr math functions, if we don't consistently evaluate math functions to the same result, we break ODR, and the program has undefined behavior.

@jcranmer-intel
Copy link
Contributor

Also, just invoking the host's logl isn't feasible for some uses.

Also, there's no guarantee that the host math function is the same as the function used by the runtime implementation. Especially in cross-compilation scenarios.

(Also also, while I would hope that no one who compiles llvm/clang is doing so with -ffast-math, I wouldn't be shocked to discover that there are people who are absolutely doing that.)

@nickdesaulniers
Copy link
Member

Also, there's no guarantee that the host math function is the same as the function used by the runtime implementation.

I imagine that GCC does exactly that; assume the host math function IS used by the runtime implementation?

@efriedma-quic
Copy link
Collaborator

gcc uses MPFR (which always produces correctly rounded results).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla confirmed Verified by a second party constant-folding Problems related to constant folding in the optimizer floating-point Floating-point math missed-optimization
Projects
None yet
Development

No branches or pull requests

9 participants