Move sqrt from `std` to `core` #63455

Open · wants to merge 3 commits into base: master
Conversation

@Lokathor (Contributor) commented Aug 11, 2019

Per #50145 (comment) and #50145 (comment)

  • This PR moves the sqrt method of the f32 and f64 types from libstd to libcore (see the sketch below).
  • I literally just cut and pasted the methods from one place to the other; it was simpler than I expected it to be.
  • I didn't alter any tests, though naturally any test that checks a method on a type in std should still pass if the method is instead provided on the type in core.
  • I didn't add any new tests that check that the functionality is available in just core, because I don't know where those tests go. There are a whole lot of folders and no tests/ folder in the project root like a normal crate has. Someone please tell me.
  • I didn't put this behind a feature flag because I don't know how to do that, and I'm also not sure that a feature flag is necessary. If one is, someone please explain how to add it.
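For a sense of the end state, here's a minimal sketch of what a no_std crate could write once the move lands (this assumes the method does become callable from core, which is the whole point of the PR):

#![no_std]

// Sketch only: assumes this PR lands and `sqrt` becomes available on the
// primitive float types directly from libcore, with no std dependency.
pub fn hypotenuse(a: f32, b: f32) -> f32 {
    (a * a + b * b).sqrt()
}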

cc @varkor, cc @alexcrichton

@rust-highfive (Collaborator) commented Aug 11, 2019

r? @shepmaster

(rust_highfive has picked a reviewer for you, use r? to override)

@clarfon (Contributor) commented Aug 11, 2019

This can generate a libm call on systems without hardware sqrt, I believe, which is why it was in std before.
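Concretely, when the intrinsic can't be lowered to an instruction it becomes a call against the usual C math symbols, which some libm then has to provide. Roughly (illustrative declarations only, not tied to any particular libm):

// Illustrative only: the C symbols the lowering ends up calling on targets
// without a hardware sqrt instruction; some libm has to supply them.
extern "C" {
    fn sqrtf(x: f32) -> f32; // what an f32 sqrt lowers to
    fn sqrt(x: f64) -> f64;  // what an f64 sqrt lowers to
}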

@Lokathor (Contributor, Author) commented Aug 11, 2019

According to an issue on the compiler-builtins crate (link), that will apparently never happen with any current target.

If it does happen on some new target that Rust is ported to, it's as "simple" as patching compiler-builtins to expose that function from libm.

@nikic (Contributor) commented Aug 11, 2019

We should probably resolve #62729 before exposing more things from libcore, as this will likely run into the same issue.

@roblabla (Contributor) commented Aug 11, 2019

I'll try to get a PR for #62729 rolling, exposing the necessary symbols for soft-float impls. The PR will probably need to add a number of new symbols anyway, so I can add sqrt then too.

@shepmaster added the T-libs label Aug 12, 2019

@shepmaster (Member) commented Aug 12, 2019

This needs deeper knowledge than I have; reassigning to someone from libs...

r? @Amanieu

@rust-highfive assigned Amanieu and unassigned shepmaster Aug 12, 2019

@alexcrichton (Member) commented Aug 12, 2019

I'm not certain that we're ready to do this just yet. There are a number of reasons we haven't started moving these intrinsics to libcore yet. While the compiler-builtins fallback implementation is a prerequisite, it has unfortunate interactions with symbol visibility, which also causes issues. Native platform libm libraries (or their equivalent) typically have highly optimized routines for various functions (especially things like sqrt). The fallback implementation in compiler-builtins hasn't been verified to have adequate performance compared to these native implementations.

By putting a symbol in compiler-builtins we're shadowing the system libm, which prevents use of the native optimized routine when it's available; that could result in a performance regression. AFAIK no performance analysis has been done to determine the effects of this move.

Note that on some platforms (most likely x86/x86_64) LLVM will expand the intrinsic inline. These will see no performance regression even if Rust's compiler-builtins is slower than the native libm, because the native libm isn't called. The worrisome part is platforms where the intrinsic isn't expanded inline and is lowered to a library call.
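To make the shadowing concern concrete: the fallback amounts to Rust code exporting the unmangled C symbol itself, so at link time it can win over the system libm's hand-tuned version. A rough sketch (not the actual compiler-builtins source; purely illustrative):

// Rough sketch, NOT the actual compiler-builtins source: exporting an
// unmangled `sqrt` from Rust can cause the linker to resolve calls here
// instead of to the system libm's (often asm-optimized) implementation.
#[no_mangle]
pub extern "C" fn sqrt(x: f64) -> f64 {
    libm::sqrt(x) // pure-Rust fallback from the `libm` crate
}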

@Lokathor (Contributor, Author) commented Aug 12, 2019

I thought you said in the compiler-builtins issue that LLVM wouldn't do that? I guess you meant "LLVM won't do that on this particular target"?

@alexcrichton (Member) commented Aug 13, 2019

LLVM in general will just do whatever seems best at the time. If LLVM can lower an intrinsic call like these to an inline instruction sequence, it will; otherwise it will lower it to a function call. For x86/x86_64 it seems to depend on whether the SSE feature is enabled; although it's on by default, it's not always on. I suspect that LLVM's lowering logic is different for other architectures like ARM or AArch64.

@Lokathor (Contributor, Author) commented Aug 13, 2019

Note that f64 sqrt requires SSE2, not just SSE, though yes, both CPU feature sets are enabled by default for Rust builds targeting i686 and x86_64. Similarly, ARM and AArch64 have a sqrt instruction available in common, modern target setups, as does wasm32.

Focusing back on the point at hand: what concrete action can be taken to advance this PR toward being accepted?

Regarding the concerns about possibly bad performance on some targets:

  • On Stable, std calls intrinsics::sqrtf32.
  • On Nightly, you can do the same in no_std with the core_intrinsics unstable feature.
  • If using intrinsics::sqrtf32 (or any other float intrinsic) has some sort of performance problem on some platforms, then (as far as I can tell) that's already a problem on those platforms. This does not give us a new problem.
  • If it only might have a performance problem on some platforms, isn't that better solved separately, with other PRs, once a problem is actually detected? Either against the libm crate (adjusting its code) and/or the compiler-builtins crate (not exporting the libm crate's function so that the system libm function is used).

I certainly want to do this right, but I also want to have a clear path forward and keep making progress, because "floating point in core" has been stalled for many months now.

@alexcrichton (Member) commented Aug 14, 2019

For moving this forward, are there parts of my previous comment that are unclear? Sorry, I don't have a ton of time to manage this, so I can help out sometimes, but I cannot personally lead the charge and make sure all the ducks are in a row for every PR.

@Lokathor (Contributor, Author) commented Aug 15, 2019

  • It's not clear how we could ever fully rule out a potential performance regression on every possible target ahead of time, since we don't really have easy access to all those targets. How much checking is enough checking?
  • This PR doesn't make any change to compiler-builtins or libm, so if those crates shadow the system libm on some platforms and cause performance problems, then doesn't the system libm already get shadowed even without this change? Wouldn't we have already gotten a bug report? This is the part I'm the most confused about.

Apart from those questions, I did do a quick check of core-only sqrt on all our official targets by placing

#![feature(core_intrinsics)]
#![no_std]

pub fn sqrt_f32(f: f32) -> f32 {
  unsafe { core::intrinsics::sqrtf32(f) }
}

pub fn sqrt_f64(f: f64) -> f64 {
  unsafe { core::intrinsics::sqrtf64(f) }
}

into godbolt, using Nightly

  • rustc 1.38.0-nightly (60960a260 2019-08-12)

and the compiler args

  • --edition 2018 -C opt-level=3 --target=TARGET

for each TARGET value.

Here are the highlights, though you can look at the spreadsheet too, I guess:

  • 36 of the 96 targets don't actually build and emit ASM on godbolt: "can't find crate for core", iOS files missing, Emscripten files missing, things like that. All of these are Tier 2, 2.5, or 3.
    • Note: For asmjs-unknown-emscripten I was able to get some JS output but it seemed like complete nonsense that wasn't performing a sqrt at all, so I counted it in the "doesn't seem to build right on godbolt" group.
  • Some Tier 2 targets compile just fine but DO NOT have a hardware instruction generated.
  • In all other cases, including all Tier 1 targets, wasm32-unknown-unknown, x86 without SSE, and every variant of ARM that did build, a hardware instruction was compiled.

The targets where a call to sqrt is performed are as follows:

  • mips-unknown-linux-musl // MIPS Linux with MUSL
  • mipsel-unknown-linux-musl // MIPS (LE) Linux with MUSL
  • powerpc-unknown-linux-gnu // PowerPC Linux
  • riscv32imac-unknown-none-elf // Bare RISC-V (RV32IMAC ISA)
  • riscv32imc-unknown-none-elf // Bare RISC-V (RV32IMC ISA)
  • riscv64gc-unknown-none-elf // Bare RISC-V (RV64IMAFDC ISA)
  • riscv64imac-unknown-none-elf // Bare RISC-V (RV64IMAC ISA)
  • thumbv7em-none-eabihf // Bare Cortex-M4F, M7F, FPU, hardfloat (this had an instruction for f32 but a call for f64 so we're gonna count it as a call)

If "Bare RISC-V" means "no OS" like I think it does, then they don't really have an existing system libm to get shadowed by us.

Since the libm crate's sqrt comes from the MUSL source I think we're solid on the MUSL targets.

That just leaves PowerPC Linux... I don't know what libm they use, but if it's either MUSL or OpenLibm then that's the same sqrt formula as we're using, so there's no potential for regression there.

In summary, I strongly suspect that we're in the clear.

@alexcrichton (Member) commented Aug 15, 2019

> How much checking is enough checking?

For me, I'd like to have confidence that someone has investigated this to the degree that they know more about it than I do, because I know that I, at least, do not know how this works on all platforms. I think doing something in terms of checking is required, and we can see how we feel after that.

> This PR doesn't make any change to compiler-builtins or libm, so if those crates shadow the system libm on some platforms and cause performance problems, then doesn't the system libm already get shadowed even without this change? Wouldn't we have already gotten a bug report? This is the part I'm the most confused about.

The compiler-builtins crate doesn't currently export symbols that override those found in system libm implementations on the main platforms. It only does that on some platforms (not all) today; otherwise, the symbols it exports are rustc-specific on all platforms.

> In summary, I strongly suspect that we're in the clear.

Thanks for investigating all that! One thing I'm also worried about, though, is what happens when the default codegen settings are tweaked, for example if SSE2 is disabled on x86/x86_64, or if there's some corresponding feature that enables the usage here on other platforms. And yeah, I'm primarily worried about platforms like MIPS/PowerPC/RISC-V for the glibc variants, because I'd expect the libm on those platforms to have asm implementations that are likely faster than ours, which could result in a performance regression.

@Lokathor (Contributor, Author) commented Aug 15, 2019

> The compiler-builtins crate doesn't currently export symbols that override those found in system libm implementations on the main platforms. It only does that on some platforms (not all) today; otherwise, the symbols it exports are rustc-specific on all platforms.

But none of that is affected by this PR, right?

Here's my logic:

  • The list of symbols that compiler-builtins exports on any particular platform is exactly the same, with or without this PR, because it's a totally separate crate from libcore.
  • So if compiler-builtins can shadow a symbol and cause a performance regression on a particular platform, it's already doing that (or not) on each particular platform, even without this PR being merged.
  • So any potential performance loss is already happening, and we should have already had a bug report or something.
  • All this PR appears to change is whether sqrt is visible at the core level or only at the std level.

That's why I'm confused about this concern.

> One thing I'm also worried about, though, is what happens when the default codegen settings are tweaked, for example if SSE2 is disabled on x86/x86_64, or if there's some corresponding feature that enables the usage here on other platforms.

What Rust calls the i586 target is an x86 target with the SSE features turned off, in which case the ASM looks more like this:

example::sqrt_f32:
        flds    4(%esp)
        fsqrt
        retl

example::sqrt_f64:
        fldl    4(%esp)
        fsqrt
        retl

Intel chips supported sqrt (the x87 fsqrt instruction) before SSE was added, and they've stayed backwards compatible all this time, so they can still do it with the feature off.

> And yeah, I'm primarily worried about platforms like MIPS/PowerPC/RISC-V for the glibc variants, because I'd expect the libm on those platforms to have asm implementations that are likely faster than ours, which could result in a performance regression.

Unfortunately, as mentioned, I don't have access to a MIPS, PowerPC, or RISC-V device, and I don't know anyone who does.

  • Where do we go to ask someone to build and test rust for us on these targets?
  • If, after some reasonable period of time (3-6 months maybe), we can't contact anyone to test some code on those targets, then can we just merge this? And if a bug report about a regression does come in after that, can't we adjust compiler-builtins/libm at that point, once we have a specific report to look at?

@lzutao (Contributor) commented Aug 16, 2019

@Lokathor, you might want to have a look at https://internals.rust-lang.org/t/gcc-compile-farm-for-rustc/9511. At the moment they have MIPS and PowerPC machines. See the list for more info: https://cfarm.tetaneutral.net/machines/list/.
