Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc: Improving safe wasm float->int casts #74695

Merged
merged 1 commit into from
Aug 4, 2020

Conversation

alexcrichton
Copy link
Member

@alexcrichton alexcrichton commented Jul 23, 2020

This commit improves code generation for WebAssembly targets when
translating floating to integer casts. This improvement is only relevant
when the nontrapping-fptoint feature is not enabled, but the feature
is not enabled by default right now. Additionally this improvement only
affects safe casts since unchecked casts were improved in #74659.

Some more background for this issue is present on #73591, but the
general gist of the issue is that in LLVM the fptosi and fptoui
instructions are defined to return an undef value if they execute on
out-of-bounds values; they notably do not trap. To implement these
instructions for WebAssembly the LLVM backend must therefore generate
quite a few instructions before executing i32.trunc_f32_s (for
example) because this WebAssembly instruction traps on out-of-bounds
values. This codegen into wasm instructions happens very late in the
code generator, so what ends up happening is that rustc inserts its own
codegen to implement Rust's saturating semantics, and then LLVM also
inserts its own codegen to make sure that the fptosi instruction
doesn't trap. Overall this means that a function like this:

#[no_mangle]
pub unsafe extern "C" fn cast(x: f64) -> u32 {
    x as u32
}

will generate this WebAssembly today:

(func $cast (type 0) (param f64) (result i32)
  (local i32 i32)
  local.get 0
  f64.const 0x1.fffffffep+31 (;=4.29497e+09;)
  f64.gt
  local.set 1
  block  ;; label = @1
    block  ;; label = @2
      local.get 0
      f64.const 0x0p+0 (;=0;)
      local.get 0
      f64.const 0x0p+0 (;=0;)
      f64.gt
      select
      local.tee 0
      f64.const 0x1p+32 (;=4.29497e+09;)
      f64.lt
      local.get 0
      f64.const 0x0p+0 (;=0;)
      f64.ge
      i32.and
      i32.eqz
      br_if 0 (;@2;)
      local.get 0
      i32.trunc_f64_u
      local.set 2
      br 1 (;@1;)
    end
    i32.const 0
    local.set 2
  end
  i32.const -1
  local.get 2
  local.get 1
  select)

This PR improves the situation by updating the code generation for
float-to-int conversions in rustc, specifically only for WebAssembly
targets and only for some situations (float-to-u8 still has not great
codegen). The fix here is to use basic blocks and control flow to avoid
speculatively executing fptosi, and instead LLVM's raw intrinsic for
the WebAssembly instruction is used instead. This effectively extends
the support added in #74659 to checked casts. After this commit the
codegen for the above Rust function looks like:

(func $cast (type 0) (param f64) (result i32)
  (local i32)
  block  ;; label = @1
    local.get 0
    f64.const 0x0p+0 (;=0;)
    f64.ge
    local.tee 1
    i32.const 1
    i32.xor
    br_if 0 (;@1;)
    local.get 0
    f64.const 0x1.fffffffep+31 (;=4.29497e+09;)
    f64.le
    i32.eqz
    br_if 0 (;@1;)
    local.get 0
    i32.trunc_f64_u
    return
  end
  i32.const -1
  i32.const 0
  local.get 1
  select)

For reference, in Rust 1.44, which did not have saturating
float-to-integer casts, the codegen LLVM would emit is:

(func $cast (type 0) (param f64) (result i32)
  block  ;; label = @1
    local.get 0
    f64.const 0x1p+32 (;=4.29497e+09;)
    f64.lt
    local.get 0
    f64.const 0x0p+0 (;=0;)
    f64.ge
    i32.and
    i32.eqz
    br_if 0 (;@1;)
    local.get 0
    i32.trunc_f64_u
    return
  end
  i32.const 0)

So we're relatively close to the original codegen, although it's
slightly different because the semantics of the function changed where
we're emulating the i32.trunc_sat_f32_s instruction rather than always
replacing out-of-bounds values with zero.

There is still work that could be done to improve casts such as f32 to
u8. That form of cast still uses the fptosi instruction which
generates lots of branch-y code. This seems less important to tackle now
though. In the meantime this should take care of most use cases of
floating-point conversion and as a result I'm going to speculate that
this...

Closes #73591

@rust-highfive
Copy link
Collaborator

r? @davidtwco

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 23, 2020
Copy link
Member

@davidtwco davidtwco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but someone else should probably take a look too.

@davidtwco
Copy link
Member

r? @nagisa

@rust-highfive rust-highfive assigned nagisa and unassigned davidtwco Jul 24, 2020
@CryZe
Copy link
Contributor

CryZe commented Jul 24, 2020

Actually for the 8-bit and 16-bit cases, what I did in my prototype was to emit the cast to u32 / i32 and then downcast the int from there, that should be a reasonably easy change here too. Should we do that in a follow up PR or do you want to squeeze it into this PR? (I could look into it again as well)

//
// ;; ... previous basic block
// %load_result = alloca i32, align 4
// %less_or_nan = fcmp ult double %x, 0xC1E0000000000000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks more complicated than it needs to be. I’m aware that this is not very constructive in terms of feedback as it is… but I’m experimenting with some alternative ways to write this down as LLVM IR and hopefully will come back to update this before monday.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here’s something I have so far (caveat untested, unchecked and not validated):

  ; check that the number is within the expected bounds and just convert if the number is in range
  %inbound_lower = fcmp oge double %0, 0xC1E0000000000000
  %inbound_upper = fcmp ole double %0, 0x41DFFFFFFFC00000
  %inbounds = and i1 %inbound_lower, %inbound_upper
  br i1 %inbounds, label %convert, label %specialcase

specialcase:
  %isnan = fcmp ord double %0, 0.000000e+00 ; can probably also be `!inbound_lower && !inbound_upper`?
  ; select a max/min value for the case where float is not nan
  %result_notnan = select i1 %inbound_lower, i32 2147483647, i32 -2147483648
  ; nan or min/max
  %result_nan = select i1 %isnan, i32 0, i32 %result_notnan
  ret i32 %result_nan

convert:
  %result = fptosi double %0 to i32
  ret i32 %result

Copy link
Member

@nagisa nagisa Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I ended up with in the end

    ;; block so far... %0 is the argument
    %inbound_lower = fcmp oge double %0, 0xC1E0000000000000
    %inbound_upper = fcmp ole double %0, 0x41DFFFFFFFC00000
    ;; match (inbound_lower, inbound_upper) { 
    ;;     (true, true) => %0 can be converted without trapping
    ;;     (false, false) => %0 is a NaN
    ;;     (true, false) => %0 is too large
    ;;     (false, true) => %0 is too small
    ;; }
    ;;
    ;; The (true, true) check, go to %convert if so.
    %inbounds = and i1 %inbound_lower, %inbound_upper
    br i1 %inbounds, label %convert, label %specialcase

specialcase:
    ;; Handle the cases where the number is NaN, too large or too small
    
    ;; Either (true, false) or (false, true)
    %is_not_nan = or i1 %inbound_lower, %inbound_upper
    ;; Figure out which saturated value we are interested in if not `NaN`
    %saturated = select i1 %inbound_lower, i32 2147483647, i32 -2147483648
    ;; Figure out between saturated and NaN representations
    %result_nan = select i1 %is_not_nan, i32 %saturated, i32 0
    br label %done

convert:
    %result = call i32 @llvm.wasm.trunc.signed.i32.f64(double %0) ; fptosi double %0 to i32
    br label %done

done:
    %r = phi i32 [ %result, %convert ], [ %result_nan, %specialcase ]
    ;; continues here

Here’s a godbolt comparison of the two approaches: https://rust.godbolt.org/z/cnEGWM. Curiously the output in godbolt is very different from what was posted in the PR description, and I’m not sure why the difference (maybe i32 vs u32?).

Anyway, comparing just the godbolt output the code generated from the snippet above looks somewhat better to my wasm-untrained eye. This code is also something LLVM is not really able to optimize much further. So at the very least doing something along the lines of this snippet can be a win in terms of the compile time.

If you don’t want to attempt to integrate something like this, that's also fine with me. In that case let me know and I’ll r+ this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice! I'm happy to update this to use that codegen. Do you know if we can natively generate phi nodes though? I wanted to do that instead of using an alloca but the builder trait didn't seem to have a way to manually construct phi nodes.

@alexcrichton alexcrichton force-pushed the more-wasm-float-cast-fixes branch 2 times, most recently from 66e5418 to 4a8d24b Compare July 28, 2020 17:03
@alexcrichton
Copy link
Member Author

Ok I've updated this to codegen in line with @nagisa's suggestions, although it still uses an alloca to store the result rather than a phi because I can't figure out how to generate a phi node natively :(

This commit improves code generation for WebAssembly targets when
translating floating to integer casts. This improvement is only relevant
when the `nontrapping-fptoint` feature is not enabled, but the feature
is not enabled by default right now. Additionally this improvement only
affects safe casts since unchecked casts were improved in rust-lang#74659.

Some more background for this issue is present on rust-lang#73591, but the
general gist of the issue is that in LLVM the `fptosi` and `fptoui`
instructions are defined to return an `undef` value if they execute on
out-of-bounds values; they notably do not trap. To implement these
instructions for WebAssembly the LLVM backend must therefore generate
quite a few instructions before executing `i32.trunc_f32_s` (for
example) because this WebAssembly instruction traps on out-of-bounds
values. This codegen into wasm instructions happens very late in the
code generator, so what ends up happening is that rustc inserts its own
codegen to implement Rust's saturating semantics, and then LLVM also
inserts its own codegen to make sure that the `fptosi` instruction
doesn't trap. Overall this means that a function like this:

    #[no_mangle]
    pub unsafe extern "C" fn cast(x: f64) -> u32 {
        x as u32
    }

will generate this WebAssembly today:

    (func $cast (type 0) (param f64) (result i32)
      (local i32 i32)
      local.get 0
      f64.const 0x1.fffffffep+31 (;=4.29497e+09;)
      f64.gt
      local.set 1
      block  ;; label = @1
        block  ;; label = @2
          local.get 0
          f64.const 0x0p+0 (;=0;)
          local.get 0
          f64.const 0x0p+0 (;=0;)
          f64.gt
          select
          local.tee 0
          f64.const 0x1p+32 (;=4.29497e+09;)
          f64.lt
          local.get 0
          f64.const 0x0p+0 (;=0;)
          f64.ge
          i32.and
          i32.eqz
          br_if 0 (;@2;)
          local.get 0
          i32.trunc_f64_u
          local.set 2
          br 1 (;@1;)
        end
        i32.const 0
        local.set 2
      end
      i32.const -1
      local.get 2
      local.get 1
      select)

This PR improves the situation by updating the code generation for
float-to-int conversions in rustc, specifically only for WebAssembly
targets and only for some situations (float-to-u8 still has not great
codegen). The fix here is to use basic blocks and control flow to avoid
speculatively executing `fptosi`, and instead LLVM's raw intrinsic for
the WebAssembly instruction is used instead. This effectively extends
the support added in rust-lang#74659 to checked casts. After this commit the
codegen for the above Rust function looks like:

    (func $cast (type 0) (param f64) (result i32)
      (local i32)
      block  ;; label = @1
        local.get 0
        f64.const 0x0p+0 (;=0;)
        f64.ge
        local.tee 1
        i32.const 1
        i32.xor
        br_if 0 (;@1;)
        local.get 0
        f64.const 0x1.fffffffep+31 (;=4.29497e+09;)
        f64.le
        i32.eqz
        br_if 0 (;@1;)
        local.get 0
        i32.trunc_f64_u
        return
      end
      i32.const -1
      i32.const 0
      local.get 1
      select)

For reference, in Rust 1.44, which did not have saturating
float-to-integer casts, the codegen LLVM would emit is:

    (func $cast (type 0) (param f64) (result i32)
      block  ;; label = @1
        local.get 0
        f64.const 0x1p+32 (;=4.29497e+09;)
        f64.lt
        local.get 0
        f64.const 0x0p+0 (;=0;)
        f64.ge
        i32.and
        i32.eqz
        br_if 0 (;@1;)
        local.get 0
        i32.trunc_f64_u
        return
      end
      i32.const 0)

So we're relatively close to the original codegen, although it's
slightly different because the semantics of the function changed where
we're emulating the `i32.trunc_sat_f32_s` instruction rather than always
replacing out-of-bounds values with zero.

There is still work that could be done to improve casts such as `f32` to
`u8`. That form of cast still uses the `fptosi` instruction which
generates lots of branch-y code. This seems less important to tackle now
though. In the meantime this should take care of most use cases of
floating-point conversion and as a result I'm going to speculate that
this...

Closes rust-lang#73591
@alexcrichton
Copy link
Member Author

ping for a re-r? from @nagisa

@nagisa
Copy link
Member

nagisa commented Aug 3, 2020

Yeah LGTM. Its too bad we cannot easily generate phi – having many casts will ultimately end up generating a larger amount of IR for LLVM to optimize, slowing down compilation. But nevertheless the improvement seems worth it.

@bors r+

@bors
Copy link
Contributor

bors commented Aug 3, 2020

📌 Commit 2c1b046 has been approved by nagisa

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 3, 2020
@bors
Copy link
Contributor

bors commented Aug 3, 2020

💡 This pull request was already approved, no need to approve it again.

@bors
Copy link
Contributor

bors commented Aug 3, 2020

📌 Commit 2c1b046 has been approved by nagisa

@bors
Copy link
Contributor

bors commented Aug 3, 2020

⌛ Testing commit 2c1b046 with merge 1d601d6...

@bors
Copy link
Contributor

bors commented Aug 4, 2020

☀️ Test successful - checks-actions, checks-azure
Approved by: nagisa
Pushing 1d601d6 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Aug 4, 2020
@bors bors merged commit 1d601d6 into rust-lang:master Aug 4, 2020
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Aug 9, 2020
This commit fixes an issue with rust-lang#74695 where the fptosi and fptoui
specializations on wasm were accidentally used on vector types by the
`simd_cast` intrinsic. This issue showed up as broken CI for the stdsimd
crate. Here this commit simply skips the specialization on vector kinds
flowing into `fpto{s,u}i`.
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Aug 10, 2020
Don't try to use wasm intrinsics on vectors

This commit fixes an issue with rust-lang#74695 where the fptosi and fptoui
specializations on wasm were accidentally used on vector types by the
`simd_cast` intrinsic. This issue showed up as broken CI for the stdsimd
crate. Here this commit simply skips the specialization on vector kinds
flowing into `fpto{s,u}i`.
@cuviper cuviper added this to the 1.47.0 milestone May 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Redundant checks in floating point to integer casts on WASM
7 participants