Missed Optimization when Matching on Complete Unicode Character Ranges with an Unreachable Arm #123927

Open
JordanLloydHall opened this issue Apr 14, 2024 · 1 comment
Labels
A-codegen Area: Code generation C-bug Category: This is a bug. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

JordanLloydHall commented Apr 14, 2024

Hello, I've noticed a bug with rustc:

#[inline(never)]
pub fn with_unreachable(c: char) -> bool {
    match c {
        ('\0'..='\u{d7ff}') => true,
        ('\u{e000}'..='\u{10ffff}') => true,
        _ => false
    }
}

#[inline(never)]
pub fn without_unreachable(c: char) -> bool {
    match c {
        ('\0'..='\u{d7ff}') => true,
        ('\u{e000}'..='\u{10ffff}') => true,
    }
}

which outputs:

example::with_unreachable::hd01eb55f04576fdc:
  cmp edi, 55296
  setb cl
  add edi, -57344
  cmp edi, 1056768
  setb al
  or al, cl
  ret

example::without_unreachable::h31c5b4f665ccfd93:
  mov al, 1
  ret

rustc warns that the _ => false arm is unreachable, but the optimiser doesn't seem to be privy to this. We'd expect without_unreachable and with_unreachable to produce the same assembly output.
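
As a sanity check on the claim that the two functions are semantically identical, here is a minimal sketch that compares them over every char value. The helper count_disagreements is a hypothetical name introduced for illustration, and the #[inline(never)] attributes and the redundant parentheses are dropped since they don't affect behaviour. It only checks results, not the generated assembly.

```rust
/// Same match as `with_unreachable` from the report. The lint is silenced
/// because rustc flags the `_` arm as unreachable.
#[allow(unreachable_patterns)]
pub fn with_unreachable(c: char) -> bool {
    match c {
        '\0'..='\u{d7ff}' => true,
        '\u{e000}'..='\u{10ffff}' => true,
        _ => false,
    }
}

/// Same match as `without_unreachable` from the report; it compiles
/// because the two ranges already cover every valid `char`.
pub fn without_unreachable(c: char) -> bool {
    match c {
        '\0'..='\u{d7ff}' => true,
        '\u{e000}'..='\u{10ffff}' => true,
    }
}

/// Hypothetical helper: counts the chars on which the two functions disagree.
/// Iterating a range of `char` skips the surrogate gap automatically.
pub fn count_disagreements() -> usize {
    ('\0'..=char::MAX)
        .filter(|&c| with_unreachable(c) != without_unreachable(c))
        .count()
}
```

If count_disagreements() returns 0, the _ arm never fires for any valid char, which is what the unreachable-pattern warning is saying.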

Meta

The above is built with rustc 1.77.0 and flags -C debuginfo=1 --emit asm -Cllvm-args=--x86-asm-syntax=intel --crate-type rlib --color=always --edition 2021 -C opt-level=3

A link to the godbolt snippet: https://godbolt.org/z/hf7jje83W

@JordanLloydHall changed the title from "Rust Compiler Misoptimization when Matching on Complete Unicode Character Ranges with an Unreachable Arm" to "Rust Compiler Missed Optimization when Matching on Complete Unicode Character Ranges with an Unreachable Arm" on Apr 14, 2024
@workingjubilee changed the title from "Rust Compiler Missed Optimization when Matching on Complete Unicode Character Ranges with an Unreachable Arm" to "Missed Optimization when Matching on Complete Unicode Character Ranges with an Unreachable Arm" on Apr 14, 2024
asquared31415 (Contributor) commented

I imagine this is because rustc knows about the gap in the middle of char's valid range for the purposes of exhaustiveness checking, but only records a niche at the end of the valid range. By the time this gets to LLVM, it just sees a 32-bit integer that must not be greater than 0x10FFFF, with no knowledge of the invalid range in the middle. This could probably be worked around, but I believe there are efforts to improve niche handling to permit multiple and more complex niches, which would overlap with a fix for this problem.

@rustbot label -needs-triage +T-compiler +A-codegen +C-optimization +I-slow +I-heavy
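
To illustrate the two invalid regions mentioned above, here is a small sketch using only the standard library. The helper names is_scalar_value and char_and_option_sizes are hypothetical, introduced for illustration. It shows that both the surrogate gap and everything above 0x10FFFF are rejected at the language level, and that Option<char> fits in the same space as char, which is consistent with char having spare bit patterns available as a niche. It does not show what range information rustc actually passes to LLVM.

```rust
use std::mem::size_of;

/// Hypothetical helper: true if `u` is a valid Unicode scalar value,
/// i.e. a value that can be stored in a `char`.
pub fn is_scalar_value(u: u32) -> bool {
    char::from_u32(u).is_some()
}

/// Hypothetical helper: sizes in bytes of `char` and `Option<char>`.
/// Equal sizes mean `None` is encoded in a bit pattern that is not a valid `char`.
pub fn char_and_option_sizes() -> (usize, usize) {
    (size_of::<char>(), size_of::<Option<char>>())
}
```

For example, is_scalar_value(0xD800) and is_scalar_value(0x110000) are both false, while is_scalar_value(0xD7FF) and is_scalar_value(0xE000) are true.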
