Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bounds checking should be eliminated in obvious case #30112

Closed
Geal opened this issue Nov 30, 2015 · 16 comments
Closed

Bounds checking should be eliminated in obvious case #30112

Geal opened this issue Nov 30, 2015 · 16 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Geal
Copy link
Contributor

Geal commented Nov 30, 2015

Hi,

Following the benchmarks comparing the nom and chomp parser combinator libraries, I investigated the performance difference and found something interesting: in code that is nearly equivalent, rustc generates a lot more stuff with nom than chomp, and keeps bounds check where the parsers make sure no error should happen.

For the comparison, here are some code examples, in nom:

named!(message_header_value, chain!(
        take_while1!(is_horizontal_space) ~
  data: take_while1!(not_line_ending)     ~
  line_ending,
  || data));

in chomp:

fn message_header_line(i: Input<u8>) -> U8Result<&[u8]> {
    parse!{i;
                   take_while1(is_horizontal_space);
        let line = take_till(is_end_of_line);
                   end_of_line();

        ret line
    }   
}

(we could change take_while1!(not_line_ending) to a take_till!(is_end_of_line) and it would achieve the same code).

Once the macros are processed, it gives the following code in nom: https://gist.github.com/Geal/fa3740cf45530d123023

chomp uses the same approach, with iterators, in its version of take_while1 and take_till`: https://github.com/m4rw3r/chomp/blob/master/src/parsers.rs#L208-L253

Now, the interesting thing is the assembly generated by rustc ( 1.5.0-dev (ea2dabf 2015-10-21), but the version from yesterday has the same issues). Here is the nom version: http://dev.unhandledexpression.com/nom_http.pdf
And the chomp version: http://dev.unhandledexpression.com/chomp_http.pdf

We can see that nom's code is a lot more complex:

  • large blocks of code calling nom's Err destructor (it is expected, but I'd like to improve that as well)
  • 4 bounds checks are still present, while they do not appear in chomp

I would like to know if there is a way to improve code generation. If the issue is in rustc, I can provide as many test cases as you need. If it is in nom, I'm open to any ideas ;)

@Geal
Copy link
Contributor Author

Geal commented Nov 30, 2015

@bluss provided a reduced test case: http://is.gd/0vOZs0

@Gankra
Copy link
Contributor

Gankra commented Nov 30, 2015

I'm pretty sure this is an LLVM/backend issue. LLVM's bounds-check removal analysis is really brittle. It's possible there's better flags we could be passing it to try to eliminate bounds checks. I seem to recall @huonw mentioning one once?

@bluss
Copy link
Member

bluss commented Nov 30, 2015

This kind of thing (playground) elides the bounds check for input[i] but not for input[..i], which is mysterious.

pub fn split_at(input: &[u8]) -> &[u8] {
    for i in 0..input.len() {
        if input[i] == b' ' {
            return &input[..i]
        }
    }
    &[]
}

@dotdash
Copy link
Contributor

dotdash commented Nov 30, 2015

LLVM removes the first bounds check, because that is the same as the loop condition, i.e. i < input.len(). But for the second bounds is i <= input.len(), and LLVM only handles propagation in such cases if the value is compared to a constant, which is not the case here.

@bluss
Copy link
Member

bluss commented Nov 30, 2015

@dotdash That's unfortunate. Do you know if that is a reason for the original example too?

@Keruspe
Copy link
Contributor

Keruspe commented Nov 30, 2015

What could be the reason for llvm to keep bound checks when using nom but not when using chomp?

@dotdash
Copy link
Contributor

dotdash commented Dec 1, 2015

@Geal could you provide the IR and the asm code for that function? I know the asm is in those pdfs, but I have a hard time reading that. You can get them with --emit llvm-ir,asm which should produce a .ll and a .s file. Thanks!

@Geal
Copy link
Contributor Author

Geal commented Dec 1, 2015

@dotdash here: https://gist.github.com/Geal/e7ebe638862828420863

It might be slightly different, since I built with a more recent rustc (1.6.0-dev (4867df4 2015-11-29) ), but the performance is the same.

@huonw
Copy link
Member

huonw commented Dec 1, 2015

I'm pretty sure this is an LLVM/backend issue. LLVM's bounds-check removal analysis is really brittle. It's possible there's better flags we could be passing it to try to eliminate bounds checks. I seem to recall @huonw mentioning one once?

IRCE: #22987

http://llvm.org/docs/Frontend/PerformanceTips.html also says:

For languages with numerous rarely executed guard conditions (e.g. null checks, type checks, range checks) consider adding an extra execution or two of LoopUnswith and LICM to your pass order. The standard pass order, which is tuned for C and C++ applications, may not be sufficient to remove all dischargeable checks from loops.

@dotdash
Copy link
Contributor

dotdash commented Dec 10, 2015

@Geal the remaining checks are the ones that check that start is less than or equal to end for the range. Due to the the ranges including the end, that makes the actual bounds check go away. What I wonder is why there is a check that compares against 2 here: https://gist.github.com/Geal/e7ebe638862828420863#file-nom_http-ll-L216

I can't seem to figure out to which part of the code that belongs. Any idea?

@steveklabnik steveklabnik added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Jun 6, 2016
@bluss
Copy link
Member

bluss commented Sep 24, 2016

Triage: rustc 1.13.0-nightly (4f9812a 2016-09-21)

#![crate_type="lib"]

pub fn split_to(input: &[u8]) -> &[u8] {
    match input.iter().position(|x| *x == b' ') {
        Some(i) => &input[..i],
        None => &[],
    }
}

pub fn split_from(input: &[u8]) -> &[u8] {
    match input.iter().position(|x| *x == b' ') {
        Some(i) => &input[i..],
        None => &[],
    }
}

pub fn find(input: &[u8]) -> Option<&u8> {
    match input.iter().position(|x| *x == b' ') {
        Some(i) => Some(&input[i]),
        None => None,
    }
}

@bluss
Copy link
Member

bluss commented Sep 24, 2016

The optimizer likes the indices best at this point. No bounds checks here (beta & nightly)

pub fn split_to(input: &[u8]) -> &[u8] {
    match (0..input.len()).find(|&i| input[i] == b' ') {
        Some(i) => &input[..i],
        None => &[],
    }
}

Optimization is simple: The range ensures i < input.len().

@Mark-Simulacrum Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 24, 2017
@steveklabnik
Copy link
Member

Triage: I'm not great with asm, but https://godbolt.org/z/wj6kYl

It appears that split_to is partially unrolled in today's nightly, but still checks the bounds. split_to2, like was said above, compiles to very straightforward code with no checks.

@nikic nikic added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Mar 12, 2019
@jonas-schievink jonas-schievink added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 22, 2020
@EgorBo
Copy link

EgorBo commented Aug 26, 2021

https://godbolt.org/z/Wb46fYsPK

If I understand it correctly - after I altered the default order of LLVM passes and inserted IRCE there (I also added additional LICM and unswitch if I remember correctly - I just followed Pass Ordering tips.) it managed to "clone" the loop and have a fast loop where it doesn't poll for bounds checks 😏

image

@saethlin
Copy link
Member

Looks to me like all the bounds checks in the original examples are gone now: https://godbolt.org/z/fcexGna7s

@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023
@workingjubilee
Copy link
Contributor

Thanks for checking! I think this can be closed then?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests