Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement bcmp #315

Merged
merged 1 commit into from Sep 30, 2019

Conversation

@iankronquist
Copy link
Contributor

commented Sep 29, 2019

As of LLVM 9.0, certain calls to memcmp may be converted to bcmp, which I guess
could save a single subtraction on some architectures.

bcmp is just like memcmp except instead of returning the difference between the
two differing bytes, it returns non-zero instead. As such, memcmp is a valid
implementation of bcmp.

If we care about size, bcmp should just call memcmp.
If we care about "speed", we can change bcmp to save a single sub like this instead:

pub unsafe extern "C" fn bcmp(s1: *const u8, s2: *const u8, n: usize) -> i32 {
    let mut i = 0;
    while i < n {
        let a = *s1.offset(i as isize);
        let b = *s2.offset(i as isize);
        if a != b {
            return 1;
        }
        i += 1;
    }
    0
}

I'm unsure if the difference in icache size outweighs the single sub in programs which otherwise use memcmp, but I'll trust the LLVM devs to know what they're doing.

In this PR I do not address any changes which may or may not be needed for arm
aebi as I lack proper test hardware.

As of LLVM 9.0, certain calls to memcmp may be converted to bcmp, which I guess
could save a single subtraction on some architectures. [1]

bcmp is just like memcmp except instead of returning the difference between the
two differing bytes, it returns non-zero instead. As such, memcmp is a valid
implementation of bcmp.

If we care about size, bcmp should just call memcmp.
If we care about speed, we can change bcmp to look like this instead:
```rust
pub unsafe extern "C" fn bcmp(s1: *const u8, s2: *const u8, n: usize) -> i32 {
    let mut i = 0;
    while i < n {
        let a = *s1.offset(i as isize);
        let b = *s2.offset(i as isize);
        if a != b {
            return 1;
        }
        i += 1;
    }
    0
}
```

In this PR I do not address any changes which may or may not be needed for arm
aebi as I lack proper test hardware.

[1]: https://releases.llvm.org/9.0.0/docs/ReleaseNotes.html#noteworthy-optimizations
@alexcrichton

This comment has been minimized.

Copy link
Member

commented Sep 30, 2019

This looks reasonable to me, thanks!

@alexcrichton alexcrichton merged commit 462b73c into rust-lang:master Sep 30, 2019
@programmerjake

This comment has been minimized.

Copy link

commented Oct 2, 2019

it's not as much about saving a single subtraction as allowing faster vectorized code, since memcmp requires figuring out which byte is the one it should base the return value on, but bcmp just needs to know if any of the bytes differ, which is much easier when the bytes are being processed more than one at a time. As an example, using AVX2 on x86 allows it to compare 32 bytes at a time (since the operations are 256 bits wide). A more extreme example is using the proposed V extension for RISC-V where it could compare hundreds of bytes at a time.

@nikic

This comment has been minimized.

Copy link

commented Oct 3, 2019

In what context did you encounter the need for bcmp? LLVM only emits bcmp if it is known to exist, such as on gnu targets. If you got an undefined symbol error, I would expect that it's due to an error in the target specification.

@CryZe

This comment has been minimized.

Copy link
Contributor

commented Oct 3, 2019

I've encountered it when using no_std + a Linux based target. Not sure what OP's use case is though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.