Counting occurrences of a given byte or UTF-8 characters in a slice of memory – fast

bytecount

Counting bytes really fast

License: Apache 2.0/MIT

This uses the "hyperscreamingcount" algorithm by Joshua Landau to count bytes faster than anything else. The newlinebench repository has further benchmarks for old versions of this repository.

To use bytecount in your crate, if you have cargo-edit, run cargo add bytecount in a terminal with the crate root as the current directory. Otherwise, edit your Cargo.toml manually and add bytecount = "0.4.0" to your [dependencies] section.

In your crate root (lib.rs or main.rs, depending on whether you are writing a library or an application), add extern crate bytecount; (on the 2018 edition and later this line is optional). Now you can use bytecount::count as follows:

extern crate bytecount;

fn main() {
    let mytext = "some potentially large text, perhaps read from disk?";
    let spaces = bytecount::count(mytext.as_bytes(), b' ');
    // `count` returns the number of matching bytes as a usize.
    println!("found {} spaces", spaces);
}
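
To make the semantics concrete, here is a minimal std-only sketch of what a byte count and a UTF-8 character count compute. It is a plain reference, not the crate's optimized algorithm, and the names count_bytes_reference and count_chars_reference are hypothetical helpers introduced only for this illustration.

```rust
/// Hypothetical reference helper: counts occurrences of `needle`
/// by comparing one byte at a time.
fn count_bytes_reference(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

/// Hypothetical reference helper: counts UTF-8 characters by counting
/// every byte that is NOT a continuation byte (bit pattern 10xxxxxx).
/// Assumes the input is valid UTF-8.
fn count_chars_reference(utf8: &[u8]) -> usize {
    utf8.iter().filter(|&&b| (b & 0xC0) != 0x80).count()
}

fn main() {
    let text = "naïve café";
    let spaces = count_bytes_reference(text.as_bytes(), b' ');
    let chars = count_chars_reference(text.as_bytes());

    // The character count agrees with the standard library's iterator.
    assert_eq!(chars, text.chars().count());
    println!(
        "{} space(s), {} character(s) in {} bytes",
        spaces,
        chars,
        text.len()
    );
}
```

A reference like this is handy in tests: an optimized counter should return the same numbers on any input.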

bytecount supports two features that use modern CPU capabilities to speed up counting considerably. To let your users enable them, add the following to your Cargo.toml:

[features]
runtime-dispatch-simd = ["bytecount/runtime-dispatch-simd"]
generic-simd = ["bytecount/generic-simd"]

The first, runtime-dispatch-simd, enables detection of SIMD capabilities at runtime, which allows using the SSE2 and AVX2 codepaths, but cannot be used with no_std.

Your users can then compile with runtime dispatch using:

cargo build --release --features runtime-dispatch-simd
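
As a rough illustration of what runtime dispatch means, the sketch below checks for SSE2 when the program runs and falls back to a scalar loop otherwise. It uses only the standard library (is_x86_feature_detected! and std::arch). The kernel shown is a deliberately simple compare-and-movemask loop, assumed here for illustration; it is not the crate's implementation, and count_scalar, count_sse2 and count_dispatch are hypothetical names.

```rust
/// Portable fallback: compare one byte at a time.
fn count_scalar(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

/// Illustrative SSE2 kernel (hypothetical, simpler than the crate's own):
/// compares 16 bytes at once and counts the set bits of the match mask.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse2")]
unsafe fn count_sse2(haystack: &[u8], needle: u8) -> usize {
    use std::arch::x86_64::{
        __m128i, _mm_cmpeq_epi8, _mm_loadu_si128, _mm_movemask_epi8, _mm_set1_epi8,
    };

    let needles = _mm_set1_epi8(needle as i8);
    let mut total = 0usize;
    let mut chunks = haystack.chunks_exact(16);
    for chunk in chunks.by_ref() {
        // Unaligned load is fine: `chunk` is exactly 16 readable bytes.
        let block = _mm_loadu_si128(chunk.as_ptr() as *const __m128i);
        let equal = _mm_cmpeq_epi8(block, needles);
        total += (_mm_movemask_epi8(equal) as u32).count_ones() as usize;
    }
    // Handle the last (fewer than 16) bytes with the scalar loop.
    total + count_scalar(chunks.remainder(), needle)
}

/// Picks a code path at runtime based on what the CPU reports.
fn count_dispatch(haystack: &[u8], needle: u8) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse2") {
            // Safe to call: the feature was just detected on this CPU.
            return unsafe { count_sse2(haystack, needle) };
        }
    }
    count_scalar(haystack, needle)
}

fn main() {
    let data = b"one two three four five six seven eight nine ten";
    let spaces = count_dispatch(data, b' ');
    assert_eq!(spaces, count_scalar(data, b' '));
    println!("{} spaces", spaces);
}
```

Feature detection relies on the standard library, which is consistent with the note above that this feature cannot be combined with no_std.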

The second, generic-simd, uses packed_simd to provide a fast architecture-agnostic SIMD codepath, but requires a nightly compiler.

Your users can compile with this codepath using:

cargo build --release --features generic-simd

Building for a more specific architecture will also improve performance. You can do this with:

RUSTFLAGS="-C target-cpu=native" cargo build --release
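
One way to see the effect of that flag is to check which SIMD target features the compiler was allowed to assume at build time. The following is a small sketch using the standard cfg! macro; the function name compile_time_simd_features is hypothetical, and the two features listed are just examples.

```rust
/// Lists which of a few x86 SIMD features were enabled at compile time.
/// Hypothetical helper for illustration; it inspects compiler settings,
/// not the CPU the program happens to run on.
fn compile_time_simd_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    if cfg!(target_feature = "sse2") {
        enabled.push("sse2");
    }
    if cfg!(target_feature = "avx2") {
        enabled.push("avx2");
    }
    enabled
}

fn main() {
    println!(
        "features assumed at compile time: {:?}",
        compile_time_simd_features()
    );
}
```

Building this with and without the RUSTFLAGS setting shows whether the compiler was permitted to emit instructions for the wider feature set of the build machine.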

The scalar algorithm is explained in depth here.
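
The linked explanation is not reproduced here. As a generic illustration of how a scalar routine can examine eight bytes per step without SIMD instructions, the sketch below uses a well-known "SIMD within a register" bit trick. It is an assumption made for illustration, not a transcription of the crate's algorithm, and count_swar is a hypothetical name.

```rust
const ONES: u64 = 0x0101_0101_0101_0101; // 0x01 in every byte
const LOW7: u64 = 0x7F7F_7F7F_7F7F_7F7F; // low seven bits of every byte

/// Counts `needle` by processing eight bytes per loop iteration
/// inside an ordinary 64-bit integer.
fn count_swar(haystack: &[u8], needle: u8) -> usize {
    // Broadcast the needle into all eight byte lanes.
    let pattern = ONES * needle as u64;
    let mut total = 0usize;
    let mut chunks = haystack.chunks_exact(8);

    for chunk in chunks.by_ref() {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        // After the XOR, a lane is zero exactly where the byte matched.
        let x = u64::from_ne_bytes(buf) ^ pattern;
        // Set the top bit of each lane that is zero, and no other bits.
        // (x & LOW7) + LOW7 never carries across lanes.
        let zero_flags = !(((x & LOW7) + LOW7) | x | LOW7);
        total += zero_flags.count_ones() as usize;
    }

    // Finish the last (fewer than 8) bytes one at a time.
    total + chunks.remainder().iter().filter(|&&b| b == needle).count()
}

fn main() {
    let text = b"a banana and a papaya are on the table";
    let count = count_swar(text, b'a');
    assert_eq!(count, text.iter().filter(|&&b| b == b'a').count());
    println!("{} occurrences of 'a'", count);
}
```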

License

Licensed under either of at your discretion: