Improve .chars().count() #37888

Merged
merged 1 commit into from Nov 21, 2016

@bluss
Contributor
bluss commented Nov 19, 2016 edited

Use a simpler loop to count the `char`s of a string: count the
number of non-continuation bytes. Use `count += <conditional>`, which the
compiler understands well and can apply loop optimizations to.
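
A minimal sketch of the idea, not the code from this PR: in UTF-8 every code point has exactly one byte that is not a continuation byte (continuation bytes have the bit pattern `10xxxxxx`), so counting those bytes gives the number of `char`s. The function names `is_continuation_byte` and `count_chars` are illustrative assumptions.

```rust
/// Returns true if `byte` is a UTF-8 continuation byte (bit pattern 10xxxxxx).
/// Hypothetical helper name, chosen for this sketch.
fn is_continuation_byte(byte: u8) -> bool {
    (byte & 0b1100_0000) == 0b1000_0000
}

/// Counts the `char`s in `s` by counting the bytes that start a code point.
/// Relies on `&str` always holding valid UTF-8.
fn count_chars(s: &str) -> usize {
    let mut count = 0;
    for &byte in s.as_bytes() {
        // Add 1 or 0 per byte instead of branching; this is the
        // `count += <conditional>` shape described above.
        count += (!is_continuation_byte(byte)) as usize;
    }
    count
}
```

For any string `s`, `count_chars(s)` should agree with `s.chars().count()`.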

Benchmark descriptions and results for two configurations:

  • ascii: ASCII text
  • cy: Cyrillic text
  • jp: Japanese text
  • words ascii: counting each split_whitespace item from the ASCII text
  • words jp: counting each split_whitespace item from the Japanese text

x86-64 rustc -Copt-level=3
 name               orig_ ns/iter      cmov_ ns/iter      diff ns/iter   diff % 
 count_ascii        1,453 (1755 MB/s)  1,398 (1824 MB/s)           -55   -3.79% 
 count_cy           5,990 (856 MB/s)   2,545 (2016 MB/s)        -3,445  -57.51% 
 count_jp           3,075 (1169 MB/s)  1,772 (2029 MB/s)        -1,303  -42.37% 
 count_words_ascii  4,157 (521 MB/s)   1,797 (1205 MB/s)        -2,360  -56.77% 
 count_words_jp     3,337 (1071 MB/s)  1,772 (2018 MB/s)        -1,565  -46.90%

x86-64 rustc -Ctarget-feature=+avx -Copt-level=3
 name               orig_ ns/iter      cmov_ ns/iter      diff ns/iter   diff % 
 count_ascii        1,444 (1766 MB/s)  763 (3343 MB/s)            -681  -47.16% 
 count_cy           5,871 (874 MB/s)   1,527 (3360 MB/s)        -4,344  -73.99% 
 count_jp           2,874 (1251 MB/s)  1,073 (3351 MB/s)        -1,801  -62.67% 
 count_words_ascii  4,131 (524 MB/s)   1,871 (1157 MB/s)        -2,260  -54.71% 
 count_words_jp     3,253 (1099 MB/s)  1,331 (2686 MB/s)        -1,922  -59.08%

I briefly explored a more involved blocked algorithm (looking at 8 or more bytes at a time),
but the code in this PR was always winning count_words_ascii in particular (counting
many small strings); this solution is an improvement without tradeoffs.
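
For illustration only, here is one way a blocked variant could look. This is an assumption about the general shape of such an algorithm, not the code that was explored for this PR, and the name `count_chars_blocked` is hypothetical. It loads 8 bytes into a `u64`, marks each continuation byte (bit 7 set, bit 6 clear) in the low bit of its lane, and uses a population count.

```rust
/// Counts `char`s by examining 8 bytes at a time (illustrative sketch).
fn count_chars_blocked(s: &str) -> usize {
    // One low bit set in each of the 8 byte lanes.
    const LOW_BITS: u64 = 0x0101_0101_0101_0101;

    let bytes = s.as_bytes();
    let mut chunks = bytes.chunks_exact(8);
    let mut count = 0;

    for chunk in &mut chunks {
        let mut block = [0u8; 8];
        block.copy_from_slice(chunk);
        // Byte order does not matter here: only per-lane bits are counted.
        let word = u64::from_ne_bytes(block);
        // Low bit of each lane is 1 iff that byte looks like 10xxxxxx.
        let continuation = (word >> 7) & (!word >> 6) & LOW_BITS;
        count += 8 - continuation.count_ones() as usize;
    }

    // Handle the final 0 to 7 bytes with the simple per-byte loop.
    for &byte in chunks.remainder() {
        count += ((byte & 0xC0) != 0x80) as usize;
    }
    count
}
```

The setup and tail handling are extra work per call, which is one plausible reason a blocked approach can lose on many small strings, as in `count_words_ascii`.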

@bluss bluss str: Improve .chars().count()
5a3aa2f
@brson brson was assigned by rust-highfive Nov 19, 2016
@rust-highfive
Collaborator

r? @brson

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Member

@bors: r+

Nice wins!

@bors
Contributor
bors commented Nov 20, 2016

📌 Commit 5a3aa2f has been approved by alexcrichton

@bors
Contributor
bors commented Nov 20, 2016

โŒ›๏ธ Testing commit 5a3aa2f with merge fc2373c...

@bors bors added a commit that referenced this pull request Nov 20, 2016
@bors bors Auto merge of #37888 - bluss:chars-count, r=alexcrichton
fc2373c
@bors bors merged commit 5a3aa2f into rust-lang:master Nov 21, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
homu Test successful
@bluss bluss deleted the bluss:chars-count branch Nov 21, 2016
@brson brson added the relnotes label Nov 22, 2016
@llogiq
Contributor
llogiq commented Nov 28, 2016

I'm curious – bytecount is much faster than anything else at counting bytes, and should be adaptable to this situation (count bytes lower than 128) without perf loss.

@bluss
Contributor
bluss commented Nov 28, 2016 edited

Go ahead and experiment. My comment was:

> I briefly explored a more involved blocked algorithm (looking at 8 or more bytes at a time),
> but the code in this PR was always winning `count_words_ascii` in particular (counting
> many small strings); this solution is an improvement without tradeoffs.

I'm leaving the door open to such improvements, but I suggest looking out for the small-input case as well.

@bluss
Contributor
bluss commented Nov 28, 2016 edited

Oh, by the way, @llogiq, did you see this comment? I wanted to tell you, due to possible application in bytecount, that it can be beneficial (it was to me) to use this kind of raw pointer solution instead of computing separate slice parts up front. (Edit: Oh, I now see why you couldn't possibly see that comment.)

@bluss
Contributor
bluss commented Nov 29, 2016 edited

By the way, it's not counting just bytes lower than 128, but any (non-)continuation byte.
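
A small sketch of the difference, with hypothetical function names: bytes lower than 128 are only the ASCII bytes, while non-continuation bytes also include the lead bytes of multi-byte sequences.

```rust
/// Counts bytes lower than 128, i.e. ASCII bytes only.
fn count_ascii_bytes(s: &str) -> usize {
    s.bytes().filter(|&b| b < 128).count()
}

/// Counts every byte that is not a continuation byte (not 10xxxxxx).
/// This includes ASCII bytes and the lead bytes of multi-byte sequences.
fn count_non_continuation_bytes(s: &str) -> usize {
    s.bytes().filter(|&b| (b & 0xC0) != 0x80).count()
}
```

For the Cyrillic string `"мир"` (bytes `D0 BC D0 B8 D1 80`), `count_ascii_bytes` returns 0, while `count_non_continuation_bytes` returns 3, matching `"мир".chars().count()`.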
