Improve .chars().count() #37888

Merged
merged 1 commit into from Nov 21, 2016

6 participants
@bluss
Contributor

bluss commented Nov 19, 2016

Use a simpler loop to count the chars of a string: count the
number of non-continuation bytes. Use `count += <conditional>`, which the
compiler understands well and can apply loop optimizations to.
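
As an illustration of the idea described above, here is a minimal sketch, not the code from this PR. The function names `is_continuation_byte` and `count_chars` are made up for this example. It relies only on the fact that every UTF-8 continuation byte has the bit pattern `10xxxxxx`, so each char contributes exactly one non-continuation byte.

```rust
/// Returns true if `b` is a UTF-8 continuation byte (bit pattern 10xxxxxx).
/// Hypothetical helper for this sketch, not a std API.
fn is_continuation_byte(b: u8) -> bool {
    (b & 0xC0) == 0x80
}

/// Counts the chars in `s` by counting the bytes that are NOT
/// continuation bytes. Hypothetical name, for illustration only.
fn count_chars(s: &str) -> usize {
    let mut count = 0usize;
    for &b in s.as_bytes() {
        // `count += <conditional>`: add 0 or 1 without branching,
        // a shape the optimizer can unroll and vectorize.
        count += (!is_continuation_byte(b)) as usize;
    }
    count
}
```

For any valid `&str`, `count_chars(s)` should agree with `s.chars().count()`, because a `&str` is guaranteed to hold well-formed UTF-8.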

benchmark descriptions and results for two configurations:

  • ascii: ascii text
  • cy: cyrillic text
  • jp: japanese text
  • words ascii: counting each split_whitespace item from the ascii text
  • words jp: counting each split_whitespace item from the jp text
x86-64 rustc -Copt-level=3
 name               orig_ ns/iter      cmov_ ns/iter      diff ns/iter   diff % 
 count_ascii        1,453 (1755 MB/s)  1,398 (1824 MB/s)           -55   -3.79% 
 count_cy           5,990 (856 MB/s)   2,545 (2016 MB/s)        -3,445  -57.51% 
 count_jp           3,075 (1169 MB/s)  1,772 (2029 MB/s)        -1,303  -42.37% 
 count_words_ascii  4,157 (521 MB/s)   1,797 (1205 MB/s)        -2,360  -56.77% 
 count_words_jp     3,337 (1071 MB/s)  1,772 (2018 MB/s)        -1,565  -46.90%

x86-64 rustc -Ctarget-feature=+avx -Copt-level=3
 name               orig_ ns/iter      cmov_ ns/iter      diff ns/iter   diff % 
 count_ascii        1,444 (1766 MB/s)  763 (3343 MB/s)            -681  -47.16% 
 count_cy           5,871 (874 MB/s)   1,527 (3360 MB/s)        -4,344  -73.99% 
 count_jp           2,874 (1251 MB/s)  1,073 (3351 MB/s)        -1,801  -62.67% 
 count_words_ascii  4,131 (524 MB/s)   1,871 (1157 MB/s)        -2,260  -54.71% 
 count_words_jp     3,253 (1099 MB/s)  1,331 (2686 MB/s)        -1,922  -59.08%

I briefly explored a more involved blocked algorithm (looking at 8 or more bytes at a time),
but the code in this PR was always winning count_words_ascii in particular (counting
many small strings); this solution is an improvement without tradeoffs.
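
For readers wondering what a blocked variant might look like, the following is one possible sketch that inspects 8 bytes at a time. It is an assumption about the general shape of such an algorithm, not the code that was explored for this PR; `count_chars_blocked` and `LSB` are names invented here.

```rust
/// A mask with the lowest bit of every byte set.
const LSB: u64 = 0x0101_0101_0101_0101;

/// Counts chars by examining 8 bytes per step, then finishing the
/// tail one byte at a time. Hypothetical sketch, not the PR's code.
fn count_chars_blocked(s: &str) -> usize {
    let bytes = s.as_bytes();
    let mut chunks = bytes.chunks_exact(8);
    let mut count = 0usize;

    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        // Byte order does not matter here: we only count bytes.
        let w = u64::from_le_bytes(buf);
        // A byte is a continuation byte iff bit 7 is set and bit 6 is clear,
        // so it is a non-continuation byte iff (!bit7 | bit6) is 1.
        // Shifting moves that result into the low bit of each byte.
        let non_cont = ((!w >> 7) | (w >> 6)) & LSB;
        count += non_cont.count_ones() as usize;
    }

    // Fewer than 8 bytes remain: fall back to the simple per-byte loop.
    for &b in chunks.remainder() {
        count += ((b & 0xC0) != 0x80) as usize;
    }
    count
}
```

The per-call setup (chunking, handling the remainder) is the kind of fixed cost that can make a blocked version lose on many short strings, which is consistent with the `count_words_ascii` observation above.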

str: Improve .chars().count()
Collaborator

rust-highfive commented Nov 19, 2016

r? @brson

(rust_highfive has picked a reviewer for you, use r? to override)

Contributor

bluss commented Nov 19, 2016

Member

alexcrichton commented Nov 20, 2016

@bors: r+

Nice wins!

Contributor

bors commented Nov 20, 2016

📌 Commit 5a3aa2f has been approved by alexcrichton

Contributor

bors commented Nov 20, 2016

⌛️ Testing commit 5a3aa2f with merge fc2373c...

bors added a commit that referenced this pull request Nov 20, 2016

Auto merge of #37888 - bluss:chars-count, r=alexcrichton

@bors bors merged commit 5a3aa2f into rust-lang:master Nov 21, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
homu Test successful

@bluss bluss deleted the bluss:chars-count branch Nov 21, 2016

@brson brson added the relnotes label Nov 22, 2016

Contributor

llogiq commented Nov 28, 2016

I'm curious – bytecount is much faster than anything else at counting bytes, and should be adaptable to this situation (count bytes lower than 128) without perf loss.

Contributor

bluss commented Nov 28, 2016

Go ahead and experiment. My comment was:

> I briefly explored a more involved blocked algorithm (looking at 8 or more bytes at a time),
> but the code in this PR was always winning count_words_ascii in particular (counting
> many small strings); this solution is an improvement without tradeoffs.

I'm leaving the door open to such improvements, but I suggest looking out for the small-input case as well.

Contributor

bluss commented Nov 28, 2016

Oh, by the way, @llogiq, did you see this comment? I wanted to tell you, due to its possible application in bytecount, that it can be beneficial (it was to me) to use this kind of raw pointer solution instead of computing separate slice parts up front. (Edit: Oh, I now see why you couldn't possibly see that comment.)

Contributor

bluss commented Nov 29, 2016

By the way, it's not counting just bytes lower than 128, but any (non-)continuation byte.
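
The distinction matters for any non-ASCII text. A small sketch (both function names are hypothetical, chosen for this example) shows the two counts diverging: the lead byte of a multi-byte sequence is 128 or higher, yet it is not a continuation byte and must be counted.

```rust
/// Counts bytes below 128, i.e. ASCII bytes only.
/// Hypothetical helper for comparison.
fn count_bytes_below_128(s: &str) -> usize {
    s.bytes().filter(|&b| b < 128).count()
}

/// Counts bytes that are not UTF-8 continuation bytes (not 10xxxxxx).
/// This includes ASCII bytes AND the lead bytes of multi-byte sequences.
fn count_non_continuation(s: &str) -> usize {
    s.bytes().filter(|&b| (b & 0xC0) != 0x80).count()
}
```

For the string `"na\u{ef}ve"` (five chars, six bytes, since U+00EF encodes as `0xC3 0xAF`), the first function yields 4 and the second yields 5; only the second matches the char count.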
