Speed up String::from_utf16 #55530

ljedrz · 2018-10-31T09:31:27Z

Collecting into a Result is idiomatic, but not necessarily fast due to rustc not being able to preallocate for the resulting collection. This is fine in case of an error, but IMO we should optimize for the common case, i.e. a successful conversion.

This changes the behavior of String::from_utf16 from collecting into a Result to pushing to a preallocated String in a loop.

According to my simple benchmark this change makes String::from_utf16 around twice as fast.
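For readers who want to see the two shapes side by side, here is a minimal sketch, not the actual patch. The function names `from_utf16_collect` and `from_utf16_push` are hypothetical, and the error type is `DecodeUtf16Error` because the real `FromUtf16Error` cannot be constructed outside the standard library.

```rust
use std::char::{decode_utf16, DecodeUtf16Error};

/// Old shape: collect into a `Result`, so the `String` grows as it goes.
fn from_utf16_collect(v: &[u16]) -> Result<String, DecodeUtf16Error> {
    decode_utf16(v.iter().cloned()).collect()
}

/// New shape: preallocate one byte per code unit, then push in a loop
/// and bail out on the first decoding error.
fn from_utf16_push(v: &[u16]) -> Result<String, DecodeUtf16Error> {
    let mut ret = String::with_capacity(v.len());
    for c in decode_utf16(v.iter().cloned()) {
        ret.push(c?);
    }
    Ok(ret)
}
```

Both functions return the same result for any input; only the allocation pattern differs.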

rust-highfive · 2018-10-31T09:31:31Z

r? @aidanhs

(rust_highfive has picked a reviewer for you, use r? to override)

src/liballoc/string.rs

ljedrz · 2018-10-31T12:07:56Z

I created more benchmarks to check which initial capacity provides the best results in different scenarios. The results aren't too surprising:

test bench_short_old        ... bench:         953 ns/iter (+/- 364)
test bench_short_new_len    ... bench:         428 ns/iter (+/- 226)
test bench_short_new_len15  ... bench:         420 ns/iter (+/- 226)
test bench_short_new_len2   ... bench:         340 ns/iter (+/- 91)
test bench_short_new_len3   ... bench:         296 ns/iter (+/- 20)

test bench_medium_old       ... bench:       1,153 ns/iter (+/- 407)
test bench_medium_new_len   ... bench:         558 ns/iter (+/- 145)
test bench_medium_new_len15 ... bench:         528 ns/iter (+/- 195)
test bench_medium_new_len2  ... bench:         479 ns/iter (+/- 78)
test bench_medium_new_len3  ... bench:         444 ns/iter (+/- 46)

test bench_long_old         ... bench:       1,862 ns/iter (+/- 180)
test bench_long_new_len     ... bench:         860 ns/iter (+/- 467)
test bench_long_new_len15   ... bench:         840 ns/iter (+/- 50)
test bench_long_new_len2    ... bench:         774 ns/iter (+/- 292)
test bench_long_new_len3    ... bench:         661 ns/iter (+/- 26)

The good news is that any choice is at least twice as fast as the current solution, so it boils down to how often we can expect to see non-ASCII characters and code units above 0x0800 or how much extra capacity we can provide without being wasteful.

The conservative approach would be to just go with .len() - this incurs zero wasted capacity while still being twice as fast as the current code.
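To make the capacity trade-off concrete, the sketch below compares UTF-16 code unit counts with UTF-8 byte counts. The helper names are hypothetical, and it assumes the `len2` and `len3` benchmark variants reserve multiples of `v.len()`. A code unit below 0x80 encodes to one UTF-8 byte, one below 0x0800 to two bytes, and any other non-surrogate unit to three, while a surrogate pair (two units) encodes to four bytes, so three bytes per unit is always enough.

```rust
/// Returns (UTF-16 code units, UTF-8 bytes) for `s`.
fn unit_and_byte_counts(s: &str) -> (usize, usize) {
    (s.encode_utf16().count(), s.len())
}

/// True if a `String` with capacity `factor * units` holds `s`
/// without reallocating.
fn fits_without_realloc(s: &str, factor: usize) -> bool {
    let (units, bytes) = unit_and_byte_counts(s);
    bytes <= units * factor
}
```

With these helpers, ASCII input fits at a factor of 1, while CJK text such as "日本語" needs the full factor of 3.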

TimNN · 2018-11-13T18:31:09Z

Ping from triage @aidanhs / @rust-lang/libs: This PR requires your review.

SimonSapin · 2018-11-13T19:37:57Z

@bors r+

Thanks!

bors · 2018-11-13T19:37:58Z

📌 Commit 19aa101 has been approved by SimonSapin

…r=SimonSapin Speed up String::from_utf16 Collecting into a `Result` is idiomatic, but not necessarily fast due to rustc not being able to preallocate for the resulting collection. This is fine in case of an error, but IMO we should optimize for the common case, i.e. a successful conversion. This changes the behavior of `String::from_utf16` from collecting into a `Result` to pushing to a preallocated `String` in a loop. According to [my simple benchmark](https://gist.github.com/ljedrz/953a3fb74058806519bd4d640d6f65ae) this change makes `String::from_utf16` around **twice** as fast.

RalfJung · 2018-11-14T12:40:02Z

src/liballoc/string.rs

+        let mut ret = String::with_capacity(v.len());
+        for c in decode_utf16(v.iter().cloned()) {
+            if let Ok(c) = c {
+                ret.push(c);


Would be nice to have a comment explaining why the code works this way instead of the more "obvious" collect call.

Basically, what you wrote in the PR should be in the code.

ljedrz · 2018-11-14

Good idea; since this PR is already rolled up, I can add the comment afterwards, along with some other assorted code adjustments.

Rollup of 16 pull requests

Successful merges:

- #54906 (Reattach all grandchildren when constructing specialization graph.)
- #55182 (Redox: Update to new changes)
- #55211 (Add BufWriter::buffer method)
- #55507 (Add link to std::mem::size_of to size_of intrinsic documentation)
- #55530 (Speed up String::from_utf16)
- #55556 (Use `Mmap` to open the rmeta file.)
- #55622 (NetBSD: link libstd with librt in addition to libpthread)
- #55827 (A few tweaks to iterations/collecting)
- #55901 (fix various typos in doc comments)
- #55926 (Change sidebar selector to fix compatibility with docs.rs)
- #55930 (A handful of hir tweaks)
- #55932 (core/char: Speed up `to_digit()` for `radix <= 10`)
- #55935 (appveyor: Use VS2017 for all our images)
- #55936 (save-analysis: be even more aggressive about ignorning macro-generated defs)
- #55948 (submodules: update clippy from d8b4269 to 7e0ddef)
- #55956 (add tests for some fixed ICEs)

Rollup of 17 pull requests

Successful merges:

- #55182 (Redox: Update to new changes)
- #55211 (Add BufWriter::buffer method)
- #55507 (Add link to std::mem::size_of to size_of intrinsic documentation)
- #55530 (Speed up String::from_utf16)
- #55556 (Use `Mmap` to open the rmeta file.)
- #55622 (NetBSD: link libstd with librt in addition to libpthread)
- #55750 (Make `NodeId` and `HirLocalId` `newtype_index`)
- #55778 (Wrap some query results in `Lrc`.)
- #55781 (More precise spans for temps and their drops)
- #55785 (Add mem::forget_unsized() for forgetting unsized values)
- #55852 (Rewrite `...` as `..=` as a `MachineApplicable` 2018 idiom lint)
- #55865 (Unix RwLock: avoid racy access to write_locked)
- #55901 (fix various typos in doc comments)
- #55926 (Change sidebar selector to fix compatibility with docs.rs)
- #55930 (A handful of hir tweaks)
- #55932 (core/char: Speed up `to_digit()` for `radix <= 10`)
- #55956 (add tests for some fixed ICEs)

Failed merges:

r? @ghost

arthurprs · 2018-11-21T12:49:04Z

@ljedrz Is this a size_hint limitation when collecting into Result<_, _>? Otherwise I see no reason why the slice iterator couldn't propagate the hint to String::from_iter.

ljedrz · 2018-11-21T12:51:47Z

@arthurprs Yes; the implementation of FromIterator for Result sets the lower bound to zero. The same happens with Option::from_iter. Related: #52910.
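The adapter that the standard library uses for this is private, so the following is only a rough sketch of the idea with a hypothetical stand-in named `UntilErr`. It yields `Ok` values until the first `Err`, and because an error can end iteration at any point, its `size_hint` can only pass along the upper bound. The `lower_bounds` helper and its ASCII-only check are likewise illustrative assumptions.

```rust
/// Hypothetical stand-in for the private adapter behind `Result`'s
/// `FromIterator` implementation.
struct UntilErr<'a, I, E> {
    iter: I,
    error: &'a mut Option<E>,
}

impl<'a, I, T, E> Iterator for UntilErr<'a, I, E>
where
    I: Iterator<Item = Result<T, E>>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        match self.iter.next()? {
            Ok(value) => Some(value),
            Err(e) => {
                // Remember the error and stop yielding items.
                *self.error = Some(e);
                None
            }
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // An error may cut iteration short, so the lower bound is lost.
        let (_, upper) = self.iter.size_hint();
        (0, upper)
    }
}

/// Returns the lower size-hint bound before and after wrapping.
fn lower_bounds(units: &[u16]) -> (usize, usize) {
    let mut error: Option<u16> = None;
    // Illustrative fallible step: accept ASCII code units only.
    let checked = units
        .iter()
        .map(|&u| if u < 0x80 { Ok(u as u8 as char) } else { Err(u) });
    let before = checked.size_hint().0;
    let after = UntilErr { iter: checked, error: &mut error }.size_hint().0;
    (before, after)
}
```

A collection built from the wrapped iterator sees a lower bound of zero and therefore cannot reserve space up front, which matches the behaviour described above.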

Speed up String::from_utf16

19aa101

rust-highfive assigned aidanhs Oct 31, 2018

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 31, 2018

kennytm reviewed Oct 31, 2018

ljedrz mentioned this pull request Nov 4, 2018

Preallocate the vector containing predicates in decode_predicates #55534

Closed

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 13, 2018

kennytm mentioned this pull request Nov 14, 2018

Rollup of 16 pull requests #55943

Closed

RalfJung reviewed Nov 14, 2018

pietroalbini mentioned this pull request Nov 15, 2018

Rollup of 17 pull requests #55974

Merged

bors merged commit 19aa101 into rust-lang:master Nov 15, 2018

ljedrz deleted the speed_up_String_from_utf16 branch November 15, 2018 15:43

SimonSapin mentioned this pull request Nov 19, 2018

Use Mmap to open the rmeta file. #55556

Merged
