perf(time): split via StringView slicing instead of char-by-char StringBuilder append#245
Closed
mizchi wants to merge 2 commits into
Closed
perf(time): split via StringView slicing instead of char-by-char StringBuilder append#245mizchi wants to merge 2 commits into
mizchi wants to merge 2 commits into
Conversation
…ngBuilder append
time/util.mbt's split(s, delimiter) was implemented as char-by-char
StringBuilder::write_char + to_string + reset per segment. The file
even had a 'FIXME: use split method of String' comment acknowledging
this.
Replace with index-tracking + StringView slicing: track the segment
start, scan code units, and on a delimiter push s[start:i].to_owned()
into the result. One to_owned per segment, no per-char append, no
per-segment grow_if_necessary.
PlainDateTime::from_string calls split(str, 'T') for every parse;
Duration::from_string calls split(s, '.'). So every datetime /
duration parse pays this cost.
plain_datetime_parse bench (native, 3-run median,
PlainDateTime::from_string('2024-05-23T14:37:12.123456789') x 200k):
baseline: 179 ms
patched : 132 ms (-26.3%)
Collaborator
|
I assume the comment meant to use |
Collaborator
|
I think the fix is correct, and there is indeed performance improvement. However, to do that better, it would be better to use the stdlib's
So I will replace this PR with #246 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
time/util.mbt'ssplit(s, delimiter)was implemented as char-by-charStringBuilder::write_char+to_string+resetper segment. The file already had aFIXME: use split method of Stringcomment acknowledging this.Replaced with index-tracking + StringView slicing:
One
.to_owned()per segment — no per-character append, no StringBuilder, no per-segmentgrow_if_necessary.Why this is hot
PlainDateTime::from_stringcallssplit(str, 'T')for every parse — both segments then go toPlainDate::from_string/PlainTime::from_string.Duration::from_stringalso callssplit(s, '.'). So every datetime / duration parse pays this cost.Benchmark
Scenario:
bench-x/cmd/plain_datetime_parse/main.mbt—PlainDateTime::from_string("2024-05-23T14:37:12.123456789")× 200 000 iters. Native release, Linux x86_64, 3-run median wall time.Callgrind total instructions: 2.46 G → 1.93 G (-21.5%).
StringBuilder::write_char(8.42%),grow_if_necessary(6.23%), and the per-charcode_unit_at → unsafe_to_char → write_charchain fall away entirely.Tests
The file's pre-existing
FIXME: use split method of Stringcomment is also removed.