
perf(time): split via StringView slicing instead of char-by-char StringBuilder append #245

Closed: mizchi wants to merge 2 commits into moonbitlang:main from mizchi:pr-time-split-substring


@mizchi (Contributor) commented May 23, 2026

Summary

time/util.mbt's split(s, delimiter) was implemented as a char-by-char StringBuilder::write_char loop, with to_string + reset per segment. The file already had a 'FIXME: use split method of String' comment acknowledging this.

```moonbit
// before: per-char work, plus a new String and grow_if_necessary per segment
let buf = StringBuilder::new(size_hint=0)
for i = 0; i < s.length(); i = i + 1 {
  let code_unit = s.code_unit_at(i)
  if code_unit == delimiter_code {
    spl.push(buf.to_string())
    buf.reset()
  } else {
    buf.write_char(UInt16::unsafe_to_char(code_unit))
  }
}
spl.push(buf.to_string())
```

Replaced with index-tracking + StringView slicing:

```moonbit
// after: track the segment start and slice once per segment
let mut start = 0
for i = 0; i < s.length(); i = i + 1 {
  if s.code_unit_at(i) == delimiter_code {
    spl.push(s[start:i].to_owned())
    start = i + 1
  }
}
spl.push(s[start:].to_owned())
```

One .to_owned() per segment — no per-character append, no StringBuilder, no per-segment grow_if_necessary.
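To see the patched loop in isolation, here it is wrapped into a standalone function. This is a sketch: the name split_by_slicing, the Array[String] return type, and deriving delimiter_code from a Char via to_int() are assumptions, not copied from time/util.mbt. It also assumes the delimiter is a single UTF-16 code unit, which holds for 'T' and '.'.

```moonbit
// Hypothetical standalone version of the patched split loop (sketch only).
// Assumes `delimiter` fits in one UTF-16 code unit, as 'T' and '.' do.
fn split_by_slicing(s : String, delimiter : Char) -> Array[String] {
  // Compare as Int so the UInt16 code unit and the Char share one type.
  let delimiter_code = delimiter.to_int()
  let spl : Array[String] = []
  // Index where the current segment begins.
  let mut start = 0
  for i = 0; i < s.length(); i = i + 1 {
    if s.code_unit_at(i).to_int() == delimiter_code {
      // Copy the finished segment [start, i) out of the view: one allocation.
      spl.push(s[start:i].to_owned())
      start = i + 1
    }
  }
  // Tail after the last delimiter (the whole string if there was none).
  spl.push(s[start:].to_owned())
  spl
}
```

Like the old StringBuilder loop, this keeps empty segments: splitting "a..b." on '.' gives ["a", "", "b", ""], and the empty string gives [""], so callers that index into the result see the same shape as before.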

Why this is hot

PlainDateTime::from_string calls split(str, 'T') for every parse — both segments then go to PlainDate::from_string / PlainTime::from_string. Duration::from_string also calls split(s, '.'). So every datetime / duration parse pays this cost.

Benchmark

Scenario: bench-x/cmd/plain_datetime_parse/main.mbt, running PlainDateTime::from_string("2024-05-23T14:37:12.123456789") × 200 000 iterations. Native release build, Linux x86_64, median wall time of 3 runs.

| Benchmark | Baseline | Patched | Delta |
|---|---|---|---|
| plain_datetime_parse | 179 ms | 132 ms | -26.3% |

Callgrind total instructions: 2.46 G → 1.93 G (-21.5%). StringBuilder::write_char (8.42%), grow_if_necessary (6.23%), and the per-char code_unit_at → unsafe_to_char → write_char chain fall away entirely.
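Both percentages are relative to the baseline. As a quick check of the arithmetic (wall time fell somewhat more than the instruction count):

$$
\frac{132\,\text{ms} - 179\,\text{ms}}{179\,\text{ms}} = \frac{-47}{179} \approx -26.3\%
\qquad
\frac{1.93\,\text{G} - 2.46\,\text{G}}{2.46\,\text{G}} = \frac{-0.53}{2.46} \approx -21.5\%
$$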

Tests

moonbitlang/x/time    148 / 148 pass

The file's pre-existing FIXME: use split method of String comment is also removed.

mizchi and others added 2 commits May 23, 2026 19:25
…ngBuilder append

time/util.mbt's split(s, delimiter) was implemented as char-by-char
StringBuilder::write_char + to_string + reset per segment. The file
even had a 'FIXME: use split method of String' comment acknowledging
this.

Replace with index-tracking + StringView slicing: track the segment
start, scan code units, and on a delimiter push s[start:i].to_owned()
into the result. One to_owned per segment, no per-char append, no
per-segment grow_if_necessary.

PlainDateTime::from_string calls split(str, 'T') for every parse;
Duration::from_string calls split(s, '.'). So every datetime /
duration parse pays this cost.

plain_datetime_parse bench (native, 3-run median,
PlainDateTime::from_string('2024-05-23T14:37:12.123456789') x 200k):
  baseline: 179 ms
  patched : 132 ms  (-26.3%)
@peter-jerry-ye (Collaborator)

I assume the comment meant to use String::split directly...

@peter-jerry-ye (Collaborator)

I think the fix is correct, and there is indeed a performance improvement. However, it would be better still to use the stdlib's split, because:

  1. it reduces the maintenance cost
  2. the stdlib version returns Iter[StringView], reducing unnecessary reallocation even further.

So I will replace this PR with #246
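The second point is that segments need not be copied at all. The sketch below illustrates that idea with the same loop returning StringView slices instead of owned Strings. It is not the stdlib's split (which, per the comment above, returns a lazy Iter[StringView]), and the name split_views is hypothetical. Each view shares the original string's storage, so the only allocation is the result array.

```moonbit
// Hypothetical zero-copy variant: returns views into `s` instead of copies.
// Not the stdlib implementation; assumes a single-code-unit delimiter.
fn split_views(s : String, delimiter : Char) -> Array[StringView] {
  let delimiter_code = delimiter.to_int()
  let views : Array[StringView] = []
  let mut start = 0
  for i = 0; i < s.length(); i = i + 1 {
    if s.code_unit_at(i).to_int() == delimiter_code {
      // No to_owned(): the view only records a range over `s`.
      views.push(s[start:i])
      start = i + 1
    }
  }
  // Tail after the last delimiter, also a view.
  views.push(s[start:])
  views
}
```

A caller that needs an owned String for one segment can still call .to_owned() on just that view; a parser that only scans the segment never pays for a copy.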
