Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazily cache UTF-8 character length on RString #4531

Closed
wants to merge 1 commit into from
Closed

Lazily cache UTF-8 character length on RString #4531

wants to merge 1 commit into from

Conversation

lopopolo
Copy link
Contributor

@lopopolo lopopolo commented Jun 24, 2019

Store the UTF-8 length as computed by utf8_strlen on RString when MRB_UTF8_STRING is defined. This significantly (3200x) speeds up String operations at the cost of one mrb_int of memory per RString.

The UTF-8 char_len is computed on first call to utf8_strlen and is cached until mrb_str_modify is called, when it is zeroed out. utf8_strlen gets a new fastpath for when the character length is cached and retains the fast path for ASCII strings which is used as a fallback.

Benchmark

/usr/bin/time -l ./bin/mruby -e 'def str(len); ("💪" * len); end; s = str(100000); puts s.length; 1000000.times { s.length }'

This PR @ ca692e6

Completes in 0.30 seconds.

$ /usr/bin/time -l ./bin/mruby -e 'def str(len); ("💪" * len); end; s = str(100000); puts s.length; 1000000.times { s.length }'
100000
        0.30 real         0.30 user         0.00 sys
   2306048  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
       572  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
         0  voluntary context switches
        48  involuntary context switches

master

Completes in 963 seconds. The caching implementation is 3210x faster than master.

$ /usr/bin/time -l ./bin/mruby -e 'def str(len); ("💪" * len); end; s = str(100000); puts s.length; 1000000.times { s.length }'
100000
      963.41 real       961.82 user         0.74 sys
   2252800  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
       559  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
         0  voluntary context switches
    139399  involuntary context switches

@lopopolo
Copy link
Contributor Author

Related to GH-4522 and GH-4529

cc @dearblue

@shuujii
Copy link
Contributor

shuujii commented Jun 24, 2019

I think this patch will not be accepted as it is because the size of struct RString will increase (see: #3297).

However, it may be possible because the size does not increase in a limited environment such as 64-bit system + MRB_INT32 + boxing.

@shuujii
Copy link
Contributor

shuujii commented Jun 24, 2019

64-bit system + MRB_INT32 + boxing

Sorry, boxing has nothing to do with it.

@lopopolo
Copy link
Contributor Author

The tests do not currently pass when invoked like this:

CFLAGS="-Wall -DMRB_UTF8_STRING" rake test

str_replace does not work as expected. mruby-io seems to make assumptions about the layout of the RString struct. I think I bit off more than I can chew. Abandoning this PR.

@lopopolo lopopolo closed this Jun 25, 2019
@lopopolo lopopolo deleted the utf8-strlen-cache-char-length branch December 17, 2021 04:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants