Optimize File.join common use case by byroot · Pull Request #15898 · ruby/ruby

byroot · 2026-01-18T11:33:56Z

File.join is a hotspot for common libraries such as Zeitwerk and Bootsnap. It has a fairly flexible signature, but 99% of the time it's called with just two (or a small number of) UTF-8 strings.

If we optimistically optimize for that use case we can cut down a large number of type and encoding checks, significantly speeding up the method.
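For reference, the call shapes in question look roughly like the following. This is only an illustration of the public `File.join` behavior, not code from the patch, and the paths are made up.

```ruby
# Illustrative only: the paths are made-up examples.

# Common case: two plain strings.
two = File.join("lib", "foo.rb")

# A small number of plain strings.
many = File.join("app", "models", "user.rb")

# Arrays are accepted too and are flattened recursively.
nested = File.join("app", ["models", ["concerns", "taggable.rb"]])

# Separators at the boundary between two arguments collapse into one.
collapsed = File.join("app/", "/models")

puts two, many, nested, collapsed
```

The first two shapes are the ones the fast path targets; the array form is part of the flexible signature that keeps going through the general path.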

The one remaining expensive check we could try to optimize is `str_null_check`. Given it's common to use the same base string for joining, we could memoize it. We could also precompute it for literal strings.

compare-ruby: ruby 4.1.0dev (2026-01-17T14:40:03Z master 00a3b71eaf) +PRISM [arm64-darwin25]
built-ruby: ruby 4.1.0dev (2026-01-18T12:10:38Z spedup-file-join 069bab58d4) +PRISM [arm64-darwin25]
warming up....

|              |compare-ruby|built-ruby|
|:-------------|-----------:|---------:|
|two_strings   |      2.475M|    9.444M|
|              |           -|     3.82x|
|many_strings  |    551.975k|    2.346M|
|              |           -|     4.25x|
|array         |    514.946k|  522.034k|
|              |           -|     1.01x|
|mixed         |    621.236k|  633.189k|
|              |           -|     1.02x|
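To try the four cases locally without extra tooling, something like the sketch below could be used. The harness, the `CASES` constant, the `measure_file_join` helper and all argument values are assumptions picked to match the case names; they are not the inputs behind the numbers above, and a plain loop like this is much cruder than a real benchmark run.

```ruby
# Hypothetical harness: the argument values are assumptions chosen to match
# the benchmark case names, not the inputs used for the numbers above.
CASES = {
  two_strings:  -> { File.join("lib", "foo.rb") },
  many_strings: -> { File.join("a", "b", "c", "d", "e") },
  array:        -> { File.join(["a", "b", "c", "d", "e"]) },
  mixed:        -> { File.join("a", ["b", "c"], "d", "e") },
}.freeze

# Returns iterations per second for each case, timed with a monotonic clock.
def measure_file_join(iterations = 100_000)
  CASES.transform_values do |work|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    iterations.times { work.call }
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    iterations / elapsed
  end
end

measure_file_join.each do |name, ips|
  puts format("%-13s %12.0f i/s", name, ips)
end
```

Running the same script against both builds gives a rough comparison; absolute numbers will differ from the table since there is no warmup phase.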

Edit: Found some more optimizations

`chompdirsep` searches from the start of the string each time, which is perhaps necessary for certain encodings (not even sure?), but for the common encodings it's very wasteful. Instead we can start from the back of the string and only compare one or two characters in most cases.
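The difference between the two scan directions can be modeled in a few lines of Ruby. This is a byte-level sketch, not the C code from the patch: the helper names are made up, only `/` is treated as a separator, and it assumes an encoding where the byte 0x2F can never appear inside a multibyte character.

```ruby
SEP = "/".ord # 0x2F

# Forward scan: walk the whole string, remembering where the current run of
# separators started. The cost grows with the length of the path.
def trailing_sep_start_forward(path)
  start = nil
  path.each_byte.with_index do |byte, i|
    if byte == SEP
      start ||= i
    else
      start = nil
    end
  end
  start || path.bytesize
end

# Backward scan: step back from the end while the previous byte is a
# separator. For typical paths this is one or two comparisons.
def trailing_sep_start_backward(path)
  i = path.bytesize
  i -= 1 while i > 0 && path.getbyte(i - 1) == SEP
  i
end

%w[lib lib/ lib// lib/foo /].each do |path|
  puts [path, trailing_sep_start_forward(path), trailing_sep_start_backward(path)].inspect
end
```

Both helpers return the byte offset where the trailing run of separators begins (or the byte size if there is none), so they agree on every input under the stated assumption.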

Also replace `StringValueCStr` with the simpler `rb_str_null_check`: we only care whether the string contains NUL bytes, not whether it is NUL-terminated.

We also only check the final string for NUL bytes.
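From the Ruby side, what this check guarantees is that `File.join` refuses arguments with an embedded NUL byte. Since every byte of every argument other than trimmed separators ends up in the joined string, a single check on the result should cover all of them. A small sketch of the observable behavior (the `join_rejects?` helper is made up for illustration):

```ruby
# Returns true when File.join rejects the given arguments with ArgumentError.
def join_rejects?(*parts)
  File.join(*parts)
  false
rescue ArgumentError
  true
end

puts join_rejects?("lib", "foo.rb")    # no NUL byte anywhere
puts join_rejects?("lib\0", "foo.rb")  # NUL byte in the first argument
puts join_rejects?("lib", "foo\0.rb")  # NUL byte in the last argument
```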

compare-ruby: ruby 4.1.0dev (2026-01-17T14:40:03Z master 00a3b71eaf) +PRISM [arm64-darwin25]
built-ruby: ruby 4.1.0dev (2026-01-18T12:55:15Z spedup-file-join 5948e92e03) +PRISM [arm64-darwin25]
warming up....

|              |compare-ruby|built-ruby|
|:-------------|-----------:|---------:|
|two_strings   |      2.477M|   19.317M|
|              |           -|     7.80x|
|many_strings  |    547.577k|   10.298M|
|              |           -|    18.81x|
|array         |    515.280k|  523.291k|
|              |           -|     1.02x|
|mixed         |    621.840k|  635.422k|
|              |           -|     1.02x|

eregon · 2026-01-20T13:28:29Z

+static inline bool
+rb_str_encindex_fastpath(int encindex)
+{
+    // The overwhelming majority of strings are in one of these 3 encodings.


It'd be nice to document the properties of these encodings. I think the main one you're using here is that it's safe to search for a 7-bit ASCII character with memchr(), i.e. that multibyte characters can never contain a 7-bit ASCII character (a byte without the upper bit set).
And of course that they are ASCII-compatible.
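To make that property concrete, here is a small sketch comparing UTF-8 with Shift_JIS, where the second byte of some characters is 0x5C, the ASCII backslash (which matters on platforms where backslash is a directory separator). The helper name and the sample character U+8868 are picked for illustration only.

```ruby
BACKSLASH = "\\".ord # 0x5C, a 7-bit ASCII byte

# True if any multibyte character of `str` contains a byte below 0x80.
def ascii_byte_inside_multibyte_char?(str)
  str.each_char.any? do |char|
    char.bytesize > 1 && char.bytes.any? { |byte| byte < 0x80 }
  end
end

utf8 = "\u8868" # written as an escape so the literal is UTF-8
sjis = utf8.encode(Encoding::Shift_JIS)

puts ascii_byte_inside_multibyte_char?(utf8)
puts ascii_byte_inside_multibyte_char?(sjis)
puts sjis.bytes.include?(BACKSLASH)
puts Encoding::Shift_JIS.ascii_compatible?
puts Encoding::UTF_16LE.ascii_compatible?
```

Shift_JIS reports itself as ASCII-compatible, so ASCII compatibility alone is not enough for a byte-wise search; the "no ASCII byte inside a multibyte character" property has to hold as well.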

byroot force-pushed the spedup-file-join branch 3 times, most recently from f8b74b4 to 069bab5 Compare January 18, 2026 12:10

byroot force-pushed the spedup-file-join branch 2 times, most recently from f2a5e04 to cb26ec5 Compare January 18, 2026 13:15

byroot force-pushed the spedup-file-join branch from cb26ec5 to 6ec72e3 Compare January 18, 2026 13:51

byroot merged commit 7e0e998 into ruby:master Jan 18, 2026
90 checks passed

byroot deleted the spedup-file-join branch January 18, 2026 15:32

byroot mentioned this pull request Jan 20, 2026

File.dirname: add a spec for Shift JIS handling ruby/spec#1330

Merged

eregon reviewed Jan 20, 2026

byroot merged 2 commits into ruby:master from byroot:spedup-file-join
