Optimize File.join common use case #15898
Merged
byroot merged 2 commits into ruby:master on Jan 18, 2026
`File.join` is a hotspot for common libraries such as Zeitwerk and Bootsnap. It has a fairly flexible signature, but 99% of the time it's called with just two (or a small number of) UTF-8 strings.

If we optimistically optimize for that use case we can cut down a large number of type and encoding checks, significantly speeding up the method.

The one remaining expensive check we could try to optimize is `str_null_check`. Given it's common to use the same base string for joining, we could memoize it. Also we could precompute it for literal strings.

```
compare-ruby: ruby 4.1.0dev (2026-01-17T14:40:03Z master 00a3b71) +PRISM [arm64-darwin25]
built-ruby: ruby 4.1.0dev (2026-01-18T12:10:38Z spedup-file-join 069bab5) +PRISM [arm64-darwin25]
warming up....

|              |compare-ruby|built-ruby|
|:-------------|-----------:|---------:|
|two_strings   |      2.475M|    9.444M|
|              |           -|     3.82x|
|many_strings  |    551.975k|    2.346M|
|              |           -|     4.25x|
|array         |    514.946k|  522.034k|
|              |           -|     1.01x|
|mixed         |    621.236k|  633.189k|
|              |           -|     1.02x|
```
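For readers who want to see the memoization idea concretely, here is a minimal standalone sketch. It is not the patch: `sketch_str`, `nul_state`, `sketch_str_new` and `sketch_str_has_nul` are hypothetical names made up for illustration, and the real `str_null_check` works on Ruby `VALUE`s. The assumption is that the cached state lives next to the string and gets reset whenever the string is mutated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Cached result of scanning a string for embedded NUL bytes (hypothetical). */
enum nul_state { NUL_UNKNOWN, NUL_ABSENT, NUL_PRESENT };

/* Hypothetical stand-in for a string object; this is not CRuby's RString. */
struct sketch_str {
    const char *ptr;
    size_t len;
    enum nul_state nul; /* must go back to NUL_UNKNOWN if ptr or len change */
};

/* Wrap a buffer. For a literal, the state could be precomputed instead. */
static struct sketch_str
sketch_str_new(const char *ptr, size_t len)
{
    struct sketch_str s = { ptr, len, NUL_UNKNOWN };
    return s;
}

/* Returns true if the string contains a NUL byte, scanning at most once. */
static bool
sketch_str_has_nul(struct sketch_str *s)
{
    if (s->nul == NUL_UNKNOWN) {
        s->nul = memchr(s->ptr, '\0', s->len) ? NUL_PRESENT : NUL_ABSENT;
    }
    return s->nul == NUL_PRESENT;
}
```

With a base path that is joined repeatedly, only the first call pays for the `memchr()` scan; later calls just read the cached state.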
Edit: Found some more optimizations

`chompdirsep` searches from the start of the string each time, which perhaps is necessary for certain encodings (not even sure?) but for the common encodings it's very wasteful. Instead we can start from the back of the string and only compare one or two characters in most cases.

Also replace `StringValueCStr` with the simpler `rb_str_null_check`, as we only care about whether the string contains `NULL` bytes; we don't care whether it is NULL-terminated or not.

We also only check the final string for NULLs.

```
compare-ruby: ruby 4.1.0dev (2026-01-17T14:40:03Z master 00a3b71) +PRISM [arm64-darwin25]
built-ruby: ruby 4.1.0dev (2026-01-18T12:55:15Z spedup-file-join 5948e92e03) +PRISM [arm64-darwin25]
warming up....

|              |compare-ruby|built-ruby|
|:-------------|-----------:|---------:|
|two_strings   |      2.477M|   19.317M|
|              |           -|     7.80x|
|many_strings  |    547.577k|   10.298M|
|              |           -|    18.81x|
|array         |    515.280k|  523.291k|
|              |           -|     1.02x|
|mixed         |    621.840k|  635.422k|
|              |           -|     1.02x|
```
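A rough standalone model of the two ideas above, scanning for separators from the back and doing a single NUL check on the joined result, could look like the sketch below. The names `chomp_trailing_sep`, `skip_leading_sep` and `join_two` are hypothetical, the code works on raw byte buffers rather than Ruby strings, and it does not try to reproduce every edge case of `File.join`. The byte-wise scan assumes an encoding in which a `'/'` byte can never be part of a multibyte character.

```c
#include <stddef.h>
#include <string.h>

/* Length of `s` once trailing '/' bytes are dropped, scanning from the end. */
static size_t
chomp_trailing_sep(const char *s, size_t len)
{
    while (len > 0 && s[len - 1] == '/') len--;
    return len;
}

/* Number of leading '/' bytes in `s`. */
static size_t
skip_leading_sep(const char *s, size_t len)
{
    size_t i = 0;
    while (i < len && s[i] == '/') i++;
    return i;
}

/*
 * Joins `a` and `b` with exactly one '/' into `out` (capacity `cap`).
 * Returns the joined length, or -1 if `out` is too small or the result
 * contains a NUL byte.
 */
static long
join_two(const char *a, size_t alen, const char *b, size_t blen,
         char *out, size_t cap)
{
    size_t keep = chomp_trailing_sep(a, alen);
    size_t skip = skip_leading_sep(b, blen);
    size_t total = keep + 1 + (blen - skip);

    if (total + 1 > cap) return -1;
    memcpy(out, a, keep);
    out[keep] = '/';
    memcpy(out + keep + 1, b + skip, blen - skip);
    out[total] = '\0';

    /* One NUL scan over the final bytes instead of one scan per argument. */
    if (memchr(out, '\0', total) != NULL) return -1;
    return (long)total;
}
```

For a typical call such as joining `"lib/"` and `"foo.rb"`, the backward scan inspects two bytes of the first argument, regardless of how long the base path is.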
eregon reviewed on Jan 20, 2026
Comment on lines +33 to +36

    static inline bool
    rb_str_encindex_fastpath(int encindex)
    {
        // The overwhelming majority of strings are in one of these 3 encodings.
It'd be nice to document the properties of these encodings. I think the main one you're using here is that it's safe to search for a 7-bit ASCII character with `memchr()`, i.e. that multibyte characters can never contain a 7-bit ASCII character (a byte without the upper bit set).
And of course that they are ASCII-compatible.
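To make that property concrete, here is a small standalone check. The helper names `all_bytes_high` and `bytes_contain` are hypothetical. Shift_JIS is used only as a contrasting example of an encoding without the property; the thread does not say which encodings the fast path covers beyond the PR description mentioning UTF-8.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if every byte in `buf` has its upper bit set (no 7-bit ASCII byte). */
static bool
all_bytes_high(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] < 0x80) return false;
    }
    return true;
}

/* Byte-wise search for an ASCII character, the way memchr() sees the data. */
static bool
bytes_contain(const unsigned char *buf, size_t len, unsigned char ascii)
{
    return memchr(buf, ascii, len) != NULL;
}

/* U+30BD KATAKANA LETTER SO encoded in UTF-8 and in Shift_JIS. */
static const unsigned char so_utf8[] = { 0xE3, 0x82, 0xBD };
static const unsigned char so_sjis[] = { 0x83, 0x5C }; /* 0x5C is '\\' */
```

In UTF-8 all three bytes have the upper bit set, so a `memchr()` for any ASCII character cannot match inside the character. In Shift_JIS the trail byte is `0x5C`, so a byte-wise search for a backslash reports a match in the middle of a character.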