Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsafe concurrent Hash access #514

Closed
eregon opened this issue Jul 18, 2023 · 9 comments · Fixed by #515
Closed

Unsafe concurrent Hash access #514

eregon opened this issue Jul 18, 2023 · 9 comments · Fixed by #515
Labels

Comments

@eregon
Copy link
Contributor

eregon commented Jul 18, 2023

It happens for SEQUENCE_UPCASED_PERCENT_ENCODING_TABLE and SEQUENCE_ENCODING_TABLE
See oracle/truffleruby#3166 (comment)

This happens on TruffleRuby and possibly on JRuby.
On CRuby the error will not happen due to the GIL.
On CRuby it may still cause duplicate computation and assignment for the same key.

A good solution would be to use a Concurrent::Map from concurrent-ruby instead of raw Hash there.
And the same for SEQUENCE_ENCODING_TABLE just above.
(as a note, this is the correct pattern: ruby-concurrency/concurrent-ruby#970)
Then it's fast on CRuby and is thread-safe and scales on no-GVL Rubies.

Of course this would add a dependency on concurrent-ruby for this gem.

@eregon eregon changed the title Unsafe concurrent access to SEQUENCE_UPCASED_PERCENT_ENCODING_TABLE and SEQUENCE_ENCODING_TABLE Unsafe concurrent Hash access Jul 18, 2023
@eregon
Copy link
Contributor Author

eregon commented Jul 18, 2023

An alternative would be to use a Mutex around all accesses to these global Hash'es.
But that would be slower on CRuby due to the extra Mutex on read, and not scale on no-GVL Rubies.
So performance-wise it's strictly worse than concurrent-ruby for all Rubies.

@dentarg
Copy link
Collaborator

dentarg commented Jul 18, 2023

It happens for SEQUENCE_UPCASED_PERCENT_ENCODING_TABLE and SEQUENCE_ENCODING_TABLE

Because I looked it up I'll note that those were introduced with 880b34c

@dentarg
Copy link
Collaborator

dentarg commented Jul 18, 2023

A good solution would be to use a Concurrent::Map from concurrent-ruby instead of raw Hash there

Would that help even before ruby-concurrency/concurrent-ruby#970 is addressed?

@eregon
Copy link
Contributor Author

eregon commented Jul 18, 2023

Yes, the only thing I meant by that link is one should use that compute_if_absent pattern to avoid assigning a given key twice.

@dentarg
Copy link
Collaborator

dentarg commented Jul 18, 2023

Right, so the way to address this (with concurrent-ruby) is using Concurrent::Map and compute_if_absent like this?

@eregon
Copy link
Contributor Author

eregon commented Jul 19, 2023

Yes

@eregon
Copy link
Contributor Author

eregon commented Jul 22, 2023

More comments by @byroot at oracle/truffleruby#3166 (comment)

@eregon
Copy link
Contributor Author

eregon commented Jul 28, 2023

@byroot Any chance you could take a look at this? :)

It'd be good to figure out what's the desired solution for addressable.

@byroot
Copy link
Contributor

byroot commented Jul 29, 2023

I was quite busy lately. I'll try to find some time to dig into this next week.

casperisfine pushed a commit to casperisfine/addressable that referenced this issue Jul 31, 2023
Fix: sporkmonger#514
Fix: oracle/truffleruby#3166

These hashes lazily memoize percent encoded characters, this is an
issue on GVL-less Ruby implementations as it can cause concurrent
access.

But these are actually quite wastful as the key is always a single byte
so rather than use string keys are lazily memoize these, we can precompute
two static arrays of 255 elements once and for all.
casperisfine pushed a commit to casperisfine/addressable that referenced this issue Jul 31, 2023
Fix: sporkmonger#514
Fix: oracle/truffleruby#3166

These hashes lazily memoize percent encoded characters, this is an
issue on GVL-less Ruby implementations as it can cause concurrent
access.

But these are actually quite wastful as the key is always a single byte
so rather than use string keys are lazily memoize these, we can precompute
two static arrays of 255 elements once and for all.
casperisfine pushed a commit to casperisfine/addressable that referenced this issue Jul 31, 2023
Fix: sporkmonger#514
Fix: oracle/truffleruby#3166

These hashes lazily memoize percent encoded characters, this is an
issue on GVL-less Ruby implementations as it can cause concurrent
access.

But these are actually quite wastful as the key is always a single byte
so rather than use string keys are lazily memoize these, we can precompute
two static arrays of 255 elements once and for all.
casperisfine pushed a commit to casperisfine/addressable that referenced this issue Jul 31, 2023
Fix: sporkmonger#514
Fix: oracle/truffleruby#3166

These hashes lazily memoize percent encoded characters, this is an
issue on GVL-less Ruby implementations as it can cause concurrent
access.

But these are actually quite wastful as the key is always a single byte
so rather than use string keys are lazily memoize these, we can precompute
two static arrays of 255 elements once and for all.
dentarg pushed a commit that referenced this issue Jul 31, 2023
Fix: #514
Fix: oracle/truffleruby#3166

These hashes lazily memoize percent encoded characters, this is an
issue on GVL-less Ruby implementations as it can cause concurrent
access.

But these are actually quite wasteful as the key is always a single byte
so rather than use string keys are lazily memoize these, we can precompute
two static arrays of 255 elements once and for all.

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants