Refactor the object cache to better account for race conditions #13204

Closed
wants to merge 12 commits into from

@fowles fowles commented Jul 4, 2023

Supersedes #13075

byroot and others added 3 commits July 4, 2023 10:09
Supersedes: protocolbuffers#13054

The object cache is fundamentally subject to race conditions.
Objects must be created before they are registered into the cache,
so if two threads try to create the same object, we'll inevitably
end up with two instances mapping to the same underlying memory.

Preventing that entirely would require a lot of extra locking, which
I don't think is worth it compared to a few useless allocations.

Instead, we can replace `ObjectCache_Add` with a `getset` type of operation.
The extra instance is still created, but later threads receive
the "canonical" instance and can abandon their duplicate
instance.

Additionally, this PR moves the ObjectCache implementation into Ruby,
as it's much easier to debug there and the performance difference
is negligible. The `ObjectCache` instance is also exposed as
`Google::Protobuf::OBJECT_CACHE` to make it easier to debug
potential memory issues.
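
The `getset` idea described above can be sketched in a few lines of Ruby. This is a minimal illustration, not the code from this PR: the class name `SketchCache`, the method name `try_add`, and the use of a plain `Hash` guarded by a `Mutex` are all assumptions made for the example (a real object cache would hold its values weakly so they can be collected).

```ruby
# Hypothetical sketch: SketchCache and try_add are illustrative names,
# not the API introduced by this PR.
class SketchCache
  def initialize
    @mutex = Mutex.new
    @map = {} # assumption: a real object cache would hold values weakly
  end

  # Returns the cached value for key, or nil when nothing is registered.
  def get(key)
    @mutex.synchronize { @map[key] }
  end

  # Registers value unless key is already present, and returns whichever
  # instance ends up stored, i.e. the canonical one.
  def try_add(key, value)
    @mutex.synchronize { @map.fetch(key) { @map[key] = value } }
  end
end

cache = SketchCache.new
first = Object.new
second = Object.new # duplicate wrapper created by a slower thread

winner_a = cache.try_add(42, first)
winner_b = cache.try_add(42, second)
# Both callers now hold `first`; `second` is simply abandoned.
```

The lock only covers the map update, not object creation, which matches the trade-off described above: a duplicate may be allocated, but only one instance is ever handed out.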
@fowles fowles marked this pull request as ready for review July 4, 2023 19:23
@fowles fowles requested a review from a team as a code owner July 4, 2023 19:23
@fowles fowles requested review from haberman and removed request for a team July 4, 2023 19:23
@fowles fowles added the ruby and 🅰️ safe for tests labels Jul 4, 2023
@fowles fowles requested review from mkruskal-google and removed request for haberman July 5, 2023 16:21
@JasonLunn

Should unit tests be added to verify any of the following:

  1. Selection of the cache implementation on the current platform
  2. Standalone behavior of both ObjectCache implementations outside of the side effects / asserts elsewhere in the runtime
  3. Verification of the synchronization / safety properties of both caches
  4. Verification of the garbage collection / memory consumption properties of the legacy cache
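
As a rough idea of what a check for item 3 could look like, the sketch below races several threads on one key and verifies that they all end up with the same instance. `TinyCache` is a hypothetical stand-in defined only for this example, not one of the cache implementations from this PR, and the thread count is arbitrary.

```ruby
# Hypothetical stand-in for the cache under test.
class TinyCache
  def initialize
    @mutex = Mutex.new
    @map = {}
  end

  # Stores value only if key is absent; always returns the stored instance.
  def try_add(key, value)
    @mutex.synchronize { @map.fetch(key) { @map[key] = value } }
  end
end

cache = TinyCache.new

# Each thread builds its own candidate object and tries to register it
# under the same key; Thread#value joins and returns the block's result.
threads = Array.new(8) do
  Thread.new { cache.try_add(:shared_key, Object.new) }
end
results = threads.map(&:value)

unique_ids = results.map(&:object_id).uniq
```

If the operation is atomic, `unique_ids` has exactly one element regardless of how the threads interleave.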

@fowles fowles requested a review from a team as a code owner July 7, 2023 22:35
@fowles fowles dismissed JasonLunn’s stale review July 8, 2023 02:14

responded to comments already

@fowles fowles added the platform related and 🅰️ safe for tests labels Jul 8, 2023
@copybara-service copybara-service bot closed this in a01d047 Jul 8, 2023
stanhu added a commit to stanhu/protobuf that referenced this pull request Aug 9, 2023
protocolbuffers#13204 refactored the
Ruby object cache to use a key of `LL2NUM(key_val)` instead of
`LL2NUM(key_val >> 2)`. On 32-bit systems, it appears that
`LL2NUM(key_val)` returns inconsistent results, possibly due to
overflow. This causes cache lookups to fail. This commit restores the
previous behavior of using `ObjectCache_GetKey`, which discards the
lower 2 bits, which are zero.

Closes protocolbuffers#13481
stanhu added a commit to stanhu/protobuf that referenced this pull request Aug 9, 2023
protocolbuffers#13204 refactored the
Ruby object cache to use a key of `LL2NUM(key_val)` instead of
`LL2NUM(key_val >> 2)`. On 32-bit systems, `LL2NUM(key_val)` returns
inconsistent results because a large value has to be stored as a
Bignum on the heap. This causes cache lookups to fail.

This commit restores the previous behavior of using
`ObjectCache_GetKey`, which discards the lower 2 bits, which are zero.
This enables a key to be stored as a Fixnum on both 32 and 64-bit
platforms.

Closes protocolbuffers#13481
stanhu added a commit to stanhu/protobuf that referenced this pull request Aug 11, 2023
protocolbuffers#13204 refactored the
Ruby object cache to use a key of `LL2NUM(key_val)` instead of
`LL2NUM(key_val >> 2)`. On 32-bit systems, `LL2NUM(key_val)` returns
inconsistent results because a large value has to be stored as a
Bignum on the heap. This causes cache lookups to fail.

This commit restores the previous behavior of using
`ObjectCache_GetKey`, which discards the lower 2 bits, which are
zero. This enables a key to be stored as a Fixnum on both 32 and
64-bit platforms.

As https://patshaughnessy.net/2014/1/9/how-big-is-a-bignum describes,
a Fixnum uses:

* 1 bit for the `FIXNUM_FLAG`.
* 1 bit for the sign bit.

Therefore the largest possible Fixnum value on a 64-bit system is
4611686018427387903 (2^62 - 1). On a 32-bit system, the largest value
is 1073741823 (2^30 - 1).

For example, a possible VALUE pointer address on a 32-bit system:

0xff5b4af8 => 4284173048

Dropping the lower 2 bits makes up for the loss of range to these
flags. In the example above, we see that shifting by 2 turns the value
into a 30-bit number, which can be represented as a Fixnum:

(0xff5b4af8 >> 2) => 1071043262

This bug can also manifest on a 64-bit system if the upper bits are 0xff.

Closes protocolbuffers#13481
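
The arithmetic in the commit message above can be checked with a short Ruby sketch. The helpers `fixnum_max` and `fits_in_fixnum?` are hypothetical names introduced here; they model a Fixnum as a machine word minus the two bits listed above. The second half assumes the cache compares keys by object identity, which the commit message implies but does not state, and uses `2**64` only because it is a heap-allocated Bignum on both 32-bit and 64-bit builds.

```ruby
# Assumed model: a word of word_bits bits leaves word_bits - 2 bits of
# magnitude for a Fixnum (one bit for the flag, one for the sign).
def fixnum_max(word_bits)
  (1 << (word_bits - 2)) - 1
end

def fits_in_fixnum?(value, word_bits)
  value.between?(-fixnum_max(word_bits) - 1, fixnum_max(word_bits))
end

address = 0xff5b4af8 # example 32-bit address from the commit message

raw_fits     = fits_in_fixnum?(address, 32)      # unshifted key
shifted_fits = fits_in_fixnum?(address >> 2, 32) # key with low 2 bits dropped

# Assumption: keys are compared by identity. A Fixnum is an immediate
# value, so equal keys are the same object; a Bignum is allocated on
# each computation, so an equal key is a different object on CRuby.
identity_map = {}.compare_by_identity
identity_map[address >> 2] = :cached
identity_map[2**64] = :cached

fixnum_hit = identity_map[address >> 2] # found again
bignum_hit = identity_map[2**64]        # expected to miss on CRuby
```

Under this model the unshifted address exceeds the 32-bit Fixnum range while the shifted one fits, which is why restoring the shift makes lookups reliable again.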
copybara-service bot pushed a commit that referenced this pull request Aug 15, 2023
#13204 refactored the Ruby object cache to use a key of `LL2NUM(key_val)` instead of `LL2NUM(key_val >> 2)`. On 32-bit systems, `LL2NUM(key_val)` returns inconsistent results because a large value has to be stored as a Bignum on the heap. This causes cache lookups to fail.

This commit restores the previous behavior of using `ObjectCache_GetKey`, which discards the lower 2 bits, which are zero. This enables a key to be stored as a Fixnum on both 32 and 64-bit platforms.

As https://patshaughnessy.net/2014/1/9/how-big-is-a-bignum describes, a Fixnum uses:

* 1 bit for the `FIXNUM_FLAG`.
* 1 bit for the sign flag.

Therefore the largest possible Fixnum value on a 64-bit system is 4611686018427387903 (2^62 - 1). On a 32-bit system, the largest value is 1073741823 (2^30 - 1).

For example, a possible VALUE pointer address on a 32-bit system:

0xff5b4af8 => 4284173048

Dropping the lower 2 bits makes up for the loss of range to these flags. In the example above, we see that shifting by 2 turns the value into a 30-bit number, which can be represented as a Fixnum:

(0xff5b4af8 >> 2) => 1071043262

This bug can also manifest on a 64-bit system if the upper bits are 0xff.

Closes #13481

Closes #13494

COPYBARA_INTEGRATE_REVIEW=#13494 from stanhu:sh-fix-ruby-protobuf-32bit d63122a
FUTURE_COPYBARA_INTEGRATE_REVIEW=#13494 from stanhu:sh-fix-ruby-protobuf-32bit d63122a
PiperOrigin-RevId: 557189479
copybara-service bot pushed a commit that referenced this pull request Aug 15, 2023
COPYBARA_INTEGRATE_REVIEW=#13494 from stanhu:sh-fix-ruby-protobuf-32bit d63122a
FUTURE_COPYBARA_INTEGRATE_REVIEW=#13494 from stanhu:sh-fix-ruby-protobuf-32bit d63122a
PiperOrigin-RevId: 557216800
copybara-service bot pushed a commit that referenced this pull request Aug 15, 2023
COPYBARA_INTEGRATE_REVIEW=#13494 from stanhu:sh-fix-ruby-protobuf-32bit d63122a
PiperOrigin-RevId: 557211768
zhangskz pushed a commit that referenced this pull request Aug 17, 2023
COPYBARA_INTEGRATE_REVIEW=#13494 from stanhu:sh-fix-ruby-protobuf-32bit d63122a
PiperOrigin-RevId: 557211768
esrauchg pushed a commit that referenced this pull request Aug 17, 2023
COPYBARA_INTEGRATE_REVIEW=#13494 from stanhu:sh-fix-ruby-protobuf-32bit d63122a
PiperOrigin-RevId: 557211768
zhangskz pushed a commit that referenced this pull request Aug 17, 2023
COPYBARA_INTEGRATE_REVIEW=#13494 from stanhu:sh-fix-ruby-protobuf-32bit d63122a
PiperOrigin-RevId: 557211768

Co-authored-by: Stan Hu <stanhu@gmail.com>
Labels: platform related, ruby
4 participants