Finer-grained constant invalidation #5000

Closed
wants to merge 1 commit into from

kddnewton
Contributor

Today, when you reference a static constant anywhere in your code, YARV emits two instructions: opt_getinlinecache and opt_setinlinecache.
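
One way to see these instructions is to disassemble a bare constant reference. This is a minimal sketch that assumes CRuby (`RubyVM` is CRuby-specific); the helper name `constant_instructions` is made up for illustration. Instruction names depend on the Ruby version, so releases other than the ones this PR targets may print different constant-related instructions than the pair described here.

```ruby
# Disassemble a snippet and list the instruction names that mention
# constants or inline caches. Returns [] on Rubies without RubyVM.
def constant_instructions(source)
  return [] unless defined?(RubyVM::InstructionSequence)

  disasm = RubyVM::InstructionSequence.compile(source).disasm
  # Instruction lines start with a four-digit offset followed by the name.
  names = disasm.scan(/^\d{4}\s+(\w+)/).flatten
  names.select { |name| name =~ /const|inlinecache/ }
end

p constant_instructions("String")
```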

opt_getinlinecache stores a cached number which refers to the global constant state (ruby_vm_global_constant_state) that was set when the constant was last looked up. If the cached number matches the current global constant state, then it jumps to its target instruction. If it doesn't match, then it carries on to the next instruction which will look up the constant and put it on the stack.

opt_setinlinecache gets the value of the constant off the stack, looks up the current global constant state, and stores both of those values in the associated opt_getinlinecache instruction's cache entry.

Effectively, that means any time ruby_vm_global_constant_state is incremented, every constant cache is busted and must be looked up again. In the case where you have a system that dynamically defines constants, this means you can pay a significant penalty on constant lookup relatively frequently. This also has implications for JITs, because it forces them to discard generated code that specialized on the constant value in the cache.
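
The cost of a single global counter can be sketched with a toy model in plain Ruby. This is not CRuby's implementation; `GlobalState` and `InlineCache` are hypothetical names, and the block passed to `fetch` stands in for the slow constant lookup.

```ruby
# Toy model of one global serial shared by every inline cache.
class GlobalState
  attr_reader :serial

  def initialize
    @serial = 1
  end

  # Called for any constant-affecting change (assignment, removal, ...).
  def bump
    @serial += 1
  end
end

class InlineCache
  attr_reader :misses

  def initialize(global)
    @global = global
    @serial = nil # global serial observed when the value was cached
    @value = nil
    @misses = 0
  end

  # Hit path is roughly opt_getinlinecache; the miss path is roughly
  # the lookup followed by opt_setinlinecache.
  def fetch
    return @value if @serial == @global.serial

    @misses += 1
    @value = yield
    @serial = @global.serial
    @value
  end
end

global = GlobalState.new
foo = InlineCache.new(global)
bar = InlineCache.new(global)

foo.fetch { :foo_value }
bar.fetch { :bar_value }
global.bump # some unrelated constant changes
foo.fetch { :foo_value }
bar.fetch { :bar_value }

p [foo.misses, bar.misses] # => [2, 2]
```

Both caches miss a second time even though neither constant changed, which is the behaviour this PR sets out to avoid.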

The times when ruby_vm_global_constant_state is incremented include:

  • A constant is assigned
  • A constant is removed
  • A constant's visibility changes
  • A module is included
  • A constant is autoloaded

You can inspect ruby_vm_global_constant_state by calling RubyVM.stat(:global_constant_state) which returns the value of the current constant state.
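
A hedged sketch of that inspection follows. It assumes CRuby and guards against Rubies whose `RubyVM.stat` hash has no `:global_constant_state` key, in which case it reports that instead of raising. The helper name and `EXAMPLE_DYNAMIC_CONSTANT` are made up for illustration.

```ruby
# Read the global constant state if this Ruby reports it, else nil.
def global_constant_state
  return nil unless defined?(RubyVM) && RubyVM.respond_to?(:stat)

  RubyVM.stat[:global_constant_state] # nil when the key is absent
end

before = global_constant_state
Object.const_set(:EXAMPLE_DYNAMIC_CONSTANT, 1) # dynamically assign a constant
after = global_constant_state

if before && after
  puts "global_constant_state: #{before} -> #{after}"
else
  puts "global_constant_state is not reported by this Ruby"
end
```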

In this commit, we've added an ID table constant cache to the VM struct, which functions as a map between IDs and an rb_serial_t. Effectively it's storing a map of constant name to its own cache state. This means we can change opt_getinlinecache and opt_setinlinecache in the following ways.

opt_getinlinecache now stores both a constant state and a pointer to an entry in the VM's constant cache. When this instruction is executed it checks if a cache entry exists for the given ID and that the state matches the one stored in the cache. Effectively this means that every ID now has its own global cache state.

opt_setinlinecache now accepts an additional operand that is the ID that is being checked. It does this so that it can increment the number in the cache corresponding to the given ID.

With this model, whenever a constant changes in the ways mentioned above it only clears the cache for its specific name (as opposed to for every constant globally). This avoids a lot of invalidation. The only caveat is that when a module is included, it must invalidate all of its constants as it means that constant lookup will change for the object that is including the module.
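
The per-name model can be sketched in plain Ruby as well. This is a toy illustration under stated assumptions, not the C code in the commit: `NameStates` and `NamedInlineCache` are hypothetical names, a Ruby Hash stands in for the ID table, and `bump_all` models the module-inclusion caveat by invalidating every name the module defines.

```ruby
# Toy model: one serial per constant name instead of one global serial.
class NameStates
  def initialize
    @serials = Hash.new(1) # constant name => serial
  end

  def serial(name)
    @serials[name]
  end

  # Invalidate only the caches keyed on this name.
  def bump(name)
    @serials[name] += 1
  end

  # Module inclusion: invalidate each constant name the module defines.
  def bump_all(names)
    names.each { |name| bump(name) }
  end
end

class NamedInlineCache
  attr_reader :misses

  def initialize(states, name)
    @states = states
    @name = name
    @serial = nil # per-name serial observed when the value was cached
    @value = nil
    @misses = 0
  end

  def fetch
    current = @states.serial(@name)
    return @value if @serial == current

    @misses += 1
    @value = yield # stands in for the slow constant lookup
    @serial = current
    @value
  end
end

states = NameStates.new
foo = NamedInlineCache.new(states, :Foo)
bar = NamedInlineCache.new(states, :Bar)

foo.fetch { :foo_value }
bar.fetch { :bar_value }
states.bump(:Bar) # only Bar changes
foo.fetch { :foo_value } # still a hit
bar.fetch { :bar_value } # miss, looked up again

p [foo.misses, bar.misses] # => [1, 2]
```

Compared with a single global serial, the cache for `Foo` survives a change to `Bar`.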

Because global_constant_state is no longer a thing, the commit changes RubyVM.stat to return a hash representing the VM's global constant cache. On start, it returns something that looks like:

{
  :IO=>1,
  :READABLE=>1,
  :WRITABLE=>1,
  :PRIORITY=>1,
  :WaitReadable=>1,
  :WaitWritable=>1,
  :EAGAINWaitReadable=>1,
  :EAGAINWaitWritable=>1,
  :EWOULDBLOCKWaitReadable=>1,
  :EWOULDBLOCKWaitWritable=>1,
  :EINPROGRESSWaitReadable=>1,
  :EINPROGRESSWaitWritable=>1,
  :SEEK_SET=>1,
  :SEEK_CUR=>1,
  :SEEK_END=>1,
  :SEEK_DATA=>1,
  :SEEK_HOLE=>1,
  :STDIN=>1,
  :STDOUT=>1,
  :STDERR=>1,
  :ARGF=>1,
  ...
}

@k0kubun
Member

k0kubun commented Oct 21, 2021

Do you have benchmark results (micro and non-micro)? It would encourage us to merge your change.

@kddnewton
Contributor Author

Sorry @k0kubun I actually meant to open this on my fork to get all the tests passing first, ignore this for now!

@kddnewton kddnewton closed this Oct 21, 2021
@kddnewton kddnewton deleted the constant-invalidation branch October 21, 2021 16:54
@kddnewton kddnewton restored the constant-invalidation branch October 21, 2021 16:55
@kddnewton kddnewton deleted the constant-invalidation branch December 2, 2021 20:44