Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify shapes of ActiveModel::Attributes #47804

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

casperisfine
Copy link
Contributor

if defined?(@ivar) a performance anti-pattern in Ruby 3.2+ even more so with YJIT.

This pattern cause the object shape to be inconsistent which slow downs instance variable access.

ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22]
       #value (orig):  1811240.2 i/s
        #value (opt):  2045238.9 i/s - 1.13x  faster
ruby 3.2.1 (2023-02-08 revision 31819e82c8) +YJIT [arm64-darwin22]
       #value (orig):  4379180.3 i/s
        #value (opt):  6347280.7 i/s - 1.45x  faster

Benchmark: https://gist.github.com/casperisfine/872f0a486b5ccdf90d9feb830c76d9ad

Backward compatibility

There is a big backward compatibility concern here, and I'm not sure we can actually make this change safety.

Until #47747, when you serialized an ActiveRecord::Base instance, lots of ActiveModel::Attribute instances would be serialized with it. Which means this optimization would break any instance serialized with an older Rails.

Currently the PR is only enabling the optimization if you switched to the new AR Marshal format, but:

  • You can switch to that format and still deserialize old payloads, this PR would break that.
  • Active Record may not be the only way these are serialized. We may break more stuff.

I can't think of any decent way to do this optimization while still retaining perfect backward/forward compatibility...

cc @tenderlove, I wonder if you have opinions here.

@rafaelfranca
Copy link
Member

I don't remember but it is possible to define custom marshal loading to keep backwards compatibility?

activemodel/lib/active_model/attribute.rb Outdated Show resolved Hide resolved
activemodel/lib/active_model/attribute.rb Outdated Show resolved Hide resolved
activemodel/lib/active_model/attribute.rb Outdated Show resolved Hide resolved
Copy link
Member

@jonathanhefner jonathanhefner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the PR is only enabling the optimization if you switched to the new AR Marshal format, but:

  • You can switch to that format and still deserialize old payloads, this PR would break that.
  • Active Record may not be the only way these are serialized. We may break more stuff.

What about using separate boolean ivars to indicate undefined-ness? Then I think we could unify the implementations with something like:

attr_reader :has_value, :has_value_for_database

def initialize(name, value_before_type_cast, type, original_attribute = nil, value = nil)
  @name = name
  @value_before_type_cast = value_before_type_cast
  @type = type
  @original_attribute = original_attribute
  @value = value
  @has_value = !value.nil?
  @value_for_database = nil
  @has_value_for_database = false
end

def has_value?
  has_value || (has_value.nil? && defined?(@value))
  # Or possibly:
  # has_value || (@has_value = !!defined?(@value) if has_value.nil?)
end
alias :has_been_read? :has_value?

def has_value_for_database?
  has_value_for_database || (has_value_for_database.nil? && defined?(@value_for_database))
end

Since old serializations will not have the boolean ivars, we can detect them with has_value.nil? and has_value_for_database.nil?, and handle them by falling back to defined?.

@tenderlove
Copy link
Member

I don't remember but it is possible to define custom marshal loading to keep backwards compatibility?

I don't think it is. Additionally, it doesn't seem like there is a way to upgrade marshal formats. According to this comment marshal contains its version number, but AFAICT there is no way to deserialize one version then upgrade it to a newer version.

If I understand this PR correctly, we basically want to initialize @value and @value_for_database to sentinel values. I wonder if we could possibly change Ruby to call like an after_load callback on deserialized objects. For example:

class MyObj
  def initialize name
    @value    = name
    @has_value_for_database = :undef
  end

  # Called after object has been deserialized
  def marshal_loaded
    @has_value_for_database ||= :undef
  end
end

obj = MyObj.new :a
p Marshal.load Marshal.dump(obj)

What about using separate boolean ivars to indicate undefined-ness? Then I think we could unify the implementations with something like:

I think this could work. We'll still end up with more shapes than we'd like, but over time (old marshal data being replaced) they should go away. Additionally, those extra shapes would only exist in systems that deserialized old data.

@casperisfine
Copy link
Contributor Author

I don't remember but it is possible to define custom marshal loading to keep backwards compatibility?

Yes and no. Payload that were serialized form versions that didn't have a marshal_dump method, won't invoke marshal_load.

So if we want to keep compat, we need to handle instance that were serialized in 7.0 :/

@casperisfine
Copy link
Contributor Author

Since old serializations will not have the boolean ivars, we can detect them with has_value.nil? and has_value_for_database.nil?, and handle them by falling back to defined?.

Yes, except to avoid warnings in 2.7, we'll also have to check for defined?, I can try, but I fear we might lose a good part of the optimization -_-

@casperisfine
Copy link
Contributor Author

Also it's not so much forward compatibility that is my problem, but backward, as in Marshal.dump(rails_7_1_object) should be loadable from 7.0 so that you can do a rolling release of 7.1.

Ideally when you first ship 7.1 without the new defaults enabled, it should generate exactly the same payloads as 7.0 used to.

@rafaelfranca
Copy link
Member

Right, the tricky part is that some of those serialized objects can be in database, which record not being accessed or even updated for years, so in theory we can never delete the compatibility code, unless we generate some kind of helper to help people migrate data.

@casperisfine
Copy link
Contributor Author

Hum, in my mind the contract was that we do keep YAML compact for long time, but Marshal is only across one version to the next. Did I imagine this?

@rafaelfranca
Copy link
Member

oh yeah. I forgot Marshal is not used in database, so I'd not worry with backwards compatibly. If the new behavior is enabled, we can just tell people to bump the cache version.

@jonathanhefner
Copy link
Member

Yes, except to avoid warnings in 2.7, we'll also have to check for defined?

In which cases? I was thinking we could replace all occurrences of defined?(@value) with has_value?, and attr_reader :has_value would eliminate the need for defined?(@has_value). (Likewise for @value_for_database / has_value_for_database.)

Also it's not so much forward compatibility that is my problem, but backward, as in Marshal.dump(rails_7_1_object) should be loadable from 7.0 so that you can do a rolling release of 7.1.

Could that be handled by a config.* value + conditional definition of initialize?

@casperisfine
Copy link
Contributor Author

In which cases?

If you load an object that was serialized by 7.0 from 7.1, @has_value won't be defined.

attr_reader :has_value would eliminate the need for defined?(@has_value)

Hum, that's a good point worth testing.

Could that be handled by a config.* value + conditional definition of initialize?

That's already the case in this PR. I piggy back on the config introduced in #47747

@tenderlove
Copy link
Member

tenderlove commented Mar 29, 2023

One other thing that may (or may not) be important. defined?(@iv) didn't have an inline cache in Ruby 3.2, but it will in 3.3, and I think YJIT will have support (I paired on support but I don't know if it's been upstreamed yet) YJIT will also have support

Obviously we should reduce the number of shapes in the system, but maybe the performance impact isn't as great in Ruby 3.3?

@casperisfine
Copy link
Contributor Author

but maybe the performance impact isn't as great in Ruby 3.3?

Right, but it's still basically a year away, so I'd like to reap some benefits now if possible.

@casperisfine casperisfine force-pushed the ar-value-shapes branch 3 times, most recently from 07fc929 to d9713d6 Compare March 31, 2023 10:20
@casperisfine
Copy link
Contributor Author

Ok, so one solution could be to keep that code unchanged, and to use another class entirely e.g. Attribute71. So we can have both active at the same time.

But that's quite ugly, so I'm also tempted to just delay this post 7.1 release, and just remove the old format then.

@casperisfine casperisfine force-pushed the ar-value-shapes branch 4 times, most recently from 7321f14 to cec9f5b Compare October 27, 2023 09:10
`if defined?(@ivar)` a performance anti-pattern in Ruby 3.2+
even more so with YJIT.

This pattern cause the object shape to be inconsistent which slow
downs instance variable access.

```
ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22]
       #value (orig):  1811240.2 i/s
        #value (opt):  2045238.9 i/s - 1.13x  faster
```

```
ruby 3.2.1 (2023-02-08 revision 31819e82c8) +YJIT [arm64-darwin22]
       #value (orig):  4379180.3 i/s
        #value (opt):  6347280.7 i/s - 1.45x  faster
```

Benchmark: https://gist.github.com/casperisfine/872f0a486b5ccdf90d9feb830c76d9ad
The `UNDEF.equal?` overheadis significant enough that what is gained
by higher cache rate is lost doing the check.

It's because we have to both lookup the constant, and then call
`equal?` which surprisingly doesn't have an optimized opcode in the
interpreter. But it does in YJIT, hence why it's still faster with
it enabled.

So instead we can use a secondary instance variable to keep a flag.

Prior to this Attribute instances had 6 ivars, now they have 8, so
they still fit in the same size 80 slots, meaning the memory usage
is unchanged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants