Convert AttributeMethodMatcher to Module Builder #30895

Closed


shioyama
Contributor

@shioyama shioyama commented Oct 15, 2017

This is quite a large and substantial refactor of two core modules, ActiveModel::AttributeMethods and ActiveRecord::AttributeMethods. The inspiration for this PR is this blog post on the module builder pattern which I wrote a few months ago (the end of the post discusses a change to AM::AttributeMethods which is almost the same as the one here). I had intended only to change ActiveModel, but the two modules are so heavily coupled that it was impossible to change one without the other.

I have tried to make the absolute minimal changes to achieve the goal of de-coupling described below, and wherever possible I have moved methods without actually altering their logic.

There are still some points to discuss and more tests to write, so this is not a finished PR; for now I'd just like to get some feedback on the general idea and current implementation.

Summary

ActiveModel and ActiveRecord provide a simple mechanism for defining prefixed/suffixed/affixed attribute methods, whereby you call a class method like attribute_method_prefix (etc) with a prefix (or suffix/affix), then call define_attribute_methods to actually define the attribute methods.

Under the hood, calling one of these class methods creates an instance of a class AttributeMethodMatcher, which stores the prefix and/or suffix, and appends the instance to a class attribute attribute_method_matchers. (Aliases are handled similarly.)

https://github.com/shioyama/rails/blob/5668dc6b1863ef43be8f8ef0fb1d5db913085fb3/activemodel/lib/active_model/attribute_methods.rb#L387-L412
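To make the matcher mechanism concrete, here is a rough, Rails-free sketch of what a prefix/suffix matcher does. SimpleAttributeMethodMatcher and AttributeMatch are hypothetical names, not the Rails classes; the regex has roughly the same shape as the inspect output shown later in this PR (using \A and \z instead of ^ and $).

```ruby
# Hypothetical stand-in for a prefix/suffix matcher; not the Rails class.
AttributeMatch = Struct.new(:target, :attr_name)

class SimpleAttributeMethodMatcher
  attr_reader :prefix, :suffix, :target

  def initialize(prefix: "", suffix: "")
    @prefix = prefix
    @suffix = suffix
    # Same general shape as the matcher regexes shown in the ancestors list.
    @regex = /\A(?:#{Regexp.escape(prefix)})(.*)(?:#{Regexp.escape(suffix)})\z/
    # e.g. prefix "reset_", suffix "_to_default!" => "reset_attribute_to_default!"
    @target = "#{prefix}attribute#{suffix}"
  end

  # Returns an AttributeMatch if method_name fits this prefix/suffix, else nil.
  def match(method_name)
    m = @regex.match(method_name.to_s)
    m && AttributeMatch.new(@target, m[1])
  end

  # Builds the concrete attribute method name for a given attribute.
  def method_name(attr_name)
    "#{prefix}#{attr_name}#{suffix}"
  end
end

# Stand-in for the class-level attribute_method_matchers collection.
matchers = [
  SimpleAttributeMethodMatcher.new(prefix: "reset_", suffix: "_to_default!"),
  SimpleAttributeMethodMatcher.new(suffix: "_changed?")
]
```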

When you actually call an attribute method (i.e. a method with a prefix/suffix), AR/AM dispatches to a handler using one of two mechanisms:

  • AM::AttributeMethods overrides method_missing to dispatch a call to (say) reset_foo_to_default! to reset_attribute_to_default!("foo"), provided foo is a key in the hash returned by the attributes method. (This mechanism is actually not very well documented.)
  • In addition, by calling define_attribute_methods, you can convert these prefixes/suffixes/affixes into actual methods, which are defined on instances of an anonymous module (in ActiveModel) or instances of a named module GeneratedAttributeMethods (in ActiveRecord).
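A minimal sketch of the first mechanism (the method_missing fallback) in plain Ruby, assuming a single hard-coded prefix/suffix pair; Person and its pattern are illustrative only, and the real AM implementation is more general.

```ruby
# Hypothetical, Rails-free sketch of method_missing dispatch:
# reset_foo_to_default! -> reset_attribute_to_default!("foo"),
# but only when "foo" is a key of #attributes.
class Person
  PREFIX  = "reset_"
  SUFFIX  = "_to_default!"
  PATTERN = /\A#{Regexp.escape(PREFIX)}(.*)#{Regexp.escape(SUFFIX)}\z/

  attr_reader :attributes

  def initialize
    @attributes = { "foo" => "changed" }
  end

  # The handler that all matching calls are routed to.
  def reset_attribute_to_default!(attr)
    attributes[attr] = "default"
  end

  def method_missing(name, *args, &block)
    m = PATTERN.match(name.to_s)
    if m && attributes.key?(m[1])
      send("reset_attribute_to_default!", m[1], *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    m = PATTERN.match(name.to_s)
    (m && attributes.key?(m[1])) || super
  end
end

person = Person.new
person.reset_foo_to_default!
```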

While it works, there are a few problems with the current implementation, the major one being that the components (attribute method matchers on the class, instance methods on generated attribute methods, etc.) are heavily coupled to each other, making it hard to read the code and even harder to extend it. This is even more clear when you look at how heavily coupled AR::AttributeMethods is to the internals of AM::AttributeMethods.

I recently wrote a blog post where (toward the end of the post) I discuss the problems with this approach. This PR is an attempt to implement what I describe in the post.

The basic concept here is to convert the matcher class ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher into a subclass of Module, which I've promoted in namespace to ActiveModel::AttributeMethodMatcher. Since the matcher already holds the prefix and/or suffix, it can define the method_missing/respond_to? overrides, as well as the attribute methods (and aliases) themselves, on itself, and then be included into the class. Coupling is vastly reduced since everything is encapsulated in one place.
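A sketch of the module-builder idea in plain Ruby, assuming a single suffix matcher and omitting respond_to? for brevity. MatcherModule and Article are hypothetical names; this illustrates the pattern, not the code in this PR.

```ruby
# Hypothetical module builder: each matcher *is* a Module that carries its
# own prefix/suffix and defines both the method_missing fallback and the
# concrete attribute methods on itself.
class MatcherModule < Module
  attr_reader :prefix, :suffix

  def initialize(prefix: "", suffix: "")
    super()
    @prefix = prefix
    @suffix = suffix
    target = "#{prefix}attribute#{suffix}"
    regex  = /\A#{Regexp.escape(prefix)}(.*)#{Regexp.escape(suffix)}\z/
    @regex = regex

    # The fallback lives on this module, not on the model class.
    define_method(:method_missing) do |name, *args, &block|
      m = regex.match(name.to_s)
      if m && attributes.key?(m[1])
        send(target, m[1], *args, &block)
      else
        super(name, *args, &block)
      end
    end
  end

  # Defines real methods for each attribute name on this module only.
  def define_attribute_methods(attr_names)
    target = "#{prefix}attribute#{suffix}"
    attr_names.each do |attr|
      define_method("#{prefix}#{attr}#{suffix}") { |*args| send(target, attr, *args) }
    end
  end

  # Shows the matcher's regex in Article.ancestors.
  def inspect
    "<#{self.class.name}: #{@regex.inspect}>"
  end
end

class Article
  CHANGED = MatcherModule.new(suffix: "_changed?")
  include CHANGED

  attr_reader :attributes

  def initialize
    @attributes = { "title" => "x" }
    @changed = ["title"]
  end

  def attribute_changed?(attr)
    @changed.include?(attr)
  end
end
```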

The first commit in this PR does this in AM, which is quite a simple change and passes tests easily. The hard part is to make this work in ActiveRecord, since AR currently overrides private methods like generated_attribute_methods.

However, what is pleasantly surprising is that making a similar change in AR also yields cleaner code. Whereas currently AR::AttributeMethods has to override many methods in AM::AttributeMethods, here the customization happens instead by subclassing ActiveModel::AttributeMethodMatcher (the class which generates the matcher modules). By doing this, a number of the private methods added to the model class are instead moved to the matcher module, avoiding namespace pollution and keeping the implementation more encapsulated.

Concretely, the following AR methods are moved out of the model class:

  • generated_attribute_methods
  • define_method_attribute
  • define_proxy_call
  • attribute_method_matchers_cache
  • attribute_method_matchers_matching

In addition, initialize_generated_modules is no longer used for attribute methods (it is still required for generated association methods), and the methods following the define_method_#{matcher.method_missing_target} pattern are all moved to the matcher, so this method namespace is now free in the model class.

The biggest change is when you look at the ancestors of an AR class. Since the modules are now each associated with their prefix/suffix, and I've overridden inspect to show you the corresponding regex, you can actually see all the matchers applied to the class.

So instead of this:

Topic.ancestors
=> [Topic(...),
 Topic::GeneratedAssociationMethods,
 #<ActiveRecord::AttributeMethods::GeneratedAttributeMethods:0x0055c121def890>,
 ActiveRecord::Base,
...

where the cryptic ActiveRecord::AttributeMethods::GeneratedAttributeMethods instance holds the generated methods and method_missing and respond_to? are defined on the class itself, you get this:

=> [Topic(...),           
 Topic::GeneratedAssociationMethods,           
 <ActiveRecord::AttributeMethodMatcher: /^(?:)(.*)(?:_in_database)$/>,                        
 <ActiveRecord::AttributeMethodMatcher: /^(?:)(.*)(?:_change_to_be_saved)$/>,                 
 <ActiveRecord::AttributeMethodMatcher: /^(?:will_save_change_to_)(.*)(?:\?)$/>,              
 <ActiveRecord::AttributeMethodMatcher: /^(?:)(.*)(?:_before_last_save)$/>,                   
 <ActiveRecord::AttributeMethodMatcher: /^(?:saved_change_to_)(.*)(?:)$/>,
 <ActiveRecord::AttributeMethodMatcher: /^(?:saved_change_to_)(.*)(?:\?)$/>,
 <ActiveRecord::AttributeMethodMatcher: /^(?:restore_)(.*)(?:!)$/>,
 <ActiveRecord::AttributeMethodMatcher: /^(?:)(.*)(?:_previous_change)$/>,
 <ActiveRecord::AttributeMethodMatcher: /^(?:)(.*)(?:_previously_changed\?)$/>,
 <ActiveRecord::AttributeMethodMatcher: /^(?:)(.*)(?:_was)$/>,
 <ActiveRecord::AttributeMethodMatcher: /^(?:)(.*)(?:_will_change!)$/>,
 <ActiveRecord::AttributeMethodMatcher: /^(?:)(.*)(?:_change)$/>,
 <ActiveRecord::AttributeMethodMatcher: /^(?:)(.*)(?:_changed\?)$/>,
 <ActiveRecord::AttributeMethodMatcher: /^(?:)(.*)(?:\?)$/>,
 <ActiveRecord::AttributeMethodMatcher: /^(?:)(.*)(?:_came_from_user\?)$/>,
 <ActiveRecord::AttributeMethodMatcher: /^(?:)(.*)(?:_before_type_cast)$/>,
 <ActiveRecord::AttributeMethodMatcher: /^(?:)(.*)(?:=)$/>,
 <ActiveRecord::AttributeMethodMatcher: /^(?:)(.*)(?:)$/>,
 ActiveRecord::Base,
...

So you can clearly see all the matchers in the ancestor chain. When you call define_attribute_methods on the class, this actually delegates to calling the same method on each of these modules, which then defines the attribute methods on the module. So rather than having one big collection of attribute methods, you have methods each paired to the respective matcher that defines them.

e.g.:

Topic.ancestors[3]
=> <ActiveRecord::AttributeMethodMatcher: /^(?:)(.*)(?:_change_to_be_saved)$/>
Topic.ancestors[3].instance_methods
=> [:method_missing,
 :respond_to?,
 :heading_change_to_be_saved,
 :id_change_to_be_saved,
 :title_change_to_be_saved,
 :author_name_change_to_be_saved,
 :author_email_address_change_to_be_saved,
 :written_on_change_to_be_saved,
 :bonus_time_change_to_be_saved,
 :last_read_change_to_be_saved,
 :content_change_to_be_saved,
 :important_change_to_be_saved,
 :approved_change_to_be_saved,
 :replies_count_change_to_be_saved,
 :unique_replies_count_change_to_be_saved,
 :parent_id_change_to_be_saved,
 :parent_title_change_to_be_saved,
 :type_change_to_be_saved,
 :group_change_to_be_saved,
 :created_at_change_to_be_saved,
 :updated_at_change_to_be_saved]

Here you can see that methods are defined for the _change_to_be_saved suffix; the other modules are similar.
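The class-level delegation described above could look roughly like this. It is a hypothetical sketch: SuffixMatcher and DelegatingDefinition are invented names, and only suffix matchers are modeled.

```ruby
# Hypothetical suffix-only matcher module.
class SuffixMatcher < Module
  attr_reader :suffix

  def initialize(suffix)
    super()
    @suffix = suffix
  end

  # Each module defines only the methods for its own suffix.
  def define_attribute_methods(attr_names)
    target = "attribute#{suffix}"
    attr_names.each do |attr|
      define_method("#{attr}#{suffix}") { send(target, attr) }
    end
  end

  def inspect
    "<SuffixMatcher: #{suffix}>"
  end
end

module DelegatingDefinition
  # Model-level entry point: delegate to every matcher module in the ancestry.
  def define_attribute_methods
    ancestors.grep(SuffixMatcher).each do |mod|
      mod.define_attribute_methods(attribute_names)
    end
  end
end

class Topic
  extend DelegatingDefinition
  include SuffixMatcher.new("_was")
  include SuffixMatcher.new("_before_type_cast")

  def self.attribute_names
    %w[title heading]
  end

  def attribute_was(attr)
    "old #{attr}"
  end

  def attribute_before_type_cast(attr)
    "raw #{attr}"
  end
end

Topic.define_attribute_methods
```

Each generated method ends up on the module for its own suffix, which is what Topic.ancestors[n].instance_methods shows above.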

Other Information

Another thing that I noticed while working on this change is that currently, each AR::Base subclass includes its own @generated_attribute_methods instance, on which it defines its attribute methods. This means that if I have a class SuperPost which inherits from Post, then SuperPost actually has three such modules in its ancestry: one defined in ActiveRecord::Base (which has no attribute methods defined on it), one in Post which has attribute methods defined on it, and another one in SuperPost which has (generally the same, or a subset of) attribute methods defined on it. There is no need to have these duplicate modules AFAICT, but the current code doesn't work without doing it this way.

With the change here, although there are many more modules (one per prefix/suffix), they are only included once in the first subclass of ActiveRecord::Base. This avoids defining a potentially large number of methods multiple times if you have a deep inheritance tree.

One point that I feel is still problematic is the method instance_method_already_implemented?. Currently the AR::AttributeMethodMatcher class just calls through to the model class method, which is simple but creates the coupling I was hoping to avoid. For now I've left it as is, since I want to avoid making any more changes before discussion, but this is one point I'm still not really happy with.

Thanks for reading! I'm happy and ready to explain any and all changes here.

@rails-bot

Thanks for the pull request, and welcome! The Rails team is excited to review your changes, and you should hear from @eileencodes (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

This repository is being automatically checked for code quality issues using Code Climate. You can see results for this analysis in the PR status below. Newly introduced issues should be fixed before a Pull Request is considered ready to review.

Please see the contribution instructions for more information.

@@ -38,6 +38,7 @@ def self.columns_hash
end

def test_define_attribute_methods
skip "implementation changed, need to update test"
Contributor Author


I am skipping the two tests in this file because they are no longer relevant given the changes made here. I intend to replace them but decided to first get the entire change looked over once before continuing.

Member


Let's not skip these tests just to get this green. Otherwise someone might accidentally merge and then we're not testing correctly. A failing build is a good indication that this PR is not ready.

Member


You don't need to fix them right now, just don't skip them either please.

@shioyama shioyama force-pushed the attribute_methods_module_builder branch from e5ce883 to 7342c88 on October 15, 2017 06:01
@matthewd
Member

although there are many more modules (one per prefix/suffix)

I think you're substantially underestimating the outcry that would follow such an apparent "bloating" of the inheritance chain: the number of modules we include is already pretty controversial, and they're currently carrying a lot more functionality per include.

A little further from straight "people would complain", I'm legitimately concerned about supering through so many method_missing layers.

Perhaps it's worth comparing a less textbook-pure version, where we still use the module instance to contain things, but handle all the matchers in a single instance?


As for only needing one instance per inheritance chain, it sounds like we might need more tests around behaviour when define_attribute_method is called in an AR::Base grandchild subclass (whereby said attribute should not appear on the class's siblings).
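A plain-Ruby illustration of the sibling-leak concern, under the assumption that only the first subclass owns a generated-methods module; all class names here are hypothetical.

```ruby
# Hypothetical illustration: a single shared generated module, owned by the
# first subclass, versus a per-class module.
class Base; end

class Post < Base
  GENERATED = Module.new
  include GENERATED
end

class SpecialPost < Post; end
class OtherPost < Post; end

# SpecialPost wants an extra attribute method, but the only module available
# is the one shared with Post and OtherPost, so the definition leaks.
Post::GENERATED.send(:define_method, :subtitle) { "special" }

# A per-class module keeps the definition local to that subclass.
class IsolatedPost < Post
  OWN = Module.new
  include OWN
end
IsolatedPost::OWN.send(:define_method, :summary) { "isolated" }
```

This is the kind of case the extra tests would need to cover: a subclass must still be able to get its own module when it declares its own attributes or matchers.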

@shioyama
Contributor Author

shioyama commented Oct 15, 2017

I think you're substantially underestimating the outcry that would follow such an apparent "bloating" of the inheritance chain: the number of modules we include is already pretty controversial, and they're currently carrying a lot more functionality per include.

That may be true, but personally I don't see the issue with inheritance "bloat" if there are no performance or code complexity implications. In this case I don't see those implications (I don't think the supering through method_missing would have performance implications, but maybe I'm not seeing something).

But another point is that with this change, ActiveRecord::AttributeMethods::Read and ActiveRecord::AttributeMethods::Write are so small they could be merged into Core or another module, and similarly I think there are other modules that might be simplified or removed. Of course that will not add up to all the prefixes/suffixes, but the issue to me is not simply the number of modules but the complexity of reasoning about what those modules are doing. In this sense, yes, the inheritance chain is certainly longer, but the modules added here are actually much easier to understand conceptually.

A little further from straight "people would complain", I'm legitimately concerned about supering through so many method_missing layers.

Yes, this is a legitimate concern, but I don't see any performance implications at least. I'm making this PR to identify if there are show-stopper issues with this idea, though, so if you have a concrete example of where this would cause problems I'd like to discuss it.


The direction that I'm pushing here is (for one) separating AR from AM more cleanly, as well as making the attribute methods code more modular. I personally think this outweighs any concerns about a longer inheritance chain, but that's my personal view.

@shioyama
Contributor Author

I think you're substantially underestimating the outcry that would follow such an apparent "bloating" of the inheritance chain

The other point to be made about the inheritance chain is that although having an extra dozen or more modules may trigger outcry among some, for others (like myself), being able to actually see all the different patterns of methods defined on the model class is really illuminating. When I first made the change here and looked at the ancestor chain it was like a light came on, showing very clearly what all these different modules were actually doing to the model.

Of course, you can see those same matchers by looking at Post.attribute_method_matchers and checking the prefixes and suffixes of each. But it's not obvious to look there without digging into the code. By putting these matchers explicitly into the ancestors chain, you give them the same status as other modules, which (I think) they deserve given the fact that they are defining the attribute methods that are one of the most visible parts of a model (AR or AM) class.

@eileencodes eileencodes assigned matthewd and unassigned eileencodes Oct 15, 2017
@shioyama
Contributor Author

@eileencodes Ok removed those skips.

@shioyama
Contributor Author

Perhaps it's worth comparing a less textbook-pure version, where we still use the module instance to contain things, but handle all the matchers in a single instance?

One thing that might make sense to investigate would be to reduce the number of modules by grouping them logically, e.g. putting the dirty-tracking matchers together. But the problem is that the matcher class would then no longer neatly wrap a single method_missing target, method name, etc.; instead you'd have more complexity there again.

I'm eager to hear other suggestions, but I'd also like to know concretely what the problem is with having more modules in the inheritance chain. The one thing I notice is that a NoMethodError produces a stack trace that looks like infinite recursion (because the call goes through the same code many times with different matcher regexes). This could be confusing, and so far it is the one concrete issue I see with this approach, but it seems to me a cosmetic one.
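The repeated frames are easy to reproduce with a hypothetical chain of 18 modules (matching the 18 matchers listed earlier), each defining method_missing and calling super on a miss. ChainedModel and its visits log are illustrative; the log only exists to count the layers a call passes through.

```ruby
# Hypothetical chain of method_missing layers, one per matcher.
class ChainedModel
  def self.visits
    @visits ||= []
  end
end

18.times do |i|
  suffix = "_s#{i}"
  mod = Module.new do
    define_method(:method_missing) do |name, *args, &block|
      ChainedModel.visits << suffix # record that this layer was consulted
      if name.to_s.end_with?(suffix)
        "handled by #{suffix}"
      else
        super(name, *args, &block) # fall through to the next matcher
      end
    end
  end
  ChainedModel.include(mod)
end
```

The last-included module is consulted first, so a name handled by the earliest-included matcher, or an unknown name, walks through all 18 layers before it is handled or NoMethodError is raised.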

@soulcutter

I think this is on the right track. I don’t think the number of modules included is quite as controversial as the kitchen sink of methods defined on a Model, which this actually cleans up slightly. This probably does expose the weight of method definitions in a different way than before, but that was always there.

Are there benchmarks around this? I suspect perf is similar, but it can be hard to intuit.

Whichever way this PR goes, this was a neat idea.

@shioyama
Contributor Author

@soulcutter

Are there benchmarks around this? I suspect perf is similar, but it can be hard to intuit.

I just ran the performance benchmarks in activerecord/examples/performance.rb, with 100 benchmark records and 5 seconds benchmark time, and indeed there seem to be some problem spots.

Here is the result from master (commit hash 5668dc6), skipping the warming up:

Calculating -------------------------------------
            Model#id      1.506M (± 8.2%) i/s -      7.558M in   5.062531s
Model.new (instantiation)
                         70.277k (± 8.2%) i/s -    349.804k in   5.019615s
Model.new (setting attributes)
                         42.462k (± 8.3%) i/s -    213.460k in   5.070581s
         Model.first      4.002k (± 8.4%) i/s -     20.196k in   5.091957s
          Model.take      7.007k (± 7.0%) i/s -     34.850k in   5.000724s
Model.all limit(100)    157.316  (± 8.9%) i/s -    780.000  in   5.003810s
 Model.all take(100)    157.794  (± 8.9%) i/s -    795.000  in   5.085958s
Model.all limit(100) with relationship
                         80.095  (± 8.7%) i/s -    400.000  in   5.053453s
Model.all limit(10,000)
                          4.219  (± 0.0%) i/s -     30.000  in   7.112930s
   Model.named_scope     32.372k (± 8.8%) i/s -    162.678k in   5.075393s
        Model.create      1.802k (± 7.7%) i/s -      8.996k in   5.027716s
Resource#attributes=     35.808k (± 8.1%) i/s -    177.714k in   5.003288s
     Resource#update      1.852k (± 9.2%) i/s -      9.180k in   5.013268s

Here is the result from this branch:

Calculating -------------------------------------
            Model#id      1.522M (± 8.4%) i/s -      7.586M in   5.033554s
Model.new (instantiation)
                         20.211k (± 7.4%) i/s -    101.133k in   5.036667s
Model.new (setting attributes)
                         14.938k (± 7.5%) i/s -     75.313k in   5.074483s
         Model.first      2.986k (± 7.4%) i/s -     14.847k in   5.004941s
          Model.take      4.471k (± 7.6%) i/s -     22.450k in   5.057055s
Model.all limit(100)     69.676  (± 8.6%) i/s -    350.000  in   5.066785s
 Model.all take(100)     69.864  (± 8.6%) i/s -    350.000  in   5.055046s
Model.all limit(100) with relationship
                         35.021  (± 8.6%) i/s -    174.000  in   5.015003s
Model.all limit(10,000)
                          1.988  (± 0.0%) i/s -     14.000  in   7.048725s
   Model.named_scope     31.179k (± 8.3%) i/s -    156.420k in   5.059643s
        Model.create      1.604k (± 7.9%) i/s -      8.007k in   5.030966s
Resource#attributes=     13.237k (± 9.6%) i/s -     66.990k in   5.122866s
     Resource#update      1.586k (± 7.5%) i/s -      7.904k in   5.016089s

So you can see there are some very large differences, e.g. Resource#attributes= is 177.714k on master but only 66.990k on this branch, a drop of nearly two thirds in iterations. But also just instantiating a model is much slower. Obviously something is impacting performance very heavily.
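For reference, the relative slowdowns can be computed from the i/s figures in the two runs above. This only redoes the arithmetic on the reported numbers; it is not a new measurement, and comparing i/s rather than raw iteration counts sidesteps the slightly different run lengths.

```ruby
# Throughput in i/s, copied from the master and branch runs above.
MASTER_IPS = {
  "Model.new (instantiation)"      => 70_277.0,
  "Model.new (setting attributes)" => 42_462.0,
  "Resource#attributes="           => 35_808.0
}.freeze

BRANCH_IPS = {
  "Model.new (instantiation)"      => 20_211.0,
  "Model.new (setting attributes)" => 14_938.0,
  "Resource#attributes="           => 13_237.0
}.freeze

# Percentage of throughput lost on the branch, rounded to one decimal place.
def percent_drop(before, after)
  ((before - after) / before * 100).round(1)
end

DROPS = MASTER_IPS.map { |name, ips| [name, percent_drop(ips, BRANCH_IPS.fetch(name))] }.to_h

DROPS.each { |name, pct| puts format("%-32s %5.1f%% lower i/s", name, pct) }
```

This gives roughly 71% lower throughput for instantiation and 63% for Resource#attributes=, consistent with the "nearly two thirds" drop noted above.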

I'm curious to know what would be causing the slowdown, so I'll dig a bit more and report back. You're absolutely right that these things can be hard to intuit.

@shioyama
Contributor Author

@matthewd

Perhaps it's worth comparing a less textbook-pure version, where we still use the module instance to contain things, but handle all the matchers in a single instance?

So I thought about this a bit and I've got an idea which I think may solve many of the same issues this PR solves, without the module "bloat". Since there also seem to be performance issues related (I suppose?) to the number of modules, I'm going to close this and in a few days open a new PR with a "lite" version.

What I'm basically envisioning is:

  • one module (as now with GeneratedAttributeMethods but probably renamed) which in addition to having attribute methods defined on it, also holds the matchers and methods like define_proxy_call, so that (like this PR) these methods get out of the model class.
  • rather than an instance variable pointing to generated attribute methods, the model has a class attribute pointing to the module. Like this PR, the class attribute module instance would be dup'ed from Base and included, and subclasses would then avoid including unneeded extra modules (which I consider a bug right now, I'll post a separate issue for this).
  • rather than class attributes for attribute_method_matchers, the model class would simply delegate attribute_method_prefix etc. to the module. This would actually be cleaner than this PR, since the class would no longer need the attribute_method_matchers class attribute at all (the matchers would live on the module rather than on the class).
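A rough sketch of this "lite" plan, assuming a single module per model that holds both the matchers and the generated methods. GeneratedAttributeMethodsSketch, add_prefix, add_suffix, SketchModel and SketchPost are hypothetical names; for brevity the module is memoized in a class-level instance variable rather than the dup'ed class attribute described above.

```ruby
# Hypothetical single module holding matchers and generated methods.
class GeneratedAttributeMethodsSketch < Module
  Matcher = Struct.new(:prefix, :suffix) do
    def method_name(attr)
      "#{prefix}#{attr}#{suffix}"
    end

    def target
      "#{prefix}attribute#{suffix}"
    end
  end

  attr_reader :matchers

  def initialize
    super()
    @matchers = []
  end

  def add_prefix(*prefixes)
    prefixes.each { |p| @matchers << Matcher.new(p, "") }
  end

  def add_suffix(*suffixes)
    suffixes.each { |s| @matchers << Matcher.new("", s) }
  end

  # Defines one method per (matcher, attribute) pair on this single module.
  def define_attribute_methods(attr_names)
    @matchers.each do |matcher|
      target = matcher.target
      attr_names.each do |attr|
        define_method(matcher.method_name(attr)) { |*args| send(target, attr, *args) }
      end
    end
  end
end

class SketchModel
  # The model only keeps a reference to the module and delegates to it.
  def self.generated_attribute_methods
    @generated_attribute_methods ||= GeneratedAttributeMethodsSketch.new.tap { |m| include(m) }
  end

  def self.attribute_method_prefix(*prefixes)
    generated_attribute_methods.add_prefix(*prefixes)
  end

  def self.attribute_method_suffix(*suffixes)
    generated_attribute_methods.add_suffix(*suffixes)
  end
end

class SketchPost < SketchModel
  attribute_method_prefix "clear_"
  attribute_method_suffix "_changed?"
  generated_attribute_methods.define_attribute_methods(%w[title])

  def clear_attribute(attr)
    "cleared #{attr}"
  end

  def attribute_changed?(attr)
    attr == "title"
  end
end
```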

I believe this should work and could actually decrease the number of modules in the inheritance chain, while also removing some private methods that are currently added by AttributeMethods. It would also decouple AR::AttributeMethods from AM::AttributeMethods in the same way that this PR does, since everything would be wrapped up in a module (so you would be able to use AM::AttributeMethods alongside AR::AttributeMethods with just a little extra work).

Does this sound like a good plan? I'd like to at least get a thumbs up on the idea before proceeding. It won't be any more complicated than this PR so shouldn't take too much time to implement.

@shioyama shioyama closed this Oct 18, 2017
@matthewd
Member

Wow, I was concerned that missed methods going through so many method_missing invocations could be a drag on performance, but I wasn't expecting it to have anywhere near this sort of impact anywhere. I too am curious what's going on -- at this scale, it sounds more like it's accidentally doing repeated work (or maybe blowing the method cache?) than simple added call overhead.

Most of your proposed plan sounds like a much better explanation of my half-thought; I agree that does seem to have merit. It'd also be an interesting middle-ground from which to revisit the more radical multi-module approach in future, and possibly better understand what's causing the performance differential.

As for reducing the number of modules in an inheritance scenario... I agree a subclass needn't have its own module by default. But I do believe we expect a subclass to be able to define its own matchers and attributes, and have those properly interact with those defined by the superclass -- so it seems necessary for it to be possible. (It's also worth noting that AR::Base isn't necessarily the only abstract class in the hierarchy.)

@shioyama
Contributor Author

shioyama commented Oct 18, 2017

I haven't dug into the details of what's slowing things down, but just removing the respond_to? method in the matcher modules bumped performance up by about 20%. That's only a small part of the difference though. I've tried testing other changes but nothing seems to make a difference.

But I do believe we expect a subclass to be able to define its own matchers and attributes, and have those properly interact with those defined by the superclass

So I'd like to clarify this point, because it confuses me a bit.

Ignoring this PR, in the code in the master branch, a subclass creates a new module instance stored in @generated_attribute_methods. This module holds attribute methods defined on the class. When you call define_attribute_methods (which happens each time an AR class is subclassed), attribute methods are defined for all the names in attribute_names (which is the same for all subclasses).

Each subclass defines (and includes) this instance variable, including ActiveRecord::Base itself (although no methods are defined on that one).

But actually, I don't see any reason why subclasses beyond the first subclass of Base should need these modules. The matchers themselves (again, in master, not this branch) are defined in a class attribute which can be modified in subclasses, so this part is not related to the included module.

In terms of the generated methods, you only really have two possibilities (AFAICT):

  • a method from attribute_names is defined in the subclass, and also in its parent class. Both methods are the same, dispatching (e.g. if the prefix is clear_ and attribute name "foo") to clear_attribute("foo").
  • a method with the name (clear_foo) is defined in the subclass itself, in which case the attribute method will not be defined in the @generated_attribute_methods of that subclass. However, it will be defined in the parent class, and if the method on the subclass calls super, it will call through to the attribute method on the parent class.

In either case, the fact that the generated attribute methods module is included in the subclass has no impact on the result. In the former case the method is the same as the parent method, both dispatching to clear_attribute("foo"). In the latter case, the method is not defined on the subclass module anyway.
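Both cases can be checked in plain Ruby, with a hand-written module standing in for Post's @generated_attribute_methods; all names are hypothetical and only Post gets a generated module.

```ruby
# Hypothetical stand-in for the module generated for Post.
module PostGenerated
  def clear_foo
    clear_attribute("foo")
  end
end

class Post
  include PostGenerated

  def clear_attribute(attr)
    "cleared #{attr}"
  end
end

# Case 1: the subclass has no module of its own and uses the inherited method.
class SuperPost < Post; end

# Case 2: the subclass defines clear_foo itself; super still reaches the
# generated method included in Post.
class CustomPost < Post
  def clear_foo
    "custom, then #{super}"
  end
end
```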

Am I missing something? This is why I want to post an issue for this: I think AR is needlessly defining a module and defining attribute methods on this module for no reason other than that the current code depends on using an instance variable on the subclass, which (I think) can be fixed.

@shioyama
Contributor Author

shioyama commented Oct 21, 2017

@matthewd So I figured out at least half the reason for the slowdown, and it's pretty trivial.

(Results below updated: there was a typo in the undefine_attributes_method name in my first reply, which I have fixed.)

The fix is in shioyama/rails@a414920. Basically the problem is that when define_attribute_methods was being called, I was checking if attribute methods have been generated yet in each module, but since there are 18 modules this is a lot of checks. By just wrapping the whole thing in a check from the class, this is sped up a lot whenever define_attribute_methods (or undefine_attribute_methods) is called more than once.

Here are the results with that change (just showing relevant stuff):

Calculating -------------------------------------
            Model#id    989.544k (± 7.5%) i/s -      4.939M in   5.026478s
Model.new (instantiation)
                         45.590k (± 7.4%) i/s -    231.928k in   5.119377s
Model.new (setting attributes)
                         21.987k (± 6.3%) i/s -    111.194k in   5.079722s
        Model.create      1.137k (± 5.5%) i/s -      5.763k in   5.085320s
Resource#attributes=     15.918k (± 7.0%) i/s -     79.612k in   5.029058s

So you can see that instantiation is now much closer to the result on master (45.6k i/s vs. 70.3k i/s). However, this only closes about half the gap.

I suspect the remainder of the difference is accounted for by similar cases where rather than returning immediately, a procedure is looping over 18 modules before returning the same result.
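A simplified sketch of the kind of short-circuit described above: one class-level flag replaces 18 per-module "already generated?" checks on repeat calls. CountingMatcher and CountingModel are hypothetical, and the per-module counter exists only to make the effect visible.

```ruby
# Hypothetical matcher module that counts how often it is asked to generate.
class CountingMatcher < Module
  attr_reader :checks

  def initialize
    super()
    @generated = false
    @checks = 0
  end

  def define_attribute_methods(_attr_names)
    @checks += 1
    return if @generated
    @generated = true
    # (method definitions omitted in this sketch)
  end
end

class CountingModel
  MATCHERS = Array.new(18) { CountingMatcher.new }
  MATCHERS.each { |m| include m }

  # One class-level check instead of asking every module on each call.
  def self.define_attribute_methods(attr_names)
    return false if @attribute_methods_generated
    MATCHERS.each { |m| m.define_attribute_methods(attr_names) }
    @attribute_methods_generated = true
  end
end

3.times { CountingModel.define_attribute_methods(%w[title]) }
```

Without the class-level flag, each of the 18 modules would be consulted on every call; with it, each module is consulted only once.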
