New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undefine attribute methods of descendants when resetting column information #31475

Merged
merged 3 commits into from Dec 19, 2017

Conversation

Projects
None yet
4 participants
@shioyama
Contributor

shioyama commented Dec 15, 2017

I'm not quite sure if this is the right solution, but while looking through these tests I notice that the test checking that "reset column information resets children" is not really correct. I've modified the test to show that it fails in the sense that when column information is changed on the parent, the child does not reset its attribute methods, and thus although the (current) test passes, the method is actually falling through to method_missing, which I don't believe it should do.

My fix is to undefine attribute methods on all descendants when they are undefined on the parent, which passes this and other tests. But mostly I'd just like to clarify the expected behaviour.

@rails-bot

This comment has been minimized.

rails-bot commented Dec 15, 2017

Thanks for the pull request, and welcome! The Rails team is excited to review your changes, and you should hear from @georgeclaghorn (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

This repository is being automatically checked for code quality issues using Code Climate. You can see results for this analysis in the PR status below. Newly introduced issues should be fixed before a Pull Request is considered ready to review.

Please see the contribution instructions for more information.

@shioyama

This comment has been minimized.

Contributor

shioyama commented Dec 16, 2017

So I'm investigating the history of this issue, and commit eecfa84 explains a lot. @jonleighton wrote in the commit message:

This fixes a situation I encountered where a subclass would cache the
name of a generated attribute method in @_defined_class_methods. Then,
when the superclass has it's attribute methods undefined, the subclass
would always have to dispatch through method_missing, because the
presence of the attribute in @_defined_class_methods would mean that it
is never generated again, even if undefine_attribute_methods is called
on the subclass.

There various other confusing edge cases like this. STI classes share
columns, so let's just keep all the attribute method generation state
isolated to the base class.

This is actually what is happening now, because the change he made to define_attribute_methods has been updated in the meantime such that methods are no longer only defined on the base class, but on whatever class is being instantiated/initialized (each subclass has its own instance of @generated_attribute_methods pointing to its own attribute methods module). So actually this test which refers to testing that instance methods are defined on the base class is incorrect, since this is not how instance methods are currently defined.

I think the correct fix is to do something similar to what was done in this commit, except rather than defining attribute methods only on the base class, instead modify undefine_attribute_methods to remove them on all classes from the class' base class through to all descendants.

I'll push another commit or two doing this, but I'd like to have a discussion about this because I think the current code is broken. When you undefine_attribute_methods currently on a class in a heirarchy, you get into exactly the type of situation that @jonleighton was describing above.

@shioyama

This comment has been minimized.

Contributor

shioyama commented Dec 16, 2017

I've added a failing spec in 6c474c3 demonstrating the problem. This will fail even with the change here since I'm only undefining methods on descendants, not ancestors up to base.

But I'd like to discuss what the spec should be here, because what the tests say, and what the code actually does, are inconsistent.

To recap, when you initialize a class (or call init_with, or allocate) you internally call define_attribute_methods on the class, which (the first time) will memoize a module in @generated_attribute_methods for that class (not its ancestors up to base, or its descendants).

At this point, descendants will see the methods (because they are defined on a module included in an ancestor), but their @attribute_methods_generated instance variable will be false, and their @generated_attribute_methods will still be nil. It's only when an instance of the subclass is first initialized that they get define_attribute_methods called on the subclass, and then at this point they get their own module in their ancestor chain and their @attribute_methods_generated becomes true.

However, when you call undefine_attribute_methods on a class with subclasses, confusingly the class gets these variables reset, but descendants and ancestors do not. This is the problem.

I think the best solution is to make undefine_attribute_methods on any class in a class heirarchy descending from AR::Base undefine the methods on all classes in the chain. Anything else will leave you in a situation where some descendants or ancestors will be dispatching through method_missing when they should not be.

@shioyama

This comment has been minimized.

Contributor

shioyama commented Dec 16, 2017

Correction: define_attribute_methods will memoize modules in @generated_attribute_methods for the class and all superclasses up to the base class (but not to descendants).

https://github.com/rails/rails/blob/6c474c38486189908f63c1f7e408d30867a256ec/activerecord/lib/active_record/attribute_methods.rb#L62

This means that calling define_attribute_methods on any class in the hierarchy will result in all classes in the heirarchy up to base seeing the attribute methods (since descendants will inherit these methods).

But the situation with the instance variables described above remains: if we e.g. reset column information and add a new column to a class in the heirarchy, descendants will never get it defined, so you fall through to method_missing.

@shioyama

This comment has been minimized.

Contributor

shioyama commented Dec 16, 2017

After thinking about this a while, I've come to the conclusion that the safest thing is just to fix reset_column_information and not undefine_attribute_methods.

My fix is to update the line calling undefine_attribute_methods and replace it with:

([base_class] + base_class.descendants).each(&:undefine_attribute_methods)

For reset_column_information, this is essential to ensure that all attribute methods connected to the table are actually reset.

Let me know if this sounds reasonable. I think there's more that can be done but this is a reasonable first step.

@matthewd

This comment has been minimized.

Member

matthewd commented Dec 16, 2017

if we e.g. reset column information and add a new column to a class in the heirarchy, descendants will never get it defined

If I'm reading this right, what you're describing is approximately true of a number of collection-shaped class attributes where descendants append to the inherited collection: semantically they're adding to (or sometimes otherwise mutating) the current value from the ancestor... but in practice the inherited value just gets copied upon the first mutation, and future changes in the ancestor's value go un-noticed.

@shioyama

This comment has been minimized.

Contributor

shioyama commented Dec 16, 2017

If I'm reading this right, what you're describing is approximately true of a number of collection-shaped class attributes where descendants append to the inherited collection: semantically they're adding to (or sometimes otherwise mutating) the current value from the ancestor... but in practice the inherited value just gets copied upon the first mutation, and future changes in the ancestor's value go un-noticed.

Yes, and that's fine I suppose in many cases, but in this particular case with reset_column_information, the inconsistency results in not actually fully resetting the column information.

I think the fix is simple here; I'm not implying we should go through all cases of class attributes and update all descendants.

Where I'm going with this, for reference, is trying to remove reliance on the method_missing + respond_to? fallback in AR::AttributeMethods. They're slow, they make the code more complex, and I don't think they're really necessary, given some straightforward changes like this one.

@matthewd

This comment has been minimized.

Member

matthewd commented Dec 16, 2017

I haven't fully absorbed all the information you've given here yet, let alone then thought about it for a while... but my quick gut reaction is "do we want base_class.descendants, or do we want self.ancestors [up to & including base_class] + self.descendants?"

I'll concede that you'd need a pretty serious object hierarchy for it to make a practical difference, but at first glance, ISTM we shouldn't need to concern ourselves with other branches of the tree -- just those that affect or are affected by the target class.


Actually, considering what the other lines in reset_column_information are doing.. is it even intended to be used on a non-base class? 🤔

@shioyama

This comment has been minimized.

Contributor

shioyama commented Dec 16, 2017

ISTM we shouldn't need to concern ourselves with other branches of the tree -- just those that affect or are affected by the target class.

But if we're resetting column information, this affects all branches of the tree, no? i.e. self.ancestors [up to & including base_class] + self.descendants would miss the other branches, which share the column information. (I may be missing something here.)

Just to clarify/repeat: the starting point for this PR is that the test as it stands on master says it tests that reset column information "resets children". But actually after resetting on the parent (base class) the child is dispatching to method missing for the newly-added column, which to me means it was not actually reset.

is it even intended to be used on a non-base class?

Not really sure... I just assumed since it's a public method on a class, that's a valid use case.

@matthewd

This comment has been minimized.

Member

matthewd commented Dec 16, 2017

Yeah, I think my questions about affected classes and reasonable self values overlap: i.e., what does it even mean to reset the schema information for a child class, without targeting the base class it's nominally inherited from?

(This is more me thinking out loud than anything else -- I'm not specifically asking you for answers 😅)

At the moment, it feels like it's just not considering the possibility: it deals with the schema via the table_name, which seems rightly a responsibility of the base class only. And then reload_schema_from_cache only affects descendants. If the schema really has changed, our ancestors/cousins aren't going to notice: with this change they'll redefine their attribute methods, but still be working from their previously cached schema contents. OTOH, outright resetting the base class feels like overreach: if that's what our caller intended, surely that's what they would've done.

.. which brings me back to "does this method make sense for non-base classes, and if so, what does it mean?"

Either way, I definitely agree that this is a bug: the test you've added should pass. Right now I think I'm just equivocating on whether reset_column_information should use self or base_class.

@shioyama shioyama changed the title from Undefine attribute methods of descendants when undefining them on parent to Undefine attribute methods of descendants when resetting column information Dec 16, 2017

@shioyama

This comment has been minimized.

Contributor

shioyama commented Dec 16, 2017

I see your point now and agree, it doesn't really make sense to call reset_column_information on anything but the base class. Shall I just change the line to use self instead of base_class?

shioyama added some commits Dec 15, 2017

Undefine attribute methods on all descendants when resetting column info
If we don't do this, then we end up with an inconsistent situation where
a parent class may e.g. reset column information, but child classes will
contine to see attribute methods as already generated, and thus not pick
up this new column (falling through to method_missing).
@shioyama

This comment has been minimized.

Contributor

shioyama commented Dec 17, 2017

I went ahead and changed the line to use self instead of base_class, rebased on the latest, and added a changelog entry.

@shioyama

This comment has been minimized.

Contributor

shioyama commented Dec 17, 2017

Related: #22057

@shioyama

This comment has been minimized.

Contributor

shioyama commented Dec 17, 2017

Incidentally, not sure it's relevant, but commit 9deb6ab by @sgrif which fixes #22057 has two tests and one code change. The code change is:

diff --git a/activerecord/lib/active_record/model_schema.rb b/activerecord/lib/active_record/model_schema.rb
index a9bd094a66..e3f304b0af 100644
--- a/activerecord/lib/active_record/model_schema.rb
+++ b/activerecord/lib/active_record/model_schema.rb
@@ -339,6 +339,9 @@ def reload_schema_from_cache
         @columns = nil
         @columns_hash = nil
         @attribute_names = nil
+        direct_descendants.each do |descendant|
+          descendant.send(:reload_schema_from_cache)
+        end
       end

However, while the first test fails if this change is reversed on master, the second one (the one I've changed in this PR) currently passes even without the code change. It also passes with the updated test here.

@shioyama

This comment has been minimized.

Contributor

shioyama commented Dec 19, 2017

@matthewd I'd really like to get this merged, is there anything else I need to do?

My long-winded comments above maybe made this seem more complicated than it really is.

We agree that the updated spec here should pass. I think we also agree that the fix is to call define_attribute_methods on descendants. The only question is whether we call this on [self] + descendants or [base_class] + base_class.descendants, which to me seems a fairly minor distinction.

I don't like leaving PRs behind that get buried under newer ones and just become stale. If this won't get merged, I'll close it.

@matthewd matthewd merged commit d9e4bff into rails:master Dec 19, 2017

2 checks passed

codeclimate All good!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@matthewd

This comment has been minimized.

Member

matthewd commented Dec 19, 2017

We agree that the updated spec here should pass. I think we also agree that the fix is to call define_attribute_methods on descendants.

Yep, sounds fair, thanks 👍🏻

@shioyama

This comment has been minimized.

Contributor

shioyama commented Dec 19, 2017

Thanks! 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment