Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-106727: Make inspect.getsource smarter for class for same name definitions #106815

Merged
merged 4 commits into from Jul 18, 2023

Conversation

gaogaotiantian
Copy link
Member

@gaogaotiantian gaogaotiantian commented Jul 16, 2023

When there are multiple class definitions with the same class name found in a file, we currently just return the first one found. This is not the best solution.

It would be pretty difficult(maybe not impossible) to do this perfectly. A couple of heuristics are introduced:

  • If a method with a first_lineno is found in the class definition, we think that's our target.
  • If we find the docstring in the definition and only one of the candidates has the match, that's our target.
  • If we can't figure it out, pick the last one. (This is a slightly better guess than the first one).

One test case was removed - we happened to be correct on that case, just lucky. We can't solve it right now (flipping the if condition won't make a difference).

@carljm
Copy link
Member

carljm commented Jul 18, 2023

The rationale for switching to last-wins instead of first-wins is reasonable: barring conditionals, last-wins at runtime.

I can also see some rationale to prefer leftmost (i.e. prefer top-level definition over something nested in e.g. a conditional.)

I'm not sure the rationale for the docstring or method based heuristics is strong enough. Simplicity of behavior has value. More heuristics make the behavior less predictable and harder to explain; we should be quite sure that they significantly increase our probability of giving the right answer.

Also, it seems like any changes to this heuristic could be backwards-incompatible and change behavior of existing code. Are we sure the benefit of changes here is sufficient to justify this potential breakage?

@gaogaotiantian
Copy link
Member Author

Using methods is almost definitive when there's a positive match - that's the only thing we have for classes that contain some compile time info. As long as the class has a method, we can find it with a super high certainty.

docstring would work as long as the class to be found has a unique docstring - which may not be super common, but not super rare either.

The bottom line is - I don't think either of the heuristics would generate a false positive in any case (unless the user manually assign a method of one class to another).

Yes this is not completely backward-compatible, but the code we potentially could break, is the code that we want to improve. (I know the sentence is a bit twisted)

I think the real way to solve this is to add a compile time info on the class, but that's a long shot.

@gaogaotiantian
Copy link
Member Author

gaogaotiantian commented Jul 18, 2023

How bad it is to add an extra dunderscore field that is not listed in dir for classes?

We have the information all along when we create the class, we just do not record it. This would solve this whole thing and may be useful in the future. An extra field won't break the existing code either. Just don't know how difficult it is to propose a new field on all classes.

Copy link
Member

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, agreed, the new heuristics do look like they should pretty much always be correct if they match.

The docstring heuristic looks like it could be expensive for a file with multiple large class definitions of the same name, but this should be quite unusual.

Can we add a test specifically showing the docstring heuristic in action?

Lib/inspect.py Show resolved Hide resolved
@carljm
Copy link
Member

carljm commented Jul 18, 2023

How bad it is to add an extra dunderscore field that is not listed in dir for classes?

A new entire Python object attached to every class is a significant memory cost. It's not out of the question, but the value provided should be quite strong; I don't think making inspect.getsource work better in a few marginal edge cases meets the bar.

@gaogaotiantian
Copy link
Member Author

Okay I agree adding a class field only for inspect.getsource probably is not the best idea. As for the test - it's not obvious on the diff, but the existing test for cls238.cls239 is exactly testing this:

if True:
    class cls238:
        class cls239:
            '''if clause cls239'''
else:
    class cls238:
        class cls239:
            '''else clause 239'''
            pass

This is the perfect case where the old way just get it correct by luck (it would've got it wrong if it's if False), but the new way can always get it correctly.

Copy link
Member

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, looks good to me! I guess there's no documentation update needed here, since this is a bugfix: inspect.getsource doesn't document its heuristics, it claims to just get the right source for the given class, and this will make it do that more often.

@carljm carljm enabled auto-merge (squash) July 18, 2023 23:17
@carljm carljm merged commit 663854d into python:main Jul 18, 2023
25 checks passed
@gaogaotiantian
Copy link
Member Author

Thank you for the review!

Comment on lines +1081 to +1084
if isinstance(member, types.FunctionType):
for lineno, end_lineno in self.lineno_found:
if lineno <= member.__code__.co_firstlineno <= end_lineno:
return lineno
Copy link
Contributor

@Viicos Viicos Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the line numbers of the __code__ attribute of the methods is really smart, I haven't thought about that! We can be pretty sure the right class is being detected this way.

As I said in the issue: #106727 (comment), I'm afraid of how it will behave with metaclasses:

class Meta(type):
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        cls.my_method = lambda a: a
        return cls


class A(metaclass=Meta):
    pass

A.__dict__["my_method"].__code__.co_firstlineno
#> 4

Will it raise false positives if the __code__.co_firstlineno attribute of the method defined in the metaclass happen to falls within the lineno <= ... <= end_lineno range?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if the metaclass and the class using it both have the same name, your example will confuse this heuristic. Nice observation!

But since the metaclass must come before the class using it, this just means we effectively maintain the previous first-one-wins behavior in this unusual scenario.

So while there may be scope for further improvement in edge cases, I'm still satisfied this PR was strictly an improvement over the previous behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is if the metaclass is defined in another file, the __code__.co_firstlineno integer of any method of that metaclass could clash with a method defined on the inspected class. I'm greatly satisfied by the improvements made in this PR as well; I'll see if there's a way to handle this metaclass edge case in some way!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we can deal with this. For all the methods defined in the class, they should have __module__ attribute to indicate in which module it was defined. We can simply compare this to the class passed in to rule out all the cases where the metaclass is not defined within the same file. This will also rule out the dynamically added methods.

Of course, there could be other evil cases like assigning the method of one class to another, that's a different story.

But we should get a pretty decent improvement by checking the modules, I'll work on the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants