gh-106727: Make `inspect.getsource` smarter for class for same name definitions #106815

gaogaotiantian · 2023-07-16T23:58:57Z

When there are multiple class definitions with the same class name found in a file, we currently just return the first one found. This is not the best solution.

It would be pretty difficult(maybe not impossible) to do this perfectly. A couple of heuristics are introduced:

If a method with a first_lineno is found in the class definition, we think that's our target.
If we find the docstring in the definition and only one of the candidates has the match, that's our target.
If we can't figure it out, pick the last one. (This is a slightly better guess than the first one).

One test case was removed - we happened to be correct on that case, just lucky. We can't solve it right now (flipping the if condition won't make a difference).

Issue: inspect.getsource (findsource) not working as expected when duplicate class is used #106727

carljm · 2023-07-18T17:29:59Z

The rationale for switching to last-wins instead of first-wins is reasonable: barring conditionals, last-wins at runtime.

I can also see some rationale to prefer leftmost (i.e. prefer top-level definition over something nested in e.g. a conditional.)

I'm not sure the rationale for the docstring or method based heuristics is strong enough. Simplicity of behavior has value. More heuristics make the behavior less predictable and harder to explain; we should be quite sure that they significantly increase our probability of giving the right answer.

Also, it seems like any changes to this heuristic could be backwards-incompatible and change behavior of existing code. Are we sure the benefit of changes here is sufficient to justify this potential breakage?

gaogaotiantian · 2023-07-18T18:35:07Z

Using methods is almost definitive when there's a positive match - that's the only thing we have for classes that contain some compile time info. As long as the class has a method, we can find it with a super high certainty.

docstring would work as long as the class to be found has a unique docstring - which may not be super common, but not super rare either.

The bottom line is - I don't think either of the heuristics would generate a false positive in any case (unless the user manually assign a method of one class to another).

Yes this is not completely backward-compatible, but the code we potentially could break, is the code that we want to improve. (I know the sentence is a bit twisted)

I think the real way to solve this is to add a compile time info on the class, but that's a long shot.

gaogaotiantian · 2023-07-18T18:54:02Z

How bad it is to add an extra dunderscore field that is not listed in dir for classes?

We have the information all along when we create the class, we just do not record it. This would solve this whole thing and may be useful in the future. An extra field won't break the existing code either. Just don't know how difficult it is to propose a new field on all classes.

carljm

Ok, agreed, the new heuristics do look like they should pretty much always be correct if they match.

The docstring heuristic looks like it could be expensive for a file with multiple large class definitions of the same name, but this should be quite unusual.

Can we add a test specifically showing the docstring heuristic in action?

Lib/inspect.py

carljm · 2023-07-18T19:06:00Z

How bad it is to add an extra dunderscore field that is not listed in dir for classes?

A new entire Python object attached to every class is a significant memory cost. It's not out of the question, but the value provided should be quite strong; I don't think making inspect.getsource work better in a few marginal edge cases meets the bar.

gaogaotiantian · 2023-07-18T22:45:04Z

Okay I agree adding a class field only for inspect.getsource probably is not the best idea. As for the test - it's not obvious on the diff, but the existing test for cls238.cls239 is exactly testing this:

if True:
    class cls238:
        class cls239:
            '''if clause cls239'''
else:
    class cls238:
        class cls239:
            '''else clause 239'''
            pass

This is the perfect case where the old way just get it correct by luck (it would've got it wrong if it's if False), but the new way can always get it correctly.

carljm

Ok, looks good to me! I guess there's no documentation update needed here, since this is a bugfix: inspect.getsource doesn't document its heuristics, it claims to just get the right source for the given class, and this will make it do that more often.

gaogaotiantian · 2023-07-18T23:40:42Z

Thank you for the review!

Viicos · 2023-07-19T08:40:13Z

Lib/inspect.py

+                if isinstance(member, types.FunctionType):
+                    for lineno, end_lineno in self.lineno_found:
+                        if lineno <= member.__code__.co_firstlineno <= end_lineno:
+                            return lineno


Using the line numbers of the __code__ attribute of the methods is really smart, I haven't thought about that! We can be pretty sure the right class is being detected this way.

As I said in the issue: #106727 (comment), I'm afraid of how it will behave with metaclasses:

class Meta(type): def __new__(mcs, name, bases, namespace): cls = super().__new__(mcs, name, bases, namespace) cls.my_method = lambda a: a return cls class A(metaclass=Meta): pass A.__dict__["my_method"].__code__.co_firstlineno #> 4

Will it raise false positives if the __code__.co_firstlineno attribute of the method defined in the metaclass happen to falls within the lineno <= ... <= end_lineno range?

Yes, if the metaclass and the class using it both have the same name, your example will confuse this heuristic. Nice observation!

But since the metaclass must come before the class using it, this just means we effectively maintain the previous first-one-wins behavior in this unusual scenario.

So while there may be scope for further improvement in edge cases, I'm still satisfied this PR was strictly an improvement over the previous behavior.

What I meant is if the metaclass is defined in another file, the __code__.co_firstlineno integer of any method of that metaclass could clash with a method defined on the inspected class. I'm greatly satisfied by the improvements made in this PR as well; I'll see if there's a way to handle this metaclass edge case in some way!

Actually, we can deal with this. For all the methods defined in the class, they should have __module__ attribute to indicate in which module it was defined. We can simply compare this to the class passed in to rule out all the cases where the metaclass is not defined within the same file. This will also rule out the dynamically added methods.

Of course, there could be other evil cases like assigning the method of one class to another, that's a different story.

But we should get a pretty decent improvement by checking the modules, I'll work on the PR.

Improve inspect.getsource for class

9e9d7af

bedevere-bot added the awaiting review label Jul 16, 2023

bedevere-bot mentioned this pull request Jul 16, 2023

inspect.getsource (findsource) not working as expected when duplicate class is used #106727

Closed

📜🤖 Added by blurb_it.

867730b

gaogaotiantian requested review from carljm and pablogsal July 17, 2023 00:02

Add newline at the end of the file

fc0a563

carljm reviewed Jul 18, 2023

View reviewed changes

Lib/inspect.py Show resolved Hide resolved

Merge branch 'python:main' into inspect-class

3c880fe

carljm approved these changes Jul 18, 2023

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting review labels Jul 18, 2023

carljm enabled auto-merge (squash) July 18, 2023 23:17

carljm merged commit 663854d into python:main Jul 18, 2023
25 checks passed

bedevere-bot removed the awaiting merge label Jul 18, 2023

Viicos reviewed Jul 19, 2023

View reviewed changes

gaogaotiantian mentioned this pull request Jul 21, 2023

gh-106727: Add __module__ check for inspect.getsource(cls) #106968

Merged

Viicos mentioned this pull request Aug 10, 2023

Extract attribute docstrings for FieldInfo.description pydantic/pydantic#6563

Merged

5 tasks

sobolevn mentioned this pull request Mar 14, 2024

[3.12] gh-116780: Fix test_inspect in -OO mode (GH-116788) #116799

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-106727: Make `inspect.getsource` smarter for class for same name definitions #106815

gh-106727: Make `inspect.getsource` smarter for class for same name definitions #106815

gaogaotiantian commented Jul 16, 2023 •

edited by bedevere-bot

carljm commented Jul 18, 2023

gaogaotiantian commented Jul 18, 2023

gaogaotiantian commented Jul 18, 2023 •

edited

carljm left a comment

carljm commented Jul 18, 2023

gaogaotiantian commented Jul 18, 2023

carljm left a comment

gaogaotiantian commented Jul 18, 2023

Viicos Jul 19, 2023 •

edited

carljm Jul 21, 2023

Viicos Jul 21, 2023

gaogaotiantian Jul 21, 2023

gh-106727: Make inspect.getsource smarter for class for same name definitions #106815

gh-106727: Make inspect.getsource smarter for class for same name definitions #106815

Conversation

gaogaotiantian commented Jul 16, 2023 • edited by bedevere-bot

carljm commented Jul 18, 2023

gaogaotiantian commented Jul 18, 2023

gaogaotiantian commented Jul 18, 2023 • edited

carljm left a comment

Choose a reason for hiding this comment

carljm commented Jul 18, 2023

gaogaotiantian commented Jul 18, 2023

carljm left a comment

Choose a reason for hiding this comment

gaogaotiantian commented Jul 18, 2023

Viicos Jul 19, 2023 • edited

Choose a reason for hiding this comment

carljm Jul 21, 2023

Choose a reason for hiding this comment

Viicos Jul 21, 2023

Choose a reason for hiding this comment

gaogaotiantian Jul 21, 2023

Choose a reason for hiding this comment

gh-106727: Make `inspect.getsource` smarter for class for same name definitions #106815

gh-106727: Make `inspect.getsource` smarter for class for same name definitions #106815

gaogaotiantian commented Jul 16, 2023 •

edited by bedevere-bot

gaogaotiantian commented Jul 18, 2023 •

edited

Viicos Jul 19, 2023 •

edited