PR: eliminate call_for_nodes #633

edreamleo · 2023-01-02T11:10:05Z

A preliminary PR for PR #632.

Background

rope.base.ast.call_for_nodes looks fishy to me:

It defines neither a depth first nor a breadth first traversal, but a strange partial mix.
I know of no simple description of what it does.
It is used in two search-related contexts, but in each case only a single node is returned.
Almost any change to call_for_nodes causes multiple unit tests to fail.

@lieryan Is there a simple explanation for what call_for_nodes does?

Summary of changes

Replace calls to call_for_nodes where the recursive kwarg is (implicitly) false.
See patched_ast and _PatchingASTWalker._handle.
Remove the recursive kwarg from call_for_nodes.
Remove the call to call_for_nodes in _ASTMatcher.find_matches.
Remove the call to call_for_nodes in _ResultChecker._find_node.
Remove call_for_nodes itself.

Why call_for_nodes should not be a utility

I understand neither what _ASTMatcher.find_matches and _ResultChecker._find_node are supposed to do nor how they do it. Imo, these strange traversals should be presented adjacent to the search code. Duplicating these traversals is better than creating a strange helper.

…/ekr-fork-rope2 into ekr-no-call-for-nodes2

lieryan · 2023-01-03T12:02:55Z

I don't know what makes you think that call_for_nodes is "strange", but it is just a simple, textbook pre-order depth-first search/traversal for ast nodes.

It's implemented as recursive function i.e. using implicit stack, and it has a mechanism for early exit once it found a match.

lieryan · 2023-01-03T13:11:51Z

_ASTMatcher.find_matches is basically the core workhorse for similarfinder, which also powers restructuring and a few other refactorings.

It's basically an AST-based code search mechanism, where Rope constructs a tree of ast nodes to match and _ASTMatcher tries to find code that matches that AST pattern.

_ResultChecker._find_node is just a helper for testing patchedast. It's for testing the sorted_children that was constructed by patchedast.py. It basically just try to find a node with a certain type, and then check that its sorted_children have the expected structure.

The output of patchedast.py itself (i.e. sorted_children) is used by other refactoring code, mainly extract and restructuring refactoring.

sorted_children have very similar structure to parso's node's children. There are going to be a bunch other potential issues for replacing stdlib ast with parso, but if we do that, than we may essentially get the sorted_children for free from parso.

edreamleo · 2023-01-03T13:49:59Z

@lieryan > call_for_nodes is just a simple, textbook pre-order DFS/traversal

This would be the textbook traversal:

def traverse(node, callback):
    callback(node)
    for child in ast.iter_child_nodes(node):
        traverse(child, callback)

call_for_nodes is an incomplete traversal:

def call_for_nodes(node, callback):
    result = callback(node)
    if not result:
        for child in ast.iter_child_nodes(node):
            call_for_nodes(child, callback)

call_for_nodes skips all children if the parent returns a result. Otherwise, call_for_nodes visits all children, even if one of the children's callback returns a result!

If more than one child returns a result, what happens depends on how the caller uses call_for_nodes. That's very strange, imo.

Rope uses call_for_nodes in two places. Let's look at each in turn.

_ASTMatcher.find_matches in similarfinder.py

def find_matches(self):
    if self.matches is None:
        self.matches = []
        ast.call_for_nodes(self.body, self._check_node)
    return self.matches

Only RawSimilarFinder._get_matched_asts instantiates _ASTMatcher.

_ASTMatcher.body is the same as RawSimilarFinder.ast.
I'll assume that _ASTMatcher.body could be any ast.AST node, but I might be mistaken.

Anyway, ASTMatcher._check_node appends nodes (via several helpers) to self.matches. But call_for_nodes isn't complete, so not all parts of self.body's subtree will contribute to self.matches!

Perhaps the code works (or almost works?) because the partial traversal happens when comparing two trees which are always (usually?) similar enough that the partial traversals match. But this is hand waving. I'm totally confused.

_ResultChecker._find_node in patchedasttest.py

def _find_node(self, text):
    goal = text
    if not isinstance(text, (tuple, list)):
        goal = [text]

    << define class Search >>

    search = Search()
    ast.call_for_nodes(self.ast, search)
    return search.result

class Search:
    result = None

    def __call__(self, node):
        for text in goal:
            if sys.version_info >= (3, 8) and text in [
                "Num",
                "Str",
                "NameConstant",
                "Ellipsis",
            ]:
                text = "Constant"
            if str(node).startswith(text):
                self.result = node
                break
            if node.__class__.__name__.startswith(text):
                self.result = node
                break
        return self.result is not None

Once again, the incomplete nature of call_for_node seems to make it difficult to characterize exactly which nodes get searched.

But Aha. Search.__call__ always returns True once an assignment is made. This means that the traversal stops at the first search! So at last I understand the code.

Using this Aha, perhaps_ResultChecker._find_node could be simplified. Something like this (not tested):

def _find_node(self, text):
    goal = text
    if not isinstance(text, (tuple, list)):
        goal = [text]

    def match(node):
        for text in goal:
            if sys.version_info >= (3, 8) and text in [
                "Num", "Str", "NameConstant", "Ellipsis",
            ]:
                text = "Constant"
            if str(node).startswith(text):
                raise Found(node)
            if node.__class__.__name__.startswith(text):
                raise Found(node)

    def find(node):
        match(node)
        for child in ast.iter_child_nodes(node):
            find(child)

    try:
        find(self.ast)
        return None
    except Found as node:
        return node

Summary

I don't understand why _ASTMatcher.find_matches works. If it (and its unit tests) work, they would seem to work almost by accident. If there are invariants involved, they are well hidden.

I think I do understand why _ResultChecker._find_node works. It might be possible to simplify it. Imo, call_for_nodes does Rope's code no favors.

edreamleo · 2023-01-03T14:09:59Z

@lieryan I am going to mark this as a draft while I play with simplifying _ResultChecker._find_node.

lieryan · 2023-01-03T15:15:20Z

Think of it like searching in a regular string:

s = "abc def ghi def"
s.index("def")

Once you found the location of "def" inside s, if the callback returned True, then it indicates that there's no need to search further inside the substring "def" to find more matches. This is the early exit part of call_for_nodes.

call_for_nodes would, however, continue searching for another match in a different part of the s, it doesn't just return the first match. In this case, it'd would continue and found both instances of "def".

Strictly speaking, _ResultChecker actually only return the last match, because earlier matches would gets overwritten, this is generally sufficient because most test cases only expects to have a single match anyway.

For the purpose extract refactoring and restructuring, they don't really have a well-defined meaning if the matched call nodes internally have additional matches. For similar finder, the current implementation only returns the outer-most match, you could potentially generate additional matches if the matched nodes have grandchildren that also matches the search pattern, but it may not necessarily be desirable to enable that by default.

edreamleo · 2023-01-03T15:19:23Z

@lieryan Thanks for your comments. Yes, I see you are correct--the last match governs. However, because the traversal is partial, exactly what the last match will be can't be described succinctly.

lieryan · 2023-01-03T15:28:21Z

For the purpose of _ResultChecker, tests that have multiple matches probably should've been considered a test-bug. There's really no benefit of processing multiple matches in this context, as it would just makes the test more confusing. _ResultChecker probably could've asserted that there's exactly one match (and we can rewrite the two tests that actually have more than one match).

The alternative is to make _ResultChecker's Search.result be a list of results instead of just a singular result, this would allow check_children and check_region to process multiple matches as result could accumulate more than one results. IMO, that seems to be unnecessarily complicating the test cases.

lieryan · 2023-01-03T15:39:53Z

However, because the traversal is partial, exactly what the last match will be can't be described succinctly.

Actually, because it's a pre-order traversal, the last match is well defined. If you are searching for:

(a + b) + (c + (d + e))

with pattern:

${var1} + ${var2}

Then the match would be the outer-most last match: (c + (d + e)). Since pre-order traversal calls the callback for the visited nodes first before recursing to their children, if the callback wants to early-exit, then call_for_nodes wouldn't even recurse to (d + e) part.

Note that returning the last match isn't inherent to call_for_nodes. That's just how _ResultChecker have implemented its Search class. call_for_nodes just provides a mechanism for early exit; if the caller wants to get all matches, then it could defined a callback that never returns True. It's just turn into regular pre order traversal in that case.

edreamleo · 2023-01-03T15:59:52Z

I am going to close this PR.

edreamleo · 2023-01-03T16:16:20Z

@lieryan Thanks for your comment. Imo it would be good to mention the concept of "outermost-last-match" in the docstring for call_for_nodes. In fact, since this function is used only in matching contexts, it might be good to rename the function.

lieryan · 2023-01-04T04:19:10Z

It's not correct to document the "outermost-last-match" in call_for_nodes, because it is really just a general purpose pre-order traversal algorithm, you can use it to handle multiple matches in whatever ways you choose depending on how you write the callback. Case in point, _ASTMatcher actually does handle multiple matches.

As previously mentioned, the "outermost-last-match" is only true for _ResultChecker only because _ResultChecker overwrites earlier matches. If _ResultChecker accumulates multiple matches instead in a list, then it would've collected all the matches. If it doesn't return True on match, then it would've recursively checked submatches. It's not for call_for_nodes to decide how the callback handle multiple matches.

If you want to document that it's "outermost-last-match" behavior, it would be in _ResultChecker, not in call_for_nodes. But this is a helper code for rope's own tests, so it's going to be mainly as internal documentation/developer note, not user/library documentation.

edreamleo marked this pull request as draft January 2, 2023 11:10

edreamleo added 5 commits January 2, 2023 07:42

Simplify call_for_nodes. A failed attempt at bypassing same

543efc2

Tweaks

e03d0bb

Remove last call to ast.call_for_nodes

2ba7485

Remove call_for_nodes

3f63335

Remove comment

4025a97

edreamleo marked this pull request as ready for review January 2, 2023 16:40

Merge branch 'master' into ekr-no-call-for-nodes2

22ceb81

This was referenced Jan 3, 2023

PR: Remove from ast import * from rope.base.ast #632

Closed

PR: Rename rope.base.ast.parse to rope_parse #635

Closed

edreamleo added 2 commits January 3, 2023 04:46

Merge branch 'master' into ekr-no-call-for-nodes2

eab84f5

Merge branch 'ekr-no-call-for-nodes2' of https://github.com/edreamleo…

8f71056

…/ekr-fork-rope2 into ekr-no-call-for-nodes2

edreamleo mentioned this pull request Jan 3, 2023

Remove call_for_nodes(recursive) argument #634

Merged

Merge branch 'master' into ekr-no-call-for-nodes2

831f2a4

edreamleo marked this pull request as draft January 3, 2023 14:09

edreamleo closed this Jan 3, 2023

edreamleo mentioned this pull request Jan 4, 2023

PR: Add comments & docstrings related to call_for_nodes #642

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PR: eliminate call_for_nodes #633

PR: eliminate call_for_nodes #633

edreamleo commented Jan 2, 2023 •

edited

lieryan commented Jan 3, 2023 •

edited

lieryan commented Jan 3, 2023

edreamleo commented Jan 3, 2023 •

edited

edreamleo commented Jan 3, 2023

lieryan commented Jan 3, 2023 •

edited

edreamleo commented Jan 3, 2023

lieryan commented Jan 3, 2023

lieryan commented Jan 3, 2023 •

edited

edreamleo commented Jan 3, 2023

edreamleo commented Jan 3, 2023

lieryan commented Jan 4, 2023 •

edited

PR: eliminate call_for_nodes #633

PR: eliminate call_for_nodes #633

Conversation

edreamleo commented Jan 2, 2023 • edited

lieryan commented Jan 3, 2023 • edited

lieryan commented Jan 3, 2023

edreamleo commented Jan 3, 2023 • edited

edreamleo commented Jan 3, 2023

lieryan commented Jan 3, 2023 • edited

edreamleo commented Jan 3, 2023

lieryan commented Jan 3, 2023

lieryan commented Jan 3, 2023 • edited

edreamleo commented Jan 3, 2023

edreamleo commented Jan 3, 2023

lieryan commented Jan 4, 2023 • edited

edreamleo commented Jan 2, 2023 •

edited

lieryan commented Jan 3, 2023 •

edited

edreamleo commented Jan 3, 2023 •

edited

lieryan commented Jan 3, 2023 •

edited

lieryan commented Jan 3, 2023 •

edited

lieryan commented Jan 4, 2023 •

edited