Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a way to compare AST nodes for equality recursively #60191

Open
Julian mannequin opened this issue Sep 20, 2012 · 20 comments
Open

Provide a way to compare AST nodes for equality recursively #60191

Julian mannequin opened this issue Sep 20, 2012 · 20 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@Julian
Copy link
Mannequin

Julian mannequin commented Sep 20, 2012

BPO 15987
Nosy @benjaminp, @Julian, @serhiy-storchaka, @ilevkivskyi, @mlouielu, @tonybaloney, @tonybaloney, @isidentical
PRs
  • bpo-15987: Add ast.AST class richcompare methods #1368
  • bpo-15987: Add ast.AST richcompare methods #1375
  • bpo-15987: Add ast.AST class richcompare methods  #14970
  • gh-60191: Implement ast.compare #19211
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2012-09-20.18:43:56.747>
    labels = ['3.8', 'type-feature', 'library']
    title = 'Provide a way to compare AST nodes for equality recursively'
    updated_at = <Date 2020-03-28.18:12:54.152>
    user = 'https://github.com/Julian'

    bugs.python.org fields:

    activity = <Date 2020-03-28.18:12:54.152>
    actor = 'BTaskaya'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2012-09-20.18:43:56.747>
    creator = 'Julian'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 15987
    keywords = ['patch']
    message_count = 17.0
    messages = ['170826', '170907', '170909', '171000', '185288', '185289', '292656', '292665', '292717', '341541', '341546', '341555', '342118', '348545', '349185', '365213', '365215']
    nosy_count = 9.0
    nosy_names = ['benjamin.peterson', 'Julian', 'serhiy.storchaka', 'levkivskyi', 'louielu', 'anthonypjshaw', 'anthony shaw', 'BTaskaya', 'Philip Dye']
    pr_nums = ['1368', '1375', '14970', '19211']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue15987'
    versions = ['Python 3.8']

    @Julian
    Copy link
    Mannequin Author

    Julian mannequin commented Sep 20, 2012

    As is, as far as I can tell, there's no way to easily compare two AST nodes to see if they have the same children and same fields (recursively).

    I'm writing some unit tests for a NodeTransformers, so I've settled for comparing ast.dump()s of each, which is kind of dirty, but 1) works and 2) produces reasonable failure messages. (As a side note of what else I've tried, comparing, say, a list of the iter_child_nodes is not a good alternative, since the tests I'm writing are making assertions that a given node was not modified, which means they deepcopy the node and then want to assert that the "transformed" node is unchanged.)

    I don't know the global implications of changing ast.AST.__eq__ to know if that's feasible (hopefully someone will comment on that), but if it isn't, another provided way would be nice.

    @Julian Julian mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Sep 20, 2012
    @benjaminp
    Copy link
    Contributor

    This is a reasonable request. Should comparison include lineno/col_offset or not? If you implement, __eq__, you should also implement __hash__. Maybe, it would be best to start with a helper comparison function in the ast module.

    @Julian
    Copy link
    Mannequin Author

    Julian mannequin commented Sep 21, 2012

    I'd say yes (to both lineno/col_offset). And yeah that sounds like what I had in mind (a helper function).

    If I'm specific for a moment about implementation, perhaps something like ast.diff, which yielded tuples of differing nodes (in say, breadth first order down the tree) from two given nodes, and took args for configuration, compare_lineno and compare_col_offset (both defaulted to True), and then eq was just next(ast.diff(self, other, compare_lineno=True, compare_col_offset=True), None) is None.

    Sound good to you?

    @benjaminp
    Copy link
    Contributor

    Yes, though some things like what to return if one has an entire subtree that the other doesn't have will have to be worked out.

    @brettcannon
    Copy link
    Member

    I have a use for this as well, but w/o the lineno/col_offset comparison and all I need is a True/False result.

    @brettcannon
    Copy link
    Member

    IOW I think a function that uses ast.walk() and has flags for specifying whether _attributes should also be checked and then uses a class check and then uses _fields to do all other checking.

    @mlouielu
    Copy link
    Mannequin

    mlouielu mannequin commented May 1, 2017

    If we only need a binary True/False result, could we just return a compare of dump(a) == dump(b)?

    This can also add the include_attributes flags for need.

    @mlouielu
    Copy link
    Mannequin

    mlouielu mannequin commented May 1, 2017

    Provide a recursive way to compare AST nodes, it will compare with fields, but no attributes.

    The performance compare with ast.dump methods in unittest

    # Recursive compare
    ./python -m unittest test.test_ast.ASTCompareTest
    ......
    ----------------------------------------------------------------------
    Ran 6 tests in 0.669s

    OK

    # ast.dump compare
    ./python -m unittest test.test_ast.ASTCompareTest
    ......
    ----------------------------------------------------------------------
    Ran 6 tests in 2.192s

    @mlouielu
    Copy link
    Mannequin

    mlouielu mannequin commented May 2, 2017

    Update to AST base type richcompare.

    the unittest finished time was better than python version:

    Ran 7 tests in 0.278s
    

    @tonybaloney
    Copy link
    Mannequin

    tonybaloney mannequin commented May 6, 2019

    This discussion is inconclusive and targets an old version of CPython, can this issue be closed?

    @tonybaloney
    Copy link
    Mannequin

    tonybaloney mannequin commented May 6, 2019

    Closing issue, PR branch has since been removed and targets Python 3.4

    @tonybaloney tonybaloney mannequin closed this as completed May 6, 2019
    @vstinner
    Copy link
    Member

    vstinner commented May 6, 2019

    Please don't close an issue too fast.

    The PR 1368 is still open and contains valuable code.

    Moreover, I don't see anyone here saying that the feature is a bad idea. The feature has not been implemented, so the issue should remain open, even if PR 1368 is outdated.

    ...issue... targets Python 3.4

    Well, it's just that the issue has been created a long time ago, but the feature request remain value for Python 3.8.

    @vstinner vstinner reopened this May 6, 2019
    @vstinner vstinner added the 3.8 only security fixes label May 10, 2019
    @ilevkivskyi
    Copy link
    Member

    Btw, I am +1 on this feature (preferably with an option to check line, column, end line, and end column). I always wanted this, but never had time to actually implement this.

    @PhilipDye
    Copy link
    Mannequin

    PhilipDye mannequin commented Jul 27, 2019

    If consensus has been reached on this, I am willing to do the work.

    @ilevkivskyi
    Copy link
    Member

    If consensus has been reached on this, I am willing to do the work.

    It looks like there is already an active PR #14970, there are some non-implemented comments from a core review.

    @serhiy-storchaka
    Copy link
    Member

    I am not sure that implementing a rich comparison of AST nodes is the right way. There are too much options (compare attributes or not, compare recursively or not, compare types or not) and __eq__ does not support options. In addition, it requires implementing compatible __hash__, but how do you implement a hash of mutable object? In addition, recursive comparison can introduce a regression in existing code -- instead of fast returning False the comparison will spent a nontrivial amount of time.

    If implement a comparison of AST nodes, it should be a separate function which support multiple options. Would be nice also to have examples where this feature can be used before implementing it.

    See also bpo-37792.

    @isidentical
    Copy link
    Sponsor Member

    The solution to hashing used in PR 14970 was just using the default hashing function which is kind of a workaround to the real problem. I now concur with your comments. Will try to draft out an ast.compare function.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @Eclips4 Eclips4 self-assigned this May 10, 2024
    @Eclips4 Eclips4 removed the 3.8 only security fixes label May 10, 2024
    @Eclips4
    Copy link
    Member

    Eclips4 commented May 11, 2024

    So, before implementing this, I would like to create a discussion on discuss.python.org to gather user feedback on what this comparison should do.

    Is it a full comparison that includes comparing each field, or is it a structural comparison nodes, where's these nodes are treated as equal BinOp(left=Constant(5), op=Mul(), right=Constant(2)) and BinOp(left= Constant(1), op = Mul(), right = Constant(1))?

    The second thing is about hashing: our last nodes are mutable, so should we make them immutable and provide a ast.replace() function?

    jeremyhylton added a commit that referenced this issue May 22, 2024
    * bpo-15987: Implement ast.compare
    
    Add a compare() function that compares two ASTs for structural equality. There are two set of attributes on AST node objects, fields and attributes. The fields are always compared, since they represent the actual structure of the code. The attributes can be optionally be included in the comparison. Attributes capture things like line numbers of column offsets, so comparing them involves test whether the layout of the program text is the same. Since whitespace seems inessential for comparing ASTs, the default is to compare fields but not attributes.
    
    ASTs are just Python objects that can be modified in arbitrary ways. The API for ASTs is under-specified in the presence of user modifications to objects. The comparison respects modifications to fields and attributes, and to _fields and _attributes attributes. A user could create obviously malformed objects, and the code will probably fail with an AttributeError when that happens. (For example, adding "spam" to _fields but not adding a "spam" attribute to the object.) 
    
    Co-authored-by: Jeremy Hylton <jeremy@alum.mit.edu>
    @jeremyhylton
    Copy link
    Contributor

    @Eclips4 I missed this discussion when I was working on pr19211 at the sprints. Happy to continue discussion on the topic, as we have a little time before 3.14 is released.

    As you see in the PR, this is a separate compare() function that totally side steps things like eq and hash. It also focuses on full-field-level equality. Seems hard to say that two ASTs should be equal if they are "1 + 2" and "2 + 3" since they are literally different (literally). A comparison function that says "some differences in the code are allowed" might be framed as more of a pattern-matching problem that could answer questions like "Are these both arithmetic expressions that have the same structure except for constants?" Or "Are these both functions of two-positional arguments that don't call other functions?"

    @Eclips4
    Copy link
    Member

    Eclips4 commented Jun 23, 2024

    @Eclips4 I missed this discussion when I was working on pr19211 at the sprints. Happy to continue discussion on the topic, as we have a little time before 3.14 is released.

    As you see in the PR, this is a separate compare() function that totally side steps things like eq and hash. It also focuses on full-field-level equality. Seems hard to say that two ASTs should be equal if they are "1 + 2" and "2 + 3" since they are literally different (literally). A comparison function that says "some differences in the code are allowed" might be framed as more of a pattern-matching problem that could answer questions like "Are these both arithmetic expressions that have the same structure except for constants?" Or "Are these both functions of two-positional arguments that don't call other functions?"

    I think the current implementation is good enough. Actually, if we have compare maybe it's good to have something like a contains?

    estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
    * bpo-15987: Implement ast.compare
    
    Add a compare() function that compares two ASTs for structural equality. There are two set of attributes on AST node objects, fields and attributes. The fields are always compared, since they represent the actual structure of the code. The attributes can be optionally be included in the comparison. Attributes capture things like line numbers of column offsets, so comparing them involves test whether the layout of the program text is the same. Since whitespace seems inessential for comparing ASTs, the default is to compare fields but not attributes.
    
    ASTs are just Python objects that can be modified in arbitrary ways. The API for ASTs is under-specified in the presence of user modifications to objects. The comparison respects modifications to fields and attributes, and to _fields and _attributes attributes. A user could create obviously malformed objects, and the code will probably fail with an AttributeError when that happens. (For example, adding "spam" to _fields but not adding a "spam" attribute to the object.) 
    
    Co-authored-by: Jeremy Hylton <jeremy@alum.mit.edu>
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    8 participants