Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement generic AST pattern matcher helper #699

Closed
JukkaL opened this issue May 31, 2015 · 10 comments
Closed

Implement generic AST pattern matcher helper #699

JukkaL opened this issue May 31, 2015 · 10 comments

Comments

@JukkaL
Copy link
Collaborator

JukkaL commented May 31, 2015

Currently the mypy implementation has a lot of ad-hoc code that checks whether some part of an AST looks like something specific. For example, we have code that detects various special forms:

N = NamedTuple('N', [('x', int), ...])
T = TypeVar('T', int, str)

It would be better to have a generic utility that makes it easy to write code to detect cases like the above.

A straw man proposal:

pattern = AstPattern('NameExpr = "typing.NamedTuple"(StrExpr, [(StrExpr, Type), ...])')
result = pattern.match(ast_node)
if result:
    name, strlit, tuples = result

The pattern syntax would be a subset of Python syntax to make the implementation easier (but with different semantics).

@JukkaL
Copy link
Collaborator Author

JukkaL commented May 31, 2015

Something like this could be useful when implementing #698.

@elazarg
Copy link
Contributor

elazarg commented Sep 3, 2016

@JukkaL I felt this need too - I wanted to suggest that, but now I see there's an issue for this.

I think I'd like to work on it, if nobody else did it already.

@gvanrossum
Copy link
Member

Go for it!

@elazarg
Copy link
Contributor

elazarg commented Sep 14, 2016

I have something. The matcher looks like

Or(StringOfIds(_names), MatchList(_names, etc))

Where _names is a handle, bound through mutation.

A code excerpt (note that it uses inspect inside call.bind(), but that's an orthogonal issue):

def check_namedtuple(self, node: Node, var_name: str = None) -> TypeInfo:
    ...
    _typename, _names, _module, _types = Identifier(), Identifier(), Identifier(), WildCard()
    if callee.fullname == 'collections.namedtuple':
        def namedtuple(typename, field_names, *, verbose = ..., rename = ...): ...
        args = call.bind(namedtuple)
        match = Or(StringOfIds(_names), MatchList(_names, etc))
        bad_fields = match.find_mismatch(args.get('field_names'))
        bad_module = None
    elif callee.fullname == 'typing.NamedTuple':
        def NamedTuple(typename, fields, *, verbose = ..., rename = ..., module = ...): ...
        args = call.bind(NamedTuple)
        match = MatchList(MatchTuple(_names, _types), etc)
        bad_fields = match.find_mismatch(args.get('fields'))
        bad_module = _module.find_mismatch(args.get('module'))
    else:
        return None
    if not args:
        return self.build_namedtuple_typeinfo('namedtuple', [], [])
    if _typename.find_mismatch(args['typename']):
        self.fail("namedtuple() expects a string literal as the first argument", call)
    ...

It could have been cleaner if it need not be typechecked...

Together with the inspection, it saves about 40 lines in the main logic. It's paid back with interest, but mostly in a dedicated file, and it should be reusable (although I only tried it on namedtuple so far).

@gvanrossum gvanrossum added refactoring Changing mypy's internals topic-named-tuple labels Sep 14, 2016
@gvanrossum
Copy link
Member

@JukkaL, can you gude @elazarg here?

@JukkaL
Copy link
Collaborator Author

JukkaL commented Sep 14, 2016

Sure! I'm going to PyCon UK tomorrow so I may not have much time until next week.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Sep 20, 2016

The proposed syntax feels too magical for me. It's shorter than the current code but I had to read through it several times before I understood what is going on.

The real challenge seems to be to come up with something that is both easy to understand and compact (and not too difficult to implement, of course). Each line of code will be read many times but written only once, so ease of understanding is the key thing.

Here's another idea (again, not convinced it's a good idea, but here goes anyway):

if callee.fullname == 'collections.namedtuple':
    pattern = CallPattern('def _(typename, field_names, *, verbose=..., rename=...): ...')
    pattern.describe_arg('typename', StrExpr)
    pattern.describe_arg('field_names', Either(StrExpr, ListPattern(StrExpr)))
    result = pattern.match(call, self.fail)  # Generate error on failure
elif callee.fullname == 'typing.NamedTuple':
    pattern = CallPattern('def _(typename, fields, *, verbose=..., rename=..., module=...): ...')
    pattern.describe_arg('typename', StrExpr)
    pattern.describe_arg('fields', ListPattern(TuplePattern(StrExpr, Type)))
    result = pattern.match(call, self.fail)
else:
    return None
if not result:
    return ...  # Error
# use result[<argname>] to access things
...

Automating the generation of meaningful error messages would also be nice.

@elazarg
Copy link
Contributor

elazarg commented Sep 20, 2016

Your API is indeed more readable.

On first impression: one problem with CallPattern was writing the parser. I guess we can simply use existing facilities, but inspect will not be so easily applicable (and I want it - I don't want to reimplement argument matching). Besides, the example I gave has an actual implementation. Anyway, I may give CallPattern suggestion a try later on.

The reasons for making the mutation-based patterns was to be able to find and report the exact point of error, differentiate between ListPattern(TuplePattern(StrExpr, Type)) (a single element) and ListPattern(TuplePattern(StrExpr, Type), ...), and yet keep working if the error is not very serious. I'm not sure now these cannot be accommodated with less magic; I will think about it.

@elazarg
Copy link
Contributor

elazarg commented Sep 20, 2016

Another decision is what to leave to the type checker. I don't like the idea of rejecting valid code just because it's not a string literal.

@JukkaL
Copy link
Collaborator Author

JukkaL commented May 17, 2018

I don't think that is worth implementing, since we aren't adding new special forms that often any more, and just reviewing (not to mention implementing) a generic solution could take more effort than we'd save using it.

@JukkaL JukkaL closed this as completed May 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants