
fix: Token now pattern matches correctly #1181

Merged: 5 commits, Oct 27, 2022

Conversation

orcharddweller (Contributor)

I've added fixes as per #1180 (comment). I don't think backward compatibility is possible in this case, as the original argument is positional, e.g.

Token(type_="TEST", value="test") # this will break no matter what, as positional arguments are not set properly

Token("TEST", "test") # this doesn't require any backwards compatibility changes

Please correct me if I'm missing something.

I've also added a bunch of tests (which are only run if the Python version is 3.10 or higher).
I wasn't sure how best to use assert statements with structural pattern matching, but it does the job.
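
For context, the kind of match-based check such a test might perform looks roughly like this (illustrative only, not the PR's actual test code; assumes Token exposes type and value through __match_args__):

# Requires Python 3.10+ (structural pattern matching).
from lark import Token

def describe(tok: Token) -> str:
    match tok:
        case Token("NUMBER", value):     # positional patterns rely on __match_args__
            return f"number {value}"
        case Token(type="NAME"):         # keyword pattern on the attribute name
            return f"name {tok}"
        case _:
            return "other"

assert describe(Token("NUMBER", "42")) == "number 42"
assert describe(Token("NAME", "x")) == "name x"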

closes #1180

@MegaIng (Member) left a comment:

It is possible to provide backwards compatibility by not making type and value required positional arguments, but instead using *args and **kwargs and having the actual supported signature(s) present only in @overload declarations. But I am not sure if this should be done; you are correct that this is more work than I originally thought.
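
A rough sketch of that shape (a trimmed stand-in, not lark's actual Token; the warning text, attribute handling, and argument parsing are illustrative):

import warnings
from typing import Any, overload

class Token(str):
    # Only the overloads are visible to type checkers; at runtime the single
    # *args/**kwargs implementation below does the dispatching.
    @overload
    def __new__(cls, type: str, value: Any) -> 'Token': ...
    @overload
    def __new__(cls, *, type_: str, value: Any) -> 'Token': ...  # deprecated spelling

    def __new__(cls, *args: Any, **kwargs: Any) -> 'Token':
        if "type_" in kwargs:
            warnings.warn("'type_' is deprecated; use 'type'", DeprecationWarning, stacklevel=2)
            kwargs["type"] = kwargs.pop("type_")
        type_name = kwargs.pop("type") if "type" in kwargs else args[0]
        value = kwargs.pop("value") if "value" in kwargs else args[1]
        inst = super().__new__(cls, value)
        inst.type = type_name
        inst.value = value
        return inst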

Also, the update method should be updated with the changed parameter name.

@orcharddweller (Contributor, author)

Also, the update method should be updated with the changed parameter name.

Fixed; also added a setter for .type_.

@orcharddweller (Contributor, author)

It is possible to provide backwards compatibility by not making type and value required positional arguments, but instead using *args and **kwargs and having the actual supported signature(s) present only in @overload declarations. But I am not sure if this should be done; you are correct that this is more work than I originally thought.

Please let me know if you think this should be added, I'm happy to figure it out.

@erezsh (Member) commented on Aug 18, 2022

I was thinking we'll just add a new param, type_=None, and change inst.type = type to inst.type = type_ if type_ is not None else type

The downside is we'll have to default type and value to None too. But it doesn't break anything.

Btw, why not just do __match_args__ = __slots__ = (...) ?

P.S. we'll also need a deprecation warning for any use of type_.

@orcharddweller (Contributor, author)

I was thinking we'll just add a new param, type_=None, and change inst.type = type to inst.type = type_ if type_ is not None else type

That would make the typing inconsistent between __new__ and the parameters.

The *args/**kwargs approach seems to be OK.

Btw, why not just do __match_args__ = __slots__ = (...) ?

Does the order of __slots__ matter? Because I think that __match_args__ should be in the same order as the arguments to __new__ and update.
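
A toy example of the difference (requires Python 3.10+; not lark code):

class Tok:
    __slots__ = ("value", "type")         # slot order only affects memory layout
    __match_args__ = ("type", "value")    # this order defines positional patterns

    def __init__(self, type, value):
        self.type, self.value = type, value

match Tok("NUMBER", "42"):
    case Tok("NUMBER", v):   # first positional -> .type, second -> .value
        assert v == "42"

Since the order of __slots__ has no user-visible effect here, a single tuple ordered like the arguments to __new__ could serve both, which is what the __match_args__ = __slots__ = (...) suggestion amounts to.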

P.S. we'll also need a deprecation warning for any use of type_

On it.

lark/lexer.py (outdated inline comment, resolved)
@orcharddweller (Contributor, author) commented on Aug 25, 2022

Hey,
I've figured out that the easiest way to use *args and **kwargs with @overload was to create methods for the old and new signatures and call them from the overloaded one.

Otherwise, I would have to manually write all the parsing of *args and **kwargs, including errors etc., which would have been a huge mess.

Also, this way will make it easy to remove the deprecated code in the future.

One caveat is that the errors will contain one of the wrapped functions' names (still easily readable), e.g.:

TypeError: Token._deprecated_new() got multiple values for argument 'type_'

Also, DeprecationWarnings are suppressed by default. I'm not sure if you want to change this, so I left it as it is.

@orcharddweller (Contributor, author)

Also, I'm pretty sure it would be nice to add proper typings to Token.__new__ and to fix the typings for Token (most of the stuff there should be Optional), but it's not really in scope for this PR.

@erezsh (Member) commented on Aug 25, 2022

Why did you write

    def __new__(cls, *args, **kwargs):
        if "type_" in kwargs:
            return cls._deprecated_new(*args, **kwargs)

        return cls._future_new(*args, **kwargs)

Instead of just - ?

    def __new__(cls, *args, **kwargs):
        if "type_" in kwargs:
            kwargs["type"] = kwargs["type_"]

        return cls.same_new(*args, **kwargs)

Also, DeprecationWarnings are suppressed by default.

What do you mean?

most of the stuff there should be Optional

That's true! We should fix that.

P.S.

Earlier you wrote about adding a type_ parameter.

That would make the typing inconsistent between __new__ and the parameters.

Can you please explain that a little bit more?

@orcharddweller (Contributor, author) commented on Aug 25, 2022

Why did you write

def __new__(cls, *args, **kwargs):
    if "type_" in kwargs:
        return cls._deprecated_new(*args, **kwargs)

    return cls._future_new(*args, **kwargs)

Instead of just - ?

def __new__(cls, *args, **kwargs):
    if "type_" in kwargs:
        kwargs["type"] = kwargs["type_"]

    return cls.same_new(*args, **kwargs)

Both should be fine, but in the first one, errors will be more readable:

TypeError: Token._deprecated_new() got multiple values for argument 'type_'

makes more sense if you pass in something like Token("a", "A", type_="a"), as opposed to the error in your version:

TypeError: Token._same_new() got multiple values for argument 'type'

It's supposed to be removed, so I wouldn't worry too much about code duplication.

Also, DeprecationWarnings are suppressed by default.

What do you mean?

The warnings don't really show up until we change the warning filters:
https://stackoverflow.com/a/20960427/7416999

At least they didn't show up for me when I tested this.

Earlier you wrote about adding a type_ parameters.

That would make the typing inconsistent between __new__ and the parameters.

Can you please explain that a little bit more?

Unless we want to allow Token.type and Token.value to be optional, which would be a relatively big and unnecessary change (as the @overload solution works), having type and value as optional in __new__ would lead to undefined behavior or require us to add runtime errors like TypeError("at least one of type and type_ must not be None"). Having the types properly statically checked might save some errors from happening at runtime.
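
To make the concern concrete, a small hypothetical sketch of what Optional constructor parameters force on us (make_token is an illustrative stand-in, not code from the PR):

from typing import Any, Optional

# If the constructor took Optional parameters for both spellings, a call that
# supplies neither would pass the type checker and only fail at runtime.
def make_token(type: Optional[str] = None, value: Any = None,
               type_: Optional[str] = None) -> "tuple[str, Any]":
    if type is None and type_ is None:
        raise TypeError("at least one of 'type' and 'type_' must not be None")
    return (type if type is not None else type_, value)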

@orcharddweller (Contributor, author)

That's true! We should fix that.

Fixed types and added typings to Token.__new__.

@erezsh (Member) commented on Aug 25, 2022

errors will be more readable:

That's a bit of a nitpick. I can easily add this, and get an even better error -

def __new__(cls, *args, **kwargs):
    if "type_" in kwargs:
        if "type" in kwargs:
             raise TypeError("Error: using both 'type' and the deprecated 'type_' as arguments.")
        kwargs["type"] = kwargs.pop("type_")

    return cls.same_new(*args, **kwargs)

The warnings don't really show up until we change the warning filters:

That link is from 2014. But when I test it in Python 3.10 -

>>> import warnings
>>> warnings.warn('test', DeprecationWarning)
<stdin>:1: DeprecationWarning: test

@orcharddweller (Contributor, author)

errors will be more readable:

That's a bit of a nitpick. I can easily add this, and get an even better error -

def __new__(cls, *args, **kwargs):
    if "type_" in kwargs:
        if "type" in kwargs:
             raise TypeError("Error: using both 'type' and the deprecated 'type_' as arguments.")
        kwargs["type"] = kwargs.pop("type_")

    return cls.same_new(*args, **kwargs)

I don't know if this is necessarily a better error, as it does nothing to address my use case (which, while valid, I admit is really minor). It's just a different error.

Nevertheless, I've added it, as I believe it makes a lot of sense here.

As I've said, both versions are fine by me, so I've changed it to the one you prefer.

That link is from 2014. But when I test it in Python 3.10 -

>>> import warnings
>>> warnings.warn('test', DeprecationWarning)
<stdin>:1: DeprecationWarning: test

That link is still 100% valid. Please look at the Python docs regarding warnings.
You can also test it by importing a module that raises a DeprecationWarning. As far as I understand, warning filters are different in the interactive shell.

The issue can be solved by adding:

warnings.simplefilter('default', DeprecationWarning)

to the top of this file, but that breaks the standalone version, which I really don't understand, so I'm leaving it the way it is.

@erezsh (Member) commented on Aug 25, 2022

Thanks!

I see, you're right: the behavior changes for imported modules. I'll look into adding it.

Code under review (Token in lark/lexer.py):

            value if value is not None else self.value,
            self
        )

    @classmethod
    def new_borrow_pos(cls: Type[_T], type_: str, value: Any, borrow_t: 'Token') -> _T:
        return cls(type_, value, borrow_t.start_pos, borrow_t.line, borrow_t.column, borrow_t.end_line, borrow_t.end_column, borrow_t.end_pos)

    @property
    def type_(self) -> str:
A Member left an inline comment on this snippet:

After thinking about this a bit more: do we actually need this property for backwards compatibility? The type_ attribute didn't exist before, and adding an attribute that is deprecated on release is a bit pointless.

If we decide to keep it, it should raise a DeprecationWarning.
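
If it is kept, the warning could look roughly like this (a sketch, not the merged code):

import warnings

class Token:
    # Sketch of a deprecated alias property forwarding to the new attribute name.
    def __init__(self, type, value):
        self.type = type
        self.value = value

    @property
    def type_(self) -> str:
        warnings.warn("Token.type_ is deprecated; use Token.type instead",
                      DeprecationWarning, stacklevel=2)
        return self.type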

@MegaIng (Member) commented on Sep 30, 2022

Regarding DeprecationWarnings: they should be silenced by default; if someone cares, they need to add the call to simplefilter themselves. We shouldn't do that in the library.
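
For reference, opting in from user code is a one-liner with the standard warnings machinery (nothing lark-specific):

import warnings

# Show DeprecationWarnings from libraries; by default Python only displays
# those triggered directly in __main__ or in an interactive session.
warnings.simplefilter("default", DeprecationWarning)

The same effect is available from the command line with python -W default::DeprecationWarning.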

@erezsh merged commit d495035 into lark-parser:master on Oct 27, 2022
@erezsh (Member) commented on Oct 27, 2022

@marcinplatek Thank you for your contribution!

Sorry it took so long to review and merge.

Merging this pull request closes: Token is inconsistent in py3.10 pattern matching (#1180)