fix(types): add some types based on MonkeyType #1152

henryiii · 2022-05-25T16:09:15Z

Typing is a bit too sparse to run mypyc, so I've tried to fill in a couple of modules (lark.py and utils.py) with the help of monkeytype. This is based on what was seen running the tests more than what things should be, so input and cleanup would be helpful!

Signed-off-by: Henry Schreiner <henryschreineriii@gmail.com>

lark/parsers/lalr_parser.py

henryiii · 2022-05-25T16:12:14Z

lark/utils.py

        return isinstance(value, self.types_to_memoize)

-    def serialize(self):
+    def serialize(self) -> Dict[int, Any]:  # type: ignore[override]


This time I just ignored the mismatch.

henryiii · 2022-05-25T16:12:59Z

lark/utils.py

@@ -134,7 +151,8 @@ def get_regexp_width(expr):
            raise ImportError('`regex` module must be installed in order to use Unicode categories.', expr)
        regexp_final = expr
    try:
-        return [int(x) for x in sre_parse.parse(regexp_final).getwidth()]
+        # TODO: bug? Returns an int?
+        return [int(x) for x in sre_parse.parse(regexp_final).getwidth()]   # type: ignore[attr-defined]


sre_parse.parse(regexp_final).getwidth() returns an int, which you can't iterate over. Not sure what was intended here.

getwidth() returns a tuple of (min_width, max_width)

>>> sre_parse.parse('a+').getwidth()
(1, 4294967295)

Ahah, bug in typeshed! https://github.com/python/typeshed/blob/cb5b31cf15da14c63cb6467b5a29315471173ced/stdlib/sre_parse.pyi#L68

Well, it's quite an obscure feature. Most re libraries in other languages don't even have it.

Fixed in typeshed now, but missed today's mypy 0.960 release.

Oops, can't resolve until I fix the comment (tomorrow).
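As a quick check of the behaviour discussed in this thread, here is a minimal sketch that unpacks the result of getwidth() as a (min_width, max_width) pair. The helper name regexp_width is hypothetical, and the import fallback assumes that newer Python versions keep the parser in the private module re._parser while older ones expose it as sre_parse. The upper bound reported for unbounded patterns differs between Python versions, so it is not hard-coded here.

```python
from typing import Tuple

# Assumption: Python 3.11+ keeps the parser in the private module re._parser;
# older versions expose the same code as the top-level sre_parse module.
try:
    from re import _parser as sre_parse  # Python 3.11+
except ImportError:
    import sre_parse  # Python <= 3.10


def regexp_width(pattern: str) -> Tuple[int, int]:
    """Return (min_width, max_width) for a pattern (hypothetical helper)."""
    # getwidth() returns a two-item tuple, so it can be unpacked directly.
    min_width, max_width = sre_parse.parse(pattern).getwidth()
    return int(min_width), int(max_width)


fixed = regexp_width("abc")      # every match is exactly three characters
optional = regexp_width("a?b")   # one or two characters
unbounded = regexp_width("a+")   # at least one character, very large maximum
```

Because the value is a tuple, the list comprehension in get_regexp_width is valid at runtime; only the stub was wrong.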

lark/utils.py

henryiii · 2022-05-25T16:15:17Z

lark/utils.py

-    return [x for x in l if not (x in dedup or dedup.add(x))]
+    # This returns None, but that's expected
+    return [x for x in l if not (x in dedup or dedup.add(x))]  # type: ignore[func-returns-value]
+    # 2x faster (ordered in PyPy and CPython 3.6+, guaranteed to be ordered in Python 3.7+)
+    # return list(dict.fromkeys(l))


This surprises mypy, since it's using the None return. But I think using list(dict.fromkeys(l)) is simpler and faster (2x when I was trying it on a million ints).

I didn't write this function, but it looks like whoever did found the opposite to be true. I don't know why.

If this was written before Python 3.6+ was required, e.g. in the 2.7 days, then dict.fromkeys wouldn't have preserved order. And OrderedDict is 2x slower than a plain dict (OrderedDict and the current implementation are nearly identical for me). I'm not testing on the actual data, though.

Did you test it on heavily duplicated lists?

It's possible that fromkeys() keeps overwriting existing values, making it slower in that case.

(Just guessing, no idea if it's true)

Yes, 1,000,000 items with 100 unique values.
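The two approaches compared in this thread can be put side by side in a small sketch. The function names dedup_with_set and dedup_with_dict are hypothetical, and the timing harness is only an illustration: it uses a smaller input than the 1,000,000 items mentioned above, and results will vary by machine and interpreter.

```python
from timeit import timeit
from typing import Hashable, Iterable, List, Set, TypeVar

T = TypeVar("T", bound=Hashable)


def dedup_with_set(items: Iterable[T]) -> List[T]:
    """Order-preserving dedup using the set.add() trick from the diff."""
    seen: Set[T] = set()
    # set.add() always returns None, so the `or` branch is falsy and the
    # element is kept the first time it is seen. mypy reports this use of
    # a None return as func-returns-value.
    return [x for x in items if not (x in seen or seen.add(x))]  # type: ignore[func-returns-value]


def dedup_with_dict(items: Iterable[T]) -> List[T]:
    """Order-preserving dedup relying on dict insertion order (Python 3.7+)."""
    return list(dict.fromkeys(items))


if __name__ == "__main__":
    # Heavily duplicated input: 100 unique values, as in the comment above,
    # but fewer items so the comparison runs quickly.
    data = [i % 100 for i in range(100_000)]
    print("set trick:", timeit(lambda: dedup_with_set(data), number=10))
    print("dict.fromkeys:", timeit(lambda: dedup_with_dict(data), number=10))
```

Both functions keep the first occurrence of each element, so they are interchangeable wherever dict ordering is guaranteed.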

Signed-off-by: Henry Schreiner <henryschreineriii@gmail.com>

erezsh · 2022-09-15T19:20:28Z

@henryiii I'm sorry it took me so long to review this PR! I will try to be more prompt in the future.

I have two issues with this PR:

1. pickle.dump / load do not limit the input to BytesIO, so I don't think that we should either.
2. utils.py should not be aware of any Lark-specific types; that violates the separation of concerns.
   - If we need more specificity for functions like bfs() or classify(), it should be done through generics, e.g. bfs(Sequence[T]) -> Iterator[T]
   - If serialize/deserialize can't be expressed effectively using generics (though I think they should be?), it's better to move them to a separate module like serialize.py
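To illustrate the generics suggestion, here is a minimal sketch of a breadth-first search typed with a TypeVar instead of any Lark-specific type. The `expand` parameter and the function body are assumptions made for the example and are not necessarily what utils.py contains.

```python
from collections import deque
from typing import Callable, Deque, Hashable, Iterable, Iterator, Sequence, Set, TypeVar

# T stands for whatever node type the caller uses; it only needs to be hashable.
T = TypeVar("T", bound=Hashable)


def bfs(initial: Sequence[T], expand: Callable[[T], Iterable[T]]) -> Iterator[T]:
    """Yield nodes reachable from `initial` in breadth-first order."""
    open_q: Deque[T] = deque(initial)
    visited: Set[T] = set(open_q)
    while open_q:
        node = open_q.popleft()
        yield node
        for next_node in expand(node):
            if next_node not in visited:
                visited.add(next_node)
                open_q.append(next_node)


# Usage with a plain dict-based graph; no domain types are involved.
graph = {1: [2, 3], 2: [4], 3: [4], 4: []}
order = list(bfs([1], graph.__getitem__))
```

With this signature a type checker infers Iterator[int] for the call above, and the same function would infer a domain-specific node type at other call sites without utils.py importing it.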

I created a new PR where I revert only these changes. Let me know if you agree with it - #1191

henryiii · 2022-09-16T01:30:56Z

That's fine, I'm assuming this will be an iterative process. :)

Adjustments for PR #1152

fix(types): add some types based on MonkeyType

e6b3284

Signed-off-by: Henry Schreiner <henryschreineriii@gmail.com>


henryiii mentioned this pull request May 25, 2022

fix: getwidth returns a tuple of ints, not an int python/typeshed#7951

Merged

fix(types): minor cleanup

f2e73d7

Signed-off-by: Henry Schreiner <henryschreineriii@gmail.com>

henryiii force-pushed the henryiii/fix/addtypes branch from d156be7 to f2e73d7 Compare May 25, 2022 20:18

fix: update note and one mistake

cfd72fa

Signed-off-by: Henry Schreiner <henryschreineriii@gmail.com>

erezsh added a commit that referenced this pull request Sep 15, 2022

Adjustments for PR #1152

a7896ee

henryiii closed this Sep 16, 2022

erezsh added a commit that referenced this pull request Sep 16, 2022

Merge pull request #1191 from lark-parser/adjust_pr1152

f009312

Adjustments for PR #1152
