New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix python.number
pattern
#1028
Conversation
Thanks! I'll have a look.
We don't need to be backward-compatible with bugs :) |
This pattern FLOAT_NUMBER.2: (DEC_NUMBER "." DEC_NUMBER? | "." DEC_NUMBER) (("E" | "e") ["+" | "-"] DEC_NUMBER)?
| DEC_NUMBER (("E" | "e") ["+" | "-"] DEC_NUMBER)? cannot parse If I change it to FLOAT_NUMBER.2: (DEC_NUMBER "." DEC_NUMBER? | "." DEC_NUMBER)
| (DEC_NUMBER "." DEC_NUMBER? | "." DEC_NUMBER) (("E" | "e") ["+" | "-"] DEC_NUMBER)
| DEC_NUMBER (("E" | "e") ["+" | "-"] DEC_NUMBER)? then everything works fine. The reason is regexp, because if I use DIGIT: "0".."9"
DEC_NUMBER: DIGIT ("_"? DIGIT)* instead of previously used DEC_NUMBER: /\d(?:_?\d)*/i than both patterns for This seems unintentional and therefore worth investigation. |
ok, so I adapted tests from cpython here's valid cases
and invalid ones (each should result in the exception)
|
ok, now the pr is tested and ready for review, and if everything is fine ready for merge! If anyone could spot an untested case, just write it here, I'll add it to this comment and test it |
Mind creating a new If you don't want to create the unittest, I will do it. Also, the reason I didn't add those tests till now is because I didn't manage to find CPython's tests for their parser. Where did you find them? |
Well, currently Python uses PEG parser generator, so you may be interested in checking it's tests, there's such things like tests for match cases (I know you're trying to megre pr with them) Cpython also have it's own tests and all of them live in the Lib/test. I myself took cases from test_int.py |
oh, ok, |
No |
there's bug, if I use it even can't parse
and even if it could I'd still wouldn't rely on that and test each rule\terminal one-by-one |
Are you putting a newline at the end?
That's probably better anyway. |
I shouldn't have to, It's weird to oblige me to do this as exec can consists of several eval's |
It's the same weird obligation that actual python grammar also has. The point is that any higher level parsing function can just always append a newline with no loss. The problem is that it becomes annoying to express the correct behavior when you can't rely on every line ending in a newline. |
@0dminnimda It's not ideal, but making the last newline optional requires support for the EOF feature, which we're keeping back because the existing implementation has issues. It would also make the parser a little heavier, so it's not all positive, even if it worked. |
@erezsh ok, it looks like we are now ready to merge, I'm waiting for you. |
I wonder why you chose to test by specifying a different The latter approach also has the advantage of testing for collision in token regexp/priority. |
I didn't think about this approach. |
@erezsh I updated tests, any other comments? |
@erezsh could you please review and possibly merge this PR? |
@0dminnimda Looks fine. But please rebase it. It should be at most 3 commits, instead of 23. (I can rebase it for you, if you prefer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for the most part, only one small thing left.
Why not squash and merge? |
@0dminnimda Squash is done by rebasing, no? Either way, whatever gets it down to a few commits. |
No and that's the point.
Well yeah, probably one commit is the best option |
Ok, @MegaIng do you have any other comments? |
Ok, now we're probably ready to merge, @erezsh |
Rebased and merged: 2ec9636 |
Thanks for the PR! |
Yeah, that's why I was talking about squash and merge, because commit that'll end up in the repo gonna be assigned to me, you know, fair credit/blame distribution |
Sorry, I do the merges via the shell, because I want to run extra tests on the way, and fix up any remaining issues. The commits are still signed to your name. So you will still get both credit and blame 😉 |
No worries. I'll check if there's a way to connect the merge to a PR from the shell, for next time. After all, I want contributors to feel appreciated and rewarded. (as much as open source allows :) |
Python doesn't accept numbers with the
_
in the beginning/end and numbers with more than one_
in the allowed places:the same goes with complex numbers. And yes, python recognizes
_xxx
as a name, even thoughx
is a digit, but it's still not a number, so this doesn't affect us.The current implementation only filters numbers with
_
in the beginning, so here's the fix for the other cases.Hopefully, we still can make backward-incompatible changes, so it's fine to change
IMAG_NUMBER
toCOMPLEX_NUMBER
I also tested
\d(?:_?\d+)*
forDEC_NUMBER
but haven't seen any significant performance changes (everything is within the normal range, considering that I was not using a stable environment).