-
-
Notifications
You must be signed in to change notification settings - Fork 31.7k
PEP 515: Tokenizer: allow underscores for grouping in numeric literals #70519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
As discussed on python-ideas: https://mail.python.org/pipermail/python-ideas/2016-February/038354.html The rules are:
Currently this only touches literals, not the inputs of int() or float(). Whether they should accept this syntax is debatable (I'd vote no). Otherwise missing: doc updates. Review question: is PyMem_RawStrdup/RawFree the right API to use here? |
I prefer simpler and more strict rule:
Thus 1__2, 12_, 1_.2, 1_e2, 1e_2, 1_j, 0x_12 are not allowed. It is easier to make the rule more lenient later if it will be needed. |
It sure is more strict, but I don't think it's simpler (and it's definitely not simpler to implement). (Also 1_j is pretty nice, I wouldn't want to lose that.) We can also check what other languages do.
|
[1] https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html |
+1. But in any case we need a PEP for this change. |
C++14 uses the same strict rule as Ada, but uses apostrphes instead of underscores. [1] Thus there are two groups of languages, implementing strict or lenient rules:
[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html |
PEP-515 is written up and posted to python-dev. |
Regarding the patch: if trailing underscores are not allowed, |
New patch matching revision of PEP. |
Proposed patch implements strict underscore rules. The implementation is not more complex. |
This patch includes int(), float(), complex() operations, as well as _pydecimal. |
New patch with minimal doc updates. |
I like the feature for literals, but I'm not sure about conversions from string. It slows down the conversion for (IMO) a very small benefit. Other languages allow it, but I've never attempted to use the feature: $ ocaml
OCaml version 4.02.1 # float_of_string "__1____2____.___e___101_";;
|
It's mostly for consistency. For example, In the current version of the conversions, the string is scanned for "_" before doing the more expensive allocation+copy. |
If the string conversions stay, may I suggest two functions:
The first one eliminates only underscores, the second one both Decimal must support both: https://hg.python.org/cpython/file/default/Modules/_decimal/_decimal.c#l1890 |
Correction: The explanation of the functions should be reversed. |
Thanks, I hadn't looked at cdecimal yet - I was planning to ask you to do the necessary changes there :) But there are a few versions of this (e.g. converting unicode digits to ASCII) scattered throughout the codebase, it would make sense to consolidate on this occasion. |
Oh, well. :)
Yes, actually I have to look at the _decimal version again, it contains https://hg.python.org/cpython/file/default/Modules/_decimal/_decimal.c#l1943 I *did* optimize it for speed at the time, I hope general functions won't be |
I still wonder about the complexity of all this for decimal. We now have two grammars on top of each other, this being the actual one for decimal: http://speleotrove.com/decimal/daconvs.html For string conversions I'd prefer a lax way (similar to OCaml) that would somehow be specified in terms of preprocessing, same as the leading/trailing whitespace removal. Short of "ignore all underscores" it isn't easy though. |
Hm. On the one hand there is a spec, so it can be argued that underscores don't belong to Decimal. On the other hand, if we get Decimal literals at one point, there will be a strong argument for allowing underscores in them as in all other number literals. Although supporting them in strings can also be added at that time. |
Raymond, you've also worked on Decimal - do you have an opinion on allowing underscores in Decimal(string) conversions? |
Nice one. While reimplementing it for Cython, I noticed that the grammar described in the PEP isn't exactly as it's implemented, though. The grammar says
whereas the latest patch (v4) says
and also implements it that way. The former doesn't allow underscores at the end of a literal. And the regexes in tokenize.py seem happy to accept "0x___", for example. Is that intended? |
The last patch isn't up to date with the PEP; Serhiy's patch is the closest one. |
Ah, thanks. Here's my implementation then: https://github.com/cython/cython/pull/499/files It seems that tests for valid complex literals are missing. I've added these to the end of the list:
|
New patch; implements the accepted version of the PEP. I added the additional tests, thanks Stefan! |
Note: the changes for format()ting ("_" as thousands separator) are still missing. Eric, would you consider doing this part? |
Yes, I'll read PEP-515 and work on the formatting. |
Thanks Eric! Serhiy, do you want to do a review? The v6/v7 patches are based on your "strict" patch with the constructor changes adapted from v4. New version v7 addresses the review comments from Stefan and Martin. |
Added comments on Rietveld. |
Thanks, Georg! The decimal parts look good to me. I understand that
I don't think that it will be a problem in practice. |
I've created bpo-27080 to track the formatting part of this. |
Thanks for the detailed review, Serhiy! Next try incoming. |
@Serhiy/anyone: can I get another review, so that we can commit this in time for beta? Thanks! |
Hi Georg, I left several comments on Rietveld. Hope it helps. |
Georg, do you think you will be able to get this in for 3.6b1? If not I can commit it while I'm at the core sprint. I'll wait until tomorrow to see if you reply, otherwise I'm just going to address patch comments and then commit it on your behalf. |
Please go ahead. Thanks for taking care of this! |
I'll get this committed today (patch still applies and passes the tests, so it should only take addressing the review comments and the What's New entry). |
New changeset 8a881dafe335 by Brett Cannon in branch 'default': |
Thanks Brett! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: