Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propose underscores in numeric literals #76

Merged
merged 9 commits into from Nov 14, 2017

Conversation

takenobu-hs
Copy link
Contributor

@takenobu-hs takenobu-hs commented Sep 30, 2017

The proposal has been accepted; the following discussion is mostly of historic interest.


This is a proposal to add underscores to numeric literals.
Underscores (_) in numeric literals are simply ignored.

Rendered

Costs and Drawbacks
-------------------
* Implementation costs are mostly related to lexers.
* Maintenance costs are related to compatibility. Compatibility can be handled with language extension like ``NumericUnderscores``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably make the proposal right away suggest this language extension, as a language extensions is required for such changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could fold it into BinaryLiterals or something like this, but yes, I generally agree that a new extension is probably best.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback.
I agree with the language extension.
I will update the proposal for specifying language extension.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explicitly described the language extension at this commit.

@cartazio
Copy link

cartazio commented Sep 30, 2017 via email


.. code-block:: none

x = 10 * 000 * 000 :: Int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, for primitive types, like Int GHC does constant fold the expression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for helpful information.


.. code-block:: none

decimal → digit{[_ | digit]}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would allow trailing underscore, i.e. 1000_1000_, do we want such thing? If we do, then there should be an example in proposal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.
For float only, I assumed the trailing underscore.

For example, the following description is possible.
gravity = 6.674_08_e−11

@cartazio
Copy link

cartazio commented Sep 30, 2017 via email

@takenobu-hs
Copy link
Contributor Author

Heck, do we have a candidate patch yet? I might try if there’s not one already!

@cartazio, Thank you, I'm glad.
I am very glad if you do it!

@takenobu-hs
Copy link
Contributor Author

i'd personally thing we dont want trailing underscores, but i'm open to being convinc

phadej is also mentioned.
Please give me feedback on the necessity of trailing underscore.
The specification definition may be a little complicated.

@takenobu-hs
Copy link
Contributor Author

takenobu-hs commented Oct 1, 2017

I studied alternative proposal for the trailing underscore.
Please give me feedback which is better.

(1) current proposal:

decimal     →  digit{_ | digit}
octal       →  octit{_ | octit}
hexadecimal →  hexit{_ | hexit}
binary      →  binit{_ | binit}

exponent → (e | E) [+ | -] decimal

(2) alternative proposal for trailing underscore:

decimal     →  digit[{_ | digit} digit]
octal       →  octit[{_ | octit} octit]
hexadecimal →  hexit[{_ | hexit} hexit]
binary      →  binit[{_ | binit} binit]

exponent → [_] (e | E) [+ | -] decimal

Examples with (2) alternative proposal:

x0 = 1_000_000   -- valid
x1 = 1000000_    -- invalid
x2 = _1000000    -- invalid

e0 = 0.0001      -- valid
e1 = 0.000_1     -- valid
e2 = 0_.0001     -- invalid
e3 = _0.0001     -- invalid
e4 = 0._0001     -- invalid
e5 = 0.0001_     -- invalid

f0 = 1e+23       -- valid
f1 = 1_e+23      -- valid
f2 = 1e_+23      -- invalid

g0 = 1e+23       -- valid
g1 = 1e+_23      -- invalid
g2 = 1e+23_      -- invalid

@osa1
Copy link
Contributor

osa1 commented Oct 2, 2017

Strong 👍 from me. We often do things like

1 * 1000 * 1000
threadDelay 5000000 -- five sec

first one is weird (GHC can evaluate that in compile time but still), second one is dangerous if you forget to update the commend after updating the code. Rust has this feature and I often use it.

@treeowl
Copy link
Contributor

treeowl commented Oct 2, 2017

The first "alternative" currently has multiplication by zero, which looks like a mistake. I'm generally in favor of the proposal. I don't think trailing underscores are useful.

@treeowl
Copy link
Contributor

treeowl commented Oct 2, 2017

Previous comment edited to remove bogus information. Sorry for the confusion.

@osa1
Copy link
Contributor

osa1 commented Oct 2, 2017

Agreed that numerals with leading underscore should not be considered as a number. Trailing underscores don't seem useful but at least harmless so no strong feelings about that.

@treeowl
Copy link
Contributor

treeowl commented Oct 2, 2017 via email

@simonpj
Copy link
Contributor

simonpj commented Oct 2, 2017

I'm happy with this.

@merijn
Copy link

merijn commented Oct 2, 2017

I'm strongly +1 on this too.

@takenobu-hs
Copy link
Contributor Author

Hi everyone, I'm glad responses! Thank you very much.
I updated proposal:

  • Modify specification to remove trailing underscores
  • Add validity examples
  • Correct and add alternative examples

I chose the option to remove the trailing underscore.
I think that is easier to recognize as a number.

Please tell me if something is wrong.

@takenobu-hs
Copy link
Contributor Author

I will wait for a few weeks of discussion period :)

By the way, I wrote a patch for my studies [1].
However, I am glad that Carter or someone will write a more suitable patch.
Especially, it would be nice if someone could express Lexer simpler.

[1] ghc/ghc@master...takenobu-hs:wip/numeric-underscores

@cartazio
Copy link

cartazio commented Oct 19, 2017 via email

@takenobu-hs
Copy link
Contributor Author

@cartazio thank you, I am grateful for your help :)
I am not in a hurry, so please do it when you have time.
Please take care of yourself.

@cartazio
Copy link

cartazio commented Oct 20, 2017 via email

@takenobu-hs
Copy link
Contributor Author

Thank you, I will add a testcase for failures / disallowed patterns in testsuite/tests/parser/should_fail/ directory.

@takenobu-hs
Copy link
Contributor Author

Hi, I added the testcase in this [1].
(testsuite/tests/parser/should_fail/NumericUnderscoresFail0.hs)
Please let me know if you have any problems. I am not in a hurry. Thanks.

[1] ghc/ghc@master...takenobu-hs:wip/numeric-underscores

@takenobu-hs
Copy link
Contributor Author

Four weeks have passed since I submitted this proposal.
I think the discussion has converged.
I would like to submit this for a decision.

cc @nomeata

@nomeata nomeata added the Pending committee review The committee needs to evaluate the proposal and make a decision label Nov 1, 2017
@nomeata
Copy link
Contributor

nomeata commented Nov 1, 2017

Committee decision process started. Thanks for the ping, @bitemyapp, I missed the earlier one among too many GitHub notifications.

@takenobu-hs
Copy link
Contributor Author

@nomeata, thank you very much :)

@cartazio
Copy link

cartazio commented Nov 2, 2017 via email

@takenobu-hs
Copy link
Contributor Author

@cartazio thank you always.
If this proposal is accepted, I will send the patch to phab.
I would be happy if you could give me comments there :)

@yav
Copy link
Contributor

yav commented Nov 7, 2017

I think we should allow underscores between the base specifier and the digits for binary, octal and hex literals. For example, I think it would be handy to write 0x_FFFF_FFFF instead of 0xFFFF_FFFF

@takenobu-hs
Copy link
Contributor Author

@yav thanks for the advice.
I considered it as follows.

Current syntax (Haskell Report 2010 + BinaryLiterals + HexFloatLiterals):

decimal     →  digit{digit}
octal       →  octit{octit}
hexadecimal →  hexit{hexit}
binary      →  binit{binit}                                  -- BinaryLiterals

integer →  decimal
         | 0 (o | O) octal
         | 0 (x | X) hexadecimal
         | 0 (b | B) binary                                  -- BinaryLiterals

float →  decimal . decimal [exponent]
       | decimal exponent
       | 0 (x | X) hexadecimal . hexadecimal [bin_exponent]  -- HexFloatLiterals
       | 0 (x | X) hexadecimal bin_exponent                  -- HexFloatLiterals

exponent     →  (e | E) [+ | -] decimal
bin_exponent →  (p | P) [+ | -] decimal                      -- HexFloatLiterals

This proposal (revised):

decimal     →  digit[{_ | digit} digit]
octal       →  octit[{_ | octit} octit]
hexadecimal →  hexit[{_ | hexit} hexit]
binary      →  binit[{_ | binit} binit]

integer →  decimal
         | 0 (o | O) [_] octal                                      -- *** for 0o_123
         | 0 (x | X) [_] hexadecimal                                -- *** for 0x_ffff
         | 0 (b | B) [_] binary                                     -- *** for 0b_0101

float →  decimal . decimal [_] [exponent]
       | decimal [_] exponent
       | 0 (x | X) [_] hexadecimal . hexadecimal [[_]bin_exponent]  -- *** for 0x_ff.ff
       | 0 (x | X) [_] hexadecimal [_] bin_exponent                 -- *** for 0x_ffp1

The lines of *** are the parts you pointed out.
Please tell me if there is any misunderstanding.
If this is correct, I will update the proposal file.

@cartazio
Copy link

cartazio commented Nov 8, 2017 via email

@yav
Copy link
Contributor

yav commented Nov 8, 2017

@takenobu-hs Yep, that's what I was thinking.

@takenobu-hs
Copy link
Contributor Author

@yav thanks for confirmation.

@cartazio thanks for a clear perspective.
I investigated to make it easy to understand the changed part of the specification.
I revise the proposal as follows. The part of numSpacer are the change points in this proposal.

This proposal (revised):

-- places affected by this proposal
numSpacer = _

-- underscores between digits
decimal     →  digit[{numSpacer | digit} digit]
octal       →  octit[{numSpacer | octit} octit]
hexadecimal →  hexit[{numSpacer | hexit} hexit]
binary      →  binit[{numSpacer | binit} binit]

-- an underscore succeeding the base specifier
integer →  decimal
         | 0 (o | O) [numSpacer] octal
         | 0 (x | X) [numSpacer] hexadecimal
         | 0 (b | B) [numSpacer] binary

float →  decimal . decimal [exponent]
       | decimal exponent
       | 0 (x | X) [numSpacer] hexadecimal . hexadecimal [bin_exponent]
       | 0 (x | X) [numSpacer] hexadecimal bin_exponent

-- an underscore preceding the exponent
exponent     →  [numSpacer] (e | E) [+ | -] decimal
bin_exponent →  [numSpacer] (p | P) [+ | -] decimal

Please let me know if it is insufficient.
If this is sufficient, I will update the proposal file.

@yav
Copy link
Contributor

yav commented Nov 9, 2017

@takenobu-hs I think we should allow multiple underscores next to each other, and at least one of your examples in the proposal suggests that you also think so. To accommodate that I think that numSpacer should be _+ (i.e. 1 or more underscores).

Also, would you mind updating the actual proposal with the current version of the syntax, so that we have the whole proposal in a single place.

Here's another suggestion: I would drop the "Current specification of numeric literals" section. It is enough to just have "current syntax" and "new syntax", or even just "new syntax".

Add underscores between the base specifier and the digits. (from yav)
Modify underscore before the exponent and after the base specifier to multiple.
Correspond to HexFloatLiterals extention.
Reconstruction of section on current specification. (from yav)
Reconstruction of new proposal notation (from cartazio)
@takenobu-hs
Copy link
Contributor Author

@yav thanks for the review.

I think we should allow multiple underscores next to each other, and at least one of your examples in the proposal suggests that you also think so. To accommodate that I think that numSpacer should be _+ (i.e. 1 or more underscores).

In the current proposal, we could allow multiple underscores in decimal, octal, hexadecimal, and binary.
Because {numSpacer | digit} represents zero or more repetitions in Haskell Report 2010.
For example decimal allows 5__6.

In addition, I changed the underscore before the exponent and after the base specifier to multiple in this revision.

Also, would you mind updating the actual proposal with the current version of the syntax, so that we have the whole proposal in a single place.

Thanks. I pushed the actual proposal with the current version of the syntax.

Here's another suggestion: I would drop the "Current specification of numeric literals" section. It is enough to just have "current syntax" and "new syntax", or even just "new syntax".

Thanks. That makes the proposal easier to understand.
I dropped the "Current specification of numeric literals" section. And I reconstructed the specification part.

Please review the revised version.

@takenobu-hs
Copy link
Contributor Author

@yav I updated numSpacer to make regular expressions more uniform and simple.
I think that the changes from the current specification became easier to understand.
Please review this revised version.

@yav yav added Accepted The committee has decided to accept the proposal and removed Pending committee review The committee needs to evaluate the proposal and make a decision labels Nov 14, 2017
@yav
Copy link
Contributor

yav commented Nov 14, 2017

Ok, I've marked this as accepted.

@takenobu-hs you should consider submitting your patch to phabricator for review. I had a quick look at it and I have just one major suggestion: instead of having two sets of definitions for the literals (one with underscores and one without), it might be simpler to have only the definitions with underscores, and then have a separate function that validates the literals, ensuring that there are no underscores unless the extension is enabled.

@nomeata nomeata merged commit e5b5173 into ghc-proposals:master Nov 14, 2017
@takenobu-hs
Copy link
Contributor Author

Hi everyone thanks for a lot of advice and comments.
@yav, @nomeata, thanks for accept and merge.

@takenobu-hs you should consider submitting your patch to phabricator for review.

I will submit the patch to phabricator after improving the following.

I had a quick look at it and I have just one major suggestion: instead of having two sets of definitions for the literals (one with underscores and one without),

Thanks for valuable suggestions. That was my concern.

it might be simpler to have only the definitions with underscores, and then have a separate function that validates the literals, ensuring that there are no underscores unless the extension is enabled.

It's beautiful. I like your unified definition.
I will investigate how to implement the function for validation.
I have not understood the implementation manner in Lexer.x yet.
I'd be nice if someone tell me hints on how to implement it on trac or ghc-devs ML :)

By the way, should I create a ticket for this proposal?

@nomeata
Copy link
Contributor

nomeata commented Nov 15, 2017

By the way, should I create a ticket for this proposal?

Sure, that’d be helpful.

@takenobu-hs
Copy link
Contributor Author

I created the ticket (https://ghc.haskell.org/trac/ghc/ticket/14473).

In addition, I added the ticket number to 0009-numeric-underscores.rst and sent a PR (#95).

Thanks.

bgamari pushed a commit to ghc/ghc that referenced this pull request Jan 22, 2018
Implement the proposal of underscores in numeric literals.
Underscores in numeric literals are simply ignored.

The specification of the feature is available here:
https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/000
9-numeric-underscores.rst

For a discussion of the various choices:
ghc-proposals/ghc-proposals#76

Implementation detail:

* Added dynamic flag
  * `NumericUnderscores` extension flag is added for this feature.

* Alex "Regular expression macros" in Lexer.x
  * Add `@numspc` (numeric spacer) macro to represent multiple
    underscores.
  * Modify `@decimal`, `@decimal`, `@binary`, `@octal`, `@hexadecimal`,
    `@exponent`, and `@bin_exponent` macros to include `@numspc`.

* Alex "Rules" in Lexer.x
  * To be simpler, we have only the definitions with underscores.
    And then we have a separate function (`tok_integral` and `tok_frac`)
    that validates the literals.

* Validation functions in Lexer.x
  * `tok_integral` and `tok_frac` functions validate
    whether contain underscores or not.
    If `NumericUnderscores` extensions are not enabled,
    check that there are no underscores.
  * `tok_frac` function is created by merging `strtoken` and
    `init_strtoken`.
  * `init_strtoken` is deleted. Because it is no longer used.

* Remove underscores from target literal string
  * `parseUnsignedInteger`, `readRational__`, and `readHexRational} use
    the customized `span'` function to remove underscores.

* Added Testcase
  * testcase for NumericUnderscores enabled.
      NumericUnderscores0.hs and NumericUnderscores1.hs
  * testcase for NumericUnderscores disabled.
      NoNumericUnderscores0.hs and NoNumericUnderscores1.hs
  * testcase to invalid pattern for NumericUnderscores enabled.
      NumericUnderscoresFail0.hs and NumericUnderscoresFail1.hs

Test Plan: `validate` including the above testcase

Reviewers: goldfire, bgamari

Reviewed By: bgamari

Subscribers: carter, rwbarton, thomie

GHC Trac Issues: #14473

Differential Revision: https://phabricator.haskell.org/D4235
@nomeata nomeata added the Implemented The proposal has been implemented and has hit GHC master label Dec 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted The committee has decided to accept the proposal Implemented The proposal has been implemented and has hit GHC master
Development

Successfully merging this pull request may close these issues.

None yet