Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propose underscores in numeric literals #76

Merged
merged 9 commits into from Nov 14, 2017

Conversation

Projects
None yet
@takenobu-hs
Copy link
Contributor

commented Sep 30, 2017

The proposal has been accepted; the following discussion is mostly of historic interest.


This is a proposal to add underscores to numeric literals.
Underscores (_) in numeric literals are simply ignored.

Rendered

Costs and Drawbacks
-------------------
* Implementation costs are mostly related to lexers.
* Maintenance costs are related to compatibility. Compatibility can be handled with language extension like ``NumericUnderscores``.

This comment has been minimized.

Copy link
@nomeata

nomeata Sep 30, 2017

Contributor

You should probably make the proposal right away suggest this language extension, as a language extensions is required for such changes.

This comment has been minimized.

Copy link
@bgamari

bgamari Sep 30, 2017

Contributor

We could fold it into BinaryLiterals or something like this, but yes, I generally agree that a new extension is probably best.

This comment has been minimized.

Copy link
@takenobu-hs

takenobu-hs Oct 1, 2017

Author Contributor

Thanks for the feedback.
I agree with the language extension.
I will update the proposal for specifying language extension.

This comment has been minimized.

Copy link
@takenobu-hs

takenobu-hs Oct 1, 2017

Author Contributor

I explicitly described the language extension at this commit.

@cartazio

This comment has been minimized.

Copy link

commented Sep 30, 2017


.. code-block:: none

x = 10 * 000 * 000 :: Int

This comment has been minimized.

Copy link
@phadej

phadej Sep 30, 2017

Contributor

FWIW, for primitive types, like Int GHC does constant fold the expression.

This comment has been minimized.

Copy link
@takenobu-hs

takenobu-hs Oct 1, 2017

Author Contributor

Thank you for helpful information.


.. code-block:: none

decimal → digit{[_ | digit]}

This comment has been minimized.

Copy link
@phadej

phadej Sep 30, 2017

Contributor

This would allow trailing underscore, i.e. 1000_1000_, do we want such thing? If we do, then there should be an example in proposal.

This comment has been minimized.

Copy link
@takenobu-hs

takenobu-hs Oct 1, 2017

Author Contributor

Thank you.
For float only, I assumed the trailing underscore.

For example, the following description is possible.
gravity = 6.674_08_e−11

@cartazio

This comment has been minimized.

Copy link

commented Sep 30, 2017

@takenobu-hs

This comment has been minimized.

Copy link
Contributor Author

commented Oct 1, 2017

Heck, do we have a candidate patch yet? I might try if there’s not one already!

@cartazio, Thank you, I'm glad.
I am very glad if you do it!

@takenobu-hs

This comment has been minimized.

Copy link
Contributor Author

commented Oct 1, 2017

i'd personally thing we dont want trailing underscores, but i'm open to being convinc

phadej is also mentioned.
Please give me feedback on the necessity of trailing underscore.
The specification definition may be a little complicated.

@takenobu-hs

This comment has been minimized.

Copy link
Contributor Author

commented Oct 1, 2017

I studied alternative proposal for the trailing underscore.
Please give me feedback which is better.

(1) current proposal:

decimal     →  digit{_ | digit}
octal       →  octit{_ | octit}
hexadecimal →  hexit{_ | hexit}
binary      →  binit{_ | binit}

exponent → (e | E) [+ | -] decimal

(2) alternative proposal for trailing underscore:

decimal     →  digit[{_ | digit} digit]
octal       →  octit[{_ | octit} octit]
hexadecimal →  hexit[{_ | hexit} hexit]
binary      →  binit[{_ | binit} binit]

exponent → [_] (e | E) [+ | -] decimal

Examples with (2) alternative proposal:

x0 = 1_000_000   -- valid
x1 = 1000000_    -- invalid
x2 = _1000000    -- invalid

e0 = 0.0001      -- valid
e1 = 0.000_1     -- valid
e2 = 0_.0001     -- invalid
e3 = _0.0001     -- invalid
e4 = 0._0001     -- invalid
e5 = 0.0001_     -- invalid

f0 = 1e+23       -- valid
f1 = 1_e+23      -- valid
f2 = 1e_+23      -- invalid

g0 = 1e+23       -- valid
g1 = 1e+_23      -- invalid
g2 = 1e+23_      -- invalid
@osa1

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2017

Strong 👍 from me. We often do things like

1 * 1000 * 1000
threadDelay 5000000 -- five sec

first one is weird (GHC can evaluate that in compile time but still), second one is dangerous if you forget to update the commend after updating the code. Rust has this feature and I often use it.

@treeowl

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2017

The first "alternative" currently has multiplication by zero, which looks like a mistake. I'm generally in favor of the proposal. I don't think trailing underscores are useful.

@treeowl

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2017

Previous comment edited to remove bogus information. Sorry for the confusion.

@osa1

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2017

Agreed that numerals with leading underscore should not be considered as a number. Trailing underscores don't seem useful but at least harmless so no strong feelings about that.

@treeowl

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2017

@simonpj

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2017

I'm happy with this.

@merijn

This comment has been minimized.

Copy link

commented Oct 2, 2017

I'm strongly +1 on this too.

@takenobu-hs

This comment has been minimized.

Copy link
Contributor Author

commented Oct 2, 2017

Hi everyone, I'm glad responses! Thank you very much.
I updated proposal:

  • Modify specification to remove trailing underscores
  • Add validity examples
  • Correct and add alternative examples

I chose the option to remove the trailing underscore.
I think that is easier to recognize as a number.

Please tell me if something is wrong.

@takenobu-hs

This comment has been minimized.

Copy link
Contributor Author

commented Oct 14, 2017

I will wait for a few weeks of discussion period :)

By the way, I wrote a patch for my studies [1].
However, I am glad that Carter or someone will write a more suitable patch.
Especially, it would be nice if someone could express Lexer simpler.

[1] ghc/ghc@master...takenobu-hs:wip/numeric-underscores

@cartazio

This comment has been minimized.

Copy link

commented Oct 19, 2017

@takenobu-hs

This comment has been minimized.

Copy link
Contributor Author

commented Oct 20, 2017

@cartazio thank you, I am grateful for your help :)
I am not in a hurry, so please do it when you have time.
Please take care of yourself.

@cartazio

This comment has been minimized.

Copy link

commented Oct 20, 2017

@takenobu-hs

This comment has been minimized.

Copy link
Contributor Author

commented Oct 20, 2017

Thank you, I will add a testcase for failures / disallowed patterns in testsuite/tests/parser/should_fail/ directory.

@takenobu-hs

This comment has been minimized.

Copy link
Contributor Author

commented Oct 21, 2017

Hi, I added the testcase in this [1].
(testsuite/tests/parser/should_fail/NumericUnderscoresFail0.hs)
Please let me know if you have any problems. I am not in a hurry. Thanks.

[1] ghc/ghc@master...takenobu-hs:wip/numeric-underscores

@takenobu-hs

This comment has been minimized.

Copy link
Contributor Author

commented Oct 27, 2017

Four weeks have passed since I submitted this proposal.
I think the discussion has converged.
I would like to submit this for a decision.

cc @nomeata

@nomeata

This comment has been minimized.

Copy link
Contributor

commented Nov 1, 2017

Committee decision process started. Thanks for the ping, @bitemyapp, I missed the earlier one among too many GitHub notifications.

@takenobu-hs

This comment has been minimized.

Copy link
Contributor Author

commented Nov 2, 2017

@nomeata, thank you very much :)

@cartazio

This comment has been minimized.

Copy link

commented Nov 2, 2017

@takenobu-hs

This comment has been minimized.

Copy link
Contributor Author

commented Nov 3, 2017

@cartazio thank you always.
If this proposal is accepted, I will send the patch to phab.
I would be happy if you could give me comments there :)

@yav

This comment has been minimized.

Copy link
Contributor

commented Nov 7, 2017

I think we should allow underscores between the base specifier and the digits for binary, octal and hex literals. For example, I think it would be handy to write 0x_FFFF_FFFF instead of 0xFFFF_FFFF

@takenobu-hs

This comment has been minimized.

Copy link
Contributor Author

commented Nov 8, 2017

@yav thanks for the advice.
I considered it as follows.

Current syntax (Haskell Report 2010 + BinaryLiterals + HexFloatLiterals):

decimal     →  digit{digit}
octal       →  octit{octit}
hexadecimal →  hexit{hexit}
binary      →  binit{binit}                                  -- BinaryLiterals

integer →  decimal
         | 0 (o | O) octal
         | 0 (x | X) hexadecimal
         | 0 (b | B) binary                                  -- BinaryLiterals

float →  decimal . decimal [exponent]
       | decimal exponent
       | 0 (x | X) hexadecimal . hexadecimal [bin_exponent]  -- HexFloatLiterals
       | 0 (x | X) hexadecimal bin_exponent                  -- HexFloatLiterals

exponent     →  (e | E) [+ | -] decimal
bin_exponent →  (p | P) [+ | -] decimal                      -- HexFloatLiterals

This proposal (revised):

decimal     →  digit[{_ | digit} digit]
octal       →  octit[{_ | octit} octit]
hexadecimal →  hexit[{_ | hexit} hexit]
binary      →  binit[{_ | binit} binit]

integer →  decimal
         | 0 (o | O) [_] octal                                      -- *** for 0o_123
         | 0 (x | X) [_] hexadecimal                                -- *** for 0x_ffff
         | 0 (b | B) [_] binary                                     -- *** for 0b_0101

float →  decimal . decimal [_] [exponent]
       | decimal [_] exponent
       | 0 (x | X) [_] hexadecimal . hexadecimal [[_]bin_exponent]  -- *** for 0x_ff.ff
       | 0 (x | X) [_] hexadecimal [_] bin_exponent                 -- *** for 0x_ffp1

The lines of *** are the parts you pointed out.
Please tell me if there is any misunderstanding.
If this is correct, I will update the proposal file.

@cartazio

This comment has been minimized.

Copy link

commented Nov 8, 2017

@yav

This comment has been minimized.

Copy link
Contributor

commented Nov 8, 2017

@takenobu-hs Yep, that's what I was thinking.

@takenobu-hs

This comment has been minimized.

Copy link
Contributor Author

commented Nov 9, 2017

@yav thanks for confirmation.

@cartazio thanks for a clear perspective.
I investigated to make it easy to understand the changed part of the specification.
I revise the proposal as follows. The part of numSpacer are the change points in this proposal.

This proposal (revised):

-- places affected by this proposal
numSpacer = _

-- underscores between digits
decimal     →  digit[{numSpacer | digit} digit]
octal       →  octit[{numSpacer | octit} octit]
hexadecimal →  hexit[{numSpacer | hexit} hexit]
binary      →  binit[{numSpacer | binit} binit]

-- an underscore succeeding the base specifier
integer →  decimal
         | 0 (o | O) [numSpacer] octal
         | 0 (x | X) [numSpacer] hexadecimal
         | 0 (b | B) [numSpacer] binary

float →  decimal . decimal [exponent]
       | decimal exponent
       | 0 (x | X) [numSpacer] hexadecimal . hexadecimal [bin_exponent]
       | 0 (x | X) [numSpacer] hexadecimal bin_exponent

-- an underscore preceding the exponent
exponent     →  [numSpacer] (e | E) [+ | -] decimal
bin_exponent →  [numSpacer] (p | P) [+ | -] decimal

Please let me know if it is insufficient.
If this is sufficient, I will update the proposal file.

@yav

This comment has been minimized.

Copy link
Contributor

commented Nov 9, 2017

@takenobu-hs I think we should allow multiple underscores next to each other, and at least one of your examples in the proposal suggests that you also think so. To accommodate that I think that numSpacer should be _+ (i.e. 1 or more underscores).

Also, would you mind updating the actual proposal with the current version of the syntax, so that we have the whole proposal in a single place.

Here's another suggestion: I would drop the "Current specification of numeric literals" section. It is enough to just have "current syntax" and "new syntax", or even just "new syntax".

Modify the specification for feedback of discussion
Add underscores between the base specifier and the digits. (from yav)
Modify underscore before the exponent and after the base specifier to multiple.
Correspond to HexFloatLiterals extention.
Reconstruction of section on current specification. (from yav)
Reconstruction of new proposal notation (from cartazio)
@takenobu-hs

This comment has been minimized.

Copy link
Contributor Author

commented Nov 10, 2017

@yav thanks for the review.

I think we should allow multiple underscores next to each other, and at least one of your examples in the proposal suggests that you also think so. To accommodate that I think that numSpacer should be _+ (i.e. 1 or more underscores).

In the current proposal, we could allow multiple underscores in decimal, octal, hexadecimal, and binary.
Because {numSpacer | digit} represents zero or more repetitions in Haskell Report 2010.
For example decimal allows 5__6.

In addition, I changed the underscore before the exponent and after the base specifier to multiple in this revision.

Also, would you mind updating the actual proposal with the current version of the syntax, so that we have the whole proposal in a single place.

Thanks. I pushed the actual proposal with the current version of the syntax.

Here's another suggestion: I would drop the "Current specification of numeric literals" section. It is enough to just have "current syntax" and "new syntax", or even just "new syntax".

Thanks. That makes the proposal easier to understand.
I dropped the "Current specification of numeric literals" section. And I reconstructed the specification part.

Please review the revised version.

@takenobu-hs

This comment has been minimized.

Copy link
Contributor Author

commented Nov 10, 2017

@yav I updated numSpacer to make regular expressions more uniform and simple.
I think that the changes from the current specification became easier to understand.
Please review this revised version.

@yav

This comment has been minimized.

Copy link
Contributor

commented Nov 14, 2017

Ok, I've marked this as accepted.

@takenobu-hs you should consider submitting your patch to phabricator for review. I had a quick look at it and I have just one major suggestion: instead of having two sets of definitions for the literals (one with underscores and one without), it might be simpler to have only the definitions with underscores, and then have a separate function that validates the literals, ensuring that there are no underscores unless the extension is enabled.

@nomeata nomeata merged commit e5b5173 into ghc-proposals:master Nov 14, 2017

@takenobu-hs

This comment has been minimized.

Copy link
Contributor Author

commented Nov 15, 2017

Hi everyone thanks for a lot of advice and comments.
@yav, @nomeata, thanks for accept and merge.

@takenobu-hs you should consider submitting your patch to phabricator for review.

I will submit the patch to phabricator after improving the following.

I had a quick look at it and I have just one major suggestion: instead of having two sets of definitions for the literals (one with underscores and one without),

Thanks for valuable suggestions. That was my concern.

it might be simpler to have only the definitions with underscores, and then have a separate function that validates the literals, ensuring that there are no underscores unless the extension is enabled.

It's beautiful. I like your unified definition.
I will investigate how to implement the function for validation.
I have not understood the implementation manner in Lexer.x yet.
I'd be nice if someone tell me hints on how to implement it on trac or ghc-devs ML :)

By the way, should I create a ticket for this proposal?

@nomeata

This comment has been minimized.

Copy link
Contributor

commented Nov 15, 2017

By the way, should I create a ticket for this proposal?

Sure, that’d be helpful.

@takenobu-hs

This comment has been minimized.

Copy link
Contributor Author

commented Nov 16, 2017

I created the ticket (https://ghc.haskell.org/trac/ghc/ticket/14473).

In addition, I added the ticket number to 0009-numeric-underscores.rst and sent a PR (#95).

Thanks.

bgamari added a commit to ghc/ghc that referenced this pull request Jan 22, 2018

Implement underscores in numeric literals (NumericUnderscores extension)
Implement the proposal of underscores in numeric literals.
Underscores in numeric literals are simply ignored.

The specification of the feature is available here:
https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/000
9-numeric-underscores.rst

For a discussion of the various choices:
ghc-proposals/ghc-proposals#76

Implementation detail:

* Added dynamic flag
  * `NumericUnderscores` extension flag is added for this feature.

* Alex "Regular expression macros" in Lexer.x
  * Add `@numspc` (numeric spacer) macro to represent multiple
    underscores.
  * Modify `@decimal`, `@decimal`, `@binary`, `@octal`, `@hexadecimal`,
    `@exponent`, and `@bin_exponent` macros to include `@numspc`.

* Alex "Rules" in Lexer.x
  * To be simpler, we have only the definitions with underscores.
    And then we have a separate function (`tok_integral` and `tok_frac`)
    that validates the literals.

* Validation functions in Lexer.x
  * `tok_integral` and `tok_frac` functions validate
    whether contain underscores or not.
    If `NumericUnderscores` extensions are not enabled,
    check that there are no underscores.
  * `tok_frac` function is created by merging `strtoken` and
    `init_strtoken`.
  * `init_strtoken` is deleted. Because it is no longer used.

* Remove underscores from target literal string
  * `parseUnsignedInteger`, `readRational__`, and `readHexRational} use
    the customized `span'` function to remove underscores.

* Added Testcase
  * testcase for NumericUnderscores enabled.
      NumericUnderscores0.hs and NumericUnderscores1.hs
  * testcase for NumericUnderscores disabled.
      NoNumericUnderscores0.hs and NoNumericUnderscores1.hs
  * testcase to invalid pattern for NumericUnderscores enabled.
      NumericUnderscoresFail0.hs and NumericUnderscoresFail1.hs

Test Plan: `validate` including the above testcase

Reviewers: goldfire, bgamari

Reviewed By: bgamari

Subscribers: carter, rwbarton, thomie

GHC Trac Issues: #14473

Differential Revision: https://phabricator.haskell.org/D4235

@bravit bravit added the Proposal label Nov 11, 2018

@nomeata nomeata added the Implemented label Dec 3, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.