Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse arbitrary precision integers .. #170

Closed
wants to merge 3 commits into from

Conversation

Projects
None yet
@hhugo
Copy link
Contributor

commented Apr 19, 2015

.. and allow any char of [A-Za-z] as modifier for ints (previously 'l','L','n') and floats.
This give more freedom to ppx rewritters (what about a ppx for zarith or literal for complex)

Checks are performed when translating from Parsetree to Typedtree.
Unknow_literal is raised if the modifier is not recognized ([lLn]?)
Integer_overflow is raised as before.

@@ -2096,7 +2079,7 @@ class_longident:
toplevel_directive:
SHARP ident { Ptop_dir($2, Pdir_none) }
| SHARP ident STRING { Ptop_dir($2, Pdir_string (fst $3)) }
| SHARP ident INT { Ptop_dir($2, Pdir_int $3) }
| SHARP ident INT { Ptop_dir($2, Pdir_int (int_of_string $3)) }

This comment has been minimized.

Copy link
@hhugo

hhugo Apr 19, 2015

Author Contributor

This may fail

raise (Error(Literal_overflow "int", Location.curr lexbuf))
}
| int_literal { INT (Lexing.lexeme lexbuf) }
| (int_literal as num) (identchar + as modifier)

This comment has been minimized.

Copy link
@hhugo

hhugo Apr 19, 2015

Author Contributor

a 'modifier' could simply be ['a'-'z''A'-Z']

@Octachron

This comment has been minimized.

Copy link
Contributor

commented Apr 20, 2015

This patch looks really interesting. Would it not make sense to have the same kind of support for float litteral? I can see some applications for qualified float litterals: complex number, single or quadruple precision litteral for instance. Most of these applications are not general enough to be supported within the compiler but would make nice ppx extensions.

@@ -887,7 +886,8 @@ and directive_argument i ppf x =
match x with
| Pdir_none -> line i ppf "Pdir_none\n"
| Pdir_string (s) -> line i ppf "Pdir_string \"%s\"\n" s;
| Pdir_int (i) -> line i ppf "Pdir_int %d\n" i;

This comment has been minimized.

Copy link
@hhugo

hhugo Apr 21, 2015

Author Contributor

There was an error here because the argument i of the function was shadowed by the pattern matching,

This comment has been minimized.

Copy link
@hhugo

hhugo Apr 21, 2015

Author Contributor

15 years old typo !!

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Apr 21, 2015

@Octachron, I've updated the PR to add modifier on floats.

@@ -14,6 +14,12 @@

open Asttypes

type constant =
PConst_int of string * char option

This comment has been minimized.

Copy link
@bobot

bobot Apr 21, 2015

Contributor

The minus sign is kept inside the string. So there is a lot of string processing for this first character. Is it not easier to remove at lexing the minus sign from the string and keep the information in a separate bool?

@samoht

This comment has been minimized.

Copy link
Member

commented Apr 21, 2015

It could maybe be helpful if you add a few tests exercising the new code?

@Drup

This comment has been minimized.

Copy link
Contributor

commented May 7, 2015

I like the patch. I read the implementation and I think it's good.

I agree with @bobot on the minus sign.

I think there should be a function from Parsetree.constant to Asttypes.constant that do not need the typing env. It would be very useful for ppx writer. Returning an option would work well.

raise (Error(Literal_overflow "int", Location.curr lexbuf))
| int_literal { INT (Lexing.lexeme lexbuf, None) }
| int_literal literal_modifier
{

This comment has been minimized.

Copy link
@xavierleroy

xavierleroy May 7, 2015

Contributor

This would really benefit from using ocamllex's named sub-strings:

| (int_literal as lit) (literal_modifier as modif) { INT (lit, Some modif) }

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented May 11, 2015

I would prefer to keep the whole literal (minus sign included) inside the string. In most cases, it does not add any string processing as we already parse every int literal as negative numbers. (prepending "-" if not already negative, https://github.com/ocaml/ocaml/pull/170/files#diff-4408a3671b05f01d8b0dbe9df336e08bL173)

@hhugo hhugo force-pushed the hhugo:lex1 branch 2 times, most recently from 18c0f8a to c0f95b4 May 11, 2015

@hhugo hhugo referenced this pull request May 13, 2015

Closed

Synchronisation with OCaml #91

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented May 13, 2015

PR ready for camlp4 ocaml/camlp4#91

let cvt_int_aux str neg of_string =
if String.length str = 0 || str.[0]= '-'
then of_string str
else neg (of_string ("-" ^ str))

This comment has been minimized.

Copy link
@bobot

bobot Jun 16, 2015

Contributor

Could you add a comment for why you need this complicated function and a link to the related bts bug?

@@ -261,6 +263,40 @@ let type_constant = function
| Const_int64 _ -> instance_def Predef.type_int64
| Const_nativeint _ -> instance_def Predef.type_nativeint

(* To convert integer literals, allowing max_int + 1 (PR#4210) *)

This comment has been minimized.

Copy link
@bobot

bobot Jun 16, 2015

Contributor

I never understood this comment (which is in the original code) since it allows to fail well on max_int+1. What do you think of it?

@gasche

This comment has been minimized.

Copy link
Member

commented Jul 26, 2015

This one looks ready to merge. What is the status of ocaml/camlp4#90 ?

@hhugo hhugo force-pushed the hhugo:lex1 branch from adaf916 to cabce12 Jul 27, 2015

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Jul 27, 2015

Rebased on top of trunk.
Ready to merge unless one wants to add tests as proposed by @samoht

@gasche

This comment has been minimized.

Copy link
Member

commented Jul 27, 2015

Tests are always good. I'd love tests!

| Const_char (c) -> fprintf f "Const_char %02x" (Char.code c);
| Const_string (s, None) -> fprintf f "Const_string(%S,None)" s;
| Const_string (s, Some delim) ->
| PConst_int (i,None) -> fprintf f "Const_int %s" i;

This comment has been minimized.

Copy link
@gasche

gasche Jul 27, 2015

Member

Given that the format here is apparently meant to give re-parseability, you should reprint PConst rather than Const.

@gasche

This comment has been minimized.

Copy link
Member

commented Jul 27, 2015

If you write tests, it would be nice to include one checking that -1073741824 is still correctly parsed as an integer literal (This corresponds to min_int for 32 machines.), to avoid the issue described by this comment in MPR#4210.

@gasche

This comment has been minimized.

Copy link
Member

commented Jul 27, 2015

I have one worry with this patch: there is an ambiguity (resolved by the longest-lexeme rule) between literal modifiers and hexadecimal literals. For example, 32f would be parsed as a literal modifier, but 0x32f is a valid hexadecimal literal.

Wouldn't it be safer (for now) to either restrict literal modifiers to the non-hexadecimal cases (but this wouldn't be consistent with nlL which are valid for hexadecimal literals), or to letters strictly after f or F?

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Jul 27, 2015

How safer would that be ?
It seams to be similar to 1.2e and 1.2e3

@dbuenzli

This comment has been minimized.

Copy link
Contributor

commented Jul 27, 2015

.. and allow any char of [A-Za-z] as modifier for ints

This may break existing programs, right ?

@gasche

This comment has been minimized.

Copy link
Member

commented Jul 27, 2015

Not directly, in the sense that programs that relied on 123foo to parse were made illegal by #155 . That said, #155 has not yet been part of a release, so both changes together will break some programs, yes.

@bobzhang

This comment has been minimized.

Copy link
Member

commented Aug 3, 2015

So this patch is divided into two parts:

  1. make parsetree using string literals (more loyal to user input)

  2. extending lexer to support arbitrary modifiers

    With regard to 1, can we make changes to Asttypes.constant directly instead of creating a new type, Asttypes.constant is also used by Lambda, I am currently working on a project compiling lambda to readable javascript in the module level (still in the early days http://bobzhang.github.io/js-demo/), we could benefit such changes as well.

@hhugo hhugo force-pushed the hhugo:lex1 branch from 3b074a6 to 718d552 Oct 27, 2015

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Oct 27, 2015

I've just rebased this PR onto trunk. Are people happy the current state ?

@bobzhang, you proposal of updating Asttypes.constant and not introducing a new type does not really make sense to me. One might still want to propagate constant as string to lambda but I would not do it inside this PR.

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Nov 5, 2015

@damiendoligez, should I rebase and remove the temporary commit related to opam pin ?

@gasche

This comment has been minimized.

Copy link
Member

commented Nov 5, 2015

Yes, rebasing is always a good idea to keep the commit history clean (we do use bisect etc.), see Contributing: Clean patch series.

hhugo added some commits Apr 19, 2015

Parse arbitrary precision integers ..
.. and allow any identchar as modifier (previously 'l','L','n')
Also allow modifier for floats
This give more freedom to ppx rewritters (what about a ppx for zarith)

Checks are performed when translating from Parsetree to Typedtree.
Invalid_literal is raised if the modifier is not recognized ([lLn]?)
Integer_overflow is raised as before.

Lexer: use a-zA-Z for integer literal modifier

Lexer: Allow modifier on float

Clean wrt previous commits

Lexer: use named substring

Cleanup

typo

doc

fix after rebase

rebase on  trunk

Update typecore.ml

Fix printast.ml
Tests for constants with modifiers
add some tests

more tests

@hhugo hhugo force-pushed the hhugo:lex1 branch from 9f9c28b to 32c9bd1 Nov 6, 2015

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Nov 6, 2015

PR rebased and cleared.
Note that the travis will fail because camlp4 is not pinned anymore.

Changes Outdated
@@ -328,6 +328,9 @@ Features wishes:
(Florian Angeletti)
- GPR#252: improve build instructions in MSVC Windows README
(Philip Daian)
* GPR#170: Parse arbitrary precision integers and accept a single [A-Za-z]

This comment has been minimized.

Copy link
@gasche

gasche Nov 6, 2015

Member

Could you maybe add a line to clarify why this can break existing programs? It would probably suffice to just show an example: let a = 1 in (+) 123a was accepted but is now rejected. This would reassure users that, unless they did horrendous stuff, they should be quite safe.

(You should feel free to squash your Changelog commits together.)

update Change log
GPR#170 can break existing programs
especially ppx rewriter as the Parsetree is updated
@gasche

This comment has been minimized.

Copy link
Member

commented Nov 15, 2015

This looks good to merge, but I'm still mildly non-plussed by the interaction between modifiers in the a-f range and hexa literals. I assigned @damiendoligez so that he takes care of merging.

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Nov 24, 2015

@gasche, thinking more about it, I now agree with your concern.
How do you feel about restricting the range for modifiers to ['g'-'z''G'-'Z'] ?

@gasche

This comment has been minimized.

Copy link
Member

commented Nov 25, 2015

I think that this would be a good, safe thing to do -- we can always extend the feature later with more experience.

@damiendoligez

This comment has been minimized.

Copy link
Member

commented Dec 3, 2015

Merged with [g-zG-Z].

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Dec 4, 2015

The new test min_int.ml cannot be compiled on 32-bit architectures. I've disabled it for now.

If someone insists on keeping it, please tweak the test suite to exclude this test (or create the 32-bit counterpart).

@alainfrisch alainfrisch reopened this Dec 4, 2015

@gasche

This comment has been minimized.

Copy link
Member

commented Dec 4, 2015

We should have a warning on integer literals that do not fit on 32 machines, in the same spirit as the marshalling flag.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Dec 4, 2015

Note that there is already "ocamlc -compat-32", which triggers an error when the bytecode is not compatible with 32-bit.

@damiendoligez

This comment has been minimized.

Copy link
Member

commented Dec 4, 2015

I've "fixed" the test. Instead of adding heavy shell plumbing to compile a different source depending on the platform, I just changed it to the 32-bit min_int. This means the test is mostly useless on 64-bit, but the 32-bit platforms will test the corner case.

@hhugo hhugo deleted the hhugo:lex1 branch Dec 22, 2015

@hcarty hcarty referenced this pull request Jan 2, 2017

Closed

Underscore in numbers #929

kayceesrk added a commit to kayceesrk/ocaml that referenced this pull request Apr 9, 2018

@bobzhang bobzhang referenced this pull request Jul 16, 2018

Open

Upgrade to 4.06.1 #2755

@bobzhang

This comment has been minimized.

Copy link
Member

commented Nov 3, 2018

curious do we have any ppx make use of such extension

@Drup

This comment has been minimized.

Copy link
Contributor

commented Nov 3, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.