Preserve literal formatting in "Did you mean ...?" #10767

curiousleo · 2021-11-14T14:00:38Z

Implements "Proposal 1" in #10766.

Alternative implementation: see #10818.

User-visible changes

The goal of this PR is to preserve the original formatting of integer literals (int32, int64 and nativeint) in the "Did you mean ...?" hint.

Before

1 | let _ = Int64.(add 0xAA_BBl 2L);;
                       ^^^^^^^^
Error: This expression has type int32 but an expression was expected of type
         int64
  Hint: Did you mean `43707L'?

After

1 | let _ = Int64.(add 0xAA_BBl 2L);;
                       ^^^^^^^^
Error: This expression has type int32 but an expression was expected of type
         int64
  Hint: Did you mean `0xAA_BBL'?

API changes

Asttypes.constant changes: Const_int32, Const_int64 and Const_nativeint are now tuples; the second value is a string containing the original formatting of the literal (but without the suffix).

Note that Asttypes.mli warns about this module being unstable and part of compiler-libs.

curiousleo · 2021-11-14T14:00:54Z

This is a draft. In the first step, I have implemented the change to preserve the original formatting of int32, with tests. Doing the same for int64 and nativeint will probably be quite mechanical. However, I would like a first rough review to confirm that I'm on the right path here before making the same changes for int64 and nativeint.

Edit: the PR now includes changes for int64 and nativeint.

tools/dumpobj.ml

typing/parmatch.ml

curiousleo · 2021-11-14T17:29:37Z

typing/parmatch.ml

+            (function Constant(Const_int32 (i, _)) -> i | _ -> assert false)
+            (function i -> Tpat_constant(Const_int32 (i, Int32.to_string i)))


I don't know how "hot" this function is. If it is "hot", then it may be worth considering changing Const_int32 from Const_int32 of int32 * string to Const_int32 of int32 * string option and using None here instead of Int32.to_string to avoid the overhead of conversion. This would complicate handling of Const_int32 in some other places.

(Same for Const_int64 and Const_nativeint.)

TODO: Run a benchmark to see if this is an issue at all.

This is not really about performance; this function is only called once an appropriate candidate counter-example is found. (In some case this part of the space search may be reached several times due to GADT troubles occurring earlier, but still I think you would be hard-pressed to find a counter-example where the performance actually matter.) On the other hand, None has the benefit of providing an easy way out for constants that are produced by code, instead of parsed from user input, and this may be convenient in other places at well. You would know better, as you have a good grasp of this slice of functionality right now.

Thanks @gasche! For future reference: the function

(function i -> Tpat_constant(Const_int32 (i, Int32.to_string i)))

is passed as the make argument to build_other_constant, and make is only called once the counter-example has been established. I'm no longer worried about performance in this instance.

On the other hand, None has the benefit of providing an easy way out for constants that are produced by code, instead of parsed from user input, and this may be convenient in other places at well.

This makes me wonder where the string comes from in contexts where there is no source code, e.g. in dumpobj. So far in this PR, this place in parmatch.ml is the only place where we have to synthesise the string representation from the integer value. I would have expected there to be other places (e.g. in dumpobj) where this needs to happen as well, but haven't found any.

If there were more places where we would have to "fake" the original string using something like Int32.to_string then that would strengthen the case for string option.

(For reference: #10767 (comment) explains more of what is going on in build_other generally and why synthesising a decimal is a sensible choice here.)

curiousleo · 2021-11-14T17:34:13Z

tools/dumpobj.ml

+  | Const_base(Const_int32 (i, _)) -> printf "%ldl" i
+  | Const_base(Const_nativeint (i, _)) -> printf "%ndn" i
+  | Const_base(Const_int64 (i, _)) -> printf "%LdL" i


In all other places where pretty-printing takes place, the original string formatting (the second tuple element) is now used. I somewhat arbitrarily went for "normalisation to decimal" here, as this is "dumpobj", so the input is a binary, which means that we don't have the original formatting at hand anyway (I have not read the surrounding code in depth, so this is an assumption -- e.g. I'm not sure where the string representation actually comes from in this context).

curiousleo · 2021-11-14T17:45:44Z

typing/printpat.ml

-| Const_int32 i -> Printf.sprintf "%ldl" i
-| Const_int64 i -> Printf.sprintf "%LdL" i
-| Const_nativeint i -> Printf.sprintf "%ndn" i
+| Const_int32 (_, s) -> Printf.sprintf "%sl" s
+| Const_int64 (_, s) -> Printf.sprintf "%sL" s
+| Const_nativeint (_, s) -> Printf.sprintf "%sn" s


Q: Where is this used? Is the old behaviour (conversion to decimal) more useful here?

curiousleo · 2021-11-14T17:46:21Z

lambda/printlambda.ml

-  | Const_base(Const_int32 n) -> fprintf ppf "%lil" n
-  | Const_base(Const_int64 n) -> fprintf ppf "%LiL" n
-  | Const_base(Const_nativeint n) -> fprintf ppf "%nin" n
+  | Const_base(Const_int32 (_, s)) -> fprintf ppf "%sl" s
+  | Const_base(Const_int64 (_, s)) -> fprintf ppf "%sL" s
+  | Const_base(Const_nativeint (_, s)) -> fprintf ppf "%sn" s


Q: Would it be more useful to keep the previous behaviour (conversion to decimal) here?

Before: 1 | let _ = Int64.(add 0xAA_BBl 2L);; ^^^^^^^^ Error: This expression has type int32 but an expression was expected of type int64 Hint: Did you mean `43707L'? After: 1 | let _ = Int64.(add 0xAA_BBl 2L);; ^^^^^^^^ Error: This expression has type int32 but an expression was expected of type int64 Hint: Did you mean `0xAA_BBL'?

typing/typecore.ml

curiousleo · 2021-11-15T10:48:55Z

typing/printtyped.ml

-  | Const_int32 (i) -> fprintf f "Const_int32 %ld" i;
-  | Const_int64 (i) -> fprintf f "Const_int64 %Ld" i;
-  | Const_nativeint (i) -> fprintf f "Const_nativeint %nd" i;
+  | Const_int32 (_, s) -> fprintf f "Const_int32 %sl" s;
+  | Const_int64 (_, s) -> fprintf f "Const_int64 %sL" s;
+  | Const_nativeint (_, s) -> fprintf f "Const_nativeint %sn" s;


Q: Would it be more useful to keep the previous behaviour (conversion to decimal) here?

curiousleo · 2021-11-15T10:55:57Z

Ready for review.

I have left a few questions ("Q: ...?") as comments where I could use feedback. They essentially boil down to this question: given that we now carry the original literal around as a string, whenever we output the literal, we have a choice: we can either use the original literal formatting, or we can normalise to decimal formatting.

As of right now, this PR changes the formatting so instead of decimals without underscores, the original formatting is used everywhere except in dumpobj.

The more conservative approach would be to keep normalising to decimals without underscores everywhere (the current behaviour) except in the type hint (as preserving the literal formatting in the type hint is the goal of this PR).

I don't really have an opinion or good arguments either way, so let me know if you do!

Octachron · 2021-12-06T14:38:51Z

I am not that fond of this PR current approach of modifying the constant type since that type is shared by the parser, typechecker and part of the middle-end. I have the impression that we could keep this change confined to the typechecker by adding an extra field to the Tpat_constant and Texp_constant constructors.

gasche · 2021-12-06T14:59:42Z

So far, existing approaches to keep more information on literals have changed things in the Parsetre.constant type with constructors Pconst_* constructors (the n in 32n is stored there), and on one occasion also in the Asttypes.constant types with the Const_* constructors (the complete location of string literals, coming from #8820). The currently-proposed approach seems consistent, and the idea of storing the information off-band in the T{exp,pat}_constant nodes does not sound enticing.

On the other hand, maybe we want to split the notion of constants used in the typedtree from the one of the middle-end, so we would have a separate type Lambda.constant (for example) with less user-facing information? But then how do we know how deep we want to preserve this information on literal formatting for use in compiler messages? I would guess that other parts lower in the compiler could on some occasions discuss the value of literals with users? (For example a does-not-fit-on-the-current-architecture check.)

curiousleo · 2021-12-06T16:08:03Z

Thanks @Octachron and @gasche for taking a look!

I don't have enough context to judge the pros and cons of

(A) modifying constant (as currently in this PR) vs.
(B) modifying T{exp,pat}_constant vs.
(C) introducing a separate type.

I am willing to prototype though. Would you find it helpful if I had a go at implementing a proof-of-concept PR for the improved error message on top of just a change to T{exp,pat}_constant (variant B)?

I'm afraid that proposal (C) is a bit too abstract for me to go and create a proof-of-concept, I would need more details on that idea.

In short, I appreciate that there are design issues at play here that I don't have much context on. But I would like to keep this PR moving. If there is anything I can do to to move the discussion forward, please let me know.

curiousleo · 2021-12-06T16:08:25Z

Mis-clicked, sorry. I did not mean to close the PR.

gasche · 2021-12-06T16:11:18Z

I'll let @Octachron make a suggestion :-) (Your proposal to prototype the T{pat,exp}_constant approach sounds like a sensible answer to his remark if you don't mind the extra work, but then maybe he is convinced enough by my consistency argument to have changed his mind already?)

Octachron · 2021-12-07T13:56:06Z

I agree that modifying the typedtree for the error message is not ideal.

But then, I would argue that a better fix would be to make sure that the error-handling part of the functions Typecore.unifify_pat and Typecore.unify_expr propagates the right information to the hint literal machinery rather than modifying another AST.

In particular, the Expr_type_clash and Pat_type_clash constructors could be updated to contain a (possibly relevant) parsetree node rather that a typedtree node (which is currently only used by the hint literal machinery).

curiousleo · 2021-12-09T09:53:04Z

@Octachron wrote:

But then, I would argue that a better fix would be to make sure that the error-handling part of the functions Typecore.unifify_pat and Typecore.unify_expr propagates the right information to the hint literal machinery rather than modifying another AST.

In particular, the Expr_type_clash and Pat_type_clash constructors could be updated to contain a (possibly relevant) parsetree node rather that a typedtree node (which is currently only used by the hint literal machinery).

I have an implementation based on this idea in #10818. It is a smaller, more local change that still achieves the goal of preserving the original integer formatting in the type hint.

Let me know what you think -- my hunch is that #10818 is preferable, in which case this PR can be closed.

curiousleo · 2022-07-13T09:38:52Z

Closing in favour of #10818.

curiousleo commented Nov 14, 2021

View reviewed changes

tools/dumpobj.ml Outdated Show resolved Hide resolved

curiousleo commented Nov 14, 2021

View reviewed changes

typing/parmatch.ml Outdated Show resolved Hide resolved

curiousleo commented Nov 14, 2021

View reviewed changes

curiousleo force-pushed the preserve-integer-literal-format branch from fc5fec1 to 480af00 Compare November 14, 2021 18:18

curiousleo changed the title ~~Preserve original formatting in integer literal type hint~~ Preserve literal formatting in "Did you mean ...?" Nov 14, 2021

curiousleo commented Nov 15, 2021

View reviewed changes

typing/typecore.ml Show resolved Hide resolved

curiousleo commented Nov 15, 2021

View reviewed changes

curiousleo marked this pull request as ready for review November 15, 2021 10:56

curiousleo closed this Dec 6, 2021

curiousleo reopened this Dec 6, 2021

curiousleo mentioned this pull request Dec 9, 2021

Preserve integer literal formatting in type hint #10818

Merged

curiousleo closed this Jul 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Preserve literal formatting in "Did you mean ...?" #10767

Preserve literal formatting in "Did you mean ...?" #10767

curiousleo commented Nov 14, 2021 •

edited

curiousleo commented Nov 14, 2021 •

edited

curiousleo Nov 14, 2021

gasche Nov 14, 2021

curiousleo Nov 14, 2021

curiousleo Nov 14, 2021

curiousleo Nov 14, 2021 •

edited

curiousleo Nov 14, 2021 •

edited

curiousleo Nov 14, 2021 •

edited

curiousleo Nov 15, 2021

curiousleo commented Nov 15, 2021 •

edited

Octachron commented Dec 6, 2021

gasche commented Dec 6, 2021

curiousleo commented Dec 6, 2021

curiousleo commented Dec 6, 2021

gasche commented Dec 6, 2021

Octachron commented Dec 7, 2021

curiousleo commented Dec 9, 2021

curiousleo commented Jul 13, 2022

		(function Constant(Const_int32 (i, _)) -> i \| _ -> assert false)
		(function i -> Tpat_constant(Const_int32 (i, Int32.to_string i)))

Preserve literal formatting in "Did you mean ...?" #10767

Preserve literal formatting in "Did you mean ...?" #10767

Conversation

curiousleo commented Nov 14, 2021 • edited

User-visible changes

API changes

curiousleo commented Nov 14, 2021 • edited

curiousleo Nov 14, 2021

Choose a reason for hiding this comment

gasche Nov 14, 2021

Choose a reason for hiding this comment

curiousleo Nov 14, 2021

Choose a reason for hiding this comment

curiousleo Nov 14, 2021

Choose a reason for hiding this comment

curiousleo Nov 14, 2021 • edited

Choose a reason for hiding this comment

curiousleo Nov 14, 2021 • edited

Choose a reason for hiding this comment

curiousleo Nov 14, 2021 • edited

Choose a reason for hiding this comment

curiousleo Nov 15, 2021

Choose a reason for hiding this comment

curiousleo commented Nov 15, 2021 • edited

Octachron commented Dec 6, 2021

gasche commented Dec 6, 2021

curiousleo commented Dec 6, 2021

curiousleo commented Dec 6, 2021

gasche commented Dec 6, 2021

Octachron commented Dec 7, 2021

curiousleo commented Dec 9, 2021

curiousleo commented Jul 13, 2022

curiousleo commented Nov 14, 2021 •

edited

curiousleo commented Nov 14, 2021 •

edited

curiousleo Nov 14, 2021 •

edited

curiousleo Nov 14, 2021 •

edited

curiousleo Nov 14, 2021 •

edited

curiousleo commented Nov 15, 2021 •

edited