Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve literal formatting in "Did you mean ...?" #10767

Closed

Conversation

curiousleo
Copy link
Contributor

@curiousleo curiousleo commented Nov 14, 2021

Implements "Proposal 1" in #10766.

Alternative implementation: see #10818.

User-visible changes

The goal of this PR is to preserve the original formatting of integer literals (int32, int64 and nativeint) in the "Did you mean ...?" hint.

Before

1 | let _ = Int64.(add 0xAA_BBl 2L);;
                       ^^^^^^^^
Error: This expression has type int32 but an expression was expected of type
         int64
  Hint: Did you mean `43707L'?

After

1 | let _ = Int64.(add 0xAA_BBl 2L);;
                       ^^^^^^^^
Error: This expression has type int32 but an expression was expected of type
         int64
  Hint: Did you mean `0xAA_BBL'?

API changes

Asttypes.constant changes: Const_int32, Const_int64 and Const_nativeint are now tuples; the second value is a string containing the original formatting of the literal (but without the suffix).

Note that Asttypes.mli warns about this module being unstable and part of compiler-libs.

@curiousleo
Copy link
Contributor Author

curiousleo commented Nov 14, 2021

This is a draft. In the first step, I have implemented the change to preserve the original formatting of int32, with tests. Doing the same for int64 and nativeint will probably be quite mechanical. However, I would like a first rough review to confirm that I'm on the right path here before making the same changes for int64 and nativeint.

Edit: the PR now includes changes for int64 and nativeint.

tools/dumpobj.ml Outdated Show resolved Hide resolved
typing/parmatch.ml Outdated Show resolved Hide resolved
Comment on lines +1010 to +1011
(function Constant(Const_int32 (i, _)) -> i | _ -> assert false)
(function i -> Tpat_constant(Const_int32 (i, Int32.to_string i)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how "hot" this function is. If it is "hot", then it may be worth considering changing Const_int32 from Const_int32 of int32 * string to Const_int32 of int32 * string option and using None here instead of Int32.to_string to avoid the overhead of conversion. This would complicate handling of Const_int32 in some other places.

(Same for Const_int64 and Const_nativeint.)

TODO: Run a benchmark to see if this is an issue at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not really about performance; this function is only called once an appropriate candidate counter-example is found. (In some case this part of the space search may be reached several times due to GADT troubles occurring earlier, but still I think you would be hard-pressed to find a counter-example where the performance actually matter.) On the other hand, None has the benefit of providing an easy way out for constants that are produced by code, instead of parsed from user input, and this may be convenient in other places at well. You would know better, as you have a good grasp of this slice of functionality right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gasche! For future reference: the function

(function i -> Tpat_constant(Const_int32 (i, Int32.to_string i)))

is passed as the make argument to build_other_constant, and make is only called once the counter-example has been established. I'm no longer worried about performance in this instance.

On the other hand, None has the benefit of providing an easy way out for constants that are produced by code, instead of parsed from user input, and this may be convenient in other places at well.

This makes me wonder where the string comes from in contexts where there is no source code, e.g. in dumpobj. So far in this PR, this place in parmatch.ml is the only place where we have to synthesise the string representation from the integer value. I would have expected there to be other places (e.g. in dumpobj) where this needs to happen as well, but haven't found any.

If there were more places where we would have to "fake" the original string using something like Int32.to_string then that would strengthen the case for string option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(For reference: #10767 (comment) explains more of what is going on in build_other generally and why synthesising a decimal is a sensible choice here.)

Comment on lines +92 to +94
| Const_base(Const_int32 (i, _)) -> printf "%ldl" i
| Const_base(Const_nativeint (i, _)) -> printf "%ndn" i
| Const_base(Const_int64 (i, _)) -> printf "%LdL" i
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all other places where pretty-printing takes place, the original string formatting (the second tuple element) is now used. I somewhat arbitrarily went for "normalisation to decimal" here, as this is "dumpobj", so the input is a binary, which means that we don't have the original formatting at hand anyway (I have not read the surrounding code in depth, so this is an assumption -- e.g. I'm not sure where the string representation actually comes from in this context).

Comment on lines -32 to +34
| Const_int32 i -> Printf.sprintf "%ldl" i
| Const_int64 i -> Printf.sprintf "%LdL" i
| Const_nativeint i -> Printf.sprintf "%ndn" i
| Const_int32 (_, s) -> Printf.sprintf "%sl" s
| Const_int64 (_, s) -> Printf.sprintf "%sL" s
| Const_nativeint (_, s) -> Printf.sprintf "%sn" s
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Where is this used? Is the old behaviour (conversion to decimal) more useful here?

Comment on lines -29 to +31
| Const_base(Const_int32 n) -> fprintf ppf "%lil" n
| Const_base(Const_int64 n) -> fprintf ppf "%LiL" n
| Const_base(Const_nativeint n) -> fprintf ppf "%nin" n
| Const_base(Const_int32 (_, s)) -> fprintf ppf "%sl" s
| Const_base(Const_int64 (_, s)) -> fprintf ppf "%sL" s
| Const_base(Const_nativeint (_, s)) -> fprintf ppf "%sn" s
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Would it be more useful to keep the previous behaviour (conversion to decimal) here?

Before:

    1 | let _ = Int64.(add 0xAA_BBl 2L);;
                           ^^^^^^^^
    Error: This expression has type int32 but an expression was expected of type
             int64
      Hint: Did you mean `43707L'?

After:

    1 | let _ = Int64.(add 0xAA_BBl 2L);;
                           ^^^^^^^^
    Error: This expression has type int32 but an expression was expected of type
             int64
      Hint: Did you mean `0xAA_BBL'?
@curiousleo curiousleo force-pushed the preserve-integer-literal-format branch from fc5fec1 to 480af00 Compare November 14, 2021 18:18
@curiousleo curiousleo changed the title Preserve original formatting in integer literal type hint Preserve literal formatting in "Did you mean ...?" Nov 14, 2021
Comment on lines -72 to +74
| Const_int32 (i) -> fprintf f "Const_int32 %ld" i;
| Const_int64 (i) -> fprintf f "Const_int64 %Ld" i;
| Const_nativeint (i) -> fprintf f "Const_nativeint %nd" i;
| Const_int32 (_, s) -> fprintf f "Const_int32 %sl" s;
| Const_int64 (_, s) -> fprintf f "Const_int64 %sL" s;
| Const_nativeint (_, s) -> fprintf f "Const_nativeint %sn" s;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Would it be more useful to keep the previous behaviour (conversion to decimal) here?

@curiousleo
Copy link
Contributor Author

curiousleo commented Nov 15, 2021

Ready for review.

I have left a few questions ("Q: ...?") as comments where I could use feedback. They essentially boil down to this question: given that we now carry the original literal around as a string, whenever we output the literal, we have a choice: we can either use the original literal formatting, or we can normalise to decimal formatting.

As of right now, this PR changes the formatting so instead of decimals without underscores, the original formatting is used everywhere except in dumpobj.

The more conservative approach would be to keep normalising to decimals without underscores everywhere (the current behaviour) except in the type hint (as preserving the literal formatting in the type hint is the goal of this PR).

I don't really have an opinion or good arguments either way, so let me know if you do!

@curiousleo curiousleo marked this pull request as ready for review November 15, 2021 10:56
@Octachron
Copy link
Member

I am not that fond of this PR current approach of modifying the constant type since that type is shared by the parser, typechecker and part of the middle-end. I have the impression that we could keep this change confined to the typechecker by adding an extra field to the Tpat_constant and Texp_constant constructors.

@gasche
Copy link
Member

gasche commented Dec 6, 2021

So far, existing approaches to keep more information on literals have changed things in the Parsetre.constant type with constructors Pconst_* constructors (the n in 32n is stored there), and on one occasion also in the Asttypes.constant types with the Const_* constructors (the complete location of string literals, coming from #8820). The currently-proposed approach seems consistent, and the idea of storing the information off-band in the T{exp,pat}_constant nodes does not sound enticing.

On the other hand, maybe we want to split the notion of constants used in the typedtree from the one of the middle-end, so we would have a separate type Lambda.constant (for example) with less user-facing information? But then how do we know how deep we want to preserve this information on literal formatting for use in compiler messages? I would guess that other parts lower in the compiler could on some occasions discuss the value of literals with users? (For example a does-not-fit-on-the-current-architecture check.)

@curiousleo
Copy link
Contributor Author

Thanks @Octachron and @gasche for taking a look!

I don't have enough context to judge the pros and cons of

  • (A) modifying constant (as currently in this PR) vs.
  • (B) modifying T{exp,pat}_constant vs.
  • (C) introducing a separate type.

I am willing to prototype though. Would you find it helpful if I had a go at implementing a proof-of-concept PR for the improved error message on top of just a change to T{exp,pat}_constant (variant B)?

I'm afraid that proposal (C) is a bit too abstract for me to go and create a proof-of-concept, I would need more details on that idea.

In short, I appreciate that there are design issues at play here that I don't have much context on. But I would like to keep this PR moving. If there is anything I can do to to move the discussion forward, please let me know.

@curiousleo curiousleo closed this Dec 6, 2021
@curiousleo curiousleo reopened this Dec 6, 2021
@curiousleo
Copy link
Contributor Author

Mis-clicked, sorry. I did not mean to close the PR.

@gasche
Copy link
Member

gasche commented Dec 6, 2021

I'll let @Octachron make a suggestion :-) (Your proposal to prototype the T{pat,exp}_constant approach sounds like a sensible answer to his remark if you don't mind the extra work, but then maybe he is convinced enough by my consistency argument to have changed his mind already?)

@Octachron
Copy link
Member

I agree that modifying the typedtree for the error message is not ideal.

But then, I would argue that a better fix would be to make sure that the error-handling part of the functions Typecore.unifify_pat and Typecore.unify_expr propagates the right information to the hint literal machinery rather than modifying another AST.

In particular, the Expr_type_clash and Pat_type_clash constructors could be updated to contain a (possibly relevant) parsetree node rather that a typedtree node (which is currently only used by the hint literal machinery).

@curiousleo
Copy link
Contributor Author

@Octachron wrote:

But then, I would argue that a better fix would be to make sure that the error-handling part of the functions Typecore.unifify_pat and Typecore.unify_expr propagates the right information to the hint literal machinery rather than modifying another AST.

In particular, the Expr_type_clash and Pat_type_clash constructors could be updated to contain a (possibly relevant) parsetree node rather that a typedtree node (which is currently only used by the hint literal machinery).

I have an implementation based on this idea in #10818. It is a smaller, more local change that still achieves the goal of preserving the original integer formatting in the type hint.

Let me know what you think -- my hunch is that #10818 is preferable, in which case this PR can be closed.

@curiousleo
Copy link
Contributor Author

Closing in favour of #10818.

@curiousleo curiousleo closed this Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants