Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix printing of type variables with a quote on 2nd character #2130

Merged
merged 2 commits into from Nov 6, 2018

Conversation

Projects
None yet
4 participants
@alainfrisch
Copy link
Contributor

commented Nov 2, 2018

Type variables whose name has a single quote as the second character (such as ' a' or ' a'b) need to be printed with a whitespace after the initial quote symbol, in order not to be confused with a character literal. Of course, it would be even better for users not to use such variable names altogether, and they likely don't it do it in practice, but the printer should be as correct as possible, in particular (1) to support future automated rountrip testing between abstract and concrete syntax and (2) to support for code-generation tools (which might for instance add a "prime" suffix to existing names).

In #2034, it was suggested instead to reject such type variables, but I find it better to adjust the printer rather than change and complicate the specification of what a "valid name" for a type variable is in the Parsetree. A different syntactic front-end (such as Reason) could for instance use a completely different prefix symbol for type variables, or a different syntax for character literals, and the rather ad hoc constraint one would add ("single quote not allowed in second position") would feel weird.

Show resolved Hide resolved parsing/printast.ml Outdated
@trefis

This comment has been minimized.

Copy link
Contributor

commented Nov 2, 2018

In #2024, it was suggested instead to reject such type variables, but I find it better to adjust the printer rather than change and complicate the specification of what a "valid name" for a type variable is in the Parsetree. A different syntactic front-end (such as Reason) could for instance use a completely different prefix symbol for type variables, or a different syntax for character literals, and the rather ad hoc constraint one would add ("single quote not allowed in second position") would feel weird.

I think you meant #2034.
And if I understand #2034 correctly, there weren't change to the parsetree (or anything after it). It was simply changing the lexer and parser (i.e. exactly the thing you imagine reason could do), which should have absolute no impact on Reason or any other frontend.

I personally like #2034, but if people really dislike it then indeed this PR is "needed".

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Nov 2, 2018

#2034 suggested to change the parser, hereby creating valid Parsetree (which could be obtained by PPX or Reason) that couldn't be represented by valid syntax. This would break tools that show in source form the output of a PPX, or that format Reason syntax back into the normal one. So even if #2034 did not formally specify the new invariant on the Parsetree, if should have done so, and it would then impact e.g. the ability of alternative front-ends and PPX to use/generate those now-invalid names.

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Nov 2, 2018

To be clear: I'm well aware that there are plenty of other, more important cases, of formally valid Parsetree which cannot be represented by valid syntax and are not even rejected by Ast_invariant. I think one should go into the direction of documenting and enforcing "nice" invariants in Ast_invariant (e.g. the fact that an identifier name cannot be the empty string, etc) and relax the concrete syntax to allow writting in some way all the other valid Parsetrees. For instance, the name of an identifier cannot currently be a keyword and this restricts the ability of alternative front-end to use a different set of keywords while preserving full interoperability with OCaml. This could be addressed by providing an escape hatch in the lexer to use keywords (or even perhaps identifiers with non-identifier characters?) as identifiers (a la verbatim identifiers in C#).

@gasche
Copy link
Member

left a comment

I agree with Alain's (and Jacques', iirc?) position that ruling out type variables whose second character is a single quote makes our specification uglier for no good reason, and I prefer the current proposal.

On the implementation, I don't like the fact that the correct tyvar-printing logic is defined in Syntaxerr; this particular choice of module does not make sense to me. Maybe there is a better choice in the typing+parsing dependency graph, or maybe a new module needs to be introduced.

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Nov 2, 2018

On the implementation, I don't like the fact that the correct tyvar-printing logic is defined in Syntaxerr

Me neither, but I couldn't find a better place considering the current dependency graph; and adding a new module just for this function seemed a bit heavy. Do you think it's worth it?

@gasche

This comment has been minimized.

Copy link
Member

commented Nov 2, 2018

I think it would be fine to have a TypesUtils modules into which more logic would move over time. Having stuff in the wrong module creates costs down the line, once I implemented a function to discover later it already existed in another module.

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Nov 2, 2018

The current function needs to be part of the "parsing" part. Perhaps parsing/ast_utils?

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Nov 2, 2018

We have parsing/ast_helper, but it needs to come after Syntaxerr (and Syntaxerr needs to print type variables as well).

@gasche

This comment has been minimized.

Copy link
Member

commented Nov 2, 2018

Another solution would be to split Syntaxerr into Syntaxerr and Syntaxerr_report, and have the second module depend on Pprintast.tyvar. The current Syntaxerr situation is a looming dependency crisis: the data part of the module is reused by all module raising structured error messages (including ast-helper stuff), so it wants to be at one end of the dependency graph, and the reporting part of the module is bound to call more and more sophisticated pretty-printing functions, so it wants to be at the opposite end of the dependency graph.

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Nov 5, 2018

The printing code in Syntaxerr (which is "registered" by side-effect in Location) could just go into e.g. Pprintast.ml. The only problem with that is that one could end up with Pprintast not being linked when using compiler-libs without -linkall (but it would even worse with creating a new module). Would that work for you?

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Nov 5, 2018

Note that in testsuite/tests/typing-gadts/term-conv.ml, we see that a locally abstract type t' produces a type variable of the same name, so following #2034 one would need to reject such type names as well (or do something specific to avoid keeping the name in that case).

@gasche

This comment has been minimized.

Copy link
Member

commented Nov 5, 2018

Having an error-reporter in Pprintast seems fairly strange to me (I would say, about as strange as having a pretty-printer in Syntaxerr); is there not a module that can depend on Pprintast in which the reporter would make more sense?

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Nov 5, 2018

Perhaps Parse (="Entry points in the parser")?

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Nov 5, 2018

@gasche cf last commit

@gasche

gasche approved these changes Nov 5, 2018

Copy link
Member

left a comment

The current proposal looks good to me. (Apologies for the copious nitpicking.) Feel free to merge when you are satisfied with the patch state (maybe clean up the commit history and consider having a Changes entry -- which I think could credit the authors of the alternate proposal as well).

@alainfrisch alainfrisch force-pushed the alainfrisch:fix_tyvar_print branch 2 times, most recently from 73a78d9 to 3799a17 Nov 6, 2018

alainfrisch added some commits Nov 2, 2018

Fix printing of type variables with a quote on 2nd character
Type variables whose name has a single quote as the second character
(such as ' a' or ' a'b) need to be printed with a whitespace after the
initial quote symbol, in order not to be confused with a character
literal.
Move variable printer to Pprintast
- The code responsible for printing Syntaxerr errors is moved to the
  Parse module (so that it can depend on the variable printer in
  Pprintast).

- Pprintast becomes a dependency for a few tools that link some
  compiler modules in an ad hoc way (they would better be implemented
  in terms of compiler-libs).

@alainfrisch alainfrisch force-pushed the alainfrisch:fix_tyvar_print branch from 3799a17 to 4bf7360 Nov 6, 2018

@alainfrisch alainfrisch merged commit 0d968e3 into ocaml:trunk Nov 6, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.