Quoted strings #1066
Any particular reason not to allow '#'? It's permitted in operators as well, but under more restricted rules.
parsing/lexer.mll
@@ -290,6 +290,9 @@ let identchar_latin1 =
  ['A'-'Z' 'a'-'z' '_' '\192'-'\214' '\216'-'\246' '\248'-'\255' '\'' '0'-'9']
let symbolchar =
  ['!' '$' '%' '&' '*' '+' '-' '.' '/' ':' '<' '=' '>' '?' '@' '^' '|' '~']
let quoted_string_id_char =
  lowercase |
  ['!' '$' '%' '&' '*' '+' '-' '.' '/' ':' '<' '=' '>' '?' '@' '^' '~']
This could be more clearly defined as:
let common_symbolchar =
  ['!' '$' '%' '&' '*' '+' '-' '.' '/' ':' '<' '=' '>' '?' '@' '^' '~']
let symbolchar =
  common_symbolchar | '|'
let quoted_string_id_char =
  lowercase | common_symbolchar
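To make the effect of these classes concrete, here is a minimal sketch that mirrors them as ordinary OCaml predicates. It is not code from parsing/lexer.mll: the function names are hypothetical, and `lowercase` is assumed to mean ASCII 'a'..'z' plus '_', as the PR description states.

```ocaml
(* Sketch only: the lexer classes from the suggestion above, rewritten as
   plain predicates. All names here are hypothetical. *)
let is_common_symbolchar = function
  | '!' | '$' | '%' | '&' | '*' | '+' | '-' | '.' | '/' | ':'
  | '<' | '=' | '>' | '?' | '@' | '^' | '~' -> true
  | _ -> false

(* symbolchar = common_symbolchar | '|' *)
let is_symbolchar c = is_common_symbolchar c || c = '|'

(* Assumption: lowercase is ASCII 'a'..'z' plus '_'. *)
let is_lowercase = function
  | 'a' .. 'z' | '_' -> true
  | _ -> false

(* quoted_string_id_char = lowercase | common_symbolchar *)
let is_quoted_string_id_char c = is_lowercase c || is_common_symbolchar c

(* A delimiter is accepted if every character is in the class. The empty
   delimiter corresponds to the plain {| .. |} form. *)
let is_valid_delimiter s =
  let ok = ref true in
  String.iter (fun c -> if not (is_quoted_string_id_char c) then ok := false) s;
  !ok
```

Under this sketch, "filter" and "*" are accepted as delimiters, while "|" is rejected because '|' is deliberately kept out of the delimiter class.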
No particular reason, I just took the existing symbolchars characters as a reasonable starting set. I can respin with '#' if that's wanted.
Patch looks basically fine to me, apart from needing a Changes entry (nice to have test cases too). I don't have a particular opinion as to the desirability of the change. /cc @alainfrisch?
It would be nice to also have a test that pretty-printing the strings back works as expected. (Note that changing the lexical syntax requires changes in OCaml-manipulating tools, so it comes at a higher cost than other sorts of simple changes. This means that the value of the proposed change has to be very convincing.)
Respun the patches:
@gasche: If you could point me to an example of a pretty-printing test, I'll add one for the quoted strings.
I believe that the easiest way is to also add your examples to
Changes
@@ -5,6 +5,9 @@ Working version

### Language features:

- GPR#1066: Allow symbol characters in delimiters for quoted
  strings. (Matthew Wahab)
Please follow the style of other entries - there should be a newline before the left parenthesis.
{@| r |@};
{^| s |^};
{~| t |~};
{#| u |#};
Possibly also worth a case which is a mix of lowercase and symbols?
The justification for the change is a little subjective. At the moment, I embed text into OCaml code that is intended to be used by library code. Camlp4 allows this to be done using quotations. For instance
The alternatives using the current OCaml extensions and ppx filters seem to be:
So the justification would be that there is currently no equivalent to the Camlp4 quotations that is as concise (and the counter-argument is that there are simple alternatives).
Quoted strings using the {<name>| .. |<name>} syntax can be used to embed text to be expanded by ppx preprocessors. This patch adds a set of symbols to the characters allowed in <name> and updates the documentation and tests. The added symbols are: '!', '$', '%', '&', '*', '+', '-', '.', '/', ':', '<', '=', '>', '?', '@', '^', '~' and '#'.
Updated for most comments. Pretty-printing tests still to come.
The motivation for this change is built on a misunderstanding of what the That said, I see no harm in allowing symbol characters inside
The current documentation seems to suggest that quoted strings are intended, or at least expected, to be used with ppx filters. Delimiters aside, they seem to be well suited for that usage.
Yeah, I can see how it gives that impression. They are intended for use with ppx extensions, it's just that the way they are intended to be used is:
By the way, the reasons
How about admitting module-qualified strings like
Expr.{|x + 1|}
Uri.{|http://www.example.org/|}
Re_pcre.{|^["]*\(foo*\)|}
The default behaviour would be that these are replaced by a global variable which is initialised at startup by calling a well-known function from the module, say
As I said previously, any change to the lexical syntax is costly, as tools that embed their own parsers have to be changed. Given that the currently proposed change seems to be weakly motivated (we precisely do not want to use strings as quotations; extensions are there for this purpose), I vote to reject the change and close the PR, despite the fact that I appreciate the care with which the patch was prepared. I think that the underlying idea that the quotation mechanisms allowed by extensions today are a bit too heavy to become as widespread as Camlp4's
That could be the interpretation if Added: I get the point about not extending the lexer, though, and the above scheme is admittedly not obvious.
This change was based on my understanding that quoted strings were intended to be used for embedding foreign syntax for expansion by a ppx filter (because that's what the documentation said). Since that's incorrect, I'll close this PR. The best approach for my use-case seems to be a global parsing function rather than a ppx filter.
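For readers wondering what a "global parsing function" could look like in practice, here is a minimal sketch. The function `parse_words` is a hypothetical stand-in for whatever library parser would consume the embedded text; the thread does not say what the real one does.

```ocaml
(* Hypothetical stand-in for a library parser: it only splits the embedded
   text on spaces and drops empty pieces. *)
let parse_words (text : string) : string list =
  String.split_on_char ' ' text
  |> List.filter (fun w -> w <> "")

(* The embedded text is an ordinary quoted string handed to an ordinary
   function, so no ppx filter and no lexer change are involved. *)
let words = parse_words {| some text |}
```

The call site stays almost as concise as a Camlp4 quotation, at the cost of parsing at run time rather than at preprocessing time.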
The document is correct in saying that quoted strings are useful for embedding foreign syntax, but this embedding should happen within an extension node (as everything else that is foreign). Without the quoted string, you cannot embed foreign syntax without annoying escapes, and without extension nodes the programmer cannot tell which parts of their programs have a different semantics.
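A short sketch of the two points above, using only standard OCaml syntax. The extension name `filter` in the final comment is hypothetical, and that form is left in a comment because it does not compile without a ppx rewriter that handles it.

```ocaml
(* The same regular-expression text written twice: once with the escapes
   that an ordinary string literal needs, once as a quoted string. *)
let escaped = "^[\"]*\\(foo*\\)"
let quoted = {|^["]*\(foo*\)|}
let () = assert (escaped = quoted)

(* A quoted string with a delimiter identifier is still just a string;
   the identifier carries no meaning for the compiler. *)
let tagged = {filter| some text |filter}
let () = assert (tagged = " some text ")

(* Inside an extension node, the foreign text is visibly marked as such:
     [%filter {| some text |}]
   The name `filter` is hypothetical and needs a matching ppx rewriter. *)
```

The quoted string removes the escapes, and the extension node is what tells the reader that the text is not ordinary OCaml.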
This manual clarification is intended to lift up the misunderstanding that is the basis for ocaml#1066.
I propose a clarification of the documentation in #1082.
Allow symbols in delimiters for quoted strings.
Quoted strings can be used to embed foreign syntax in OCaml code but the syntactic form is cumbersome. The {| .. |} form has a distinct meaning in OCaml (as an uninterpreted string) so the only safe approach is to use the {<name>| .. |<name>} form. Since <name> is limited to lower-case characters and '_', this leads to some verbose code, e.g. {filter| some text |filter}.
Camlp4 uses the << .. >> quotation marks for text to be filtered by a preprocessor. Something similar could be constructed using quoted strings if symbols were allowed in <name>. E.g. using * for filter, {*| some text |*}.
This patch set implements that support. The first patch adds tests for the existing implementation of quoted strings. The second patch adds the symbols from the operator-char class (excluding '|') to the characters allowed in the delimiter of a quoted string and updates the documentation and tests. Both patches were tested with 'make tests'.
A better alternative might be to adopt the << .. >> symbols as quotation marks for strings intended to be expanded by a filter but that would be a larger change.