Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quoted strings #1066

Closed
wants to merge 2 commits into from
Closed

Quoted strings #1066

wants to merge 2 commits into from

Conversation

mwb-cde
Copy link

@mwb-cde mwb-cde commented Feb 26, 2017

Allow symbols in delimiters for quoted strings.

Quoted strings can be used to embed foreign syntax in OCaml code but the syntactic form is cumbersome. The {| .. |} form has a distinct meaning in OCaml (as an uninterpreted string) so the only safe approach is to use the {| .. |} form. Since is limited to lower-case characters and '_', this leads to some verbose code, e.g. {filter| some text |filter}.

Camlp4 uses the << .. >> quotation marks for text to be filtered by a preprocessor. Something similar could be constructed using quoted strings if symbols were allowed in . E.g. using * for filter, {*| some text |*}.

This patch set implements that support. The first patch adds tests for the existing implementation of quoted strings. The second patch adds the symbols from the operator-char class (excluding '|') to the characters allowed in the delimiiter of a quoted string and updates the documentation and tests. Both patches were tested with 'make tests'.

A better alternative might be to adopt the << .. >> symbols as quotation marks for strings intended to be expanded by a filter but that would be a larger change.

Copy link
Member

@dra27 dra27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason not to allow # - it's permitted in operators as well, but under more restricted rules.

@@ -290,6 +290,9 @@ let identchar_latin1 =
['A'-'Z' 'a'-'z' '_' '\192'-'\214' '\216'-'\246' '\248'-'\255' '\'' '0'-'9']
let symbolchar =
['!' '$' '%' '&' '*' '+' '-' '.' '/' ':' '<' '=' '>' '?' '@' '^' '|' '~']
let quoted_string_id_char =
lowercase |
['!' '$' '%' '&' '*' '+' '-' '.' '/' ':' '<' '=' '>' '?' '@' '^' '~']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be more clearly defined as:

let common_symbolchar =
  ['!' '$' '%' '&' '*' '+' '-' '.' '/' ':' '<' '=' '>' '?' '@' '^' '~']
let symbolchar =
  common_symbolchar | '|'
let quoted_string_id_char =
  lowercase | common_symbol_char

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No particular reason, I just took the existing symbolchars characters as a reasonable starting set. I can respin with '#' if that's wanted.

@dra27
Copy link
Member

dra27 commented Feb 26, 2017

Patch looks basically fine to me, apart from needing a Changes entry (nice to have test cases too). I don't have a particular opinion as to the desirability of the change. /cc @alainfrisch?

@gasche
Copy link
Member

gasche commented Feb 26, 2017

It would be nice to also have a test that pretty-printing the strings back works as expected.

(Note that changing the lexical syntax requires changes in OCaml-manipulating tools, so it comes at a higher cost than other sorts of simple changes. This means that the value of the proposed change has to be very convincing.)

@mwb-cde
Copy link
Author

mwb-cde commented Feb 26, 2017

Respun the patches:

  • Added '#' to the characters.
  • Updated the Changes file, adding an item to 'Language features'.

@gasche: If you could point me to an example of a pretty-printing test, I'll add one for the quoted strings.

@gasche
Copy link
Member

gasche commented Feb 26, 2017

I believe that the easiest way is to also add your examples to testsuite/tests/parsetree/source.ml.

Changes Outdated
@@ -5,6 +5,9 @@ Working version

### Language features:

- GPR#1066: Allow symbol characters in delimiters for quoted
strings. (Matthew Wahab)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the style of other entries - there should be a newline before the left parenthesis.

{@| r |@};
{^| s |^};
{~| t |~};
{#| u |#};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly also worth a case which is a mix of lowercase and symbols?

@mwb-cde
Copy link
Author

mwb-cde commented Feb 26, 2017

The justification for the change is a little subjective. At the moment, I embed text into OCaml code that is intended to be use by library code. Camlp4 allows this to be done using the quotations. For instance << ... >> or <:expr< .. >> can be expanded to (Expr.parse ".."). These quotations are both concise and easy to spot in OCaml code

The alternatives using the current OCaml extensions and ppx filters seem to be:

  • Quoted strings: Something like {expr| .. |expr}.
  • Extension nodes: [%expr ".."].
  • Attributes: ".." [%expr]
    These don't seem to offer much over just defining a a global function let expr = Expr.parse and using it as (expr ".."), which also has the benefit of not needing a preprocessor.

So the justification would be that there is currently no equivalent to the Camlp4 quotations that is as concise (and the counter-argument is that there are simple alternatives).

Quoted strings using the {<name>| .. |<name>} syntax can be used to embed
text to be expanded by ppx preprocessors. This patch adds a set of symbols
to the characters allowed in <name> and updates the documentation and tests.

The added symbols are: '!', '$', '%', '&', '*', '+', '-', '.', '/', ':',
'<', '=', '>', '?', '@', '^', '~' and '#'.
@mwb-cde
Copy link
Author

mwb-cde commented Feb 26, 2017

Updated for most comments. Pretty-printing tests still to come.

@lpw25
Copy link
Contributor

lpw25 commented Feb 27, 2017

The motivation for this change is built on a misunderstanding of what the foo inside {foo| is for. It is not for marking strings to be processed by a pre-processor, it is to avoid the need for escaping |} inside the quote. For pre-processed quotes you should use: [%foo {| ... |}] it is a little verbose but it makes it clear that you are using a pre-processor.

That said, I see no harm in allowing symbol characters inside {| even though you should basically never need to put anything inside of {|.

@mwb-cde
Copy link
Author

mwb-cde commented Feb 27, 2017

The current documentation seems to suggest that quoted strings are intended, or a least expected, to be used with ppx filters. Delimiters aside, they seem to be well suited for that usage.

@lpw25
Copy link
Contributor

lpw25 commented Feb 27, 2017

Yeah, I can see how it gives that impression. They are intended for use with ppx extensions, it's just that the way they are intended to be used is: [%foo {| ... |}] rather than {foo| ... |foo}. I'll try and make that clearer in the docs when I have a minute.

@lpw25
Copy link
Contributor

lpw25 commented Feb 27, 2017

By the way, the reasons [%foo {| ... |}] is better than {foo| ... |foo} are:

  • With the second one it becomes impossible to ever write some "foo" code which contains |foo}.
  • The first one is obviously a syntax extension. The second one is useful for quoting without syntax extensions (e.g. Re.compile {|^["]*\(foo*\)|}) so you can't be sure which are ordinary strings and which are pre-processed quotations.

@paurkedal
Copy link

How about admitting module-qualified strings like

Expr.{|x + 1|}
Uri.{|http://www.example.org/|}
Re_pcre.{|^["]*\(foo*\)|}

The default behaviour would be that these are replaced by a global variable which is initialised at startup by calling a well-known function from the module, say of_string, compile, or literal, or a new operator-like name ({..||..}). This should obviate the need for a syntax extension for cases like these.

@gasche
Copy link
Member

gasche commented Mar 4, 2017

M.{|...|} clearly means let open M in {|...|}, and allowing any other interpretation would be very wrong.

@gasche
Copy link
Member

gasche commented Mar 4, 2017

As I said previously, any change to the lexical syntax is costly as tool that embed their own parsers have to be changed. Given that the currently proposed ones seems to be weakly motivated (we precisely do not want to use strings as quotations, extesnsions as there for this purpose), I vote to reject the change and close the PR -- despite the fact that I appreciate the care with which the patch was prepared.

I think that the underlying idea that the quotation mechanism allowed by extensions today are a bit too heavy to become as widespread as Camlp4's << .. >> is correct. But repurposing a valid syntactic construction (string) with another meaning cannot be the solution,

@paurkedal
Copy link

paurkedal commented Mar 4, 2017

M.{|...|} clearly means let open M in {|...|}, and allowing any other interpretation would be very wrong.

That could be the interpretation if {|...|} was always translated to a global initialised by a function which in Pervasives is defined as the identify, but which could be shadowed by the local open. That would exclude a simple function name though, to avoid accidental shadowing.

Added: I get the point about not extending the lexer, though, and the above scheme is admittedly not obvious.

@mwb-cde
Copy link
Author

mwb-cde commented Mar 5, 2017

This change was based on my understanding that quoted strings were intended to be used for embedding foreign syntax for expansion by a ppx filter (because that's what the documentation said). Since that's incorrect, I'll close this PR. The best approach for my use-case seems to be a global parsing function rather than a ppx filter.

@mwb-cde mwb-cde closed this Mar 5, 2017
@gasche
Copy link
Member

gasche commented Mar 5, 2017

The document is correct in saying that quoted strings are useful for embedding foreign syntax, but this embedding should happen within an extension node (as everything else that is foreign). Without the quoted string, you cannot embed foreign syntax without annoying escapes, and without extension nodes the programmer cannot tell which part of their programs have a different semantics.

gasche added a commit to gasche/ocaml that referenced this pull request Mar 5, 2017
This manual clarification is intended to lift up the misunderstanding
that is the basis for ocaml#1066.
@gasche
Copy link
Member

gasche commented Mar 5, 2017

I propose a clarification of the documentation in #1082.

gasche added a commit to gasche/ocaml that referenced this pull request Mar 5, 2017
This manual clarification is intended to lift up the misunderstanding
that is the basis for ocaml#1066.
gasche added a commit to gasche/ocaml that referenced this pull request Mar 5, 2017
This manual clarification is intended to lift up the misunderstanding
that is the basis for ocaml#1066.
gasche added a commit that referenced this pull request Mar 5, 2017
This manual clarification is intended to lift up the misunderstanding
that is the basis for #1066.
gasche added a commit that referenced this pull request Mar 5, 2017
This manual clarification is intended to lift up the misunderstanding
that is the basis for #1066.
camlspotter pushed a commit to camlspotter/ocaml that referenced this pull request Oct 17, 2017
This manual clarification is intended to lift up the misunderstanding
that is the basis for ocaml#1066.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants