New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quoted strings #1066

Closed
wants to merge 2 commits into
base: trunk
from

Conversation

Projects
None yet
5 participants
@mwb-cde

mwb-cde commented Feb 26, 2017

Allow symbols in delimiters for quoted strings.

Quoted strings can be used to embed foreign syntax in OCaml code but the syntactic form is cumbersome. The {| .. |} form has a distinct meaning in OCaml (as an uninterpreted string) so the only safe approach is to use the {| .. |} form. Since is limited to lower-case characters and '_', this leads to some verbose code, e.g. {filter| some text |filter}.

Camlp4 uses the << .. >> quotation marks for text to be filtered by a preprocessor. Something similar could be constructed using quoted strings if symbols were allowed in . E.g. using * for filter, {*| some text |*}.

This patch set implements that support. The first patch adds tests for the existing implementation of quoted strings. The second patch adds the symbols from the operator-char class (excluding '|') to the characters allowed in the delimiiter of a quoted string and updates the documentation and tests. Both patches were tested with 'make tests'.

A better alternative might be to adopt the << .. >> symbols as quotation marks for strings intended to be expanded by a filter but that would be a larger change.

@dra27

Any particular reason not to allow # - it's permitted in operators as well, but under more restricted rules.

Show outdated Hide outdated parsing/lexer.mll
@@ -290,6 +290,9 @@ let identchar_latin1 =
['A'-'Z' 'a'-'z' '_' '\192'-'\214' '\216'-'\246' '\248'-'\255' '\'' '0'-'9']
let symbolchar =
['!' '$' '%' '&' '*' '+' '-' '.' '/' ':' '<' '=' '>' '?' '@' '^' '|' '~']
let quoted_string_id_char =
lowercase |
['!' '$' '%' '&' '*' '+' '-' '.' '/' ':' '<' '=' '>' '?' '@' '^' '~']

This comment has been minimized.

@dra27

dra27 Feb 26, 2017

Contributor

This could be more clearly defined as:

let common_symbolchar =
  ['!' '$' '%' '&' '*' '+' '-' '.' '/' ':' '<' '=' '>' '?' '@' '^' '~']
let symbolchar =
  common_symbolchar | '|'
let quoted_string_id_char =
  lowercase | common_symbol_char
@dra27

dra27 Feb 26, 2017

Contributor

This could be more clearly defined as:

let common_symbolchar =
  ['!' '$' '%' '&' '*' '+' '-' '.' '/' ':' '<' '=' '>' '?' '@' '^' '~']
let symbolchar =
  common_symbolchar | '|'
let quoted_string_id_char =
  lowercase | common_symbol_char

This comment has been minimized.

@mwb-cde

mwb-cde Feb 26, 2017

No particular reason, I just took the existing symbolchars characters as a reasonable starting set. I can respin with '#' if that's wanted.

@mwb-cde

mwb-cde Feb 26, 2017

No particular reason, I just took the existing symbolchars characters as a reasonable starting set. I can respin with '#' if that's wanted.

@dra27

This comment has been minimized.

Show comment
Hide comment
@dra27

dra27 Feb 26, 2017

Contributor

Patch looks basically fine to me, apart from needing a Changes entry (nice to have test cases too). I don't have a particular opinion as to the desirability of the change. /cc @alainfrisch?

Contributor

dra27 commented Feb 26, 2017

Patch looks basically fine to me, apart from needing a Changes entry (nice to have test cases too). I don't have a particular opinion as to the desirability of the change. /cc @alainfrisch?

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Feb 26, 2017

Member

It would be nice to also have a test that pretty-printing the strings back works as expected.

(Note that changing the lexical syntax requires changes in OCaml-manipulating tools, so it comes at a higher cost than other sorts of simple changes. This means that the value of the proposed change has to be very convincing.)

Member

gasche commented Feb 26, 2017

It would be nice to also have a test that pretty-printing the strings back works as expected.

(Note that changing the lexical syntax requires changes in OCaml-manipulating tools, so it comes at a higher cost than other sorts of simple changes. This means that the value of the proposed change has to be very convincing.)

@mwb-cde

This comment has been minimized.

Show comment
Hide comment
@mwb-cde

mwb-cde Feb 26, 2017

Respun the patches:

  • Added '#' to the characters.
  • Updated the Changes file, adding an item to 'Language features'.

@gasche: If you could point me to an example of a pretty-printing test, I'll add one for the quoted strings.

mwb-cde commented Feb 26, 2017

Respun the patches:

  • Added '#' to the characters.
  • Updated the Changes file, adding an item to 'Language features'.

@gasche: If you could point me to an example of a pretty-printing test, I'll add one for the quoted strings.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Feb 26, 2017

Member

I believe that the easiest way is to also add your examples to testsuite/tests/parsetree/source.ml.

Member

gasche commented Feb 26, 2017

I believe that the easiest way is to also add your examples to testsuite/tests/parsetree/source.ml.

Show outdated Hide outdated Changes
@@ -5,6 +5,9 @@ Working version
### Language features:
- GPR#1066: Allow symbol characters in delimiters for quoted
strings. (Matthew Wahab)

This comment has been minimized.

@dra27

dra27 Feb 26, 2017

Contributor

Please follow the style of other entries - there should be a newline before the left parenthesis.

@dra27

dra27 Feb 26, 2017

Contributor

Please follow the style of other entries - there should be a newline before the left parenthesis.

{@| r |@};
{^| s |^};
{~| t |~};
{#| u |#};

This comment has been minimized.

@dra27

dra27 Feb 26, 2017

Contributor

Possibly also worth a case which is a mix of lowercase and symbols?

@dra27

dra27 Feb 26, 2017

Contributor

Possibly also worth a case which is a mix of lowercase and symbols?

@mwb-cde

This comment has been minimized.

Show comment
Hide comment
@mwb-cde

mwb-cde Feb 26, 2017

The justification for the change is a little subjective. At the moment, I embed text into OCaml code that is intended to be use by library code. Camlp4 allows this to be done using the quotations. For instance << ... >> or <:expr< .. >> can be expanded to (Expr.parse ".."). These quotations are both concise and easy to spot in OCaml code

The alternatives using the current OCaml extensions and ppx filters seem to be:

  • Quoted strings: Something like {expr| .. |expr}.
  • Extension nodes: [%expr ".."].
  • Attributes: ".." [%expr]
    These don't seem to offer much over just defining a a global function let expr = Expr.parse and using it as (expr ".."), which also has the benefit of not needing a preprocessor.

So the justification would be that there is currently no equivalent to the Camlp4 quotations that is as concise (and the counter-argument is that there are simple alternatives).

mwb-cde commented Feb 26, 2017

The justification for the change is a little subjective. At the moment, I embed text into OCaml code that is intended to be use by library code. Camlp4 allows this to be done using the quotations. For instance << ... >> or <:expr< .. >> can be expanded to (Expr.parse ".."). These quotations are both concise and easy to spot in OCaml code

The alternatives using the current OCaml extensions and ppx filters seem to be:

  • Quoted strings: Something like {expr| .. |expr}.
  • Extension nodes: [%expr ".."].
  • Attributes: ".." [%expr]
    These don't seem to offer much over just defining a a global function let expr = Expr.parse and using it as (expr ".."), which also has the benefit of not needing a preprocessor.

So the justification would be that there is currently no equivalent to the Camlp4 quotations that is as concise (and the counter-argument is that there are simple alternatives).

Allow symbol characters in delimiters for quoted strings.
Quoted strings using the {<name>| .. |<name>} syntax can be used to embed
text to be expanded by ppx preprocessors. This patch adds a set of symbols
to the characters allowed in <name> and updates the documentation and tests.

The added symbols are: '!', '$', '%', '&', '*', '+', '-', '.', '/', ':',
'<', '=', '>', '?', '@', '^', '~' and '#'.
@mwb-cde

This comment has been minimized.

Show comment
Hide comment
@mwb-cde

mwb-cde Feb 26, 2017

Updated for most comments. Pretty-printing tests still to come.

mwb-cde commented Feb 26, 2017

Updated for most comments. Pretty-printing tests still to come.

@lpw25

This comment has been minimized.

Show comment
Hide comment
@lpw25

lpw25 Feb 27, 2017

Contributor

The motivation for this change is built on a misunderstanding of what the foo inside {foo| is for. It is not for marking strings to be processed by a pre-processor, it is to avoid the need for escaping |} inside the quote. For pre-processed quotes you should use: [%foo {| ... |}] it is a little verbose but it makes it clear that you are using a pre-processor.

That said, I see no harm in allowing symbol characters inside {| even though you should basically never need to put anything inside of {|.

Contributor

lpw25 commented Feb 27, 2017

The motivation for this change is built on a misunderstanding of what the foo inside {foo| is for. It is not for marking strings to be processed by a pre-processor, it is to avoid the need for escaping |} inside the quote. For pre-processed quotes you should use: [%foo {| ... |}] it is a little verbose but it makes it clear that you are using a pre-processor.

That said, I see no harm in allowing symbol characters inside {| even though you should basically never need to put anything inside of {|.

@mwb-cde

This comment has been minimized.

Show comment
Hide comment
@mwb-cde

mwb-cde Feb 27, 2017

The current documentation seems to suggest that quoted strings are intended, or a least expected, to be used with ppx filters. Delimiters aside, they seem to be well suited for that usage.

mwb-cde commented Feb 27, 2017

The current documentation seems to suggest that quoted strings are intended, or a least expected, to be used with ppx filters. Delimiters aside, they seem to be well suited for that usage.

@lpw25

This comment has been minimized.

Show comment
Hide comment
@lpw25

lpw25 Feb 27, 2017

Contributor

Yeah, I can see how it gives that impression. They are intended for use with ppx extensions, it's just that the way they are intended to be used is: [%foo {| ... |}] rather than {foo| ... |foo}. I'll try and make that clearer in the docs when I have a minute.

Contributor

lpw25 commented Feb 27, 2017

Yeah, I can see how it gives that impression. They are intended for use with ppx extensions, it's just that the way they are intended to be used is: [%foo {| ... |}] rather than {foo| ... |foo}. I'll try and make that clearer in the docs when I have a minute.

@lpw25

This comment has been minimized.

Show comment
Hide comment
@lpw25

lpw25 Feb 27, 2017

Contributor

By the way, the reasons [%foo {| ... |}] is better than {foo| ... |foo} are:

  • With the second one it becomes impossible to ever write some "foo" code which contains |foo}.
  • The first one is obviously a syntax extension. The second one is useful for quoting without syntax extensions (e.g. Re.compile {|^["]*\(foo*\)|}) so you can't be sure which are ordinary strings and which are pre-processed quotations.
Contributor

lpw25 commented Feb 27, 2017

By the way, the reasons [%foo {| ... |}] is better than {foo| ... |foo} are:

  • With the second one it becomes impossible to ever write some "foo" code which contains |foo}.
  • The first one is obviously a syntax extension. The second one is useful for quoting without syntax extensions (e.g. Re.compile {|^["]*\(foo*\)|}) so you can't be sure which are ordinary strings and which are pre-processed quotations.
@paurkedal

This comment has been minimized.

Show comment
Hide comment
@paurkedal

paurkedal Mar 4, 2017

How about admitting module-qualified strings like

Expr.{|x + 1|}
Uri.{|http://www.example.org/|}
Re_pcre.{|^["]*\(foo*\)|}

The default behaviour would be that these are replaced by a global variable which is initialised at startup by calling a well-known function from the module, say of_string, compile, or literal, or a new operator-like name ({..||..}). This should obviate the need for a syntax extension for cases like these.

paurkedal commented Mar 4, 2017

How about admitting module-qualified strings like

Expr.{|x + 1|}
Uri.{|http://www.example.org/|}
Re_pcre.{|^["]*\(foo*\)|}

The default behaviour would be that these are replaced by a global variable which is initialised at startup by calling a well-known function from the module, say of_string, compile, or literal, or a new operator-like name ({..||..}). This should obviate the need for a syntax extension for cases like these.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Mar 4, 2017

Member

M.{|...|} clearly means let open M in {|...|}, and allowing any other interpretation would be very wrong.

Member

gasche commented Mar 4, 2017

M.{|...|} clearly means let open M in {|...|}, and allowing any other interpretation would be very wrong.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Mar 4, 2017

Member

As I said previously, any change to the lexical syntax is costly as tool that embed their own parsers have to be changed. Given that the currently proposed ones seems to be weakly motivated (we precisely do not want to use strings as quotations, extesnsions as there for this purpose), I vote to reject the change and close the PR -- despite the fact that I appreciate the care with which the patch was prepared.

I think that the underlying idea that the quotation mechanism allowed by extensions today are a bit too heavy to become as widespread as Camlp4's << .. >> is correct. But repurposing a valid syntactic construction (string) with another meaning cannot be the solution,

Member

gasche commented Mar 4, 2017

As I said previously, any change to the lexical syntax is costly as tool that embed their own parsers have to be changed. Given that the currently proposed ones seems to be weakly motivated (we precisely do not want to use strings as quotations, extesnsions as there for this purpose), I vote to reject the change and close the PR -- despite the fact that I appreciate the care with which the patch was prepared.

I think that the underlying idea that the quotation mechanism allowed by extensions today are a bit too heavy to become as widespread as Camlp4's << .. >> is correct. But repurposing a valid syntactic construction (string) with another meaning cannot be the solution,

@paurkedal

This comment has been minimized.

Show comment
Hide comment
@paurkedal

paurkedal Mar 4, 2017

M.{|...|} clearly means let open M in {|...|}, and allowing any other interpretation would be very wrong.

That could be the interpretation if {|...|} was always translated to a global initialised by a function which in Pervasives is defined as the identify, but which could be shadowed by the local open. That would exclude a simple function name though, to avoid accidental shadowing.

Added: I get the point about not extending the lexer, though, and the above scheme is admittedly not obvious.

paurkedal commented Mar 4, 2017

M.{|...|} clearly means let open M in {|...|}, and allowing any other interpretation would be very wrong.

That could be the interpretation if {|...|} was always translated to a global initialised by a function which in Pervasives is defined as the identify, but which could be shadowed by the local open. That would exclude a simple function name though, to avoid accidental shadowing.

Added: I get the point about not extending the lexer, though, and the above scheme is admittedly not obvious.

@mwb-cde

This comment has been minimized.

Show comment
Hide comment
@mwb-cde

mwb-cde Mar 5, 2017

This change was based on my understanding that quoted strings were intended to be used for embedding foreign syntax for expansion by a ppx filter (because that's what the documentation said). Since that's incorrect, I'll close this PR. The best approach for my use-case seems to be a global parsing function rather than a ppx filter.

mwb-cde commented Mar 5, 2017

This change was based on my understanding that quoted strings were intended to be used for embedding foreign syntax for expansion by a ppx filter (because that's what the documentation said). Since that's incorrect, I'll close this PR. The best approach for my use-case seems to be a global parsing function rather than a ppx filter.

@mwb-cde mwb-cde closed this Mar 5, 2017

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Mar 5, 2017

Member

The document is correct in saying that quoted strings are useful for embedding foreign syntax, but this embedding should happen within an extension node (as everything else that is foreign). Without the quoted string, you cannot embed foreign syntax without annoying escapes, and without extension nodes the programmer cannot tell which part of their programs have a different semantics.

Member

gasche commented Mar 5, 2017

The document is correct in saying that quoted strings are useful for embedding foreign syntax, but this embedding should happen within an extension node (as everything else that is foreign). Without the quoted string, you cannot embed foreign syntax without annoying escapes, and without extension nodes the programmer cannot tell which part of their programs have a different semantics.

gasche added a commit to gasche/ocaml that referenced this pull request Mar 5, 2017

Quoted string and extension nodes: manual clarification
This manual clarification is intended to lift up the misunderstanding
that is the basis for #1066.
@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Mar 5, 2017

Member

I propose a clarification of the documentation in #1082.

Member

gasche commented Mar 5, 2017

I propose a clarification of the documentation in #1082.

gasche added a commit to gasche/ocaml that referenced this pull request Mar 5, 2017

Quoted string and extension nodes: manual clarification
This manual clarification is intended to lift up the misunderstanding
that is the basis for #1066.

gasche added a commit to gasche/ocaml that referenced this pull request Mar 5, 2017

Quoted string and extension nodes: manual clarification
This manual clarification is intended to lift up the misunderstanding
that is the basis for #1066.

gasche added a commit that referenced this pull request Mar 5, 2017

Quoted string and extension nodes: manual clarification
This manual clarification is intended to lift up the misunderstanding
that is the basis for #1066.

gasche added a commit that referenced this pull request Mar 5, 2017

Quoted string and extension nodes: manual clarification
This manual clarification is intended to lift up the misunderstanding
that is the basis for #1066.

camlspotter pushed a commit to camlspotter/ocaml that referenced this pull request Oct 17, 2017

Quoted string and extension nodes: manual clarification
This manual clarification is intended to lift up the misunderstanding
that is the basis for #1066.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment