Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quoted strings and octal character literals in ocamllex actions #1912

Merged
merged 8 commits into from Sep 17, 2018

Conversation

Projects
None yet
4 participants
@314eter
Copy link
Contributor

commented Jul 17, 2018

This contains the same fix as #1901, but for ocamllex actions. Without this, actions like

rule token = parse
  | 'a' { f' '"' }
  | 'b' { f2 '\o170' '"' }
  | 'c' { f1 "\u{1F42B}" }
  | 'd' { f1 {|}|} }
  | 'e' { (* " *) } (* " *) }

would not compile, while they contain valid ocaml code.

The changes are

  • recognize identifiers containing single quotes
  • recognize octal character literals
  • recognize unicode character escape sequences
  • recognize quoted strings

I tried to keep it a bugfix, not a language change. But as a side effect, octal and unicode escape sequences are now allowed in ocamllex strings too.

@diml

This comment has been minimized.

Copy link
Member

commented Jul 18, 2018

One thing I wonder is if we could simply reuse parsing/lexer.mll directly. That would avoid having to maintain two lexers and would ensure that the lexical conventions are always the same.

@damiendoligez

This comment has been minimized.

Copy link
Member

commented Jul 20, 2018

But as a side effect, octal and unicode escape sequences are now allowed in ocamllex strings too.

That counts as a bug-fix.

@damiendoligez
Copy link
Member

left a comment

Pretty good, but the assert false question needs to be addressed.

warning lexbuf
(Printf.sprintf
"illegal uchar escape in string: '\\u{%s}'" s);
store_string_uchar (Uchar.unsafe_of_int v);

This comment has been minimized.

Copy link
@damiendoligez

damiendoligez Jul 20, 2018

Member

Unless I'm mistaken, this will assert false in buffer.ml (after printing the warning) if v is too large or negative.

This comment has been minimized.

Copy link
@314eter

314eter Jul 22, 2018

Author Contributor

I'll make the warning an error. Decimal escape sequences have the same problem (e.g. "\256").

quoted_string delim lexbuf }
| _
{ quoted_string delim lexbuf }
(*
Lexers comment and action are quite similar,
they should lex both strings and characters,

This comment has been minimized.

Copy link
@damiendoligez

damiendoligez Jul 20, 2018

Member

You should update this comment.

| '{' (['a'-'z' '_'] * as delim) '|'
{ quoted_string delim lexbuf;
comment lexbuf }
| '\''

This comment has been minimized.

Copy link
@damiendoligez

damiendoligez Jul 20, 2018

Member

You change this one from "'" to '\'' but not the one in [action] (line 321). Why?

This comment has been minimized.

Copy link
@314eter

314eter Jul 22, 2018

Author Contributor

I found it more logical, but apparently "'" is more used in this file.

@314eter

This comment has been minimized.

Copy link
Contributor Author

commented Jul 22, 2018

That counts as a bug-fix.

In that case, I suppose octal escape sequences should be allowed in characters too?

314eter added some commits Jul 22, 2018

@damiendoligez

This comment has been minimized.

Copy link
Member

commented Sep 11, 2018

Looks good now.

@314eter 314eter referenced this pull request Sep 16, 2018

Merged

Fixes in lexers #2049

@gasche gasche merged commit 632df7c into ocaml:trunk Sep 17, 2018

0 of 2 checks passed

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details

damiendoligez added a commit to damiendoligez/ocaml that referenced this pull request Nov 5, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.