Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR#6604: Only allow directives with filename and at the beginning of the line #931

Merged
merged 7 commits into from Oct 4, 2017

Conversation

Projects
None yet
7 participants
@tadeuzagallo
Copy link
Contributor

tadeuzagallo commented Nov 20, 2016

This updates the lexer to enforce that directives have a filename and are placed at the beginning of the line, as suggested in the comments in the Mantis PR referred to in the title.

I had to update one existing test, and added 2 more to verify the new intended behaviour.

@bobzhang

This comment has been minimized.

Copy link
Member

bobzhang commented Nov 20, 2016

would you add a test case for the first line?

# 1 "xx.ml" (* this should still be fine *)
@tadeuzagallo

This comment has been minimized.

Copy link
Contributor Author

tadeuzagallo commented Nov 20, 2016

Added it. Also included a couple more directives to test the non-first line case and two consecutive directives (which was the reason I had to add the code to "put back" the newline at the end of a directive).

@dra27

This comment has been minimized.

Copy link
Contributor

dra27 commented Nov 20, 2016

Could you add an entry to the Changes file (worth heading it with both "PR#6604, GPR#931")?

@tadeuzagallo

This comment has been minimized.

Copy link
Contributor Author

tadeuzagallo commented Nov 20, 2016

Oops, forgot about it, sorry. I was unsure under which section to add it, so I just added to "Bug fixes", let me know if that's not the best place for it. I'm also not sure whether the name of the reporter is correct, from Mantis I can only see the username, yet everyone seems to use real names in the changelog.

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Nov 20, 2016

Regarding the Changelog:

  • You should use a bullet-point of the form * rather than -, as this change can break working programs.
  • Entries should be sorted by MPR number first, then by GPR number. As all other entries in this section have no MPR number, your entry should go first.

Regarding the patch itself: the position handling is non-trivial and would need a careful review. In particular, we should make sure that the change does not affect the lexing positions of other tokens (than the lexer directives) in the source.

@tadeuzagallo tadeuzagallo force-pushed the tadeuzagallo:pr6604 branch from b8ca15f to 3f8400c Nov 20, 2016

@tadeuzagallo

This comment has been minimized.

Copy link
Contributor Author

tadeuzagallo commented Nov 20, 2016

I've updated the changelog, thanks.

With regards to the position handling, my thinking was that: as long as the Invalid_directive errors were reported correctly it'd be correct, as the directive will override the position. Would that be correct?

@dra27

This comment has been minimized.

Copy link
Contributor

dra27 commented Nov 20, 2016

Agreed - I've been staring at the position manipulation code for a while, trying to convince myself it's definitely correct!

@tadeuzagallo tadeuzagallo force-pushed the tadeuzagallo:pr6604 branch from 5007b8a to 79a8eb2 Nov 20, 2016

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Dec 4, 2016

I'm not a big fan of pushing back already-read data by tweaking fields of a lexbuf. Is there no other way?

I was not expecting to have an error if a directive is recognized not at beginning of line. It would feel more natural to me to just analyze it as if it was not a directive, but instead # followed by a number, etc.

@tadeuzagallo tadeuzagallo force-pushed the tadeuzagallo:pr6604 branch from 3177c95 to a09625a Dec 4, 2016

@tadeuzagallo

This comment has been minimized.

Copy link
Contributor Author

tadeuzagallo commented Dec 4, 2016

Thanks for the feedback. I think I found a cleaner solution, the only problem is that the "line out of bounds" errors will be reported as starting on column 1 rather than 0, as the hash was matched first. Not sure if this is a problem though.

@mshinwell

This comment has been minimized.

Copy link
Contributor

mshinwell commented Aug 10, 2017

@tadeuzagallo Is this ready for review? If so please rebase.

@damiendoligez

This comment has been minimized.

Copy link
Member

damiendoligez commented Aug 21, 2017

It's really not a problem that the error report is off by 1 character.

@damiendoligez damiendoligez added this to the 4.06.0 milestone Sep 29, 2017

@xavierleroy
Copy link
Contributor

xavierleroy left a comment

This looks good to me and could be merged as is. I wonder if a slightly simpler implementation is possible, see comment in code.

| newline "#"
{ update_loc lexbuf None 1 false 1;
try directive lexbuf
with Failure(_) -> HASH
}

This comment has been minimized.

@xavierleroy

xavierleroy Oct 3, 2017

Contributor

This looks reasonable, but is there any chance of unifying these two cases? Something like

    if not (at_beginning_of_line lexbuf.lex_start_p)
    then HASH
    else try directive lexbuf with Failure _ -> HASH

where, tentatively,

let at_beginning_of_line pos = (pos.pos_cnum = pos.pos_bol)

Just a suggestion: I didn't try and I'm not sure it would work.

This comment has been minimized.

@tadeuzagallo

tadeuzagallo Oct 3, 2017

Author Contributor

This worked perfectly, thanks.

@tadeuzagallo tadeuzagallo force-pushed the tadeuzagallo:pr6604 branch from a09625a to 7bf6d5d Oct 3, 2017

@tadeuzagallo

This comment has been minimized.

Copy link
Contributor Author

tadeuzagallo commented Oct 3, 2017

Sorry for the delay, I have rebase and updated the code with the suggestions.

@dra27

This comment has been minimized.

Copy link
Contributor

dra27 commented Oct 4, 2017

Looks good to me

@gasche/@damiendoligez - is this definitely OK for 4.06?

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Oct 4, 2017

I appreciate the change, but I'm not convinced that it is totally without risk, and I think the issue it is fixing can tolerate a delay. (I think the cost of a potential regression noticed only after the second beta offset the gains.) I would personally vote for inclusion (now) in trunk, but not in the release branch.

@dra27

This comment has been minimized.

Copy link
Contributor

dra27 commented Oct 4, 2017

@gasche - that was my feeling too. So let's remove it from the milestone but merge to trunk now

@dra27 dra27 removed this from the 4.06.0 milestone Oct 4, 2017

@dra27 dra27 merged commit 74ca5ee into ocaml:trunk Oct 4, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@gasche

This comment has been minimized.

Copy link
Member

gasche commented Oct 4, 2017

Thanks! And thanks again @tadeuzagallo.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Oct 7, 2017

Since this PR was merged the CI at Inria is broken but only under OpenBSD (i386). The breakage is in ocamldoc/:

../boot/ocamlrun ../ocamlc -nostdlib -I ../stdlib -pp 'sh ./remove_DEBUG' -I ../parsing -I ../utils -I ../typing -I ../driver -I ../bytecomp -I ../toplevel -I ../stdlib -I ../compilerlibs -I ../otherlibs/str -I ../otherlibs/dynlink -I ../otherlibs/unix -I ../otherlibs/graph -absname -w +a-4-9-41-42-44-45-48 -warn-error A -safe-string -strict-sequence -strict-formats -bin-annot -c odoc_text_parser.ml
File "/home/ci/workspace/main/flambda/false/slave/ocaml-openbsd-32/ocamldoc/odoc_text_parser.mly", line 27, characters 31-33:
Error: Syntax error

Probably a bad interaction with the remove_DEBUG script and OpenBSD's implementation of sed. To be investigated.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Oct 7, 2017

This is what the file looks like after sed preprocessing:

(* DEBUG statement removed *)# 82 "odoc_text_parser.ml"
let yytransl_const = [|
@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Oct 7, 2017

Plausible fix in [trunk 801993d] . Waiting for a round of CI to confirm.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Oct 7, 2017

CI looks happy.

@dra27

This comment has been minimized.

Copy link
Contributor

dra27 commented Oct 7, 2017

Excellent - though unexpected, it's sort of evidence that it's working, right?

Before long, that script needs to be changed anyway - when we start building OCaml using the Linux subsystem on Windows, using scripts with -pp will become "non-portable" behaviour, so we'll probabaly want to preprocess the files manually.

314eter added a commit to 314eter/tree-sitter-ocaml that referenced this pull request Jan 15, 2019

314eter added a commit to 314eter/tree-sitter-ocaml that referenced this pull request Jan 19, 2019

maxbrunsfeld added a commit to tree-sitter/tree-sitter-ocaml that referenced this pull request Feb 7, 2019

Support new features (#22)
* Fix problems with character literals in comments

* Do not parse unclosed comments and quoted strings

* Update known failures

* Only allow line number directives with filename (ocaml/ocaml#931)

* Rename dot_operator to indexing_operator

* Disallow .~ in indexing operator (ocaml/ocaml#2106)

* Add test for indexing operator

* Support empty variants (ocaml/ocaml#1546)

* Support binding operators (ocaml/ocaml#1947)

* Use tree-sitter 0.14.0

* Cleanup trvis config
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.