Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/ast: comment text changes newline type #35803

Closed
aykevl opened this issue Nov 24, 2019 · 5 comments
Closed

go/ast: comment text changes newline type #35803

aykevl opened this issue Nov 24, 2019 · 5 comments
Milestone

Comments

@aykevl
Copy link

@aykevl aykevl commented Nov 24, 2019

What version of Go are you using (go version)?

$ go version
go version go1.13 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What did you do?

I am trying to print accurate source locations for AST comments.
In this specific case, I'm trying to construct a token.Pos at a given byte offset in the .Text field of a *Comment type in the AST. I'm doing that simply by adding the byte offset to the .Slash field:

var comment *ast.Comment = ...
byteOffset := strings.Index(comment.Text, "foo")
pos := comment.Slash + token.Pos(byteOffset)

This works well on Unix-like systems (tested on Linux and macOS). However, Windows is a problem because it uses CRLF line endings by default instead of LF line endings. That wouldn't be a problem if the .Text field also kept the comment in source form (with CRLF line endings) but unfortunately it appears that the parser converts the line endings to LF, thus losing track of exact source locations.

A full program that shows this behavior:
https://play.golang.org/p/8AP05J5rQX8
Replace if true { with if false { to get the expected output.

What did you expect to see?

I would expect the Text field to keep the original line endings as they appear in the source code.
Alternatively, token.FileSet should somehow keep track of the changed line endings but I think that would be much more complicated (and probably be incorrect in other cases).

I don't know whether changing the behavior of the parser (to not convert line endings) would break existing tools, but it would fix the issue for me.

What did you see instead?

The parser appears to convert line endings, thus breaking accurate source location tracking inside comments.

@toothrot toothrot changed the title go/ast comment text changes newline type go/ast: comment text changes newline type Nov 26, 2019
@toothrot toothrot added this to the Backlog milestone Nov 26, 2019
@toothrot
Copy link
Contributor

@toothrot toothrot commented Nov 26, 2019

@griesemer
Copy link
Contributor

@griesemer griesemer commented Nov 26, 2019

Generally, the go/scanner removes CRs in comments (not just at the end, also inside the comment) and elsewhere. Changing this (in the go/scanner) is likely going to break existing tools in subtle ways.

We also want tools such as gofmt to produce the same result independent of whether one runs on a Windows machine or not, which requires that newlines are always represented in one way, using LF.

I would expect this to work if you run the code on gofmt-ed code. Does it not?

@gopherbot
Copy link

@gopherbot gopherbot commented Dec 26, 2019

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@gopherbot gopherbot closed this Dec 26, 2019
@aykevl
Copy link
Author

@aykevl aykevl commented Dec 30, 2019

We also want tools such as gofmt to produce the same result independent of whether one runs on a Windows machine or not, which requires that newlines are always represented in one way, using LF.

Definitely agree here: gofmt should produce a consistent output on all platforms.

Generally, the go/scanner removes CRs in comments (not just at the end, also inside the comment) and elsewhere. Changing this (in the go/scanner) is likely going to break existing tools in subtle ways.

I understand the reasoning here.

I would expect this to work if you run the code on gofmt-ed code. Does it not?

Probably, I haven't tested it but I guess it will.
In my case, only the Windows CI was failing because indices into parsed comments could not be added to token.Pos, thus breaking error locations which are checked in CI. Errors suddenly pointed to a few characters from the real location, which is at least confusing.

The cause is most likely git: it "helpfully" coverts newlines to CRLF. So while it is true that a simple gofmt would fix it, the files were already formatted but simply changed slightly by git. Therefore I worked around the problem by manually adjusting error locations.

I would understand if you closed this as "working as intended" even though it caused some headaches for me.

@griesemer
Copy link
Contributor

@griesemer griesemer commented Jan 3, 2020

gopherbot closed this - but yes, this is "working as intended" as far as I can tell. If you have any information indicating this is not the case, feel free to reopen with the relevant information. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.