Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

comments are not properly preserved following folded/literal string #326

Closed
efd6 opened this issue Nov 23, 2022 · 1 comment · Fixed by #330
Closed

comments are not properly preserved following folded/literal string #326

efd6 opened this issue Nov 23, 2022 · 1 comment · Fixed by #330

Comments

@efd6
Copy link
Contributor

efd6 commented Nov 23, 2022

If a comment follows a field using folded or literal style strings, the first line of the comment is lost, for example (https://play.golang.com/p/mN4v86pSwP8 https://play.golang.com/p/xHeGkd9bxDu https://play.golang.com/p/uzQ8WGzA4gs https://play.golang.com/p/wDYKVpHPdSL)

document:
  - key1:
      key2: |
        Some text.

  # This is an important comment explaining key3.
  # Second line of comment.
  - key3:
      value: 1

becomes

document:
  - key1:
      key2: |
        Some text.
  # Second line of comment.
  - key3:
      value: 1

This does not happen with quoted or unquoted strings (https://play.golang.com/p/dc2RCZmRrIR https://play.golang.com/p/tjBLgB8OaG9).

This appears to be due to incorrect association of comments with nodes in the AST during parsing.

The YAML document above has the following AST comment associations (https://play.golang.com/p/3WtBBjARGXF)

comment: # This is an important comment explaining key3.
associated node:
      key2: |
        Some text.

comment: # Second line of comment.
associated node:
document:
  - key1:
      key2: |
        Some text.
  # Second line of comment.
  - key3:
      value: 1

Where we can see that the first line of the comments is incorrectly associated with the key2 node, but is not emitted, and the second line is correctly associated with the first element of the sequence via the ValueComments slice.

@efd6
Copy link
Contributor Author

efd6 commented Dec 11, 2022

I have done some more digging for this and the issue lies in the scanner; it does not correctly mark token positions for the folded/literal string types.

This can be seen in the token stream for the YAML text in the issue. We can see that the literal token has both a negative column and the line corresponds to the token end (line 7) rather than the token start (line 5).

https://go.dev/play/p/Yp6FW_SxuXh

                                             "document" [level:0,line:2,column:1,offset:2]
                                             ":" [level:0,line:2,column:9,offset:10]
                                             "-" [level:1,line:3,column:3,offset:14]
                                             "key1" [level:1,line:3,column:5,offset:16]
                                             ":" [level:1,line:3,column:9,offset:20]
                                             "key2" [level:2,line:4,column:7,offset:28]
                                             ":" [level:2,line:4,column:11,offset:32]
                                             "|" [level:2,line:4,column:13,offset:34]
negative column >>>>>> wrong line (5) >>>>>> "Some text.\n\n" [level:1,line:7,column:-9,offset:45]
                                             " This is an important comment explaining key 3." [level:1,line:7,column:3,offset:57]
                                             " Second line of comment." [level:1,line:8,column:3,offset:108]
                                             "-" [level:1,line:9,column:3,offset:136]
                                             "key3" [level:1,line:9,column:5,offset:138]
                                             ":" [level:1,line:9,column:9,offset:142]
                                             "value" [level:2,line:10,column:7,offset:150]
                                             ":" [level:2,line:10,column:12,offset:155]
                                             "1" [level:2,line:10,column:14,offset:158]

This looks like it is happening here because the buffer size is being used to calculate the column and the function assumes that the token is a single line (it looks like this is touched elsewhere, but that code doesn't seem to repair the issue).

Column: s.column - size,

func (s *Scanner) bufferedToken(ctx *Context) *token.Token {
	if s.savedPos != nil {
		tk := ctx.bufferedToken(s.savedPos)
		s.savedPos = nil
		return tk
	}
	size := len(ctx.buf)
	return ctx.bufferedToken(&token.Position{
		Line:        s.line,
		Column:      s.column - size,
		Offset:      s.offset - size,
		IndentNum:   s.indentNum,
		IndentLevel: s.indentLevel,
	})
}

Ultimately, the association of the comment with the wrong node happens becuase here the lines match incorrectly.

return tk.Position.Line == node.GetToken().Position.Line

func (p *parser) isSameLineComment(tk *token.Token, node ast.Node) bool {
	if tk == nil {
		return false
	}
	if tk.Type != token.CommentType {
		return false
	}
	return tk.Position.Line == node.GetToken().Position.Line
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant