comments are not properly preserved following folded/literal string #326

efd6 · 2022-11-23T00:34:15Z

If a comment follows a field using folded or literal style strings, the first line of the comment is lost, for example (https://play.golang.com/p/mN4v86pSwP8 https://play.golang.com/p/xHeGkd9bxDu https://play.golang.com/p/uzQ8WGzA4gs https://play.golang.com/p/wDYKVpHPdSL)

document:
  - key1:
      key2: |
        Some text.

  # This is an important comment explaining key3.
  # Second line of comment.
  - key3:
      value: 1

becomes

document:
  - key1:
      key2: |
        Some text.
  # Second line of comment.
  - key3:
      value: 1

This does not happen with quoted or unquoted strings (https://play.golang.com/p/dc2RCZmRrIR https://play.golang.com/p/tjBLgB8OaG9).

This appears to be due to incorrect association of comments with nodes in the AST during parsing.

The YAML document above has the following AST comment associations (https://play.golang.com/p/3WtBBjARGXF)

comment: # This is an important comment explaining key3.
associated node:
      key2: |
        Some text.

comment: # Second line of comment.
associated node:
document:
  - key1:
      key2: |
        Some text.
  # Second line of comment.
  - key3:
      value: 1

Where we can see that the first line of the comments is incorrectly associated with the key2 node, but is not emitted, and the second line is correctly associated with the first element of the sequence via the ValueComments slice.

The text was updated successfully, but these errors were encountered:

efd6 · 2022-12-11T04:28:37Z

I have done some more digging for this and the issue lies in the scanner; it does not correctly mark token positions for the folded/literal string types.

This can be seen in the token stream for the YAML text in the issue. We can see that the literal token has both a negative column and the line corresponds to the token end (line 7) rather than the token start (line 5).

https://go.dev/play/p/Yp6FW_SxuXh

                                             "document" [level:0,line:2,column:1,offset:2]
                                             ":" [level:0,line:2,column:9,offset:10]
                                             "-" [level:1,line:3,column:3,offset:14]
                                             "key1" [level:1,line:3,column:5,offset:16]
                                             ":" [level:1,line:3,column:9,offset:20]
                                             "key2" [level:2,line:4,column:7,offset:28]
                                             ":" [level:2,line:4,column:11,offset:32]
                                             "|" [level:2,line:4,column:13,offset:34]
negative column >>>>>> wrong line (5) >>>>>> "Some text.\n\n" [level:1,line:7,column:-9,offset:45]
                                             " This is an important comment explaining key 3." [level:1,line:7,column:3,offset:57]
                                             " Second line of comment." [level:1,line:8,column:3,offset:108]
                                             "-" [level:1,line:9,column:3,offset:136]
                                             "key3" [level:1,line:9,column:5,offset:138]
                                             ":" [level:1,line:9,column:9,offset:142]
                                             "value" [level:2,line:10,column:7,offset:150]
                                             ":" [level:2,line:10,column:12,offset:155]
                                             "1" [level:2,line:10,column:14,offset:158]

This looks like it is happening here because the buffer size is being used to calculate the column and the function assumes that the token is a single line (it looks like this is touched elsewhere, but that code doesn't seem to repair the issue).

go-yaml/scanner/scanner.go

Line 67 in dc3fed8

Column: s.column - size,

func (s *Scanner) bufferedToken(ctx *Context) *token.Token {
	if s.savedPos != nil {
		tk := ctx.bufferedToken(s.savedPos)
		s.savedPos = nil
		return tk
	}
	size := len(ctx.buf)
	return ctx.bufferedToken(&token.Position{
		Line:        s.line,
		Column:      s.column - size,
		Offset:      s.offset - size,
		IndentNum:   s.indentNum,
		IndentLevel: s.indentLevel,
	})
}

Ultimately, the association of the comment with the wrong node happens becuase here the lines match incorrectly.

go-yaml/parser/parser.go

Line 483 in dc3fed8

return tk.Position.Line == node.GetToken().Position.Line

func (p *parser) isSameLineComment(tk *token.Token, node ast.Node) bool {
	if tk == nil {
		return false
	}
	if tk.Type != token.CommentType {
		return false
	}
	return tk.Position.Line == node.GetToken().Position.Line
}

efd6 mentioned this issue Dec 11, 2022

scanner: fix position calculation for literal, folded and raw folded strings #330

Merged

goccy closed this as completed in #330 Dec 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

comments are not properly preserved following folded/literal string #326

comments are not properly preserved following folded/literal string #326

efd6 commented Nov 23, 2022

efd6 commented Dec 11, 2022