Fix handle double quoted string #342
Thank you for the great PR. I've commented.
Thank you for your review. Where can I find your comments?
	return c == ' ' || c == '\t'
}

func (s *Scanner) isOnlyWhiteToLineEnds(src []rune, size, idx int) bool {
I could not work out the intent behind this function's name from what it does. Is there a more appropriate name?
Thank you for commenting.
Before we begin: on line 126, maybe isWhiteSpaceChar is clearer.
- In the YAML spec, the space and tab characters together are called White Space Characters.
- This function detects whether a single char is one of the White Space Characters.
- Also, isNewLineChar already exists, so following the isXxxxChar pattern is better.
Then, how about isWhiteSpaceCharUntilNewLineChar for this function?
- It is just a combination: isWhiteSpaceChar until NewLineChar.
- BTW, the YAML spec calls them Line Break Characters, but isNewLineChar already exists, so NewLine is clearer than LineBreak.
I can offer more ideas from these combinations:
- is/are
- WhiteSpaceChar/WhiteSpaceChars
- To/Until
- NewLine/LineBreak/LineEnds
I think the name itself (isWhiteSpaceCharUntilNewLineChar or isOnlyWhiteToLineEnds) is good. What puzzles me is that, given what the name means in English, the following string should return true. But in fact, it returns false.
<white-space><white-space><new-line>
Specifically, I don't understand why it returns false when isWhiteChar is true 🤔.
I'm sorry. I remembered the details wrongly, since this PR was created 2 months ago. I also found some mistakes, such as for i := idx + 1; idx < size; idx++ (i is not incremented) 🙇
The YAML spec says:
All leading and trailing white space characters on each line are excluded from the content.
ref https://yaml.org/spec/1.2.2/#example-double-quoted-line-breaks
The "leading" part is already covered:
} else if s.isWhiteChar(c) && isFirstLineChar {
	continue
This isOnlyWhiteToLineEnds function aims to cover the "trailing" part.
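As a rough illustration of the "trailing" check with the loop counter fixed, here is a minimal standalone sketch. The free functions and the name onlyWhiteToLineEnd are hypothetical and are not the PR's actual code. It assumes the check starts at idx itself, that newline chars are '\n' and '\r', and that reaching the end of input counts as a line end.

```go
package main

import "fmt"

// isWhiteChar reports whether c is a space or a tab.
func isWhiteChar(c rune) bool {
	return c == ' ' || c == '\t'
}

// isNewLineChar reports whether c is '\n' or '\r' (an assumption for this sketch).
func isNewLineChar(c rune) bool {
	return c == '\n' || c == '\r'
}

// onlyWhiteToLineEnd (hypothetical name) reports whether every char from idx
// up to the next newline, or the end of src, is a white space character.
func onlyWhiteToLineEnd(src []rune, idx int) bool {
	// The loop variable i is the one being tested and incremented.
	for i := idx; i < len(src); i++ {
		c := src[i]
		if isNewLineChar(c) {
			return true
		}
		if !isWhiteChar(c) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(onlyWhiteToLineEnd([]rune("  \n"), 0))
	fmt.Println(onlyWhiteToLineEnd([]rune("  a\n"), 0))
}
```

With this shape, <white-space><white-space><new-line> yields true, which matches the English meaning of the name discussed above.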
Now, I have another idea for this topic:
- Create a new process that supports the YAML spec, instead of continuing this isOnlyWhiteToLineEnds discussion.
- In the scanDoubleQuote function, if a whitespace is found, scan the following whitespaces until another char appears.
  - If there are multiple whitespaces, keep them in a buffer.
  - If the other char is a newline, drop those whitespaces and continue the scanDoubleQuote loop.
  - If the other char is not a newline, append the buffer to value so it is handled as whitespace chars, and continue the scanDoubleQuote loop.
  - If the size limit is reached, do the same as the "not a newline" case. (BTW, this is an unexpected case, because a " should be there to end the double quote.)
- Note: this logic can support both the "leading" and "trailing" parts.
The reason for "scan" instead of "check" is that it avoids checking the same whitespaces multiple times in input like this:
aaa<whites><whites><whites><newline>bbb
What do you think?
Coding sketch:
} else if s.isWhiteChar(c) {
	// scan consecutive whitespaces into a buffer
	buf := []rune{}
	for ; idx < size; idx++ {
		c := src[idx]
		if !s.isWhiteChar(c) {
			break
		}
		buf = append(buf, c)
	}
	// skip the whitespaces if they are leading or trailing;
	// idx < size guards src[idx] when the input ends with whitespace
	if isFirstLineChar || (idx < size && s.isNewLineChar(src[idx])) {
		idx-- // let the main scan loop handle the current char
		continue
	}
	// keep intermediate whitespaces as content
	value = append(value, buf...)
	idx-- // let the main scan loop handle the current char
	continue
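The same idea can be tried outside the Scanner with a small self-contained sketch. The function stripLineEdgeWhites and the free helper functions are hypothetical names for illustration, not part of the library. As a simplification, it treats the very start of the input as the start of a line, keeps newline chars as-is, and does not perform YAML line folding.

```go
package main

import "fmt"

func isWhiteChar(c rune) bool   { return c == ' ' || c == '\t' }
func isNewLineChar(c rune) bool { return c == '\n' || c == '\r' }

// stripLineEdgeWhites (hypothetical) drops leading and trailing white space
// on each line and keeps intermediate white space. Each run of white space
// is scanned exactly once.
func stripLineEdgeWhites(src []rune) []rune {
	value := []rune{}
	isFirstLineChar := true
	for idx := 0; idx < len(src); idx++ {
		c := src[idx]
		switch {
		case isWhiteChar(c):
			// Scan the whole run of white space.
			start := idx
			for idx < len(src) && isWhiteChar(src[idx]) {
				idx++
			}
			// A run is trailing only if a newline follows it; reaching the
			// end of input is handled like the "not a newline" case.
			trailing := idx < len(src) && isNewLineChar(src[idx])
			if !isFirstLineChar && !trailing {
				value = append(value, src[start:idx]...)
			}
			idx-- // the loop's idx++ moves to the first non-white char
		case isNewLineChar(c):
			value = append(value, c)
			isFirstLineChar = true
		default:
			value = append(value, c)
			isFirstLineChar = false
		}
	}
	return value
}

func main() {
	out := stripLineEdgeWhites([]rune("aaa \t\n  bbb  ccc"))
	fmt.Printf("%q\n", string(out))
}
```

For the input above, the trailing " \t" after aaa and the leading spaces before bbb are dropped, while the two spaces between bbb and ccc are kept.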
Thanks for reviewing this PR. I don't have enough time to handle this PR for now, so let me close it.
Background
The Example 7.5 Double Quoted Line Breaks case is not working properly, and neither are some related cases.
https://yaml.org/spec/1.2.2/#example-double-quoted-line-breaks
I'd like to handle this behavior correctly.
What are these changes
- isWhiteChar
- isOnlyWhiteToLineEnds
- scanNextEmptyLines
- scan
- docStartLine field in Scanner. a: |\n Text\n does not seem to work properly. It will be used to fix this issue together with the scanLiteral changes.
- scanDoubleQuote: when \ is found, ignore the next char.