Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TXmlReader.ParseComment throws parser error for valid XML comments #11

Closed
tschroff opened this issue Oct 10, 2023 · 2 comments
Closed

Comments

@tschroff
Copy link
Contributor

tschroff commented Oct 10, 2023

Symptom

The while loop in TXmlReader.ParseComment which should handle nested angular brackets seems to be unable to handle the following valid XML comment properly and subsequently the method throws a parser error exception about an invalid comment.

<!--FOO>
    <BAR IDREF="01"/>
</FOO-->

Analysis

Since the while loop expects nested angular brackets to always start with an opening bracket, the number of levels in the above example immediately decreases to 0 at the end of the first line and the loop is exited. The "--" then cannot be found before the current position and the mentiond parser error is raised.

Also, the loop seems to expect an identical amount of opening and closing angular brackets. This seems to be an invalid assumption for a comment that can contain anything.
So I expect this

<!-- All of >>>this<<<<<<<< is a comment -->

to also throw an exception.
I can also think of examples that might cause the parser to leave to scope of the comment:

<!-- All of this is a comment<<<< -->

Suggestions

I think the while loop should look for the first occurrence of the closing sequence "-->" instead of counting the level of nested angular brackets.
Maybe something like this:

procedure TXmlReader.ParseComment;
begin
  var P := FCurrent;
  if (P[0] <> '!') or (P[1] <> '-') or (P[2] <> '-') then
    ParseError(@RS_XML_INVALID_COMMENT);

  Inc(P, 3);
  var Start := P;
  var Closed := False;

  { Move to end of comment. }
  while not Closed do
  begin
    if (P^ = #0) then
      ParseError(@RS_XML_UNEXPECTED_EOF);

    Closed := (P >= (Start + 2)) and (P^ = '>') and (P[-1] = '-') and (P[-2] = '-');
    Inc(P);
  end;

  SetString(FValueString, Start, P - Start - 3);
  FCurrent := P;
end;

Thank you!

@tschroff
Copy link
Contributor Author

I added the following test to expose this issue.

procedure TTestXmlReader.TestIssue11;
begin
  var Doc := TXmlDocument.Create;
  Doc.Parse('<foo><!--bar> </bar--></foo>');
  Doc.Parse('<foo><!-- >>>This is a comment<<< --></foo>');
  Doc.Parse('<foo><!-- This is a comment<<< --></foo>');
  Assert.Pass('Valid comments should pass.');
end;

The test passes with the suggested changes given in my previous comment.

However it seems that TTestXmlReader.TestParseError_InvalidCommentSuffix needs to become TTestXmlReader.TestParseError_UnclosedComment then, expecting a RS_XML_UNEXPECTED_EOF (or a new RS_XML_UNCLOSED_COMMENT) error.

procedure TTestXmlReader.TestParseError_UnclosedComment;
begin
  ExpectParseError('<foo><!--</foo>', RS_XML_UNEXPECTED_EOF, 1, 6);
  ExpectParseError('<foo><!--></foo>', RS_XML_UNEXPECTED_EOF, 1, 6);
  ExpectParseError('<foo><!---></foo>', RS_XML_UNEXPECTED_EOF, 1, 6);
  ExpectParseError('<foo><!--bar-></foo>', RS_XML_UNEXPECTED_EOF, 1, 6);
end;

But I think that would be reasonable.

tschroff added a commit to tschroff/Neslib.Xml that referenced this issue Oct 19, 2023
tschroff added a commit to tschroff/Neslib.Xml that referenced this issue Oct 19, 2023
…mplete comment end marker. Besides that, ignore all triangular brackets. Adapt unittests to this new behavior.
neslib added a commit that referenced this issue Oct 19, 2023
Suggested fix for issue #11 (comment parsing)
@neslib
Copy link
Owner

neslib commented Oct 19, 2023

@tschroff Thank you for contributing a fix for this issue. It is much appreciated!
I have merged #12

@neslib neslib closed this as completed Oct 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants