Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not write out invalid characters to XML file. #148

Closed
wants to merge 1 commit into from
Closed

Do not write out invalid characters to XML file. #148

wants to merge 1 commit into from

Conversation

rodrigc
Copy link
Contributor

@rodrigc rodrigc commented Oct 1, 2015

For special characters such as: <, >, or ', write them out
to the file as:
    &#[decimal ASCII value];

Character     Written to    Appears in Jenkins
to encode     XML file      test result viewer
=========     ==========    ==================
  <            &lt;             <
  >            &gt;             >
  '            &apos;           '
  &            &amp;            &
  "            &quot;           "

For characters which are listed as RestrictedChar
in http://www.w3.org/TR/xml11/#charsets , these characters
are completely invalid XML, and cannot even be escaped.

Character     Written to    Appears in Jenkins
to encode     XML file      test result viewer
=========     ==========    ==================
  0x08         &amp;#8;          &#8;
  0x1F         &amp;#31;         &#31;

This will at least generate a valid XML file,
but let us see where these restricted characters would
appear in the output.

Fixes #136

} else if (c == '>') {
quoted << "&gt;";
} else if (c == '\'') {
quoted << "&apos;";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why bother respecting all these entities? Why not just use the decimal ASCII value in all cases? The latter would keep the code simpler, and the resulting XML wouldn't depend on the definition of the entities, which are not available in a "plain" XML document.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I respected those entities, because the old code did that.
Changing this behavior to emit ASCII for these characters would be a change
in behavior, and would require updating the utils/text/operations_test:escape_xml__some_escaping unit test.
I didn't want to go down that path, but if you want that, and allow me to change the unit test, I can do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I guess it makes sense to leave as you did. I read the spec links you provided and it seems that there is a clear difference between the "special characters" and everything else.

I personally like the named entities better for readability, but the introduction of two behaviors (named entities for these, numeric for the rest) is a bit unfortunate.

@jmmv
Copy link
Member

jmmv commented Oct 18, 2015

This change is missing tests. The existing tests for escape_xml need to be extended to ensure that a variety of non-alphanumeric characters are properly handled.

And super-nitpick: the change description does not follow the guidelines in CONTRIBUTING.md:

  • Follow standard Git commit message guidelines. The first line has a maximum length of 50 characters, does not terminate in a period, and has to summarize the whole commit. [...]

uqs pushed a commit to freebsd/freebsd-src that referenced this pull request Nov 18, 2015
They are emitting characters which are triggering
a kyua bug which causes kyua to emit invalid XML.
This invalid XML is causing false failures in Jenkins.

On a separate note, kyua needs to be fixed with this:
  freebsd/kyua#148
or something similar.


git-svn-id: svn+ssh://svn.freebsd.org/base/head@291015 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
uqs pushed a commit to freebsd/freebsd-src that referenced this pull request Nov 18, 2015
They are emitting characters which are triggering
a kyua bug which causes kyua to emit invalid XML.
This invalid XML is causing false failures in Jenkins.

On a separate note, kyua needs to be fixed with this:
  freebsd/kyua#148
or something similar.
// as '&amp;#[decimal ASCII value];'
// so that in the XML file we will see the escaped
// character.
quoted << "&amp;#" << int(*it) << ";";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cast would better be static_cast< std::string::size_type >(*it) to not make any assumptions on the character/integer sizes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added that.

For special characters such as: <, >, or ', write them out
to the file as:
    &#[decimal ASCII value];

Character     Written to    Appears in Jenkins
to encode     XML file      test result viewer
=========     ==========    ==================
  <            &lt;             <
  >            &gt;             >
  '            &apos;           '
  &            &amp;            &
  "            &quot;           "

For characters which are listed as RestrictedChar
in http://www.w3.org/TR/xml11/#charsets , these characters
are completely invalid XML, and cannot even be escaped.

Character     Written to    Appears in Jenkins
to encode     XML file      test result viewer
=========     ==========    ==================
  0x08         &amp;#8;          &#8;
  0x1F         &amp;#31;         &#31;

This will at least generate a valid XML file,
but let us see where these restricted characters would
appear in the output.

Fixes #136
@@ -90,6 +91,8 @@ ATF_TEST_CASE_BODY(escape_xml__some_escaping)

ATF_REQUIRE_EQ("&quot;&amp;&lt;&gt;&apos;", text::escape_xml("\"&<>'"));
ATF_REQUIRE_EQ("&amp;&amp;&amp;", text::escape_xml("&&&"));
ATF_REQUIRE_EQ("&amp;#8;&amp;#11;", text::escape_xml("\b\v"));
ATF_REQUIRE_EQ("\t&amp;#127;BAR&amp;", text::escape_xml("\t\x7f""BAR&"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of the double "" in the middle of the argument passed to escape_xml? Seems unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's needed because otherwise the compiler will think that it is \x7fBA

@jmmv
Copy link
Member

jmmv commented Nov 19, 2015

Committed.

I did a manual merge to add a proper entry to NEWS as part of your commit and to drop trailing whitespace from the code (which GitHub helpfully did not show, but I saw on the command line).

@jmmv jmmv closed this Nov 19, 2015
uqs pushed a commit to freebsd/freebsd-src that referenced this pull request Dec 1, 2015
kyua 0.12 has fix for freebsd/kyua#148
which eliminates invalid XML characters from being written to test reports
with "kyua report-junit".


git-svn-id: svn+ssh://svn.freebsd.org/base/head@291616 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
uqs pushed a commit to freebsd/freebsd-src that referenced this pull request Dec 1, 2015
kyua 0.12 has fix for freebsd/kyua#148
which eliminates invalid XML characters from being written to test reports
with "kyua report-junit".
bdrewery pushed a commit to bdrewery/freebsd that referenced this pull request Dec 7, 2015
They are emitting characters which are triggering
a kyua bug which causes kyua to emit invalid XML.
This invalid XML is causing false failures in Jenkins.

On a separate note, kyua needs to be fixed with this:
  freebsd/kyua#148
or something similar.


git-svn-id: svn+ssh://svn.freebsd.org/base/head@291015 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
bdrewery pushed a commit to bdrewery/freebsd that referenced this pull request Dec 7, 2015
kyua 0.12 has fix for freebsd/kyua#148
which eliminates invalid XML characters from being written to test reports
with "kyua report-junit".


git-svn-id: svn+ssh://svn.freebsd.org/base/head@291616 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
@rodrigc rodrigc deleted the xml-encode branch February 19, 2019 00:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants