Pedantic white space preservation not supported. #242
Comments
I'm sorry this isn't what you want. The behavior is documented in the README, although the specific case you have isn't clear. (The rule also is applied within an element, and the formatting example could be clearer that it includes both end of line normalization and whitespace.) The best approach for TinyXML-2 has been discussed before, and this is a case where TinyXML-2 is intentionally choosing the generally more useful yet non-compliant behavior. If you want to submit a pull request for a new behavior (PEDANTIC_WHITESPACE maybe?) it would be a worthwhile integration if it doesn't add too much code complexity.
The next example shows that whitespace-only text is not preserved:
Gives output:
Leaving open in case someone wants to submit a patch for this. TinyXML-2 is working as intended; it would need a new whitespace mode to fix.
I agree that a new whitespace preservation option is needed, because legitimate HTML like this currently fails to parse as expected. This: is printed as: I am trying to patch it myself, but so far I haven't managed to, because to work properly such a PEDANTIC_WHITESPACE option requires knowledge of the surrounding nodes (whitespace should be interpreted as text only if it is inside the |
At minimum, should support |
@leethomason @jodyp12 @peterbiglr @petko zeux/pugixml#74 shows how https://github.com/zeux/pugixml has a mode that might be helpful to y'all, though it's not precisely |
I've looked at this and created a few supporting unit tests. Latest pull request: #938. IMHO this is a problem only for some rare legacy systems such as ours: essential to some, but only in rare use cases. So rather than relying on the current whitespace options, I've created a new one called PRESERVERRAW_WHITESPACE. At present, "whitespace" here means just the space character, which seems to be the only use case for legacy systems.
I didn't worry about |
In the following XML, the whitespace in the text, which happens to be a single space character, is stripped.
<tspan font-weight="bold"> </tspan>
This happens when XMLNode::ParseDeep() calls XMLDocument::Identify(), which in turn calls XMLUtil::SkipWhiteSpace().
If any non-whitespace character follows the whitespace, Identify() correctly creates a text node and backs up to the first character, keeping the space character along with the following character.
<tspan font-weight="bold"> a</tspan>
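To make the two cases above concrete, here is a rough, self-contained model of the described skip-then-identify logic. It is only a sketch of the behavior as reported in this issue, not the tinyxml2 source; the names `Identified` and `identifyAfterOpenTag` are hypothetical and exist only for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical, simplified model of the behavior described in this issue.
// It is NOT tinyxml2 code: it only mimics "skip whitespace, then decide".
struct Identified {
    bool isText;       // true when a text node would be created
    std::string text;  // text content up to the next '<' (empty otherwise)
};

// `content` is whatever follows an opening tag, e.g. " a</tspan>".
Identified identifyAfterOpenTag(const std::string& content) {
    std::size_t pos = 0;

    // Step 1: skip leading whitespace (the role attributed to SkipWhiteSpace()).
    while (pos < content.size() &&
           std::isspace(static_cast<unsigned char>(content[pos]))) {
        ++pos;
    }

    // Step 2: if only markup (or nothing) follows, no text node is created,
    // so the skipped whitespace is silently lost.
    if (pos == content.size() || content[pos] == '<') {
        return {false, ""};
    }

    // Step 3: otherwise "back up" to the first character, so the leading
    // whitespace is kept as part of the text.
    const std::size_t end = content.find('<');
    return {true, content.substr(0, end)};
}
```

In this model, the input `" </tspan>"` yields no text node at all, while `" a</tspan>"` yields the text `" a"` with its leading space intact, which mirrors the two `tspan` examples above.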
The result is that whitespace is not fully preserved in text, which doesn't match the documentation. This example isn't just an exercise; it's an actual ship-stopper when reading legacy files in a well-known application that has been migrated to tinyxml2.