xml content lost #21

Open
missing1984 opened this issue Apr 23, 2014 · 2 comments
Comments

@missing1984

XML:

Original XML (actually an HTML snippet):

<html>This Form has
    <b>"searchTemplate"</b>
    under root element.<br/>All Panels wil get data without clicking
    <b>"Search"</b>
    button,
    <b>"autoRun"</b>
    is on
    <br/>Click
    <b>"Search"</b>
    button is reqiured only for
    <b>"time2"</b>
    <p/>
</html>

After processing with elementtree:

var et = require('elementtree');
var result = et.parse(xml); // xml holds the snippet above as a string
console.log(result.write());

Output:

<html>This Form has
    <b>"searchTemplate"</b>
    <br/>
    <b>"Search"</b>
    <b>"autoRun"</b>
    <br/>
    <b>"Search"</b>
    <b>"time2"</b>
    <p/>
</html>

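Some of the content was lost: the text that follows each child element (for example, "under root element." after the first <b>) is missing from the output.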

@Kami
Contributor

Kami commented Apr 23, 2014

Hm, this looks like a bug in the XML parsing library we are using (saxjs). I tried with the latest version of saxjs and the same problem exists. I will dig deeper later on.

In any case, this parser is only meant to be used with valid XML. If you want to parse HTML, which is usually not well-formed or valid, you should look at other libraries that are specifically designed for parsing HTML.

@missing1984
Author

Hi, saxjs supports this scenario, and I found that the element tail is actually parsed correctly by your TreeBuilder.

The problem appears to be in the _serialize_xml function: it only handles the element text, not the element tail. After I added the tail, the serialized output looks correct.

function _serialize_xml(write, elem, encoding, qnames, namespaces, indent, indent_string) {
  var tag = elem.tag;
  var text = elem.text;
  var tail = elem.tail; // take the element tail into account
  var items;
  var i;

Then append the tail text right after each element's tag is written:

  if (tail) {
    write(tail);
  }
  if (newlines) {
    write("\n");
  }
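
To make the text/tail distinction concrete, here is a minimal self-contained sketch. It does not use elementtree itself; `makeElement` and `serialize` are hypothetical helpers written only for illustration, and the serializer skips escaping and attributes. It assumes the ElementTree convention that `text` is the content right after an element's opening tag and `tail` is the content right after its closing tag.

```javascript
// Hypothetical stand-in for an ElementTree-style node (not the library's API).
function makeElement(tag, text, tail, children) {
  return { tag: tag, text: text || "", tail: tail || "", children: children || [] };
}

// Serialize a node: opening tag, text, children, closing tag, then optionally the tail.
// No escaping or attribute handling; this is only a sketch.
function serialize(elem, includeTail) {
  var out = "";
  if (elem.text || elem.children.length) {
    out += "<" + elem.tag + ">" + elem.text;
    elem.children.forEach(function (child) {
      out += serialize(child, includeTail);
    });
    out += "</" + elem.tag + ">";
  } else {
    out += "<" + elem.tag + "/>";
  }
  // The tail is the text that follows this element inside its parent.
  if (includeTail && elem.tail) {
    out += elem.tail;
  }
  return out;
}

var root = makeElement("html", "This Form has ", "", [
  makeElement("b", '"searchTemplate"', " under root element."),
  makeElement("br", "", "All Panels get data.")
]);

var withoutTail = serialize(root, false);
var withTail = serialize(root, true);
console.log(withoutTail); // tails dropped, as in the reported output
console.log(withTail);    // tails preserved
```

With `includeTail` set to false, the text after `</b>` and `<br/>` disappears, which matches the symptom reported above; with it set to true, that text is written back in place.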

missing1984 pushed a commit to missing1984/node-elementtree that referenced this issue May 15, 2014
missing1984 pushed a commit to missing1984/node-elementtree that referenced this issue May 15, 2014
This reverts commit 9f5a0be.