Implement DOMParser and XMLSerializer classes #1368

lehni · 2016-01-27T08:24:09Z

All modern browsers expose these two useful classes, both with a very simple API:

https://developer.mozilla.org/en/docs/Web/API/DOMParser
https://developer.mozilla.org/en/docs/XMLSerializer

Without knowing the details, I believe it shouldn't be too hard to implement these as a thin shim over what jsdom already has available.

I am currently using quickly hacked-together shims in paper.js that look like the code below.

I am happy to work on a better version that can find its way into jsdom, if you think this would be useful and could give me some pointers as to where to start looking.

function XMLSerializer() {
}

XMLSerializer.prototype.serializeToString = function(node) {
    var text = jsdom.serializeDocument(node);
    // Fix a jsdom issue where all SVG tagNames are lowercased:
    // https://github.com/tmpvar/jsdom/issues/620
    var tagNames = ['linearGradient', 'radialGradient', 'clipPath', 'textPath'];
    for (var i = 0, l = tagNames.length; i < l; i++) {
        var tagName = tagNames[i];
        text = text.replace(
            new RegExp('(<|</)' + tagName.toLowerCase() + '\\b', 'g'),
            function(all, start) {
                return start + tagName;
            });
    }
    return text;
};

function DOMParser() {
}

DOMParser.prototype.parseFromString = function(string, contentType) {
    var div = document.createElement('div');
    div.innerHTML = string;
    return div.firstChild;
};
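The tag-name fix-up in the XMLSerializer shim above does not depend on jsdom at all, so it can be pulled out into a plain string function and tested on its own. A minimal sketch; `restoreSvgTagCase` and `DEFAULT_SVG_TAGS` are hypothetical names, and like the original it only restores the four camelCase SVG element names it is given.

```javascript
// Hypothetical helper: re-capitalizes SVG tag names that an HTML
// serializer lowercased. Only the names in `tagNames` are restored.
const DEFAULT_SVG_TAGS = ['linearGradient', 'radialGradient', 'clipPath', 'textPath'];

function restoreSvgTagCase(text, tagNames = DEFAULT_SVG_TAGS) {
  for (const tagName of tagNames) {
    // Match "<name" or "</name" followed by a word boundary, so that
    // e.g. "<clippathfoo" is left untouched.
    const pattern = new RegExp('(</?)' + tagName.toLowerCase() + '\\b', 'g');
    text = text.replace(pattern, (match, start) => start + tagName);
  }
  return text;
}

console.log(restoreSvgTagCase('<lineargradient id="g"></lineargradient>'));
// prints <linearGradient id="g"></linearGradient>
```

This is still a text-level workaround: it cannot know which other names were lowercased, which is why a real XML serializer is the proper fix.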

domenic · 2016-01-27T23:19:45Z

Sure, it would be great to have help implementing this!

Let's do this one at a time: one PR for DOMParser, one for XMLSerializer.

For DOMParser, the spec is at https://w3c.github.io/DOM-Parsing/#the-domparser-interface. You'd need to introduce new IDL (probably a new folder, domparsing, next to window/ and nodes/ and friends). Then write the impl class. You can peruse recent commits that introduced Location and History in the same fashion to see how that's done.

The actual implementation does indeed look not too complicated, though ideally we'd want to do things in a bit more depth than your version. I think the method will actually end up looking similar to jsdom.jsdom in lib/jsdom.js!

lehni · 2016-01-28T02:53:49Z

Thanks for the pointers! For XMLSerializer the specs are also there: https://w3c.github.io/DOM-Parsing/#the-xmlserializer-interface

For the DOMParser, I am not sure I understand your last comment. With jsdom.jsdom, did you mean exports.jsdom = ... inside jsdom.js?

domenic · 2016-01-28T03:26:51Z

Yep! The behavior is actually quite similar: take a string, and depending on another passed argument, create a document parsed as XML or a document parsed as HTML.
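The dispatch described here can be sketched as a small lookup from the `type` argument to a parsing mode. This assumes the five values of the `DOMParserSupportedType` enum from the DOM Parsing spec linked earlier; `parseModeFor` is a hypothetical name, and the sketch only chooses the mode, it does not parse anything.

```javascript
// Values of the DOMParserSupportedType enum in the DOM Parsing spec.
// "text/html" selects the HTML parser; the other four select XML.
const SUPPORTED_TYPES = new Map([
  ['text/html', 'html'],
  ['text/xml', 'xml'],
  ['application/xml', 'xml'],
  ['application/xhtml+xml', 'xml'],
  ['image/svg+xml', 'xml'],
]);

// Hypothetical helper: returns 'html' or 'xml' for a supported type.
// Web IDL rejects unknown enum values with a TypeError, mirrored here.
function parseModeFor(type) {
  const mode = SUPPORTED_TYPES.get(type);
  if (mode === undefined) {
    throw new TypeError(`Unsupported type for parseFromString: ${type}`);
  }
  return mode;
}

console.log(parseModeFor('text/html'));     // html
console.log(parseModeFor('image/svg+xml')); // xml
```

Enum matching is exact, so a value such as `'TEXT/HTML'` is rejected rather than normalized.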

lehni · 2016-01-28T04:10:43Z

Alright, sounds good! BTW, I just realized that this is closer to the spec:

DOMParser.prototype.parseFromString = function(string, contentType) {
    // Create a new document, since we're supposed to always return one.
    var doc = document.implementation.createHTMLDocument(''),
        body = doc.body,
        last;
    // Set the body's HTML, then change the DOM according to the spec.
    body.innerHTML = string;
    // Remove all top-level children (<html><head/><body/></html>)
    while (last = doc.lastChild)
        doc.removeChild(last);
    // Insert the first child of the body at the top.
    doc.appendChild(body.firstChild);
    return doc;
};

kahwee · 2016-02-03T07:58:44Z

This may be helpful: https://developer.mozilla.org/en-US/docs/Web/API/DOMParser

It's under the section "DOMParser HTML extension for other browsers".

Tell me how I can help on this. I'm trying to write some test cases that involve DOMParser.

domenic · 2016-02-04T13:49:34Z

@kahwee I tried to outline how to do a pull request for DOMParser in an earlier comment on #1368. You might want to coordinate with @lehni to make sure you two don't duplicate work.

kahwee · 2016-02-04T15:16:18Z

@domenic I haven't started work on this. I find the harder piece is XMLSerializer, which xmldom sort of covers: https://github.com/jindw/xmldom/blob/master/dom.js

It is better to divide the task into two pull requests.

@lehni How's progress on this?

Thank you both.

It's missing:
- <parsererror> elements when parsing an XML string fails
- copying the active document's URL

Also adds the empty-per-spec XMLDocument interface. Closes #1341. Part of #1368.

domenic · 2016-07-02T19:34:10Z

So DOMParser is now implemented and will be in the next release.

I've looked at a bunch of XML serializers on npm and nobody seems to have put together one that matches the spec. In the meantime https://github.com/cburgmer/xmlserializer seems closest, so we could use that I guess. I opened cburgmer/xmlserializer#8 to get more spec compliance.

lehni · 2016-07-06T09:21:06Z

@domenic many thanks for implementing this! And apologies for dropping the ball on this one. Been swamped with work here...

ianks · 2016-08-12T23:43:35Z

@domenic Does DOMParser in jsdom support parsing XML yet?

domenic · 2016-08-13T00:04:52Z

It supports it as well as jsdom does, which is pretty OKish

ianks · 2016-08-13T00:36:28Z

I attempted to parse a SOAP response with it and it seemed to choke a bit, returning undefined for a few of the properties on the Document object. I assume this probably has to do with namespaces?

domenic · 2016-08-13T00:59:32Z

Hmm, possibly. Could you give a small-ish testcase?

ianks · 2016-08-15T17:43:28Z

@domenic I'm not sure how to write a full test case here, but this example highlights the undefined keys issue that exists.

var data =
  `<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Header>
          <ns2:ResponseHeader xmlns:ns2="https://adwords.google.com/api/adwords/mcm/v201509" xmlns="https://adwords.google.com/api/adwords/cm/v201509">
              <requestId>1234554321</requestId>
              <serviceName>ManagedCustomerService</serviceName>
              <methodName>get</methodName>
              <operations>1</operations>
              <responseTime>115</responseTime>
          </ns2:ResponseHeader>
      </soap:Header>
      <soap:Body>
          <ns2:getResponse xmlns="https://adwords.google.com/api/adwords/cm/v201509" xmlns:ns2="https://adwords.google.com/api/adwords/mcm/v201509">
              <ns2:rval>
                  <totalNumEntries>2</totalNumEntries>
                  <Page.Type>ManagedCustomerPage</Page.Type>
                  <ns2:entries>
                      <ns2:name>Test1</ns2:name>
                      <ns2:customerId>1234566789</ns2:customerId>
                  </ns2:entries>
                  <ns2:entries>
                      <ns2:name>Test2</ns2:name>
                      <ns2:customerId>987654321</ns2:customerId>
                  </ns2:entries>
              </ns2:rval>
          </ns2:getResponse>
      </soap:Body>
  </soap:Envelope>`;

var document = (new window.DOMParser()).parseFromString(data, "text/xml")

domenic · 2016-08-15T17:45:05Z

And what document object properties are undefined when you do that?

ianks · 2016-08-15T18:09:04Z

The two types that have undefined keys are SymbolTreeNode and Document, which I believe correspond to the documentElement and... something else in Chrome.

It should be noted that the way I required the files looks like this:

  // is there a better way?
  global.DOMParser = window.DOMParser = require('jsdom/lib/jsdom/living/domparsing/DOMParser-impl').implementation;

domenic · 2016-08-15T18:10:46Z

Oh, yeah, that won't work. You need to use an actual jsdom, e.g.

const DOMParser = jsdom.jsdom().defaultView.DOMParser;

Anyway, I still don't understand what properties you're talking about, exactly. document.SymbolTreeNode and document.Document are undefined in real browsers too, not just jsdom.

ianks · 2016-08-15T18:16:33Z

Thank you for the info. Here is what I meant by SymbolTreeNode and Document types:

domenic · 2016-08-15T18:17:36Z

Weird. But I'd again really appreciate some code. Are you saying that document.undefined === SymbolTreeNode? Or what?

ianks · 2016-08-15T18:18:36Z

Yes exactly!

domenic · 2016-08-15T18:19:35Z

Great. So then it can't be true that document.undefined === Document, so I guess that part of your comment was just a red herring.

domenic · 2016-08-15T18:22:47Z

Here's a test case showing that this is indeed happening: https://tonicdev.com/57b2082567f249120021c588/57b2082567f249120021c589/branches/master I'll open a separate issue.

domenic · 2016-08-15T18:24:06Z

Oh, no, that's not what's happening: https://tonicdev.com/57b2082567f249120021c588/57b2082567f249120021c589/branches/master

The issue is that both document.undefined and window.SymbolTreeNode are undefined. So I don't think there is any bug here. Maybe there is a bug in whatever software you are using to generate those tree views.

ianks · 2016-08-15T18:35:55Z

Yes, I think you are right. Here is a list of all the properties I have when looping through document: https://gist.github.com/ianks/c935930573921025c28c460019ec85ab

I think my original diagnosis was just wrong 😢

Here is my super simple test case:

var data =
  `<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Header>
          <ns2:ResponseHeader xmlns:ns2="https://adwords.google.com/api/adwords/mcm/v201509" xmlns="https://adwords.google.com/api/adwords/cm/v201509">
              <requestId>1234554321</requestId>
              <serviceName>ManagedCustomerService</serviceName>
              <methodName>get</methodName>
              <operations>1</operations>
              <responseTime>115</responseTime>
          </ns2:ResponseHeader>
      </soap:Header>
      <soap:Body>
          <ns2:getResponse xmlns="https://adwords.google.com/api/adwords/cm/v201509" xmlns:ns2="https://adwords.google.com/api/adwords/mcm/v201509">
              <ns2:rval>
                  <totalNumEntries>2</totalNumEntries>
                  <Page.Type>ManagedCustomerPage</Page.Type>
                  <ns2:entries>
                      <ns2:name>Test1</ns2:name>
                      <ns2:customerId>1234566789</ns2:customerId>
                  </ns2:entries>
                  <ns2:entries>
                      <ns2:name>Test2</ns2:name>
                      <ns2:customerId>987654321</ns2:customerId>
                  </ns2:entries>
              </ns2:rval>
          </ns2:getResponse>
      </soap:Body>
  </soap:Envelope>`;

var doc = (new window.DOMParser()).parseFromString(data, "text/xml")
var entries = doc.querySelectorAll('rval > entries');

if (entries.length !== 2) {
  console.log('Invalid number of entries, should be 2, got ' + entries.length);
} else {
  console.log('Passed!');
}

domenic · 2016-08-15T19:12:57Z

That test case "fails" in Chrome and Firefox for me: http://jsbin.com/timizagego/edit?html,console

ianks · 2016-08-15T19:16:08Z

Yeah, Monday got the best of me. Should be 2 entries, not one. I updated the example:

http://jsbin.com/duvavigasi/edit?html,console

domenic · 2016-08-15T19:16:57Z

Ah great! I'll file a separate issue.

ianks · 2016-08-15T19:19:00Z

@domenic Thanks for holding my hand through that one. You are an excellent OSS contributor!

Joris-van-der-Wel · 2016-08-15T22:38:24Z

Quoting @ianks: "Here is what I meant by SymbolTreeNode and Document types"

I am guessing that your tool is showing symbol keys as "undefined":

const mySymbol = Symbol();
// try inspecting this object in your tool:
const obj = {};
obj[mySymbol] = 123; // your tool would probably display this as {undefined: 123}
// you can also play around with:
Object.getOwnPropertySymbols(obj);
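Building on that guess: a `Symbol()` created without a description has `description === undefined`, so a tool that labels symbol keys by their description would print exactly `undefined`. A minimal sketch of how such labels could arise; `labelKeys` is a hypothetical helper, not part of jsdom or of any particular inspector.

```javascript
// Hypothetical helper: lists an object's own keys the way a naive
// inspector might, labelling symbol keys by their `description`.
function labelKeys(obj) {
  const stringKeys = Object.keys(obj); // string keys only, no symbols
  const symbolLabels = Object.getOwnPropertySymbols(obj)
    .map((sym) => String(sym.description)); // undefined -> "undefined"
  return stringKeys.concat(symbolLabels);
}

const anonymous = Symbol();      // no description
const named = Symbol('impl');    // description "impl"
const obj = { a: 1, [anonymous]: 2, [named]: 3 };

console.log(Object.keys(obj)); // [ 'a' ]
console.log(labelKeys(obj));   // [ 'a', 'undefined', 'impl' ]
```

Note that `Object.keys` hides symbol keys entirely, so "undefined" in such a view points at a symbol-keyed property rather than a literal `undefined` property.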

DylanPiercey · 2016-10-22T00:42:28Z

@domenic - great work getting DOMParser included. I was using it though and realized that it doesn't handle syntax errors the same way. You can check out what it should be like here.

Is this possible to add?

domenic · 2016-10-22T01:10:34Z

@DylanPiercey please file a separate issue with a test case. We already attempt to produce parsererror elements when possible, but I think there are some edge cases where it's not working.
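Since a failed XML parse is reported through a `parsererror` element in the returned document rather than through an exception, callers check the document for one. A minimal sketch that relies only on the standard `getElementsByTagName` method; `hasParseError` and `stubDocument` are hypothetical names, and the stub stands in for a real `Document` so the snippet runs without a DOM.

```javascript
// Hypothetical helper: true if the parsed document contains a
// <parsererror> element. Works on any object exposing the standard
// Document.getElementsByTagName method.
function hasParseError(doc) {
  return doc.getElementsByTagName('parsererror').length > 0;
}

// Stub standing in for a Document, so the sketch runs without a DOM.
function stubDocument(tagNames) {
  return {
    getElementsByTagName(name) {
      return tagNames.filter((t) => t === name);
    },
  };
}

console.log(hasParseError(stubDocument(['root', 'child'])));       // false
console.log(hasParseError(stubDocument(['root', 'parsererror']))); // true
```

With a real window, the argument would be the document returned by `parseFromString(xml, 'application/xml')`. Browsers differ in where they put the `parsererror` element, which is why the check searches the whole document instead of only the root.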

DylanPiercey · 2016-10-22T01:22:52Z

@domenic I was mostly just messing around with stuff like

<div>
    <a></>
</div>

That didn't produce a parsererror.

I can open a new issue, but if it's already implemented, then that's good enough for me.
Do you think you could point me to an actual parse error that jsdom handles, so that I can use it for some unit testing?

jonnermut · 2016-11-03T05:16:37Z

@domenic, is there any way to avoid the lowercasing of tag names when using the workaround for XMLSerializer.serializeToString proposed by the original poster?
I started having a look at parse5 and got lost in the API docs...

domenic · 2016-11-03T15:00:57Z

I'm not aware of any, no. We need to use an actual XML serializer, not an HTML one like parse5, to get XML-correct casing.
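To illustrate the difference: an XML serializer writes element and attribute names exactly as stored, while an HTML serializer may lowercase them. A minimal sketch over a hypothetical plain-object tree (a string for a text node, `{ name, attrs, children }` for an element), not jsdom's node types; it ignores namespaces, comments and every other node kind.

```javascript
// Escape text and attribute values as XML requires.
function escapeXml(value, inAttribute) {
  let out = String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  if (inAttribute) out = out.replace(/"/g, '&quot;');
  return out;
}

// Hypothetical node shape: a string is a text node; an object is an
// element { name, attrs: { [name]: value }, children: [...] }.
function serializeNode(node) {
  if (typeof node === 'string') return escapeXml(node, false);
  const attrs = Object.entries(node.attrs || {})
    .map(([key, value]) => ` ${key}="${escapeXml(value, true)}"`)
    .join('');
  const children = node.children || [];
  // Names are emitted verbatim, so "linearGradient" keeps its case.
  if (children.length === 0) return `<${node.name}${attrs}/>`;
  return `<${node.name}${attrs}>${children.map(serializeNode).join('')}</${node.name}>`;
}

console.log(serializeNode({
  name: 'linearGradient',
  attrs: { id: 'g' },
  children: [{ name: 'stop', attrs: { offset: '0' } }],
}));
// prints <linearGradient id="g"><stop offset="0"/></linearGradient>
```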

jonnermut · 2016-11-05T06:40:00Z

I managed to get mostly what I want using jsdom's DOMParser and https://github.com/cburgmer/xmlserializer

One thorny issue was that xmlserializer lowercases tags whose namespaceURI is 'http://www.w3.org/1999/xhtml', which jsdom seems to set on all elements created via Document.createElement.

Here's some rough code illustrating this:

#!/usr/local/bin/node
var jsdom = require("jsdom");
var xmlserializer = require('xmlserializer');

function XMLSerializer() {
}

XMLSerializer.prototype.serializeToString = function(node) {
    return xmlserializer.serializeToString(node);
};

jsdom.env({
    html: `<html><body></body></html>`,
    virtualConsole: jsdom.createVirtualConsole().sendTo(console),
    done: function (err, window) {
        var xml = '<myTag myAttr="hello"></myTag>';
        var parser = new window.DOMParser();
        var doc = parser.parseFromString(xml, "application/xml");

        // Swizzle doc.createElement, otherwise <otherElement> comes out as <otherelement>
        doc.createElement = function(localName) { return doc.createElementNS('', localName); };

        var el2 = doc.createElement('otherElement');
        doc.firstChild.appendChild(el2);

        var xmlSerializer = new XMLSerializer();
        console.log(xmlSerializer.serializeToString(doc));
    }
});

Outputs:

<myTag xmlns="null" myAttr="hello"><otherElement/></myTag>

This is mostly what I want (case preserved), except for the xmlns="null", which I think is a recent bug in xmlserializer.
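The rule the output should follow is small: an element in no namespace gets no `xmlns` attribute at all unless it would otherwise inherit a non-null default namespace, in which case it needs `xmlns=""`; the literal string "null" should never appear. A simplified sketch of just that rule for unprefixed elements, loosely following the DOM Parsing spec's XML serialization algorithm (which also handles prefixes and existing namespace-declaration attributes); `defaultNamespaceDecl` and `escapeAttr` are hypothetical names.

```javascript
// Escape the characters the spec escapes in attribute values.
function escapeAttr(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: the default-namespace declaration (possibly
// empty) for an unprefixed element, given its namespace and the default
// namespace inherited from its parent. `null` means "no namespace".
function defaultNamespaceDecl(elementNs, inheritedNs) {
  // Same default namespace as the parent: nothing to declare.
  if (elementNs === inheritedNs) return '';
  // A null namespace is written as the empty string, never as "null".
  return ` xmlns="${elementNs === null ? '' : escapeAttr(elementNs)}"`;
}

// The <myTag> case above: no namespace and no inherited default.
console.log(`<myTag${defaultNamespaceDecl(null, null)} myAttr="hello">`);
// prints <myTag myAttr="hello">
// A no-namespace child of an XHTML element must reset the default.
console.log(`<child${defaultNamespaceDecl(null, 'http://www.w3.org/1999/xhtml')}>`);
// prints <child xmlns="">
```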


tommedema · 2018-03-15T09:54:31Z

I tried to polyfill XMLSerializer as suggested:

function XMLSerializer() {
}

XMLSerializer.prototype.serializeToString = function(node) {
    var text = jsdom.serializeDocument(node);
    return text;
};

Unfortunately, serializeDocument is no longer part of the API. serialize() only works on the whole document, and it is not possible to create a new document from an HTML node (only from a string). There is no way to turn a node into a string without an XML serializer, especially if you are trying to serialize a doctype.

My solution was to use xmldom:

import { XMLSerializer } from 'xmldom';
global.XMLSerializer = XMLSerializer;
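The doctype part mentioned above is a small piece on its own: a `DocumentType` node only carries a name, a public ID and a system ID. A sketch of the corresponding steps from the DOM Parsing spec's XML serialization algorithm, taking a plain object with those three fields; the spec's well-formedness checks are omitted, and `serializeDoctype` is a hypothetical name.

```javascript
// Sketch of the DocumentType steps of the DOM Parsing spec's XML
// serialization algorithm, without the well-formedness checks.
// Takes any object with name, publicId and systemId strings.
function serializeDoctype({ name, publicId = '', systemId = '' }) {
  let markup = '<!DOCTYPE ' + name;
  if (publicId !== '') markup += ' PUBLIC "' + publicId + '"';
  // SYSTEM is only written when there is a system ID but no public ID.
  if (systemId !== '' && publicId === '') markup += ' SYSTEM';
  if (systemId !== '') markup += ' "' + systemId + '"';
  return markup + '>';
}

console.log(serializeDoctype({ name: 'html' }));
// prints <!DOCTYPE html>
```

A serializer for a whole document would emit this for the doctype child before serializing the document element.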

lehni · 2018-03-16T17:47:55Z

My current version looks like this:

function XMLSerializer() {
}

XMLSerializer.prototype.serializeToString = function(node) {
  if (!node) {
    return '';
  }
  var text = node.outerHTML;
  // Fix a jsdom issue where all SVG tagNames are lowercased:
  // https://github.com/tmpvar/jsdom/issues/620
  var tagNames = ['linearGradient', 'radialGradient', 'clipPath', 'textPath'];
  for (var i = 0, l = tagNames.length; i < l; i++) {
    var tagName = tagNames[i];
    text = text.replace(
      new RegExp('(<|</)' + tagName.toLowerCase() + '\\b', 'g'),
      function(match, start) {
        return start + tagName;
      }
    );
  }
  return text;
};

tommedema · 2018-03-16T18:18:06Z

@lehni, that doesn't return doctypes if you pass in document.documentElement as the node, even though browsers do.

lehni · 2018-03-17T11:12:52Z

I'm not saying it's perfect, but it works for my use case.

teclone · 2018-07-12T08:21:12Z

@domenic, @tommedema, @ianks, @lehni: I created a complete JavaScript xml-serializer that is available as an npm package. It follows the W3C spec and even adds some improvements. Its serialization is neat and accurate, with 99% test coverage.

domenic · 2018-07-12T15:03:14Z

Oh dear, I wish we'd known about that a month ago! @Sebmaster created our own package, w3c-xmlserializer, and #2282 has the code to integrate it all ready to go... I just need to find some time to work on jsdom -_-.

teclone · 2018-07-12T17:32:03Z

Alright @domenic, sorry it came late.

Sebmaster · 2018-10-29T20:13:31Z

I think this is it! Closing due to the release of v13.

- lehni mentioned this issue Apr 4, 2016: Fix #1365: Support image loading through node-canvas module. #1366 (Closed)
- domenic mentioned this issue Jun 9, 2016: support DOMParser #1341 (Closed)
- kentcdodds mentioned this issue Jun 9, 2016: Implement of DOMParser #1519 (Closed)
- domenic added the feature label Jul 2, 2016
- etpinard mentioned this issue Jul 20, 2016: Tried downloading plotly.js on node js plotly/plotly.js#757 (Closed)
- domenic mentioned this issue Aug 15, 2016: querySelectorAll does not appear to be namespace aware #1587 (Open)
- etpinard mentioned this issue Sep 2, 2016: Plotly.js requires gl in Electron contexts plotly/plotly.js#891 (Closed)
- mstange mentioned this issue Mar 17, 2017: Should be able to be invoked from the command line and output to a file mstange/svgspritesheet#7 (Open)
- bradoyler mentioned this issue Mar 19, 2017: .svgString() automatically lowercases tags? d3-node/d3-node#13 (Closed)
- Sebmaster closed this as completed Oct 29, 2018
- patoncrispy mentioned this issue Dec 19, 2018: XMLSerializer is not a constructor/is undefined jestjs/jest#7537 (Closed)
- curvedriver mentioned this issue Apr 27, 2021: node - Escape unsupported characters like '<' or '>' paperjs/paper.js#1922 (Open)
