Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception cloning elements #1142

Closed
jwarkentin opened this issue Jun 2, 2015 · 4 comments
Closed

Exception cloning elements #1142

jwarkentin opened this issue Jun 2, 2015 · 4 comments

Comments

@jwarkentin
Copy link

This issue only happens with certain HTML, but if you run the following snippet of code that tries to call cloneNode(true) it will throw a NamespaceError.

var jsdom = require('jsdom');

jsdom.env('http://www.ksl.com/?nid=148&sid=30067208', function(errors, window) {
  window.document.body.cloneNode(true);
});

I know it's some element in the DOM tree causing the exception, but I haven't narrowed it down to which exactly is causing it to throw. Anyway, I don't know that there's any condition where cloning a chunk of the DOM is supposed the throw an exception.

I don't know if it's related but at trying to clone with the following URL will throw a different exception. It throws InvalidCharacterError:

http://www.ksl.com/?sid=18035498

@domenic
Copy link
Member

domenic commented Jun 2, 2015

Nice find. This is almost certainly because we are using the normal constructors/etc. for creating elements or attributes, and those constructors throw such errors in the case of invalid names. However, browsers always use secret construction methods which allow invalid names while cloning. Should be easy to fix once we track down the offending line. PR welcome if you want to dig into it, or I can probably get around to it within a day or two.

@jwarkentin
Copy link
Author

Ok, I wrote up a script and tracked that first error down to an element in the page that looks like this:

<http://KSL.com>

It looks like an invalid element without a closing tag and there are two of them in the same sentence. It looks like the browser just ignores it completely.

For the second error (the InvalidCharacterError) it looks like it's caused by a weird snippet in the HTML at the end of one of the paragraphs of the article that looks like this:

<Home-Schooling</b>

Again, the browser just strips out that invalid HTML and continues on, leaving the rest of the text around it intact.

I don't know if it will be useful to you but here's the script I wrote to track those down:

var jsdom = require('jsdom');

// var url = 'http://www.ksl.com/?nid=148&sid=30067208';
var url = 'http://www.ksl.com/?sid=18035498';

jsdom.env(url, function(errors, window) {
  var errPath = [ window.document.body ];

  for(var i = 0; i < errPath.length; ++i) {
    var curEl = errPath[i],
        cNodes = curEl.childNodes;

    for(var j = 0; j < cNodes.length; ++j) {
      var cNode = cNodes[j];

      try {
        cNode.cloneNode(true);
      } catch(err) {
        errPath.push(cNode);
        break;
      }
    }
  }

  renderPath(errPath);
});

function renderPath(nodes) {
  for(var i = 0; i < nodes.length; ++i) {
    var node = nodes[i],
        spaces = Array(i + 1).join('  '),
        nodeAttrs = node.attributes,
        attrs = [];

    for(var j = 0; j < nodeAttrs.length; ++j) {
      attrs.push(nodeAttrs[j].name + '="' + nodeAttrs[j].value + '"');
    }

    console.log(`${spaces}<${node.tagName}${attrs.length ? ' ' + attrs.join(' ') : ''}>`);
  }
}

@domenic
Copy link
Member

domenic commented Jun 30, 2015

So, 5.5.0 is out with this fix! Sorry it took so long to get a release; normally we're pretty fast about this sort of thing.

@jwarkentin
Copy link
Author

It works great! Thanks for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants