Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

calls that iterate over nodes with HTMLCollection should not mutate the collection #280

Closed
sakoht opened this issue Mar 27, 2016 · 5 comments

Comments

@sakoht
Copy link

sakoht commented Mar 27, 2016

We (@chiua and I) noticed that only half of the scripts in our test page were being removed.

The issue is that getElementsByTagName returns an HTMLCollection, and when you iterate over it, and the loop mutates it, the iterator gets confused. In our case there were 38 scripts and only 19 (every other) were removed.

Prime example:
https://github.com/mozilla/readability/blob/master/Readability.js#L1036-L1047

_removeScripts: function(doc) {
    this._forEachNode(doc.getElementsByTagName('script'), function(scriptNode) {
      scriptNode.nodeValue = "";
      scriptNode.removeAttribute('src');

      if (scriptNode.parentNode)
        scriptNode.parentNode.removeChild(scriptNode);
    });

We saw this on Chrome but tested on Firefox and saw the same thing.

It seems like lots of code iterates over elements this way, and it might cause problems in other places.

We fixed it by updating _forEachNode to copy the HTMLCollection into a regular array. Glad to put up a PR for this, but we figured someone might have another solution in-progress.

@gijsk
Copy link
Contributor

gijsk commented Mar 28, 2016

Not in progress, but wouldn't it be easier to make the iteration work back-to-front for cases like this? I'd be somewhat worried about the performance of making an Array copy of (for some uses of _forEachNode, quite large!) node lists.

@sakoht
Copy link
Author

sakoht commented Mar 28, 2016

Yep. That would work in all scenarios I've seen.

@sakoht
Copy link
Author

sakoht commented Mar 28, 2016

If any loop modifies the DOM in more extensive ways there could still be side-effect problems (i.e. insertion of new nodes of the type you iterate over), but I didn't notice any cases like that.

You guys know the whole codebase. Ideally you only do my shallow copy in those cases, if they exist.

@gijsk
Copy link
Contributor

gijsk commented Mar 29, 2016

Well, it basically boils down to auditing the callsites. I'm technically still on holiday today, so I'll have a proper look later this week, but it looks like at least this:

readability/Readability.js

Lines 238 to 251 in c3c91a7

var links = articleContent.getElementsByTagName("a");
this._forEachNode(links, function(link) {
var href = link.getAttribute("href");
if (href) {
// Replace links with javascript: URIs with text content, since
// they won't work after scripts have been removed from the page.
if (href.indexOf("javascript:") === 0) {
var text = this._doc.createTextNode(link.textContent);
link.parentNode.replaceChild(text, link);
} else {
link.setAttribute("href", toAbsoluteURI(href));
}
}
});

Also modifies the set of things that would be in the NodeList (in the JS-URI <a> case).

The annoying thing is that Firefox for desktop and Android both use the JSDOMParser DOM backend for perf reasons, and the tests use both that and the npm jsdom module, both of which have array-based NodeList implementations, so this was never caught before - and also, we can't rely on the dynamic behaviour in the removal case. That is, with live NodeLists, if you know you're going to just be removing the items, I wouldn't be surprised that the most performant would be something like:

var list = doc.getElementsByTagName("script");
while (list[0]) {
  list[0].remove();
}

which would work on a modern Gecko/Blink/Webkit DOM implementation, but would infinite-loop on the array-based ones. :-\

gijsk added a commit that referenced this issue Mar 31, 2016
@gijsk
Copy link
Contributor

gijsk commented Jul 18, 2016

Fixed by #300.

@gijsk gijsk closed this as completed Jul 18, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants