Skip remote urls, inject scripts that contain html properly (fixed whitepsace issues) #3

merged 6 commits into from Mar 8, 2014

2 participants


Sorry about that, my editor likes to do that automatically sometimes.


Ah, I see. But the tests are failing?


Apologies again, fixed now :)


One more thing - could you add a test for the new functionality?


k, tests added. I also changed the way the html fix works a bit so that it doesn't translate tags in comments to escape sequences, and added it to the stylesheet processing as well. The way i'm getting cheerio to not process the text is a bit of a hack, but i'm not sure how else to go about it.

@ashaffer ashaffer add tests for non-local url skipping and html in scripts/styles bugs;…
… fix issues with prior implementation of said fixes
@gabrielflorit gabrielflorit merged commit 6281402 into gabrielflorit:master Mar 8, 2014

1 check passed

Details default The Travis CI build passed

Thanks for the pull request - much appreciated. Also, I tried going back to the previous implementation:

 $(element).replaceWith('<style>' + fs.readFileSync(path.join(file.base, href), 'utf8') + '</style>');

instead of:

var el = $('<style></style>').text('tmp');
el[0].children[0].data = fs.readFileSync(path.join(file.base, href), 'utf8');

and found no issues, so I went with the former. But maybe I'm missing something?


Ugh, I swear i'm not ordinarily this incompetent. I didn't fully understand the scope of the issue I was trying to address until now. The correct failing test case for javascript looks like this:

// </script><select>
var str = '';

Similar code appears in comments within angular.js, which is what the real original problem was for me. That being said, none of the things I proposed actually resolve this issue properly. Your original method will cause cheerio to try to fix up the tags, whereas mine will leave the text umodified, but of course break the actual html.

The problem is that the html spec doesn't recognize javascript comment syntax. I'm not sure what the right resolution here is.

It seems like the options are:

  • Ignore it, and say its an issue for some pre-processing step to handle
  • Strip comments (although this leaves open the possibility of </script> appearing in a quoted string, such as in an async script loader)
  • Attempt to transform embedded </script>'s into some equivalent non-ambiguous string. E.g. ('</scri' + 'pt>')
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment