Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser hangs for a specific stylesheet #127

Open
rviscomi opened this issue Aug 7, 2019 · 3 comments

Comments

@rviscomi
Copy link

commented Aug 7, 2019

馃憢 hello! I'm planning to use this library as part of a very large research project consisting of ~5M websites. I'm running into an issue for a particular stylesheet: http://crimetool.bocsar.nsw.gov.au/bocsar/css/smartadmin-production.min.css (which I've also archived here)

I've added a bunch of debugging statements to the script to try to figure out why it's hanging and it seems that it's getting tripped up when attempting to trim the string starting with nav ul li.active.open>. In the stylesheet there are what appear to be thousands of tab characters spread over just a few lines.

image

image

Here's the code where I believe the hangup is happening:

css/lib/parse/index.js

Lines 200 to 208 in 64910e8

return trim(m[0])
.replace(/\/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*\/+/g, '')
.replace(/"(?:\\"|[^"])*"|'(?:\\'|[^'])*'/g, function(m) {
return m.replace(/,/g, '\u200C');
})
.split(/\s*(?![^(]*\)),\s*/)
.map(function(s) {
return s.replace(/\u200C/g, ',');
});

Where m[0] is the text containing thousands of tabs starting with nav ul li.active.open>.

Specifically, the str.replace(/^\s+|\s+$/g, '') statement of the trim function:

css/lib/parse/index.js

Lines 572 to 574 in 64910e8

function trim(str) {
return str ? str.replace(/^\s+|\s+$/g, '') : '';
}

What could I do to patch this and make it more resilient to the insane things that happen across millions of websites?

@rviscomi

This comment has been minimized.

Copy link
Author

commented Aug 8, 2019

I was able to get the parser to complete by changing the trim function to use the built-in String.prototype.trim() method as well as capping the number of iterations in the addParent function.

The script is still taking ~5 minutes to run, which is timing out some of my tests but I'll continue to iterate on optimizations. Would be great to update the parser to use modern methods like trim and better handle deeply nested style rules.

@rviscomi

This comment has been minimized.

Copy link
Author

commented Aug 8, 2019

image

A trace of the script execution shows that this statement is taking the vast majority of the execution time: split(/\s*(?![^(]*\)),\s*/) (part of the selector function)

Are there more efficient ways to do this?

Edit: I think what's happening is that the String.prototype.trim() method actually can't handle the thousands of tabs, so now that work has fallen onto the split method where it's now hanging up.

@rviscomi

This comment has been minimized.

Copy link
Author

commented Aug 8, 2019

Ok talking through this by myself in this thread has been both therapeutic and productive 馃槃

The issue with the trim was not that it couldn't handle the exorbitant amount of whitespace, but that at the end of those thousands of tabs was actually a continuation of the selector. Something like this: nav ul li.active.open> [TAB*1million] :before. So there was actually no trailing whitespace to trim!

I fixed this (and sped up the execution time immensely) by collapsing all repeated whitespace into a single space character.

/**
 * Trim `str`.
 */

function trim(str) {
  // Collapse all whitespace, then trim.
  return str ? str.replace(/\s+/g, ' ').replace(/^\s+|\s+$/g, '') : '';
}

I don't think this affects the CSS parsing at all since whitespace really doesn't matter that much AFAIK. Could someone more familiar with this code spot check this to make sure it's valid? I'm happy to submit a PR if it's helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant
You can鈥檛 perform that action at this time.