clean-css 4.0 #842

Closed
jakubpawlowicz opened this Issue Dec 16, 2016 · 26 comments

Projects

None yet

5 participants

@jakubpawlowicz
Owner
jakubpawlowicz commented Dec 16, 2016 edited

Hey folks,

So several new commits, I've been working on for a while, just landed in master.
So far there's a much simpler tokenizer & import inliner, while the optimizations are basically the same as in 3.4.

However the latter is going to change as well. Here are some ideas:

  • regroupsimple and advanced optimizations into optimization levels, similar to gcc, i.e. -O0 (no optimizations just import inlining & rebasing), -O1 (default - same as basic previously), -O2 (same as advanced previously but with disabled restructuring) - EDIT: see #850 - DONE
  • ability to turn on/off any optimization in both API and CLI - EDIT: see #850 - DONE
  • whitelist -O1 optimizations based on property name (how to handle shorthands?) instead of trying to blindly apply all optimizations to every and each value - EDIT: it'd nice but have but let's leave it for 4.1.

And some other ideas for 4.0 not covered in tickets yet:

  • change CleanCSS#minify method to minify(styles, sourceMap, callback) or minify(styles, callback) so sourceMap can be passed directly instead of via constructor as it does not belong there - DONE in #857
  • merge processImport and processImportFrom into one imports flag which defaults to local - which means remote imports being disabled by default - DONE as new inline option
  • code in the current master gets rid of root, relativeTo, target options in favor of just a single option rebaseTo - this means all URLs has to be relative to the current directory, or, in case of imported files, relative to the imported file. All absolute URLs are left as they are. See da45a94. I wonder if that's enough. There was once an idea to allow specifying target value as a function, so maybe allowing rebaseTo to be a function is a way to cover many use cases not covered above?

Feedback is more than welcome!
Jakub

@scniro scniro referenced this issue in scniro/gulp-clean-css Dec 16, 2016
Closed

rebase to absolute relative to root fails since 2.1.2 #22

@ngyikp
ngyikp commented Dec 16, 2016

Not sure if you want individual bug reports even for pre-releases, but here is a couple of bugged selectors:

Input:

.a[title="a b c's d e f"] {background:#000;}
.b[data-json='"aaaa":"bbbb"'] {background:#111;}

Terminal:

Ngs-Applejack:test ngyikp$ npm install git://github.com/jakubpawlowicz/clean-css
/Users/ngyikp/Desktop/test
└─┬ clean-css@4.0.0-pre  (git://github.com/jakubpawlowicz/clean-css.git#a18f86c69fd9e5cc08bf52c501f040554f8017c3)
  ├─┬ commander@2.9.0 
  │ └── graceful-readlink@1.0.1 
  └── source-map@0.5.6 

Output:

.a[title="a b c'sdef"]{background:#000}
.b[data-json='aaaa:bbbb']{background:#111}
@ben-eb
ben-eb commented Dec 16, 2016 edited

It might be worth switching to a Promise based interface instead of using callbacks. The upcoming async/await makes the async usage much nicer as a result:

async function minify (styles) {
  const cleancss = new CleanCSS();
  const result = await cleancss.minify(styles);
  console.log(result);
}
@jakubpawlowicz
Owner

@ngyikp thanks, much appreciated! I'll open a new issue unless you want to do it.
@ben-eb I think we should do it 👍

@madwizard-thomas

code in the current master gets rid of root, relativeTo, target options in favor of just a single option rebaseTo - this means all URLs has to be relative to the current directory, or, in case of imported files, relative to the imported file.
Does the code then still rewrite relative (image) URLs when supplying filenames to the CLI or using a file hash in the code? Otherwise including files from different directories with relative URLs to images or fonts will be impossible.

I think the existing options are really confusing and what complicates things is that clean-css rebases in two different stages (in the code linked above and in the main code).
But you probably will lose some flexibility when just using a rebaseTo. It might be usefull to rewrite to absolute URLs if the CSS will later be inlined in a html page of which the exact location is not known (virtual URLs).

@alexlamsl
Contributor
alexlamsl commented Dec 19, 2016 edited

@jakubpawlowicz just tried your master branch with html-minifier - apart from some odd test failures due to parsing difference on invalid inputs, the web version of our tests failed due to lack of process.hrtime()

@alexlamsl
Contributor

I can confirm the web version will work if this line and this line are removed.

@jakubpawlowicz
Owner

@madwizard-thomas they were indeed confusing and, unfortunately, a big technical debt since old times. I wonder if a static & dynamic (callback) rebaseTo would be enough. It may not be from CLI point of view where there will be no dynamic interface.

@alexlamsl thanks for checking! I'll make sure there's a web-friendly version of those.

@madwizard-thomas

In your opinion is clean-css meant to be used to concatenate multiple source files (besides minifying them) or is this more of an afterthought? Because even the readme.md gives an example using 'cat' to concatenate the files rather than passing them to clean-css.
Separate concatenation does have its issues though. Concatenating before using clean-css usually breaks all relative urls because no rebasing is done (with gulp we could use gulp-concat-css which rebases but it doesn't support sourcemaps). Concatenation after cleancss resolves that but prevents clean-css from doing cross source file optimizations like selector merging.

@jakubpawlowicz jakubpawlowicz added a commit that referenced this issue Dec 27, 2016
@jakubpawlowicz Fixes #785 - adds `@font-face` de-duplication.
Why:

* Extra `@font-face` rules can be safely removed.
* There's no API/CLI switch to turn it off as it's pending
  a refactoring in v4 - see #842.
ccb8af4
@alexlamsl
Contributor

Ran html-minifier tests on https://github.com/jakubpawlowicz/clean-css/tree/e2fe4d0e80a29fb0b9082f36aa2b52c1447887e2 - generally looks good.

Except for the curious case of ]]>, which clean-css seems to recognise it as a selector of some sort:

p.h{background:red}<![CDATA[\np.i{background:red}\n]]>p.j{background:red}

got minified to:

]]>p.j,p.h{background:red}
@alexlamsl
Contributor

Similar issues with -->:

p.h{background:red}<!--\np.i{background:red}\n-->p.j{background:red}

Output:

-->p.j,p.h{background:red}
@jakubpawlowicz
Owner

Thanks @alexlamsl - it's an interesting case with <!-- ... --> as major browsers seem to ignore the comments, see: https://jsfiddle.net/vv6h5hfz/ - should we strip those HTML comments? How they are used in html-minifier?

@alexlamsl
Contributor

@jakubpawlowicz your jsfiddle example gave me a blue background on IE11, so does not seem to ignore the rules within <!-- ... -->

I've tested with the following:

  • Red
<style>body{background:red}</style>
  • Blue
<style>body{background:red}<!--body{background:blue}--></style>
  • Green
<style>body{background:red}<!--body{background:blue}-->body{background:green}</style>

But that aside, what I was getting at was the --> or ]]> seem to be treated as tokens of some sort by clean-css, which I'm not sure if the behaviour is as intended.

@jakubpawlowicz
Owner

It definitely does not behave properly in the current master. If we get rid of <!-- and --> (as browsers do) then some optimizations can be applied, e.g. body{background:red}<!--body{background:blue}-->body{background:green} -> body{background:green}.

@alexlamsl
Contributor

@jakubpawlowicz thanks for the clarification 👍

@alexlamsl
Contributor

I've got a quick patch which grabs only the trailing character sequence that constitutes valid selector:

diff --git a/lib/optimizer/tidy-rules.js b/lib/optimizer/tidy-rules.js
index a57fec7..476f7ea 100644
--- a/lib/optimizer/tidy-rules.js
+++ b/lib/optimizer/tidy-rules.js
@@ -2,6 +2,7 @@ var Marker = require('../tokenizer/marker');
 var formatPosition = require('../utils/format-position');

 var RELATION_PATTERN = /[>\+~]/;
+var INVALID_FIRST_CHARACTER_PATTERN = /[|>+~-]/;
 var WHITESPACE_PATTERN = /\s/;

 var LESS_THAN = '<';
@@ -10,7 +11,7 @@ var STAR_FIRST_CHILD_PLUS_HTML_HACK = '*:first-child+html ';

 function hasInvalidCharacters(value) {
   var isEscaped;
-  var isInvalid = false;
+  var isInvalid = 0;
   var character;
   var isQuote = false;
   var i, l;
@@ -23,11 +24,9 @@ function hasInvalidCharacters(value) {
     } else if (character == Marker.SINGLE_QUOTE || character == Marker.DOUBLE_QUOTE) {
       isQuote = !isQuote;
     } else if (!isQuote && (character == Marker.CLOSE_BRACE || character == Marker.EXCLAMATION || character == LESS_THAN || character == Marker.SEMICOLON)) {
-      isInvalid = true;
-      break;
-    } else if (!isQuote && i === 0 && RELATION_PATTERN.test(character)) {                                              -      isInvalid = true;
-      break;
+      isInvalid = i + 1;
+    } else if (!isQuote && i === isInvalid && INVALID_FIRST_CHARACTER_PATTERN.test(character)) {
+      isInvalid = i + 1;
     }

     isEscaped = character == Marker.BACK_SLASH;
@@ -146,9 +145,13 @@ function tidyRules(rules, removeUnsupported, adjacentSpace, beautify, warnings)
     var rule = rules[i];
     var reduced = rule[1];

-    if (hasInvalidCharacters(reduced)) {
+    var validFrom = hasInvalidCharacters(reduced);
+    if (validFrom === reduced.length) {
       warnings.push('Invalid selector \'' + rule[1] + '\' at ' + formatPosition(rule[2][0]) + '. Ignoring.');
       continue;
+    } else if (validFrom > 0) {
+      warnings.push('Invalid characters \'' + reduced.slice(0, validFrom) + '\' at ' + formatPosition(rule[2][0]) + '. Ignoring.');
+      reduced = reduced.slice(validFrom);
     }

     reduced = removeWhitespace(reduced, beautify);
@alexlamsl
Contributor
alexlamsl commented Jan 7, 2017 edited

Upon further testing, looks like the only character sequence the browser is ignoring are <!-- or --> (even in successive repetition) but not the others.

So here's a narrower patch for the issue:

diff --git a/lib/optimizer/tidy-rules.js b/lib/optimizer/tidy-rules.js
index a57fec7..b2db50f 100644
--- a/lib/optimizer/tidy-rules.js
+++ b/lib/optimizer/tidy-rules.js
@@ -144,7 +144,10 @@ function tidyRules(rules, removeUnsupported, adjacentSpace, beautify, warnings)

   for (var i = 0, l = rules.length; i < l; i++) {
     var rule = rules[i];
-    var reduced = rule[1];
+    var reduced = rule[1].replace(/^(?:(?:<!--|-->)\s*)+/, function(match) {
+      warnings.push('HTML comment \'' + match + '\' at ' + formatPosition(rule[2][0]) + '. Removing.');
+      return '';
+    });

     if (hasInvalidCharacters(reduced)) {
       warnings.push('Invalid selector \'' + rule[1] + '\' at ' + formatPosition(rule[2][0]) + '. Ignoring.');
@alexlamsl
Contributor

With the patch above I managed to get satisfactory test results from html-minifier and no new test failures on clean-css (850's branch).

Two existing workarounds got removed as well - options.advanced = false and special handling of <!-- css -->. 😀

Updated https://github.com/kangax/html-minifier/tree/clean-css-4 to reflect the current state of things.

@jakubpawlowicz
Owner

Thanks again @alexlamsl - I'll add some tests and commit it to master - credits go to you.

@jakubpawlowicz jakubpawlowicz added a commit that referenced this issue Jan 8, 2017
@jakubpawlowicz See #842 - removes HTML comments from selectors.
Kudos to @alexlamsl for the fix.

Why:

* Apparently anything wrapped between HTML comments still gets
  processed by browsers.
dedb3bc
@jakubpawlowicz
Owner

@alexlamsl I'd like to attribute the commit to you (as author) - if you'd like to have it please drop me your email at ${ see my email in package.json }.

@jakubpawlowicz jakubpawlowicz added a commit that referenced this issue Jan 8, 2017
@alexlamsl @jakubpawlowicz alexlamsl + See #842 - removes HTML comments from selectors.
Kudos to @alexlamsl for the fix.

Why:

* Apparently anything wrapped between HTML comments still gets
  processed by browsers.
e59a5aa
@jakubpawlowicz
Owner

Here's one more idea for 4.0 - what if you could pass a callback as an option to transform any property? This would solve:

  • rewriting / resolving URLs as one could update them by hand if nothing else works
  • dropping properties if needed
  • rewriting some values if needed

For example:

new CleanCSS({
  transformProperty: function (name, value) {
    if (name == 'background-image' && value.indexOf('invalid/path') > -1) {
      return value.replace('invalid/path', 'valid/path');
    }
  }
}).minify(...)

or

new CleanCSS({
  transformProperty: function (name, value) {
    if (name.indexOf('-o-') === 0) {
      return false; // drop a property
    }
  }
}).minify(...)
@alexlamsl
Contributor

@jakubpawlowicz FYI, this proposed feature should allow html-minifier to optimise URLs within style sheets which is done via a preprocessing hack atm

@jakubpawlowicz
Owner

Cool, I think it'd be quite useful for any custom stuff.

@jakubpawlowicz
Owner

Closing this ticket as 4.0 is feature ready. Thanks @ngyikp @ben-eb @madwizard-thomas @alexlamsl for your valuable input! 💯

@alexlamsl
Contributor

@jakubpawlowicz thank you for your hard work and looking forward to 4.0 👍

@jakubpawlowicz
Owner

Thanks @alexlamsl - looking forward to seeing it in html-minifier!

@jakubpawlowicz
Owner

4.0 is out. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment