Design Doc: CSS Parsing

CSS Parsing

Joshua Marantz, November 2010

Many rewrite optimizations manipulate CSS content either inlined in the HTML document or linked as an external resource. Some simple optimizations appear to be possible without parsing the CSS content. For example, we currently have filters to outline and combine CSS which work (almost) by simply copying or concatenating the content blindly. However, as we try to do more advanced optimizations, more intelligent analysis of CSS will be necessary.

In fact, even for simple optimizations like CSS combination, we have run into problems that require more intelligent parsing. CSS files may contain URLs for various reasons (notably setting background images and @importing other CSS documents). These URLs may be relative, in which case they are resolved against the location of the CSS document (http://www.w3.org/TR/CSS21/syndata.html#uri). Therefore, if we combine CSS files into a different directory, we must also update the URLs so that they remain accurate.

Currently we update URLs via basic search and replace. This is reasonably easy because URLs are written in a fairly unambiguous way (e.g. url(http://www.google.com)). However, as with any method that ignores syntactic context, this is susceptible to errors. For example, I believe our current implementation would choke on:

#a { color: green }  /* url(...  */
#b { color: blue }    /*      ...) */

because it does not know about comments.
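
To illustrate the point, here is a minimal Python sketch of a comment-aware URL rewrite. It is not the project's implementation. The function name absolutize_urls and the example.com base URL are hypothetical, and the sketch sidesteps the directory problem by making every url(...) absolute instead of re-relativizing it. It matches comments and quoted strings first, so a url( that appears inside them is left alone.

```python
import re
from urllib.parse import urljoin

# Comments and quoted strings are alternatives in the same pattern, so the
# scanner consumes them whole and never sees a url(...) hidden inside them.
_TOKEN = re.compile(
    r"""
    (?P<comment>/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | url\(\s*(?P<quote>["']?)(?P<url>[^"')]*?)(?P=quote)\s*\)
    """,
    re.VERBOSE | re.DOTALL | re.IGNORECASE,
)


def absolutize_urls(css, css_location):
    """Resolve every url(...) in css against css_location (hypothetical helper)."""

    def fix(match):
        url = match.group("url")
        if not url:
            # A comment, a quoted string, or an empty url(): copy unchanged.
            return match.group(0)
        return 'url("' + urljoin(css_location, url) + '")'

    return _TOKEN.sub(fix, css)
```

On the two-line example above, both comments come back untouched, while a real reference such as url(img/bg.png) is resolved against the stylesheet's location. Quoted @import "file.css" targets are deliberately skipped here; a full parser would handle them too.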

In addition to rewriting URLs, there are many other optimizations that we would like to run that will require more awareness of the syntactic structure and meaning of CSS. We therefore intend to fully parse CSS files.

Some simple rewrites will only require syntactic parsing of the CSS. Here are some examples:

Minification - Removing whitespace and comments. Much of the whitespace and all comments in CSS are unnecessary. (We should be careful about which whitespace is unnecessary and whether JavaScript ever inspects CSS files.)
@import CSS combining - We already combine CSS files referenced directly on a website, but many CSS files begin with @import statements that import other CSS files. These imported files could be automatically expanded and combined as well. (We should be careful to expand them in the correct order.)
Rewriting images - We currently rewrite images found on HTML pages. Images may also be referenced from CSS (e.g. background images) and could be rewritten there as well.
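
As a rough illustration of the minification item, the following Python sketch drops comments and collapses each run of whitespace to a single space, while leaving quoted strings untouched. The name minify is hypothetical, and the sketch is deliberately conservative: it does not remove the spaces around punctuation, and it assumes that no stylesheet relies on a comment to separate two adjacent tokens.

```python
import re

# A "gap" is any run of whitespace and/or comments. Quoted strings are matched
# first so that their contents are copied through verbatim.
_STRING_OR_GAP = re.compile(
    r"""
    (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<gap>(?:\s+|/\*.*?\*/)+)
    """,
    re.VERBOSE | re.DOTALL,
)


def minify(css):
    """Remove comments and collapse whitespace outside of strings (sketch)."""

    def shrink(match):
        if match.group("string") is not None:
            return match.group(0)
        # Delete the comments inside the gap; if any whitespace remains,
        # keep exactly one space, otherwise keep nothing.
        remainder = re.sub(r"/\*.*?\*/", "", match.group("gap"), flags=re.DOTALL)
        return " " if remainder else ""

    return _STRING_OR_GAP.sub(shrink, css).strip()
```

Deciding which of the remaining single spaces are safe to delete (for example around { or ;, but not between two selectors) is exactly where a real parser is needed.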

Some more advanced rewrites will require semantic understanding of the CSS and the HTML it is being used on. Examples include:

Stripping unused rules - Often in large websites, CSS files are large and include unused cruft. Rules which would never be applied to any element on a page are unnecessary and can be removed. (We must be cautious, because JavaScript can change the class of nodes and cause a rule to be needed)
Renaming classes - To save bytes, CSS class names could be shortened. All references to the class would also need to be changed (in the HTML and JavaScript).
Simplifying rules - It has been claimed that applying styles to a page takes a significant amount of processor time. If this is because of unnecessarily complicated CSS rules (e.g. too many selectors), we could try to simplify rules automatically.
Re-factoring rules - To reduce redundancy, factor out the declarations that several rules have in common. This should not require much HTML or JavaScript analysis, because we can leave the class names the same and change only the way the rules are expressed in CSS.
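
To make the "stripping unused rules" idea concrete, here is a small Python sketch under strong assumptions: the stylesheet is flat (no comments, no @media or other at-rules), only class selectors are checked, and the HTML is static, so the JavaScript caveat above is ignored entirely. The function names are hypothetical.

```python
import re
from html.parser import HTMLParser


class _ClassCollector(HTMLParser):
    """Collects every class name that appears in a class="..." attribute."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


_RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")   # selector list { declarations }
_CLASS = re.compile(r"\.(-?[A-Za-z_][\w-]*)")  # .class-name inside a selector


def strip_unused_class_rules(css, html):
    """Drop selectors that mention a class never used in html (sketch)."""
    collector = _ClassCollector()
    collector.feed(html)
    used = collector.classes

    kept = []
    for selectors, body in _RULE.findall(css):
        # A selector survives only if every class it mentions is in use.
        live = [s.strip() for s in selectors.split(",")
                if all(name in used for name in _CLASS.findall(s))]
        if live:
            kept.append(", ".join(live) + " {" + body + "}")
    return "\n".join(kept)
```

A selector with no class at all (such as p) is always kept, since this sketch has no way to tell whether it matches.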

Challenges we will face:

JavaScript - First and foremost, JavaScript complicates any rewrite we do to a website. JS can rewrite whole sections of HTML and force the entire document to re-flow, which causes problems for most of the advanced optimizations above. But even less invasive JS can cause problems. For example, a standard use of JS is to change the classes of elements in response to user actions. Therefore, many rules that appear unused at page load will actually be used after, e.g., a click.
CSS flowing rules are complicated - Finding out which rules apply, in what order, and how that order affects the actual styling attributes applied to DOM nodes is, apparently, rather complicated. We will need to test our implementation extensively to make sure that it applies rules correctly (or rather, consistently with browsers).
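
One small piece of that ordering problem is selector specificity. The Python sketch below follows the CSS 2.1 rule that a more specific selector wins, with later source order breaking ties. It is a simplification, not a browser-accurate cascade: it handles only type, class, and id selectors, and ignores !important, style origins, inline styles, attribute selectors, and pseudo-classes. The function names are hypothetical.

```python
import re

_ID = re.compile(r"#[\w-]+")
_CLASS = re.compile(r"\.[\w-]+")
# A type selector starts the selector or follows a combinator or whitespace.
_TYPE = re.compile(r"(?:^|[\s>+~])([A-Za-z][\w-]*)")


def specificity(selector):
    """Return (ids, classes, types) for a simple selector (sketch)."""
    s = selector.strip()
    return (len(_ID.findall(s)), len(_CLASS.findall(s)), len(_TYPE.findall(s)))


def winning_value(matching_rules, prop):
    """Pick the value of prop among rules already known to match an element.

    matching_rules is a list of (selector, declarations) pairs in source
    order. Higher specificity wins; on a tie, the later rule wins.
    """
    best_key, best_value = None, None
    for order, (selector, declarations) in enumerate(matching_rules):
        if prop in declarations:
            key = (specificity(selector), order)
            if best_key is None or key > best_key:
                best_key, best_value = key, declarations[prop]
    return best_value
```

For instance, given the matching rules #a, div p, and p in that order, the color declared under #a wins even though it appears first in the source.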

We have open-sourced webutil/css/parser.h and included it in our open-source distribution. It currently supports minification.
