Overview

A regex that tokenizes CSS.

var cssTokens = require("css-tokens")

var cssString = ".foo{prop: foo;}\n..."

cssString.match(cssTokens)
// [".foo", "{", "prop", ":", " ", "foo", ";", "}", "\n", ...]

Installation

  • npm install css-tokens
var cssTokens = require("css-tokens")

Usage

cssTokens

A regex with the g flag that matches CSS tokens.

The regex always matches, even invalid CSS and the empty string.

The next match is always directly after the previous.
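The two guarantees above mean that the matches cover the input with no gaps, so joining them gives back the original string. A minimal sketch of how that property could be checked for any regex with the g flag follows. The helper name `coversInput` and the `standIn` regex are hypothetical; `standIn` is a much simplified placeholder and not the css-tokens regex.

```js
// Returns true if successive matches of `regex` tile `input` without gaps.
// `coversInput` is a hypothetical helper, not part of css-tokens.
function coversInput(regex, input) {
  if (!regex.global) throw new TypeError("expected a regex with the g flag")
  var expectedIndex = 0
  var match
  regex.lastIndex = 0
  while ((match = regex.exec(input)) !== null) {
    // Every match must start exactly where the previous one ended.
    if (match.index !== expectedIndex) return false
    // An empty match does not advance lastIndex, so stop to avoid looping forever.
    if (match[0] === "") break
    expectedIndex = regex.lastIndex
  }
  return expectedIndex === input.length
}

// Simplified stand-in: whitespace runs, word-like runs, or any single character.
var standIn = /\s+|[\w-]+|[\s\S]/g

coversInput(standIn, ".foo{prop: foo;}\n") // true
coversInput(/[a-z]+/g, "ab cd") // false, the space is skipped
```

With the real module, the same check would presumably be written as `coversInput(require("css-tokens"), cssString)`.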

var token = cssTokens.matchToToken(match)

Takes a match returned by cssTokens.exec(string), and returns a {type: String, value: String} object. The following types are available:

  • string
  • comment
  • number
  • unquotedUrl
  • name
  • punctuator
  • whitespace
  • invalid
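As a rough sketch of how `exec` and `matchToToken` fit together, the loop below collects token objects for a whole string. The function name `tokenize` is hypothetical, and the module is passed in as a parameter so the snippet does not depend on how it was loaded. The empty-match guard is an assumption based on the statement that the regex also matches the empty string.

```js
// Collects {type, value} objects for `input`.
// `cssTokens` is expected to be the regex exported by css-tokens
// (a g-flag regex with a matchToToken function attached).
function tokenize(cssTokens, input) {
  var tokens = []
  var match
  cssTokens.lastIndex = 0 // the regex is stateful because of the g flag
  while ((match = cssTokens.exec(input)) !== null) {
    // Assumed guard: an empty match would never advance lastIndex.
    if (match[0] === "") break
    tokens.push(cssTokens.matchToToken(match))
  }
  return tokens
}
```

Usage would then look like `tokenize(require("css-tokens"), ".foo{prop: foo;}")`, with each element carrying one of the types listed above.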

Comments and strings also have a closed property indicating if the token was closed or not (see below).

Strings come in two flavors: single-quoted and double-quoted. To distinguish them, check whether the token's value starts with ' or ".
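To illustrate, the sketch below reads the quote character from a string token and strips the quotes, using the `closed` property to decide whether there is a closing quote to remove. The helper name `describeString` is hypothetical, the token objects in the examples are written by hand rather than produced by the library, and escape sequences are left untouched.

```js
// Returns the quote character and the text between the quotes,
// or null for non-string tokens. Escapes are not decoded.
function describeString(token) {
  if (token.type !== "string") return null
  return {
    quote: token.value.charAt(0),
    // An unclosed string has no closing quote to strip.
    contents: token.value.slice(1, token.closed ? -1 : undefined)
  }
}

describeString({type: "string", value: "\"foo\"", closed: true})
// {quote: "\"", contents: "foo"}
describeString({type: "string", value: "'foo", closed: false})
// {quote: "'", contents: "foo"}
```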

Names may start with @ (as in at-rule names), . (as in class selectors) and # (as in id selectors and hex colors).
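Since the prefix is part of the name token's value, a consumer has to inspect the first character itself. A small sketch, with a hypothetical helper `namePrefix` and hypothetical labels, is shown below. Note that a leading # alone cannot tell an id selector from a hex color; that depends on where the token appears.

```js
// Classifies a name token by its first character.
// The returned labels are made up for this example.
function namePrefix(token) {
  if (token.type !== "name") return null
  switch (token.value.charAt(0)) {
    case "@": return "at"
    case ".": return "dot"
    case "#": return "hash" // id selector or hex color, context decides
    default: return "plain"
  }
}

namePrefix({type: "name", value: "@media"}) // "at"
namePrefix({type: "name", value: ".foo"}) // "dot"
namePrefix({type: "name", value: "#fff"}) // "hash"
namePrefix({type: "name", value: "prop"}) // "plain"
```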

For example usage, please see this gist.

Invalid code handling

Unterminated strings are still matched as strings. CSS strings cannot contain (unescaped) newlines, so unterminated strings simply end at the end of the line.

Unterminated multi-line comments are also still matched as comments. They simply go on to the end of the string.

Unterminated unquoted urls are also still matched as unquoted urls. They continue as long as there are valid characters.

Invalid ASCII characters have their own capturing group.
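Because invalid input still produces tokens, problems have to be found by inspecting the tokens afterwards. The sketch below picks out tokens of type invalid and tokens whose `closed` property is false. The helper name `findProblems` is hypothetical and the sample tokens are written by hand. Unterminated unquoted urls are not detected here, since the text above does not say that they carry a `closed` property.

```js
// Returns the tokens that indicate invalid or unterminated input.
function findProblems(tokens) {
  return tokens.filter(function (token) {
    // Only strings and comments are documented to have `closed`;
    // for other tokens it is assumed to be undefined.
    return token.type === "invalid" || token.closed === false
  })
}

var sample = [
  {type: "name", value: "content"},
  {type: "punctuator", value: ":"},
  {type: "string", value: "'oops", closed: false},
  {type: "whitespace", value: "\n"},
  {type: "comment", value: "/* never closed", closed: false}
]

findProblems(sample).length // 2
```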

Limitations

Tokenizing CSS with regexes (in fact, with one single regex) won’t be perfect. But perfection is not the point either.

Quoted vs. unquoted urls

The following is hardly a “limitation”, but is worth mentioning:

url(http://www.w3.org/2000/svg)
url('http://www.w3.org/2000/svg')

The first line is matched as one single token (unquotedUrl), while the second is matched as four (name + punctuator + string + punctuator). This could be fixed, but has been left as is to keep the regex simple.
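A consumer that wants to treat both forms alike can normalize them after tokenizing. The sketch below is one possible approach under stated assumptions: the value of an unquotedUrl token is taken to be the whole `url(...)` text, as the example above suggests, and the quoted form is only recognized when the four tokens are directly adjacent. The helper name `extractUrls` is hypothetical, the tokens are written by hand, and escapes are not decoded.

```js
// Collects the url text from both the unquoted and the quoted form.
function extractUrls(tokens) {
  var urls = []
  for (var i = 0; i < tokens.length; i++) {
    var token = tokens[i]
    if (token.type === "unquotedUrl") {
      // Assumed shape: "url(" + text + optional ")".
      urls.push(token.value.replace(/^url\(\s*/i, "").replace(/\s*\)?$/, ""))
    } else if (
      token.type === "name" && token.value.toLowerCase() === "url" &&
      i + 2 < tokens.length &&
      tokens[i + 1].type === "punctuator" && tokens[i + 1].value === "(" &&
      tokens[i + 2].type === "string"
    ) {
      var string = tokens[i + 2]
      // Strip the opening quote, and the closing one if present.
      urls.push(string.value.slice(1, string.closed ? -1 : undefined))
    }
  }
  return urls
}

var urlTokens = [
  {type: "unquotedUrl", value: "url(http://www.w3.org/2000/svg)"},
  {type: "whitespace", value: "\n"},
  {type: "name", value: "url"},
  {type: "punctuator", value: "("},
  {type: "string", value: "'http://www.w3.org/2000/svg'", closed: true},
  {type: "punctuator", value: ")"}
]

extractUrls(urlTokens)
// ["http://www.w3.org/2000/svg", "http://www.w3.org/2000/svg"]
```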

License

The X11 (“MIT”) License.