This aims to be a very small and very fast spec-compliant CSS lexer (or scanner, or tokenizer, depending on your favourite nomenclature).

It is not the absolute fastest, nor the absolute smallest, but it trades size for speed and speed for correctness. Smaller lexers exist, but they sacrifice speed and correctness. Faster lexers exist, but they sacrifice code size and the ability to easily run in the browser. More clearly written lexers exist, but usually at the sacrifice of both speed and size. For details on how fast, how small, and how correct, see below.
The applications are quite limited. If you know what CSS is, and you know what a lexer/scanner/tokenizer is, then you probably know why you would want this. If you don't know those things or how you could use them, then this probably won't be helpful for you.
If you're using node.js, run `npm i csslex`, which will install the dependency in your `node_modules` folder. Then import it with:

```js
import { lex, types, value } from "csslex";
```
If you're using Deno, then you can try the following line:

```js
import { lex, types, value } from "https://deno.land/x/csslex/mod.ts";
```
If you're using a browser, you can import using unpkg or esm.sh:

```js
import { lex, types, value } from "https://esm.sh/csslex";
```
If you can understand typescript, this will be helpful:

```ts
type Token = [type: (typeof types)[keyof typeof types], start: number, end: number];

lex(css: string): Generator<Token>
```
The main `lex` function takes a CSS string and creates an iterable of "Tokens". Each "Token" is a 3-tuple (an array always with 3 elements inside it). The first item in the array is the number representing the type, the second is the start position of that token in the CSS string, and the third is the end position of that token in the string.
So for example:

```js
import { lex, types, value } from "https://esm.sh/csslex";

Array.from(lex("margin: 1px"));
// -> [
//   [types.IDENT, 0, 6],
//   [types.COLON, 6, 7],
//   [types.WHITESPACE, 7, 8],
//   [types.DIMENSION, 8, 11],
// ]
```
If you want to know the raw value of a token, simply take your original string and call `.slice(start, end)`. However, you can also give the string and a token tuple to `value`, which will also do extra things like normalise escape characters and give you structural values:
```js
import { lex, types, value } from "https://esm.sh/csslex";

value("margin: 1px", [types.IDENT, 0, 6]); // -> "margin"
value("margin: 1px", [types.COLON, 6, 7]); // -> ":"
value("margin: 1px", [types.DIMENSION, 8, 11]); // -> { type: "integer", value: 1, unit: "px" }
```
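To make the tuple-plus-slice relationship concrete, here is a minimal self-contained sketch. Note that `Token`, `IDENT`, `COLON`, and `raw` below are illustrative stand-ins defined locally, not the real csslex exports (the real type ids live on the `types` object):

```typescript
// Illustrative sketch only: placeholder type ids, not csslex's `types` values.
type Token = [type: number, start: number, end: number];

const IDENT = 1; // placeholder id
const COLON = 2; // placeholder id

const css = "margin: 1px";
const tokens: Token[] = [
  [IDENT, 0, 6],
  [COLON, 6, 7],
];

// The raw text of any token is just a slice of the original string.
function raw(source: string, [, start, end]: Token): string {
  return source.slice(start, end);
}

console.log(raw(css, tokens[0])); // -> "margin"
console.log(raw(css, tokens[1])); // -> ":"
```

Because tokens only carry indices into the source string, they stay tiny and allocation-free; the string itself is the single source of truth for every token's text.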
Testing uses css-tokenizer-tests, which provides a set of difficult inputs intended to test the edge cases of the spec. It also uses "snapshot testing" to avoid regressions: it tokenizes the postcss-parser-tests series of CSS files, as well as open-props.
@romainmenke maintains a comparison of CSS tokenizers with scores pertaining to each. csslex aims to always achieve a perfect score there, so if you visit the scores page and it does not have a perfect score, please file an issue!
This package aims to have the smallest minified codebase of any spec-compliant CSS tokenizer. Here's a comparison of popular alternatives:

| Name                    | Minified | Gzipped |
| ----------------------- | -------- | ------- |
| @csstools/tokenizer     | 4.1kb    | 1.1kb   |
| csslex (this)           | 4.7kb    | 1.9kb   |
| @csstools/css-tokenizer | 15.5kb   | 3.4kb   |
| css-tokenize            | 19.1kb   | 5.7kb   |
| parse-css               | 16kb     | 4.1kb   |
| css-tree                | 157.9kb  | 45kb    |
You can run `node bench.js` to get some benchmark numbers. Here are some I ran on the machine I developed the library on:
| Name                    | ops/sec                                     |
| ----------------------- | ------------------------------------------- |
| css-tree                | 3,080 ops/sec ±0.43% (96 runs sampled)      |
| csslex (this)           | 2,314 ops/sec ±0.45% (93 runs sampled)      |
| @csstools/css-tokenizer | 1,622 ops/sec ±0.76% (96 runs sampled)      |
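For a rough sense of what an ops/sec figure means, here is a minimal illustrative timer. This is NOT the project's actual `bench.js` (which is assumed to use a proper benchmark harness with statistical sampling); it only shows the idea of counting completed runs over a fixed wall-clock window:

```typescript
// Illustrative only: count how many times `fn` completes in `durationMs`,
// then normalise to operations per second.
function opsPerSec(fn: () => void, durationMs = 200): number {
  let ops = 0;
  const start = Date.now();
  while (Date.now() - start < durationMs) {
    fn();
    ops++;
  }
  const elapsedSec = (Date.now() - start) / 1000;
  return ops / elapsedSec;
}

// Example: time a trivial string scan standing in for a tokenize call.
const rate = opsPerSec(() => "margin: 1px".split(""));
console.log(`${Math.round(rate)} ops/sec`);
```

A real harness additionally warms up the JIT, runs many samples, and reports the margin of error (the ±% figures above), which a naive loop like this cannot do.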