fix: don't insert 0 at start or end of attribute if whitespace #2036

SethFalco · 2024-06-14T11:22:01Z

Docs

Updates the documentation to clarify explicitly that the plugin impacts whitespace. Internally, it's because we parse all numbers in the attribute into an array, then rewrite it back.

Bug Fix

I also fix a bug that I discovered while testing the issue reported. When an attribute like " 0 0 150 100 " comes up, SVGO inserts a 0 to the start and end of the attribute, completely changing the meaning.

This was because when we do the split, the array we get back starts with an empty string and ends with an empty string, which were then converted to 0, resulting in "0 0 0 150 100 0" instead of "0 0 150 100".

seth@seth-pc-tux:~$ node
Welcome to Node.js v20.11.0.
Type ".help" for more information.
> Number("")
0

Now we filter them out before parsing.

Performance

Just applying a recommendation from SonarLint, use RegExp#exec instead of String#match.

Hmm, I don't see how that'd work.

For clarity, with your proposed regex, the split would be:

'0 0 100 100' ⇒ ['0', ' ', '0', ' ', '100', ' ', '100']

'0 0 100 100' ⇒ [' 0', ' ', '0', ' ', '100', ' ', '100 ']

'0 0 100 100' ⇒ [' 0', ' ', '0', ' ', '100', ' ', '100 ']

I don't think this would be easy with regex alone, as it'd require a negative look behind with a non-fixed width, which isn't possible.

For example, to exclude the spaces at the end, /(?:\s,?\s*|,\s*)(?!\s*$)/g would work. But the same strategy can't be employed to exclude the spaces at the start.

Meanwhile, even if we were to ignore the different results, using either of the proposed regexes performs slower than a simple regex + iteration.

Something that actually would be both sensible and faster would be to trim the attribute instead of iterating after splitting, though.

i.e. node.attributes.viewBox.trim().split(/\s,?\s*|,\s*/g)

const Benchmark = require('benchmark'); const suite = new Benchmark.Suite; const data = [ '0 0 100 100', ' 0 0 100 100 ', ' 0 0 100 100 ', ] function splitV1(viewbox) { return viewbox.split(/\s,?\s*|,\s*/g).filter(v => v.length != 0); }; // v1 x 2,648,671 ops/sec ±0.87% (91 runs sampled) function splitV2(viewbox) { return viewbox.trim().split(/\s,?\s*|,\s*/g); }; // v2 x 2,800,976 ops/sec ±0.53% (93 runs sampled)

Number() is ok to extra spaces, so all we need is to split numbers. The key factor here is to get rid of empty strings at the start and the end. So, to make it work in the middle, I've added \b separator (lookbehind and lookahead also would work, but \b is simpler). I forgot, that split() captures group, so non-capturing group is needed. So, the regex would be like: /\b(?:\s+,?\s*|\s*,\s*)\b/g.

Also, matchAll() can be used just to get numbers, but it will not be spec compliant — any separators would work.

Edit: there is one drawback though: numbers can start with a dot, so instead of \b better to use lookahead at the end: /\b(?:\s+,?\s*|\s*,\s*)(?=[.\d])/g.

Ahh, yeah, that makes sense. And indeed, your proposal works now!

As we have existing test cases that support -, we should add that too.
So /\b(?:\s+,?\s*|\s*,\s*)(?=[-.\d])/g, which includes a - in the look ahead.

Alternatively, we could also just use [^\s] to exclude spaces rather than a list of items to include, which simplifies the regex a little while offering more safety in case we missed something.
So /\b(?:\s+,?|\s*,)\s*(?=[^\s])/g, this also moves the duplicate matcher for \s* to outside of the non-capture group.

const data = [ '0 0 100 100', ' 0 0 100 100 ', ' 0 0 100 100 ', ' 0 0 0.5 .5 ', '20.000001 -19.99999 17.123456 70.708090', ]; const splitV2 = (viewbox) => { return viewbox.trim().split(/\s,?\s*|,\s*/g); }; // v2 x 1,585,072 ops/sec ±1.14% (92 runs sampled) const splitV5 = (viewbox) => { return viewbox.split(/\b(?:\s+,?|\s*,)\s*(?=[^\s])/g); }; // v5 x 1,736,749 ops/sec ±0.43% (93 runs sampled)

[^\s] is just \S

I'd opened a pull request to apply the regex change, but closed it.

I've been getting more into benchmarking, and decided to finally revamp how I benchmark locally, which includes defining more rules and utilities.

One rule, is that functions must have identical input/output. Not "effectively identical", which is what I did before, though there may be some exceptions to that later, like with math/precision. This enforces a more complete benchmark, rather than only testing intermediate steps in an algorithm, where the performance of later steps may be impacted by the results of earlier steps.

For example, I treated "100" and " 100" as the same, because we know that both become 100 when called on Number(). However, this then doesn't account for the performance of Number() itself with the different inputs.

Writing a more complete benchmark has a notable impact on results:

const data = [ '0 0 120 120', '0 0 230 120', '110 0 120 120', '0 0 200.28423 200.28423', '20.000001 -19.99999 17.123456 70.708090', ' 0 0 150 100 ', ' 0 0 0.5 .5 ', ]; function splitV2(viewbox) { return viewbox .trim() .split(/(?:\s,?|,)\s*/g) .map(Number); }; // v2 x 619,413 ops/sec ±0.42% (97 runs sampled) function splitV5(viewbox) { return viewbox .split(/\b(?:\s+,?|\s*,)\s*(?=\S)/g) .map(Number); }; // v5 x 599,606 ops/sec ±0.42% (94 runs sampled)

The difference is minor (4-5%), but I'd argue using trim is simpler to maintain over a more complicated regex (I say this as someone that loves regex.) while offering on-par or better performance over the inputs we'd realistically encounter.

But now we have .filter() instead of .trim(), which probably is worse.

Yeah, .filter() is worse! I still plan to do a PR for this to use trim and to move the \s* outside the group. I was just dropping an update already with the findings.

Before opening another PR, I'm just going to finish evaluating how I benchmark locally so I can handle this better in future, and review other decisions I've made historically based on benchmarks. (Outside of SVGO too.)

SethFalco added 2 commits June 14, 2024 12:17

docs: clarify that cleanup numeric values removes redundant spaces

d501272

fix: don't insert 0 start or end of attribute is whitespace

2c80c72

SethFalco force-pushed the chore-cleanup-numeric-values branch from 440ed4f to 2c80c72 Compare June 14, 2024 11:25

SethFalco changed the title ~~fix: don't insert 0 start or end of attribute is whitespace~~ fix: don't insert 0 at start or end of attribute if whitespace Jun 14, 2024

SethFalco merged commit 5481fc2 into svg:main Jun 14, 2024
12 checks passed

SethFalco deleted the chore-cleanup-numeric-values branch June 16, 2024 11:58

GreLI reviewed Jun 21, 2024

View reviewed changes

This was referenced Jun 30, 2024

refactor(cleanupNumericValues): improve how viewbox is split #2044

Closed

refactor(cleanupNumbericValues): improve how viewbox is split #2046

Merged

@@ @@ -39,6 +39,7 @@ export const fn = (_root, params) => { @@
                       if (node.attributes.viewBox != null) {
                         const nums = node.attributes.viewBox.split(/\s,?\s*|,\s*/g);
                         node.attributes.viewBox = nums
+                          .filter((value) => value.length != 0)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: don't insert 0 at start or end of attribute if whitespace #2036

fix: don't insert 0 at start or end of attribute if whitespace #2036

SethFalco commented Jun 14, 2024 •

edited

Loading

GreLI Jun 21, 2024 •

edited

Loading

SethFalco Jun 21, 2024 •

edited

Loading

GreLI Jun 22, 2024 •

edited

Loading

SethFalco Jun 24, 2024 •

edited

Loading

GreLI Jun 25, 2024

SethFalco Jun 30, 2024 •

edited

Loading

GreLI Jun 30, 2024

SethFalco Jun 30, 2024 •

edited

Loading

fix: don't insert 0 at start or end of attribute if whitespace #2036

fix: don't insert 0 at start or end of attribute if whitespace #2036

Conversation

SethFalco commented Jun 14, 2024 • edited Loading

Docs

Bug Fix

Performance

Related

GreLI Jun 21, 2024 • edited Loading

Choose a reason for hiding this comment

SethFalco Jun 21, 2024 • edited Loading

Choose a reason for hiding this comment

GreLI Jun 22, 2024 • edited Loading

Choose a reason for hiding this comment

SethFalco Jun 24, 2024 • edited Loading

Choose a reason for hiding this comment

GreLI Jun 25, 2024

Choose a reason for hiding this comment

SethFalco Jun 30, 2024 • edited Loading

Choose a reason for hiding this comment

GreLI Jun 30, 2024

Choose a reason for hiding this comment

SethFalco Jun 30, 2024 • edited Loading

Choose a reason for hiding this comment

SethFalco commented Jun 14, 2024 •

edited

Loading

GreLI Jun 21, 2024 •

edited

Loading

SethFalco Jun 21, 2024 •

edited

Loading

GreLI Jun 22, 2024 •

edited

Loading

SethFalco Jun 24, 2024 •

edited

Loading

SethFalco Jun 30, 2024 •

edited

Loading

SethFalco Jun 30, 2024 •

edited

Loading