Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: don't insert 0 at start or end of attribute if whitespace #2036

Merged
merged 2 commits into from
Jun 14, 2024

Conversation

SethFalco
Copy link
Member

@SethFalco SethFalco commented Jun 14, 2024

Docs

Updates the documentation to clarify explicitly that the plugin impacts whitespace. Internally, it's because we parse all numbers in the attribute into an array, then rewrite it back.

Bug Fix

I also fix a bug that I discovered while testing the issue reported. When an attribute like " 0 0 150 100 " comes up, SVGO inserts a 0 to the start and end of the attribute, completely changing the meaning.

This was because when we do the split, the array we get back starts with an empty string and ends with an empty string, which were then converted to 0, resulting in "0 0 0 150 100 0" instead of "0 0 150 100".

seth@seth-pc-tux:~$ node
Welcome to Node.js v20.11.0.
Type ".help" for more information.
> Number("")
0

Now we filter them out before parsing.

Performance

Just applying a recommendation from SonarLint, use RegExp#exec instead of String#match.

Related

@SethFalco SethFalco force-pushed the chore-cleanup-numeric-values branch from 440ed4f to 2c80c72 Compare June 14, 2024 11:25
@SethFalco SethFalco changed the title fix: don't insert 0 start or end of attribute is whitespace fix: don't insert 0 at start or end of attribute if whitespace Jun 14, 2024
@SethFalco SethFalco merged commit 5481fc2 into svg:main Jun 14, 2024
12 checks passed
@SethFalco SethFalco deleted the chore-cleanup-numeric-values branch June 16, 2024 11:58
@@ -39,6 +39,7 @@ export const fn = (_root, params) => {
if (node.attributes.viewBox != null) {
const nums = node.attributes.viewBox.split(/\s,?\s*|,\s*/g);
node.attributes.viewBox = nums
.filter((value) => value.length != 0)
Copy link
Member

@GreLI GreLI Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be easier and more performant to amend regex to something like /\b(\s+,?\s*|,\s*)\b/g

Copy link
Member Author

@SethFalco SethFalco Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't see how that'd work.

For clarity, with your proposed regex, the split would be:

  • '0 0 100 100'['0', ' ', '0', ' ', '100', ' ', '100']
  • '0 0 100 100'[' 0', ' ', '0', ' ', '100', ' ', '100 ']
  • '0 0 100 100'[' 0', ' ', '0', ' ', '100', ' ', '100 ']

I don't think this would be easy with regex alone, as it'd require a negative look behind with a non-fixed width, which isn't possible.

For example, to exclude the spaces at the end, /(?:\s,?\s*|,\s*)(?!\s*$)/g would work. But the same strategy can't be employed to exclude the spaces at the start.

Meanwhile, even if we were to ignore the different results, using either of the proposed regexes performs slower than a simple regex + iteration.

Something that actually would be both sensible and faster would be to trim the attribute instead of iterating after splitting, though.

i.e. node.attributes.viewBox.trim().split(/\s,?\s*|,\s*/g)

const Benchmark = require('benchmark');
const suite = new Benchmark.Suite;

const data = [
  '0 0 100 100',
  ' 0 0 100 100 ',
  '  0  0  100  100  ',
]

function splitV1(viewbox) {
  return viewbox.split(/\s,?\s*|,\s*/g).filter(v => v.length != 0);
};
// v1 x 2,648,671 ops/sec ±0.87% (91 runs sampled)

function splitV2(viewbox) {
  return viewbox.trim().split(/\s,?\s*|,\s*/g);
};
// v2 x 2,800,976 ops/sec ±0.53% (93 runs sampled)

Copy link
Member

@GreLI GreLI Jun 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Number() is ok to extra spaces, so all we need is to split numbers. The key factor here is to get rid of empty strings at the start and the end. So, to make it work in the middle, I've added \b separator (lookbehind and lookahead also would work, but \b is simpler). I forgot, that split() captures group, so non-capturing group is needed. So, the regex would be like: /\b(?:\s+,?\s*|\s*,\s*)\b/g.

Also, matchAll() can be used just to get numbers, but it will not be spec compliant — any separators would work.

Edit: there is one drawback though: numbers can start with a dot, so instead of \b better to use lookahead at the end: /\b(?:\s+,?\s*|\s*,\s*)(?=[.\d])/g.

Copy link
Member Author

@SethFalco SethFalco Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, yeah, that makes sense. And indeed, your proposal works now!

As we have existing test cases that support -, we should add that too.
So /\b(?:\s+,?\s*|\s*,\s*)(?=[-.\d])/g, which includes a - in the look ahead.

Alternatively, we could also just use [^\s] to exclude spaces rather than a list of items to include, which simplifies the regex a little while offering more safety in case we missed something.
So /\b(?:\s+,?|\s*,)\s*(?=[^\s])/g, this also moves the duplicate matcher for \s* to outside of the non-capture group.

const data = [
  '0 0 100 100',
  ' 0 0 100 100 ',
  '  0  0  100  100  ',
  ' 0  0  0.5  .5 ',
  '20.000001 -19.99999 17.123456 70.708090',
];

const splitV2 = (viewbox) => {
  return viewbox.trim().split(/\s,?\s*|,\s*/g);
};
// v2 x 1,585,072 ops/sec ±1.14% (92 runs sampled)

const splitV5 = (viewbox) => {
  return viewbox.split(/\b(?:\s+,?|\s*,)\s*(?=[^\s])/g);
};
// v5 x 1,736,749 ops/sec ±0.43% (93 runs sampled)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[^\s] is just \S

Copy link
Member Author

@SethFalco SethFalco Jun 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd opened a pull request to apply the regex change, but closed it.

I've been getting more into benchmarking, and decided to finally revamp how I benchmark locally, which includes defining more rules and utilities.

One rule, is that functions must have identical input/output. Not "effectively identical", which is what I did before, though there may be some exceptions to that later, like with math/precision. This enforces a more complete benchmark, rather than only testing intermediate steps in an algorithm, where the performance of later steps may be impacted by the results of earlier steps.

For example, I treated "100" and " 100" as the same, because we know that both become 100 when called on Number(). However, this then doesn't account for the performance of Number() itself with the different inputs.

Writing a more complete benchmark has a notable impact on results:

const data = [
  '0 0 120 120',
  '0 0 230 120',
  '110 0 120 120',
  '0 0 200.28423 200.28423',
  '20.000001 -19.99999 17.123456 70.708090',
  ' 0 0      150 100 ',
  '  0  0  0.5  .5  ',
];

function splitV2(viewbox) {
  return viewbox
    .trim()
    .split(/(?:\s,?|,)\s*/g)
    .map(Number);
};
// v2 x 619,413 ops/sec ±0.42% (97 runs sampled)

function splitV5(viewbox) {
  return viewbox
    .split(/\b(?:\s+,?|\s*,)\s*(?=\S)/g)
    .map(Number);
};
// v5 x 599,606 ops/sec ±0.42% (94 runs sampled)

The difference is minor (4-5%), but I'd argue using trim is simpler to maintain over a more complicated regex (I say this as someone that loves regex.) while offering on-par or better performance over the inputs we'd realistically encounter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But now we have .filter() instead of .trim(), which probably is worse.

Copy link
Member Author

@SethFalco SethFalco Jun 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, .filter() is worse! I still plan to do a PR for this to use trim and to move the \s* outside the group. I was just dropping an update already with the findings.

Before opening another PR, I'm just going to finish evaluating how I benchmark locally so I can handle this better in future, and review other decisions I've made historically based on benchmarks. (Outside of SVGO too.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cleanupAttrs in 'preset-default' overrides is not respected
2 participants