-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: don't insert 0 at start or end of attribute if whitespace #2036
Conversation
440ed4f
to
2c80c72
Compare
@@ -39,6 +39,7 @@ export const fn = (_root, params) => { | |||
if (node.attributes.viewBox != null) { | |||
const nums = node.attributes.viewBox.split(/\s,?\s*|,\s*/g); | |||
node.attributes.viewBox = nums | |||
.filter((value) => value.length != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be easier and more performant to amend regex to something like /\b(\s+,?\s*|,\s*)\b/g
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I don't see how that'd work.
For clarity, with your proposed regex, the split would be:
'0 0 100 100'
⇒['0', ' ', '0', ' ', '100', ' ', '100']
'0 0 100 100'
⇒[' 0', ' ', '0', ' ', '100', ' ', '100 ']
'0 0 100 100'
⇒[' 0', ' ', '0', ' ', '100', ' ', '100 ']
I don't think this would be easy with regex alone, as it'd require a negative look behind with a non-fixed width, which isn't possible.
For example, to exclude the spaces at the end, /(?:\s,?\s*|,\s*)(?!\s*$)/g
would work. But the same strategy can't be employed to exclude the spaces at the start.
Meanwhile, even if we were to ignore the different results, using either of the proposed regexes performs slower than a simple regex + iteration.
Something that actually would be both sensible and faster would be to trim
the attribute instead of iterating after splitting, though.
i.e. node.attributes.viewBox.trim().split(/\s,?\s*|,\s*/g)
const Benchmark = require('benchmark');
const suite = new Benchmark.Suite;
const data = [
'0 0 100 100',
' 0 0 100 100 ',
' 0 0 100 100 ',
]
function splitV1(viewbox) {
return viewbox.split(/\s,?\s*|,\s*/g).filter(v => v.length != 0);
};
// v1 x 2,648,671 ops/sec ±0.87% (91 runs sampled)
function splitV2(viewbox) {
return viewbox.trim().split(/\s,?\s*|,\s*/g);
};
// v2 x 2,800,976 ops/sec ±0.53% (93 runs sampled)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Number()
is ok to extra spaces, so all we need is to split numbers. The key factor here is to get rid of empty strings at the start and the end. So, to make it work in the middle, I've added \b
separator (lookbehind and lookahead also would work, but \b
is simpler). I forgot, that split()
captures group, so non-capturing group is needed. So, the regex would be like: /\b(?:\s+,?\s*|\s*,\s*)\b/g
.
Also, matchAll()
can be used just to get numbers, but it will not be spec compliant — any separators would work.
Edit: there is one drawback though: numbers can start with a dot, so instead of \b
better to use lookahead at the end: /\b(?:\s+,?\s*|\s*,\s*)(?=[.\d])/g
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, yeah, that makes sense. And indeed, your proposal works now!
As we have existing test cases that support -
, we should add that too.
So /\b(?:\s+,?\s*|\s*,\s*)(?=[-.\d])/g
, which includes a -
in the look ahead.
Alternatively, we could also just use [^\s]
to exclude spaces rather than a list of items to include, which simplifies the regex a little while offering more safety in case we missed something.
So /\b(?:\s+,?|\s*,)\s*(?=[^\s])/g
, this also moves the duplicate matcher for \s*
to outside of the non-capture group.
const data = [
'0 0 100 100',
' 0 0 100 100 ',
' 0 0 100 100 ',
' 0 0 0.5 .5 ',
'20.000001 -19.99999 17.123456 70.708090',
];
const splitV2 = (viewbox) => {
return viewbox.trim().split(/\s,?\s*|,\s*/g);
};
// v2 x 1,585,072 ops/sec ±1.14% (92 runs sampled)
const splitV5 = (viewbox) => {
return viewbox.split(/\b(?:\s+,?|\s*,)\s*(?=[^\s])/g);
};
// v5 x 1,736,749 ops/sec ±0.43% (93 runs sampled)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[^\s]
is just \S
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd opened a pull request to apply the regex change, but closed it.
I've been getting more into benchmarking, and decided to finally revamp how I benchmark locally, which includes defining more rules and utilities.
One rule, is that functions must have identical input/output. Not "effectively identical", which is what I did before, though there may be some exceptions to that later, like with math/precision. This enforces a more complete benchmark, rather than only testing intermediate steps in an algorithm, where the performance of later steps may be impacted by the results of earlier steps.
For example, I treated "100"
and " 100"
as the same, because we know that both become 100
when called on Number()
. However, this then doesn't account for the performance of Number()
itself with the different inputs.
Writing a more complete benchmark has a notable impact on results:
const data = [
'0 0 120 120',
'0 0 230 120',
'110 0 120 120',
'0 0 200.28423 200.28423',
'20.000001 -19.99999 17.123456 70.708090',
' 0 0 150 100 ',
' 0 0 0.5 .5 ',
];
function splitV2(viewbox) {
return viewbox
.trim()
.split(/(?:\s,?|,)\s*/g)
.map(Number);
};
// v2 x 619,413 ops/sec ±0.42% (97 runs sampled)
function splitV5(viewbox) {
return viewbox
.split(/\b(?:\s+,?|\s*,)\s*(?=\S)/g)
.map(Number);
};
// v5 x 599,606 ops/sec ±0.42% (94 runs sampled)
The difference is minor (4-5%), but I'd argue using trim
is simpler to maintain over a more complicated regex (I say this as someone that loves regex.) while offering on-par or better performance over the inputs we'd realistically encounter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But now we have .filter()
instead of .trim()
, which probably is worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, .filter()
is worse! I still plan to do a PR for this to use trim
and to move the \s*
outside the group. I was just dropping an update already with the findings.
Before opening another PR, I'm just going to finish evaluating how I benchmark locally so I can handle this better in future, and review other decisions I've made historically based on benchmarks. (Outside of SVGO too.)
Docs
Updates the documentation to clarify explicitly that the plugin impacts whitespace. Internally, it's because we parse all numbers in the attribute into an array, then rewrite it back.
Bug Fix
I also fix a bug that I discovered while testing the issue reported. When an attribute like
" 0 0 150 100 "
comes up, SVGO inserts a0
to the start and end of the attribute, completely changing the meaning.This was because when we do the split, the array we get back starts with an empty string and ends with an empty string, which were then converted to
0
, resulting in"0 0 0 150 100 0"
instead of"0 0 150 100"
.Now we filter them out before parsing.
Performance
Just applying a recommendation from SonarLint, use
RegExp#exec
instead ofString#match
.Related