Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: fast regex check on string #696

Merged
merged 1 commit into from Mar 18, 2024
Merged

Conversation

cesco69
Copy link
Contributor

@cesco69 cesco69 commented Mar 13, 2024

The regex

[\u0000-\u001f\u0022\u005c\ud800-\udfff]|[\ud800-\udbff](?![\udc00-\udfff])|(?:[^\ud800-\udbff]|^)[\udc00-\udfff]

can be

[\u0000-\u001f\u0022\u005c\ud800-\udfff]

The first part of alternation ([\u0000-\u001f\u0022\u005c\ud800-\udfff]) will match anything that might have been matched by the second ([\ud800-\udbff](?![\udc00-\udfff])) and third part ((?:[^\ud800-\udbff]|^)[\udc00-\udfff]).

I have made this simple online test for play with both rule (https://jsfiddle.net/nd84e7fz/):

const NEW = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]/;
const OLD = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]|[\ud800-\udbff](?![\udc00-\udfff])|(?:[^\ud800-\udbff]|^)[\udc00-\udfff]/;

const str1 = '\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007\b\t\n\u000b\f\r' +
              '\u000e\u000f\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017' +
              '\u0018\u0019\u001a\u001b\u001c\u001d\u001e\u001f"\\abc012'
console.log(OLD.test(str1) === NEW.test(str1), 'should escape control chars')

const str2 = '\uDF06\uD834'
console.log(OLD.test(str2) === NEW.test(str2), 'should escape surrogate pair')

output

true, "should escape control chars"
true, "should escape surrogate pair"

The two rule works in the same manner!

benchmark only regex execution

NEW REGEX 96K ops/s ± 8.44%
OLD REGEX 82K ops/s ± 8.23% (14.57 % slower)

benchmark fast-json-stringify (npm run bench) with focus on long string
master

long string without double quotes........................ x 14,857 ops/sec ±0.33% (187 runs sampled)
long string.............................................. x 15,856 ops/sec ±0.34% (192 runs sampled)

PR

long string without double quotes........................ x 15,732 ops/sec ±0.47% (191 runs sampled)
long string.............................................. x 16,211 ops/sec ±0.41% (191 runs sampled)

DIFF

long string without double quotes...... 5.88% (PR faster)
long string.............................2.24% (PR faster)

Checklist

Signed-off-by: francesco <francesco.bagnoli.69@gmail.com>
@Fdawgs Fdawgs changed the title chore(perf): fast regex (20% fast) perf: fast regex (20% fast) Mar 13, 2024
@cesco69 cesco69 changed the title perf: fast regex (20% fast) perf: fast regex (14% fast) Mar 14, 2024
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@cesco69
Copy link
Contributor Author

cesco69 commented Mar 15, 2024

Side node, also merged into the "origin" project of the regex BridgeAR/fast-json-escape#6

@mcollina this PR is a good perfrmance improvent (the other I have made are just micro-optimization), please, I need some attention here.

@cesco69 cesco69 changed the title perf: fast regex (14% fast) perf: fast regex check on string (14% fast) Mar 15, 2024
@cesco69 cesco69 changed the title perf: fast regex check on string (14% fast) perf: fast regex check on string Mar 15, 2024
@mcollina
Copy link
Member

What attention do you need?

Copy link
Member

@gurgunday gurgunday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - good catch

@cesco69
Copy link
Contributor Author

cesco69 commented Mar 18, 2024

What attention do you need?

@mcollina Because it seem a good improvement on long string parsing (from 2.2% to 5.8), I would like to get this PR into the master's as quickly as possible.

@mcollina mcollina merged commit 59ce4f7 into fastify:master Mar 18, 2024
19 checks passed
@cesco69 cesco69 deleted the patch-3 branch March 19, 2024 07:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants