Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow _normalizeIPv6 #40

Closed
mcollina opened this issue Apr 20, 2019 · 2 comments
Closed

Slow _normalizeIPv6 #40

mcollina opened this issue Apr 20, 2019 · 2 comments

Comments

@mcollina
Copy link

We are investigating slow startup time in Fastify, and we identified this library as our primary bottleneck (see also ajv-validator/ajv#995).

This is a flamegraph of the startup time. Apparently, the bottleneck is normalizing IPv6 addresses in

const matches = host.match(protocol.IPV6ADDRESS) || [];
.

Note that that RegExp is extremely long:

/^\[?((?:(?:(?:(?:[0-9A-Fa-f]{1,4})\:){6}(?:(?:(?:[0-9A-Fa-f]{1,4})\:(?:[0-9A-Fa-f]{1,4}))|(?:(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9]))))|(?:\:\:(?:(?:[0-9A-Fa-f]{1,4})\:){5}(?:(?:(?:[0-9A-Fa-f]{1,4})\:(?:[0-9A-Fa-f]{1,4}))|(?:(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9]))))|(?:(?:(?:[0-9A-Fa-f]{1,4}))?\:\:(?:(?:[0-9A-Fa-f]{1,4})\:){4}(?:(?:(?:[0-9A-Fa-f]{1,4})\:(?:[0-9A-Fa-f]{1,4}))|(?:(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9]))))|(?:(?:(?:(?:[0-9A-Fa-f]{1,4})\:){0,1}(?:[0-9A-Fa-f]{1,4}))?\:\:(?:(?:[0-9A-Fa-f]{1,4})\:){3}(?:(?:(?:[0-9A-Fa-f]{1,4})\:(?:[0-9A-Fa-f]{1,4}))|(?:(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9]))))|(?:(?:(?:(?:[0-9A-Fa-f]{1,4})\:){0,2}(?:[0-9A-Fa-f]{1,4}))?\:\:(?:(?:[0-9A-Fa-f]{1,4})\:){2}(?:(?:(?:[0-9A-Fa-f]{1,4})\:(?:[0-9A-Fa-f]{1,4}))|(?:(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9]))))|(?:(?:(?:(?:[0-9A-Fa-f]{1,4})\:){0,3}(?:[0-9A-Fa-f]{1,4}))?\:\:(?:[0-9A-Fa-f]{1,4})\:(?:(?:(?:[0-9A-Fa-f]{1,4})\:(?:[0-9A-Fa-f]{1,4}))|(?:(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9]))))|(?:(?:(?:(?:[0-9A-Fa-f]{1,4})\:){0,4}(?:[0-9A-Fa-f]{1,4}))?\:\:(?:(?:(?:[0-9A-Fa-f]{1,4})\:(?:[0-9A-Fa-f]{1,4}))|(?:(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9]))))|(?:(?:(?:(?:[0-9A-Fa-f]{1,4})\:){0,5}(?:[0-9A-Fa-f]{1,4}))?\:\:(?:[0-9A-Fa-f]{1,4}))|(?:(?:(?:(?:[0-9A-Fa-f]{1,4})\:){0,6}(?:[0-9A-Fa-f]{1,4}))?\:\:)))(?:(?:\%25|\%(?![0-9A-Fa-f]{2}))((?:(?:[A-Za-z0-9\-\.\_\~\xA0-\u200D\u2010-\u2029\u202F-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]|(?:(?:%[EFef][0-9A-Fa-f]%[0-9A-Fa-f][0-9A-Fa-f]%[0-9A-Fa-f][0-9A-Fa-f])|(?:%[89A-Fa-f][0-9A-Fa-f]%[0-9A-Fa-f][0-9A-Fa-f])|(?:%[0-9A-Fa-f][0-9A-Fa-f])))+)))?\]?$/

This regexp is costly to evaluate at a 5-10ms each time on my machine (Node 10).

Do you think it's possible to simplify the above regexp so it's simpler and faster to execute?

@jsumners
Copy link

jsumners commented Sep 17, 2019

@mcollina I think that is the generally accepted way to validate a string as an IPv6 address. Other methods involve feeding the string into a library that validates by converting the string to bytes and determining if the bytes are a valid address -- https://guava.dev/releases/19.0/api/docs/src-html/com/google/common/net/InetAddresses.html#line.167

@garycourt
Copy link
Owner

Having looked into this, I am not able to reduce the complexity of this RegExp without sacrificing correctness, unfortunately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants