Performance Improvements for string().guid() #1211

Merged
Marsup merged 3 commits into hapijs:master from the guid-perf branch
Jun 15, 2017

DavidTPate
Contributor

This PR includes some performance improvements to the run-time validation of the string().guid() validation type.

How Was This Determined?

I ran a series of benchmarks for some common cases of GUID/UUIDs and then made tweaks to improve the performance.

Schemas

const guidSchema = stringSchema.guid();
const guidVersionsSchema = stringSchema.guid({
   version: ['uuidv1', 'uuidv2', 'uuidv3', 'uuidv4']
});

Benchmarks

// validate-schema#guid
guidSchema.validate('D1A5279D-B27D-4CD4-A05E-EFDD53D08E8D');

// validate-schema#guid-wrapped
guidSchema.validate('{D1A5279D-B27D-4CD4-A05E-EFDD53D08E8D}');

// validate-schema#guid-versions
guidVersionsSchema.validate('D1A5279D-B27D-4CD4-A05E-EFDD53D08E8D');

// validate-schema#guid-versions-wrapped
guidVersionsSchema.validate('{D1A5279D-B27D-4CD4-A05E-EFDD53D08E8D}');
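The harness used to time these cases is not shown in the thread. As a rough illustration only, an ops/sec figure can be approximated with a timing loop like the one below. The `opsPerSec` helper and the plain-regex `validateGuid` stand-in are hypothetical and are not part of Joi; the real benchmarks call `guidSchema.validate()`.

```javascript
// Hypothetical stand-in for guidSchema.validate(); the benchmarks in this PR call Joi instead.
const guidPattern = /^[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}$/i;
const validateGuid = (value) => guidPattern.test(value);

// Hypothetical helper: call fn in batches for roughly durationMs, then report calls per second.
function opsPerSec(fn, durationMs = 200) {
    const batch = 1000;
    const start = Date.now();
    let calls = 0;
    let elapsed = 0;

    do {
        for (let i = 0; i < batch; ++i) {
            fn();
        }

        calls += batch;
        elapsed = Date.now() - start;
    } while (elapsed < durationMs);

    return Math.round(calls / (elapsed / 1000));
}

const rate = opsPerSec(() => validateGuid('D1A5279D-B27D-4CD4-A05E-EFDD53D08E8D'));
console.log(`validate-regex#guid x ${rate.toLocaleString('en-US')} ops/sec`);
```

A loop like this ignores warm-up and statistical variance, so it is only suitable for coarse comparisons.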

Original Benchmarks

With the code unmodified, I got the following results:

string
  validate-schema#guid-versions         x 353,757 ops/sec
  validate-schema#guid-versions-wrapped x 370,182 ops/sec
  validate-schema#guid                  x 406,565 ops/sec
  validate-schema#guid-wrapped          x 423,820 ops/sec

From there I deferred more of the work to the Regular Expression but kept the version check within an if statement, which brought me the following results:

string
  validate-schema#guid                  x 428,658 ops/sec
  validate-schema#guid-versions         x 439,570 ops/sec
  validate-schema#guid-versions-wrapped x 451,946 ops/sec
  validate-schema#guid-wrapped          x 458,762 ops/sec

Next, I moved the version and 89AB checks into the Regular Expression and used back references for the divider checks. Additionally, I simplified the Regular Expression to eliminate capture and non-capture groups where possible. This led to the following (current) benchmarks:

string
  validate-schema#guid-versions         x 449,913 ops/sec
  validate-schema#guid                  x 458,897 ops/sec
  validate-schema#guid-versions-wrapped x 472,751 ops/sec
  validate-schema#guid-wrapped          x 473,101 ops/sec

Note: These benchmarks were run on my machine, so expect the absolute numbers to vary if you run them yourself (including by Node version and platform), but the relative differences should stay roughly the same.
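To make the approach concrete, here is a minimal sketch of building a single pattern that carries the version check, the 89AB check, and the divider back reference. It is modeled on the `guidRegex` line quoted in the review of this PR; the `buildGuidRegex` wrapper and its `versionNumbers` string argument (for example `'1234'`) are assumptions made for illustration, not the merged code.

```javascript
// Hypothetical wrapper. versionNumbers is a string of allowed version digits,
// or an empty string when no version check is wanted.
function buildGuidRegex(versionNumbers) {
    const hex = '[0-9A-F]';
    const versionClass = versionNumbers ? `[${versionNumbers}]` : hex;
    const variantClass = versionNumbers ? '[89AB]' : hex;

    // Group 1: optional opening bracket. Group 2: optional divider, reused through \2.
    // Group 3: optional closing bracket.
    return new RegExp(
        `^([\\[{\\(]?)${hex}{8}([:-]?)${hex}{4}\\2?${versionClass}${hex}{3}\\2?${variantClass}${hex}{3}\\2?${hex}{12}([\\]}\\)]?)$`,
        'i'
    );
}

const anyVersion = buildGuidRegex('');
const versions1to4 = buildGuidRegex('1234');

console.log(versions1to4.test('D1A5279D-B27D-4CD4-A05E-EFDD53D08E8D'));   // true
console.log(versions1to4.test('{D1A5279D-B27D-4CD4-A05E-EFDD53D08E8D}')); // true
console.log(versions1to4.test('D1A5279D-B27D-5CD4-A05E-EFDD53D08E8D'));   // false, version digit is 5
console.log(anyVersion.test('D1A5279D-B27D-5CD4-A05E-EFDD53D08E8D'));     // true
console.log(anyVersion.test('D1A5279D-B27D:4CD4-A05E-EFDD53D08E8D'));     // false, mixed dividers
```

Because the divider is captured once and matched again through `\2`, a string that switches between `-` and `:` is rejected by the pattern itself, with no extra comparison after the match.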

@DavidTPate DavidTPate added the feature New functionality or improvement label Jun 5, 2017
@Marsup
Collaborator

Marsup commented Jun 5, 2017

According to my tests, it seems to be an improvement on node 6, but it's very much the opposite on node 8, so I'm not sure what to do here. I'd tend to favor the future version. What do you get on your machine?

@WesTyler WesTyler mentioned this pull request Jun 5, 2017
@DavidTPate
Contributor Author

That's very interesting as I noticed improvements on Node 8 as well.

Node 6.10.3 (Fedora 24 - 64 bit)

string
  validate-schema#guid-versions         x 372,592 ops/sec
  validate-schema#guid-versions-wrapped x 386,855 ops/sec
  validate-schema#guid                  x 427,795 ops/sec
  validate-schema#guid-wrapped          x 444,966 ops/sec

Node 6.10.3 (Fedora 24 - 64 bit) - New Code

string
  validate-schema#guid                  x 491,424 ops/sec
  validate-schema#guid-versions         x 502,937 ops/sec
  validate-schema#guid-versions-wrapped x 517,636 ops/sec
  validate-schema#guid-wrapped          x 529,663 ops/sec

Node 8.0.0 (Fedora 24 - 64 bit)

string
  validate-schema#guid-versions-wrapped x 388,319 ops/sec
  validate-schema#guid-versions         x 398,400 ops/sec
  validate-schema#guid-wrapped          x 450,996 ops/sec
  validate-schema#guid                  x 473,374 ops/sec

Node 8.0.0 (Fedora 24 - 64 bit) - New Code

string
  validate-schema#guid                  x 451,330 ops/sec
  validate-schema#guid-wrapped          x 454,979 ops/sec
  validate-schema#guid-versions-wrapped x 471,269 ops/sec
  validate-schema#guid-versions         x 499,805 ops/sec

@DavidTPate
Contributor Author

Forgot to follow up on this as well:

> I'd tend to favor the future version.

Definitely agree.

@Marsup
Collaborator

Marsup commented Jun 13, 2017

Confirmed improvement on 8.1 ¯\_(ツ)_/¯

I'll review soon.

Collaborator

@Marsup Marsup left a comment

Not sure if it requires changes as most of my comments are minor. Let me know if you think any of those is relevant.

}

const regex = /^([\[{\(]?)([0-9A-F]{8})([:-]?)([0-9A-F]{4})([:-]?)([0-9A-F]{4})([:-]?)([0-9A-F]{4})([:-]?)([0-9A-F]{12})([\]}\)]?)$/i;
const guidRegex = new RegExp(`^([\\[{\\(]?)[0-9A-F]{8}([:-]?)[0-9A-F]{4}\\2?[${checkVersion ? versionNumbers : '0-9A-F'}][0-9A-F]{3}\\2?[${checkVersion ? '89AB' : '0-9A-F'}][0-9A-F]{3}\\2?[0-9A-F]{12}([\\]}\\)]?)$`, 'i');
Collaborator

I long for named capturing groups...

Contributor Author

You and me both, I can't tell you how many times I counted and recounted these things as I was making incremental changes.
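For reference, named capturing groups were later standardized in ES2018, so they are not an option on the Node versions benchmarked in this thread. The sketch below is an illustration only, not the merged code: it rewrites the unversioned pattern with names (`open`, `divider`, `close`, all chosen here for the example) so that nothing has to be counted.

```javascript
// \k<divider> refers back to the named group instead of a numbered one such as \2.
const guidPattern = /^(?<open>[\[{(]?)[0-9A-F]{8}(?<divider>[:-]?)[0-9A-F]{4}\k<divider>?[0-9A-F]{4}\k<divider>?[0-9A-F]{4}\k<divider>?[0-9A-F]{12}(?<close>[\]})]?)$/i;

const match = guidPattern.exec('{D1A5279D-B27D-4CD4-A05E-EFDD53D08E8D}');
console.log(match.groups.open, match.groups.divider, match.groups.close); // { - }
```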

}

versionNumbers = versionNumbers.join('');
Collaborator

Isn't it a deopt to change a variable's type?

Contributor Author

Yeah, I think I was just having a bout of temporary idiocy. Instead of versionNumbers.push(versionNumber) and versionNumbers = versionNumbers.join('') I could just replace line 361 with versionNumbers += versionNumber so that I am left with my concatenated string.

The purpose of these parts is solely to be used for building out the RegExp.
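A minimal sketch of that change, keeping `versionNumbers` a string from start to finish. The contents of the `uuids` lookup below are an assumption (the thread only shows it being indexed by version name), and `collectVersionNumbers` is a hypothetical wrapper added for the example.

```javascript
// Assumed mapping from version name to the digit used in the RegExp character class.
const uuids = { uuidv1: '1', uuidv2: '2', uuidv3: '3', uuidv4: '4', uuidv5: '5' };

// Hypothetical helper: builds the digit string by concatenation, so the
// variable never switches from an array to a string.
function collectVersionNumbers(versions) {
    let versionNumbers = '';

    for (let i = 0; i < versions.length; ++i) {
        versionNumbers += uuids[versions[i]];
    }

    return versionNumbers;
}

console.log(collectVersionNumbers(['uuidv1', 'uuidv4'])); // 14
```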

versions.push(version);
const versionNumber = uuids[version];
Hoek.assert(versionNumber, 'version at position ' + i + ' must be one of ' + Object.keys(uuids).join(', '));
Hoek.assert(!(versionNumber in versions), 'version at position ' + i + ' must not be a duplicate.');
Collaborator

hasOwnProperty is usually faster, but that'd be marginal. Do you think a Set would do better?

Contributor Author

I'd be interested to see. If it's marginal either way I would prefer a Set as I think it will be easier to comprehend the entire block of code.
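As an illustration of the Set variant being discussed, duplicate detection can be written as below. This is a sketch only; `firstDuplicateIndex` is a hypothetical helper and not the code that was merged.

```javascript
// Returns the index of the first repeated entry, or -1 when all entries are unique.
function firstDuplicateIndex(versions) {
    const seen = new Set();

    for (let i = 0; i < versions.length; ++i) {
        if (seen.has(versions[i])) {
            return i;
        }

        seen.add(versions[i]);
    }

    return -1;
}

console.log(firstDuplicateIndex(['uuidv1', 'uuidv4']));           // -1
console.log(firstDuplicateIndex(['uuidv1', 'uuidv4', 'uuidv1'])); // 2
```

The membership test reads as `seen.has(value)`, which states the intent directly.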

const versions = [];
let versionNumbers = [];
const versions = {};
let checkVersion = false;
Collaborator

I think checkVersion is basically versionNumbers's truthiness.

Contributor Author

Yeah, taking a second look I agree.

@DavidTPate
Contributor Author

DavidTPate commented Jun 14, 2017

So I tried a few different approaches. Ultimately the Set was the most consistent and the fastest; results from the 3 different tests are below:

Object In

string
  validate-schema#guid-versions-wrapped x 433,471 ops/sec
  validate-schema#guid-wrapped          x 443,962 ops/sec
  validate-schema#guid-versions         x 458,346 ops/sec
  validate-schema#guid                  x 464,135 ops/sec

Set

string
  validate-schema#guid-wrapped          x 461,747 ops/sec
  validate-schema#guid-versions-wrapped x 468,894 ops/sec
  validate-schema#guid-versions         x 476,740 ops/sec
  validate-schema#guid                  x 481,809 ops/sec

hasOwnProperty

string
  validate-schema#guid-wrapped          x 417,693 ops/sec
  validate-schema#guid-versions-wrapped x 418,105 ops/sec
  validate-schema#guid-versions         x 434,678 ops/sec
  validate-schema#guid                  x 436,262 ops/sec

@WesTyler
Contributor

Nice :D

@Marsup
Collaborator

Marsup commented Jun 14, 2017

Perfect, I'll be waiting for your update with a Set then.

FYI I have noticed the same kind of improvement using Maps instead of objects as hash, I might be switching joi to that where it makes sense at some point.
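For readers unfamiliar with the two forms, the sketch below shows the same lookup against a plain object and against a Map. It does not measure speed; it only shows the call shapes and one behavioral difference. The `uuidv4` entry is an illustrative value, not taken from joi's source.

```javascript
const objectHash = { uuidv4: '4' };
const mapHash = new Map([['uuidv4', '4']]);

// Both forms find keys that were stored explicitly.
console.log('uuidv4' in objectHash); // true
console.log(mapHash.has('uuidv4'));  // true

// The `in` operator also sees keys inherited from Object.prototype; a Map does not.
console.log('constructor' in objectHash); // true
console.log(mapHash.has('constructor'));  // false
```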

@Marsup Marsup added this to the 10.6.0 milestone Jun 15, 2017
@Marsup Marsup self-assigned this Jun 15, 2017
@Marsup Marsup merged commit fcc8fc6 into hapijs:master Jun 15, 2017
@DavidTPate DavidTPate deleted the guid-perf branch June 15, 2017 19:44
@lock

lock bot commented Jan 9, 2020

This thread has been automatically locked due to inactivity. Please open a new issue for related bugs or questions following the new issue template instructions.

@lock lock bot locked as resolved and limited conversation to collaborators Jan 9, 2020