Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

path: Improve POSIX path.join performance #6929

Closed
wants to merge 1 commit into from
Closed

path: Improve POSIX path.join performance #6929

wants to merge 1 commit into from

Conversation

joliss
Copy link

@joliss joliss commented Jan 21, 2014

Here is a simple benchmark:

var path = require('path');
var foo = 'foo', bar = 'bar';

var n = 1000000
global.x = new Array(n)
global.y = new Array(n)

console.time('path.join');
for (var i = 0; i < n; i++) {
  x[i] = path.join(foo, bar);
}
console.timeEnd('path.join');


function myJoin (foo, bar) {
  // Naive implementation for comparison
  return foo + '/' + bar;
}

console.time('myJoin')
for (i = 0; i < n; i++) {
  y[i] = myJoin(foo, bar);
}
console.timeEnd('myJoin')

This commit reduces the time for path.join in the above benchmark from
2 us to .45 us. It is still much slower than the naive myJoin
implementation (.2 us), but it's progress at least.

There seems to be some duplicate work between join, normalize, and
normalizeArray, if people want to optimize it further.

Note that this only improves the POSIX implementation. The Windows
implementation uses .filter closures as well (and also regexes), so
there is much room for improvement, but I'm not trying to fix it here.

Thanks to @isaacs and @othiym23 for helping with this optimization.

@Nodejs-Jenkins
Copy link

Thank you for contributing this pull request! Here are a few pointers to make sure your submission will be considered for inclusion.

Commit joliss/node@e2b8922 has the following error(s):

  • Commit message must indicate the subsystem this commit changes

The following commiters were not found in the CLA:

  • Jo Liss

You can fix all these things without opening another issue.

Please see CONTRIBUTING.md for more information

@joliss
Copy link
Author

joliss commented Jan 21, 2014

Added subsystem and signed CLA. Should be OK now.

var path = '';
for (var i = 0; i < arguments.length; i++) {
var segment = arguments[i];
if (!util.isString(segment)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious. Have you benchmarked what it would be w/o the check?

Also, I know it's not your code, but since you're here. Can you change it to exports.join = function join() {. Unnamed functions are such a pain to track down with performance benchmarking.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious. Have you benchmarked what it would be w/o the check?

I tried it, and it doesn't make a difference (not above the noise at least).

Also, I know it's not your code, but since you're here. Can you change it to exports.join = function join() {. Unnamed functions are such a pain to track down with performance benchmarking.

All the functions in the module are anonymous, so it'd be inconsistent if I changed it only for this one. May I suggest you add names to all the functions in a separate commit?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oy. Hadn't noticed that. Sure, make it a separate commit or don't bother. Either way is fine with me. :)

Here is a simple benchmark:

```js
var path = require('path');
var foo = 'foo', bar = 'bar';
function myJoin (foo, bar) {
  // Naive implementation for comparison
  return foo + '/' + bar;
}
console.time('join');
for (var i = 0; i < 1000000; i++) {
  path.join(foo, bar);
}
console.timeEnd('join');
```

This commit reduces the time for path.join in the above benchmark from
2 us to .45 us. It is still much slower than the naive myJoin
implementation (well below .1 us), but it's progress at least.

There seems to be some duplicate work between join, normalize, and
normalizeArray, if people want to optimize it further.

Note that this only improves the POSIX implementation. The Windows
implementation uses .filter closures as well (and also regexes), so
there is much room for improvement, but I'm not trying to fix it here.

Thanks to @isaacs and @othiym23 for helping with this optimization.
path = normalizeArray(path.split('/').filter(function(p) {
return !!p;
}), !isAbsolute).join('/');
for (var i = 0; i < segments.length; i++) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say the first one is an empty string, that means it's an absolute path. So you could force push the first element then start the for loop at 1. Like so:

nonEmptySegments.push(segments[0]);
for (var i = 1; i < segments.length; i++) {

Then the return (isAbsolute ? '/' : '') + path; isn't necessary at the bottom because [ '', 'a', 'b', 'c' ].join('/') == '/a/b/c'. So it'll auto resolve whether the path is absolute or not.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's getting late, and my brain is having trouble parsing whether this is really 100% equivalent in this function. I don't think it makes a difference in performance, and it's harder to figure out what's going on when the first segment of nonEmptySegments is possibly empty. Should we leave it as is?

@trevnorris
Copy link

Well, you're correct. There are a lot of performance improvements that can be done here. Honestly I'm cool with how it is now if you don't feel like following up with any of the suggestions. Ping me back when you've done everything you feel like touching and I'll sign-off.

@trevnorris
Copy link

Merged in b9bec20 with reworded commit message. Thanks for the work. :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants