Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize indentation - cache indentation string #286

Closed
wants to merge 3 commits into from

Conversation

rhengles
Copy link
Contributor

Well, the title says it all... while working on adding the useTabs option, I came up with this. Is it faster to add a string with the length of tabWidth instead of adding each space every time?

@@ -211,7 +221,10 @@ function fits(next, restCommands, width) {
return false;
}

function print(w, doc) {
function print(options, doc) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The crux of the issue is that we are doing repeatIndent from 0 for every single line that we are printing.

What I would recommend you to do is to make an indent cache inside of this function that looks like this:

let indentCache = [];
function _makeIndent(n) {
  let cached = indentCache[n];
  if (cached) { return cached; }

  let result = '';
  for (let i = 0; i < n; ++i) {
    result += " ";
  }
  indentCache[n] = result;
  return result;
}

(of course tweak this again when you need to change the spacing to be tabs).

What do you think?

} else {
indentCache[ind] = indentLine = _repeatIndent(ind, indentStr);
}
out.push("\n" + indentLine);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vjeux Sure, thank you! How about this? I want to avoid global state and a function call if the value is already cached...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vjeux Ooops, there is some leftover code... let me fix it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you run your benchmark and see if a function call makes any difference? If you are going this route, may as drop the caching from the indent function altogether since you are doing it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, will do that. But the bigger issue really is the global state - if I call prettier.format(...) two times in the same environment with different options (I know this should be very rare, but still), the second time would use the options from the first.
That's why I put the indentCache object inside the print() function.

If the performance difference is negligible, is it better to pass the indentCache object to the _repeatIndent() function and put the check inside there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realised - it is negligible, because it would be called only once for each level of indentation. I will proceed to send the cache object to the repeatIndent function.

s += " ";
}

return s;
}

function _repeatIndent(n, indent) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those two functions are doing the same thing. Could you rename this one to _repeatString and replace the call sites of _makeIndent(n) to _repeatString(n, " ")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, String.prototype.repeat was added since node 3. This project doesn't support node 0.12 so you don't even need to implement this. You can use indent.repeat(n) instead :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want this to run in most browsers though, too, for demos and such. Is that function implemented in all modern browsers?

@jlongster
Copy link
Member

jlongster commented Jan 18, 2017

Agreed with @vjeux comment about but first: can you measure the function and give us some numbers that actually provide it is worth the extra code?

I suspect it won't be. That is linear for the number of lines in file so even a 2000 line file, it'll only run 2000 times which honestly isn't that much. There are many other places we should optimize first.

I'll do some quick numbers. Here's a simple program:

function _makeIndent(n) {
  var s = "";

  for (var i = 0; i < n; i++) {
    s += " ";
  }

  return s;
}

for(var i=0; i<5; i++) {
  const start = Date.now();
  for(var n=0; n<2000; n++) {
    global.x = _makeIndent(n);
  }
  console.log(Date.now() - start);
}

The runs the benchmark 5 times and creates an indent 2000 times, with the indent linearly increasing, so this will show the worse case too (will create an indentation level up to 2000). Setting it on global.x should avoid v8 from optimizing it away completely. The results:

% time node test.js
36
18
19
16
16

Even an unoptimized version only takes 36ms. I'm guessing the JIT kicks in and future runs are half the cost. But let's make it a realistic benchmark and do only sane indent levels by changing the line to:

    global.x = _makeIndent(n % 10);

Now it only creates indent levels from 0 to 10. The results:

% time node test.js
15
1
0
0
1

The JIT versions are instantaneous. This is probably because strings are immutable in JS engines, so once it creates a string, I think it can simply reuse them in future expressions so it doesn't require any new allocation.

Trying to optimize it might just make it slower. You can't really beat JS engines with the kind of stuff; trust the JIT! Let's try the above optimized version with our test that creates indent level of 0-10 many times (won't run the 0-2000 case as that's unfair as it never reuses indent levels):

let indentCache = [];
function _makeIndent(n) {
  let cached = indentCache[n];
  if (cached) { return cached; }

  let result = '';
  for (let i = 0; i < n; ++i) {
    result += " ";
  }
  indentCache[n] = result;
  return result;
}

for(var i=0; i<5; i++) {
  const start = Date.now();
  for(var n=0; n<2000; n++) {
    global.x = _makeIndent(n % 10);
  }
  console.log(Date.now() - start);
}

Here are the results:

% time node test.js
16
1
0
0
1

This is no faster than the unoptimized version. This is not a bottleneck in our app, so we should keep the code simple.

@@ -131,16 +131,26 @@ function hasHardLine(doc) {
});
}

function _makeIndent(n) {
function _makeIndent(tabWidth) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to change this to tabWidth.

We want it specifically to represent a single space. I've been thinking about a few scenarios where we might indent something to an "odd" level: meaning, indent something 2 times, but add a space to make line up with something. Even people who use tabs sometimes have to mix spaces in there to make things line up appropriately.

Right now we have full control over indentation at the printer level. We would lose that with this, and for no gain. If we ever support tabs, it needs to be solely in the code that prints the indentation and it should convert 2 or 4 spaces to tab, and leave the rest. That's all you need to do.

@jlongster
Copy link
Member

I don't think we should merge this PR, sorry. We don't need these optimizations, and see my comment for why we shouldn't change the API. There isn't anything else in this PR.

Thanks for the suggestions, any time we need to analyze some of our code it's helpful to reaffirm assumptions. I hate rejecting PRs and I hope you continue to inspect the code and possibly optimize other areas. I would suggest you run it through Chrome's profiler and investigates places that really need to be optimized, I would love that!

@rhengles
Copy link
Contributor Author

Thank you @jlongster, it's been a great learning experience and a very thoughtful and detailed response.

@rhengles rhengles closed this Jan 18, 2017
@jlongster
Copy link
Member

@rhengles Thanks! We can probably even just use the str.repeat function and get rid of _makeIndent entirely. I just checked browser support and it's pretty good. If you want, I'd love a PR to simply to " ".repeat(n). In my benchmarks, even creating an indentation level of 2000 a lot is super performant. Thanks for cleaning up this code!

@lock lock bot added the locked-due-to-inactivity Please open a new issue and fill out the template instead of commenting. label Jan 21, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jan 21, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
locked-due-to-inactivity Please open a new issue and fill out the template instead of commenting.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants