Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve black startup to format a single file. Fixes #2564 #2656

Closed
wants to merge 1 commit into from

Conversation

fabioz
Copy link
Contributor

@fabioz fabioz commented Nov 29, 2021

Make the startup of black faster when formatting a single file by importing less modules and skipping loading of .gitignore.

With this patch the time to format one file went from a baseline of:
(min /max time from doing many runs on my machine)
0.29s -> 0.47s to 0.17s -> 0.32s

-- Note that this includes the time to format too, not just the startup time. Also, this depends a lot on the target machine (slower machines will probably have more to gain)... in my machine I'd say there's around a 50% increase in the startup speed. There's still room for improvement, but I'd say that the low-hanging fruits have been taken care of.

The number of imported modules for doing a run in this case went from 268 to 202.

It went something as:

Modules imported (baseline):
268

Without asyncio:
235

Without concurrent.futures
222

Without pathspec:
214

Without multiprocessing:
202

  • Add a CHANGELOG entry if necessary?
  • Add / update tests if necessary?
  • Add new / update outdated documentation?

@cooperlees
Copy link
Collaborator

Thanks. I like the idea. I don't know if we (maintainers collective) want lazy imports all over the code base tho. I think it's worth it potentially, but for cases where latency really matters we recommended blackd.

Some of your import moving has broke tests ... I did expect more work to be needed in some of those places. This is the fun and games with moving imports for speed introduces.

Bringing in @ichard26 to ensure this does not break mypyc - I think it's time we add an action to build with mypyc @ichard26 - I'm happy to help here to avoid regressions there.

I am interested to hear others thoughts.

@fabioz
Copy link
Contributor Author

fabioz commented Nov 29, 2021

Ok, I've just fixed the tests and added back methods which had been removed from the public API.

Also, lazy imports are added very judiciously in the PR, so, I hope this would be OK (I personally nowadays prefer lazy imports project-wide, but this really needs to be a conscious decision from the project and I understand this isn't the case for black, so, I tried to leave changes to a minimum to make it work yet still bring in substantial benefits for the use case where those modules aren't used at all).

As for the tests which were failing due to the imports, I don't think that something as black.ProcessPoolExecutor would be expected as a part of its API, so, fixing the test seems fair... the issue here is more on using monkey-patching for that (it's easy to do, but it really does makes the tests a bit more brittle).

@cooperlees cooperlees requested a review from ambv November 30, 2021 16:21
Copy link
Collaborator

@felix-hilden felix-hilden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this must be a positive. All of these dynamic imports makes me a bit iffy style-wise, but if the performance gains are deemed worth it, then it's fine by me 👍 Some comments mostly about the ignore handling below.

except GitWildMatchPatternError:
ctx.exit(1)

assert exclude_and_gitignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning for this assert? If try fails then we exit, so something else? If it needs to be there, we should probably give users a proper message.

@@ -558,6 +570,17 @@ def get_sources(

sources.add(p)
elif p.is_dir():
if exclude_and_gitignore is None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I somewhat dislike that this is in a loop. I get that we'd like to compute the ignore only if we have a directory. That could be better expressed in the original place with any(Path(s).is_dir() for s in src) or something like that. We could perhaps even make the conversions to Path beforehand so that it isn't duplicated in the loop. Maybe.


try:
exclude_and_gitignore = compute_exclude_and_gitignore(exclude, root)
except GitWildMatchPatternError:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that you've moved the error inward! But it's not all the way is it? Surely it could be wrapped around get_gitignore directly.

gitignore = get_gitignore(root)
else:
gitignore = None
return exclude, gitignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I appreciate the idea of making small functions, I feel this is more indirection because the decision of computing the exclude pattern is moved to the computing function. And if the decision was moved back, this would be a two-line wrapper to two simple calls. But this is highly debatable. Thoughts?

This might be better if we get rid of the exclude_and_gitignore sentinel as I suggested in another comment, because we could unpack directly: ex, ign = compute(ex, root).

# Load reformat_many as lazily as possible as loading it will
# require support for asyncio and concurrent.futures which
# are slow to import.
from black.reformat_many import reformat_many
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this, seems like a good win for essentially nothing.

Copy link
Collaborator

@ichard26 ichard26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First review pass, nothing majorly wrong but I got quite a few questions.

I'll take a closer look at this tomorrow -- in particular I'll make sure this doesn't break compilation with mypyc which brings its own import time reductions :)

Thank you for working on this!

@@ -1361,9 +1264,50 @@ def patch_click() -> None:
module._verify_python_env = lambda: None # type: ignore


def shutdown(loop: "asyncio.AbstractEventLoop") -> None:
Copy link
Collaborator

@ichard26 ichard26 Dec 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these two utility functions used anywhere given reformat_many.py uses the ones available in concurrency.py?

return concurrency.cancel(tasks)


async def schedule_formatting(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing, is this used anywhere? Right now Black doesn't have an official API (and even then I doubt multiprocessing code should be treated as public).

Comment on lines +1304 to +1310
for arg in sys.argv:
if "multiprocessing" in arg:
# Don't import multiprocessing unless really needed.
from multiprocessing import freeze_support

freeze_support()
break
Copy link
Collaborator

@ichard26 ichard26 Dec 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there documentation that confirms this is a valid way of using freeze_support on Windows? This feels exceedingly sketchy.

@ichard26
Copy link
Collaborator

FYI I picked up your PR and pushed it to completion, hopefully you don't mind. Thanks for the PR in the first place!

@fabioz
Copy link
Contributor Author

fabioz commented Aug 27, 2022

@ichard26 thank you for taking up the PR to completion (I'm sorry I didn't have more time available to continue the work to get it to completion and I really appreciate that you were able to do that).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: performance Black is too slow. Or too fast.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants