Performance optimization discussion #366
Comments
The raw Python C API is not a silver bullet. We'd have to rewrite parts of the program in straight C, and since most of it operates on blib2to3.pytree, we'd need that in C, too. My not-so-secret plan is to cythonize blib2to3 and go from there. But that's something I'll get to after Black becomes more stable and most issues currently open are solved. In the meantime, Black does safety checks that other formatters don't. If you haven't encountered any problems, use --fast, which is over 2X faster than --safe. |
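As a rough illustration of what such a safety check can look like: Black's safe mode re-parses its own output and verifies that it is equivalent to the input. The sketch below shows that idea using only the standard library's `ast` module. The helper name `same_ast` and the sample strings are made up for this example, and this is not Black's actual implementation.

```python
import ast


def same_ast(src: str, dst: str) -> bool:
    """Return True if both sources parse to the same AST.

    ast.dump() omits line and column information by default, so pure
    layout and quote-style changes do not affect the comparison.
    """
    return ast.dump(ast.parse(src)) == ast.dump(ast.parse(dst))


# Hypothetical inputs: the second is a layout-only rewrite of the first,
# the third changes a value and must be rejected.
original = "x = { 'a':1,\n      'b':2 }\n"
reformatted = 'x = {"a": 1, "b": 2}\n'
broken = 'x = {"a": 1, "b": 3}\n'

print(same_ast(original, reformatted))
print(same_ast(original, broken))
```

The check costs an extra parse of both the input and the output, which is consistent with --safe being noticeably slower than --fast.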
I ran black on a big file at work and was confused as to why it's so slow when the file needs many changes and so fast when there are no changes. Naively, one would think you load a full syntax AST, run some transformers, dump it, and, if the new string is different from the old, write it. If this were the case, the run time should be roughly the same for many changes and no changes... So where is my naive thinking incorrect? |
There is a cache. If the file hasn't changed since the last time we ran Black, we don't do any processing. |
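A minimal sketch of how a "skip unchanged files" cache can work, assuming a file is identified by its modification time and size. The function names and the in-memory dictionary are hypothetical and only illustrate the idea; Black's real cache is stored on disk and differs in its details.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

CacheInfo = Tuple[float, int]  # (modification time, size in bytes)


def file_signature(path: Path) -> CacheInfo:
    """Cheap fingerprint of a file that avoids reading its contents."""
    st = path.stat()
    return (st.st_mtime, st.st_size)


def needs_formatting(path: Path, cache: Dict[str, CacheInfo]) -> bool:
    """True if the file is new or has changed since it was last recorded."""
    return cache.get(str(path.resolve())) != file_signature(path)


def mark_formatted(path: Path, cache: Dict[str, CacheInfo]) -> None:
    """Record the file's current fingerprint after a successful run."""
    cache[str(path.resolve())] = file_signature(path)


def demo() -> List[bool]:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.py"
        path.write_text("x = 1\n", encoding="utf8")
        cache: Dict[str, CacheInfo] = {}
        first = needs_formatting(path, cache)   # never seen before
        mark_formatted(path, cache)
        second = needs_formatting(path, cache)  # unchanged since last run
        path.write_text("x = 1\ny = 2\n", encoding="utf8")
        third = needs_formatting(path, cache)   # size changed
        return [first, second, third]


print(demo())
```

With a check like this, an unchanged file costs a single `stat()` call instead of a parse, which is why the "no changes" case looks so much faster than the naive model predicts.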
It would be amazing to compile Black with @mypyc/mypyc. If I understand correctly, @msullivan is interested in making it happen. FTR, if this will require changes to the Black codebase, I'm open to that as long as they are not sweeping or overly disruptive. |
I can start sending up PRs to support mypyc in the next week, building on work done by my intern from last summer, @SanjitKal. The most disruptive part of doing this is that |
I have a draft branch that makes black able to compile and run with mypyc: https://github.com/msullivan/black/tree/mypyc (It requires python/mypy#7481 to land pickling support in mypyc before it works right) |
Thanks so much for working on this @msullivan. I say we should blacken blib2to3 too if we are going to add annotations to it anyway. @zsol @ambv do you agree? |
Hi everyone, I have been working on finishing mypyc support since the original work done in 2019. While I'm not confident enough to submit a work (there's probably more edge cases to fix + even more optimizations to be done), I have been able to get a fully (EDIT: I realized that I actually forgot to mark a few files in
+--------------------------------+-------------+-----------------------+
| Benchmark | interpreted | compiled |
+================================+=============+=======================+
| src/black/brackets.py | 264 ms | 125 ms: 2.11x faster |
+--------------------------------+-------------+-----------------------+
| src/black/cache.py | 54.2 ms | 29.3 ms: 1.85x faster |
+--------------------------------+-------------+-----------------------+
| src/black/comments.py | 180 ms | 98.9 ms: 1.82x faster |
+--------------------------------+-------------+-----------------------+
| src/black/concurrency.py | 28.4 ms | 16.7 ms: 1.71x faster |
+--------------------------------+-------------+-----------------------+
| src/black/const.py | 4.59 ms | 2.34 ms: 1.96x faster |
+--------------------------------+-------------+-----------------------+
| src/black/debug.py | 41.5 ms | 22.9 ms: 1.81x faster |
+--------------------------------+-------------+-----------------------+
| src/black/files.py | 144 ms | 84.5 ms: 1.70x faster |
+--------------------------------+-------------+-----------------------+
| src/black/linegen.py | 904 ms | 417 ms: 2.17x faster |
+--------------------------------+-------------+-----------------------+
| src/black/lines.py | 526 ms | 275 ms: 1.91x faster |
+--------------------------------+-------------+-----------------------+
| src/black/mode.py | 96.9 ms | 44.1 ms: 2.20x faster |
+--------------------------------+-------------+-----------------------+
| src/black/nodes.py | 665 ms | 312 ms: 2.13x faster |
+--------------------------------+-------------+-----------------------+
| src/black/numerics.py | 44.3 ms | 25.3 ms: 1.75x faster |
+--------------------------------+-------------+-----------------------+
| src/black/output.py | 75.1 ms | 45.0 ms: 1.67x faster |
+--------------------------------+-------------+-----------------------+
| src/black/parsing.py | 147 ms | 85.0 ms: 1.72x faster |
+--------------------------------+-------------+-----------------------+
| src/black/report.py | 66.9 ms | 40.2 ms: 1.66x faster |
+--------------------------------+-------------+-----------------------+
| src/black/rusty.py | 15.7 ms | 9.43 ms: 1.66x faster |
+--------------------------------+-------------+-----------------------+
| src/black/strings.py | 138 ms | 85.9 ms: 1.61x faster |
+--------------------------------+-------------+-----------------------+
| src/black/trans.py | 1.05 sec | 580 ms: 1.80x faster |
+--------------------------------+-------------+-----------------------+
| src/black/__init__.py | 810 ms | 461 ms: 1.76x faster |
+--------------------------------+-------------+-----------------------+
| src/blackd/__init__.py | 184 ms | 98.2 ms: 1.88x faster |
+--------------------------------+-------------+-----------------------+
| src/black_primer/cli.py | 104 ms | 52.7 ms: 1.98x faster |
+--------------------------------+-------------+-----------------------+
| src/black_primer/lib.py | 269 ms | 157 ms: 1.71x faster |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pgen2/conv.py | 224 ms | 126 ms: 1.78x faster |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pgen2/driver.py | 213 ms | 111 ms: 1.92x faster |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pgen2/grammar.py | 93.7 ms | 52.6 ms: 1.78x faster |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pgen2/literals.py | 55.5 ms | 32.8 ms: 1.69x faster |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pgen2/parse.py | 141 ms | 82.8 ms: 1.70x faster |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pgen2/pgen.py | 391 ms | 221 ms: 1.77x faster |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pgen2/token.py | 71.6 ms | 41.2 ms: 1.74x faster |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pgen2/tokenize.py | 468 ms | 266 ms: 1.76x faster |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pygram.py | 106 ms | 54.7 ms: 1.93x faster |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pytree.py | 630 ms | 350 ms: 1.80x faster |
+--------------------------------+-------------+-----------------------+
| Geometric mean | (ref) | 1.82x faster |
+--------------------------------+-------------+-----------------------+
Note that these benchmark results aren't perfect, I haven't done the usual system tweaks to improve the reliability (note that I plan to produce more reliable results once I am ready to submit a PR). Also, all of the files were already well formatted so no safety checks were run (I expect those to worsen the performance improvement). But for some credibility, I did use pyperf in the following script:
benchmark.py (note: very messy!)
import os
import subprocess
import sys
import time
from functools import partial
from pathlib import Path

import pyperf
import black

THIS_DIR = Path(__file__).parent
_TARGETS = [
    "src/black/brackets.py",
    "src/black/cache.py",
    "src/black/comments.py",
    "src/black/concurrency.py",
    "src/black/const.py",
    "src/black/debug.py",
    "src/black/files.py",
    "src/black/linegen.py",
    "src/black/lines.py",
    "src/black/mode.py",
    "src/black/nodes.py",
    "src/black/numerics.py",
    "src/black/output.py",
    "src/black/parsing.py",
    "src/black/report.py",
    "src/black/rusty.py",
    "src/black/strings.py",
    "src/black/trans.py",
    "src/black/__init__.py",
    "src/blackd/__init__.py",
    "src/black_primer/cli.py",
    "src/black_primer/lib.py",
    "src/blib2to3/pgen2/conv.py",
    "src/blib2to3/pgen2/driver.py",
    "src/blib2to3/pgen2/grammar.py",
    "src/blib2to3/pgen2/literals.py",
    "src/blib2to3/pgen2/parse.py",
    "src/blib2to3/pgen2/pgen.py",
    "src/blib2to3/pgen2/token.py",
    "src/blib2to3/pgen2/tokenize.py",
    "src/blib2to3/pygram.py",
    "src/blib2to3/pytree.py",
]
TARGETS = [THIS_DIR / Path(path) for path in _TARGETS]

# The combined results are written to the path given in BM_DUMP_TO.
if dump_path := os.getenv("BM_DUMP_TO", False):
    dump_path = Path(dump_path)
    print(f"Will dump results to `{dump_path!s}`.")
else:
    print("Need a filepath to dump results to (use `BM_DUMP_TO` env var).")
    sys.exit(1)

# Template for the per-file benchmark script; each one runs in its own process.
SCRIPT = """\
from pathlib import Path
from functools import partial

import pyperf
import black

TARGET = Path("{target}")
runner = pyperf.Runner()

with open(TARGET, "r", encoding="utf8") as f:
    code = f.read()


def format_func(code):
    try:
        black.format_file_contents(code, fast={fast}, mode={mode})
    except black.NothingChanged:
        pass


runner.bench_func("{name}", format_func, code)
"""

runner = pyperf.Runner()
results = []
try:
    # Scratch directory for the generated scripts and their JSON results.
    tmp_dir = THIS_DIR / "benchmark-workdir"
    tmp_dir.mkdir(exist_ok=True)
    for i, t in enumerate(TARGETS, start=1):
        print(f"Benchmarking `{t!s}` ({i}/{len(TARGETS)})")
        bench_script = SCRIPT.format(
            target=str(t.resolve()),
            name=str(t),
            fast="False",
            mode="black.Mode(experimental_string_processing=True)",
        )
        script = tmp_dir / f"{i}.py"
        with open(script, "w", encoding="utf8") as f:
            f.write(bench_script)

        tmp_file = tmp_dir / f"{i}.json"
        t0 = time.perf_counter()
        subprocess.run([sys.executable, str(script), "--output", str(tmp_file)], check=True)
        t1 = time.perf_counter()
        print(f"Took {round(t1 - t0, 3)} seconds.")

        with open(tmp_file, "r", encoding="utf8") as f:
            result = pyperf.Benchmark.loads(f.read())
        results.append(result)
except Exception:
    raise

# Merge the per-file benchmarks into one suite and write it out.
suite = pyperf.BenchmarkSuite(results)
suite.dump(str(dump_path), replace=True)
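The "Geometric mean" row of the table above can be cross-checked from the per-file speedups. The sketch below copies the 32 "x faster" figures from the table in row order and uses the standard library's `statistics.geometric_mean`. It only works from the rounded ratios, so it is a sanity check rather than a reproduction of pyperf's own calculation.

```python
from statistics import geometric_mean

# Per-file speedups ("N.NNx faster"), copied from the table in row order.
speedups = [
    2.11, 1.85, 1.82, 1.71, 1.96, 1.81, 1.70, 2.17,  # brackets .. linegen
    1.91, 2.20, 2.13, 1.75, 1.67, 1.72, 1.66, 1.66,  # lines .. rusty
    1.61, 1.80, 1.76, 1.88, 1.98, 1.71,              # strings .. black_primer/lib
    1.78, 1.92, 1.78, 1.69, 1.70, 1.77, 1.74, 1.76,  # blib2to3/pgen2/*
    1.93, 1.80,                                      # pygram, pytree
]

gm = geometric_mean(speedups)
print(f"{len(speedups)} files, geometric mean speedup: {gm:.2f}x")
```

A geometric mean is the usual choice for averaging ratios, because it treats a 2x speedup and a 2x slowdown as cancelling out, which an arithmetic mean does not.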
Update:
After some more compatibility work, here are the new numbers (this time gathered on a tuned, stable system):
** Note that these numbers are from a wheel built locally; I have no clue what the cibuildwheel wheels' numbers are like.
TO-DOs:
Finally, since this project ended up more involved than I had originally expected, I'm tracking and managing my work in a separate repository (it originally contained the automated wheel build workflow) now: https://github.com/ichard26/black-mypyc-wheels. Head over there to see the status on this work, item by item. Thanks for the patience! |
Mise à jour deux! [Update two!] Alors ... yeah no, I am not writing the rest of this in French. Anyway, there's been some more progress on this.
TO-DOs:
TL;DR: black is now even faster, there's new tooling available, and I'm closer to (finally) putting up my PR. After that, time for community testing :D |
Alright, I'm kicking off community testing for the mypyc wheels 🎉 To get started, please see this issue: ichard26/black-mypyc-wheels#12. Please redirect any comments on this effort to that issue tracker and not here. Thank you in advance! |
We now have mypyc wheels and I don't think there's anything actionable in this issue. New issues can be created for any specific performance improvements. |
This is similar to #349 and #109. I am unfortunate enough to deal with 100+ KB Python files, on a system where black runs at only 60 KB/s. For the same reason as in #349, black is not a viable save hook, except that here the problem is single-file throughput, not startup time.
[Low throughput is common to the other Python formatters that I've tried (yapf, autopep8). Profiling immediately reveals that, while much of the time is spent in other libraries (esp. in lib2to3_parse), at least 10-20% falls within black itself. There are specific solutions to the save hook performance problem (e.g., by detecting which parts of the file have changed and only reformatting those), but they would all make the program logic much more complicated.]
Would it be feasible to stop using a single-file approach for black and progressively move stable, basic types (like Line or BracketTracker) to a compiled library which is reachable by the main code through Python's C API? This does complicate installation, but would finally provide a path to speed comparable with e.g. gofmt and clang-format.