Performance optimization discussion #366

Closed
frsfnrrg opened this issue Jun 21, 2018 · 12 comments
Labels: C: performance (Black is too slow. Or too fast.) · S: accepted (The changes in this design / enhancement issue have been accepted and can be implemented) · T: enhancement (New feature or request)

Comments

@frsfnrrg

This is similar to #349 and #109. I am unfortunate enough to deal with 100+ KB Python files, on
a system where black runs at only 60 KB/s. For the same reason as in #349, black is not viable as a save hook, except that here the problem is single-file throughput, not startup time.

[Low throughput is common to the other Python formatters that I've tried (yapf, autopep8). Profiling immediately reveals that, while much of the time is spent in other libraries (especially in lib2to3_parse), at least 10-20% falls within black itself. There are specific solutions to the save-hook performance problem (e.g., detecting which parts of the file have changed and only reformatting those), but they would all make the program logic much more complicated.]
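
For reference, a minimal profiling sketch along these lines is what surfaces the lib2to3_parse hotspot. It assumes the black.format_str(src, mode=black.Mode()) entry point; older versions take a line_length argument instead of a Mode object, and the file path below is a placeholder:

import cProfile
import pstats
from pathlib import Path

import black

# Read a large source file (placeholder path) and profile a single reformatting pass.
src = Path("big_module.py").read_text(encoding="utf8")

profiler = cProfile.Profile()
profiler.enable()
black.format_str(src, mode=black.Mode())  # signature is version-dependent
profiler.disable()

# Show the 20 most expensive calls by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)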

Would it be feasible to stop using a single-file approach for black and progressively move stable, basic types (like Line or BracketTracker) to a compiled library which is reachable by the main code through Python's C API? This does complicate installation, but would finally provide a path to speed comparable with e.g. gofmt and clang-format.

@ambv
Collaborator

ambv commented Jun 21, 2018

Raw Python C API is not a silver bullet. We'd have to rewrite parts of the program in straight C and since most of it operates on blib2to3.pytree, we'd need that in C, too.

My not-so-secret plan is to cythonize blib2to3 and go from there. But that's something I'll get to after Black becomes more stable and most issues currently open are solved.

In the meantime, Black does safety checks that other formatters don't. If you haven't encountered any problems, use --fast, which is over 2x faster than --safe.
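
For illustration, here is roughly what --fast skips when Black is called programmatically. This is a sketch using format_file_contents (the same API used in the benchmark script further down); the exact keyword names vary between versions:

import black

# Mode construction varies by version; older releases took a line_length argument instead.
mode = black.Mode()
src = "x = {  'a':37,'b':42}\n"

# --safe (fast=False) re-parses the output and asserts AST equivalence and stability;
# --fast (fast=True) skips those checks, which is where the ~2x speedup comes from.
try:
    formatted = black.format_file_contents(src, fast=True, mode=mode)
except black.NothingChanged:
    formatted = src
print(formatted)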

@ambv ambv changed the title [Feature Request] Consider using compiled code for common functions Performance optimization discussion Jun 21, 2018
@ambv ambv added labels "T: user support (OP looking for assistance or answers to a question)" and "T: style (What do we want Blackened code to look like?)" Jun 21, 2018
@boxed

boxed commented Oct 30, 2018

I ran black on a big file at work and was confused as to why it's so slow when the file needs many changes and so fast when there are no changes. Naively, one would think you load the full syntax tree, run some transformers, dump it, and, if the new string differs from the old, write it out. If that were the case, it should take roughly the same time with many changes as with none... So where is my naive thinking incorrect?

@JelleZijlstra
Collaborator

There is a cache. If the file hasn't changed since the last time we ran Black, we don't do any processing.
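
For the curious, the idea is roughly the sketch below: skip files whose size and modification time haven't changed since the last successful run. The names and cache location here are illustrative, not Black's actual internals.

import pickle
from pathlib import Path

CACHE_FILE = Path(".black_cache.pickle")  # placeholder; Black keeps its real cache elsewhere

def load_cache() -> dict:
    # Maps absolute file path -> (st_mtime, st_size) recorded after the last successful format.
    if CACHE_FILE.exists():
        with open(CACHE_FILE, "rb") as f:
            return pickle.load(f)
    return {}

def is_unchanged(path: Path, cache: dict) -> bool:
    st = path.stat()
    return cache.get(str(path.resolve())) == (st.st_mtime, st.st_size)

def update_cache(path: Path, cache: dict) -> None:
    st = path.stat()
    cache[str(path.resolve())] = (st.st_mtime, st.st_size)
    with open(CACHE_FILE, "wb") as f:
        pickle.dump(cache, f)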

@ambv
Collaborator

ambv commented May 7, 2019

It would be amazing to compile Black with @mypyc/mypyc. If I understand correctly, @msullivan is interested in making it happen. FTR, if this requires changes to the Black codebase, I'm open to that as long as they are not sweeping or overly disruptive.

@msullivan
Contributor

I can start sending up PRs to support mypyc in the next week, building on work done by my intern from last summer, @SanjitKal.

The most disruptive part of doing this is that blib2to3 needs to have its type annotations merged into it so it can be typechecked and compiled. If we are doing that, do we want to blacken the blib2to3 source?
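
To illustrate what "merging the type annotations in" means (the function and alias below are made up for illustration, not actual blib2to3 code): instead of keeping types in separate .pyi stub files, they are written inline so mypy can typecheck the module and mypyc can compile it.

# Before: untyped source; types live only in a separate .pyi stub.
def parse_tokens(tokens):
    ...

# After: annotations merged inline into the .py file itself.
from typing import Iterable, Tuple

TokenInfo = Tuple[int, str, Tuple[int, int], Tuple[int, int], str]

def parse_tokens(tokens: Iterable[TokenInfo]) -> None:
    ...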

@msullivan
Contributor

I have a draft branch that makes black able to compile and run with mypyc: https://github.com/msullivan/black/tree/mypyc

(It requires python/mypy#7481 to land pickling support in mypyc before it works right)
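
For anyone following along, compiling modules with mypyc boils down to something like the setup.py sketch below; the module list and package name are illustrative, and the real build configuration is more involved.

# setup.py (sketch)
from setuptools import setup
from mypyc.build import mypycify

setup(
    name="black-compiled-sketch",
    # mypycify typechecks the listed modules and generates C extensions for them;
    # anything not listed keeps running as ordinary interpreted Python.
    ext_modules=mypycify(
        [
            "src/black/__init__.py",
            "src/blib2to3/pytree.py",
        ]
    ),
)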

@JelleZijlstra
Collaborator

Thanks so much for working on this @msullivan.

I say we should blacken blib2to3 too if we are going to add annotations to it anyway. @zsol @ambv do you agree?

@ichard26 ichard26 added labels "T: enhancement (New feature or request)" and "C: performance (Black is too slow. Or too fast.)" and removed label "T: user support (OP looking for assistance or answers to a question)" Apr 28, 2021
@JelleZijlstra JelleZijlstra removed the "T: style (What do we want Blackened code to look like?)" label May 30, 2021
@ichard26 ichard26 added the "S: accepted (The changes in this design / enhancement issue have been accepted and can be implemented)" label Jun 4, 2021
@ichard26
Collaborator

ichard26 commented Jun 5, 2021

Hi everyone,

I have been working on finishing mypyc support since the original work done in 2019. While I'm not confident enough to submit a PR yet (there are probably more edge cases to fix + even more optimizations to be done), I have been able to get a fully compiled Black working, i.e. everything under src/black/ and src/blib2to3 except for src/black/__main__.py (EDIT: I realized that I actually forgot to mark a few files in src/blib2to3/pgen2/ as mypyc targets 🤦). Below are the formatting-time results of Black compiled with clang at optimization level 3 vs. normal interpreted Black (both built from a branch with compatibility changes based off main), run with --experimental-string-processing.

+--------------------------------+-------------+-----------------------+
| Benchmark                      | interpreted | compiled              |
+================================+=============+=======================+
| src/black/brackets.py          | 264 ms      | 125 ms: 2.11x faster  |
+--------------------------------+-------------+-----------------------+
| src/black/cache.py             | 54.2 ms     | 29.3 ms: 1.85x faster |
+--------------------------------+-------------+-----------------------+
| src/black/comments.py          | 180 ms      | 98.9 ms: 1.82x faster |
+--------------------------------+-------------+-----------------------+
| src/black/concurrency.py       | 28.4 ms     | 16.7 ms: 1.71x faster |
+--------------------------------+-------------+-----------------------+
| src/black/const.py             | 4.59 ms     | 2.34 ms: 1.96x faster |
+--------------------------------+-------------+-----------------------+
| src/black/debug.py             | 41.5 ms     | 22.9 ms: 1.81x faster |
+--------------------------------+-------------+-----------------------+
| src/black/files.py             | 144 ms      | 84.5 ms: 1.70x faster |
+--------------------------------+-------------+-----------------------+
| src/black/linegen.py           | 904 ms      | 417 ms: 2.17x faster  |
+--------------------------------+-------------+-----------------------+
| src/black/lines.py             | 526 ms      | 275 ms: 1.91x faster  |
+--------------------------------+-------------+-----------------------+
| src/black/mode.py              | 96.9 ms     | 44.1 ms: 2.20x faster |
+--------------------------------+-------------+-----------------------+
| src/black/nodes.py             | 665 ms      | 312 ms: 2.13x faster  |
+--------------------------------+-------------+-----------------------+
| src/black/numerics.py          | 44.3 ms     | 25.3 ms: 1.75x faster |
+--------------------------------+-------------+-----------------------+
| src/black/output.py            | 75.1 ms     | 45.0 ms: 1.67x faster |
+--------------------------------+-------------+-----------------------+
| src/black/parsing.py           | 147 ms      | 85.0 ms: 1.72x faster |
+--------------------------------+-------------+-----------------------+
| src/black/report.py            | 66.9 ms     | 40.2 ms: 1.66x faster |
+--------------------------------+-------------+-----------------------+
| src/black/rusty.py             | 15.7 ms     | 9.43 ms: 1.66x faster |
+--------------------------------+-------------+-----------------------+
| src/black/strings.py           | 138 ms      | 85.9 ms: 1.61x faster |
+--------------------------------+-------------+-----------------------+
| src/black/trans.py             | 1.05 sec    | 580 ms: 1.80x faster  |
+--------------------------------+-------------+-----------------------+
| src/black/__init__.py          | 810 ms      | 461 ms: 1.76x faster  |
+--------------------------------+-------------+-----------------------+
| src/blackd/__init__.py         | 184 ms      | 98.2 ms: 1.88x faster |
+--------------------------------+-------------+-----------------------+
| src/black_primer/cli.py        | 104 ms      | 52.7 ms: 1.98x faster |
+--------------------------------+-------------+-----------------------+
| src/black_primer/lib.py        | 269 ms      | 157 ms: 1.71x faster  |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pgen2/conv.py     | 224 ms      | 126 ms: 1.78x faster  |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pgen2/driver.py   | 213 ms      | 111 ms: 1.92x faster  |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pgen2/grammar.py  | 93.7 ms     | 52.6 ms: 1.78x faster |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pgen2/literals.py | 55.5 ms     | 32.8 ms: 1.69x faster |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pgen2/parse.py    | 141 ms      | 82.8 ms: 1.70x faster |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pgen2/pgen.py     | 391 ms      | 221 ms: 1.77x faster  |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pgen2/token.py    | 71.6 ms     | 41.2 ms: 1.74x faster |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pgen2/tokenize.py | 468 ms      | 266 ms: 1.76x faster  |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pygram.py         | 106 ms      | 54.7 ms: 1.93x faster |
+--------------------------------+-------------+-----------------------+
| src/blib2to3/pytree.py         | 630 ms      | 350 ms: 1.80x faster  |
+--------------------------------+-------------+-----------------------+
| Geometric mean                 | (ref)       | 1.82x faster          |
+--------------------------------+-------------+-----------------------+

Note that these benchmark results aren't perfect: I haven't done the usual system tweaks to improve their reliability (I plan to produce more reliable results once I am ready to submit a PR). Also, all of the files were already well formatted, so no safety checks were run (I expect the speedup to shrink somewhat once those are included). But for some credibility, I did use pyperf in the following script:

benchmark.py (note: very messy!)
import os
import subprocess
import sys
import time
from functools import partial
from pathlib import Path

import pyperf

import black

THIS_DIR = Path(__file__).parent
_TARGETS = [
    "src/black/brackets.py",
    "src/black/cache.py",
    "src/black/comments.py",
    "src/black/concurrency.py",
    "src/black/const.py",
    "src/black/debug.py",
    "src/black/files.py",
    "src/black/linegen.py",
    "src/black/lines.py",
    "src/black/mode.py",
    "src/black/nodes.py",
    "src/black/numerics.py",
    "src/black/output.py",
    "src/black/parsing.py",
    "src/black/report.py",
    "src/black/rusty.py",
    "src/black/strings.py",
    "src/black/trans.py",
    "src/black/__init__.py",
    "src/blackd/__init__.py",
    "src/black_primer/cli.py",
    "src/black_primer/lib.py",
    "src/blib2to3/pgen2/conv.py",
    "src/blib2to3/pgen2/driver.py",
    "src/blib2to3/pgen2/grammar.py",
    "src/blib2to3/pgen2/literals.py",
    "src/blib2to3/pgen2/parse.py",
    "src/blib2to3/pgen2/pgen.py",
    "src/blib2to3/pgen2/token.py",
    "src/blib2to3/pgen2/tokenize.py",
    "src/blib2to3/pygram.py",
    "src/blib2to3/pytree.py",
]
TARGETS = [THIS_DIR / Path(path) for path in _TARGETS]
if dump_path := os.getenv("BM_DUMP_TO", False):
    dump_path = Path(dump_path)
    print(f"Will dump results to `{dump_path!s}`.")
else:
    print("Need a filepath to dump results to (use `BM_DUMP_TO` env var).")
    sys.exit(1)

SCRIPT = """\
from pathlib import Path
from functools import partial

import pyperf

import black

TARGET = Path("{target}")

runner = pyperf.Runner()
with open(TARGET, "r", encoding="utf8") as f:
    code = f.read()


def format_func(code):
    try:
        black.format_file_contents(code, fast={fast}, mode={mode})
    except black.NothingChanged:
        pass


runner.bench_func("{name}", format_func, code)
"""

results = []
tmp_dir = THIS_DIR / "benchmark-workdir"
tmp_dir.mkdir(exist_ok=True)  # the generated scripts and JSON results are written here
for i, t in enumerate(TARGETS, start=1):
    print(f"Benchmarking `{t!s}` ({i}/{len(TARGETS)})")
    # Each target gets its own pyperf script run in a fresh subprocess so the
    # benchmarks don't interfere with each other.
    bench_script = SCRIPT.format(
        target=str(t.resolve()),
        name=str(t),
        fast="False",
        mode="black.Mode(experimental_string_processing=True)",
    )
    script = tmp_dir / f"{i}.py"
    with open(script, "w", encoding="utf8") as f:
        f.write(bench_script)
    tmp_file = tmp_dir / f"{i}.json"
    t0 = time.perf_counter()
    subprocess.run([sys.executable, str(script), "--output", str(tmp_file)], check=True)
    t1 = time.perf_counter()
    print(f"Took {round(t1 - t0, 3)} seconds.")
    with open(tmp_file, "r", encoding="utf8") as f:
        result = pyperf.Benchmark.loads(f.read())
    results.append(result)

suite = pyperf.BenchmarkSuite(results)
suite.dump(str(dump_path), replace=True)
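
To turn two such dumps into a comparison table like the one above, something along these lines works with pyperf's Python API (pyperf's compare_to command does the same thing more thoroughly); the two file names are whatever was passed via BM_DUMP_TO for each build:

# compare.py (sketch): print per-benchmark speedups between two BenchmarkSuite dumps.
import sys

import pyperf

interpreted = pyperf.BenchmarkSuite.load(sys.argv[1])
compiled = pyperf.BenchmarkSuite.load(sys.argv[2])

for bench in interpreted.get_benchmarks():
    name = bench.get_name()
    base = bench.mean()
    new = compiled.get_benchmark(name).mean()
    print(f"{name}: {base * 1000:.1f} ms -> {new * 1000:.1f} ms ({base / new:.2f}x faster)")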

@ichard26 ichard26 self-assigned this Jun 6, 2021
@ichard26
Collaborator

ichard26 commented Jul 10, 2021

Update:

After some more compatibility work, here are the new numbers (this time gathered under a tuned, stable system):

  • Formatting with safety checks: 1.74x faster
  • Formatting without safety checks: 1.90x faster
  • Importing black: 1.14x faster (this one is a very rough figure)
  • blib2to3 parsing: 1.72x faster

(Note that these numbers are from a wheel built locally; I have no idea yet what the numbers for the cibuildwheel-built wheels look like.)

TO-DOs:

  • More performance tuning (duh!)
  • More performance testing - unfortunately, it appears the wheels built via the automated workflow I've set up are slower (probably the price of building highly compatible wheels), but I don't know how bad it is right now. Also, I've done zero performance evaluation on Windows or macOS, so yeah ... lots of unknowns! (This work is blocked on me continuing work on blackbench, a benchmarking suite for Black.)
  • Extended stability and compatibility testing (especially on Windows and macOS) - while cibuildwheel makes it stupidly easy to run tests while building wheels, I've only configured the test suite to run (as a compromise between compatibility checking and build speed). Running mypyc-compiled Black with black-primer is a particular priority.
  • Other UX and tooling items

Finally, since this project ended up more involved than I had originally expected, I'm now tracking and managing my work in a separate repository (it originally contained just the automated wheel build workflow): https://github.com/ichard26/black-mypyc-wheels. Head over there to see the status of this work, item by item.

Thanks for the patience!

@ichard26
Collaborator

ichard26 commented Aug 10, 2021

Mise à jour deux! (Update number two!)

So ... yeah, no, I am not writing the rest of this in French. Anyway, there's been some more progress on this.

  • I've finished three rounds of optimizations for both src/blib2to3 and src/black. Formatting without safety checks is now ~2.14x faster (that's 13% faster than my previous mypyc numbers!). Very exciting, and to say the least, I'm incredibly happy!
  • To make sure interpreted Black and mypyc-compiled Black behave the same, I wrote diff-shades. TBH it ended up being my take on a better black-primer (or mypy-primer), so Primer: Rethink expected changes logic + configuration #2194 (comment) might end up resolved as part of this effort :p
  • I've integrated the latest changes from main, in particular the Jupyter notebook support commit, and fixed the new crashes (mypyc is still alpha, sadly).
  • And some other stuff, including improvements to blackbench, better system tuning for benchmarks, and collecting and logging issues to later report to the mypyc issue tracker.

TO-DOs:

  • I was finally able to narrow down the code that was causing GCC on Linux to get very upset with mypyc's C code, which is great! ... but then I needed to compare GCC vs Clang for performance, binary size, and other infra considerations. gcc-10 failed to beat clang on almost every factor ... I'm sticking with clang 🙂
  • Get back to work on blackbench and finish up an alpha in preparation for the final set of benchmark numbers
  • Benchmark a lot ... like truly a ton, and collect final numbers. Also, run diff-shades and make sure behaviour is the same across situations.
  • While that's happening, pester Jelle or Cooper so they can do some testing on macOS (and especially on M1)
  • Put up the PR from mypyc-support-pt2

TL;DR: Black is now even faster, there's new tooling available, and I'm closer to (finally) putting up my PR. After that, it's time for community testing :D

@ichard26
Collaborator

Alright, I'm kicking off community testing for the mypyc wheels 🎉 To get started, please see this issue: ichard26/black-mypyc-wheels#12. Please direct any comments on this effort to that issue tracker, not here. Thank you in advance!

@ichard26 ichard26 added this to the Release 22.0 milestone Dec 12, 2021
@ichard26 ichard26 added this to To Do (important) in Stable release via automation Jan 1, 2022
@ichard26 ichard26 removed this from the Release 22.0 milestone Jan 1, 2022
@JelleZijlstra JelleZijlstra moved this from To Do (important) to To Do (nice to have) in Stable release Jan 10, 2022
@ichard26 ichard26 removed their assignment Aug 29, 2022
@JelleZijlstra
Collaborator

We now have mypyc wheels and I don't think there's anything actionable in this issue. New issues can be created for any specific performance improvements.
