string_processing: Performance regression #2314

JelleZijlstra · 2021-06-07T15:57:37Z

% time black --diff ~/some/file.py
All done! ✨ 🍰 ✨
1 file would be left unchanged.
black --diff ~/some/file.py  17.78s user 0.18s system 99% cpu 18.067 total
% time black --experimental-string-processing --diff ~/some/file.py
< many changes snipped >
would reformat /Users/jelle/some/file.py
All done! ✨ 🍰 ✨
1 file would be reformatted.
black --experimental-string-processing --diff   389.93s user 3.25s system 99% cpu 6:35.03 total

I can't share the file since it's private code, but it's 14000 lines of a single list display.

The text was updated successfully, but these errors were encountered:

JelleZijlstra · 2021-06-07T16:15:41Z

Assigning to myself to do profiling to figure out the cause

JelleZijlstra · 2021-06-11T01:07:25Z

Here's some profiling output on the latest release:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000 1161.996 1161.996 /home/jelle/ans/venv/lib/python3.6/site-packages/black/__init__.py:720(format_file_in_place)

        1    0.034    0.034 1161.996 1161.996 /home/jelle/ans/venv/lib/python3.6/site-packages/black/__init__.py:820(format_file_contents)

        2    0.095    0.047 1150.835  575.418 /home/jelle/ans/venv/lib/python3.6/site-packages/black/__init__.py:850(format_str)

1577774/28514    4.211    0.000 1092.089    0.038 /home/jelle/ans/venv/lib/python3.6/site-packages/black/linegen.py:296(transform_line)

105815/46    0.483    0.000 1091.390   23.726 /home/jelle/ans/venv/lib/python3.6/site-packages/black/linegen.py:947(run_transformer)

37028/123    0.998    0.000 1070.323    8.702 {method 'extend' of 'list' objects}

    88012    0.519    0.000  958.530    0.011 /home/jelle/ans/venv/lib/python3.6/site-packages/black/trans.py:126(__call__)

  2761738   48.296    0.000  942.670    0.000 /home/jelle/ans/venv/lib/python3.6/site-packages/black/lines.py:535(append_leaves)

 10635526   92.206    0.000  737.378    0.000 /home/jelle/ans/venv/lib/python3.6/site-packages/black/lines.py:48(append)

      244    0.142    0.001  624.820    2.561 /home/jelle/ans/venv/lib/python3.6/site-packages/black/trans.py:683(do_transform)

       92    0.001    0.000  321.532    3.495 /home/jelle/ans/venv/lib/python3.6/site-packages/black/trans.py:284(do_transform)

       46    3.479    0.076  311.581    6.773 /home/jelle/ans/venv/lib/python3.6/site-packages/black/trans.py:347(_merge_string_group)

  8858176   32.683    0.000  246.932    0.000 /home/jelle/ans/venv/lib/python3.6/site-packages/black/nodes.py:173(whitespace)

 10608999    9.382    0.000  193.363    0.000 /home/jelle/ans/venv/lib/python3.6/site-packages/black/lines.py:252(has_magic_trailing_comma)

 11556672   26.623    0.000  188.826    0.000 /home/jelle/ans/venv/lib/python3.6/site-packages/blib2to3/pytree.py:205(prev_sibling)

   128104  183.214    0.001  183.214    0.001 /home/jelle/ans/venv/lib/python3.6/site-packages/black/nodes.py:584(is_one_tuple_between)

  7546763   90.481    0.000  156.591    0.000 /home/jelle/ans/venv/lib/python3.6/site-packages/blib2to3/pytree.py:368(update_sibling_maps)

  8668937   22.493    0.000  134.083    0.000 /home/jelle/ans/venv/lib/python3.6/site-packages/black/nodes.py:482(replace_child)

  5376073   14.792    0.000  132.370    0.000 /home/jelle/ans/venv/lib/python3.6/site-packages/black/nodes.py:406(preceding_leaf)

 10608999   52.253    0.000   87.302    0.000 /home/jelle/ans/venv/lib/python3.6/site-packages/black/brackets.py:68(mark)

140427171   81.462    0.000   81.462    0.000 {built-in method builtins.id}

The problem is mostly that append_leaves() takes forever, and we repeatedly do that, probably once for every string in the logical line. My file is a 14000-line logical line with a lot of strings in it, so that adds up.

Perhaps this can be improved by adding more preformatted=True to append_leaves() calls, but I tried in a few places and got test failures.

ichard26 · 2021-06-11T03:43:37Z

Because I'm bored, here's a test case generator for this (don't @ me if this turns out to be garbage :p):

import random

lines = []

lines.append("a = [")

for _ in range(300):
    size = random.randint(30, 120)
    string = "".join([chr(random.randint(65, 122)) if random.random() < 0.8 else " " for _ in range(size)])
    clean = string
    for no in r'#"\{}':
        clean = clean.replace(no, "")
    comma = "," if random.random() > 0.5 else ""
    lines.append(f'    "{clean}"{comma}')

lines.append("]")

with open("out.py", "w", encoding="utf8") as f:
    f.write("\n".join(lines))

JelleZijlstra · 2022-12-23T20:15:46Z

#3467 makes this better but I'm still seeing about a 4.5x perf regression (though without mypyc) on an internal file:

$ git checkout .; rm -rf ~/.cache/black/
Updated 57 paths from the index
$ time black --preview path/to/file.py
reformatted path/to/file.py

All done! ✨ 🍰 ✨
1 file reformatted.

real    1m33.226s
user    1m32.556s
sys     0m0.468s
$ git checkout .; rm -rf ~/.cache/black/
Updated 1 path from the index
$ time black path/to/file.py
All done! ✨ 🍰 ✨
1 file left unchanged.

real    0m19.135s
user    0m18.740s
sys     0m0.176s

JelleZijlstra · 2022-12-23T20:31:32Z

With mypyc-compiled on an M1 Mac it's much better after the initial reformat:

% time python -m black path/to/file.py
All done! ✨ 🍰 ✨
1 file left unchanged.
python -m black path/to/file.py  2.05s user 0.05s system 95% cpu 2.190 total
% time python -m black --preview path/to/file.py
reformatted path/to/file.py

All done! ✨ 🍰 ✨
1 file reformatted.
python -m black --preview path/to/file.py  10.23s user 0.15s system 98% cpu 10.489 total
% rm -rf ~/Library/Caches/black                                               
% time python -m black --preview path/to/file.py
All done! ✨ 🍰 ✨
1 file left unchanged.
python -m black --preview path/to/file.py  3.09s user 0.07s system 95% cpu 3.297 total
% rm -rf ~/Library/Caches/black                                               
% time python -m black --preview path/to/file.py
All done! ✨ 🍰 ✨
1 file left unchanged.
python -m black --preview path/to/file.py  2.99s user 0.07s system 95% cpu 3.206 total
% python -m black --version
python -m black, 22.12.1.dev24+g3feff21 (compiled: yes)
Python (CPython) 3.11.1

So from 2 s to 3 s, still a significant slowdown but not as concerning.

JelleZijlstra added C: performance Black is too slow. Or too fast. F: strings Related to our handling of strings labels Jun 7, 2021

JelleZijlstra self-assigned this Jun 7, 2021

JelleZijlstra mentioned this issue Jun 9, 2021

Support named escapes (\N{...}) in string processing #2319

Merged

JelleZijlstra removed their assignment Jun 18, 2021

JelleZijlstra mentioned this issue Dec 9, 2022

Setting the 2023 stable style #3407

Closed

JelleZijlstra added the C: preview style Issues with the preview and unstable style. Add the name of the responsible feature in the title. label Dec 20, 2022

yilei mentioned this issue Dec 20, 2022

Significantly speedup ESP on large expressions that contain many strings #3467

Merged

3 tasks

hauntsaninja mentioned this issue Nov 13, 2023

Setting the 2024 stable style #4042

Closed

JelleZijlstra changed the title ~~Performance regression with experimental string processing~~ string_processing: Performance regression Jan 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

string_processing: Performance regression #2314

string_processing: Performance regression #2314

JelleZijlstra commented Jun 7, 2021

JelleZijlstra commented Jun 7, 2021

JelleZijlstra commented Jun 11, 2021

ichard26 commented Jun 11, 2021 •

edited

Loading

JelleZijlstra commented Dec 23, 2022

JelleZijlstra commented Dec 23, 2022

string_processing: Performance regression #2314

string_processing: Performance regression #2314

Comments

JelleZijlstra commented Jun 7, 2021

JelleZijlstra commented Jun 7, 2021

JelleZijlstra commented Jun 11, 2021

ichard26 commented Jun 11, 2021 • edited Loading

JelleZijlstra commented Dec 23, 2022

JelleZijlstra commented Dec 23, 2022

ichard26 commented Jun 11, 2021 •

edited

Loading