Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys.settrace dramatic slowdown in 3.12 #107674

Open
2 tasks done
nedbat opened this issue Aug 5, 2023 · 21 comments
Open
2 tasks done

sys.settrace dramatic slowdown in 3.12 #107674

nedbat opened this issue Aug 5, 2023 · 21 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage type-bug An unexpected behavior, bug, or error

Comments

@nedbat
Copy link
Member

nedbat commented Aug 5, 2023

Bug report

Checklist

  • I am confident this is a bug in CPython, not a bug in a third-party project
  • I have searched the CPython issue tracker, and am confident this bug has not been reported before

A clear and concise description of the bug

A bug report in coverage (nedbat/coveragepy#1665) is reduced here to a pure do-nothing trace function for sys.settrace.

Reproduction:

% git clone https://github.com/nedbat/ndindex

% cd ndindex

% python3.11 -VV
Python 3.11.4 (main, Jun  7 2023, 08:42:37) [Clang 14.0.3 (clang-1403.0.22.14.1)]

% python3.11 -m venv v311

% ./v311/bin/pip install numpy

% python3.12 -VV
Python 3.12.0b4 (main, Jul 11 2023, 20:38:26) [Clang 14.0.3 (clang-1403.0.22.14.1)]

% python3.12 -m venv v312

% ./v312/bin/pip install --pre --extra-index https://pypi.anaconda.org/scientific-python-nightly-wheels/simple numpy

% # Run the code without the trace function.
% for x in 1 2 3; time ./v311/bin/python -m ndindex.tests.justdoit notrace
./v311/bin/python -m ndindex.tests.justdoit notrace  7.05s user 0.37s system 108% cpu 6.813 total
./v311/bin/python -m ndindex.tests.justdoit notrace  7.04s user 0.34s system 109% cpu 6.724 total
./v311/bin/python -m ndindex.tests.justdoit notrace  7.00s user 0.35s system 109% cpu 6.696 total

% # 3.12 is slightly faster without the trace function.
% for x in 1 2 3; time ./v312/bin/python -m ndindex.tests.justdoit notrace
./v312/bin/python -m ndindex.tests.justdoit notrace  6.56s user 0.30s system 106% cpu 6.422 total
./v312/bin/python -m ndindex.tests.justdoit notrace  6.48s user 0.31s system 106% cpu 6.359 total
./v312/bin/python -m ndindex.tests.justdoit notrace  6.38s user 0.28s system 107% cpu 6.217 total

% # 3.11 with tracing is about 3x slower.
% for x in 1 2 3; time ./v311/bin/python -m ndindex.tests.justdoit trace
./v311/bin/python -m ndindex.tests.justdoit trace  22.64s user 0.51s system 101% cpu 22.772 total
./v311/bin/python -m ndindex.tests.justdoit trace  21.92s user 0.46s system 101% cpu 21.979 total
./v311/bin/python -m ndindex.tests.justdoit trace  21.55s user 0.35s system 102% cpu 21.379 total

% # 3.12 with tracing is 7x slower.
% for x in 1 2 3; time ./v312/bin/python -m ndindex.tests.justdoit trace
./v312/bin/python -m ndindex.tests.justdoit trace  49.47s user 0.40s system 100% cpu 49.676 total
./v312/bin/python -m ndindex.tests.justdoit trace  49.53s user 0.39s system 100% cpu 49.784 total
./v312/bin/python -m ndindex.tests.justdoit trace  50.44s user 0.38s system 100% cpu 50.739 total

I don't know if the unusual numpy build has something to do with this.

Linked PRs

@nedbat nedbat added the type-bug An unexpected behavior, bug, or error label Aug 5, 2023
@neonene
Copy link
Contributor

neonene commented Aug 6, 2023

This issue can be seen starting from the commit 411b169 with usual numpy-1.25.2.

@AlexWaygood
Copy link
Member

cc. @markshannon

@nedbat
Copy link
Member Author

nedbat commented Oct 24, 2023

@markshannon Is there any news about this?

@nedbat
Copy link
Member Author

nedbat commented Oct 30, 2023

@iritkatriel perhaps you have some idea?

@iritkatriel
Copy link
Member

@iritkatriel perhaps you have some idea?

Did #107780 help?

@gaogaotiantian
Copy link
Member

Hi @nedbat , I circled back to this issue. If you are still interested, could you try the patch of #114986 and see if the result is improved?

@nedbat
Copy link
Member Author

nedbat commented Feb 15, 2024

@gaogaotiantian Thanks. I tried #114986, and it didn't run faster. I used the https://github.com/Quansight-Labs/ndindex repo test suite:
3.11: 160s
3.12: 255s
3.13 main: 248s
pr114986: 245s

I wish I had better news.

@gaogaotiantian
Copy link
Member

Oh, did you test it with an empty trace function?

@nedbat
Copy link
Member Author

nedbat commented Feb 15, 2024

I will do more tests, but the ultimate goal is to perform better in real-world scenarios. Here is more data from a subset of the test suite (-k test_chunking), which looks much more promising. It's hard to get consistent numbers, but this looks good. I would merge this change.

Python version no tracing Python trace C trace
3.11.8 1.19s 7.39s 2.46s
3.12.2 1.12s 11.57s 3.71s
3.13.0a3+ heads/main:a95b1a56bb 1.14s 10.37s 3.89s
3.13.0a3+ heads/pr/114986:cb09c55758 1.14s 6.48s 2.91s

@gaogaotiantian
Copy link
Member

Thank you for the test. Yes the PR only improves the performance of the actual trace function - which is normally the major cost for tracing tools. Good to know it helps with the more real-life benchmark. Unfortunately, due to the change of the tracing mechanism, the overhead for sys.settrace will always be larger than the old way - I did not see a quick solution to that.

@markshannon
Copy link
Member

Now that the eval_breaker is part of the thread state, here's an alternative approach that might work.

tstate->tracing only needs one bit, so use the low bit in the version part of the eval breaker.
For non-tracing functions we expect to always see the low bit set to 0.
For tracing functions we expect to always see the low bit set to 1.

If the code and global versions differ do the following:

  • If tracing bit is 0, update the version number (setting the low bit to 0), and instrument.
  • If the tracing bit is 1:
    • If the version number matches except the low bit: do nothing
    • Otherwise: update the version number (setting the low bit to 1), and clear any instrumentation.

I think this is correct, but I've not drawn up a full state diagram.

@encukou
Copy link
Member

encukou commented Feb 29, 2024

#114986 caused refleaks tests (./python -m test -R 3:3 test_frame) to fail.
If not fixed in 24 hours, I'll revert the commit to unblock buildbots. (I hope I'll get have time to investigate & come up with a proper fix, but can't promise that.)

@gaogaotiantian
Copy link
Member

@encukou See response in #116098

gaogaotiantian added a commit to gaogaotiantian/cpython that referenced this issue Mar 1, 2024
encukou pushed a commit that referenced this issue Mar 1, 2024
…H-114986)" (GH-116178)

Revert "gh-107674: Improve performance of `sys.settrace` (GH-114986)"

This reverts commit 0a61e23.
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024
….settrace` (pythonGH-114986)" (pythonGH-116178)

Revert "pythongh-107674: Improve performance of `sys.settrace` (pythonGH-114986)"

This reverts commit 0a61e23.
gaogaotiantian added a commit to gaogaotiantian/cpython that referenced this issue Mar 21, 2024
adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024
adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024
….settrace` (pythonGH-114986)" (pythonGH-116178)

Revert "pythongh-107674: Improve performance of `sys.settrace` (pythonGH-114986)"

This reverts commit 0a61e23.
@rtb-zla-karma
Copy link

rtb-zla-karma commented Apr 3, 2024

Hi.

I wanted to post this in nedbat/coveragepy#1665 but it seems that investigation moved here. I hope my code example will help to investigate and not clutter the Issue.

I also have been chasing "tests run 2 times slower" dragon. I used coverage through pytest and beside some test cases running slower in general, I've had a consistent +43 seconds "freeze" in collecting phase for specific import. I will paste some log excerpts and then minimal code to reproduce some of them. Look at the log times in brackets to find problematic steps.

pytest_debug.txt - this comes from example code

[00:00:00.265179]       found cached rewritten pyc for /path/test_jieba_2/test/test_fut.py [assertion]
[00:00:00.265185]       early skip of rewriting module: fut [assertion]
[00:00:00.265190]       early skip of rewriting module: jieba [assertion]
[00:00:00.265195]       early skip of rewriting module: jieba.finalseg [assertion]
[00:00:00.265200]       early skip of rewriting module: jieba._compat [assertion]
[00:00:44.966856] ============================= test session starts ==============================
[00:00:44.966888] platform linux -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0 -- /venv/path/bin/python
[00:00:44.966893] using: pytest-8.1.1
[00:00:44.966896] setuptools registered plugins:
[00:00:44.966899]   pytest-cov-4.1.0 at /venv/path/lib/python3.12/site-packages/pytest_cov/plugin.py
[00:00:44.966902] rootdir: /path/test_jieba_2
[00:00:44.966905] plugins: cov-4.1.0
[00:00:44.966908] collected 1 item
[00:00:44.966910] 
[00:00:44.978316] test/test_fut.py .                                                       [100%]      early skip of rewriting module: pkg_resources [assertion]
[00:00:44.978354]       early skip of rewriting module: jieba.finalseg.prob_start [assertion]
[00:00:44.978360]       early skip of rewriting module: jieba.finalseg.prob_trans [assertion]
[00:00:44.978363]       early skip of rewriting module: jieba.finalseg.prob_emit [assertion]
[00:00:44.978366]         pytest_pycollect_makeitem [hook]
[00:00:44.978369]             collector: <Module test_fut.py>
[00:00:44.978372]             name: @py_builtins
[00:00:44.978375]             obj: <module 'builtins' (built-in)>
[00:00:44.978378]         finish pytest_pycollect_makeitem --> None [hook]

early skip of rewriting module: jieba... lines overlap with test session starts in a weird way in this minimal example but in full project I'm working on it froze at early skip of rewriting module: jieba.finalseg.prob_emit [assertion] consistently.

This one is from the project. Let's call it python_debug.txt (command to make it is at the end). I'm pasting that because in minimal example it freezes in different place but actually jieba import is the culprit, specificly jieba_pyfast.finalseg.prob_emit module.

[00:00:09.073173] # possible namespace for /venv/path/lib/python3.12/site-packages/google
[00:00:09.073279] # possible namespace for /venv/path/lib/python3.12/site-packages/google
[00:00:09.099473] import 'pkg_resources' # <_frozen_importlib_external.SourceFileLoader object at 0x7fa3cf936d20>
[00:00:09.099509] import 'jieba_pyfast._compat' # <_frozen_importlib_external.SourceFileLoader object at 0x7fa3cf936960>
[00:00:09.100432] # /venv/path/lib/python3.12/site-packages/jieba_pyfast/finalseg/__pycache__/prob_emit.cpython-312.pyc matches /venv/path/lib/python3.12/site-packages/jieba_pyfast/finalseg/prob_emit.py
[00:00:09.103991] # code object from '/venv/path/lib/python3.12/site-packages/jieba_pyfast/finalseg/__pycache__/prob_emit.cpython-312.pyc'
[00:00:58.043337] import 'jieba_pyfast.finalseg.prob_emit' # <_frozen_importlib_external.SourceFileLoader object at 0x7fa3cf883170>
[00:00:58.043791] # /venv/path/lib/python3.12/site-packages/jieba_pyfast/finalseg/__pycache__/prob_start.cpython-312.pyc matches /venv/path/lib/python3.12/site-packages/jieba_pyfast/finalseg/prob_start.py

Files and commands to reproduce

tree .

.
├── fut.py
├── requirements.txt
└── test
    ├── __init__.py
    └── test_fut.py
# fut.py
import jieba

def a():
    pass
#
# This file is autogenerated by pip-compile with Python 3.12
# by the following command:
#
#    pip-compile --allow-unsafe --output-file=requirements.txt requirements.in
#
coverage[toml]==7.4.1
    # via pytest-cov
iniconfig==2.0.0
    # via pytest
jieba==0.42.1
    # via -r requirements.in
packaging==23.2
    # via pytest
pluggy==1.4.0
    # via pytest
pytest==8.1.1
    # via
    #   -r requirements.in
    #   pytest-cov
pytest-cov==4.1.0
    # via -r requirements.in
# test/test_fut.py
import fut

def test_stuff():
    assert True

System info:

  • Ubuntu 20.04.2 LTS
  • Python 3.12.2 (main, Mar 29 2024, 11:53:27) [GCC 9.4.0] on linux

When run trough pytest tests take less than a second. With pytest -cov=. they take ~ 44 seconds. Removing import jieba makes tests run fast again.
To get output lines above I run (ts is from moreutils Ubuntu package):

  • pytest --cov=. --debug=/dev/stdout | ts -s '[%H:%M:%.S]' | tee pytest_debug.txt
  • python -v -m pytest --cov=. 2>&1 | ts -s '[%H:%M:%.S]' | tee python_debug.txt

@gaogaotiantian
Copy link
Member

Ah, this is interesting. The problem is that the file has a long const dict that expands into 30k+ lines. Each line will execute to produce part of the dict, and trigger a callback into the trace function, which will slow down the program significantly.

This is not a bug, and almost expected - it's just the unfortunate code that makes it weird.

diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
….settrace` (pythonGH-114986)" (pythonGH-116178)

Revert "pythongh-107674: Improve performance of `sys.settrace` (pythonGH-114986)"

This reverts commit 0a61e23.
@matthew-mcallister
Copy link

matthew-mcallister commented Apr 18, 2024

I believe this performance regression is causing debugging to become practically unusable in 3.12+ when using certain dependencies. This repro takes two and a half minutes to run on a Macbook pro, due to the geocoder module importing many large dictionaries.

# Run with python -m pdb repro.py
# Type n to execute the import statement with an active breakpoint
from phonenumbers import geocoder, parse
print(geocoder.description_for_number(parse('+13215555555'), 'en'))

Perhaps the library can be updated to use the fast C-based JSON scanner for performance, but that's a significant change to make to work around a language performance regression.

@gaogaotiantian
Copy link
Member

Just to confirm, what command did you use after python -m pdb repro.py? pdb should be brought up before importing and if you use c it should not add any overhead.

@matthew-mcallister
Copy link

Good point. I typed n to step one line, which sets a breakpoint.

@gaogaotiantian
Copy link
Member

Yes, n will trigger this. That's a known issue in pdb (step and breakpoint is costly) and some changes in 3.12 made it worse, especially for python file with a huge chunk of code in a single namespace (it's fine if it spreads in multiple functions). I am investigating any solution that can make this better, but I don't think we can backport any optimizations in 3.12.

@matthew-mcallister
Copy link

Performance improvements in any future version would be great. That said, it is conceivable to me that generating very large Python files will become a library anti-pattern due to this.

@gaogaotiantian
Copy link
Member

I came up with a patch to fix the long dict issue. I'm not sure if you are interested in trying the patch in #118127. If you are, let me know the result.

markshannon pushed a commit that referenced this issue May 3, 2024
* Check tracing in RESUME_CHECK

* Only change to RESUME_CHECK if not tracing
joostvanzwieten added a commit to evalf/nutils that referenced this issue May 7, 2024
Use PEP669's sys.monitoring for faster coverage analysis on GitHub Actions, if
available. This also fixes a significant regression in coverage analysis on
Python 3.12. Related issue: python/cpython#107674 .
SonicField pushed a commit to SonicField/cpython that referenced this issue May 8, 2024
* Check tracing in RESUME_CHECK

* Only change to RESUME_CHECK if not tracing
joostvanzwieten added a commit to evalf/nutils that referenced this issue May 13, 2024
Use PEP669's sys.monitoring for faster coverage analysis on GitHub Actions, if
available. This also fixes a significant regression in coverage analysis on
Python 3.12. Related issue: python/cpython#107674 .
joostvanzwieten added a commit to evalf/nutils that referenced this issue May 14, 2024
Python 3.12 has a significant regression in coverage analysis. Related issue:
python/cpython#107674 . This patch replaces coverage
with a custom framework built using `sys.monitoring`, which is only available
as of Python 3.12.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

10 participants