Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-31970: Reduce performance overhead of asyncio debug mode. #4314

Merged
merged 4 commits into from
Nov 7, 2017

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Nov 7, 2017

@1st1
Copy link
Member

1st1 commented Nov 7, 2017

So how much faster the new code is?

Never mind, I saw the explanation in the issue.

# This is heuristically the max number of entries we're gonna pop + 1
# (we're only interested in displaying the top of stack)
# If this is too small, tests will fail.
limit = 4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we bump this to 10? It's a debug mode after all, and seeing only 2-3 lines in traceback is sometimes not enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The traceback is never displayed. It's only used for Future.__repr__, which prints the line at which the future was created (i.e. the top-of-stack).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC Task.__del__ and CoroWrapper pass it to the loop's logger to print some debug info. In which case the full traceback would be printed to the user, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes, you're right. I agree with bumping to 10 then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add it as a constant to asyncio/constants.py? Also, we need to update loop.default_exception_handler() method to say that we only display 10 topmost frames.

Copy link
Member

@1st1 1st1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd only bump 4 to 10.

@pitrou
Copy link
Member Author

pitrou commented Nov 7, 2017

So how much faster the new code is?

Depends on your usage :-) With a small stack it is 2x faster. With a 50-deep stack it is 10x faster.
(note I'm testing with a debug Python, but the effect should be similar on a release one)

Here is a snippet:
https://gist.github.com/pitrou/296447d747f043cc62fd91a6c114983e
You can time it using ./python -m timeit -s "from aiodebugperf import func" "func()".
(you can also cProfile it to see where time is spent)

@asvetlov
Copy link
Contributor

asvetlov commented Nov 7, 2017

+1 for 10. Or maybe at least 7.
I feel that 4 is too small value.

Copy link
Member

@1st1 1st1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add it as a constant to asyncio/constants.py? Also, we need to update loop.default_exception_handler() method to say that we only display 10 topmost frames.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@1st1
Copy link
Member

1st1 commented Nov 7, 2017

@Mariatta I think I just saw a tiny bug in bedevere-bot:

  1. core-dev A requests changes; bedevere-bot assigns an awaiting changes label.
  2. core-dev B approves the PR; bedevere-bot removes the awaiting changes label.

The label should only be removed when all reviewers clear the PR.

@pitrou
Copy link
Member Author

pitrou commented Nov 7, 2017

@1st1 are you ok with backporting to 3.6 as well?

@1st1
Copy link
Member

1st1 commented Nov 7, 2017

@1st1 are you ok with backporting to 3.6 as well?

Yeah, I think it's fine.

@@ -1981,7 +1981,7 @@ def coro_noop():

regex = (r'^<CoroWrapper %s\(?\)? .* at %s:%s, .*> '
r'was never yielded from\n'
r'Coroutine object created at \(most recent call last\):\n'
r'Coroutine object created at \(most recent call last, maybe truncated\):\n'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we replace "maybe truncated" to "truncated to {} last lines"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"truncated to {DEBUG_STACK_DEPTH} frames" if I understood correctly

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, getting the previous behaviour is as simple as "asyncio.constants.DEBUG_STACK_DEPTH = None". Right?

Maybe mention this constant in Doc/library/asyncio-dev.rst to explain how to not get truncated tracebacks.

@@ -1981,7 +1981,7 @@ def coro_noop():

regex = (r'^<CoroWrapper %s\(?\)? .* at %s:%s, .*> '
r'was never yielded from\n'
r'Coroutine object created at \(most recent call last\):\n'
r'Coroutine object created at \(most recent call last, maybe truncated\):\n'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"truncated to {DEBUG_STACK_DEPTH} frames" if I understood correctly


# Number of stack entries to capture in debug mode.
# The large the number, the slower the operation in debug mode
# (see extract_stack() in events.py)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may document that this value is passed to the limit parameter of traceback.extract_stack(), or maybe just document that None means "no limit".

@vstinner
Copy link
Member

vstinner commented Nov 7, 2017

If you document the constant, I'm ok to merge this change.

@pitrou
Copy link
Member Author

pitrou commented Nov 7, 2017

I'm not sure the constant should be documented. Other constants are not. They don't seem meant to be changed by users.

@1st1
Copy link
Member

1st1 commented Nov 7, 2017

If you document the constant, I'm ok to merge this change.

I don't think we need to document every constant we have any asyncio. Those who debug super problematic cases can always go and look at the implementation and bump the number to 10000 themselves. For an average asyncio user all these little details about debug mode and its constants in the documentation is just noise.

@vstinner
Copy link
Member

vstinner commented Nov 7, 2017

I'm not sure the constant should be documented. Other constants are not. They don't seem meant to be changed by users.

I'm not proposing to document all constants. I asked you to document the constant since you change the behaviour of the debug mode of asyncio. It can be surprising to upgrade Python and suddenly tracebacks are truncated: how are you supposed to deal with this? Dig into asyncio source code to understand?

@pitrou
Copy link
Member Author

pitrou commented Nov 7, 2017

It can be surprising to upgrade Python and suddenly tracebacks are truncated: how are you supposed to deal with this?

Read the error message? It clearly says the traceback is truncated :-)
Do you mean it's sometimes necessary to have the full traceback?

@asvetlov
Copy link
Contributor

asvetlov commented Nov 7, 2017

-1 for constant documenting

@vstinner
Copy link
Member

vstinner commented Nov 7, 2017

Do you mean it's sometimes necessary to have the full traceback?

I think so. Try maybe yourself by truncating to 1 frame: are you always able to find the bug? :-)

Anyway, in case of doubt, I prefer to document the option.

Note: I'm planning to make faulthandler configurable to allow to display more than 100 frames, 100 threads and strings longer than 500 characters... But nobody complained. Maybe 100 frames is enough for everyone, or people who hit such limitations are able to modify CPython and recompile it :-)

Well, 100 frames is much higher than 10 frames :-)

I was going to say that my tracemalloc module was also limited to 100 frames, but I see that I removed this limitation to use a dynamically allocated array :-)

@vstinner
Copy link
Member

vstinner commented Nov 7, 2017

If @1st1 and @asvetlov are against documenting the constant, ignore my request ;-)

@pitrou
Copy link
Member Author

pitrou commented Nov 7, 2017

Thank you @Haypo :-)

@pitrou pitrou merged commit 921e943 into python:master Nov 7, 2017
@miss-islington
Copy link
Contributor

Thanks @pitrou for the PR 🌮🎉.. I'm working now to backport this PR to: 3.6.
🐍🍒⛏🤖

@pitrou pitrou deleted the aio_debug_perf branch November 7, 2017 16:23
@miss-islington
Copy link
Contributor

Sorry, @pitrou, I could not cleanly backport this to 3.6 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 921e9432a1461bbf312c9c6dcc2b916be6c05fa0 3.6

@bedevere-bot
Copy link

GH-4322 is a backport of this pull request to the 3.6 branch.

pitrou added a commit to pitrou/cpython that referenced this pull request Nov 7, 2017
…ythonGH-4314)

* bpo-31970: Reduce performance overhead of asyncio debug mode..
(cherry picked from commit 921e943)
pitrou added a commit that referenced this pull request Nov 7, 2017
…H-4314) (#4322)

* bpo-31970: Reduce performance overhead of asyncio debug mode..
(cherry picked from commit 921e943)
embray pushed a commit to embray/cpython that referenced this pull request Nov 9, 2017
…#4314)

* bpo-31970: Reduce performance overhead of asyncio debug mode.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants