Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-32591: Add native coroutine origin tracking #5250

Merged
merged 9 commits into from Jan 21, 2018

Conversation

@njsmith
Copy link
Contributor

njsmith commented Jan 20, 2018

After this PR, asyncio no longer uses sys.set_coroutine_wrapper.

There is one non-trivial semantic change: before in asycnio debug
mode, unawaited coroutine warnings were logged using
logger.error(...). Now they generate a regular warnings-module
warning.

This is best reviewed one commit at a time.

Note that there's also a fix for a very subtle bug in
BaseEventLoop.set_debug; see the 4th commit. (It's the one where the
commit message is 4x longer than the diff.)

Not done here, could be added here or in followups:

  • Support for enabling coroutine origin tracking via envvar or -X flag
  • Actually deprecating/removing sys.set_coroutine_wrapper

https://bugs.python.org/issue32591

njsmith added 6 commits Jan 20, 2018
This is fixing an old bug, that for some reason wasn't causing test
suite failures until the previous commit. The issue was:

Sometimes, for some reason, the tests are calling loop.set_debug()
from a random thread, different from the one where the event loop is
running.

When set_debug() is called and the event loop is running, it attempts
to immediately enable or disable debug stuff (previously
registering/unregistering the coroutine wrapper, now toggling the
coroutine origin tracking). However, it used to do this *in the thread
where set_debug() was called*.

Since the toggling uses a save/restore pattern, we'd end up saving the
value in one thread and then restoring it in another, while never
restoring it in the first thread, etc.

Now we always enable/disable the debugging mode inside the event loop
thread, which makes much more sense. And now the tests are passing
again.
msg_lines.append("Coroutine created at (most recent call last)\n")
msg_lines += traceback.format_list(list(extract()))
msg = "".join(msg_lines).rstrip("\n")
warn(msg, category=RuntimeWarning, stacklevel=2, source=coro)

This comment has been minimized.

Copy link
@njsmith

njsmith Jan 20, 2018

Author Contributor

Small, easy to miss point here: passing source=coro here means that if the user happens to have tracemalloc enabled, its origin traceback will be included in the warning. Kinda neat! It does mean if you have both set_coroutine_origin_tracking_depth and tracemalloc enabled you'll get two at-least-partially-redundant tracebacks in the same warning. I don't know that I care that much about this but FYI.

This comment has been minimized.

Copy link
@1st1

1st1 Jan 20, 2018

Member

I think a redundant warning is OK. tracemalloc is enabled quite rarely. I also think what you wrote in this comment should be reflected in a code comment in case someone debugs the double warnings issue :)

Copy link
Member

1st1 left a comment

Overall it looks good.

A couple more requests:

  • Let's raise a DeprecationWarning in sys.set_coroutine_wrapper. And we need to document that it has been deprecated and is scheduled to be removed in 3.8.

  • In what's new in 3.7, we need to add an entry about the deprecation of sys.set_coroutine_wrapper.

  • Minor nit: please fix C code style, mainly, add braces for single-statement "if" branches.


Returns the old value of *depth*.

This setting is thread-local.

This comment has been minimized.

Copy link
@1st1

1st1 Jan 20, 2018

Member

thread-local -> thread-specific

| | cr_origin | where coroutine was |
| | | created, if coroutine |
| | | origin tracking is enabled|
+-----------+-------------------+---------------------------+

This comment has been minimized.

Copy link
@1st1

1st1 Jan 20, 2018

Member

Can you add a link to the set_coroutine_origin_tracking_depth documentation snippet?

This comment has been minimized.

Copy link
@njsmith

njsmith Jan 21, 2018

Author Contributor

Well, that was my original thought. The problem is that

:func:`set_coroutine_origin_tracking_depth`

is too long to fit in the ascii-art box. So... either we need a shorter name for the function, or we need to redraw this whole giant table, and I couldn't think of a satisfactory way to do either in the 2 minutes I spent thinking about it :-). Any suggestions?

I guess we could use that weird ReST substitution thing? I'm not sure how that works.

This comment has been minimized.

enabled, the ``cr_origin`` attribute on coroutine objects will
contain a list of (filename, line number, function name) tuples
describing the traceback where the coroutine object was created.
When disabled, ``cr_origin`` will be None.

This comment has been minimized.

Copy link
@1st1

1st1 Jan 20, 2018

Member

Need to specify how the list is ordered.

@@ -56,6 +56,10 @@ PyErr_WarnExplicitFormat(PyObject *category,
#define PyErr_Warn(category, msg) PyErr_WarnEx(category, msg, 1)
#endif

#ifndef Py_LIMITED_API
PyAPI_FUNC(void) _PyErr_WarnUnawaitedCoroutine(PyObject *coro);

This comment has been minimized.

Copy link
@1st1

1st1 Jan 20, 2018

Member

Strictly speaking we don't need PyAPI_FUNC here, as warnings.c|h will be linked with genobject.c and we are not going to use it in extensions.

enabled = bool(enabled)
if self._coroutine_wrapper_set == enabled:
def _set_coroutine_origin_tracking(self, enabled):
if enabled == self._coroutine_origin_tracking_enabled:

This comment has been minimized.

Copy link
@1st1

1st1 Jan 20, 2018

Member

if enabled and self._coroutine_origin_tracking_enabled:?

I don't like using == for bools.

This comment has been minimized.

Copy link
@njsmith

njsmith Jan 21, 2018

Author Contributor

No, that would be different – the no-== equivalent would be:

if ((enabled and self._coroutine_origin_tracking_enabled) or (not enabled and not self._coroutine_origin_tracking_enabled)):
   ...

I find the == version easier to read. ("If the requested state equals the current state, return without doing anything.")

This comment has been minimized.

Copy link
@1st1

1st1 Jan 21, 2018

Member

Got it, then I'd cast them both to bool: if bool(enabled) == bool(self._coroutine_origin_tracking_enabled):


if (depth == 0) {
((PyCoroObject *)coro)->cr_origin = NULL;
} else {

This comment has been minimized.

Copy link
@1st1

1st1 Jan 20, 2018

Member

Again, per PEP 7:

if ( ... ) {
}
else {
}

This comment has been minimized.

Copy link
@1st1

1st1 Jan 20, 2018

Member

Also I'd like this whole else branch to be in a separate helper function.

} else {
PyObject *origin = PyList_New(depth);
/* Immediately pass ownership to coro, so on error paths we don't have
to worry about it separately. */

This comment has been minimized.

Copy link
@1st1

1st1 Jan 20, 2018

Member

The idea is neat, but it makes the code harder to read and it's not how we usually do it in CPython. While reviewing the code I've already added 2 comments asking you to add Py_DECREF(origin). Please do it the long way: create, populate, cleanup on errors, and finally assign it to cr_origin. After all it's just two places where an extra decref is needed.

frame = frame->f_back;
}
/* Truncate the list if necessary */
if (PyList_SetSlice(origin, i, depth, NULL) < 0) {

This comment has been minimized.

Copy link
@1st1

1st1 Jan 20, 2018

Member

Can we call PyList_SetSlice only when needed?

*/
fn = get_warnings_attr(&PyId__warn_unawaited_coroutine, 1);
if (fn) {
res = PyObject_CallFunctionObjArgs(fn, coro, NULL);

This comment has been minimized.

Copy link
@1st1

1st1 Jan 20, 2018

Member

Just add Py_DECREF(fn) after this line.

int
_PyEval_SetCoroutineOriginTrackingDepth(int new_depth)
{
PyThreadState *tstate = PyThreadState_GET();

This comment has been minimized.

Copy link
@1st1

1st1 Jan 20, 2018

Member

Add assert(new_depth >= 0) for cases when somebody uses the C API directly (sys.set_coroutine_origin_tracking_depth has a check for Python users).

@bedevere-bot

This comment has been minimized.

Copy link

bedevere-bot commented Jan 20, 2018

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

PyObject *origin = PyList_New(depth);
/* Immediately pass ownership to coro, so on error paths we don't have
to worry about it separately. */
((PyCoroObject *)coro)->cr_origin = origin;

This comment has been minimized.

Copy link
@1st1

1st1 Jan 21, 2018

Member

I just realized that exposing list object directly through cr_origin means that users can mutate it. How about we cycle through the frames two times: 1st time to determine the real depth we can capture; 2nd time to populate a cr_origin tuple (not list). Traversing a relatively short chain of frames should be pretty fast, and this is a debug mode after all. What do you think?

This comment has been minimized.

Copy link
@1st1

1st1 Jan 21, 2018

Member

Another option would be to keep things as is, but to change cr_origin to a getter, and return a copy of the list every time. But I like the tuple idea more.

This comment has been minimized.

Copy link
@njsmith

njsmith Jan 21, 2018

Author Contributor

Right, I originally left out the Py_VISIT call and then realized that mutation was possible, so figured I'd put it in just in case :-).

I don't think it matters too much whether we expose a tuple, copy the list, or just keep the current thing where someone can mutate it if they really want to. (The warnings code is robust against such shenanigans: worst case it'll print an error b/c cr_origin was corrupt, and then the real warning without a traceback.) Given this I have a mild preference not to do the tuple thing, because it leads to the most complicated C code. Doing an extra loop is totally doable of course, but it's also plenty complicated to hide some kind of off-by-one bug or something.

@@ -1512,6 +1542,7 @@ static PyMethodDef sys_methods[] = {
{"call_tracing", sys_call_tracing, METH_VARARGS, call_tracing_doc},
{"_debugmallocstats", sys_debugmallocstats, METH_NOARGS,
debugmallocstats_doc},
SYS_SET_COROUTINE_ORIGIN_TRACKING_DEPTH_METHODDEF

This comment has been minimized.

Copy link
@1st1

1st1 Jan 21, 2018

Member

Add sys.get_coroutine_origin_tracking_depth?

This comment has been minimized.

Copy link
@njsmith

njsmith Jan 21, 2018

Author Contributor

Eh, I guess. I was thinking this is sufficiently arcane functionality that I could be lazy and get away with the set-returns-the-old-value pattern (cf. signal.set_wakeup_fd), but you're probably right that set+get is a nicer API.

@njsmith

This comment has been minimized.

Copy link
Contributor Author

njsmith commented Jan 21, 2018

@njsmith

This comment has been minimized.

Copy link
Contributor Author

njsmith commented Jan 21, 2018

Okay, I think that last push addresses everything except:

  • Py_VISIT vs tuples vs copying lists
  • set vs set+get
@1st1

This comment has been minimized.

Copy link
Member

1st1 commented Jan 21, 2018

I still don't like the idea of returning the original mutable list :( Let's do some extra work and make this list a tuple of tuples. It's better than have a getter for cr_origin+copy.

And let's have a 'get' method, otherwise it's hard to capture the current sys settings for debug purposes or whatnot.

@njsmith

This comment has been minimized.

Copy link
Contributor Author

njsmith commented Jan 21, 2018

I'm not a huge fan b/c tuples shouldn't be used for homogenous variable-length sequences, but... fine

@njsmith

This comment has been minimized.

Copy link
Contributor Author

njsmith commented Jan 21, 2018

I have made the requested changes; please review again

@bedevere-bot

This comment has been minimized.

Copy link

bedevere-bot commented Jan 21, 2018

Thanks for making the requested changes!

@1st1: please review the changes made to this pull request.

@1st1

This comment has been minimized.

Copy link
Member

1st1 commented Jan 21, 2018

I'm not a huge fan b/c tuples shouldn't be used for homogenous variable-length sequences, but... fine

Yeah, I know, but we don't have a frozenlist.

@1st1
1st1 approved these changes Jan 21, 2018
@1st1 1st1 merged commit fc2f407 into python:master Jan 21, 2018
4 checks passed
4 checks passed
bedevere/issue-number Issue number 32591 found
Details
bedevere/news News entry found in Misc/NEWS.d
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
njsmith added a commit to njsmith/cpython that referenced this pull request Jan 24, 2018
1st1 added a commit that referenced this pull request Jan 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.