Skip to content

Conversation

DinoV
Copy link
Contributor

@DinoV DinoV commented Jun 8, 2017

multi-threaded scenarios involving deletion of code objects

Fixes: http://bugs.python.org/issue30604

bpo-30604

multi-threaded scenarios involving deletion of code objects
@DinoV DinoV requested review from brettcannon and 1st1 June 8, 2017 21:42
@DinoV DinoV changed the title Move co_extra_freefuncs to interpreter state to avoid crashes in Move co_extra_freefuncs to interpreter state to avoid crashes in (bpo-30604) Jun 8, 2017
@brettcannon brettcannon changed the title Move co_extra_freefuncs to interpreter state to avoid crashes in (bpo-30604) bpo-30604: Move co_extra_freefuncs to interpreter state to avoid crashes in threads Jun 8, 2017
@brettcannon brettcannon changed the title bpo-30604: Move co_extra_freefuncs to interpreter state to avoid crashes in threads Move co_extra_freefuncs to interpreter state to avoid crashes in threads Jun 8, 2017
@brettcannon brettcannon changed the title Move co_extra_freefuncs to interpreter state to avoid crashes in threads bpo-30604: Move co_extra_freefuncs to interpreter state to avoid crashes in threads Jun 8, 2017
@brettcannon brettcannon changed the title bpo-30604: Move co_extra_freefuncs to interpreter state to avoid crashes in threads Move co_extra_freefuncs to interpreter state to avoid crashes in threads Jun 8, 2017
@brettcannon brettcannon changed the title Move co_extra_freefuncs to interpreter state to avoid crashes in threads bpo-30604: Move co_extra_freefuncs to interpreter state to avoid crashes in threads Jun 8, 2017
@@ -825,8 +825,6 @@ _PyCode_CheckLineNumber(PyCodeObject* co, int lasti, PyAddrPair *bounds)
int
_PyCode_GetExtra(PyObject *code, Py_ssize_t index, void **extra)
{
assert(*extra == NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It really just makes the API harder to use. extra is logically an out parameter here so it's input state is irrelevant. It means that you can't just re-use the same variable w/o zero initializing it to get the extra field (which is what I'm doing in the unit tests I've added).

I just also updated it so it'll be zero'd out in the case where there is no value actually as otherwise you might think it succeeds and get garbage out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just also updated it so it'll be zero'd out in the case where there is no value actually as otherwise you might think it succeeds and get garbage out.

Alright, makes sense.

@ned-deily
Copy link
Member

Is this something that can be safely changed in a 3.6.x maintenance release, i.e. are there compatibility issues?

@DinoV
Copy link
Contributor Author

DinoV commented Jun 8, 2017

@ned-deily It's debatable but Yury and I both think it should be okay to change it. It's not absolutely 100% compatible, but using the API as is right now can actually lead to crashes due to the threading issue, and there is no safe way for multiple consumers to use the API at all.

There's currently only 1 known consumer of this (PyCharm) and I've pinged them. Anyone using it in a single threaded fashion will be fine, and even consumers using it in programs with less than 255 threads ever created will also be fine.

@1st1
Copy link
Member

1st1 commented Jun 9, 2017

@ned-deily +1 to what Dino said; I think we should have this fixed in 3.6.2.

@AraHaan
Copy link
Contributor

AraHaan commented Jun 9, 2017

I agree there was many cases where this has bit me in the butt when a class subclassed threading.thread and the class getting called over 255 times and crash when the thread does certain things (that I cant say here as I don't know entirely what all it exactly does), But I do know for sure that it was using ffmpeg to send audio data to a server using packets that also are handled with opus. Long story short it caused a lot of crashes to python itself on Windows.

@ned-deily
Copy link
Member

I bring this up because we have already been burned by an inadvertent ABI compatibility break between 3.6.0 and 3.6.1; see the discussion starting here and continuing in bpo-29943. I'm certainly not an expert on the C-API/ABI and compatibility issues; but my understanding is that we strive to maintain compatibility such that extension modules compiled against any version of 3.6.x will work properly with any version of 3.6.y, where y can be less than, equal to, or greater than x. This is particular important to projects supplying extension modules via wheels. So, I'd really like to have at least one other pair of eyes reviewing this for 3.6.x. @ncoghlan, can you take a quick look?

Also, it looks like this should be going into master for 3.7. Shouldn't that be pushed first?

@ned-deily ned-deily requested a review from ncoghlan June 9, 2017 01:22
@AraHaan
Copy link
Contributor

AraHaan commented Jun 9, 2017

Yes, I say push to master, then 3.6, and maybe 3.5?. 3.5 and 3.4 also has this issue (crash) as well.

(This code can replicate it on linux too: https://github.com/DecoraterBot-devs/DecoraterBot-cogs/blob/master/plugins/voice.py (ignore the all overness and hacks on it to make it partially work))

@1st1
Copy link
Member

1st1 commented Jun 9, 2017

Yes, I say push to master, then 3.6, and maybe 3.5?. 3.5 and 3.4 also has this issue (crash) as well.

@AraHaan you are talking about something completely unrelated to this PR. This PR fixes the code that was added only in 3.6. Pease don't create extra noise in this PR unless you are 100% sure of this PR affecting some code.

but my understanding is that we strive to maintain compatibility such that extension modules compiled against any version of 3.6.x will work properly with any version of 3.6.y, where y can be less than, equal to, or greater than x.

@ned-deily @ncoghlan The compatibility should be preserved. We are only changing the internal implementation. The chance of something breaking is virtually 0; I'd call this PR fully backwards compatible.

Copy link
Contributor

@ncoghlan ncoghlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ned-deily's right - as written, this is an unacceptable ABI break, as it changes the sizes of two public structs (PyInterpreterState and PyThreadState) that may have fixed size storage allocated in extension modules and embedding applications.

@@ -142,9 +145,6 @@ typedef struct _ts {
PyObject *coroutine_wrapper;
int in_coroutine_wrapper;

Py_ssize_t co_extra_user_count;
freefunc co_extra_freefuncs[MAX_CO_EXTRA_USERS];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyThreadState is public, and you're changing the size of the struct, so this is a problematic ABI break. Keeping these fields the same size, but changing their names to _preserve_36_ABI_1 and _preserve_36_ABI_2 should address most of that concern.

@@ -48,6 +48,9 @@ typedef struct _is {
PyObject *import_func;
/* Initialized to PyEval_EvalFrameDefault(). */
_PyFrameEvalFunction eval_frame;

Py_ssize_t co_extra_user_count;
freefunc co_extra_freefuncs[MAX_CO_EXTRA_USERS];
} PyInterpreterState;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the issue with PyThreadState below, this changes the size of PyInterpreterState, and that's a public struct, and hence may have fixed size storage allocated in extension modules. That's significantly less likely to be the case for PyInterpreterState than it is for PyThreadState, but it's still an ABI break.

About the only way I see to avoid that is to to consider these two fields to be part of the interpreter state, and manipulate them accordingly, but leave the actual storage (at least for 3.6) inside tstate_head (copying them between threads states as necessary to cope with that indirection).

And then for 3.7, you can do this the simple way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possibility here would be maintaining a separate list for the code extra states. So:

struct PyCodeExtraState {
PyCodeExtraState* next;
PyInterpreterState* interp;
}

And then update the list when interpreter states are created/destroyed, and search the list for the current interprerter when getting the state. Then leave the thread state fields around with the preserve ABI names.

Does that sound better then risking changing PyInterpreterState?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also work for fixing 3.6.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow I talked myself out of thinking that approach would work when writing my original review comments, but you're right, if you did that, you'd just need a single C global pointer somewhere to get hold of the head of the list, and then scan it for the one that matched the interpreter state of interest. In the normal case (i.e. no subinterpreters), that's always going to be the one linked directly from the C global.

So +1 from me for doing that dance at least for 3.6.

For 3.7, it might be nicer to have it directly in the interpreter state, but that does have the downside of potentially making maintenance more difficult (due to the state storage inconsistency)

@1st1
Copy link
Member

1st1 commented Jun 9, 2017

@ned-deily's right - as written, this is an unacceptable ABI break, as it changes the sizes of two public structs (PyInterpreterState and PyThreadState) that may have fixed size storage allocated in extension modules and embedding applications.

Good call, Nick, I didn't consider changing the size of those structs to be a breaking change.

but leave the actual storage (at least for 3.6) inside tstate_head (copying them between threads states as necessary to cope with that indirection).

This should work, but will require a slightly complicated code.

Maybe we can only use the state for the main thread? We can easily backport (as a private helper) PyInterpreterState_Main to 3.6.

@ncoghlan
Copy link
Contributor

ncoghlan commented Jun 9, 2017

@1st1 Yeah, it should be possible to only use the interpreter's main thread for the state storage. It's going to have to be something fiddly like that though, as I don't see another way to get around the structs being public without completely breaking 3.6's subinterpreter support.

Add new linked list of code extra objects on a per-interpreter basis
  so that interpreter state size isn't changed
@codecov
Copy link

codecov bot commented Jun 9, 2017

Codecov Report

Merging #2015 into 3.6 will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##              3.6    #2015      +/-   ##
==========================================
+ Coverage   83.63%   83.63%   +<.01%     
==========================================
  Files        1370     1370              
  Lines      346245   346309      +64     
==========================================
+ Hits       289587   289650      +63     
- Misses      56658    56659       +1
Impacted Files Coverage Δ
Lib/test/test_code.py 96.42% <100%> (+3%) ⬆️
Lib/threading.py 82.17% <0%> (-0.52%) ⬇️
Lib/test/lock_tests.py 86.66% <0%> (-0.29%) ⬇️
Lib/test/test_buffer.py 96.53% <0%> (-0.18%) ⬇️
Lib/pydoc.py 62.27% <0%> (+0.06%) ⬆️
Lib/test/test_smtplib.py 86.71% <0%> (+0.22%) ⬆️
Lib/ctypes/__init__.py 71.29% <0%> (+0.3%) ⬆️
Lib/test/test_random.py 98.66% <0%> (+0.38%) ⬆️
Lib/test/test_poll.py 92.63% <0%> (+0.61%) ⬆️
Lib/wsgiref/handlers.py 89.27% <0%> (+0.85%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 570b1c9...a33106b. Read the comment docs.

@ned-deily
Copy link
Member

Hopefully, @ncoghlan will be awake soon and can say aye or nay. Also I see @brettcannon had been requested as a reviewer; any comments?

freefunc co_extra_freefuncs[MAX_CO_EXTRA_USERS];
} PyCodeExtraState;

PyCodeExtraState* _PyCodeExtraState_Get();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe two underscores / comment that this method is a private API that will go way in 3.7.

@@ -105,7 +105,9 @@
import sys
import unittest
import weakref
from test.support import run_doctest, run_unittest, cpython_only
import threading
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: import should be alphabetically sorted in cpython codebase.

@1st1
Copy link
Member

1st1 commented Jun 12, 2017

LGTM (but wait until Nick gives it another look). Also, don't commit this to 3.7 -- let's make it the right (simple) way there.

@DinoV
Copy link
Contributor Author

DinoV commented Jun 12, 2017

Sounds good, I'll basically just port the original version (9ce5553) to 3.7

Copy link
Member

@brettcannon brettcannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Touched up some formatting but otherwise LGTM

Python/pystate.c Outdated
@@ -73,6 +74,12 @@ PyInterpreterState_New(void)
PyMem_RawMalloc(sizeof(PyInterpreterState));

if (interp != NULL) {
PyCodeExtraState* coExtra = PyMem_RawMalloc(sizeof(PyCodeExtraState));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use CamelCase for variable names, or do we?

Python/pystate.c Outdated
@@ -147,9 +158,10 @@ void
PyInterpreterState_Delete(PyInterpreterState *interp)
{
PyInterpreterState **p;
PyCodeExtraState **pExtra;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for nitpicking, but could you please fix this naming too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, of course, I forgot that one existed too!

Copy link
Contributor

@ncoghlan ncoghlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updates fix the ABI breakage (thanks!), but currently add a new struct definition to the public API (as there's no underscore prefix on PyCodeExtraState)


Py_ssize_t co_extra_user_count;
freefunc co_extra_freefuncs[MAX_CO_EXTRA_USERS];
} PyInterpreterState;
#endif
} PyCodeExtraState;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unfortunately exports PyCodeExtraState as a new public struct definition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed it to __PyCodeExtraState to match the C function for getting it

@ncoghlan
Copy link
Contributor

Thanks @DinoV - LGTM!

@ned-deily
Copy link
Member

All that's missing now is a Misc/NEWS entry and we're good to go. @DinoV can you add that?

@DinoV
Copy link
Contributor Author

DinoV commented Jun 13, 2017

NEWS updated, everything's ready to go!

@ned-deily ned-deily merged commit 2997fec into python:3.6 Jun 13, 2017
@ned-deily
Copy link
Member

OK! Thanks everyone for working through the nitty-gritty compatibility details!

@miss-islington
Copy link
Contributor

🐍🍒⛏🤖 Thanks @DinoV for the PR, and @ned-deily for merging it 🌮🎉.I'm working now to backport this PR to: 3.6.

@miss-islington
Copy link
Contributor

Sorry, @DinoV and @ned-deily, I could not cleanly backport this to 3.6 due to a conflict.
Please backport using cherry_picker on command line.

@ncoghlan
Copy link
Contributor

ncoghlan commented Sep 7, 2017

This didn't need backporting, as it was a commit to the 3.6 branch :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants