Skip to content
This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

If threading is not imported in main thread, attach to pid does not work properly. #1542

Closed
fabioz opened this issue Jun 26, 2019 · 11 comments
Closed
Assignees
Labels

Comments

@fabioz
Copy link
Contributor

fabioz commented Jun 26, 2019

I was doing some manual testing for the attach to pid and in this situation, if the user never imported threading, we don't currently set the tracing properly.

The main reason for this is that threading seems to expect that it's imported from the actual main thread and not from another thread (because presumably one would import threading to create a thread in the first place) and it creates the initial _MainThread with the current thread (even if that's not the actual main thread).

Now, in the attach to pid, what happens is that we create a thread on the target process and then call the settrace in that thread... and if threading was never imported it'll consider that dummy thread as the main thread because we import threading in the dummy thread and the actual main thread will not be visible to threading.enumerate().

I still don't know the best way to overcome this...

Additional notes:

  • This happens in Python 2 and Python 3.
  • We can detect that if threading not in sys.modules during the attach.
  • In Linux we don't create a thread during the attach and use the current running thread, so, this shouldn't be a problem (although it could be an issue if the user has multiple threads not started through threading and we happen to stop in the wrong thread, then we'd import threading in the wrong thread).
@fabioz
Copy link
Contributor Author

fabioz commented Jun 26, 2019

As a note, I've also reported this to CPython: https://bugs.python.org/issue37416

@int19h
Copy link
Contributor

int19h commented Jun 27, 2019

I remember now that we had to deal with that exact problem in PyDebugAttach, and also in the old ptvsd.

https://github.com/microsoft/PTVS/blob/b26f5802baabfbaa31b2346ec432ea5ea4c5a2cd/Python/Product/PythonTools/ptvsd/__init__.py#L24-L28

https://github.com/microsoft/PTVS/blob/b26f5802baabfbaa31b2346ec432ea5ea4c5a2cd/Python/Product/PythonTools/ptvsd/debugger.py#L22-L26

https://github.com/microsoft/PTVS/blob/b26f5802baabfbaa31b2346ec432ea5ea4c5a2cd/Python/Product/PyDebugAttach/PyDebugAttach.cpp#L945-L965

https://github.com/microsoft/PTVS/blob/b26f5802baabfbaa31b2346ec432ea5ea4c5a2cd/Python/Product/PythonTools/ptvsd_loader.py#L25-L40

So the way those two worked in conjunction is that PyDebugAttach made sure that threads are initialized without importing threading; the loader that was injected made sure that threading was still unloaded; and finally PyDebugAttach invoked what's basically the equivalent of ptvsd.attach() directly on the injected thread.

The crucial part there was that at no point threading was loaded at all in any of this. If you look at various mentions of _threading in debugger.py above, you'll see where and when it lazily loaded it.

@fabioz
Copy link
Contributor Author

fabioz commented Jun 27, 2019

Well, we actually do use that code in the attach to pid (so, we initialize the low-level threading facilities the same way), the difference is that ptvsd really delays loading threading.

Still, I think lazy-loading threading is not really a good solution, the main reasons being:

I think that if the user does not use the threading module to start its threads it could end up deadlocking the debugger because of the lazy loading on Python 2.7 -- see: #1039 (comment) for the reason that pydevd imports everything on top-level (in the case of the old ptvsd, importing threading is still needed to enumerate the threads and it'll do that when blocked in a breakpoint inside of the trace function: https://github.com/microsoft/PTVS/blob/b26f5802baabfbaa31b2346ec432ea5ea4c5a2cd/Python/Product/PythonTools/ptvsd/debugger.py#L1265 which would deadlock in Python 2.7).

Also, there's still the case where even lazy-loading threading in the debugger module if the debugger is stopped at a secondary thread would make the same issue still occur (the main thread as seen by threading would be wrong), just on a different time -- the use cases I see is if the user is using the thread module directly or qt with a breakpoint directly on a secondary thread.

And to finish, I remember that delaying the load of threading was problem when dealing with gevent (although I don't have details right now, but it was related to the gevent monkeypatching).

I was thinking a bit about this and I think we should be able to just fix the threading module to have the proper info if we detect that we were the first to initialize it and the main thread is not the correct thread (i.e.: I think we can get the needed info from the CPython API and fix threading._active and threading._main_thread._ident in our secondary thread).

@fabioz
Copy link
Contributor Author

fabioz commented Jun 27, 2019

Or maybe a different approach would be just initializing threading on the main thread...

Instead of simply doing the initThreads on addPendingCall, maybe we could also do import threading so that the threading module would be initialized properly since we already go through all the trouble to call initThreads on the main thread.

@fabioz
Copy link
Contributor Author

fabioz commented Jun 27, 2019

I tried to go through the addPendingCall approach and it didn't work... it seems the issue lies deeper -- because we initialize the threading facilities at that secondary thread, it seems that the callback called on addPendingCall is not called in the main thread (or at least the utility used to get the current thread indent is returning the id of our secondary thread) and we still end up with the wrong thread id in threading.main_thread (our secondary thread also becomes the head of PyInterpreterState_ThreadHead).

Now, this got me thinking a bit more about the actual problem... the real reason we don't attach properly is because we're now using threading.enumerate() to get the ids of the threads to trace, but it should be possible to get all the thread ids through PyInterpreterState_ThreadHead and PyThreadState_Next even if they're not seen by the threading module, so, I'll create an API to provide that info and use it to set the tracing to those threads.

Then, in the case where threading is still not in sys.modules I'll import it and fix the threading._active and threading._main_thread._ident... there's still the problem where we don't know the id of the main thread -- if there's only the main thread and our secondary thread it should be easy to get, but if there are more threads, I'm not sure... I'll probably just get the first which is not our secondary thread in the list of threads provided by PyInterpreterState_ThreadHead/PyThreadState_Next and hope it's the correct one (at least for the time being it should be reasonable and we can revisit this later).

@karthiknadig
Copy link
Member

karthiknadig commented Jun 27, 2019

Looking at the implementation of _PyOS_IsMainThread it should be possible to get the main thread from the _PyRuntime state object via _PyRuntime.main_thread. This has the right information but the structure itself not exposed. You could get this using offset on the exposed _PyRuntime object pointer but it could be flaky.

@int19h
Copy link
Contributor

int19h commented Jun 27, 2019

We need to initialize threading on the main thread as well. But doesn't it already do so via AddPendingCall? In PyDebugAttach at least, PyEval_InitThreads is invoked via AddPendingCall

@fabioz
Copy link
Contributor Author

fabioz commented Jun 28, 2019

We need to initialize threading on the main thread as well. But doesn't it already do so via AddPendingCall? In PyDebugAttach at least, PyEval_InitThreads is invoked via AddPendingCall

In practice, what I see is that the threads are not initialized through the addPendingCall -- and it should be the same on PyDebugAttach... (I'll actually put links to PyDebugAttach because the code in pydevd is basically the same).

On Python 2.7 currPyThread == nullptr and it initializes the threading in the secondary thread we created (i.e.: in PyDebugAttach.cpp: https://github.com/microsoft/PTVS/blob/b26f5802baabfbaa31b2346ec432ea5ea4c5a2cd/Python/Product/PyDebugAttach/PyDebugAttach.cpp#L1019).

In Python 3.7 the initial check for if (!threadsInited()) (https://github.com/microsoft/PTVS/blob/b26f5802baabfbaa31b2346ec432ea5ea4c5a2cd/Python/Product/PyDebugAttach/PyDebugAttach.cpp#L923) returns true already and it doesn't really get there (I'm not sure where it initialized thought, I just know it got to that point already initialized)... I tried to add a pending call later, and it works but there's a big drawback in that if the user is not running code in the main thread (say he created multiple secondary threads and is waiting for them to finish or is in some deadlock), that code is never called and we don't attach successfully -- so, for that reason alone I think that we can't use that approach.

So, in practice it seems that all that code to use addPendingCall is not really used (at least on 2.7 and 3.7, I'm not sure about other versions)...

I'm not actually sure why in Python 3.7 it's already initialized before (I haven't actually made a debug version to check, but it seems that just getting the gil could do it: https://github.com/python/cpython/blob/36456df13843c5b8a1fb5a6022ab9ed1fe2a11c5/Python/pystate.c#L1275) -- and that's probably good because otherwise we woudn't be able to connect on cases where the main thread is not actually running because our callback to addPendingCall would never be called (although a lot of that code can probably be removed in that light).

@fabioz
Copy link
Contributor Author

fabioz commented Jun 28, 2019

Looking at the implementation of _PyOS_IsMainThread it should be possible to get the main thread from the _PyRuntime state object via _PyRuntime.main_thread. This has the right information but the structure itself not exposed. You could get this using offset on the exposed _PyRuntime object pointer but it could be flaky.

I think that going that way can be an option, but I don't think we should tackle that right now (as you said, it can be a bit on the flaky side and each Python version is probably different).

@fabioz
Copy link
Contributor Author

fabioz commented Jun 28, 2019

Maybe the right approach is trying to use the addPendingCall to do the threading import but in case it doesn't complete in a given amount of time we use the approach I outlined at #1542 (comment) as a fallback.

@int19h
Copy link
Contributor

int19h commented Jun 28, 2019

On 3.7+, the threads are always initialized because Py_Initialize now automatically invokes PyEval_InitThreads:
https://docs.python.org/3/c-api/init.html#c.PyEval_InitThreads

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants