
Why are multiple debugging threads being started in integrated Python-debugging of VS Code? #349

Closed
AndreasLuckert opened this issue Jul 27, 2020 · 13 comments


@AndreasLuckert

Environment data

  • VS Code version: 1.47.2
    17299e413d5590b14ab0340ea477cdd86ff13daf
    x64
  • Extension version (available under the Extensions sidebar): ms-python.python v2020.7.96456
  • OS and version: Operating System: Ubuntu 18.04.4 LTS
    Kernel: Linux 5.4.0-42-generic
    Architecture: x86-64
  • Python version (& distribution if applicable, e.g. Anaconda): Python 3.8.5 (default, Jul 22 2020, 18:54:26)
    [GCC 5.4.0 20160609] on linux (from linuxbrew using pip)
  • Type of virtual environment used (N/A | venv | virtualenv | conda | ...): pip, main/general environment
  • Relevant/affected Python packages and their versions: -
  • Relevant/affected Python-related VS Code extensions and their versions: -
  • Value of the python.languageServer setting: "Jedi"

Expected behaviour

Debugging with a single main thread.

Actual behaviour

Multiple threads appear and slow everything down by apparently executing the same code several times simultaneously.

Steps to reproduce:


  1. Open a python-code (.py-file) in VS Code.
  2. Start debugging via F5.
  3. See how multiple threads start debugging the code simultaneously, unnecessarily slowing down the entire computer:
    Python_debugging_VS_Code_multiple_processes_in_parallel

For all the other details, please visit my question on StackOverflow.

I'm also asking for help here because there was no significant activity on StackOverflow toward solving this matter. I had also asked this question on the VS Code GitHub repository, where it was not answered but forwarded elsewhere.

Logs

Output for Python in the Output panel (View → Output, change the drop-down in the upper-right of the Output panel to Python)

XXX

@brettcannon brettcannon transferred this issue from microsoft/vscode-python Jul 27, 2020
@int19h
Collaborator

int19h commented Jul 27, 2020

The debugger does spawn background threads - that's an unavoidable part of the implementation. However, you wouldn't see those threads in the Call Stack window. They are also not running your code, so "executing the same code several times simultaneously" does not apply. The same goes for subprocesses.

This behavior doesn't reproduce for me on any simple app, so we'll need some more information to repro. Most likely, this has more to do with the code being debugged, or with the libraries that it uses.

In addition, I'm curious as to what led you to conclude that those threads execute the same code. Did you try pausing and inspecting the call stacks of the other threads to see what code is running there? If not, doing that might shed some light on what's spawning the threads.

@AndreasLuckert
Author

AndreasLuckert commented Jul 28, 2020

I deduced that it's executing the code several times simultaneously because the console output indicates exactly that.
For example:

...

*** File 4: AVAMET_station_c25m181e02_Oliva_poble_1-31-jan-2018.csv

Importing all meteo files:  29%|██▊       | 4/14 [00:04<00:08,  1.20it/s]

*** File 5: AVAMET_station_c25m181e02_Oliva_poble_1-apr-2-aug-2019.csv

Importing all meteo files:  29%|██▊       | 4/14 [00:04<00:08,  1.13it/s]

*** File 5: AVAMET_station_c25m181e02_Oliva_poble_1-apr-2-aug-2019.csv

Importing all meteo files:  29%|██▊       | 4/14 [00:04<00:08,  1.18it/s]

*** File 5: AVAMET_station_c25m181e02_Oliva_poble_1-apr-2-aug-2019.csv

Importing all meteo files:  29%|██▊       | 4/14 [00:04<00:09,  1.06it/s]

*** File 5: AVAMET_station_c25m181e02_Oliva_poble_1-apr-2-aug-2019.csv

...

As can be seen in this excerpt of my console output, File 5 is imported 4 times in a row, which matches exactly the number of threads displayed in the Call Stack window apart from the main thread.

Moreover, the normally transparent yellow bar showing where the debugger stopped appears far more intense, because the debugger stopped 4 times at the same code line, causing the indicator bars to overlap.
Next, when pressing F10 (Step Over), only one debugging thread reacts while the others remain at the same code line:

Python_debugging_VS_Code_multiple_processes_in_parallel_overlapping_debuggers

As for the call stack, here are 3 screenshots of all the juxtaposed threads and subprocesses (from top to bottom):

Part 1 (top):
Multiple_debuggers_callstack1
Part 2 (middle):
Multiple_debuggers_callstack2
Part 3 (bottom):
Multiple_debuggers_callstack3

Furthermore, when selecting the main thread of another subprocess (of the 4 available in the call stack), I can indeed Step Over via F10 with another of the yellow debugging indicator bars.
For demonstration purposes, I arranged it so that all 4 debugging instances are visible in the following screenshot:
Change_subprocess_indeed_changes_the_debugger_selected

I hope this helps in interpreting where this undesired behavior comes from. The threads seem to debug the same code, and I would like to understand why and prevent it from happening.

@fabioz
Collaborator

fabioz commented Jul 28, 2020

@AndreasLuckert I believe this issue lies in your own code or in some library you're using to accelerate your code with multiple processes -- the debugger is just tracking them, as it should.

For the threads, it seems they're started in a _monitor.py file (do you have such a file in your project? -- if you click them on the call stack it should show where that file is).

If you share the logs you have from running I may be able to give you more insight.

i.e.:

  • Open VS Code
  • Select the command Extensions: Open Extensions Folder
  • Locate the Python extension directory, typically of the form ms-python.python-2020..***
  • In that directory, ensure you do not have any debug*.log files; if you do, please delete them
  • Go back into VS Code and modify your launch.json to add the setting "logToFile": true, see below:
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File (Integrated Terminal)",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "stopOnEntry": true,
            "console": "integratedTerminal",
            "logToFile": true
        }
    ]
}
  • Start debugging
  • When done, go back into the extension directory and upload the debug*.log files into this GitHub issue.

@AndreasLuckert
Author

As far as I know, I don't have a file called _monitor.py in my project, but here are my debugging logs from the last session in which the multithread issue happened again:
Multi_thread_python_debugging_LOGfiles.zip

I copied these files from my /home/andylu/.vscode/extensions/ms-python.python-2020.7.96456 directory, which is hopefully the one you meant, after activating automatic logging via "logToFile": true in my launch.json file.

Thanks in advance for analyzing these files.

@fabioz
Collaborator

fabioz commented Jul 28, 2020

It seems you (or some library you use) are using the multiprocessing module.

i.e. the stack with full paths is:

  Stack: /home/andylu/Desktop/Python/Scripts/Master/Import_export/AERMOD/prepare_filtered_AERMOD_CSV_outputs_for_plotting.py, <module>, 368
  Stack: /home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/runpy.py, _run_code, 87
  Stack: /home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/runpy.py, _run_module_code, 97
  Stack: /home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/runpy.py, run_path, 265
  Stack: /home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/multiprocessing/spawn.py, _fixup_main_from_path, 287
  Stack: /home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/multiprocessing/spawn.py, prepare, 236
  Stack: /home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/multiprocessing/spawn.py, _main, 125
  Stack: /home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/multiprocessing/spawn.py, spawn_main, 116
  Stack: <string>, <module>, 1

I also see many Exception occurred during evaluation. messages, so maybe it only gets to that code when you are manually evaluating something in the debugger (for instance, hovering over something or adding a watch that asks the debugger to evaluate something that uses multiprocessing, which in turn starts those processes).

One trick to evaluate where threads are being created could be adding:

import traceback
traceback.print_stack()

in threading.Thread.start to see where those threads are being started (note that some internal threads from the debugger are expected, but they're not the issue here).

For the multiprocessing, I think you could do something analogous, but I'm not sure the proper place in this case -- maybe process.BaseProcess.start would be the place to look at.

Anyway, this doesn't seem to be an issue in the debugger, so I'm closing it.

@fabioz fabioz closed this as completed Jul 28, 2020
@fabioz
Collaborator

fabioz commented Jul 28, 2020

Also, something to double-check in your code is whether the main entry point is protected by the __name__ == '__main__' check:

if __name__ == '__main__':
    main()

and depending on the structure, you may also need multiprocessing.freeze_support() for multiprocessing support:

i.e.:

if __name__ == '__main__':
    multiprocessing.freeze_support()
    main()

(if you don't have that structure, it may explain why multiprocessing is executing your main entry point code multiple times).

@AndreasLuckert
Author

I have neither the first nor the second option implemented, and I've just realized it can happen with any script I execute.
It seems like old debugging sessions are not really terminated behind the scenes, because in one session there is no multi-threading and then, later, multiple threads are launched when debugging.
This has also happened when I executed the code normally and it exited with errors, and THEN I went (again) into a debugging session.

Do you recommend me to put

if __name__ == '__main__':
    multiprocessing.freeze_support()
    main()

at the beginning of all the Python scripts (.py files) that I execute as the main script, i.e. my entry-point or top-level scripts?

@AndreasLuckert
Copy link
Author

AndreasLuckert commented Aug 10, 2020

I'd be glad to learn whether putting the above-mentioned code lines into my main scripts enables me to avoid the multi-threading during my debugging sessions.
Thanks in advance!

@int19h
Collaborator

int19h commented Aug 10, 2020

It should generally be at the end of your script, and all top-level code that's not already in a function should be inside main(). The reason is that your script is going to be imported as a module in every subprocess that multiprocessing spawns, so any top-level code will run many times without the guard. But only the instance that's directly started as a script will have __name__ == "__main__".
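Put together, a script structured as described might look like this (a minimal sketch; main and run_import are illustrative names, not taken from the reporter's code):

```python
import multiprocessing

def run_import(path):
    # Placeholder for the per-file work done in a worker process.
    return path.upper()

def main():
    # All top-level work lives here, so re-importing this script in a
    # child process triggers nothing.
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(run_import, ["a.csv", "b.csv"])
    print(results)

if __name__ == "__main__":
    # Only the instance started directly as a script reaches this point;
    # children imported as modules see a different __name__.
    multiprocessing.freeze_support()
    main()
```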

@AndreasLuckert
Author

Alright, I didn't know that even the top-level/main script gets imported as a module, possibly several times, during a debugging session involving multiprocessing.

I thought I only needed to make sure that sub-level scripts don't contain code outside of functions, in order to prevent their multiple or generally undesired execution via e.g. import sublevel_script.
That even the main script which I run in the first place gets imported more than once was new to me (if I understood that point correctly).

Moreover, as for the recommendation to put

if __name__ == '__main__':
    multiprocessing.freeze_support()
    main()

at the end of my main script (and possibly all the other sub-level scripts as well, which contain code outside of functions),
I've found the following docs for multiprocessing.freeze_support().

There, it states the following:

Calling freeze_support() has no effect when invoked on any operating system other than Windows. In addition, if the module is being run normally by the Python interpreter on Windows (the program has not been frozen), then freeze_support() has no effect.

This in turn means that I cannot apply this to my case since, as I'd mentioned in my initial question above, my OS is:

Ubuntu 18.04.4 LTS
Kernel: Linux 5.4.0-42-generic
Architecture: x86-64

Now I'm wondering whether there is something equivalent for UNIX, i.e. Linux systems.
I've come across this blog post, which states among other things:

[...]
On Linux, when you start a child process, it is Forked. It means that the child process inherits the memory state of the parent process. On Windows (and by default on Mac), however, processes are Spawned. It means that a new interpreter starts and the code reruns.

It explains why, if we run the code on Windows, we get twice the line Before defining simple_func. As you may have noticed, this could have been much worse if we wouldn't include the if main at the end of the file [...]

@int19h
Collaborator

int19h commented Aug 11, 2020

The important part is calling main() from under the guard - freeze_support() is just a part of the standard verbiage that's written to be as platform-independent as possible.

Linux does indeed fork by default, but it can also use spawn via set_start_method(). Your call stack above - the one with multiprocessing/spawn.py in it - indicates that to be the case.
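For illustration, opting into spawn on Linux could look like this (a minimal sketch; work is a stand-in function, not from the reporter's code):

```python
import multiprocessing

def work(x):
    # Trivial function executed in the worker processes.
    return x * 2

def main():
    # Explicitly select the spawn start method (the default on Windows
    # and macOS; opt-in on Linux, where fork is the default). It may
    # only be set once per program.
    multiprocessing.set_start_method("spawn")
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(work, [1, 2, 3]))

if __name__ == "__main__":
    multiprocessing.freeze_support()
    main()
```

Under spawn, each child starts a fresh interpreter and re-imports this file, which is exactly why the __main__ guard matters.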

@AndreasLuckert
Author

Okay thanks, so I will implement the aforementioned snippet at the end of my script:

if __name__ == '__main__':
    multiprocessing.freeze_support()
    main()

since the line multiprocessing.freeze_support() does no harm on UNIX systems and works on Windows.

By the way, does the expression

from under the guard

stand for if __name__ == '__main__':?

@int19h
Collaborator

int19h commented Aug 11, 2020

Yep! For more details, see this section in Python docs.
