This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

subprocess: Support for fork in subprocess debugging #943

Closed
karthiknadig opened this issue Oct 19, 2018 · 21 comments

Comments

@karthiknadig
Member

karthiknadig commented Oct 19, 2018

Two possible options here:
Option 1: refactor main, daemon, and session code to allow teardown and restart.

  • Pro: Cleaner solution. We are going to refactor the above-mentioned items irrespective of option 1 or 2.
  • Con: This is a much bigger work item.

Option 2: Delete all loaded ptvsd and pydevd modules, and attempt a fresh ptvsd attach.

  • Pro: Smaller work item.
  • Con: Potentially long bug tail due to unpredictable state when fork occurs.
@karthiknadig
Member Author

To do this right, we need a clean shutdown of ptvsd. #799 tracks the work needed for that.

@sjdv1982

As long as this is unsupported, could you make it fail loudly? For now, it just hangs.

@fabioz
Contributor

fabioz commented Apr 8, 2019

As a note for anyone using multiprocessing arriving here, on Python 3 it's usually possible to add:

import multiprocessing
multiprocessing.set_start_method('spawn', True)

to the start of the program to make multiprocessing work (because that way multiprocessing will not use fork).
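
For illustration, a minimal sketch of that workaround (the worker function and the __main__ guard are illustrative additions; the guard matters because spawn re-imports the main module in each child):

import multiprocessing

def work(x):
    return x * x

if __name__ == '__main__':
    # Force 'spawn' so the debugger never has to follow a fork().
    multiprocessing.set_start_method('spawn', True)
    with multiprocessing.Pool(2) as pool:
        print(pool.map(work, range(10)))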

@supertaz

supertaz commented Apr 9, 2019

(because that way multiprocessing will not use fork).

True, but incompatible with many basic use cases for multiprocessing.Pool, and it can lead to running out of memory pretty quickly due to recursive process spawning and unexpected memory usage patterns. spawn() starts a fresh interpreter process that re-imports the main module and has to pickle over whatever state the child needs, whereas fork() splits a child off from the parent, sharing the parent's memory copy-on-write, and continues execution with the next instruction. This difference is why spawn() is slow and fork() is fast. It also means that changing the setting usually results in processes that execute very different code paths from the moment they are invoked.
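
A minimal sketch (with illustrative names) that makes the entry-point difference visible -- under spawn the module-level print runs again in every worker process, under fork it runs only once in the parent:

import multiprocessing
import os

print("module-level code running in pid", os.getpid())

def work(x):
    return x + 1

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')  # change to 'fork' to compare
    with multiprocessing.Pool(2) as pool:
        print(pool.map(work, [1, 2, 3]))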

The following common pattern is used in scripts that expect to run a single thread, except to occasionally fork() n scope-limited processes that allow data to be processed in parallel, and is based on a basic use case in the multiprocessing docs:

import numpy as np
import pandas as pd
from multiprocessing import Pool

def parallel_df_func(df, func, cores=4, partitions=10):
    # Split the DataFrame into chunks and map func over them in worker processes.
    df_chunks = np.array_split(df, partitions)
    with Pool(cores) as pool:
        result = pool.map(func, df_chunks)
    return pd.concat(result)

The above code is written for fork(), and trying to debug it with spawn() creates a spawn bomb (under spawn, each worker re-imports the main module, so any unguarded module-level code that kicks off the parallel work runs again in every child). Luckily spawn() is slow, so even recursively spawning ~20 more processes per iteration, each loading a fresh copy of a multi-million row dataset from storage, only leaks an average of 2-3 GB per second until around 10-15 seconds in, when the exponential growth really starts to accelerate. A smaller dataset would lead to faster growth, though it would eventually bottleneck on I/O if you had enough RAM.

This workaround is a Bad Idea(tm) to try if you're not running (and watching) top, with killall python ready in a terminal, on a machine with copious amounts of RAM, unless your script is actually designed to use spawn() or to run in its entirety in multiple processes. The debugger doesn't catch and stop the runaway processes, so if you're not watching for this behavior, you'll get a nasty surprise pretty quickly.

@int19h
Contributor

int19h commented Apr 9, 2019

We're still actively working on the code refactoring that is necessary for us to support fork properly. It's a tricky thing to get right because of the issues fork has with multi-threading (orphaned locks etc.) when it's not immediately followed by exec - and we use threads heavily in the debugger itself, so this applies even when debugging single-threaded programs.
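
As an illustrative, POSIX-only sketch of the orphaned-lock problem (the names and timing are made up, not debugger code): a lock acquired by another thread in the parent stays locked forever in the forked child, because the owning thread simply doesn't exist there:

import os
import threading
import time

lock = threading.Lock()
threading.Thread(target=lock.acquire, daemon=True).start()
time.sleep(0.1)  # let the helper thread grab the lock

pid = os.fork()
if pid == 0:
    # Child: only the forking thread survives, so nothing can ever release the lock.
    print("child sees lock held:", lock.locked())
    print("child could acquire it:", lock.acquire(timeout=0.5))
    os._exit(0)
os.waitpid(pid, 0)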

I'm not sure I quite follow your code example, though. Wouldn't it create the same number of processes regardless of how they're spawned? Or are you saying that it's not such a big deal with fork, because they're all going to share most of their memory pages?

@fabioz
Contributor

fabioz commented Apr 10, 2019

@supertaz I do agree with you that there may be caveats when switching from fork to spawn, especially if you have a use case optimized for fork (so, thanks for that note, and sorry if I didn't give the proper warnings before).

I'd also like to point out that it's very easy to shoot yourself in the foot with fork unless you really understand its pitfalls, so my recommendation would be to use spawn as the default unless you have a case which absolutely needs fork.

That's especially true on CPython: although fork gives you copy-on-write memory, the fact that CPython is reference-counted ends up making it effectively copy-on-read -- merely reading an object changes its reference count, so the memory holding it has to be copied at that point. See https://pycon-2012-notes.readthedocs.io/en/latest/python_linkers_and_virtual_memory.html for a PyCon presentation on it.
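
A tiny sketch of why even reads dirty memory pages under CPython's reference counting (illustrative only):

import sys

obj = object()
print(sys.getrefcount(obj))  # e.g. 2: the variable plus getrefcount's own argument
alias = obj                  # merely taking another reference to obj...
print(sys.getrefcount(obj))  # ...bumps the counter stored inside the object, so the
                             # page holding it is written to; in a forked child that
                             # write is what forces the copy-on-write copy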

And that's besides the regular caveats with locks/uis/file descriptors/buffers/threads/etc ;)

p.s.: that's not to say there aren't cases where fork is the better option, nor that the debugger won't support it in the future -- as @int19h pointed out, we're working on that; I just wanted to present the current workaround for using the debugger until that work is finished.

@supertaz

Or are you saying that it's not such a big deal with fork, because they're all going to share most of their memory pages?

It's got a lot to do with the problem at hand, where very large data structures are split into chunks for processing. Because fork() is CoW (even if reference counting makes it CoR, the point still holds), when you're trying to accelerate processing of large datasets on NUMA systems with many cores (especially with HT pipelines), and you aren't overwriting data but instead aggregating results and then handling them in the parent at completion, you get a near-linear improvement in processing time with fork() because you're not copying much (if any) memory into the child. The same can't be said of spawn(), where each child has to rebuild or reload the state it needs from scratch, which is slow.

Also, spawn() and fork() differ in where the entry point for the child is. Because Python's multiprocessing package abstracts some of this away (though it's accessible), and because the code is designed to be short-lived and operate as parallel inline execution of a single function, rather than as part of a long-lived pool operating in a dispatcher-worker pattern, there are some pretty big differences in how the code behaves. fork() followed by exec() behaves much like spawn(), but other fork() usage patterns do not, and I think this is where any confusion is coming in. The above code isn't resetting state and entry point via exec(), because the children have one line of code to execute (which calls others, but there is no branching) before they return data to the parent and die. This is efficient with fork() only, and is a workaround for Python's GIL hamstringing threading.

The pattern I supplied is used in data science and data engineering to handle processing of very large datasets via pandas DataFrames and other similar structures. I fully acknowledge that someone who didn't know how to properly use fork() or spawn() could wreak havoc on a system (typically their own, as most multi-occupancy systems limit a user's resources), but I've been using both for decades and understand the case for one over the other. spawn() is useful where it is appropriate, it's just not useful in these types of cases, and I was trying to illustrate that blindly switching to spawn() in code not written for it can be just as bad as not knowing how to use fork(), just slower. It's a good illustration, however, of how either fork() or spawn() can be abused if the wrong one is used in code designed for a use case where only the other is appropriate.

Someone who doesn't truly understand how fork() and spawn() work, who was trying to debug something based on a common pattern often recommended for speeding up processing in Python, and who wasn't watching system resources (or didn't know to), might not realize they were creating a recursive memory monster. So I thought it important to point out that the workaround isn't a workaround for anything that uses multiprocessing for short-lifetime, inline purposes where fork() without exec() is the appropriate solution. I was also trying to stress the importance of an actual solution to the debugging issue (as difficult as it may be to create one), over offering a workaround without explaining why that workaround is limited in scope and applicability.

Sorry it took me so long to respond; I hadn't noticed the notification icon, and I missed the notification emails. I hope I addressed the questions, and feel free to ask for more clarifications, as mine may have further obfuscated the point (my concentration is flagging at present).

@memeplex

For me, setting the start method to spawn or forkserver just avoids the exception thrown for fork, but breakpoints in subprocesses are still not working at all. Do they work for you?

@memeplex

memeplex commented Sep 29, 2019

Ah, it seems that I have to create a launch configuration and enable subProcess. Is that true, or can I avoid this step?

@karthiknadig
Member Author

@memeplex that is correct. "subProcess": true has to be set in the debug config for the workaround.
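
For example, a minimal launch.json entry with that flag (the name and program values are just placeholders):

{
    "name": "Python: Current File",
    "type": "python",
    "request": "launch",
    "program": "${file}",
    "console": "integratedTerminal",
    "subProcess": true
}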

@Breich90

Breich90 commented Nov 7, 2019

Any update here? Spawning isn't a feasible alternative.

@int19h
Contributor

int19h commented Nov 8, 2019

@Breich90 My apologies - we haven't done a good job of tracking the work on the new implementation in a way that's easy to follow. At this point, it's mostly centralized in this issue, which references a bunch of others: #1706

TL;DR is that the new implementation is already committed and does fix the fork issue, but we need to do a few more bug fixes and polish before we can ship it as stable. There's a pre-release build for it, ptvsd 5.0.0a7, but it has to be mated with a supporting build of VSCode - and there isn't a ready-made one yet, so there's no easy way to test it without building things locally. We'll have something available for testing soon, though.

@AirVetra

Hello, I'm very sorry, I just don't understand what we should do - just wait for the solution and follow #1706?

The error is the following:

Traceback (most recent call last):
  File "/home/airvetra/.vscode-server/extensions/ms-python.python-2019.11.50794/pythonFiles/ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "/home/airvetra/.vscode-server/extensions/ms-python.python-2019.11.50794/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 432, in main
    run()
  File "/home/airvetra/.vscode-server/extensions/ms-python.python-2019.11.50794/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 316, in run_file
    runpy.run_path(target, run_name='__main__')
  File "/usr/lib/python3.7/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/airvetra/ml/ML/autots/run_local_test.py", line 118, in <module>
    main()
  File "/home/airvetra/ml/ML/autots/run_local_test.py", line 114, in main
    run(dataset_dir, code_dir)
  File "/home/airvetra/ml/ML/autots/run_local_test.py", line 80, in run
    ingestion_process.start()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'run.<locals>.run_ingestion'

@int19h
Contributor

int19h commented Nov 26, 2019

In general, you'll need a recent pre-release of ptvsd, and the corresponding version of VSCode, to have fork supported. While the issue is still open because we're still working on some bits and pieces, the bulk of it is already there in ptvsd 5 alphas.

However, this particular call stack doesn't look to me like a typical ptvsd failure due to fork. It mentions pickling and popen_spawn_posix, so this sounds more like what happens after you do set_start_method("spawn"). In this case the error is due to some data that you're trying to share between your processes not being picklable.
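
For illustration, a sketch of that failure mode and the usual fix (the function names loosely mirror the traceback but are reconstructions, not the actual project code):

from multiprocessing import get_context

def run():
    def run_ingestion():          # local function: spawn has to pickle the Process
        print("ingesting")        # target, and local objects can't be pickled
    p = get_context('spawn').Process(target=run_ingestion)
    p.start()                     # AttributeError: Can't pickle local object ...

# The usual fix is to move the target to module level so it can be pickled by name:
def ingestion(dataset_dir, code_dir):
    print("ingesting", dataset_dir, code_dir)

if __name__ == '__main__':
    p = get_context('spawn').Process(target=ingestion, args=('data/', 'code/'))
    p.start()
    p.join()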

@AirVetra

Pavel, you are definitely right - I added set_start_method("spawn"), and after adding it the above error started to occur. Before that, there was the multiprocessing error RuntimeError: already started.

So, could you advise how to solve the issue with debugging such code? I could share it if that would help...

@int19h
Contributor

int19h commented Dec 10, 2019

This is complete. The remaining work to ship the new adapter is being tracked by #1706.

@gauravmunjal13

Hi Team,

I came across a strange issue using VS Code to debug PyTorch code, on the enumerate(data_loader) line.

The error is:
RuntimeError: already started
E00019.065: Exception escaped from start_client
...
AssertionError: can only join a child process

This is happening because of the multiprocessing done in the data loader. It started specifically when I updated VS Code to 1.46.1 yesterday.

Searching the net, I figured out that the solution is to set num_workers=0 or to use the following code before enumerating the data loader:
import multiprocessing
multiprocessing.set_start_method('spawn', True)

Is there another way to resolve this issue?

Thanks and regards,
Gaurav Kumar

@galfaroth

galfaroth commented Jun 22, 2020

+1 @gauravmunjal13. After updating VS Code I have the same exception. Adding his multiprocessing lines doesn't work for me. Before the update, debugging was working correctly. Setting num_workers to 0 works but is too slow for my training.
