
torch.utils.data failing probably due to atexit issues #1056

Closed
brettcannon opened this issue Nov 28, 2018 · 14 comments

@brettcannon
Member

microsoft/vscode-python#3439

torch.utils.data registers an atexit callback. The output in the original issue suggests there's something not being handled quite right in this instance.

@fangxu622

fangxu622 commented Jan 6, 2019

I have a similar issue.

VS Code version: 1.29 and 1.30
Ubuntu 18.04
Python: 2.7.15 and 2.7.13 (Anaconda), PyTorch 0.4 and 0.3
The output looks like this:

 cd /home/fangxu/fangxu/deep-speed-constrained-ins ; env "PYTHONIOENCODING=UTF-8" "PYTHONUNBUFFERED=1" /home/fangxu/anaconda3/envs/dl-imu/bin/python /home/fangxu/.vscode/extensions/ms-python.python-2018.12.1/pythonFiles/ptvsd_launcher.py --default --client --host localhost --port 40687 /home/fangxu/fangxu/deep-speed-constrained-ins/python/DCI-training-0.0.2.py
Backend Qt5Agg is interactive backend. Turning interactive mode on.
We've got an error while stopping in post-mortem: <type 'exceptions.RuntimeError'>
Traceback (most recent call last):

I also added the register code, but it did not solve the problem.

@yonidishon

I found a workaround that resolves this issue: https://stackoverflow.com/questions/53660465/vscode-bug-with-pytorch-dataloader

@tlind

tlind commented Jan 24, 2019

I am experiencing a similar problem with the MXNet deep learning framework: when I invoke a gluon.data.DataLoader with num_workers > 0 while the VS Code debugger is attached, the Python code also crashes with an "error while stopping in post-mortem" (exceptions.RuntimeError). The stack trace contains:

Error in atexit._run_exitfuncs:
Error in sys.exitfunc:

@yoon28

yoon28 commented Apr 4, 2019

Is there any progress on this issue? I have the same problem.

@fabioz
Contributor

fabioz commented Apr 5, 2019

As a note, right now it's not possible to use the debugger for subprocesses on Linux on Python 2.7 because fork is not supported (I'm just guessing here, but if num_workers==0 makes it work, that's probably what's happening).

Related issue: #943

@yoon28

yoon28 commented Apr 6, 2019

As a note, right now it's not possible to use the debugger for subprocesses on Linux on Python 2.7 because fork is not supported (I'm just guessing here, but if num_workers==0 makes it work, that's probably what's happening).

Related issue: #943

I am using Python 3.6 on Ubuntu 18.04 and have the SAME problem.

@fabioz
Contributor

fabioz commented Apr 6, 2019

On Python 3.6 you have to put the code below at the start of your code:

import multiprocessing
multiprocessing.set_start_method('spawn', True)

to set it not to use fork... can you try that to see if it fixes it for you?
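
For a typical PyTorch training script, that suggestion might look like the sketch below (the TensorDataset is just a toy stand-in for a real dataset; note the if __name__ == '__main__': guard, which becomes important further down this thread):

    import multiprocessing

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def main():
        # Toy dataset, purely illustrative.
        dataset = TensorDataset(torch.arange(100.0))
        # num_workers > 0 starts worker subprocesses, which is what
        # triggers the debugger problem under the default 'fork' method.
        loader = DataLoader(dataset, batch_size=10, num_workers=2)
        for (batch,) in loader:
            print(batch.sum())

    if __name__ == '__main__':
        # Force the 'spawn' start method before any workers are created.
        multiprocessing.set_start_method('spawn', True)
        main()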

@yoon28

yoon28 commented Apr 8, 2019

On Python 3.6 you have to put the code below at the start of your code:

import multiprocessing
multiprocessing.set_start_method('spawn', True)

to set it not to use fork... can you try that to see if it fixes it for you?

WOW! It works!
Your solution completely resolves the problem. Thanks.
May I ask why this solution works?
I am curious about how it works.

@fabioz
Contributor

fabioz commented Apr 15, 2019

It works because instead of having multiprocessing use fork to create a new process (which is still not supported by the debugger), it creates a new (clean) process (which is supported by the debugger).
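
For anyone unsure which start method their platform is using, the standard library exposes it directly (a quick check):

    import multiprocessing

    # 'fork' is the default on Linux; Windows (and macOS on recent
    # Python versions) default to 'spawn'.
    print(multiprocessing.get_start_method())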

This issue (supporting fork in the debugger) is being tracked in #943, so, closing this one as a duplicate.

@fabioz fabioz closed this as completed Apr 15, 2019
@zkailinzhang

zkailinzhang commented Apr 18, 2019

I also put this code at the start of my code:

    import multiprocessing 
    multiprocessing.set_start_method('spawn', True)

but I get:


File "/home/anaconda3/envs/tfgpu12/lib/python3.6/multiprocessing/spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

@shield218

On Python 3.6 you have to put the code below at the start of your code:

import multiprocessing
multiprocessing.set_start_method('spawn', True)

to set it not to use fork... can you try that to see if it fixes it for you?

It works for me. Thank you so much for sharing this simple and effective solution!

@MaxxRe3

MaxxRe3 commented May 30, 2019

I also put this code at the start of my code:

    import multiprocessing 
    multiprocessing.set_start_method('spawn', True)

but I get:


File "/home/anaconda3/envs/tfgpu12/lib/python3.6/multiprocessing/spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

I had the same issue: it would just raise the exception with the freeze_support() suggestion. I was able to solve it by setting num_workers=0 in my DataLoader, as mentioned here.
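
For reference, a minimal sketch of that workaround (toy dataset, assuming the standard torch.utils.data API): with num_workers=0 all loading happens in the main process, so no worker subprocesses are created and the debugger issue is sidestepped entirely, at the cost of single-process data loading.

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    dataset = TensorDataset(torch.arange(100.0))  # illustrative stand-in
    # num_workers=0 loads data in the main process: no fork/spawn at all.
    loader = DataLoader(dataset, batch_size=10, num_workers=0)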

@karthiknadig
Member

karthiknadig commented May 30, 2019

@MaxxRe3 This occurs if the spawn happens before the main module is fully loaded. You may have to reorganize your code a bit, like below:

def main():
    # your code here
    pass

if __name__ == '__main__':
    main()

for example:

import multiprocessing
import concurrent.futures

FIBS = [28, 10]

def fib(n):
    print(n)
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)


def main():
    multiprocessing.set_start_method('spawn', True)
    with concurrent.futures.ProcessPoolExecutor() as executor:
        executor.map(fib, FIBS)


if __name__ == '__main__':
    main()

Update: changed example based on comment by @int19h below.

@int19h
Contributor

int19h commented May 30, 2019

Note that set_start_method must also be inside main(), or otherwise guarded by if __name__ == '__main__':. Otherwise it gets executed a second time when the child process imports the module during spawn bootstrapping, and the method will raise an exception on the second call.

https://docs.python.org/3/library/multiprocessing.html
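
A minimal sketch of the safe placement (the worker function is just a toy):

    import multiprocessing

    def work(n):
        return n * n

    if __name__ == '__main__':
        # Only the parent process executes this branch, so the start
        # method is set exactly once; calling set_start_method again
        # (without force=True) would raise a RuntimeError.
        multiprocessing.set_start_method('spawn')
        with multiprocessing.Pool(2) as pool:
            print(pool.map(work, range(5)))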
