cx_Freeze with torch.multiprocessing using wrong source in child processes #2376

Closed
dmagee opened this issue May 1, 2024 · 31 comments · Fixed by #2382

Comments

@dmagee

dmagee commented May 1, 2024

Prerequisite

  • Make sure no duplicated issue has already been reported. You should look for closed issues, too.
  • Make sure you are not asking us to help to solve your specific issue. GitHub issues are opened mainly for development purposes. If you want to ask someone to help to solve your problem, go to some community site like StackOverflow, etc.
  • Make sure your problem is not derived from packaging (e.g. Homebrew).

Describe the bug
On Linux, when I use cx_Freeze with a Python script that uses torch.multiprocessing to create multiple processes (which essentially calls multiprocessing), the child processes seem to try to use the original Python files (for the program) and the original Python environment (for Python modules), not the ones in the build directory. The initial result is errors about the program's source .py files not being found. Other errors can occur if the source is copied into the build folder.

To Reproduce
Environment: Linux, Python 3.11, PyTorch v2.2.2+cu121. [Note: This problem does not occur on Windows.]

Minimal source (Minimal.py):

import os

# Select the Keras backend before anything imports Keras.
os.environ["KERAS_BACKEND"] = "torch"

import torch


def per_device_launch_fn(current_gpu_index, num_gpu):
    # Stand-in for the real per-GPU training loop.
    for _ in range(1, 1000):
        print("Train...")


num_gpu = 4

if __name__ == "__main__":
    print("Starting multiprocessing:" + str(num_gpu))
    torch.multiprocessing.start_processes(
        per_device_launch_fn,
        args=(num_gpu,),
        nprocs=num_gpu,
        join=True,
        start_method="spawn",
    )

The build script is:

import os
import sys

from cx_Freeze import setup, Executable

sys.setrecursionlimit(5000)

os.environ["KERAS_BACKEND"] = "torch"

build_exe_options = {
    "packages": ["onnx", "numpy", "torch", "PIL", "torchvision", "keras", "sympy", "integrals", "multiprocessing"]
}

setup(
    name="Minimal",
    version="1.0",
    description="Minimal",
    options={"build_exe": build_exe_options},
    executables=[Executable("Minimal.py")],
)

Expected behavior
I would expect the pyc versions of code in the build folder to be used under all circumstances (even by child processes), not the original ones.
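
One way to check which files a process actually sees is to log a few interpreter facts from both the parent and the function run in each child. The sketch below is only an illustration: describe_runtime is a hypothetical helper name, and it assumes that the frozen executable sets sys.frozen (an unfrozen run reports False).

```python
import sys


def describe_runtime():
    """Collect facts that show whether code runs from the frozen build."""
    main_module = sys.modules.get("__main__")
    return {
        # Assumption: the frozen executable sets sys.frozen; plain Python does not.
        "frozen": bool(getattr(sys, "frozen", False)),
        # Path of the interpreter or frozen executable running this process.
        "executable": sys.executable,
        # File that the __main__ module claims to come from, if any.
        "main_file": getattr(main_module, "__file__", None),
    }


if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print(f"{key}: {value}")
```

Calling describe_runtime() at the top of per_device_launch_fn and again under the main guard would show whether parent and children agree on the executable and the main file.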

Desktop (please complete the following information):

  • Platform information (e.g. Ubuntu Linux 22.04): Linux (not sure of the version; it's an HPC)
  • OS architecture (e.g. amd64): intel64
  • cx_Freeze version [e.g. 6.11]: 6.15.16
  • Python version [e.g. 3.10]: 3.11

dmagee closed this as completed May 1, 2024
dmagee reopened this May 5, 2024
@dmagee
Author

dmagee commented May 5, 2024

Re-opening, as the fix I previously posted doesn't actually work (unless the source is present in the launch folder).

@marcelotduarte
Owner

marcelotduarte commented May 5, 2024

On Linux, when I use cx_Freeze with a Python script that uses torch.multiprocessing to create multiple processes (which essentially calls multiprocessing), the child processes seem to try to use the original Python files (for the program) and the original Python environment (for Python modules), not the ones in the build directory. The initial result is errors about the program's source .py files not being found. Other errors can occur if the source is copied into the build folder.

The paths shown in the traceback are debug information only; they can be changed with replace_paths.

The real bug, however, is most likely in the use of multiprocessing. With the stdlib's multiprocessing we need to call freeze_support, but torch.multiprocessing probably does not provide this function, so a workaround needs to be investigated.
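
For reference, the stdlib pattern mentioned here looks roughly like the sketch below. It uses only multiprocessing (no torch), and worker and launch are hypothetical names chosen for illustration. Per the Python documentation, freeze_support() has no effect on operating systems other than Windows.

```python
import multiprocessing


def worker(index, total):
    # Stand-in for per-process work.
    print(f"worker {index} of {total}")


def launch(nprocs, start_method="spawn"):
    """Start nprocs children with the given start method and return their exit codes."""
    ctx = multiprocessing.get_context(start_method)
    procs = [ctx.Process(target=worker, args=(i, nprocs)) for i in range(nprocs)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    return [proc.exitcode for proc in procs]


if __name__ == "__main__":
    # In a frozen program this call belongs right after the main guard,
    # before any child process is created.
    multiprocessing.freeze_support()
    print(launch(4))
```

Freezing a torch-free script like this one could help separate a problem in the stdlib spawn path from anything specific to torch.multiprocessing.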

@dmagee
Author

dmagee commented May 5, 2024

I tried freeze_support(), which works (and is needed) on Windows, but not on Linux. I'm not sure which paths I would replace. It appears to look for the app's Python files in the folder that the executable is run from, and throws an error that it can't find them.

e.g. running from the build folder....

FileNotFoundError: [Errno 2] No such file or directory: '/some/folder/build/exe.linux-x86_64-3.11/Minimal.py'
(repeated once per child process)

If you run it from another location it complains they are not in that location (always with the full path of that location).

D.
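
The behaviour described here (the error always names the directory the executable was launched from) is consistent with how the spawn start method appears to locate the main script: a relative __main__.__file__ is joined to the directory that was current at startup. The sketch below mimics that one step in isolation. resolve_main_path is a hypothetical helper, not the stdlib code itself, and it uses posixpath so the result is the same on any platform.

```python
import posixpath


def resolve_main_path(main_file, launch_dir):
    """Resolve a possibly relative main script path against the launch directory."""
    if not posixpath.isabs(main_file) and launch_dir is not None:
        main_file = posixpath.join(launch_dir, main_file)
    return posixpath.normpath(main_file)


print(resolve_main_path("Minimal.py", "/some/folder/build/exe.linux-x86_64-3.11"))
# prints /some/folder/build/exe.linux-x86_64-3.11/Minimal.py
```

If that reading is right, the child is not looking in the original source tree at all; it is rebuilding a path from a relative file name that no longer exists on disk once the script is frozen.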

@marcelotduarte
Owner

Can you test with cx_Freeze 7.0 and with the development release?

You can test with the latest development build:
pip install --force --no-cache --pre --extra-index-url https://marcelotduarte.github.io/packages/ cx_Freeze
For conda-forge the command is:
conda install -y --no-channel-priority -S -c https://marcelotduarte.github.io/packages/conda cx_Freeze

@dmagee
Author

dmagee commented May 5, 2024

There's still an issue with finding the source:

$ ./Minimal
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.dat
--MAXPATHLEN: 4096
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.zip
Starting multiprocessing:4
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.dat
--MAXPATHLEN: 4096
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.zip
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.dat
--MAXPATHLEN: 4096
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.zip
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.dat
--MAXPATHLEN: 4096
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.zip
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.dat
--MAXPATHLEN: 4096
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.zip
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.dat
--MAXPATHLEN: 4096
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.zip
Traceback (most recent call last):
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/__startup__.py", line 141, in run
Traceback (most recent call last):
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/__startup__.py", line 141, in run
    module_init.run(name + "__main__")
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
    module_init.run(name + "__main__")
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
    exec(code, module_main.__dict__)
  File "Minimal.py", line 4, in <module>
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/__init__.py", line 49, in <module>
    exec(code, module_main.__dict__)
  File "Minimal.py", line 4, in <module>
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/__init__.py", line 49, in <module>
Traceback (most recent call last):
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/__startup__.py", line 141, in run
    module_init.run(name + "__main__")
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
    exec(code, module_main.__dict__)
  File "Minimal.py", line 4, in <module>
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/__init__.py", line 49, in <module>
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 79, in freeze_support
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 79, in freeze_support
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 79, in freeze_support
Traceback (most recent call last):
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/__startup__.py", line 141, in run
    module_init.run(name + "__main__")
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
    exec(code, module_main.__dict__)
  File "Minimal.py", line 4, in <module>
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/__init__.py", line 49, in <module>
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 79, in freeze_support
    spawn_main(**kwds)
    spawn_main(**kwds)
    spawn_main(**kwds)
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
    spawn_main(**kwds)
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
    exitcode = _main(fd, parent_sentinel)
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
    exitcode = _main(fd, parent_sentinel)
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
    prepare(preparation_data)
    prepare(preparation_data)
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
    prepare(preparation_data)
    prepare(preparation_data)
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    _fixup_main_from_path(data['init_main_from_path'])
    _fixup_main_from_path(data['init_main_from_path'])
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    _fixup_main_from_path(data['init_main_from_path'])
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
    main_content = runpy.run_path(main_path,
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 254, in _get_code_from_file
  File "<frozen runpy>", line 254, in _get_code_from_file
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/Minimal.py'
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/Minimal.py'
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 254, in _get_code_from_file
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/Minimal.py'
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 254, in _get_code_from_file
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/Minimal.py'
Traceback (most recent call last):
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/__startup__.py", line 141, in run
    module_init.run(name + "__main__")
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
    exec(code, module_main.__dict__)
  File "Minimal.py", line 28, in <module>
  File "Minimal.py", line 18, in main
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
              ^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 148, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 255

@marcelotduarte
Owner

From what I understand you are using conda for Linux. What command did you use to install this specific version of Torch?

@dmagee
Author

dmagee commented May 6, 2024

Actually I set up the environment with conda, but used pip to install the modules as I couldn't get the versions I needed with conda. I think the command was:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
(from https://pytorch.org/get-started/locally/)

@marcelotduarte
Owner

Using your Minimal.py and the command line:
cxfreeze --script Minimal.py build_exe --replace-paths '*='
the patch works with torch installed via pip on Linux:
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cpu

It will be available in cx_Freeze 7.1.0.dev16.

@marcelotduarte
Owner

marcelotduarte commented May 7, 2024

https://cx-freeze--2382.org.readthedocs.build/en/2382/faq.html#multiprocessing-support

You can test the patch in the latest development build:
pip install --force --no-cache --pre --extra-index-url https://marcelotduarte.github.io/packages/ cx_Freeze
For conda-forge the command is:
conda install -y --no-channel-priority -S -c https://marcelotduarte.github.io/packages/conda cx_Freeze

@dmagee
Author

dmagee commented May 7, 2024

So I did:

cxfreeze --script Minimal.py build_exe --replace-paths '*='

And, I now get the error:

FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/=/Minimal.py'

@marcelotduarte
Owner

marcelotduarte commented May 7, 2024

Please check if you have cx_Freeze 7.1.0.dev16 with:
cxfreeze --version
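
The same check can be done from inside the environment's Python, which may help when a pip install and a conda install could be shadowing each other. This is a minimal sketch, assuming the distribution is registered under the name cx_Freeze; installed_version is a hypothetical helper.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("cx_Freeze:", installed_version("cx_Freeze"))
```

Running it with the same interpreter that runs cxfreeze shows which version that environment actually resolves.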

@dmagee
Author

dmagee commented May 7, 2024

Actually it was cxfreeze 7.1.0-dev15. Not sure how that happened, as I followed your instructions. I just tried it again and now I have 7.1.0-dev16. However, same output:

FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/=/Minimal.py'

@marcelotduarte
Owner

Uninstall cx_Freeze and reinstall. Are you using the pip or conda version? Probably some conflict.

@dmagee
Author

dmagee commented May 7, 2024

I was using pip. I uninstalled and reinstalled via pip, and got the same error. I then tried uninstalling via pip and installing via conda, and I get:

$ conda install -y --no-channel-priority -S -c https://marcelotduarte.github.io/packages/conda cx_Freeze
Retrieving notices: ...working... done
Collecting package metadata (current_repodata.json): failed

UnavailableInvalidChannel: HTTP 404 NOT FOUND for channel packages/conda https://marcelotduarte.github.io/packages/conda

The channel is not accessible or is invalid.

You will need to adjust your conda configuration to proceed.
Use conda config --show channels to view your configuration's current state,
and use conda config --show-sources to view config file locations.

@marcelotduarte
Owner

Initially, I ran two tests; please do the same if you can, to rule out other bugs. I created one new environment using the system Python and another using conda. If you only test the second option, the way I tested it, that is already enough.
Then I installed cx_Freeze and PyTorch:

pip install --force --no-cache --pre --extra-index-url https://marcelotduarte.github.io/packages/ cx_Freeze
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cpu

Note that I used the CPU version; please use that too. I will test with CUDA afterwards.

@dmagee
Author

dmagee commented May 7, 2024

In the meantime I re-installed using conda by doing:

wget https://marcelotduarte.github.io/packages/conda/linux-64/cx_freeze-7.1.0.dev16-py311h459d7ec_0.conda
conda install cx_freeze-7.1.0.dev16-py311h459d7ec_0.conda
(after uninstalling using pip)

I also got the same error.

Note: The whole point of torch.multiprocessing is to use multiple GPUs, so it working just on CPU isn't that useful.

I'll try to create an entirely new environment from scratch with conda and see if it works...

@dmagee
Author

dmagee commented May 7, 2024

I created an entirely new environment with just cx_Freeze and torch (GPU version) and hit the same issue. This is my shell history:

conda create --name cxtest python=3.11
conda activate cxtest
pip install --force --no-cache --pre --extra-index-url https://marcelotduarte.github.io/packages/ cx_Freeze
pip3 install torch torchvision torchaudio
cd ../..
python Minimal.py   # Note: this works fine
rm -r build
cxfreeze --script Minimal.py build_exe --replace-paths '*='
cd build/exe.linux-x86_64-3.11/
./Minimal

Output (Note: Ever so slightly different from before as getting SIGTERM that I didn't before, same file missing error though):

$ ./Minimal
Starting multiprocessing:4
Starting multiprocessing:4
Traceback (most recent call last):
  File "=/__startup__.py", line 141, in run
  File "=/console.py", line 19, in run
  File "=/Minimal.py", line 28, in <module>
  File "=/Minimal.py", line 18, in main
  File "=/torch/multiprocessing/spawn.py", line 208, in start_processes
  File "=/multiprocessing/context.py", line 243, in get_context
  File "=/multiprocessing/__init__.py", line 56, in
  File "=/multiprocessing/__init__.py", line 53, in _get_freeze_context
  File "=/multiprocessing/spawn.py", line 79, in freeze_support
  File "=/multiprocessing/spawn.py", line 122, in spawn_main
  File "=/multiprocessing/spawn.py", line 131, in _main
  File "=/multiprocessing/spawn.py", line 246, in prepare
  File "=/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 254, in _get_code_from_file
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/=/Minimal.py'
Starting multiprocessing:4
Traceback (most recent call last):
  File "=/__startup__.py", line 141, in run
  File "=/console.py", line 19, in run
  File "=/Minimal.py", line 28, in <module>
  File "=/Minimal.py", line 18, in main
  File "=/torch/multiprocessing/spawn.py", line 208, in start_processes
  File "=/multiprocessing/context.py", line 243, in get_context
  File "=/multiprocessing/__init__.py", line 56, in
  File "=/multiprocessing/__init__.py", line 53, in _get_freeze_context
  File "=/multiprocessing/spawn.py", line 79, in freeze_support
  File "=/multiprocessing/spawn.py", line 122, in spawn_main
  File "=/multiprocessing/spawn.py", line 131, in _main
  File "=/multiprocessing/spawn.py", line 246, in prepare
  File "=/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 254, in _get_code_from_file
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/=/Minimal.py'
Starting multiprocessing:4
Traceback (most recent call last):
  File "=/__startup__.py", line 141, in run
  File "=/console.py", line 19, in run
  File "=/Minimal.py", line 28, in <module>
  File "=/Minimal.py", line 18, in main
  File "=/torch/multiprocessing/spawn.py", line 208, in start_processes
  File "=/multiprocessing/context.py", line 243, in get_context
  File "=/multiprocessing/__init__.py", line 56, in
  File "=/multiprocessing/__init__.py", line 53, in _get_freeze_context
  File "=/multiprocessing/spawn.py", line 79, in freeze_support
  File "=/multiprocessing/spawn.py", line 122, in spawn_main
  File "=/multiprocessing/spawn.py", line 131, in _main
  File "=/multiprocessing/spawn.py", line 246, in prepare
  File "=/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 254, in _get_code_from_file
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/=/Minimal.py'
Starting multiprocessing:4
Traceback (most recent call last):
  File "=/__startup__.py", line 141, in run
  File "=/console.py", line 19, in run
  File "=/Minimal.py", line 28, in <module>
  File "=/Minimal.py", line 18, in main
  File "=/torch/multiprocessing/spawn.py", line 208, in start_processes
  File "=/multiprocessing/context.py", line 243, in get_context
  File "=/multiprocessing/__init__.py", line 56, in
  File "=/multiprocessing/__init__.py", line 53, in _get_freeze_context
  File "=/multiprocessing/spawn.py", line 79, in freeze_support
  File "=/multiprocessing/spawn.py", line 122, in spawn_main
  File "=/multiprocessing/spawn.py", line 131, in _main
  File "=/multiprocessing/spawn.py", line 246, in prepare
  File "=/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 254, in _get_code_from_file
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/=/Minimal.py'
W0507 18:55:52.264000 140603076392768 ../=/torch/multiprocessing/spawn.py:145] Terminating process 2386939 via signal SIGTERM
W0507 18:55:52.264000 140603076392768 ../=/torch/multiprocessing/spawn.py:145] Terminating process 2386941 via signal SIGTERM
Traceback (most recent call last):
  File "=/__startup__.py", line 141, in run
  File "=/console.py", line 19, in run
  File "=/Minimal.py", line 28, in <module>
  File "=/Minimal.py", line 18, in main
  File "=/torch/multiprocessing/spawn.py", line 237, in start_processes
  File "=/torch/multiprocessing/spawn.py", line 177, in join
torch.multiprocessing.spawn.ProcessExitedException: process 3 terminated with exit code 255

@marcelotduarte
Owner

In the meantime I re-installed using conda by doing:

The conda version has a bug, I'll try to solve it.

cxfreeze --script Minimal.py build_exe --replace-paths '*='
...
Output (Note: Ever so slightly different from before as getting SIGTERM that I didn't before, same file missing error though):

I suggested replace_paths precisely to strip the full path information from the traceback, but I see that it now causes the (previous) error or the SIGTERM. I'll investigate it.

But, using only:
cxfreeze --script Minimal.py build_exe
or with other parameters, like:
cxfreeze --script Minimal.py build_exe --silent
It worked...
It should work with your original or modified setup without 'packages' too.

@dmagee
Author

dmagee commented May 8, 2024

Using cxfreeze --script Minimal.py build_exe I get a slightly different error with my new environment (cx_Freeze installed with pip as history above):

(cxtest) $ ./Minimal
Starting multiprocessing:4
Starting multiprocessing:4
Traceback (most recent call last):
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/startup.py", line 141, in run
module_init.run(name + "main")
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
exec(code, module_main.dict)
File "Minimal.py", line 28, in
File "Minimal.py", line 18, in main
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 208, in start_processes
Starting multiprocessing:4
Traceback (most recent call last):
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/startup.py", line 141, in run
module_init.run(name + "main")
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
exec(code, module_main.dict)
File "Minimal.py", line 28, in
Starting multiprocessing:4
File "Minimal.py", line 18, in main
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 208, in start_processes
Traceback (most recent call last):
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/startup.py", line 141, in run
module_init.run(name + "main")
mp = multiprocessing.get_context(start_method)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/context.py", line 243, in get_context
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
mp = multiprocessing.get_context(start_method)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/context.py", line 243, in get_context
exec(code, module_main.dict)
File "Minimal.py", line 28, in
File "Minimal.py", line 18, in main
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 208, in start_processes
mp = multiprocessing.get_context(start_method)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/context.py", line 243, in get_context
Starting multiprocessing:4
Traceback (most recent call last):
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/startup.py", line 141, in run
module_init.run(name + "main")
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
exec(code, module_main.dict)
File "Minimal.py", line 28, in
File "Minimal.py", line 18, in main
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 208, in start_processes
return super().get_context(method)
return super().get_context(method)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/init.py", line 56, in
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/init.py", line 56, in
return super().get_context(method)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/init.py", line 56, in
mp = multiprocessing.get_context(start_method)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/context.py", line 243, in get_context
return super().get_context(method)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/init.py", line 56, in
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/init.py", line 53, in _get_freeze_context
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/init.py", line 53, in _get_freeze_context
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/init.py", line 53, in _get_freeze_context
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/init.py", line 53, in _get_freeze_context
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 79, in freeze_support
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 79, in freeze_support
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 79, in freeze_support
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 79, in freeze_support
spawn_main(**kwds)
spawn_main(**kwds)
spawn_main(**kwds)
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
spawn_main(**kwds)
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
exitcode = _main(fd, parent_sentinel)
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
exitcode = _main(fd, parent_sentinel)
prepare(preparation_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
prepare(preparation_data)
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
prepare(preparation_data)
prepare(preparation_data)
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
_fixup_main_from_path(data['init_main_from_path'])
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
_fixup_main_from_path(data['init_main_from_path'])
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
_fixup_main_from_path(data['init_main_from_path'])
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
main_content = runpy.run_path(main_path,
main_content = runpy.run_path(main_path,
^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen runpy>", line 290, in run_path
^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen runpy>", line 254, in _get_code_from_file
File "<frozen runpy>", line 290, in run_path
File "<frozen runpy>", line 290, in run_path
File "<frozen runpy>", line 254, in _get_code_from_file
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/Minimal.py'
File "<frozen runpy>", line 254, in _get_code_from_file
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/Minimal.py'
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/Minimal.py'
main_content = runpy.run_path(main_path,
^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen runpy>", line 290, in run_path
File "<frozen runpy>", line 254, in _get_code_from_file
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/Minimal.py'
W0508 09:27:04.490000 140290065565504 ../../../../.conda/envs/cxtest/lib/python3.11/site-packages/torch/multiprocessing/spawn.py:145] Terminating process 1901879 via signal SIGTERM
W0508 09:27:04.491000 140290065565504 ../../../../.conda/envs/cxtest/lib/python3.11/site-packages/torch/multiprocessing/spawn.py:145] Terminating process 1901880 via signal SIGTERM
W0508 09:27:04.491000 140290065565504 ../../../../.conda/envs/cxtest/lib/python3.11/site-packages/torch/multiprocessing/spawn.py:145] Terminating process 1901881 via signal SIGTERM
Traceback (most recent call last):
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/startup.py", line 141, in run
module_init.run(name + "__main__")
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
exec(code, module_main.__dict__)
File "Minimal.py", line 28, in <module>
File "Minimal.py", line 18, in main
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 237, in start_processes
while not context.join():
^^^^^^^^^^^^^^
File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 177, in join
raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 3 terminated with exit code 255

@marcelotduarte
Owner

(cxtest) marcelo@teste7:/mnt/81da54df-d490-4cc4-a259-ffbee7f55c92/testes/2376$ python -VV
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
(cxtest) marcelo@teste7:/mnt/81da54df-d490-4cc4-a259-ffbee7f55c92/testes/2376$ pip list
Package                  Version
------------------------ -----------
cx_Freeze                7.1.0.dev16
filelock                 3.14.0
fsspec                   2024.3.1
Jinja2                   3.1.4
MarkupSafe               2.1.5
mpmath                   1.3.0
networkx                 3.3
numpy                    1.26.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.4.127
nvidia-nvtx-cu12         12.1.105
patchelf                 0.17.2.1
pillow                   10.3.0
pip                      24.0
setuptools               69.5.1
sympy                    1.12
torch                    2.3.0
torchaudio               2.3.0
torchvision              0.18.0
triton                   2.3.0
typing_extensions        4.11.0
wheel                    0.43.0
(cxtest) marcelo@teste7:/mnt/81da54df-d490-4cc4-a259-ffbee7f55c92/testes/2376$ 

@dmagee
Author

dmagee commented May 8, 2024

I can't see a difference!

(cxtest) ]$ python -VV
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
(cxtest) $ pip list
Package                  Version
------------------------ -----------
cx_Freeze                7.1.0.dev16
filelock                 3.14.0
fsspec                   2024.3.1
Jinja2                   3.1.4
MarkupSafe               2.1.5
mpmath                   1.3.0
networkx                 3.3
numpy                    1.26.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.4.127
nvidia-nvtx-cu12         12.1.105
patchelf                 0.17.2.1
pillow                   10.3.0
pip                      24.0
setuptools               69.5.1
sympy                    1.12
torch                    2.3.0
torchaudio               2.3.0
torchvision              0.18.0
triton                   2.3.0
typing_extensions        4.11.0
wheel                    0.43.0

@dmagee
Author

dmagee commented May 8, 2024

Are you sure you don't have the source files in the folder you are running the executable from? It's the only thing I can think of.

@marcelotduarte
Owner

Now, I understand the situation.
You can test the fix in the latest development build (cx_Freeze 7.1.0.dev18):
pip install --force --no-cache --pre --extra-index-url https://marcelotduarte.github.io/packages/ cx_Freeze

@dmagee
Author

dmagee commented May 10, 2024

This version just hangs. You run the program and it outputs absolutely nothing to the screen, and doesn't return.

EDIT: If you leave it long enough, it does actually run OK. I'm timing it now to see how long it takes, but it was more than a few minutes.

History:
1002 conda activate cxtest
1003 pip uninstall cx_Freeze
1004 pip install --force --no-cache --pre --extra-index-url https://marcelotduarte.github.io/packages/ cx_Freeze
1005 cxfreeze --script Minimal.py build_exe
1006 cd build/exe.linux-x86_64-3.11/
1007 ./Minimal

@marcelotduarte
Owner

I changed the code a bit to check __file__:

import torch

def per_device_launch_fn(current_gpu_index, num_gpu):

    for i in range(1, 1000):
        print("Train...")

num_gpu = 4

if __name__ == "__main__":
    print("Starting multiprocessing:", num_gpu, __file__)
    torch.multiprocessing.start_processes(
                    per_device_launch_fn,
                    args=(num_gpu,),
                    nprocs=num_gpu,
                    join=True,
                    start_method="spawn",
            )

$ time python Minimal.py

real	0m2,951s
user	0m6,175s
sys	0m2,158s

$ cxfreeze --script Minimal.py build_exe
$ (cd build/exe.linux-x86_64-3.11/ && time ./Minimal)

real	0m8,011s
user	0m6,434s
sys	0m2,768s

In the next run, the time is similar to the time used by the python command:

real	0m2,645s
user	0m5,945s
sys	0m2,318s

And next time too:

real	0m2,875s
user	0m5,661s
sys	0m2,496s

But when building with:
$ cxfreeze --script Minimal.py build_exe --silent --no-compress --zip-filename=
the running time is a little shorter on the first run:

real	0m5,146s
user	0m6,148s
sys	0m2,584s
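
The first run after a build was slower than later runs in the timings above. A minimal sketch for repeating that comparison from Python is shown here; `time_runs` is a hypothetical helper, and the command is only a placeholder (swap in something like `["./Minimal"]`, run from the build directory, to time the frozen executable):

```python
import subprocess
import sys
import time


def time_runs(command, runs=3):
    """Run `command` several times and return the wall-clock time of each run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the program's output so printing does not clutter the report.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


# Placeholder command (an assumption): a trivial Python invocation.
durations = time_runs([sys.executable, "-c", "pass"], runs=3)
for index, seconds in enumerate(durations, start=1):
    print(f"run {index}: {seconds:.3f}s")
```

Comparing the first entry with the rest separates a slow first start from the steady-state run time.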

@dmagee
Author

dmagee commented May 13, 2024

I think the timing issue may have been system related (it's a shared computer): one run took 1.5 hours last week, but today it's not taking that long. One other issue I did notice, though, is that in every sub-process of the frozen version the following is true:

if __name__ == "__main__":

resulting in this code being called N times, whereas in the Python version it's only called once. This doesn't matter in the minimal example (the training loop is called 4 times with different values of current_gpu_index), but in my real program the logic in main() is a bit more complex, as it checks sys.argv in the main process, which results in different behaviour between the Python and frozen versions. I may be able to rewrite the code to get round this, but it does strike me as a bug: presumably torch.multiprocessing must be doing something to ensure per_device_launch_fn() is called directly in the Python version, whereas in the frozen version it is being called via main(). I'm doing some testing to see if this is significant.

Edit: Child processes seem to be called with the following (additional*?) arguments:

--multiprocessing-fork tracker_fd=XX pipe_handle=YY

Where XX is the same for all children, and YY is different for each child. I'm assuming in the python version the torch.multiprocessing code reads these and puts sys.argv back how you might expect.

[* My program has no arguments, so it's not clear if they are additional, or replacements]
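
A minimal sketch of how a script could recognise and read these arguments itself. The helper names `is_fork_child` and `parse_fork_args` are hypothetical, and the descriptor numbers in `example_argv` are made-up placeholders; the parsing assumes each value is either an integer or the literal string `None`:

```python
import sys

FORK_FLAG = "--multiprocessing-fork"  # the flag reported in the comment above


def is_fork_child(argv):
    """Return True if argv looks like that of a spawned multiprocessing child."""
    return len(argv) >= 2 and argv[1] == FORK_FLAG


def parse_fork_args(argv):
    """Collect the name=value pairs that follow the flag into a dict."""
    options = {}
    for arg in argv[2:]:
        name, _, value = arg.partition("=")
        # Assumption: values are integers or the literal string "None".
        options[name] = None if value == "None" else int(value)
    return options


# Made-up example values; real children receive whatever the parent passes.
example_argv = ["./Minimal", FORK_FLAG, "tracker_fd=5", "pipe_handle=7"]
print(is_fork_child(example_argv))
print(parse_fork_args(example_argv))
print(is_fork_child(sys.argv))
```

A check like `is_fork_child(sys.argv)` lets parent-only logic, such as handling the program's own command-line options, be skipped in the children.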

@marcelotduarte
Owner

The hook that I used to patch multiprocessing is based on #264, and later I discovered a similar patch (##501 (comment)); see also the open python/cpython#104607.
So I don't see much more to do. I thought of one possibility and ran some tests, but saw no difference; of course, that was with the test you gave me, not a large program.

... but for my real program the logic is a bit more complex in main() as it checks sys.argv in the main process, which results in different behaviour in python and frozen version ...

I don't think it's very different; see how the spawn start method is described.
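
As a rough illustration of the unfrozen behaviour: the traceback earlier in this thread shows the child calling runpy.run_path on the main script inside _fixup_main_from_path, and in CPython that call uses the module name `__mp_main__` rather than `__main__`. The sketch below imitates this without starting any process; the script text and the `run_as` helper are hypothetical and exist only for the demonstration:

```python
import os
import runpy
import tempfile

# A tiny stand-in for a main script: it records which parts of it execute.
SCRIPT = '''
events = ["module level"]
if __name__ == "__main__":
    events.append("main block")
'''


def run_as(run_name):
    """Execute SCRIPT from a temporary file under the given module name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_script.py")
        with open(path, "w") as handle:
            handle.write(SCRIPT)
        namespace = runpy.run_path(path, run_name=run_name)
    return namespace["events"]


parent_events = run_as("__main__")     # how the parent runs the script
child_events = run_as("__mp_main__")   # how a spawned child re-runs it
print(parent_events)
print(child_events)
```

Module-level code runs in both cases, but the guarded block runs only under `__main__`, which is why the plain Python version executes it once.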

@dmagee
Author

dmagee commented May 13, 2024

Update: Simply doing this works round this:

if __name__ == "__main__":
    no_args = len(sys.argv)
    if no_args > 1 and sys.argv[1] == "--multiprocessing-fork":
        print("Is fork-child")
        torch.multiprocessing.start_processes(
            per_device_launch_fn,
            args=(num_gpu,),
            nprocs=num_gpu,
            join=True,
            start_method="spawn",
        )
    else:
        # Normal initialisation for parent process
        ...

To be clear, this is not necessary in the python version.

@marcelotduarte
Owner

Release 7.1.0 is out!
Documentation

I'll continue to work on pytorch hook to optimize it.

@marcelotduarte
Owner

marcelotduarte commented Jun 7, 2024

Based on information from you and others, I improved the hook for multiprocessing.
You can test the patch in the latest development build:
pip install --force --no-cache --pre --extra-index-url https://marcelotduarte.github.io/packages/ cx_Freeze
The provisional documentation: https://cx-freeze--2443.org.readthedocs.build/en/2443/faq.html#multiprocessing-support
The updated script is:

import torch
from multiprocessing import freeze_support

def per_device_launch_fn(current_gpu_index, num_gpu):

    for i in range(1, 1000):
        print("Train...")

num_gpu = 4

if __name__ == "__main__":
    freeze_support()
    print("Starting multiprocessing:", num_gpu, __file__)
    torch.multiprocessing.start_processes(
                    per_device_launch_fn,
                    args=(num_gpu,),
                    nprocs=num_gpu,
                    join=True,
                    start_method="spawn",
            )

@marcelotduarte
Owner

Release 7.1.1 is out!
Documentation
