Running compiled pyflex under cuda-11.1 failed, undefined symbol: cudaSetupArgument #6

Open
julyfun opened this issue Sep 18, 2023 · 7 comments

julyfun commented Sep 18, 2023

Environment

  • uname -a result:
    Linux julyfun-Lenovo-XiaoXinAir-14IIL-2020 5.19.0-46-generic #47~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jun 21 15:35:31 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
  • GPU: NVIDIA MX350, CUDA version 12.2 on the host (I was building pyflex in Docker, so it shouldn't matter)

Error

Following #5, I changed the Dockerfile's FROM line to nvidia/cuda:11.1.1-devel-ubuntu18.04 and ran . ./prepare.sh && ./compile.sh successfully. But when I tried to run the demo (the python ... command in the README), it failed with:

Traceback (most recent call last):
  File "/home/julyfun/Documents/GitHub/julyfun/cloth-funnels-test/cloth_funnels/run_sim.py", line 1, in <module>
    from cloth_funnels.utils.utils import (
  File "/home/julyfun/Documents/GitHub/julyfun/cloth-funnels-test/cloth_funnels/utils/utils.py", line 3, in <module>
    from cloth_funnels.environment import SimEnv
  File "/home/julyfun/Documents/GitHub/julyfun/cloth-funnels-test/cloth_funnels/environment/__init__.py", line 1, in <module>
    from .simEnv import SimEnv
  File "/home/julyfun/Documents/GitHub/julyfun/cloth-funnels-test/cloth_funnels/environment/simEnv.py", line 9, in <module>
    from cloth_funnels.utils.env_utils import (
  File "/home/julyfun/Documents/GitHub/julyfun/cloth-funnels-test/cloth_funnels/utils/env_utils.py", line 8, in <module>
    import pyflex
ImportError: /home/julyfun/Documents/GitHub/julyfun/cloth-funnels-test/cloth_funnels/PyFlex/bindings/build/pyflex.cpython-39-x86_64-linux-gnu.so: undefined symbol: cudaSetupArgument

This may be because cudaSetupArgument was deprecated and is no longer available in CUDA versions newer than 9.2.
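
One way to check which legacy CUDA runtime symbols the built module still expects is to list its undefined dynamic symbols with nm -D --undefined-only (GNU binutils). The following is a minimal sketch, assuming nm is on PATH; the helper names and the choice of symbols to look for are illustrative and not part of pyflex.

```python
import subprocess

# Legacy kernel-launch entry points of the CUDA runtime; the choice of which
# ones to look for is an assumption made for this sketch.
LEGACY_SYMBOLS = ("cudaSetupArgument", "cudaConfigureCall", "cudaLaunch")


def parse_undefined(nm_output):
    """Return the set of undefined symbol names found in nm output."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined entries look like "U name" or "U name@VERSION".
        if len(parts) >= 2 and parts[-2] == "U":
            names.add(parts[-1].split("@")[0])
    return names


def legacy_symbols_needed(so_path):
    """List the legacy launch symbols that the shared object at so_path imports."""
    result = subprocess.run(
        ["nm", "-D", "--undefined-only", so_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return sorted(parse_undefined(result.stdout) & set(LEGACY_SYMBOLS))
```

Calling legacy_symbols_needed on the pyflex .so named in the traceback would return a non-empty list if the module still imports any of these symbols, which the CUDA runtime it loads at import time would then have to provide.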

Rudy112 commented Oct 1, 2023

Hi @julyfun, I have the same problem. After changing the Dockerfile and compiling successfully, pyflex does not work because of the cudaSetupArgument issue. Did you eventually solve this?

julyfun commented Oct 6, 2023

You can try docker pull yunzhuli/pyflex_16_04_cuda_9_1 (I got this from a README in the official pyflex repo). That CUDA version is old enough that cudaSetupArgument is still available.

zcswdt commented Nov 20, 2023

Hello, I’m looking for paid help. Have you successfully run the author’s training code?

julyfun commented Nov 20, 2023

Yes. I ran it on Ubuntu 16.04 with a GTX 1080 Ti, CUDA 9.2, python==3.7.16, pytorch==1.7.1+cu92, pytorch-lightning==1.5.
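
To compare another environment against these pins, a small check along the following lines may help. It is only a sketch: it assumes the pip distribution names are torch and pytorch-lightning, uses a loose prefix match, and relies on importlib.metadata, which needs Python 3.8 or newer (on Python 3.7 the importlib_metadata backport would be needed instead). The function names are hypothetical.

```python
from importlib import metadata  # standard library from Python 3.8 onward

# Pins taken from the comment above; the distribution names are assumptions.
EXPECTED = {"torch": "1.7.1+cu92", "pytorch-lightning": "1.5"}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version does not start with its pin."""
    mismatches = {}
    for name, pin in expected.items():
        found = lookup(name)
        # Prefix match only, so a pin of "1.5" also accepts "1.5.10".
        if found is None or not found.startswith(pin):
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    for name, found in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {EXPECTED[name]}*, found {found}")
```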

zcswdt commented Nov 20, 2023 via email

zcswdt commented Nov 20, 2023

Hello, could you add me on QQ? My QQ number is 810190882. If you are willing, please add me and I will describe the problems I encountered. I would also like to send you a tip as a thank-you.

zcswdt commented Nov 22, 2023

> cuda 9.2, python==3.7.16, pytorch==1.7.1+cu92, pytorch-lightning==1.5

Hello, sorry to bother you. I would like to ask a few questions. I installed the environment on an Ubuntu 18 system exactly according to the author's tutorial. After installation, the CUDA version that torch supports is 11.7, while the local CUDA version reported by nvcc -V is 10.0. I can run the author's test and training code, but as the number of steps increases during training, memory usage keeps growing until the program is killed. I have been working on the configuration for two months and still haven't gotten it right. I saw in your last reply that you completed training. Did you see a memory leak during training? If possible, could you share your contact information? Thank you.
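
To narrow down whether host memory really grows with each step, one option is to log the process's resident set size during training. Below is a minimal sketch, assuming Linux (it reads /proc/self/statm); the loop body is a placeholder and the function names are hypothetical. It only measures growth and does not identify which component is responsible.

```python
import os


def rss_bytes():
    """Resident set size of this process, read from /proc/self/statm (Linux only)."""
    with open("/proc/self/statm") as f:
        # The second field is the number of resident pages.
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def mean_growth(samples):
    """Average change per step across a list of memory samples, in bytes."""
    if len(samples) < 2:
        return 0.0
    return (samples[-1] - samples[0]) / (len(samples) - 1)


if __name__ == "__main__":
    samples = []
    for step in range(5):
        # Placeholder for one training or simulation step (hypothetical).
        samples.append(rss_bytes())
    print(f"mean RSS growth per step: {mean_growth(samples):.0f} bytes")
```

A steadily positive value over many steps would suggest that something retains memory across steps; a value that flattens out after warm-up would point elsewhere.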
