Error during executing train.py #33

Closed
JaehyeokJang opened this issue Jan 7, 2021 · 3 comments

Comments

@JaehyeokJang

Hi, I ran into a problem while running train.py of e3d.

After executing the following statement in train.py:

model = torch.nn.parallel.DistributedDataParallel(
    model.cuda(),
    device_ids=[dist.local_rank()],
    find_unused_parameters=True)


I got the following error messages related to nvidia-modprobe.

/usr/bin/nvidia-modprobe: unrecognized option: "-f"

ERROR: Invalid commandline, please run /usr/bin/nvidia-modprobe --help
for usage information.

/usr/bin/nvidia-modprobe: unrecognized option: "-f"

ERROR: Invalid commandline, please run /usr/bin/nvidia-modprobe --help
for usage information.

Do you have any idea how to solve this problem? Please take a look and let me know.

Thank you in advance for your help.

@JaehyeokJang
Author

FYI, my current versions of the NVIDIA CUDA toolkit, Python, and PyTorch are as follows.

----------------------- NVIDIA -------------------------
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_May__6_19:09:25_PDT_2020
Cuda compilation tools, release 11.0, V11.0.167
Build cuda_11.0_bu.TC445_37.28358933_0

Python: 3.7.7
PyTorch: 1.6.0
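
A version report like the one above can also be gathered with a small script. The following is only a sketch, not part of the e3d code: the helper name `collect_versions` is hypothetical, and it assumes `nvcc` is on the `PATH` when the CUDA toolkit is installed. It also records the CUDA version PyTorch was built against (`torch.version.cuda`), which can differ from the toolkit version that `nvcc` reports.

```python
import platform
import shutil
import subprocess


def collect_versions() -> dict:
    """Collect Python, nvcc, and PyTorch version details (hypothetical helper)."""
    info = {"python": platform.python_version()}

    # Look for nvcc on PATH; it may be missing even when a driver is installed.
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        info["nvcc"] = "not found"
    else:
        result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        lines = result.stdout.strip().splitlines()
        # Keep the raw output lines rather than parsing a specific format.
        info["nvcc"] = " | ".join(lines) if lines else "unknown"

    try:
        import torch

        info["torch"] = torch.__version__
        # CUDA version PyTorch was compiled against (None for CPU-only builds).
        info["torch_cuda"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        info["torch"] = "not installed"

    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

PyTorch also ships its own, more complete report, which can be run with `python -m torch.utils.collect_env`.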

zhijian-liu self-assigned this Mar 8, 2021
@zhijian-liu

This seems to be related to PyTorch rather than to this repository. Are you able to run other PyTorch training scripts that use DistributedDataParallel?
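
One way to check this independently of train.py is a tiny single-process smoke test. The sketch below is an illustration under assumptions, not the project's code: the function names `ddp_smoke_test` and `_free_port` are hypothetical, it uses plain `torch.distributed` with `world_size=1` instead of the `dist.local_rank()` helper used in train.py, and it falls back to the `gloo` backend on CPU when no GPU is visible. It returns a status string instead of raising, so a failure inside PyTorch shows up as `failed: ...`.

```python
import os
import socket

try:
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel
except ImportError:  # PyTorch is not installed; the check is skipped
    torch = None


def _free_port() -> int:
    """Ask the OS for an unused TCP port for the rendezvous."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def ddp_smoke_test() -> str:
    """Run one forward/backward pass through a DDP-wrapped layer in one process."""
    if torch is None:
        return "skipped: torch not installed"
    if not dist.is_available():
        return "skipped: torch.distributed not available"

    # Respect an existing rendezvous configuration if the launcher set one.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", str(_free_port()))

    use_cuda = torch.cuda.is_available()
    backend = "nccl" if use_cuda else "gloo"
    try:
        dist.init_process_group(backend=backend, rank=0, world_size=1)
        model = torch.nn.Linear(4, 2)
        if use_cuda:
            model = DistributedDataParallel(model.cuda(), device_ids=[0])
        else:
            model = DistributedDataParallel(model)
        inputs = torch.randn(8, 4, device="cuda" if use_cuda else "cpu")
        model(inputs).sum().backward()
        return f"ok: backend={backend}"
    except Exception as exc:  # report the failure instead of crashing
        return f"failed: {exc}"
    finally:
        if dist.is_initialized():
            dist.destroy_process_group()


if __name__ == "__main__":
    print(ddp_smoke_test())
```

If this minimal check hits the same problem as train.py, the cause is more likely in the PyTorch, CUDA, or driver setup than in the training script.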

zhijian-liu added the question (Further information is requested) label Mar 8, 2021
@JaehyeokJang
Author

I got the reference library from you and the problem is solved now, so I will close this issue. Thank you for all your effort!
