Description
🐛 Bug
To Reproduce
Steps to reproduce the behavior:
- Attempt to install from source on a fresh JetPack 3.3 on an NVIDIA Jetson TX2
- Instead of `python setup.py install`, install with `python3 setup.py install` (tried with both, same error)
Errors are:
About 100 nvlink errors are printed; the last few are listed below, followed by the final error log.
nvlink error : entry function '_Z28ncclAllReduceLLKernel_sum_i88ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_i328ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_f168ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_u328ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_f328ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_u648ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z28ncclAllReduceLLKernel_sum_u88ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
Makefile:83: recipe for target '/home/nvidia/jetson-reinforcement/build/pytorch/third_party/build/nccl/obj/collectives/device/devlink.o' failed
make[5]: *** [/home/nvidia/jetson-reinforcement/build/pytorch/third_party/build/nccl/obj/collectives/device/devlink.o] Error 255
Makefile:45: recipe for target 'devicelib' failed
make[4]: *** [devicelib] Error 2
Makefile:24: recipe for target 'src.build' failed
make[3]: *** [src.build] Error 2
CMakeFiles/nccl.dir/build.make:60: recipe for target 'lib/libnccl.so' failed
make[2]: *** [lib/libnccl.so] Error 2
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/nccl.dir/all' failed
make[1]: *** [CMakeFiles/nccl.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2
Failed to run 'bash ../tools/build_pytorch_libs.sh --use-cuda --use-nnpack nccl caffe2 libshm gloo c10d THD'
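Since the log contains about 100 near-identical nvlink lines, it can help to collapse them before reading further. The following is only a minimal sketch, assuming every error line has the same shape as the ones quoted above; the function name `summarize_nvlink_errors` is hypothetical and not part of PyTorch or NCCL.

```python
import re
from collections import Counter

# Matches the nvlink register-count errors in the form quoted in this report.
# Assumption: all such lines follow exactly this wording.
NVLINK_RE = re.compile(
    r"nvlink error\s*:\s*entry function '(?P<entry>[^']+)' "
    r"with max regcount of (?P<max_regs>\d+) "
    r"calls function '(?P<callee>[^']+)' "
    r"with regcount of (?P<callee_regs>\d+)"
)


def summarize_nvlink_errors(log_text):
    """Count errors per (callee, entry register cap, callee register count)."""
    summary = Counter()
    for match in NVLINK_RE.finditer(log_text):
        key = (
            match.group("callee"),
            int(match.group("max_regs")),
            int(match.group("callee_regs")),
        )
        summary[key] += 1
    return summary
```

Feeding the saved build output to `summarize_nvlink_errors` should show whether all failures come from the same called function exceeding the same register cap, as the excerpt above suggests (96 registers against a cap of 80).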
Expected behavior
The install should succeed so that I can open a Python 3 console and successfully run: import torch
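The expected behavior can be checked with a small script instead of an interactive console. This is a rough sketch, not an official PyTorch check; the helper name `check_import` is hypothetical, and it uses only the standard library so it runs even when the build has failed.

```python
import importlib


def check_import(module_name="torch"):
    """Try to import a module and return (ok, detail).

    On success, detail is the module's __version__ if it has one.
    On failure, detail is the exception type and message.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or a failure loading native libraries
        return False, "{}: {}".format(type(exc).__name__, exc)
    return True, getattr(module, "__version__", "unknown version")
```

Calling `check_import("torch")` after a successful install should return `True` together with the installed version string; with the build failure described here it would presumably return `False` and the import error.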
Environment
The environment collection script does not run, so the fields below are filled in manually.
- PyTorch Version (e.g., 1.0): Latest master
- OS (e.g., Linux): Ubuntu on NVIDIA Jetson TX2, aarch64 architecture
- How you installed PyTorch (`conda`, `pip`, source): source
- Build command you used (if compiling from source): `python3 setup.py install`
- Python version: 3.5.3
- CUDA/cuDNN version: 9.0, 7.0
- GPU models and configuration:
- Any other relevant information:
There is no Conda build for aarch64, so I have to use the standard Python libraries.
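Because the environment script does not run on this setup, the basic fields above can be gathered with the standard library alone. The sketch below is an assumption-laden stand-in, not PyTorch's own script: `collect_basic_env` is a hypothetical helper, and it assumes that `nvcc`, if installed, is on `PATH`.

```python
import platform
import shutil
import subprocess


def collect_basic_env():
    """Gather OS, architecture, Python version and the nvcc release line."""
    info = {
        "os": platform.system(),
        "arch": platform.machine(),  # expected to be aarch64 on a Jetson TX2
        "python": platform.python_version(),
    }
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        info["nvcc"] = "not found on PATH"
        return info
    try:
        out = subprocess.check_output(
            [nvcc, "--version"], stderr=subprocess.STDOUT, timeout=30
        )
        lines = out.decode(errors="replace").strip().splitlines()
        # The last line of `nvcc --version` normally carries the release info.
        info["nvcc"] = lines[-1] if lines else "no output"
    except (OSError, subprocess.SubprocessError) as exc:
        info["nvcc"] = "failed to run: {}".format(exc)
    return info
```

Printing the returned dictionary gives values that can be pasted into the checklist; the cuDNN version is not covered and still has to be looked up separately.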