Closed
Description
Currently, the "pytorchignite/msdp-apex:latest" Docker image cannot run the cifar10 DeepSpeed example. It fails with the following error:
...
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 661, in _configure_basic_optimizer
optimizer = FusedAdam(model_parameters, **optimizer_parameters)
File "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/adam/fused_adam.py", line 72, in __init__
fused_adam_cuda = FusedAdamBuilder().load()
File "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 174, in load
return self.jit_load(verbose)
...
File "/opt/conda/lib/python3.8/subprocess.py", line 1702, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc'
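
The traceback suggests that DeepSpeed tries to build the FusedAdam op at runtime (`jit_load`) and that the build calls `/usr/local/cuda/bin/nvcc`, which is missing from the image. A minimal sketch of a pre-flight check that would surface this before training starts is given below. The helper name `find_nvcc` is hypothetical and is not part of DeepSpeed or Ignite; the default path is taken from the error message, and reading `CUDA_HOME` is an assumption about how the CUDA location is configured in the container.

```python
import os
import shutil


def find_nvcc(cuda_home="/usr/local/cuda"):
    """Return the path to an executable nvcc, or None if none is found.

    Hypothetical helper: looks in <cuda_home>/bin first, then on PATH.
    """
    candidate = os.path.join(cuda_home, "bin", "nvcc")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    # Fall back to whatever nvcc is reachable through PATH, if any.
    return shutil.which("nvcc")


if __name__ == "__main__":
    # Assumption: CUDA_HOME, when set, points at the CUDA toolkit root.
    nvcc = find_nvcc(os.environ.get("CUDA_HOME", "/usr/local/cuda"))
    if nvcc is None:
        print("nvcc not found: runtime compilation of CUDA ops will likely fail")
    else:
        print(f"nvcc found at {nvcc}")
```

Running this inside the container before launching the example would show whether the image ships the CUDA compiler or only the CUDA runtime libraries.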