[docker] DeepSpeed image should contain nvcc #1710

@vfdev-5

Currently, the "pytorchignite/msdp-apex:latest" Docker image cannot run the cifar10 DeepSpeed example. It fails with the following error:

...
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 661, in _configure_basic_optimizer
    optimizer = FusedAdam(model_parameters, **optimizer_parameters)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/adam/fused_adam.py", line 72, in __init__
    fused_adam_cuda = FusedAdamBuilder().load()
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 174, in load
    return self.jit_load(verbose)
...
  File "/opt/conda/lib/python3.8/subprocess.py", line 1702, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc'
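
The traceback shows DeepSpeed JIT-loading its FusedAdam op and then failing to spawn `/usr/local/cuda/bin/nvcc`. A quick way to confirm whether an image ships the compiler is a small check like the sketch below. The helper name `find_nvcc` is hypothetical, and the lookup order (the `CUDA_HOME` environment variable, then `/usr/local/cuda`, then `PATH`) is an assumption for illustration, not necessarily how DeepSpeed resolves the compiler.

```python
import os
import shutil
from typing import Optional


def find_nvcc(cuda_home: Optional[str] = None) -> Optional[str]:
    """Return the path to an executable nvcc, or None if none is found.

    Hypothetical helper: the lookup order here is an assumption made for
    this sketch, not DeepSpeed's own resolution logic.
    """
    # Assumed default: the CUDA_HOME environment variable, else /usr/local/cuda
    # (the location that appears in the traceback above).
    cuda_home = cuda_home or os.environ.get("CUDA_HOME", "/usr/local/cuda")
    candidate = os.path.join(cuda_home, "bin", "nvcc")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    # Fall back to any nvcc that happens to be on PATH.
    return shutil.which("nvcc")


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc:
        print(f"nvcc found at {nvcc}")
    else:
        print("nvcc not found; JIT-compiled ops such as FusedAdam cannot be built")
```

Running this inside the container (for example via `docker run` with `python` as the command) should report whether the compiler is present before the training script is launched.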
