[BUG] Trying to fine-tune Mistral using DeepSpeed but running into an error: Error building extension 'cpu_adam' #5429

Open
SarthakM320 opened this issue Apr 17, 2024 · 1 comment
Labels
bug (Something isn't working), training

Comments

@SarthakM320

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.26it/s]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA RTX A6000') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
[rank: 0] Seed set to 4
initializing deepspeed distributed: GLOBAL_RANK: 0, MEMBER: 1/1
Enabling DeepSpeed BF16. Model parameters and inputs will be cast to `bfloat16`.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [1,3]
Installed CUDA version 11.5 does not match the version torch was compiled with 11.8 but since the APIs are compatible, accepting this combination
Using /home/sarthak/.cache/torch_extensions/py39_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/sarthak/.cache/torch_extensions/py39_cu118/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/4] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output custom_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/TH -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/THC -isystem /home/sarthak/miniconda3/envs/tmi/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ --threads=8 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -DBF16_AVAILABLE -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -c /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o 
FAILED: custom_cuda_kernel.cuda.o 
/usr/bin/nvcc --generate-dependencies-with-compile --dependency-output custom_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/TH -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/THC -isystem /home/sarthak/miniconda3/envs/tmi/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ --threads=8 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -DBF16_AVAILABLE -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -c /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o 
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
      |                                                                                                                                                 ^ 
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^ 
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
[2/4] c++ -MMD -MF cpu_adam_impl.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/TH -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/THC -isystem /home/sarthak/miniconda3/envs/tmi/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/cpu_adam_impl.cpp -o cpu_adam_impl.o 
FAILED: cpu_adam_impl.o 
c++ -MMD -MF cpu_adam_impl.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/TH -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/THC -isystem /home/sarthak/miniconda3/envs/tmi/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/cpu_adam_impl.cpp -o cpu_adam_impl.o 
In file included from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/c10/util/TypeList.h:3:0,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/c10/util/Metaprogramming.h:3,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/c10/core/DispatchKeySet.h:4,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/c10/core/Backend.h:5,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/c10/core/Layout.h:3,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/ATen/core/TensorBody.h:12,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/ATen/core/Tensor.h:3,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/function_hook.h:3,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/cpp_hook.h:2,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/variable.h:6,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/autograd.h:3,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/autograd.h:3,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/extension.h:5,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/cpu_adam_impl.cpp:6:
/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/c10/util/C++17.h:16:2: error: #error "You're trying to build PyTorch with a too old version of GCC. We need GCC 9 or later."
 #error \
  ^~~~~
[3/4] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/TH -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/THC -isystem /home/sarthak/miniconda3/envs/tmi/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o 
FAILED: cpu_adam.o 
c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/TH -isystem /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/THC -isystem /home/sarthak/miniconda3/envs/tmi/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o 
In file included from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/c10/util/TypeList.h:3:0,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/c10/util/Metaprogramming.h:3,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/c10/core/DispatchKeySet.h:4,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/c10/core/Backend.h:5,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/c10/core/Layout.h:3,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/ATen/core/TensorBody.h:12,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/ATen/core/Tensor.h:3,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/function_hook.h:3,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/cpp_hook.h:2,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/variable.h:6,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/autograd.h:3,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/autograd.h:3,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/torch/extension.h:5,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adam.h:12,
                 from /home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp:6:
/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/include/c10/util/C++17.h:16:2: error: #error "You're trying to build PyTorch with a too old version of GCC. We need GCC 9 or later."
 #error \
  ^~~~~
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 2096, in _run_ninja_build
    subprocess.run(
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/disks/2/sarthak/transformer_multi_image/main.py", line 102, in <module>
    main(vars(args))
  File "/disks/2/sarthak/transformer_multi_image/main.py", line 74, in main
    trainer.fit(model,datamodule=dataset)
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 105, in launch
    return function(*args, **kwargs)
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 963, in _run
    self.strategy.setup(self)
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/lightning/pytorch/strategies/deepspeed.py", line 353, in setup
    self.init_deepspeed()
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/lightning/pytorch/strategies/deepspeed.py", line 454, in init_deepspeed
    self._initialize_deepspeed_train(self.model)
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/lightning/pytorch/strategies/deepspeed.py", line 486, in _initialize_deepspeed_train
    ) = self._init_optimizers()
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/lightning/pytorch/strategies/deepspeed.py", line 460, in _init_optimizers
    optimizers, lr_schedulers = _init_optimizers_and_lr_schedulers(self.lightning_module)
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/lightning/pytorch/core/optimizer.py", line 178, in _init_optimizers_and_lr_schedulers
    optim_conf = call._call_lightning_module_hook(model.trainer, "configure_optimizers", pl_module=model)
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/disks/2/sarthak/transformer_multi_image/model/arch.py", line 272, in configure_optimizers
    deepspeed.ops.op_builder.CPUAdamBuilder().load()
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 479, in load
    return self.jit_load(verbose)
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 523, in jit_load
    op_module = load(name=self.name,
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1306, in load
    return _jit_compile(
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1710, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1823, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 2112, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'

Output of `ds_report`:

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
 [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/torch']
torch version .................... 2.2.2+cu118
deepspeed install path ........... ['/home/sarthak/miniconda3/envs/tmi/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.14.0, unknown, unknown
torch cuda version ............... 11.8
torch hip version ................ None
nvcc version ..................... 11.5
deepspeed wheel compiled w. ...... torch 0.0, cuda 0.0
shared memory (/dev/shm) size .... 503.87 GB

Any idea on how to solve this?

@SarthakM320 added the bug (Something isn't working) and training labels on Apr 17, 2024

xuanhua commented Apr 29, 2024

#error "You're trying to build PyTorch with a too old version of GCC. We need GCC 9 or later."
You might need a newer version of gcc.
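For reference, a minimal sketch of one way to do that inside the conda environment shown in the logs. The conda package names, the GCC version pin, and the idea of clearing the extension cache are assumptions about this setup, not steps verified on the reporter's machine:

```bash
# Check which compilers the JIT build is currently picking up
gcc --version
c++ --version
nvcc --version

# Sketch: install a newer GCC/G++ toolchain into the active conda env
# (gxx_linux-64 / gcc_linux-64 are conda-forge compiler metapackages; the version pin is an assumption)
conda install -c conda-forge gcc_linux-64=10 gxx_linux-64=10

# Point the extension build at the new compilers before launching training
export CC="$CONDA_PREFIX/bin/x86_64-conda-linux-gnu-gcc"
export CXX="$CONDA_PREFIX/bin/x86_64-conda-linux-gnu-g++"

# Remove the stale build directory so cpu_adam is rebuilt with the new toolchain
rm -rf ~/.cache/torch_extensions/py39_cu118/cpu_adam

# Alternatively, prebuild the op at install time instead of JIT-compiling it at runtime
DS_BUILD_CPU_ADAM=1 pip install deepspeed --force-reinstall --no-cache-dir
```

Note that the first failure in the log (the `std_function.h` parameter-pack errors from nvcc) looks like the known incompatibility between older CUDA toolkits and GCC 11 system headers, so aligning the nvcc version with the host compiler (or upgrading the CUDA toolkit) may also be needed.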
