inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii #452

Open

Andronixs opened this issue Mar 31, 2024 · 6 comments

@Andronixs commented Mar 31, 2024

Environment:
Ubuntu 22.04.4 LTS
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
ds_report output is included at the end of the description.

Issue: I'm not able to run the example scripts using MII; they fail with the following error: inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii. However, I am able to run DeepSpeed inference directly (without MII) with no issues. I've tried different torch and CUDA versions; the result is the same.
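By "running DeepSpeed inference directly" I mean the classic kernel-injection path; a rough sketch of that kind of script (not my exact code, and the dtype/injection flags here are assumptions):

import deepspeed
import torch
from transformers import AutoModelForCausalLM

# Load the model with Hugging Face, then let DeepSpeed wrap it for inference.
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
engine = deepspeed.init_inference(model, dtype=torch.float16,
                                  replace_with_kernel_inject=True)

That path loads fine on my machine; only the MII pipeline below trips over the missing symbol.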

Running the base example script:
import mii
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
response = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
print(response)

Output:
..............................................................................
[10/10] c++ core_ops.o bias_activation.o bias_activation_cuda.cuda.o layer_norm.o layer_norm_cuda.cuda.o rms_norm.o rms_norm_cuda.cuda.o gated_activation_kernels.o gated_activation_kernels_cuda.cuda.o -shared -L/home/andrew/.local/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda-12.1/lib64 -lcudart -o inference_core_ops.so
Loading extension module inference_core_ops...
Traceback (most recent call last):
  File "/home/andrew/Projects/Deepspeed_examples/./ds_test.py", line 2, in <module>
    pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
  File "/home/andrew/.local/lib/python3.10/site-packages/mii/api.py", line 207, in pipeline
    inference_engine = load_model(model_config)
  File "/home/andrew/.local/lib/python3.10/site-packages/mii/modeling/models.py", line 17, in load_model
    inference_engine = build_hf_engine(
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/engine_factory.py", line 129, in build_hf_engine
    return InferenceEngineV2(policy, engine_config)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/engine_v2.py", line 83, in __init__
    self._model = self._policy.build_model(self._config, self._base_mp_group)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 156, in build_model
    self.model = self.instantiate_model(engine_config, mp_group)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/mistral/policy.py", line 17, in instantiate_model
    return MistralInferenceModel(config=self._model_config, engine_config=engine_config, base_mp_group=mp_group)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 215, in __init__
    self.make_norm_layer()
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 518, in make_norm_layer
    self.norm = heuristics.instantiate_pre_norm(norm_config, self._engine_config)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/modules/heuristics.py", line 167, in instantiate_pre_norm
    return DSPreNormRegistry.instantiate_config(config)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/modules/module_registry.py", line 36, in instantiate_config
    if not target_implementation.supports_config(config_bundle.config):
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/modules/implementations/pre_norm/cuda_pre_rms.py", line 36, in supports_config
    _ = CUDARMSPreNorm(config.channels, config.residual_dtype)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm/rms_norm_base.py", line 36, in __init__
    self.inf_module = InferenceCoreBuilder().load()
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 479, in load
    return self.jit_load(verbose)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 523, in jit_load
    op_module = load(name=self.name,
  File "/home/andrew/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1306, in load
    return _jit_compile(
  File "/home/andrew/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1736, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/andrew/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2132, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1176, in create_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /home/andrew/.cache/torch_extensions/py310_cu121/inference_core_ops/inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii
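For what it's worth, demangling the missing symbol shows which C++ function the extension expects; a quick sketch using c++filt from binutils (assuming it's on the PATH):

import subprocess

# Demangle the undefined symbol into a readable C++ signature.
sym = "_Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii"
print(subprocess.run(["c++filt", sym], capture_output=True, text=True).stdout)
# Prints: cuda_wf6af16_linear(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&,
#         at::Tensor&, at::Tensor&, int, int, int, int)

So it looks like the FP6-weight/FP16-activation linear kernel (wf6af16) is referenced by the library but its definition was never linked into inference_core_ops.so.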

DS_REPORT:
JIT compiled ops requires ninja
ninja .................. [OKAY]

op name ................ installed .. compatible

async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
evoformer_attn ......... [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
[WARNING] using untested triton version (2.2.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]

DeepSpeed general environment info:
torch install path ............... ['/home/andrew/.local/lib/python3.10/site-packages/torch']
torch version .................... 2.2.2+cu121
deepspeed install path ........... ['/home/andrew/.local/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.0, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 2.2, cuda 12.1
shared memory (/dev/shm) size .... 172.11 GB
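To confirm the built extension references this kernel without defining it, the dynamic symbol table can be inspected; a minimal sketch, assuming the cache path from the traceback above and nm from binutils:

import subprocess

# List the dynamic symbols of the JIT-built extension and look for the
# FP6 kernel; a leading "U" marks a symbol that is referenced but undefined.
so_path = ("/home/andrew/.cache/torch_extensions/py310_cu121/"
           "inference_core_ops/inference_core_ops.so")
out = subprocess.run(["nm", "-D", so_path], capture_output=True, text=True).stdout
for line in out.splitlines():
    if "wf6af16" in line:
        print(line)  # expected here: "U _Z19cuda_wf6af16_linear..."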

@allanj commented Apr 1, 2024

same problem here

@yechong316

Same here, I haven't found a way to solve it.

@Andronixs (Author)

When using Conda and Python 3.9, I don't get this error, but the process gets stuck in the server starting phase.
MII_server_log

@allanj commented Apr 12, 2024

I simply switched to vLLM... sorry Microsoft :(

@Andronixs (Author)

Yep, vLLM and HF TGI work with no issues.

@Andronixs (Author)

It seems this issue was previously reported under different titles:

#443

Fix the FP6 kernels compilation problem on non-Ampere GPUs. microsoft/DeepSpeed#5333

The workaround proposed there was to downgrade to:
deepspeed 0.13.5
deepspeed-mii 0.2.2

That didn't work for me, though.
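One more thing worth trying alongside the downgrade: the torch extension cache keeps the previously built .so, so a stale build from another DeepSpeed version can keep raising the same ImportError until it's removed. A minimal sketch, assuming the default cache location from the traceback:

import pathlib
import shutil

# Delete the cached JIT-built DeepSpeed ops so they recompile from the
# currently installed sources on the next run.
cache = pathlib.Path.home() / ".cache" / "torch_extensions"
shutil.rmtree(cache, ignore_errors=True)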
