inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii #452

Open

Andronixs opened this issue Mar 31, 2024 · 6 comments

@Andronixs commented Mar 31, 2024

Environment:
Ubuntu 22.04.4 LTS
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
ds_report output is included at the end of the description.

Issue: I'm not able to run the example scripts using MII; they fail with the following error: inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii. However, I am able to run DeepSpeed inference directly (without MII) with no issues. I've tried different torch and CUDA versions; the result is the same.
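By "running DeepSpeed inference directly" I mean the classic kernel-injection path; a rough sketch of that kind of script (not my exact code, and the dtype/injection flags here are assumptions):

import deepspeed
import torch
from transformers import AutoModelForCausalLM

# Load the model with Hugging Face, then let DeepSpeed wrap it for inference.
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
engine = deepspeed.init_inference(model, dtype=torch.float16,
                                  replace_with_kernel_inject=True)

That path loads fine on my machine; only the MII pipeline below trips over the missing symbol.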

Running the base example script:
import mii
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
response = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
print(response)

Output:
..............................................................................
[10/10] c++ core_ops.o bias_activation.o bias_activation_cuda.cuda.o layer_norm.o layer_norm_cuda.cuda.o rms_norm.o rms_norm_cuda.cuda.o gated_activation_kernels.o gated_activation_kernels_cuda.cuda.o -shared -L/home/andrew/.local/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda-12.1/lib64 -lcudart -o inference_core_ops.so
Loading extension module inference_core_ops...
Traceback (most recent call last):
  File "/home/andrew/Projects/Deepspeed_examples/./ds_test.py", line 2, in <module>
    pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
  File "/home/andrew/.local/lib/python3.10/site-packages/mii/api.py", line 207, in pipeline
    inference_engine = load_model(model_config)
  File "/home/andrew/.local/lib/python3.10/site-packages/mii/modeling/models.py", line 17, in load_model
    inference_engine = build_hf_engine(
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/engine_factory.py", line 129, in build_hf_engine
    return InferenceEngineV2(policy, engine_config)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/engine_v2.py", line 83, in __init__
    self._model = self._policy.build_model(self._config, self._base_mp_group)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 156, in build_model
    self.model = self.instantiate_model(engine_config, mp_group)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/mistral/policy.py", line 17, in instantiate_model
    return MistralInferenceModel(config=self._model_config, engine_config=engine_config, base_mp_group=mp_group)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 215, in __init__
    self.make_norm_layer()
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 518, in make_norm_layer
    self.norm = heuristics.instantiate_pre_norm(norm_config, self._engine_config)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/modules/heuristics.py", line 167, in instantiate_pre_norm
    return DSPreNormRegistry.instantiate_config(config)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/modules/module_registry.py", line 36, in instantiate_config
    if not target_implementation.supports_config(config_bundle.config):
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/modules/implementations/pre_norm/cuda_pre_rms.py", line 36, in supports_config
    _ = CUDARMSPreNorm(config.channels, config.residual_dtype)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm/rms_norm_base.py", line 36, in __init__
    self.inf_module = InferenceCoreBuilder().load()
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 479, in load
    return self.jit_load(verbose)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 523, in jit_load
    op_module = load(name=self.name,
  File "/home/andrew/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1306, in load
    return _jit_compile(
  File "/home/andrew/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1736, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/andrew/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2132, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1176, in create_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /home/andrew/.cache/torch_extensions/py310_cu121/inference_core_ops/inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii
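For what it's worth, demangling the missing symbol shows which C++ function the extension expects; a quick sketch using c++filt from binutils (assuming it's on the PATH):

import subprocess

# Demangle the undefined symbol into a readable C++ signature.
sym = "_Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii"
print(subprocess.run(["c++filt", sym], capture_output=True, text=True).stdout)
# Prints: cuda_wf6af16_linear(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&,
#         at::Tensor&, at::Tensor&, int, int, int, int)

So it looks like the FP6-weight/FP16-activation linear kernel (wf6af16) is referenced by the library but its definition was never linked into inference_core_ops.so.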

DS_REPORT:
JIT compiled ops requires ninja
ninja .................. [OKAY]

op name ................ installed .. compatible

async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
evoformer_attn ......... [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
[WARNING] using untested triton version (2.2.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]

DeepSpeed general environment info:
torch install path ............... ['/home/andrew/.local/lib/python3.10/site-packages/torch']
torch version .................... 2.2.2+cu121
deepspeed install path ........... ['/home/andrew/.local/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.0, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 2.2, cuda 12.1
shared memory (/dev/shm) size .... 172.11 GB
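To confirm the built extension references this kernel without defining it, the dynamic symbol table can be inspected; a minimal sketch, assuming the cache path from the traceback above and nm from binutils:

import subprocess

# List the dynamic symbols of the JIT-built extension and look for the
# FP6 kernel; a leading "U" marks a symbol that is referenced but undefined.
so_path = ("/home/andrew/.cache/torch_extensions/py310_cu121/"
           "inference_core_ops/inference_core_ops.so")
out = subprocess.run(["nm", "-D", so_path], capture_output=True, text=True).stdout
for line in out.splitlines():
    if "wf6af16" in line:
        print(line)  # expected here: "U _Z19cuda_wf6af16_linear..."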

@allanj commented Apr 1, 2024

same problem here

@yechong316

Same here, I haven't found a way to solve it.

@Andronixs (Author)

When using Conda and Python 3.9, I don't get this error, but the process gets stuck in the server starting phase.
MII_server_log

@allanj commented Apr 12, 2024

I simply switched to vLLM... sorry Microsoft :(

@Andronixs (Author)

Yep, vLLM and HF TGI work with no issues.

@Andronixs (Author)

It seems this issue was previously reported under different titles:

#443

Fix the FP6 kernels compilation problem on non-Ampere GPUs. microsoft/DeepSpeed#5333

The workaround proposed there was to downgrade to:
deepspeed 0.13.5
deepspeed-mii 0.2.2

That didn't work for me, though.
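One more thing worth trying alongside the downgrade: the torch extension cache keeps the previously built .so, so a stale build from another DeepSpeed version can keep raising the same ImportError until it's removed. A minimal sketch, assuming the default cache location from the traceback:

import pathlib
import shutil

# Delete the cached JIT-built DeepSpeed ops so they recompile from the
# currently installed sources on the next run.
cache = pathlib.Path.home() / ".cache" / "torch_extensions"
shutil.rmtree(cache, ignore_errors=True)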
