
/h2ogpt_conda/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol #1348

Closed
ctxwing opened this issue Feb 3, 2024 · 3 comments

@ctxwing

ctxwing commented Feb 3, 2024

Thanks for your great work!
I set up a fresh install in Docker on Ubuntu 22.04.3.
Question:
I got an error and the container keeps rebooting: how can I solve it?
Is this because of CUDA version 12.2?
I cannot set it to 12.1: the NVIDIA driver is 535.154.05, and cuda_12.2.0_535.54.03_linux.run makes it 12.2.
(With ubuntu-drivers devices, I see nvidia-driver-530, but installing it failed.)

/h2ogpt_conda/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros ......

I googled and found an issue that seems related:

thanks in advance.

Details:

Host:

  • VERSION="22.04.3 LTS (Jammy Jellyfish)"
  • Docker version 25.0.2, build 29cf629
  • nvidia-smi
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2 |
    |-----------------------------------------+----------------------+----------------------+

--

Docker Container :

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
h2ogpt@d1225cb3fb68:~$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+

I ran into the messages below:

h2ogpt-h2ogpt-1  | Using Model h2oai/h2ogpt-4096-llama2-7b-chat
h2ogpt-h2ogpt-1  | fatal: not a git repository (or any of the parent directories): .git
h2ogpt-h2ogpt-1  | load INSTRUCTOR_Transformer
h2ogpt-h2ogpt-1  | max_seq_length  512
h2ogpt-h2ogpt-1  | Traceback (most recent call last):
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1364, in _get_module
h2ogpt-h2ogpt-1  |     return importlib.import_module("." + module_name, self.__name__)
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
h2ogpt-h2ogpt-1  |     return _bootstrap._gcd_import(name[level:], package, level)
h2ogpt-h2ogpt-1  |   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
h2ogpt-h2ogpt-1  |   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
h2ogpt-h2ogpt-1  |   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
h2ogpt-h2ogpt-1  |   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
h2ogpt-h2ogpt-1  |   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
h2ogpt-h2ogpt-1  |   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/transformers/models/whisper/modeling_whisper.py", line 50, in <module>
h2ogpt-h2ogpt-1  |     from flash_attn import flash_attn_func, flash_attn_varlen_func
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/flash_attn/__init__.py", line 3, in <module>
h2ogpt-h2ogpt-1  |     from flash_attn.flash_attn_interface import (
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
h2ogpt-h2ogpt-1  |     import flash_attn_2_cuda as flash_attn_cuda
h2ogpt-h2ogpt-1  | ImportError: /h2ogpt_conda/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
h2ogpt-h2ogpt-1  | 
h2ogpt-h2ogpt-1  | The above exception was the direct cause of the following exception:
h2ogpt-h2ogpt-1  | 
h2ogpt-h2ogpt-1  | Traceback (most recent call last):
h2ogpt-h2ogpt-1  |   File "/workspace/generate.py", line 16, in <module>
h2ogpt-h2ogpt-1  |     entrypoint_main()
h2ogpt-h2ogpt-1  |   File "/workspace/generate.py", line 12, in entrypoint_main
h2ogpt-h2ogpt-1  |     H2O_Fire(main)
h2ogpt-h2ogpt-1  |   File "/workspace/src/utils.py", line 65, in H2O_Fire
h2ogpt-h2ogpt-1  |     fire.Fire(component=component, command=args)
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
h2ogpt-h2ogpt-1  |     component_trace = _Fire(component, args, parsed_flag_args, context, name)
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
h2ogpt-h2ogpt-1  |     component, remaining_args = _CallAndUpdateTrace(
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
h2ogpt-h2ogpt-1  |     component = fn(*varargs, **kwargs)
h2ogpt-h2ogpt-1  |   File "/workspace/src/gen.py", line 1701, in main
h2ogpt-h2ogpt-1  |     transcriber = get_transcriber(model=stt_model,
h2ogpt-h2ogpt-1  |   File "/workspace/src/stt.py", line 15, in get_transcriber
h2ogpt-h2ogpt-1  |     transcriber = pipeline("automatic-speech-recognition", model=model, device_map=device_map)
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/transformers/pipelines/__init__.py", line 870, in pipeline
h2ogpt-h2ogpt-1  |     framework, model = infer_framework_load_model(
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/transformers/pipelines/base.py", line 249, in infer_framework_load_model
h2ogpt-h2ogpt-1  |     _class = getattr(transformers_module, architecture, None)
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1355, in __getattr__
h2ogpt-h2ogpt-1  |     value = getattr(module, name)
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1354, in __getattr__
h2ogpt-h2ogpt-1  |     module = self._get_module(self._class_to_module[name])
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1366, in _get_module
h2ogpt-h2ogpt-1  |     raise RuntimeError(
h2ogpt-h2ogpt-1  | RuntimeError: Failed to import transformers.models.whisper.modeling_whisper because of the following error (look up to see its traceback):
h2ogpt-h2ogpt-1  | /h2ogpt_conda/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
h2ogpt-h2ogpt-1 exited with code 0
@pseudotensor
Collaborator

pseudotensor commented Feb 3, 2024

Hi, I only saw that error when I had the newer torch 2.2.0 with flash-attn. It happened because some langchain package was upgrading torch to 2.2.0 (because it could), and I had only set constraints in requirements.txt, not in the langchain one. But that constraint is in place now, and I no longer see the issue.

I see that the latest docker image I made has the same problem of still pulling torch 2.2.0, so it should be fixable.
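The fix described above amounts to pinning torch in a pip constraints file that is passed to every install, so a transitive dependency (such as a langchain extra) cannot upgrade it. A minimal sketch, with the pinned version chosen for illustration only (it should match whatever the flash-attn wheel was built against), not the project's official pin:

```shell
# Create a constraints file that pins torch; the version below is illustrative.
cat > constraints.txt <<'EOF'
torch==2.1.2
EOF

# Apply the constraint to every install step so nothing can pull torch 2.2.0:
#   pip install -r requirements.txt -c constraints.txt

# Confirm the pin was written:
grep torch constraints.txt
```

Unlike pinning in requirements.txt alone, the `-c` file constrains transitive dependencies too, which is exactly the upgrade path that broke the ABI here.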

(h2ogpt) jon@gpu:~/h2ogpt$ docker run -ti --entrypoint=bash gcr.io/vorvan/h2oai/h2ogpt-runtime:0.1.0
h2ogpt@f0692c4f43b7:~$ python
Python 3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import flash_attn
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/h2ogpt_conda/lib/python3.10/site-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "/h2ogpt_conda/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
    import flash_attn_2_cuda as flash_attn_cuda
ImportError: /h2ogpt_conda/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
>>> import torch
>>> torch.__version__
'2.2.0+cu121'
>>> 

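This failure mode can also be detected defensively at startup instead of crashing deep inside transformers. A minimal sketch; the helper name is mine, not part of h2ogpt:

```python
import importlib.util


def flash_attn_usable() -> bool:
    """Return True only if flash_attn is installed and its compiled CUDA
    extension loads cleanly. An ABI mismatch with torch surfaces here as an
    ImportError ("undefined symbol: _ZN2at4_ops...") rather than later,
    mid-pipeline."""
    if importlib.util.find_spec("flash_attn") is None:
        return False
    try:
        import flash_attn  # noqa: F401  # importing triggers flash_attn_2_cuda
    except ImportError:
        return False
    return True
```

A caller could use the result to fall back to the standard attention path instead of letting the `pipeline(...)` call fail.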
@pseudotensor
Collaborator

new docker image with these changes is being built and should be done in less than a few hours.

@ctxwing
Author

ctxwing commented Feb 3, 2024

@pseudotensor thanks for the quick and kind answers!
I pulled the new gcr.io/vorvan/h2oai/h2ogpt-runtime:0.1.0 ( 83d8a1900178 | 0.1.0 | be4c495 | latest | 0.1.0-309 )

and the errors above are gone!

Many thanks. Bless you!
