
/h2ogpt_conda/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol #1348

Closed
ctxwing opened this issue Feb 3, 2024 · 3 comments

@ctxwing

ctxwing commented Feb 3, 2024

Thanks for your great work!
I set up a fresh install in Docker on Ubuntu 22.04.3.
Question:
I got an error and the container keeps rebooting: how can I solve it?
Is this because of CUDA version 12.2?
I cannot set it to 12.1: the NVIDIA driver is 535.154.05, and cuda_12.2.0_535.54.03_linux.run makes it 12.2.
(With ubuntu-drivers devices, I see nvidia-driver-530, but installing it failed.)

/h2ogpt_conda/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros ......

I googled and found an issue that seems related:

thanks in advance.

Details:

Host:

  • VERSION="22.04.3 LTS (Jammy Jellyfish)"
  • Docker version 25.0.2, build 29cf629
  • nvidia-smi
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2 |
    |-----------------------------------------+----------------------+----------------------+

--

Docker Container :

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
h2ogpt@d1225cb3fb68:~$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+

I ran into the messages below:

h2ogpt-h2ogpt-1  | Using Model h2oai/h2ogpt-4096-llama2-7b-chat
h2ogpt-h2ogpt-1  | fatal: not a git repository (or any of the parent directories): .git
h2ogpt-h2ogpt-1  | load INSTRUCTOR_Transformer
h2ogpt-h2ogpt-1  | max_seq_length  512
h2ogpt-h2ogpt-1  | Traceback (most recent call last):
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1364, in _get_module
h2ogpt-h2ogpt-1  |     return importlib.import_module("." + module_name, self.__name__)
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
h2ogpt-h2ogpt-1  |     return _bootstrap._gcd_import(name[level:], package, level)
h2ogpt-h2ogpt-1  |   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
h2ogpt-h2ogpt-1  |   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
h2ogpt-h2ogpt-1  |   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
h2ogpt-h2ogpt-1  |   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
h2ogpt-h2ogpt-1  |   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
h2ogpt-h2ogpt-1  |   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/transformers/models/whisper/modeling_whisper.py", line 50, in <module>
h2ogpt-h2ogpt-1  |     from flash_attn import flash_attn_func, flash_attn_varlen_func
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/flash_attn/__init__.py", line 3, in <module>
h2ogpt-h2ogpt-1  |     from flash_attn.flash_attn_interface import (
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
h2ogpt-h2ogpt-1  |     import flash_attn_2_cuda as flash_attn_cuda
h2ogpt-h2ogpt-1  | ImportError: /h2ogpt_conda/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
h2ogpt-h2ogpt-1  | 
h2ogpt-h2ogpt-1  | The above exception was the direct cause of the following exception:
h2ogpt-h2ogpt-1  | 
h2ogpt-h2ogpt-1  | Traceback (most recent call last):
h2ogpt-h2ogpt-1  |   File "/workspace/generate.py", line 16, in <module>
h2ogpt-h2ogpt-1  |     entrypoint_main()
h2ogpt-h2ogpt-1  |   File "/workspace/generate.py", line 12, in entrypoint_main
h2ogpt-h2ogpt-1  |     H2O_Fire(main)
h2ogpt-h2ogpt-1  |   File "/workspace/src/utils.py", line 65, in H2O_Fire
h2ogpt-h2ogpt-1  |     fire.Fire(component=component, command=args)
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
h2ogpt-h2ogpt-1  |     component_trace = _Fire(component, args, parsed_flag_args, context, name)
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
h2ogpt-h2ogpt-1  |     component, remaining_args = _CallAndUpdateTrace(
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
h2ogpt-h2ogpt-1  |     component = fn(*varargs, **kwargs)
h2ogpt-h2ogpt-1  |   File "/workspace/src/gen.py", line 1701, in main
h2ogpt-h2ogpt-1  |     transcriber = get_transcriber(model=stt_model,
h2ogpt-h2ogpt-1  |   File "/workspace/src/stt.py", line 15, in get_transcriber
h2ogpt-h2ogpt-1  |     transcriber = pipeline("automatic-speech-recognition", model=model, device_map=device_map)
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/transformers/pipelines/__init__.py", line 870, in pipeline
h2ogpt-h2ogpt-1  |     framework, model = infer_framework_load_model(
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/transformers/pipelines/base.py", line 249, in infer_framework_load_model
h2ogpt-h2ogpt-1  |     _class = getattr(transformers_module, architecture, None)
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1355, in __getattr__
h2ogpt-h2ogpt-1  |     value = getattr(module, name)
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1354, in __getattr__
h2ogpt-h2ogpt-1  |     module = self._get_module(self._class_to_module[name])
h2ogpt-h2ogpt-1  |   File "/h2ogpt_conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1366, in _get_module
h2ogpt-h2ogpt-1  |     raise RuntimeError(
h2ogpt-h2ogpt-1  | RuntimeError: Failed to import transformers.models.whisper.modeling_whisper because of the following error (look up to see its traceback):
h2ogpt-h2ogpt-1  | /h2ogpt_conda/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
h2ogpt-h2ogpt-1 exited with code 0
@pseudotensor
Collaborator

pseudotensor commented Feb 3, 2024

Hi, I only saw that error when I had the newer torch 2.2.0 with flash-attn. It happened because some langchain package was upgrading torch to 2.2.0 (because it could), and I had only set constraints in requirements.txt, not in the langchain one. But that constraint is in place now, and I no longer see the issue.

I see that the latest docker image I made has the same problem of still pulling torch 2.2.0, so it should be fixable.
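The fix described above amounts to pinning torch in a pip constraints file that is passed to every install, so a transitive dependency (such as a langchain extra) cannot upgrade it. A minimal sketch, with the pinned version chosen for illustration only (it should match whatever the flash-attn wheel was built against), not the project's official pin:

```shell
# Create a constraints file that pins torch; the version below is illustrative.
cat > constraints.txt <<'EOF'
torch==2.1.2
EOF

# Apply the constraint to every install step so nothing can pull torch 2.2.0:
#   pip install -r requirements.txt -c constraints.txt

# Confirm the pin was written:
grep torch constraints.txt
```

Unlike pinning in requirements.txt alone, the `-c` file constrains transitive dependencies too, which is exactly the upgrade path that broke the ABI here.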

(h2ogpt) jon@gpu:~/h2ogpt$ docker run -ti --entrypoint=bash gcr.io/vorvan/h2oai/h2ogpt-runtime:0.1.0
h2ogpt@f0692c4f43b7:~$ python
Python 3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import flash_attn
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/h2ogpt_conda/lib/python3.10/site-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "/h2ogpt_conda/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
    import flash_attn_2_cuda as flash_attn_cuda
ImportError: /h2ogpt_conda/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
>>> import torch
>>> torch.__version__
'2.2.0+cu121'
>>> 

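This failure mode can also be detected defensively at startup instead of crashing deep inside transformers. A minimal sketch; the helper name is mine, not part of h2ogpt:

```python
import importlib.util


def flash_attn_usable() -> bool:
    """Return True only if flash_attn is installed and its compiled CUDA
    extension loads cleanly. An ABI mismatch with torch surfaces here as an
    ImportError ("undefined symbol: _ZN2at4_ops...") rather than later,
    mid-pipeline."""
    if importlib.util.find_spec("flash_attn") is None:
        return False
    try:
        import flash_attn  # noqa: F401  # importing triggers flash_attn_2_cuda
    except ImportError:
        return False
    return True
```

A caller could use the result to fall back to the standard attention path instead of letting the `pipeline(...)` call fail.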
@pseudotensor
Collaborator

new docker image with these changes is being built and should be done in less than a few hours.

@ctxwing
Author

ctxwing commented Feb 3, 2024

@pseudotensor thanks for the quick and kind answers!
I pulled the new gcr.io/vorvan/h2oai/h2ogpt-runtime:0.1.0 ( 83d8a1900178 | 0.1.0 | be4c495 | latest | 0.1.0-309 )

and the errors above are gone!

Many thanks. Bless you!
