
Causal LM text generation example errors out #406

Closed
NishantPrabhuFujitsu opened this issue Apr 30, 2024 · 5 comments
@NishantPrabhuFujitsu

I am trying to run text generation using text_generation/causal_lm/cpp/greedy_causal_lm.cpp without any modifications. I followed the build instructions in the README and ran the command below, after which I got the following error.

$ ./build/greedy_causal_lm ./TinyLlama-1.1B-Chat-v1.0/pytorch/dldt/FP16 "Why is the Sun yellow?"
Exception from src/inference/src/core.cpp:85:
Exception from src/frontends/ir/src/ir_deserializer.cpp:438:
Invalid IR! ScatterNDUpdate_15 name is not unique!

I have limited knowledge of the toolkit's internals, so any indication of where this issue might originate (and how it could be resolved) would be appreciated.
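One way to rule out a corrupted export is to regenerate the IR from scratch. A sketch, assuming the optimum-cli OpenVINO exporter that ships with the optimum versions listed below:

$ # hedged: the output path mirrors the one used above
$ optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format fp16 ./TinyLlama-1.1B-Chat-v1.0/pytorch/dldt/FP16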

System information

  • Platform: Linux (Ubuntu 22.04) running on an x86 IceLake CPU
  • OpenVINO 2024.1.0, installed by following instructions here
  • gcc and g++ versions 11.4.0

The python environment has the following packages among others:

torch==2.3.0
openvino==2024.1.0
openvino-tokenizers==2024.1.0
transformers==4.37.2
optimum==1.19.1

Other things to note

  • I happen to have access to two machines, one aarch64 and one x86, with a shared file system. The tokenizer conversion using openvino-tokenizers (command below) runs successfully every time on the aarch64 machine but often results in a seg fault on the x86 machine. I could not find any pattern in when the seg faults occur.
convert_tokenizer ./TinyLlama-1.1B-Chat-v1.0/pytorch/dldt/FP16/ --output ./TinyLlama-1.1B-Chat-v1.0/pytorch/dldt/FP16/ --with-detokenizer --trust-remote-code
  • On the aarch64 machine, trying to build greedy_causal_lm.cpp along with the other files using cmake -DCMAKE_BUILD_TYPE=Release -S ./ -B ./build/ && cmake --build ./build/ -j results in the following error (see the sketch at the end of this comment). This never occurs on the x86 machine.
CMake Error at /home/nishant/workspace/llm/openvino.genai/thirdparty/openvino_tokenizers/CMakeLists.txt:15 (find_package):
  By not providing "FindOpenVINO.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "OpenVINO",
  but CMake did not find one.

  Could not find a package configuration file provided by "OpenVINO" with any
  of the following names:

    OpenVINOConfig.cmake
    openvino-config.cmake

  Add the installation prefix of "OpenVINO" to CMAKE_PREFIX_PATH or set
  "OpenVINO_DIR" to a directory containing one of the above files.  If
  "OpenVINO" provides a separate development package or SDK, be sure it has
  been installed.

-- Configuring incomplete, errors occurred!

I've had to switch between the two machines to execute specific commands on whichever machine runs them successfully. This could have introduced some issues as well.
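A sketch of the fix the CMake error message itself suggests, assuming an archive install of OpenVINO under <INSTALL_DIR>:

# make the OpenVINO runtime discoverable by find_package
export OpenVINO_DIR=<INSTALL_DIR>/runtime
cmake -DCMAKE_BUILD_TYPE=Release -S ./ -B ./build/ && cmake --build ./build/ -j
# alternatively, pass the install prefix directly:
cmake -DCMAKE_PREFIX_PATH=<INSTALL_DIR> -DCMAKE_BUILD_TYPE=Release -S ./ -B ./build/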

@NishantPrabhuFujitsu NishantPrabhuFujitsu changed the title Model conversion issue in causal LM text generation Causal LM text generation example inference errors out Apr 30, 2024
@NishantPrabhuFujitsu NishantPrabhuFujitsu changed the title Causal LM text generation example inference errors out Causal LM text generation example errors out Apr 30, 2024
@Wovchena
Collaborator

  1. Invalid IR! ScatterNDUpdate_15 name is not unique!

    I've modified the test that covers your scenario to run on Ubuntu 22 with Python 3.10, and it passed. The problem is likely caused by your mixing of arm and x86 environments. Split them.

  2. seg fault on the x86 machine

    You can't share the same openvino package between arm and x86; they are different builds. x86: https://storage.openvinotoolkit.org/repositories/openvino/packages/2024.1/linux/l_openvino_toolkit_ubuntu22_2024.1.0.15008.f4afc983258_x86_64.tgz, aarch64: https://storage.openvinotoolkit.org/repositories/openvino/packages/2024.1/linux/l_openvino_toolkit_ubuntu20_2024.1.0.15008.f4afc983258_arm64.tgz. Note there's no Ubuntu 22 package for arm; use the Ubuntu 20 one. I believe the same goes for other libraries and tools GenAI depends on, like g++ (a sketch of keeping the installs separate follows after this list).

  3. cmake -DCMAKE_BUILD_TYPE=Release -S ./ -B ./build/ && cmake --build ./build/ -j

    I'd expect this to be resolved once you reinstall the environments.
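A sketch of keeping the two installs fully separate, using the URLs from point 2 (assuming the usual archive layout, where the tarball extracts to a directory of the same name containing setupvars.sh):

# on the x86_64 machine
wget https://storage.openvinotoolkit.org/repositories/openvino/packages/2024.1/linux/l_openvino_toolkit_ubuntu22_2024.1.0.15008.f4afc983258_x86_64.tgz
tar -xf l_openvino_toolkit_ubuntu22_2024.1.0.15008.f4afc983258_x86_64.tgz
source ./l_openvino_toolkit_ubuntu22_*/setupvars.sh

# on the aarch64 machine
wget https://storage.openvinotoolkit.org/repositories/openvino/packages/2024.1/linux/l_openvino_toolkit_ubuntu20_2024.1.0.15008.f4afc983258_arm64.tgz
tar -xf l_openvino_toolkit_ubuntu20_2024.1.0.15008.f4afc983258_arm64.tgz
source ./l_openvino_toolkit_ubuntu20_*/setupvars.sh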

@NishantPrabhuFujitsu
Author

NishantPrabhuFujitsu commented May 2, 2024

@Wovchena thanks for your response and apologies for the delay in my reply.

I separated the environments this time and ensured the right packages are used in each. The pipeline now runs without errors on both x86 and arm machines, but we still have some problems.

While inference on x86 with the example prompt generates a coherent answer at good speed, the same prompt on arm generates garbled text at roughly 1 token/sec. Any ideas on why that might be?

More info

  • The build on arm was still failing initially with the error mentioned in point (2) under "Other things to note" in my first message. After a little digging, I noticed that cmake errors out on this line in thirdparty/openvino_tokenizers after failing to find the OpenVINO runtime. The build succeeded only once I exported OpenVINO_DIR=<INSTALL_DIR>/runtime before running the cmake command. I did not have to do this on x86. To the best of my knowledge, setupvars.sh does not set this variable either (sourcing it had no effect on this error). Am I missing something?
  • The python packages installed by the second command in this block error out on arm when pip attempts to build auto-gptq, with the error below. The build cannot find torch even though an appropriate version is installed. Again, this did not happen on x86.
$ pip install --upgrade-strategy eager "transformers<4.38" -r ../../../llm_bench/python/requirements.txt "../../../thirdparty/openvino_tokenizers/[transformers]" --extra-index-url https://download.pytorch.org/whl/cpu
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cpu, https://download.pytorch.org/whl/cpu
Processing /home/nishant/workspace/llm/openvino.genai/thirdparty/openvino_tokenizers
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting optimum-intel (from -r ../../../llm_bench/python/requirements.txt (line 10))
  Using cached optimum_intel-1.17.0.dev0+e1b6a59-py3-none-any.whl
Collecting nncf (from -r ../../../llm_bench/python/requirements.txt (line 11))
  Cloning https://github.com/openvinotoolkit/nncf.git to /tmp/pip-install-drpdnodj/nncf_9c1b5e7af8394707bed42dc339c951aa
  Running command git clone --filter=blob:none --quiet https://github.com/openvinotoolkit/nncf.git /tmp/pip-install-drpdnodj/nncf_9c1b5e7af8394707bed42dc339c951aa
  Resolved https://github.com/openvinotoolkit/nncf.git to commit 9c00000cb0fd69eca8240953db5f9808991e64e4
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: transformers<4.38 in /home/nishant/envs/openvino/lib/python3.11/site-packages (4.37.2)
Requirement already satisfied: numpy in /home/nishant/envs/openvino/lib/python3.11/site-packages (from -r ../../../llm_bench/python/requirements.txt (line 2)) (1.26.4)
Requirement already satisfied: openvino>=2024.0.0 in /home/nishant/envs/openvino/lib/python3.11/site-packages (from -r ../../../llm_bench/python/requirements.txt (line 3)) (2024.1.0)
Collecting auto-gptq>=0.5.1 (from -r ../../../llm_bench/python/requirements.txt (line 4))
  Using cached auto_gptq-0.7.1.tar.gz (126 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [1 lines of output]
      Building cuda extension requires PyTorch (>=1.13.0) being installed, please install PyTorch first: No module named 'torch'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

To get this to work, I had to remove auto-gptq from the requirements. Could this be the cause of the poor output generated on arm?

[UPDATE] I just checked Auto-GPTQ's repo and noticed an issue from last year discussing the lack of CPU-only support for the package. If that is the cause of the installation failure, I'm still not sure why it worked on x86, since both of my machines are CPU-only.
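If the failure is just pip's isolated build environment not seeing the already-installed torch, a hedged sketch of a possible workaround (install torch first, then disable build isolation and the CUDA extension):

$ pip install torch --extra-index-url https://download.pytorch.org/whl/cpu
$ BUILD_CUDA_EXT=0 pip install --no-build-isolation "auto-gptq>=0.5.1"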

@Wovchena
Collaborator

Wovchena commented May 2, 2024

@allnes, @alvoron, please take a look at

While inference on x86 with the example prompt generates a coherent answer with good speed, the same on arm generates garbled text at a slow 1 token/sec (approx.).

and

The build succeeded only when I exported OpenVINO_DIR=<INSTALL_DIR>/runtime before running the cmake command. I did not have to do this for x86. To the best of my knowledge, this is not done in setupvars.sh either (activating that environment had no effect on this error).

@wgzintel, @eaidova, can auto-gptq become an optional dependency? I tried the --no-build-isolation solution in #415, but it didn't work out.

@peterchen-intel
Collaborator

peterchen-intel commented May 6, 2024

The build succeeded only when I exported OpenVINO_DIR=<INSTALL_DIR>/runtime before running the cmake command. I did not have to do this for x86. To the best of my knowledge, this is not done in setupvars.sh either (activating that environment had no effect on this error).

@wgzintel, @eaidova, can auto-gptq become an optional dependency? I tried --no-build-isolation solution: #415, but it didn't work out.

Maybe this is the solution: https://stackoverflow.com/questions/29222269/is-there-a-way-to-have-a-conditional-requirements-txt-file-for-my-python-applica
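If that approach fits, a sketch of the requirements.txt change (a PEP 508 environment marker; limiting auto-gptq to x86_64 is an assumption):

# in llm_bench/python/requirements.txt: skip auto-gptq on non-x86 machines
auto-gptq>=0.5.1; platform_machine == "x86_64"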

@NishantPrabhuFujitsu
Author

Hi @Wovchena

  1. The slow inference issue on ARM has been resolved in "FullyConnected nodes use slow reference kernel on ARM" #438 with @alvoron's support.
  2. As far as auto-gptq is concerned, I was able to build it from source with this modification to the pip install command:
BUILD_CUDA_EXT=0 pip install -vvv --no-build-isolation --use-deprecated=backtrack-on-build-failures -e .

However, skipping its installation doesn't seem to affect the quality of generated text. In general, the output quality degrades significantly as the model is compressed to lower precisions. That, however, is an issue for another day.

Thank you for your support. Marking this issue as closed.
