
Causal LM text generation example errors out #406

Closed
NishantPrabhuFujitsu opened this issue Apr 30, 2024 · 5 comments
@NishantPrabhuFujitsu

I am trying to run text generation using text_generation/causal_lm/cpp/greedy_causal_lm.cpp without any modifications. I followed the build instructions in the README and ran the command below, after which I got the following error.

$ ./build/greedy_causal_lm ./TinyLlama-1.1B-Chat-v1.0/pytorch/dldt/FP16 "Why is the Sun yellow?"
Exception from src/inference/src/core.cpp:85:
Exception from src/frontends/ir/src/ir_deserializer.cpp:438:
Invalid IR! ScatterNDUpdate_15 name is not unique!

I have limited knowledge of the toolkit's internals, so any indication of where this issue might originate (and how it could be resolved) would be appreciated.
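One way to rule out a corrupted export is to regenerate the IR from scratch. A sketch, assuming the optimum-cli OpenVINO exporter that ships with the optimum versions listed below:

$ # hedged: the output path mirrors the one used above
$ optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format fp16 ./TinyLlama-1.1B-Chat-v1.0/pytorch/dldt/FP16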

System information

  • Platform: Linux (Ubuntu 22.04) running on an x86 IceLake CPU
  • OpenVINO 2024.1.0, installed by following instructions here
  • gcc and g++ versions 11.4.0

The python environment has the following packages among others:

torch==2.3.0
openvino==2024.1.0
openvino-tokenizers==2024.1.0
transformers==4.37.2
optimum==1.19.1

Other things to note

  • I happen to have access to two machines, one aarch64 and one x86, with a shared file system. The tokenizer conversion using openvino-tokenizers (command below) runs successfully every time on the aarch64 machine but often results in a seg fault on the x86 machine. I could not find any pattern in when the seg faults occur.
convert_tokenizer ./TinyLlama-1.1B-Chat-v1.0/pytorch/dldt/FP16/ --output ./TinyLlama-1.1B-Chat-v1.0/pytorch/dldt/FP16/ --with-detokenizer --trust-remote-code
  • On the aarch64 machine, trying to build greedy_causal_lm.cpp along with the other files using cmake -DCMAKE_BUILD_TYPE=Release -S ./ -B ./build/ && cmake --build ./build/ -j results in the following error (see the sketch at the end of this comment). This never occurs on the x86 machine.
CMake Error at /home/nishant/workspace/llm/openvino.genai/thirdparty/openvino_tokenizers/CMakeLists.txt:15 (find_package):
  By not providing "FindOpenVINO.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "OpenVINO",
  but CMake did not find one.

  Could not find a package configuration file provided by "OpenVINO" with any
  of the following names:

    OpenVINOConfig.cmake
    openvino-config.cmake

  Add the installation prefix of "OpenVINO" to CMAKE_PREFIX_PATH or set
  "OpenVINO_DIR" to a directory containing one of the above files.  If
  "OpenVINO" provides a separate development package or SDK, be sure it has
  been installed.

-- Configuring incomplete, errors occurred!

I've had to switch between the two machines to execute specific commands on whichever machine runs them successfully. This could have introduced some issues as well.
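A sketch of the fix the CMake error message itself suggests, assuming an archive install of OpenVINO under <INSTALL_DIR>:

# make the OpenVINO runtime discoverable by find_package
export OpenVINO_DIR=<INSTALL_DIR>/runtime
cmake -DCMAKE_BUILD_TYPE=Release -S ./ -B ./build/ && cmake --build ./build/ -j
# alternatively, pass the install prefix directly:
cmake -DCMAKE_PREFIX_PATH=<INSTALL_DIR> -DCMAKE_BUILD_TYPE=Release -S ./ -B ./build/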

@NishantPrabhuFujitsu NishantPrabhuFujitsu changed the title Model conversion issue in causal LM text generation Causal LM text generation example inference errors out Apr 30, 2024
@NishantPrabhuFujitsu NishantPrabhuFujitsu changed the title Causal LM text generation example inference errors out Causal LM text generation example errors out Apr 30, 2024
@Wovchena
Collaborator

  1. Invalid IR! ScatterNDUpdate_15 name is not unique!

    I've modified the test that covers your scenario to run on Ubuntu 22 with Python 3.10, and it passed. The problem is likely caused by your mixing of arm and x86 environments. Split them.

  2. seg fault on the x86 machine

    You can't share the same openvino package between arm and x86; they are different builds. x86: https://storage.openvinotoolkit.org/repositories/openvino/packages/2024.1/linux/l_openvino_toolkit_ubuntu22_2024.1.0.15008.f4afc983258_x86_64.tgz, aarch64: https://storage.openvinotoolkit.org/repositories/openvino/packages/2024.1/linux/l_openvino_toolkit_ubuntu20_2024.1.0.15008.f4afc983258_arm64.tgz. Note there's no Ubuntu 22 package for arm; use the Ubuntu 20 one. I believe the same goes for other libraries and tools GenAI depends on, like g++ (a sketch of keeping the installs separate follows after this list).

  3. cmake -DCMAKE_BUILD_TYPE=Release -S ./ -B ./build/ && cmake --build ./build/ -j

    I'd expect this to be resolved once you reinstall the environments.
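A sketch of keeping the two installs fully separate, using the URLs from point 2 (assuming the usual archive layout, where the tarball extracts to a directory of the same name containing setupvars.sh):

# on the x86_64 machine
wget https://storage.openvinotoolkit.org/repositories/openvino/packages/2024.1/linux/l_openvino_toolkit_ubuntu22_2024.1.0.15008.f4afc983258_x86_64.tgz
tar -xf l_openvino_toolkit_ubuntu22_2024.1.0.15008.f4afc983258_x86_64.tgz
source ./l_openvino_toolkit_ubuntu22_*/setupvars.sh

# on the aarch64 machine
wget https://storage.openvinotoolkit.org/repositories/openvino/packages/2024.1/linux/l_openvino_toolkit_ubuntu20_2024.1.0.15008.f4afc983258_arm64.tgz
tar -xf l_openvino_toolkit_ubuntu20_2024.1.0.15008.f4afc983258_arm64.tgz
source ./l_openvino_toolkit_ubuntu20_*/setupvars.sh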

@NishantPrabhuFujitsu
Author

NishantPrabhuFujitsu commented May 2, 2024

@Wovchena thanks for your response and apologies for the delay in my reply.

I separated the environments this time and ensured the right packages are used in each. The pipeline now runs without errors on both x86 and arm machines, but we still have some problems.

While inference on x86 with the example prompt generates a coherent answer at good speed, the same prompt on arm generates garbled text at roughly 1 token/sec. Any ideas on why that might be?

More info

  • The build on arm was still failing initially with the error mentioned in point (2) under "Other things to note" in my first message. After a little digging, I noticed that cmake errors out on this line in thirdparty/openvino_tokenizers after failing to find the OpenVINO runtime. The build succeeded only once I exported OpenVINO_DIR=<INSTALL_DIR>/runtime before running the cmake command. I did not have to do this on x86. To the best of my knowledge, setupvars.sh does not set this variable either (sourcing it had no effect on this error). Am I missing something?
  • The python packages installed by the second command in this block error out on arm when pip attempts to build auto-gptq, with the error below. The build cannot find torch even though an appropriate version is installed. Again, this did not happen on x86.
$ pip install --upgrade-strategy eager "transformers<4.38" -r ../../../llm_bench/python/requirements.txt "../../../thirdparty/openvino_tokenizers/[transformers]" --extra-index-url https://download.pytorch.org/whl/cpu
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cpu, https://download.pytorch.org/whl/cpu
Processing /home/nishant/workspace/llm/openvino.genai/thirdparty/openvino_tokenizers
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting optimum-intel (from -r ../../../llm_bench/python/requirements.txt (line 10))
  Using cached optimum_intel-1.17.0.dev0+e1b6a59-py3-none-any.whl
Collecting nncf (from -r ../../../llm_bench/python/requirements.txt (line 11))
  Cloning https://github.com/openvinotoolkit/nncf.git to /tmp/pip-install-drpdnodj/nncf_9c1b5e7af8394707bed42dc339c951aa
  Running command git clone --filter=blob:none --quiet https://github.com/openvinotoolkit/nncf.git /tmp/pip-install-drpdnodj/nncf_9c1b5e7af8394707bed42dc339c951aa
  Resolved https://github.com/openvinotoolkit/nncf.git to commit 9c00000cb0fd69eca8240953db5f9808991e64e4
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: transformers<4.38 in /home/nishant/envs/openvino/lib/python3.11/site-packages (4.37.2)
Requirement already satisfied: numpy in /home/nishant/envs/openvino/lib/python3.11/site-packages (from -r ../../../llm_bench/python/requirements.txt (line 2)) (1.26.4)
Requirement already satisfied: openvino>=2024.0.0 in /home/nishant/envs/openvino/lib/python3.11/site-packages (from -r ../../../llm_bench/python/requirements.txt (line 3)) (2024.1.0)
Collecting auto-gptq>=0.5.1 (from -r ../../../llm_bench/python/requirements.txt (line 4))
  Using cached auto_gptq-0.7.1.tar.gz (126 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [1 lines of output]
      Building cuda extension requires PyTorch (>=1.13.0) being installed, please install PyTorch first: No module named 'torch'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

To get this to work, I had to remove auto-gptq from the requirements. Could this be the cause of the poor output generated on arm?

[UPDATE] I just checked Auto-GPTQ's repo and noticed an issue from last year discussing the lack of CPU-only support for the package. If that is the cause of the installation failure, I'm still not sure why it worked on x86, since both of my machines are CPU-only.
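If the failure is just pip's isolated build environment not seeing the already-installed torch, a hedged sketch of a possible workaround (install torch first, then disable build isolation and the CUDA extension):

$ pip install torch --extra-index-url https://download.pytorch.org/whl/cpu
$ BUILD_CUDA_EXT=0 pip install --no-build-isolation "auto-gptq>=0.5.1"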

@Wovchena
Collaborator

Wovchena commented May 2, 2024

@allnes, @alvoron, please take a look at

While inference on x86 with the example prompt generates a coherent answer with good speed, the same on arm generates garbled text at a slow 1 token/sec (approx.).

and

The build succeeded only when I exported OpenVINO_DIR=<INSTALL_DIR>/runtime before running the cmake command. I did not have to do this for x86. To the best of my knowledge, this is not done in setupvars.sh either (activating that environment had no effect on this error).

@wgzintel, @eaidova, can auto-gptq become an optional dependency? I tried the --no-build-isolation solution in #415, but it didn't work out.

@peterchen-intel
Collaborator

peterchen-intel commented May 6, 2024

The build succeeded only when I exported OpenVINO_DIR=<INSTALL_DIR>/runtime before running the cmake command. I did not have to do this for x86. To the best of my knowledge, this is not done in setupvars.sh either (activating that environment had no effect on this error).

@wgzintel, @eaidova, can auto-gptq become an optional dependency? I tried --no-build-isolation solution: #415, but it didn't work out.

Maybe this is the solution: https://stackoverflow.com/questions/29222269/is-there-a-way-to-have-a-conditional-requirements-txt-file-for-my-python-applica
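If that approach fits, a sketch of the requirements.txt change (a PEP 508 environment marker; limiting auto-gptq to x86_64 is an assumption):

# in llm_bench/python/requirements.txt: skip auto-gptq on non-x86 machines
auto-gptq>=0.5.1; platform_machine == "x86_64"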

@NishantPrabhuFujitsu
Author

Hi @Wovchena

  1. The slow inference issue on ARM has been resolved in "FullyConnected nodes use slow reference kernel on ARM" #438 with @alvoron's support.
  2. As far as auto-gptq is concerned, I was able to build it from source with this modification to the pip install command:
BUILD_CUDA_EXT=0 pip install -vvv --no-build-isolation --use-deprecated=backtrack-on-build-failures -e .

However, skipping its installation doesn't seem to affect the quality of generated text. In general, the output quality degrades significantly as the model is compressed to lower precisions. That, however, is an issue for another day.

Thank you for your support. Marking this issue as closed.
