This repository was archived by the owner on Oct 25, 2024. It is now read-only.

Multiple issues with setting up the text chatbot service on SPR #1300

@tbykowsk

Hi,

I have followed the instructions in intel_extension_for_transformers/neural_chat/docs/notebooks/setup_text_chatbot_service_on_spr.ipynb (on the main branch of intel/intel-extension-for-transformers) and written down a couple of issues with potential solutions, which you may want to consider addressing.

I am using Ubuntu 22.04 LTS and Python 3.10.12.

  1. Setup backend / Setup environment
    !git clone https://github.com/intel/intel-extension-for-transformers.git

The instructions say to use HEAD of the main branch, even though the repository is being actively developed. This causes problems like "422 Unprocessable Entity using Neural Chat via OpenAI interface with meta-llama/llama-2-7b-chat-hf" (issue #1288 in intel/intel-extension-for-transformers).

I have also encountered the aforementioned issue, and then decided to use the latest release, which is v1.3.1.
It would be useful to state in the instructions which commit/release they were validated with.

I continued with v1.3.1 from this point on.
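For reference, a pinned checkout could look like this (v1.3.1 is simply the tag I ended up on; the point is to pin whatever tag the docs were validated with):

    !git clone --branch v1.3.1 --depth 1 https://github.com/intel/intel-extension-for-transformers.git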

  2. Setup backend / Setup environment
%cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/ 
!pip install -r requirements.txt

The requirements.txt installs torch==2.1.0, but once the backend server is started, it complains about torch compatibility:

ERROR! Intel® Extension for PyTorch* needs to work with PyTorch 2.2.*, but PyTorch 2.1.0+cu121 is found. Please switch to the matching version and run again.

The workaround is to reinstall torch and its companion packages manually to get compatible versions:

pip uninstall torch torchaudio torchvision xformers -y
pip install torch torchaudio torchvision xformers
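Alternatively, pinning matching versions explicitly should avoid pulling in an unexpected newer major version later. The exact pins below are my assumption about a consistent 2.2.* set; use whichever 2.2.* build the installed Intel Extension for PyTorch expects:

    pip install torch==2.2.0 torchaudio==2.2.0 torchvision==0.17.0 xformers==0.0.24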

  3. Setup backend / Setup environment
    !pip install nest_asyncio

It is a bit confusing that nest_asyncio has to be installed manually and is not included in requirements.txt for the backend.
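For context, nest_asyncio is presumably needed because Jupyter already runs an asyncio event loop, and the usual pattern is:

    import nest_asyncio
    nest_asyncio.apply()  # allow run_until_complete() inside the notebook's already-running loop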

  4. Deploy frontend on your server / Install the required Python dependencies
    !pip install -r ./examples/deployment/textbot/frontend/requirements.txt

The requirements.txt again installs torch==2.1.0, which makes the backend unusable. Please consider using compatible packages for both components, or suggest in the instructions creating a separate Python virtual environment for each component when both run on the same host.
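A sketch of the two-environment setup, run from the neural_chat directory (the requirements paths are taken from the notebook; the venv names are made up):

    python3 -m venv backend-env
    backend-env/bin/pip install -r ./requirements.txt
    python3 -m venv frontend-env
    frontend-env/bin/pip install -r ./examples/deployment/textbot/frontend/requirements.txt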

  5. Deploy frontend on your server / Run the frontend
    !nohup python app.py &

There is an issue with fastchat.utils when starting app.py:

Traceback (most recent call last):
File "[…]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/app.py", line 38, in
from fastchat.utils import (
ImportError: cannot import name 'violates_moderation' from 'fastchat.utils' ([…]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/fastchat/utils.py)

I have worked around this by removing the reference to violates_moderation in app.py, but you may want to investigate the problem further.
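A slightly safer variant of that workaround (just a sketch; the import block in app.py differs between releases) is to stub the missing symbol instead of deleting every call site:

    try:
        from fastchat.utils import violates_moderation
    except ImportError:
        # newer fastchat releases removed violates_moderation; fall back to a no-op check
        def violates_moderation(text):
            return False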

  6. Deploy frontend on your server / Run the frontend
    !nohup python app.py &

There is an issue with the gradio package which occurs when the NeuralChat URL is loaded in a browser:

2024-02-21 13:18:12 | INFO | gradio_web_server | Models: ['Intel/neural-chat-7b-v3-1']
2024-02-21 13:18:13 | ERROR | stderr | sys:1: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
2024-02-21 13:18:13 | INFO | stdout | Running on local URL: http://0.0.0.0:8080/
2024-02-21 13:18:13 | INFO | stdout |
2024-02-21 13:18:13 | INFO | stdout | To create a public link, set share=True in launch().
2024-02-21 13:18:48 | INFO | stdout | 1 validation error for PredictBody
2024-02-21 13:18:48 | INFO | stdout | event_id
2024-02-21 13:18:48 | INFO | stdout | Field required [type=missing, input_value={'data': [{}], 'event_dat...on_hash': 'w9eie3cvduh'}, input_type=dict]
2024-02-21 13:18:48 | INFO | stdout | For further information visit https://errors.pydantic.dev/2.6/v/missing

The solution to this issue is updating gradio to at least version 3.50.2.
The incompatible version is installed from within the source code of app.py, so this line has to be changed:

os.system("pip install gradio==3.36.0")

  7. Deploy frontend on your server / Run the frontend
    !nohup python app.py &

After the NeuralChat URL successfully loads in the browser, the chat replies only with:

NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE. (error_code: 4)

It is caused by an error in the backend:

ModuleNotFoundError: No module named 'neural_speed'

To fix this, one may add neural_speed to intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/requirements.txt, or install the package manually.
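The manual installation would be (assuming the PyPI package name is neural-speed):

    pip install neural-speed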

  8. Just wondering why the frontend is able to handle Intel/neural-chat-7b-v3-1 and meta-llama/Llama-2-7b-chat-hf, but fails with, for example, meta-llama/Llama-2-13b-chat-hf:

2024-02-21 13:39:08 | ERROR | stderr | Traceback (most recent call last):
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/gradio/queueing.py", line 407, in call_prediction
2024-02-21 13:39:08 | ERROR | stderr | output = await route_utils.call_process_api(
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/gradio/route_utils.py", line 226, in call_process_api
2024-02-21 13:39:08 | ERROR | stderr | output = await app.get_blocks().process_api(
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/gradio/blocks.py", line 1550, in process_api
2024-02-21 13:39:08 | ERROR | stderr | result = await self.call_function(
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/gradio/blocks.py", line 1199, in call_function
2024-02-21 13:39:08 | ERROR | stderr | prediction = await utils.async_iteration(iterator)
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/gradio/utils.py", line 519, in async_iteration
2024-02-21 13:39:08 | ERROR | stderr | return await iterator.__anext__()
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/gradio/utils.py", line 512, in __anext__
2024-02-21 13:39:08 | ERROR | stderr | return await anyio.to_thread.run_sync(
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
2024-02-21 13:39:08 | ERROR | stderr | return await get_async_backend().run_sync_in_worker_thread(
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
2024-02-21 13:39:08 | ERROR | stderr | return await future
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
2024-02-21 13:39:08 | ERROR | stderr | result = context.run(func, *args)
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/gradio/utils.py", line 495, in run_sync_iterator_async
2024-02-21 13:39:08 | ERROR | stderr | return next(iterator)
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/gradio/utils.py", line 649, in gen_wrapper
2024-02-21 13:39:08 | ERROR | stderr | yield from f(*args, **kwargs)
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/app.py", line 331, in http_bot
2024-02-21 13:39:08 | ERROR | stderr | new_state = get_conv_template(model_name.split('/')[-1])
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/./conversation.py", line 300, in get_conv_template
2024-02-21 13:39:08 | ERROR | stderr | return conv_templates[name].copy()
2024-02-21 13:39:08 | ERROR | stderr | KeyError: 'Llama-2-13b-chat-hf'

The backend itself seems to load meta-llama/Llama-2-13b-chat-hf correctly, so maybe enabling other models from the same family in the frontend would not require many changes.
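One possible direction (a rough sketch only; conv_templates and get_conv_template are the names visible in the traceback, and the fallback keys are my guesses) would be to fall back to a family-level template instead of requiring an exact key:

    def pick_conv_template(model_name):
        # conversation.py keys templates by the exact model basename, so 13B misses
        name = model_name.split('/')[-1]
        if name in conv_templates:
            return get_conv_template(name)
        if name.lower().startswith('llama-2'):
            # assumption: the 7B chat template key exists and fits the whole family
            return get_conv_template('Llama-2-7b-chat-hf')
        raise KeyError(name)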

Thank you for taking the time to read through all this text :)
