This repository was archived by the owner on Oct 25, 2024. It is now read-only.

Multiple issues with setting up the text chatbot service on SPR #1300

@tbykowsk

Hi,

I have followed the instructions in intel_extension_for_transformers/neural_chat/docs/notebooks/setup_text_chatbot_service_on_spr.ipynb (on the main branch of intel/intel-extension-for-transformers) and written down a couple of issues with potential solutions, which you may want to consider addressing.

I am using Ubuntu 22.04 LTS and Python 3.10.12.

  1. Setup backend / Setup environment
    !git clone https://github.com/intel/intel-extension-for-transformers.git

The instructions say to use HEAD of the main branch, even though the repository is being actively developed. This causes problems like "422 Unprocessable Entity using Neural Chat via OpenAI interface with meta-llama/llama-2-7b-chat-hf" (issue #1288 in intel/intel-extension-for-transformers).

I have also encountered the aforementioned issue, and then decided to use the latest release, which is v1.3.1.
It would be useful to state in the instructions which commit/release they were validated with.

I continued with v1.3.1 from this point on.
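For reference, a pinned checkout could look like this (v1.3.1 is simply the tag I ended up on; the point is to pin whatever tag the docs were validated with):

    !git clone --branch v1.3.1 --depth 1 https://github.com/intel/intel-extension-for-transformers.git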

  2. Setup backend / Setup environment
%cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/ 
!pip install -r requirements.txt

The requirements.txt installs torch==2.1.0, but once the backend server is started, it complains about torch compatibility:

ERROR! Intel® Extension for PyTorch* needs to work with PyTorch 2.2.*, but PyTorch 2.1.0+cu121 is found. Please switch to the matching version and run again.

The workaround is to reinstall torch and its companion packages manually to get compatible versions:

pip uninstall torch torchaudio torchvision xformers -y
pip install torch torchaudio torchvision xformers
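Alternatively, pinning matching versions explicitly should avoid pulling in an unexpected newer major version later. The exact pins below are my assumption about a consistent 2.2.* set; use whichever 2.2.* build the installed Intel Extension for PyTorch expects:

    pip install torch==2.2.0 torchaudio==2.2.0 torchvision==0.17.0 xformers==0.0.24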

  3. Setup backend / Setup environment
    !pip install nest_asyncio

It is a bit confusing that nest_asyncio has to be installed manually and is not included in requirements.txt for the backend.
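For context, nest_asyncio is presumably needed because Jupyter already runs an asyncio event loop, and the usual pattern is:

    import nest_asyncio
    nest_asyncio.apply()  # allow run_until_complete() inside the notebook's already-running loop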

  4. Deploy frontend on your server / Install the required Python dependencies
    !pip install -r ./examples/deployment/textbot/frontend/requirements.txt

The requirements.txt again installs torch==2.1.0, which makes the backend unusable. Please consider using compatible packages for both components, or suggest in the instructions creating a separate Python virtual environment for each component when both run on the same host.
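A sketch of the two-environment setup, run from the neural_chat directory (the requirements paths are taken from the notebook; the venv names are made up):

    python3 -m venv backend-env
    backend-env/bin/pip install -r ./requirements.txt
    python3 -m venv frontend-env
    frontend-env/bin/pip install -r ./examples/deployment/textbot/frontend/requirements.txt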

  5. Deploy frontend on your server / Run the frontend
    !nohup python app.py &

There is an issue with fastchat.utils when starting app.py:

Traceback (most recent call last):
File "[…]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/app.py", line 38, in
from fastchat.utils import (
ImportError: cannot import name 'violates_moderation' from 'fastchat.utils' ([…]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/fastchat/utils.py)

I have worked around this by removing the reference to violates_moderation in app.py, but you may want to investigate the problem further.
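A slightly safer variant of that workaround (just a sketch; the import block in app.py differs between releases) is to stub the missing symbol instead of deleting every call site:

    try:
        from fastchat.utils import violates_moderation
    except ImportError:
        # newer fastchat releases removed violates_moderation; fall back to a no-op check
        def violates_moderation(text):
            return False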

  6. Deploy frontend on your server / Run the frontend
    !nohup python app.py &

There is an issue with the gradio package which occurs when the NeuralChat URL is loaded in a browser:

2024-02-21 13:18:12 | INFO | gradio_web_server | Models: ['Intel/neural-chat-7b-v3-1']
2024-02-21 13:18:13 | ERROR | stderr | sys:1: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
2024-02-21 13:18:13 | INFO | stdout | Running on local URL: http://0.0.0.0:8080/
2024-02-21 13:18:13 | INFO | stdout |
2024-02-21 13:18:13 | INFO | stdout | To create a public link, set share=True in launch().
2024-02-21 13:18:48 | INFO | stdout | 1 validation error for PredictBody
2024-02-21 13:18:48 | INFO | stdout | event_id
2024-02-21 13:18:48 | INFO | stdout | Field required [type=missing, input_value={'data': [{}], 'event_dat...on_hash': 'w9eie3cvduh'}, input_type=dict]
2024-02-21 13:18:48 | INFO | stdout | For further information visit https://errors.pydantic.dev/2.6/v/missing

The solution to this issue is updating gradio to at least version 3.50.2.
The incompatible version is installed from within the source code of app.py, so this line has to be changed:

os.system("pip install gradio==3.36.0")

  7. Deploy frontend on your server / Run the frontend
    !nohup python app.py &

After the NeuralChat URL successfully loads in the browser, the chat replies only with:

NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE. (error_code: 4)

It is caused by an error in the backend:

ModuleNotFoundError: No module named 'neural_speed'

To fix this, one may add neural_speed to intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/requirements.txt, or install the package manually.
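The manual installation would be (assuming the PyPI package name is neural-speed):

    pip install neural-speed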

  8. Just wondering why the frontend is able to handle Intel/neural-chat-7b-v3-1 and meta-llama/Llama-2-7b-chat-hf, but fails with, for example, meta-llama/Llama-2-13b-chat-hf:

2024-02-21 13:39:08 | ERROR | stderr | Traceback (most recent call last):
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/gradio/queueing.py", line 407, in call_prediction
2024-02-21 13:39:08 | ERROR | stderr | output = await route_utils.call_process_api(
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/gradio/route_utils.py", line 226, in call_process_api
2024-02-21 13:39:08 | ERROR | stderr | output = await app.get_blocks().process_api(
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/gradio/blocks.py", line 1550, in process_api
2024-02-21 13:39:08 | ERROR | stderr | result = await self.call_function(
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/gradio/blocks.py", line 1199, in call_function
2024-02-21 13:39:08 | ERROR | stderr | prediction = await utils.async_iteration(iterator)
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/gradio/utils.py", line 519, in async_iteration
2024-02-21 13:39:08 | ERROR | stderr | return await iterator.__anext__()
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/gradio/utils.py", line 512, in __anext__
2024-02-21 13:39:08 | ERROR | stderr | return await anyio.to_thread.run_sync(
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
2024-02-21 13:39:08 | ERROR | stderr | return await get_async_backend().run_sync_in_worker_thread(
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
2024-02-21 13:39:08 | ERROR | stderr | return await future
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
2024-02-21 13:39:08 | ERROR | stderr | result = context.run(func, *args)
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/gradio/utils.py", line 495, in run_sync_iterator_async
2024-02-21 13:39:08 | ERROR | stderr | return next(iterator)
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/veee/lib/python3.10/site-packages/gradio/utils.py", line 649, in gen_wrapper
2024-02-21 13:39:08 | ERROR | stderr | yield from f(*args, **kwargs)
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/app.py", line 331, in http_bot
2024-02-21 13:39:08 | ERROR | stderr | new_state = get_conv_template(model_name.split('/')[-1])
2024-02-21 13:39:08 | ERROR | stderr | File "[...]/intel-extension-for-transformers-1.3.1/intel_extension_for_transformers/neural_chat/ui/gradio/basic/./conversation.py", line 300, in get_conv_template
2024-02-21 13:39:08 | ERROR | stderr | return conv_templates[name].copy()
2024-02-21 13:39:08 | ERROR | stderr | KeyError: 'Llama-2-13b-chat-hf'

The backend itself seems to load meta-llama/Llama-2-13b-chat-hf correctly, so maybe enabling other models from the same family in the frontend would not require many changes.
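One possible direction (a rough sketch only; conv_templates and get_conv_template are the names visible in the traceback, and the fallback keys are my guesses) would be to fall back to a family-level template instead of requiring an exact key:

    def pick_conv_template(model_name):
        # conversation.py keys templates by the exact model basename, so 13B misses
        name = model_name.split('/')[-1]
        if name in conv_templates:
            return get_conv_template(name)
        if name.lower().startswith('llama-2'):
            # assumption: the 7B chat template key exists and fits the whole family
            return get_conv_template('Llama-2-7b-chat-hf')
        raise KeyError(name)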

Thank you for taking the time to read through all this text :)
