Neural Chat API python SDK (#151)
* Gha test (#83)

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>

* add neural chat code structure

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add more directories

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* delete redundant cli code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update directory name

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add config and chatbot

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add server code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* refine code structure

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add neural chat audio plugin

* added finetuning API.

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>

* Construct Restful API frameworks for neural chat

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add readme and server code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* complete chatbot part of textchat api

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add preprocess normalizer

* update readme

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix ut issues

* use NeuralChatBot for restful APIs

Signed-off-by: LetongHan <letong.han@intel.com>

* Fix iomp UT issue

* Fix iomp UT issue

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* added examples for NeuralChat finetuning.

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>

* fix a typo

* move test scripts to test directory

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* new_feature

* add cli code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix command issue

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add command class into init files

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add model code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add frontend code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* support conversation

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add docker files

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add tools code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix command line issues

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* support gpu english asr/tts

* add caching code and update chatbot implementation

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update chatbot import for restful api

Signed-off-by: LetongHan <letong.han@intel.com>

* add ut

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix model name match issue

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* added UT for NeuralChat finetuning.

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>

* fix model register

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix model issue

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add unit test for textchat & voicechat, update api files

Signed-off-by: LetongHan <letong.han@intel.com>

* refactor model code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix typo

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add unit test for finetune & text2image

Signed-off-by: LetongHan <letong.han@intel.com>

* add log for restful api

Signed-off-by: LetongHan <letong.han@intel.com>

* add stress test for restful api

Signed-off-by: LetongHan <letong.han@intel.com>

* update model code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add chat interface

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix typo

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add text chat example

Signed-off-by: LetongHan <letong.han@intel.com>

* refine model code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add more test cases

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update GenerationConfig

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add audio test case

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix audio issue

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix ut issues

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add features

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* update audio code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update audio test case

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update readme

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update retrieval code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update restful api files

Signed-off-by: LetongHan <letong.han@intel.com>

* update ut for restful api textchat

Signed-off-by: LetongHan <letong.han@intel.com>

* updated finetune usage to adapt api change.

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>

* update api based on comments

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix audio pipeline, stabilize dependencies

* fix cli issues

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix test issues

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix voicechat issue

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add unit test for python api

Signed-off-by: LetongHan <letong.han@intel.com>

* Fix ut

* fix cli issue

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix ut

Signed-off-by: Spycsh <sihan.chen@intel.com>

* modify request type for restful api

Signed-off-by: LetongHan <letong.han@intel.com>

* update server part code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update README

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add unit test for neuralchat cli

Signed-off-by: LetongHan <letong.han@intel.com>

* update voicechat restful api

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update config

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add register

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* revision

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* revision

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* update client code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* revision

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* revision on the file names for NeuralChat and add code description (#143)

* revision

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* revision

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

---------

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* modify textchat api

Signed-off-by: LetongHan <letong.han@intel.com>

* fix syntax error of base_model

Signed-off-by: LetongHan <letong.han@intel.com>

* revision

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* add batching algorithm for tts on long texts

Signed-off-by: Spycsh <sihan.chen@intel.com>

* Yuxiang/neural chat api (#146)

* modify Dockerfile in finetuning

* Update README.md

* Update Dockerfile

* Update README.md

* Update Dockerfile

---------

Co-authored-by: sys-lpot-val <sys_lpot_val@intel.com>

* fix neuralchat client command issue

Signed-off-by: LetongHan <letong.han@intel.com>

* added amp in optimization.

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>

* enable textchat restful api & client service

Signed-off-by: LetongHan <letong.han@intel.com>

* update README of server Python API of textchat

Signed-off-by: LetongHan <letong.han@intel.com>

* enable voicechat and add unit test for restful api

Signed-off-by: LetongHan <letong.han@intel.com>

* Update langchain.py

fix small typo

* fix typo

* Update SensitiveChecker.py

* add sensitive dict

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* fix

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* fix path issue for sensitive word dict

* revision

* Implemented optimization API and added UT for it.

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>

* update code and readme

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update ut for finetune restful api

Signed-off-by: LetongHan <letong.han@intel.com>

* enable finetune on neuralchat client

Signed-off-by: LetongHan <letong.han@intel.com>

* update textchat example

Signed-off-by: LetongHan <letong.han@intel.com>

* add rag example (#150)

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* add helloworld and talkingbot examples

Signed-off-by: root <root@aia-sdp-spr-10296.jf.intel.com>
Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* update example and readme

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* fix retrieval code introduced issues

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

---------

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
Signed-off-by: LetongHan <letong.han@intel.com>
Signed-off-by: XuhuiRen <xuhui.ren@intel.com>
Signed-off-by: Spycsh <sihan.chen@intel.com>
Signed-off-by: root <root@aia-sdp-spr-10296.jf.intel.com>
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: VincyZhang <wenxin.zhang@intel.com>
Co-authored-by: Spycsh <sihan.chen@intel.com>
Co-authored-by: Ye, Xinyu <xinyu.ye@intel.com>
Co-authored-by: LetongHan <letong.han@intel.com>
Co-authored-by: XuhuiRen <xuhui.ren@intel.com>
Co-authored-by: XuhuiRen <44249229+XuhuiRen@users.noreply.github.com>
Co-authored-by: Liangyx2 <106130696+Liangyx2@users.noreply.github.com>
Co-authored-by: sys-lpot-val <sys_lpot_val@intel.com>
9 people committed Aug 18, 2023
1 parent e1da7e8 commit 08ba5d8
Showing 310 changed files with 45,364 additions and 0 deletions.
214 changes: 214 additions & 0 deletions neural_chat/README.md
@@ -0,0 +1,214 @@
<div align="center">

Intel® Neural Chat
===========================
<h3> An open-source Python library that empowers you to customize your chatbot with a diverse range of plugins.</h3>

---
<div align="left">

NeuralChat is a general chat framework designed to help you create your own chatbot that can be efficiently deployed on Intel CPU/GPU, Habana HPU, and Nvidia GPU. NeuralChat is built on top of large language models (LLMs) and provides a set of strong capabilities, including LLM fine-tuning and LLM inference, together with a rich set of plugins such as knowledge retrieval and query caching. With NeuralChat, you can easily create a text-based or audio-based chatbot and rapidly deploy it on Intel platforms. Here is the flow of NeuralChat:

<a target="_blank" href="./assets/pictures/neuralchat.png">
<p align="center">
<img src="./assets/pictures/neuralchat.png" alt="NeuralChat" width=600 height=200>
</p>
</a>

NeuralChat is under active development with some experimental features (APIs are subject to change).

# Installation

NeuralChat is seamlessly integrated into the Intel Extension for Transformers. Getting started is quick and simple: just install `intel-extension-for-transformers`.

## Install from PyPI
```bash
pip install intel-extension-for-transformers
```
> For more installation methods, please refer to the [Installation Page](../docs/installation.md)
<a name="quickstart"></a>
# Quick Start

Users can try NeuralChat with the [NeuralChat Command Line](./cli/README.md) or the Python API.

## Install from source

```bash
export PYTHONPATH=<PATH TO intel-extension-for-transformers>
conda create -n neural_chat python==3.10
conda activate neural_chat
pip install -r requirements.txt
pip install librosa==0.10.0
```

## Inference

### Text Chat

Given a textual instruction, NeuralChat responds with a textual response.

**command line experience**

```shell
neuralchat textchat --query "Tell me about Intel Xeon Scalable Processors."
```

**Python API experience**

```python
>>> from neural_chat import build_chatbot
>>> chatbot = build_chatbot()
>>> response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
```

### Text Chat With Retrieval

Given a textual instruction, NeuralChat retrieves relevant content from the supplied documents and uses it to generate the textual response.

**command line experience**

```shell
neuralchat textchat --retrieval_type sparse --retrieval_document_path ./assets/docs/ --query "Tell me about Intel Xeon Scalable Processors."
```

**Python API experience**

```python
>>> from neural_chat import PipelineConfig
>>> from neural_chat import build_chatbot
>>> config = PipelineConfig(retrieval=True, retrieval_document_path="./assets/docs/")
>>> chatbot = build_chatbot(config)
>>> response = chatbot.predict("How many cores does the Intel® Xeon® Platinum 8480+ Processor have in total?")
```

### Voice Chat

In voice chat, users can engage in three modes: audio input with audio output, audio input with text output, or text input with audio output.

**command line experience**

- audio in and audio output
```shell
neuralchat voicechat --audio_input_path ./assets/audio/pat.wav --audio_output_path ./response.wav
```

- audio in and text output
```shell
neuralchat voicechat --audio_input_path ./assets/audio/pat.wav
```

- text in and audio output
```shell
neuralchat voicechat --query "Tell me about Intel Xeon Scalable Processors." --audio_output_path ./response.wav
```


**Python API experience**

For the Python API, users can enable the different voice chat modes by setting `audio_input=True` for audio input and/or `audio_output=True` for audio output.

```python
>>> from neural_chat import PipelineConfig
>>> from neural_chat import build_chatbot
>>> config = PipelineConfig(audio_input=True, audio_output=True)
>>> chatbot = build_chatbot(config)
>>> result = chatbot.predict(query="./assets/audio/pat.wav")
```
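
For instance, to use the text-in and audio-out mode, only the audio output side needs to be enabled. The following is a minimal sketch based on the flags described above; where the synthesized audio is written may depend on the generation settings:

```python
>>> from neural_chat import PipelineConfig
>>> from neural_chat import build_chatbot
>>> # Only audio output is enabled, so the query is plain text and the reply is synthesized speech.
>>> config = PipelineConfig(audio_output=True)
>>> chatbot = build_chatbot(config)
>>> result = chatbot.predict(query="Tell me about Intel Xeon Scalable Processors.")
```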

We provide multiple plugins to augment the chatbot on top of LLM inference. Our plugins support [knowledge retrieval](./pipeline/plugins/retrievers/), [query caching](./pipeline/plugins/caching/), [prompt optimization](./pipeline/plugins/prompts/), [safety checker](./pipeline/plugins/security/), etc. Knowledge retrieval consists of document indexing for efficient retrieval of relevant information, including Dense Indexing based on LangChain and Sparse Indexing based on fastRAG, plus document rankers to prioritize the most relevant responses. Query caching provides a fast path that returns a response without LLM inference, improving chat response time. Prompt optimization supports auto prompt engineering to improve user prompts, instruction optimization to enhance the model's performance, and a memory controller for efficient memory utilization.
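
Plugins are enabled through `PipelineConfig`, as in the retrieval example above. The sketch below reuses the documented retrieval flags; the commented-out `cache_chat` and `safety_checker` arguments are hypothetical placeholders meant only to illustrate how other plugins would be switched on (the actual parameter names may differ):

```python
from neural_chat import PipelineConfig
from neural_chat import build_chatbot

# Retrieval flags are taken from the "Text Chat With Retrieval" example above.
# The commented-out arguments are illustrative placeholders, not confirmed API.
config = PipelineConfig(
    retrieval=True,
    retrieval_document_path="./assets/docs/",
    # cache_chat=True,       # hypothetical: enable query caching
    # safety_checker=True,   # hypothetical: enable the sensitive-content checker
)
chatbot = build_chatbot(config)
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
```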


## Finetuning

Fine-tuning the pretrained large language model (LLM) on an instruction-following dataset to create a customized chatbot is very easy with NeuralChat.

**command line experience**

```shell
neuralchat finetune --base_model "meta-llama/Llama-2-7b-chat-hf" --config pipeline/finetuning/config/finetuning.yaml
```


**Python API experience**

```python
>>> from neural_chat import FinetuningConfig
>>> from neural_chat import finetune_model
>>> finetune_cfg = FinetuningConfig()
>>> finetuned_model = finetune_model(finetune_cfg)
```

## Quantization

NeuralChat provides three quantization approaches (PostTrainingDynamic, PostTrainingStatic, and QuantAwareTraining) based on [Intel® Neural Compressor](https://github.com/intel/neural-compressor).

**command line experience**

```shell
neuralchat optimize --base_model "meta-llama/Llama-2-7b-chat-hf" --config pipeline/optimization/config/optimization.yaml
```


**Python API experience**

```python
>>> from neural_chat import OptimizationConfig
>>> from neural_chat import optimize_model
>>> opt_cfg = OptimizationConfig()
>>> optimized_model = optimize_model(opt_cfg)
```


<a name="quickstartserver"></a>
# Quick Start Server

Users can try the NeuralChat server with the [NeuralChat Server Command Line](./server/README.md).


**Start Server**
- Command Line (Recommended)
```shell
neuralchat_server start --config_file ./server/config/neuralchat.yaml
```

- Python API
```python
from neural_chat import NeuralChatServerExecutor
server_executor = NeuralChatServerExecutor()
server_executor(config_file="./server/config/neuralchat.yaml", log_file="./log/neuralchat.log")
```

**Access Text Chat Service**

- Command Line
```shell
neuralchat_client textchat --server_ip 127.0.0.1 --port 8000 --query "Tell me about Intel Xeon Scalable Processors."
```

- Python API
```python
from neural_chat import TextChatClientExecutor
executor = TextChatClientExecutor()
result = executor(
prompt="Tell me about Intel Xeon Scalable Processors.",
server_ip="127.0.0.1",
port=8000)
print(result.text)
```

- Curl with Restful API
```shell
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "Tell me about Intel Xeon Scalable Processors."}' http://127.0.0.1:8000/v1/chat/completions
```
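
The same RESTful endpoint can also be called from Python. The sketch below simply mirrors the curl request above using the `requests` library:

```python
import requests

# Mirror of the curl request above: POST a JSON prompt to the text chat endpoint.
response = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json={"prompt": "Tell me about Intel Xeon Scalable Processors."},  # sent as application/json
)
print(response.text)
```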

**Access Voice Chat Service**

```shell
neuralchat_client voicechat --server_ip 127.0.0.1 --port 8000 --audio_input_path ./assets/audio/pat.wav --audio_output_path response.wav
```

**Access Finetune Service**
```shell
neuralchat_client finetune --server_ip 127.0.0.1 --port 8000 --model_name_or_path "facebook/opt-125m" --train_file "/path/to/finetune/dataset.json"
```

27 changes: 27 additions & 0 deletions neural_chat/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .config import PipelineConfig
from .config import GenerationConfig
from .config import FinetuningConfig
from .config import OptimizationConfig
from .chatbot import build_chatbot
from .chatbot import finetune_model
from .chatbot import optimize_model
from .server.neuralchat_server import NeuralChatServerExecutor
from .server.neuralchat_client import TextChatClientExecutor, VoiceChatClientExecutor, FinetuingClientExecutor

Binary file added neural_chat/assets/audio/pat.wav
Binary file added neural_chat/assets/audio/welcome.wav
