Neural Chat API python SDK (#151)
* Gha test (#83)

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>

* add neural chat code structure

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add more directories

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* delete redundant cli code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update directory name

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add config and chatbot

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add server code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* refine code structure

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add neural chat audio plugin

* added finetuning API.

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>

* Construct Restful API frameworks for neural chat

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add readme and server code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* complete chatbot part of textchat api

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add preprocess normalizer

* update readme

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix ut issues

* use NeuralChatBot for restful APIs

Signed-off-by: LetongHan <letong.han@intel.com>

* Fix iomp UT issue

* Fix iomp UT issue

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* added examples for NeuralChat finetuning.

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>

* fix a typo

* move test scripts to test directory

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* new_feature

* add cli code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix command issue

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add command class into init files

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add model code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add frontend code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* support conversation

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add docker files

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add tools code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix command line issues

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* support gpu english asr/tts

* add caching code and update chatbot implementation

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update chatbot import for restful api

Signed-off-by: LetongHan <letong.han@intel.com>

* add ut

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix model name match issue

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* added UT for NeuralChat finetuning.

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>

* fix model register

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix model issue

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add unit test for textchat & voicechat, update api files

Signed-off-by: LetongHan <letong.han@intel.com>

* refactor model code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix typo

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add unit test for finetune & text2image

Signed-off-by: LetongHan <letong.han@intel.com>

* add log for restful api

Signed-off-by: LetongHan <letong.han@intel.com>

* add stress test for restful api

Signed-off-by: LetongHan <letong.han@intel.com>

* update model code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add chat interface

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix typo

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add text chat example

Signed-off-by: LetongHan <letong.han@intel.com>

* refine model code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add more test cases

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update GenerationConfig

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add audio test case

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix audio issue

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix ut issues

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add features

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* update audio code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update audio test case

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update readme

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update retrieval code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update restful api files

Signed-off-by: LetongHan <letong.han@intel.com>

* update ut for restful api textchat

Signed-off-by: LetongHan <letong.han@intel.com>

* updated finetune usage to adapt api change.

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>

* update api based on comments

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix audio pipeline, stabilize dependencies

* fix cli issues

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix test issues

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix voicechat issue

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add unit test for python api

Signed-off-by: LetongHan <letong.han@intel.com>

* Fix ut

* fix cli issue

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix ut

Signed-off-by: Spycsh <sihan.chen@intel.com>

* modify request type for restful api

Signed-off-by: LetongHan <letong.han@intel.com>

* update server part code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update README

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add unit test for neuralchat cli

Signed-off-by: LetongHan <letong.han@intel.com>

* update voicechat restful api

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update config

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add register

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* revision

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* revision

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* update client code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* revision

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* revision on the file names for NeuralChat and add code description (#143)

* revision

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* revision

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

---------

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* modify textchat api

Signed-off-by: LetongHan <letong.han@intel.com>

* fix syntax error of base_model

Signed-off-by: LetongHan <letong.han@intel.com>

* revision

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* add batching algorithm for tts on long texts

Signed-off-by: Spycsh <sihan.chen@intel.com>

* Yuxiang/neural chat api (#146)

* modify Dockerfile in finetuning

* Update README.md

* Update Dockerfile

* Update README.md

* Update Dockerfile

---------

Co-authored-by: sys-lpot-val <sys_lpot_val@intel.com>

* fix neuralchat client command issue

Signed-off-by: LetongHan <letong.han@intel.com>

* added amp in optimization.

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>

* enable textchat restful api & client service

Signed-off-by: LetongHan <letong.han@intel.com>

* update README of server Python API of textchat

Signed-off-by: LetongHan <letong.han@intel.com>

* enable voicechat and add unit test for restful api

Signed-off-by: LetongHan <letong.han@intel.com>

* Update langchain.py

fix small typo

* fix typo

* Update SensitiveChecker.py

* add sensitive dict

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* fix

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* fix path issue for sensitive word dict

* revision

* Implemented optimization API and added UT for it.

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>

* update code and readme

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update ut for finetune restful api

Signed-off-by: LetongHan <letong.han@intel.com>

* enable finetune on neuralchat client

Signed-off-by: LetongHan <letong.han@intel.com>

* update textchat example

Signed-off-by: LetongHan <letong.han@intel.com>

* add rag example (#150)

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* add helloworld and talkingbot examples

Signed-off-by: root <root@aia-sdp-spr-10296.jf.intel.com>
Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* update example and readme

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* fix retrieval code introduced issues

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

---------

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
Signed-off-by: LetongHan <letong.han@intel.com>
Signed-off-by: XuhuiRen <xuhui.ren@intel.com>
Signed-off-by: Spycsh <sihan.chen@intel.com>
Signed-off-by: root <root@aia-sdp-spr-10296.jf.intel.com>
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: VincyZhang <wenxin.zhang@intel.com>
Co-authored-by: Spycsh <sihan.chen@intel.com>
Co-authored-by: Ye, Xinyu <xinyu.ye@intel.com>
Co-authored-by: LetongHan <letong.han@intel.com>
Co-authored-by: XuhuiRen <xuhui.ren@intel.com>
Co-authored-by: XuhuiRen <44249229+XuhuiRen@users.noreply.github.com>
Co-authored-by: Liangyx2 <106130696+Liangyx2@users.noreply.github.com>
Co-authored-by: sys-lpot-val <sys_lpot_val@intel.com>
9 people committed Aug 18, 2023
1 parent e1da7e8 commit 08ba5d8
Showing 310 changed files with 45,364 additions and 0 deletions.
214 changes: 214 additions & 0 deletions neural_chat/README.md
@@ -0,0 +1,214 @@
<div align="center">

Intel® Neural Chat
===========================
<h3> An open-source Python library that empowers you to customize your chatbot with a diverse range of plugins.</h3>

---
<div align="left">

NeuralChat is a general chat framework designed to help you create your own chatbot that can be efficiently deployed on Intel CPU/GPU, Habana HPU, and Nvidia GPU. NeuralChat is built on top of large language models (LLMs) and provides a set of strong capabilities, including LLM fine-tuning and LLM inference, together with a rich set of plugins such as knowledge retrieval and query caching. With NeuralChat, you can easily create a text-based or audio-based chatbot and rapidly deploy it on Intel platforms. Here is the flow of NeuralChat:

<a target="_blank" href="./assets/pictures/neuralchat.png">
<p align="center">
<img src="./assets/pictures/neuralchat.png" alt="NeuralChat" width=600 height=200>
</p>
</a>

NeuralChat is under active development with some experimental features (APIs are subject to change).

# Installation

NeuralChat is seamlessly integrated into the Intel Extension for Transformers. Getting started is quick and simple: just install `intel-extension-for-transformers`.

## Install from PyPI
```bash
pip install intel-extension-for-transformers
```
> For more installation methods, please refer to the [Installation Page](../docs/installation.md)
<a name="quickstart"></a>
# Quick Start

Users can try NeuralChat with the [NeuralChat Command Line](./cli/README.md) or the Python API.

## Install from source

```bash
export PYTHONPATH=<PATH TO intel-extension-for-transformers>
conda create -n neural_chat python==3.10
conda activate neural_chat
pip install -r requirements.txt
pip install librosa==0.10.0
```

## Inference

### Text Chat

Given a textual instruction, NeuralChat responds with a textual response.

**command line experience**

```shell
neuralchat textchat --query "Tell me about Intel Xeon Scalable Processors."
```

**Python API experience**

```python
>>> from neural_chat import build_chatbot
>>> chatbot = build_chatbot()
>>> response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
```

### Text Chat With Retrieval

Given a textual instruction, NeuralChat retrieves relevant content from the supplied documents and uses it to generate the textual response.

**command line experience**

```shell
neuralchat textchat --retrieval_type sparse --retrieval_document_path ./assets/docs/ --query "Tell me about Intel Xeon Scalable Processors."
```

**Python API experience**

```python
>>> from neural_chat import PipelineConfig
>>> from neural_chat import build_chatbot
>>> config = PipelineConfig(retrieval=True, retrieval_document_path="./assets/docs/")
>>> chatbot = build_chatbot(config)
>>> response = chatbot.predict("How many cores does the Intel® Xeon® Platinum 8480+ Processor have in total?")
```

### Voice Chat

In voice chat, users can engage in three modes: audio input with audio output, audio input with text output, or text input with audio output.

**command line experience**

- audio in and audio output
```shell
neuralchat voicechat --audio_input_path ./assets/audio/pat.wav --audio_output_path ./response.wav
```

- audio in and text output
```shell
neuralchat voicechat --audio_input_path ./assets/audio/pat.wav
```

- text in and audio output
```shell
neuralchat voicechat --query "Tell me about Intel Xeon Scalable Processors." --audio_output_path ./response.wav
```


**Python API experience**

For the Python API, users can enable the different voice chat modes by setting `audio_input=True` for audio input and/or `audio_output=True` for audio output.

```python
>>> from neural_chat import PipelineConfig
>>> from neural_chat import build_chatbot
>>> config = PipelineConfig(audio_input=True, audio_output=True)
>>> chatbot = build_chatbot(config)
>>> result = chatbot.predict(query="./assets/audio/pat.wav")
```
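
For instance, to use the text-in and audio-out mode, only the audio output side needs to be enabled. The following is a minimal sketch based on the flags described above; where the synthesized audio is written may depend on the generation settings:

```python
>>> from neural_chat import PipelineConfig
>>> from neural_chat import build_chatbot
>>> # Only audio output is enabled, so the query is plain text and the reply is synthesized speech.
>>> config = PipelineConfig(audio_output=True)
>>> chatbot = build_chatbot(config)
>>> result = chatbot.predict(query="Tell me about Intel Xeon Scalable Processors.")
```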

We provide multiple plugins to augment the chatbot on top of LLM inference. Our plugins support [knowledge retrieval](./pipeline/plugins/retrievers/), [query caching](./pipeline/plugins/caching/), [prompt optimization](./pipeline/plugins/prompts/), [safety checker](./pipeline/plugins/security/), etc. Knowledge retrieval consists of document indexing for efficient retrieval of relevant information, including Dense Indexing based on LangChain and Sparse Indexing based on fastRAG, plus document rankers to prioritize the most relevant responses. Query caching provides a fast path that returns a response without LLM inference, improving chat response time. Prompt optimization supports auto prompt engineering to improve user prompts, instruction optimization to enhance the model's performance, and a memory controller for efficient memory utilization.
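
Plugins are enabled through `PipelineConfig`, as in the retrieval example above. The sketch below reuses the documented retrieval flags; the commented-out `cache_chat` and `safety_checker` arguments are hypothetical placeholders meant only to illustrate how other plugins would be switched on (the actual parameter names may differ):

```python
from neural_chat import PipelineConfig
from neural_chat import build_chatbot

# Retrieval flags are taken from the "Text Chat With Retrieval" example above.
# The commented-out arguments are illustrative placeholders, not confirmed API.
config = PipelineConfig(
    retrieval=True,
    retrieval_document_path="./assets/docs/",
    # cache_chat=True,       # hypothetical: enable query caching
    # safety_checker=True,   # hypothetical: enable the sensitive-content checker
)
chatbot = build_chatbot(config)
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
```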


## Finetuning

Fine-tuning the pretrained large language model (LLM) on an instruction-following dataset to create a customized chatbot is very easy with NeuralChat.

**command line experience**

```shell
neuralchat finetune --base_model "meta-llama/Llama-2-7b-chat-hf" --config pipeline/finetuning/config/finetuning.yaml
```


**Python API experience**

```python
>>> from neural_chat import FinetuningConfig
>>> from neural_chat import finetune_model
>>> finetune_cfg = FinetuningConfig()
>>> finetuned_model = finetune_model(finetune_cfg)
```

## Quantization

NeuralChat provides three quantization approaches (PostTrainingDynamic, PostTrainingStatic, and QuantAwareTraining) based on [Intel® Neural Compressor](https://github.com/intel/neural-compressor).

**command line experience**

```shell
neuralchat optimize --base_model "meta-llama/Llama-2-7b-chat-hf" --config pipeline/optimization/config/optimization.yaml
```


**Python API experience**

```python
>>> from neural_chat import OptimizationConfig
>>> from neural_chat import optimize_model
>>> opt_cfg = OptimizationConfig()
>>> optimized_model = optimize_model(opt_cfg)
```


<a name="quickstartserver"></a>
# Quick Start Server

Users can try the NeuralChat server with the [NeuralChat Server Command Line](./server/README.md).


**Start Server**
- Command Line (Recommended)
```shell
neuralchat_server start --config_file ./server/config/neuralchat.yaml
```

- Python API
```python
from neural_chat import NeuralChatServerExecutor
server_executor = NeuralChatServerExecutor()
server_executor(config_file="./server/config/neuralchat.yaml", log_file="./log/neuralchat.log")
```

**Access Text Chat Service**

- Command Line
```shell
neuralchat_client textchat --server_ip 127.0.0.1 --port 8000 --query "Tell me about Intel Xeon Scalable Processors."
```

- Python API
```python
from neural_chat import TextChatClientExecutor
executor = TextChatClientExecutor()
result = executor(
prompt="Tell me about Intel Xeon Scalable Processors.",
server_ip="127.0.0.1",
port=8000)
print(result.text)
```

- Curl with Restful API
```shell
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "Tell me about Intel Xeon Scalable Processors."}' http://127.0.0.1:8000/v1/chat/completions
```
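
The same RESTful endpoint can also be called from Python. The sketch below simply mirrors the curl request above using the `requests` library:

```python
import requests

# Mirror of the curl request above: POST a JSON prompt to the text chat endpoint.
response = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json={"prompt": "Tell me about Intel Xeon Scalable Processors."},  # sent as application/json
)
print(response.text)
```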

**Access Voice Chat Service**

```shell
neuralchat_client voicechat --server_ip 127.0.0.1 --port 8000 --audio_input_path ./assets/audio/pat.wav --audio_output_path response.wav
```

**Access Finetune Service**
```shell
neuralchat_client finetune --server_ip 127.0.0.1 --port 8000 --model_name_or_path "facebook/opt-125m" --train_file "/path/to/finetune/dataset.json"
```

27 changes: 27 additions & 0 deletions neural_chat/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .config import PipelineConfig
from .config import GenerationConfig
from .config import FinetuningConfig
from .config import OptimizationConfig
from .chatbot import build_chatbot
from .chatbot import finetune_model
from .chatbot import optimize_model
from .server.neuralchat_server import NeuralChatServerExecutor
from .server.neuralchat_client import TextChatClientExecutor, VoiceChatClientExecutor, FinetuingClientExecutor

Binary file added neural_chat/assets/audio/pat.wav
Binary file added neural_chat/assets/audio/welcome.wav
