[NeuralChat] add example to use RAG+OpenAI LLM (#1347)
Spycsh committed Mar 7, 2024
1 parent afccd2d commit 3c59590
Showing 5 changed files with 234 additions and 0 deletions.
@@ -0,0 +1,89 @@
This example guides you through setting up the backend for a text chatbot using the NeuralChat framework and an OpenAI LLM such as `gpt-3.5-turbo` or `gpt-4`. It also shows how to feed your own corpus into RAG (Retrieval-Augmented Generation) with the NeuralChat retrieval plugin.

# Setup Conda

First, you need to install and configure the Conda environment:

```shell
# Download and install Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda*.sh
source ~/.bashrc
```
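With Miniconda installed, you may want to create and activate a dedicated environment for this example. This is optional; the environment name `neuralchat` and the Python version below are our choices, not requirements stated by the example:

```shell
# Create and activate an isolated Conda environment (name and Python version are illustrative)
conda create -n neuralchat python=3.9 -y
conda activate neuralchat
```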

# Install numactl

Next, install the numactl library:

```shell
sudo apt install numactl
```
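To verify the installation and inspect your machine's NUMA topology (which informs the core binding used by `run.sh` later), you can run:

```shell
# Print NUMA nodes, the CPUs in each node, and per-node memory
numactl --hardware
```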

# Install Python dependencies

Install the following Python dependencies using Conda:

```shell
conda install astunparse ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses -y
conda install jemalloc gperftools -c conda-forge -y
```

Install the remaining dependencies using pip:

>**Note**: Please make sure the transformers version is 4.34.1.
```bash
pip install -r ../../../requirements.txt
pip install transformers==4.34.1
```
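A quick check that the pinned transformers version is active:

```shell
# Should print 4.34.1
python -c "import transformers; print(transformers.__version__)"
```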

# Configure YAML

You can customize the configuration file `textbot.yaml` (the file loaded by `run_chatgpt_rag.py`) to match your environment setup. Here's a table to help you understand the configurable options:

| Item                              | Value                                   |
| --------------------------------- | --------------------------------------- |
| host                              | 0.0.0.0                                 |
| port                              | 8021                                    |
| model_name_or_path                | "gpt-3.5-turbo"                         |
| device                            | "auto"                                  |
| retrieval.enable                  | true                                    |
| retrieval.args.input_path         | "./docs"                                |
| retrieval.args.persist_directory  | "./docs_persist"                        |
| retrieval.args.response_template  | "We cannot find suitable content to answer your query at this moment." |
| retrieval.args.append             | True                                    |
| tasks_list                        | ['textchat', 'retrieval']               |
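The retrieval plugin indexes the corpus found under `retrieval.args.input_path` (`./docs` in this configuration), so create that directory and place your documents in it before starting the server. The filename below is illustrative:

```shell
# Create the corpus directory and add a document for the retrieval plugin to index
mkdir -p ./docs
cp /path/to/your/oneapi_compiler_notes.txt ./docs/
```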


# Configure OpenAI keys

Set the `OPENAI_API_KEY` environment variable (and `OPENAI_ORG`, if applicable) to use OpenAI models.

```shell
export OPENAI_API_KEY=xxx
export OPENAI_ORG=xxx
```
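Before starting the server, you can sanity-check the key with a direct call to the OpenAI API; a valid key returns a JSON list of models:

```shell
curl https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY"
```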

# Run the TextChat server

To start the TextChat server, use the following command:

```shell
nohup bash run.sh &
```
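`run.sh` tees the server output to `run.log`, so you can follow startup progress (including the build of the retrieval index) with:

```shell
tail -f run.log
```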

# Test the TextChat server

Once the server is up, send a request to the chat completions endpoint (the port matches the `port` value in `textbot.yaml`):

```shell
curl http://localhost:8021/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gpt-3.5-turbo",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "What are the key features of the Intel® oneAPI DPC++/C++ Compiler?"
            }
        ],
        "max_tokens": 20
    }'
```
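Assuming the server returns an OpenAI-compatible completion object, you can extract just the answer text with `jq`:

```shell
curl -s http://localhost:8021/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Which toolkits include the Intel oneAPI DPC++/C++ Compiler?"}]}' \
    | jq -r '.choices[0].message.content'
```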
@@ -0,0 +1,13 @@
This guide provides information about the Intel® oneAPI DPC++/C++ Compiler and runtime environment. This document is valid for version 2024.0 of the compilers.

The Intel® oneAPI DPC++/C++ Compiler is available as part of the Intel® oneAPI Base Toolkit, Intel® oneAPI HPC Toolkit, Intel® oneAPI IoT Toolkit, or as a standalone compiler.

Refer to the Intel® oneAPI DPC++/C++ Compiler product page and the Release Notes for more information about features, specifications, and downloads.


The compiler supports these key features:
Intel® oneAPI Level Zero: The Intel® oneAPI Level Zero (Level Zero) Application Programming Interface (API) provides direct-to-metal interfaces to offload accelerator devices.
OpenMP* Support: Compiler support for OpenMP 5.0 Version TR4 features and some OpenMP Version 5.1 features.
Pragmas: Information about directives to provide the compiler with instructions for specific tasks, including splitting large loops into smaller ones, enabling or disabling optimization for code, or offloading computation to the target.
Offload Support: Information about SYCL*, OpenMP, and parallel processing options you can use to affect optimization, code generation, and more.
Latest Standards: Use the latest standards including C++ 20, SYCL, and OpenMP 5.0 and 5.1 for GPU offload.
@@ -0,0 +1,33 @@
#!/usr/bin/env bash
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Kill any existing server process before re-running
ps -ef | grep 'run_chatgpt_rag' | grep -v grep | awk '{print $2}' | xargs -r kill -9

# Intel OpenMP (KMP) runtime tuning
export KMP_BLOCKTIME=1
export KMP_SETTINGS=1
export KMP_AFFINITY=granularity=fine,compact,1,0

# OpenMP thread count; adjust 56 here (and the core list below) to your machine's physical core count
export OMP_NUM_THREADS=56
export LD_PRELOAD=${CONDA_PREFIX}/lib/libiomp5.so

# Preload tcmalloc for faster memory allocation
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so

# Bind to local memory (-l) and cores 0-55, then launch the server and tee output to run.log
numactl -l -C 0-55 python -m run_chatgpt_rag 2>&1 | tee run.log
@@ -0,0 +1,26 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


from intel_extension_for_transformers.neural_chat import NeuralChatServerExecutor

def main():
    server_executor = NeuralChatServerExecutor()
    server_executor(config_file="./textbot.yaml", log_file="./textbot.log")

if __name__ == "__main__":
    main()
@@ -0,0 +1,73 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This is the parameter configuration file for NeuralChat Serving.

#################################################################################
# SERVER SETTING #
#################################################################################
host: 0.0.0.0
port: 8021

model_name_or_path: "gpt-3.5-turbo"
device: "auto"

# Users can choose one of the following optimizations (ipex int8, itrex int4,
# mixed precision, or bitsandbytes) to run an optimized model for inference speedup.

# itrex int4 llm runtime optimization
# optimization:
#     use_neural_speed: true
#     optimization_type: "weight_only"
#     compute_dtype: "int8"
#     weight_dtype: "int4"
#     use_cached_bin: true

# ipex int8 optimization
# optimization:
#     ipex_int8: True

# itrex int4 optimization
# optimization:
#     use_neural_speed: false
#     optimization_type: "weight_only"
#     compute_dtype: "int8"
#     weight_dtype: "int4_fullrange"

# mix precision bf16
# optimization:
#     optimization_type: "mix_precision"
#     mix_precision_dtype: "bfloat16"

# bits and bytes
# optimization:
#     optimization_type: "bits_and_bytes"
#     load_in_4bit: True
#     bnb_4bit_quant_type: 'nf4'
#     bnb_4bit_use_double_quant: True
#     bnb_4bit_compute_dtype: "bfloat16"

retrieval:
    enable: true
    args:
        input_path: "./docs"
        persist_directory: "./docs_persist"
        response_template: "We cannot find suitable content to answer your query at this moment."
        append: True

# task choices = ['textchat', 'voicechat', 'retrieval', 'text2image', 'finetune']
tasks_list: ['textchat', 'retrieval']
