[NeuralChat] add example to use RAG+OpenAI LLM (#1347)
Spycsh committed Mar 7, 2024
1 parent afccd2d commit 3c59590
Showing 5 changed files with 234 additions and 0 deletions.
@@ -0,0 +1,89 @@
This example guides you through setting up the backend for a text chatbot using the NeuralChat framework and an OpenAI LLM such as `gpt-3.5-turbo` or `gpt-4`. It also shows how to feed your own corpus into RAG (Retrieval-Augmented Generation) with the NeuralChat retrieval plugin.

# Setup Conda

First, you need to install and configure the Conda environment:

```shell
# Download and install Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda*.sh
source ~/.bashrc
```
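With Miniconda installed, you may want to create and activate a dedicated environment for this example. This is optional; the environment name `neuralchat` and the Python version below are our choices, not requirements stated by the example:

```shell
# Create and activate an isolated Conda environment (name and Python version are illustrative)
conda create -n neuralchat python=3.9 -y
conda activate neuralchat
```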

# Install numactl

Next, install the numactl library:

```shell
sudo apt install numactl
```
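To verify the installation and inspect your machine's NUMA topology (which informs the core binding used by `run.sh` later), you can run:

```shell
# Print NUMA nodes, the CPUs in each node, and per-node memory
numactl --hardware
```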

# Install Python dependencies

Install the following Python dependencies using Conda:

```shell
conda install astunparse ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses -y
conda install jemalloc gperftools -c conda-forge -y
```

Install the remaining dependencies using pip:

>**Note**: Please make sure the transformers version is 4.34.1.
```bash
pip install -r ../../../requirements.txt
pip install transformers==4.34.1
```
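A quick check that the pinned transformers version is active:

```shell
# Should print 4.34.1
python -c "import transformers; print(transformers.__version__)"
```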

# Configure YAML

You can customize the configuration file `textbot.yaml` (the file loaded by `run_chatgpt_rag.py`) to match your environment setup. Here's a table to help you understand the configurable options:

| Item                              | Value                                   |
| --------------------------------- | --------------------------------------- |
| host                              | 0.0.0.0                                 |
| port                              | 8021                                    |
| model_name_or_path                | "gpt-3.5-turbo"                         |
| device                            | "auto"                                  |
| retrieval.enable                  | true                                    |
| retrieval.args.input_path         | "./docs"                                |
| retrieval.args.persist_directory  | "./docs_persist"                        |
| retrieval.args.response_template  | "We cannot find suitable content to answer your query at this moment." |
| retrieval.args.append             | True                                    |
| tasks_list                        | ['textchat', 'retrieval']               |
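The retrieval plugin indexes the corpus found under `retrieval.args.input_path` (`./docs` in this configuration), so create that directory and place your documents in it before starting the server. The filename below is illustrative:

```shell
# Create the corpus directory and add a document for the retrieval plugin to index
mkdir -p ./docs
cp /path/to/your/oneapi_compiler_notes.txt ./docs/
```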


# Configure OpenAI keys

Set the `OPENAI_API_KEY` environment variable (and `OPENAI_ORG`, if applicable) to use OpenAI models.

```shell
export OPENAI_API_KEY=xxx
export OPENAI_ORG=xxx
```
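Before starting the server, you can sanity-check the key with a direct call to the OpenAI API; a valid key returns a JSON list of models:

```shell
curl https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY"
```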

# Run the TextChat server

To start the TextChat server, use the following command:

```shell
nohup bash run.sh &
```
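`run.sh` tees the server output to `run.log`, so you can follow startup progress (including the build of the retrieval index) with:

```shell
tail -f run.log
```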

# Test the TextChat server

Once the server is up, send a request to the chat completions endpoint (the port matches the `port` value in `textbot.yaml`):

```shell
curl http://localhost:8021/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gpt-3.5-turbo",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "What are the key features of the Intel® oneAPI DPC++/C++ Compiler?"
            }
        ],
        "max_tokens": 20
    }'
```
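Assuming the server returns an OpenAI-compatible completion object, you can extract just the answer text with `jq`:

```shell
curl -s http://localhost:8021/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Which toolkits include the Intel oneAPI DPC++/C++ Compiler?"}]}' \
    | jq -r '.choices[0].message.content'
```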
@@ -0,0 +1,13 @@
This guide provides information about the Intel® oneAPI DPC++/C++ Compiler and runtime environment. This document is valid for version 2024.0 of the compilers.

The Intel® oneAPI DPC++/C++ Compiler is available as part of the Intel® oneAPI Base Toolkit, Intel® oneAPI HPC Toolkit, Intel® oneAPI IoT Toolkit, or as a standalone compiler.

Refer to the Intel® oneAPI DPC++/C++ Compiler product page and the Release Notes for more information about features, specifications, and downloads.


The compiler supports these key features:
Intel® oneAPI Level Zero: The Intel® oneAPI Level Zero (Level Zero) Application Programming Interface (API) provides direct-to-metal interfaces to offload accelerator devices.
OpenMP* Support: Compiler support for OpenMP 5.0 Version TR4 features and some OpenMP Version 5.1 features.
Pragmas: Information about directives to provide the compiler with instructions for specific tasks, including splitting large loops into smaller ones, enabling or disabling optimization for code, or offloading computation to the target.
Offload Support: Information about SYCL*, OpenMP, and parallel processing options you can use to affect optimization, code generation, and more.
Latest Standards: Use the latest standards including C++ 20, SYCL, and OpenMP 5.0 and 5.1 for GPU offload.
@@ -0,0 +1,33 @@
#!/usr/bin/env bash
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Kill any existing server process before re-running
ps -ef | grep 'run_chatgpt_rag' | grep -v grep | awk '{print $2}' | xargs -r kill -9

# Intel OpenMP (KMP) runtime tuning
export KMP_BLOCKTIME=1
export KMP_SETTINGS=1
export KMP_AFFINITY=granularity=fine,compact,1,0

# OpenMP thread count; adjust 56 here (and the core list below) to your machine's physical core count
export OMP_NUM_THREADS=56
export LD_PRELOAD=${CONDA_PREFIX}/lib/libiomp5.so

# Preload tcmalloc for faster memory allocation
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so

# Bind to local memory (-l) and cores 0-55, then launch the server and tee output to run.log
numactl -l -C 0-55 python -m run_chatgpt_rag 2>&1 | tee run.log
@@ -0,0 +1,26 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


from intel_extension_for_transformers.neural_chat import NeuralChatServerExecutor

def main():
    server_executor = NeuralChatServerExecutor()
    server_executor(config_file="./textbot.yaml", log_file="./textbot.log")

if __name__ == "__main__":
    main()
@@ -0,0 +1,73 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This is the parameter configuration file for NeuralChat Serving.

#################################################################################
# SERVER SETTING #
#################################################################################
host: 0.0.0.0
port: 8021

model_name_or_path: "gpt-3.5-turbo"
device: "auto"

# Users can choose one of the following optimizations (ipex int8, itrex int4,
# mixed precision, or bitsandbytes) to run an optimized model for inference speedup.

# itrex int4 llm runtime optimization
# optimization:
#     use_neural_speed: true
#     optimization_type: "weight_only"
#     compute_dtype: "int8"
#     weight_dtype: "int4"
#     use_cached_bin: true

# ipex int8 optimization
# optimization:
#     ipex_int8: True

# itrex int4 optimization
# optimization:
#     use_neural_speed: false
#     optimization_type: "weight_only"
#     compute_dtype: "int8"
#     weight_dtype: "int4_fullrange"

# mix precision bf16
# optimization:
#     optimization_type: "mix_precision"
#     mix_precision_dtype: "bfloat16"

# bits and bytes
# optimization:
#     optimization_type: "bits_and_bytes"
#     load_in_4bit: True
#     bnb_4bit_quant_type: 'nf4'
#     bnb_4bit_use_double_quant: True
#     bnb_4bit_compute_dtype: "bfloat16"

retrieval:
    enable: true
    args:
        input_path: "./docs"
        persist_directory: "./docs_persist"
        response_template: "We cannot find suitable content to answer your query at this moment."
        append: True

# task choices = ['textchat', 'voicechat', 'retrieval', 'text2image', 'finetune']
tasks_list: ['textchat', 'retrieval']
