Commit

[Neuralchat] Support RAG with HF endpoint & add notebook (#1334)
* support rag with hf endpoint & add notebook

Signed-off-by: LetongHan <letong.han@intel.com>
letonghan committed Mar 3, 2024
1 parent a6c05b9 commit 34b3e9b
Showing 5 changed files with 409 additions and 2 deletions.
@@ -12,6 +12,7 @@ Welcome to use Jupyter Notebooks to explore how to build and customize chatbots
| 1.4 | Building Chatbot on Habana Gaudi1/Gaudi2 | Learn how to create chatbot on Habana Gaudi1/Gaudi2 | [Notebook](./notebooks/build_chatbot_on_habana_gaudi.ipynb) |
| 1.5 | Building Chatbot on Nvidia A100 | Learn how to create chatbot on Nvidia A100 | [Notebook](./notebooks/build_chatbot_on_nv_a100.ipynb) |
| 1.6 | Building Chatbot on Intel CPU Windows PC | Learn how to create chatbot on Windows PC | [Notebook](./notebooks/build_talkingbot_on_pc.ipynb) |
| 1.7 | Building Chatbot with RAG | Learn how to create chatbot with RAG | [Notebook](./notebooks/build_chatbot_with_rag.ipynb) |
| 2 | Deploying Chatbots | | |
| 2.1 | Deploying Chatbot on Intel CPU ICX | Learn how to deploy chatbot on ICX | [Notebook](./notebooks/deploy_chatbot_on_icx.ipynb) |
| 2.2 | Deploying Chatbot on Intel CPU SPR | Learn how to deploy chatbot on SPR | [Notebook](./notebooks/deploy_chatbot_on_spr.ipynb) |
@@ -0,0 +1,341 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"NeuralChat is a customizable chat framework designed to create user own chatbot within few minutes on multiple architectures. This notebook is used to demonstrate how to build a RAG application on 4th Generation of Intel® Xeon® Scalable Processors Sapphire Rapids and Habana's Gaudi processors(HPU)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Prepare Environment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Install intel extension for transformers:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install intel-extension-for-transformers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Install Requirements:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!git clone https://github.com/intel/intel-extension-for-transformers.git"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/\n",
"!pip install -r requirements.txt\n",
"%cd ../../../"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!conda list"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Consume RAG via NeuralChat Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Consume RAG with Python API"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"User could leverage NeuralChat Retrieval plugin to do domain specific chat by feding with some documents like below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/pipeline/plugins/retrieval/\n",
"!pip install -r requirements.txt\n",
"%cd ../../../../../../"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!mkdir docs\n",
"%cd docs\n",
"!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/docs/sample.jsonl\n",
"!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/docs/sample.txt\n",
"!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/docs/sample.xlsx\n",
"%cd .."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from intel_extension_for_transformers.neural_chat import PipelineConfig\n",
"from intel_extension_for_transformers.neural_chat import build_chatbot\n",
"from intel_extension_for_transformers.neural_chat import plugins\n",
"plugins.retrieval.enable=True\n",
"plugins.retrieval.args[\"input_path\"]=\"./docs/\"\n",
"config = PipelineConfig(plugins=plugins)\n",
"chatbot = build_chatbot(config)\n",
"response = chatbot.predict(\"How many cores does the Intel® Xeon® Platinum 8480+ Processor have in total?\")\n",
"print(response)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Consume RAG with HTTP Restfup API"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"User should start `neuralchat_server` to consume HTTP Restful APIs with the command below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "powershell"
}
},
"outputs": [],
"source": [
"cp -r ./docs /intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/examples/deployment/rag/docs\n",
"cd /intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/examples/deployment/rag\n",
"python askdock.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Neuralchat support HTTP Restful API with openai-protocol. Users can consume RAG HTTP Restful API with cURL command like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "powershell"
}
},
"outputs": [],
"source": [
"curl -X POST localhost:8000/v1/chat/completions -H 'Content-Type: application/json' \\\n",
" -d {\"model\": \"Intel/neural-chat-7b-v3-1\", \\\n",
" \"messages\": \"How many cores does the Intel® Xeon® Platinum 8480+ Processor have in total?\",}"
]
},
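{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same endpoint can also be called from Python. Below is a minimal sketch using the `requests` library, assuming the server started above is listening on `localhost:8000` and follows the OpenAI chat-completions schema:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: query the RAG endpoint over HTTP.\n",
"# Assumes the server started above is listening on localhost:8000.\n",
"import requests\n",
"\n",
"payload = {\n",
"    \"model\": \"Intel/neural-chat-7b-v3-1\",\n",
"    \"messages\": [\n",
"        {\"role\": \"user\", \"content\": \"How many cores does the Intel® Xeon® Platinum 8480+ Processor have in total?\"}\n",
"    ],\n",
"}\n",
"resp = requests.post(\"http://localhost:8000/v1/chat/completions\", json=payload, timeout=120)\n",
"resp.raise_for_status()\n",
"print(resp.json())"
]
},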
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Consume RAG with TGI service"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this part, we support two scenarios: run services on SPR / on Habana Gaudi. Before consuming RAG with TGI service, user need to launch a TGI service locally/remotely like below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Launch TGI service on SPR"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "powershell"
}
},
"outputs": [],
"source": [
"model=Intel/neural-chat-7b-v3-1\n",
"volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run\n",
" \n",
"docker run --shm-size 1g -p 8080:80 -v $volume:/data -e https_proxy -e http_proxy -e HTTPS_PROXY -e HTTP_PROXY -e no_proxy -e NO_PROXY ghcr.io/huggingface/text-generation-inference:1.4 --model-id $model"
]
},
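{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the container is up, you can sanity-check the endpoint before wiring it into NeuralChat. Below is a minimal Python sketch, assuming the container above is serving on `localhost:8080` (TGI exposes a `POST /generate` route); the same check applies to the Gaudi launch described next:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sanity check of the TGI endpoint.\n",
"# Assumes the TGI container above is serving on localhost:8080.\n",
"import requests\n",
"\n",
"payload = {\"inputs\": \"What is Sapphire Rapids?\", \"parameters\": {\"max_new_tokens\": 32}}\n",
"resp = requests.post(\"http://localhost:8080/generate\", json=payload, timeout=120)\n",
"resp.raise_for_status()\n",
"print(resp.json())"
]
},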
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Launch TGI service on Habana Gaudi"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For Habana Gaudi, you need to build a TGI docker image on your server first. Then start TGI service using this gaudi-docker."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "powershell"
}
},
"outputs": [],
"source": [
"git clone https://github.com/huggingface/tgi-gaudi.git\n",
"cd tgi-gaudi\n",
"docker build -t tgi_gaudi ."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "powershell"
}
},
"outputs": [],
"source": [
"model=Intel/neural-chat-7b-v3-1\n",
"volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run\n",
"tgi_habana_visible_devices=${your_visible_habana_devices} # define your visible habana devices for TGI service, such as `all`, `0,1`\n",
"tgi_sharded=true # boolean, whether to do sharding on more than one card\n",
"tgi_num_shard=2 # integer, between 1 and the number of your physical gaudi cards(usually 8)\n",
"\n",
"docker run -p 8080:80 --name tgi_service_gaudi -v $volume:/data -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true --runtime=habana -e HABANA_VISIBLE_DEVICES=$tgi_habana_visible_devices -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model --sharded $tgi_sharded --num-shard $tgi_num_shard"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Consume RAG service"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When TGI service is ready, you could leverage the endpoint of TGI to construct a Neuralchat chatbot. Please follow this link [huggingface token](https://huggingface.co/docs/hub/security-tokens) to get the access token and export your Huggingface API token."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "powershell"
}
},
"outputs": [],
"source": [
"export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from intel_extension_for_transformers.neural_chat import PipelineConfig\n",
"from intel_extension_for_transformers.neural_chat import build_chatbot\n",
"from intel_extension_for_transformers.neural_chat import plugins\n",
"\n",
"plugins.retrieval.enable=True\n",
"plugins.retrieval.args[\"input_path\"]=\"./docs/\"\n",
"config = PipelineConfig(\n",
" model_name_or_path=\"Intel/neural-chat-7b-v3-1\", \n",
" plugins=plugins, \n",
" hf_endpoint_url=\"http://localhost:8080/\", \n",
" hf_access_token=os.getenv(\"HUGGINGFACEHUB_API_TOKEN\", \"\")\n",
")\n",
"chatbot = build_chatbot(config)\n",
"response = chatbot.predict(\"How many cores does the Intel® Xeon® Platinum 8480+ Processor have in total?\")\n",
"print(response)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
