[Neuralchat] Support RAG with HF endpoint & add notebook (#1334)
* support rag with hf endpoint & add notebook

Signed-off-by: LetongHan <letong.han@intel.com>
Showing 5 changed files with 409 additions and 2 deletions.
intel_extension_for_transformers/neural_chat/docs/notebooks/build_chatbot_with_rag.ipynb (341 additions, 0 deletions)
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"NeuralChat is a customizable chat framework designed to create your own chatbot within a few minutes on multiple architectures. This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) application on 4th Generation Intel® Xeon® Scalable Processors (Sapphire Rapids) and Habana Gaudi processors (HPU)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Prepare Environment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Install Intel® Extension for Transformers:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install intel-extension-for-transformers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Install the requirements:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!git clone https://github.com/intel/intel-extension-for-transformers.git"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/\n",
"!pip install -r requirements.txt\n",
"%cd ../../../"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!conda list"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Consume RAG via NeuralChat Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Consume RAG with Python API"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Users can leverage the NeuralChat Retrieval plugin for domain-specific chat by feeding it documents, as shown below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/pipeline/plugins/retrieval/\n",
"!pip install -r requirements.txt\n",
"%cd ../../../../../../"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!mkdir docs\n",
"%cd docs\n",
"!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/docs/sample.jsonl\n",
"!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/docs/sample.txt\n",
"!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/docs/sample.xlsx\n",
"%cd .."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from intel_extension_for_transformers.neural_chat import PipelineConfig\n",
"from intel_extension_for_transformers.neural_chat import build_chatbot\n",
"from intel_extension_for_transformers.neural_chat import plugins\n",
"plugins.retrieval.enable=True\n",
"plugins.retrieval.args[\"input_path\"]=\"./docs/\"\n",
"config = PipelineConfig(plugins=plugins)\n",
"chatbot = build_chatbot(config)\n",
"response = chatbot.predict(\"How many cores does the Intel® Xeon® Platinum 8480+ Processor have in total?\")\n",
"print(response)"
]
},
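{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what the Retrieval plugin contributes, you can optionally build a second chatbot with retrieval disabled and ask the same question. This is a minimal, optional sketch using the same `plugins`/`PipelineConfig` API as above; it assumes toggling `plugins.retrieval.enable` is sufficient to switch retrieval off, and it is not required for the rest of the notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: the same question without the Retrieval plugin,\n",
"# to compare against the RAG-grounded answer above.\n",
"plugins.retrieval.enable=False  # turn retrieval off for this comparison\n",
"plain_chatbot = build_chatbot(PipelineConfig(plugins=plugins))\n",
"plain_response = plain_chatbot.predict(\"How many cores does the Intel® Xeon® Platinum 8480+ Processor have in total?\")\n",
"print(plain_response)  # typically less grounded than the RAG-enabled answer\n",
"plugins.retrieval.enable=True  # re-enable retrieval for the rest of the notebook"
]
},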
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Consume RAG with HTTP Restful API"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Users should start the `neuralchat_server` with the commands below to expose the HTTP Restful APIs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "powershell"
}
},
"outputs": [],
"source": [
"cp -r ./docs ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/examples/deployment/rag/docs\n",
"cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/examples/deployment/rag\n",
"python askdock.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"NeuralChat supports an OpenAI-protocol-compatible HTTP Restful API. Users can consume the RAG Restful API with a cURL command like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "powershell"
}
},
"outputs": [],
"source": [
"curl -X POST localhost:8000/v1/chat/completions \\\n",
"  -H 'Content-Type: application/json' \\\n",
"  -d '{\"model\": \"Intel/neural-chat-7b-v3-1\", \"messages\": \"How many cores does the Intel® Xeon® Platinum 8480+ Processor have in total?\"}'"
]
},
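{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the endpoint follows the OpenAI protocol, it can also be consumed from Python. Below is a minimal sketch using the `requests` library, assuming the server started above is listening on localhost:8000 and accepts the same payload as the cURL call:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: call the RAG endpoint from Python via the OpenAI-style route.\n",
"# Assumes the neuralchat server above is running on localhost:8000.\n",
"import requests\n",
"\n",
"payload = {\n",
"    \"model\": \"Intel/neural-chat-7b-v3-1\",\n",
"    \"messages\": \"How many cores does the Intel® Xeon® Platinum 8480+ Processor have in total?\",\n",
"}\n",
"resp = requests.post(\"http://localhost:8000/v1/chat/completions\", json=payload, timeout=120)\n",
"resp.raise_for_status()\n",
"print(resp.json())"
]
},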
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Consume RAG with TGI service"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This part covers two scenarios: running the service on SPR and on Habana Gaudi. Before consuming RAG with a TGI service, users need to launch a TGI service locally or remotely as below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Launch TGI service on SPR"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "powershell"
}
},
"outputs": [],
"source": [
"model=Intel/neural-chat-7b-v3-1\n",
"volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run\n",
"\n",
"docker run --shm-size 1g -p 8080:80 -v $volume:/data -e https_proxy -e http_proxy -e HTTPS_PROXY -e HTTP_PROXY -e no_proxy -e NO_PROXY ghcr.io/huggingface/text-generation-inference:1.4 --model-id $model"
]
},
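{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the container is up, you can optionally smoke-test the TGI endpoint before wiring it into NeuralChat. Below is a minimal sketch against TGI's `/generate` route, assuming the service is published on localhost:8080 as in the command above; the same check works for the Gaudi service in the next section."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: verify the TGI service responds before building the chatbot.\n",
"# Assumes the TGI container above is published on localhost:8080.\n",
"import requests\n",
"\n",
"resp = requests.post(\n",
"    \"http://localhost:8080/generate\",\n",
"    json={\"inputs\": \"What is RAG?\", \"parameters\": {\"max_new_tokens\": 32}},\n",
"    timeout=120,\n",
")\n",
"resp.raise_for_status()\n",
"print(resp.json())  # e.g. {\"generated_text\": \"...\"}"
]
},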
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Launch TGI service on Habana Gaudi"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For Habana Gaudi, you need to build the TGI Docker image on your server first, then start the TGI service using this Gaudi Docker image."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "powershell"
}
},
"outputs": [],
"source": [
"git clone https://github.com/huggingface/tgi-gaudi.git\n",
"cd tgi-gaudi\n",
"docker build -t tgi_gaudi ."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "powershell"
}
},
"outputs": [],
"source": [
"model=Intel/neural-chat-7b-v3-1\n",
"volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run\n",
"tgi_habana_visible_devices=${your_visible_habana_devices} # define the Habana devices visible to the TGI service, such as `all` or `0,1`\n",
"tgi_sharded=true # boolean, whether to shard across more than one card\n",
"tgi_num_shard=2 # integer, between 1 and the number of your physical Gaudi cards (usually 8)\n",
"\n",
"docker run -p 8080:80 --name tgi_service_gaudi -v $volume:/data -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true --runtime=habana -e HABANA_VISIBLE_DEVICES=$tgi_habana_visible_devices -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model --sharded $tgi_sharded --num-shard $tgi_num_shard"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Consume RAG service"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When the TGI service is ready, you can leverage its endpoint to construct a NeuralChat chatbot. Please follow [Hugging Face token](https://huggingface.co/docs/hub/security-tokens) to get an access token, then export it as your Hugging Face API token."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "powershell"
}
},
"outputs": [],
"source": [
"export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}"
]
},
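{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that an `export` in a shell cell does not necessarily propagate to the Jupyter Python kernel, so the `os.getenv` call in the next cell may come back empty. Below is a minimal sketch that sets the token directly from Python instead; the `your_hf_api_token` string is a placeholder for your own token."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: make sure the token is visible to the Python kernel,\n",
"# since an `export` in a separate shell cell may not propagate here.\n",
"import os\n",
"\n",
"if not os.getenv(\"HUGGINGFACEHUB_API_TOKEN\"):\n",
"    os.environ[\"HUGGINGFACEHUB_API_TOKEN\"] = \"your_hf_api_token\"  # placeholder, fill in your token\n",
"print(\"token set:\", bool(os.environ.get(\"HUGGINGFACEHUB_API_TOKEN\")))"
]
},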
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from intel_extension_for_transformers.neural_chat import PipelineConfig\n",
"from intel_extension_for_transformers.neural_chat import build_chatbot\n",
"from intel_extension_for_transformers.neural_chat import plugins\n",
"\n",
"plugins.retrieval.enable=True\n",
"plugins.retrieval.args[\"input_path\"]=\"./docs/\"\n",
"config = PipelineConfig(\n",
"    model_name_or_path=\"Intel/neural-chat-7b-v3-1\",\n",
"    plugins=plugins,\n",
"    hf_endpoint_url=\"http://localhost:8080/\",\n",
"    hf_access_token=os.getenv(\"HUGGINGFACEHUB_API_TOKEN\", \"\")\n",
")\n",
"chatbot = build_chatbot(config)\n",
"response = chatbot.predict(\"How many cores does the Intel® Xeon® Platinum 8480+ Processor have in total?\")\n",
"print(response)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}