Commit 56f770c

sgurunat and chensuyue authored
ChatQnA with Remote Inference Endpoints (Kubernetes) (#1149)
Signed-off-by: sgurunat <gurunath.s@intel.com> Co-authored-by: chen, suyue <suyue.chen@intel.com>
1 parent 0cdeb94 commit 56f770c

File tree

3 files changed: +2437 −2 lines changed


ChatQnA/kubernetes/intel/README.md

Lines changed: 47 additions & 2 deletions
````diff
@@ -15,7 +15,7 @@
 ```
 cd GenAIExamples/ChatQnA/kubernetes/intel/cpu/xeon/manifest
 export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
-sed -i "s/insert-your-huggingface-token-here/${HUGGINGFACEHUB_API_TOKEN}/g" chatqna.yaml
+sed -i "s|insert-your-huggingface-token-here|${HUGGINGFACEHUB_API_TOKEN}|g" chatqna.yaml
 kubectl apply -f chatqna.yaml
 ```
````
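The delimiter change above (from `/` to `|` in the `sed` s-command) matters because values substituted later in this README, such as endpoint URLs, contain `/` characters that would terminate a `/`-delimited expression early. A minimal local sketch, where the file `demo.yaml` and the endpoint URL are hypothetical stand-ins:

```shell
# Hypothetical demo of why "|" is used as the sed delimiter: the replacement
# value contains "/" characters, which would break a "/"-delimited expression.
printf 'endpoint: insert-your-remote-inference-endpoint\n' > demo.yaml
vLLM_ENDPOINT="http://vllm.example.com:8000/v1"   # made-up endpoint, contains "/"
sed -i "s|insert-your-remote-inference-endpoint|${vLLM_ENDPOINT}|g" demo.yaml
cat demo.yaml   # endpoint: http://vllm.example.com:8000/v1
```

With `/` as the delimiter, the same substitution would fail with an "unknown option to `s'" error because the URL's slashes are parsed as delimiters.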

````diff
@@ -35,10 +35,55 @@ kubectl apply -f chatqna_bf16.yaml
 ```
 cd GenAIExamples/ChatQnA/kubernetes/intel/hpu/gaudi/manifest
 export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
-sed -i "s/insert-your-huggingface-token-here/${HUGGINGFACEHUB_API_TOKEN}/g" chatqna.yaml
+sed -i "s|insert-your-huggingface-token-here|${HUGGINGFACEHUB_API_TOKEN}|g" chatqna.yaml
 kubectl apply -f chatqna.yaml
 ```
 
+## Deploy on Xeon with Remote LLM Model
+
+```
+cd GenAIExamples/ChatQnA/kubernetes/intel/cpu/xeon/manifest
+export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
+export vLLM_ENDPOINT="Your Remote Inference Endpoint"
+sed -i "s|insert-your-huggingface-token-here|${HUGGINGFACEHUB_API_TOKEN}|g" chatqna-remote-inference.yaml
+sed -i "s|insert-your-remote-inference-endpoint|${vLLM_ENDPOINT}|g" chatqna-remote-inference.yaml
+```
+
+### Additional Steps for Remote Endpoints with Authentication (Skip This Step if There Is No Authentication)
+
+If your remote inference endpoint is protected with OAuth Client Credentials authentication, update CLIENTID, CLIENT_SECRET, and TOKEN_URL with the correct values in the "chatqna-llm-uservice-config" ConfigMap.
+
+### Deploy
+
+```
+kubectl apply -f chatqna-remote-inference.yaml
+```
+
+## Deploy on Gaudi with TEI, Rerank, and vLLM Models Running Remotely
+
+```
+cd GenAIExamples/ChatQnA/kubernetes/intel/hpu/gaudi/manifest
+export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
+export vLLM_ENDPOINT="Your Remote Inference Endpoint"
+export TEI_EMBEDDING_ENDPOINT="Your Remote TEI Embedding Endpoint"
+export TEI_RERANKING_ENDPOINT="Your Remote Reranking Endpoint"
+
+sed -i "s|insert-your-huggingface-token-here|${HUGGINGFACEHUB_API_TOKEN}|g" chatqna-vllm-remote-inference.yaml
+sed -i "s|insert-your-remote-vllm-inference-endpoint|${vLLM_ENDPOINT}|g" chatqna-vllm-remote-inference.yaml
+sed -i "s|insert-your-remote-embedding-endpoint|${TEI_EMBEDDING_ENDPOINT}|g" chatqna-vllm-remote-inference.yaml
+sed -i "s|insert-your-remote-reranking-endpoint|${TEI_RERANKING_ENDPOINT}|g" chatqna-vllm-remote-inference.yaml
+```
+
+### Additional Steps for Remote Endpoints with Authentication (Skip This Step if There Is No Authentication)
+
+If your remote inference endpoint is protected with OAuth Client Credentials authentication, update CLIENTID, CLIENT_SECRET, and TOKEN_URL with the correct values in the "chatqna-llm-uservice-config", "chatqna-data-prep-config", "chatqna-embedding-usvc-config", "chatqna-reranking-usvc-config", and "chatqna-retriever-usvc-config" ConfigMaps.
+
+### Deploy
+
+```
+kubectl apply -f chatqna-vllm-remote-inference.yaml
+```
+
 ## Verify Services
 
 To verify the installation, run the command `kubectl get pod` to make sure all pods are running.
````
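Because each remote-inference section runs several `sed` substitutions before `kubectl apply`, a missed step leaves an unresolved placeholder in the manifest. The check below is a local sketch of catching that before deploying; the file `check.yaml` and both values are hypothetical stand-ins for a real manifest and your real token and endpoint.

```shell
# Sketch: confirm no "insert-your-*" placeholder survives substitution before
# the manifest is applied. check.yaml and the values below are hypothetical.
cat > check.yaml <<'EOF'
hfToken: insert-your-huggingface-token-here
llmEndpoint: insert-your-remote-vllm-inference-endpoint
EOF
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
export vLLM_ENDPOINT="http://vllm.example.com:8000/v1"
sed -i "s|insert-your-huggingface-token-here|${HUGGINGFACEHUB_API_TOKEN}|g" check.yaml
sed -i "s|insert-your-remote-vllm-inference-endpoint|${vLLM_ENDPOINT}|g" check.yaml
# Any surviving placeholder means a substitution step was missed.
if grep -q "insert-your-" check.yaml; then
  echo "ERROR: unsubstituted placeholders remain" >&2
fi
```

Running the same `grep` against the real `chatqna-remote-inference.yaml` or `chatqna-vllm-remote-inference.yaml` before `kubectl apply` gives the same safety net.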
