Commit 02412e7

GMC: Add a CR for switch mode on one NV GPU card (#412)
A single TGI LLM microservice can take most of the GPU memory, so when GMC switch mode is used, a second LLM may fail to launch because not enough GPU memory is left. TGI provides the "--cuda-memory-fraction" option (also settable via the CUDA_MEMORY_FRACTION environment variable), e.g. "--cuda-memory-fraction 0.5", which limits the amount of GPU memory TGI uses. See: huggingface/text-generation-inference#673. This commit adds a CR for GMC switch mode for deployments with only one NV GPU card. Signed-off-by: PeterYang12 <yuhan.yang@intel.com>
1 parent: 9b38302

1 file changed: +126 −0
```yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  labels:
    app.kubernetes.io/name: gmconnector
    app.kubernetes.io/managed-by: kustomize
    gmc/platform: nvidia
  name: switch
  namespace: switch
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
        - name: Embedding
          nodeName: node1
        - name: Reranking
          data: $response
          internalService:
            serviceName: reranking-svc
            config:
              endpoint: /v1/reranking
              TEI_RERANKING_ENDPOINT: tei-reranking-svc
        - name: TeiReranking
          internalService:
            serviceName: tei-reranking-svc
            config:
              endpoint: /rerank
            isDownstreamService: true
        - name: Llm
          data: $response
          nodeName: node2
    node1:
      routerType: Switch
      steps:
        - name: Embedding
          condition: embedding-model-id==large
          internalService:
            serviceName: embedding-svc-large
            config:
              endpoint: /v1/embeddings
              TEI_EMBEDDING_ENDPOINT: tei-embedding-svc-bge15
        - name: Embedding
          condition: embedding-model-id==small
          internalService:
            serviceName: embedding-svc-small
            config:
              endpoint: /v1/embeddings
              TEI_EMBEDDING_ENDPOINT: tei-embedding-svc-bge-small
        - name: TeiEmbedding
          internalService:
            serviceName: tei-embedding-svc-bge15
            config:
              MODEL_ID: BAAI/bge-base-en-v1.5
            isDownstreamService: true
        - name: TeiEmbedding
          internalService:
            serviceName: tei-embedding-svc-bge-small
            config:
              MODEL_ID: BAAI/bge-base-en-v1.5
            isDownstreamService: true
        - name: Retriever
          condition: embedding-model-id==large
          data: $response
          internalService:
            serviceName: retriever-svc-large
            config:
              endpoint: /v1/retrieval
              REDIS_URL: redis-vector-db-large
              TEI_EMBEDDING_ENDPOINT: tei-embedding-svc-bge15
        - name: Retriever
          condition: embedding-model-id==small
          data: $response
          internalService:
            serviceName: retriever-svc-small
            config:
              endpoint: /v1/retrieval
              REDIS_URL: redis-vector-db-small
              TEI_EMBEDDING_ENDPOINT: tei-embedding-svc-bge-small
        - name: VectorDB
          internalService:
            serviceName: redis-vector-db-large
            isDownstreamService: true
        - name: VectorDB
          internalService:
            serviceName: redis-vector-db-small
            isDownstreamService: true
    node2:
      routerType: Switch
      steps:
        - name: Llm
          condition: model-id==intel
          internalService:
            serviceName: llm-svc-intel
            config:
              endpoint: /v1/chat/completions
              TGI_LLM_ENDPOINT: tgi-service-intel
        - name: Llm
          condition: model-id==llama
          internalService:
            serviceName: llm-svc-llama
            config:
              endpoint: /v1/chat/completions
              TGI_LLM_ENDPOINT: tgi-service-llama
        - name: TgiNvidia
          internalService:
            serviceName: tgi-service-intel
            config:
              endpoint: /generate
              MODEL_ID: Intel/neural-chat-7b-v3-3
              CUDA_MEMORY_FRACTION: "0.5"
            isDownstreamService: true
        - name: TgiNvidia
          internalService:
            serviceName: tgi-service-llama
            config:
              endpoint: /generate
              MODEL_ID: bigscience/bloom-560m
              CUDA_MEMORY_FRACTION: "0.5"
            isDownstreamService: true
```
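The key constraint behind this CR is that the CUDA memory fractions of all TGI services sharing one card must sum to at most 1.0; with two services at 0.5 each, both fit on a single NV GPU. As a minimal sanity-check sketch (the Python dicts below simply mirror the two TgiNvidia steps from node2; this helper is illustrative, not part of GMC):

```python
# Mirror the two TgiNvidia steps from node2 of the CR above and check
# that their CUDA_MEMORY_FRACTION values fit on a single GPU card,
# i.e. the fractions sum to at most 1.0.
tgi_steps = [
    {"serviceName": "tgi-service-intel",
     "config": {"MODEL_ID": "Intel/neural-chat-7b-v3-3",
                "CUDA_MEMORY_FRACTION": "0.5"}},
    {"serviceName": "tgi-service-llama",
     "config": {"MODEL_ID": "bigscience/bloom-560m",
                "CUDA_MEMORY_FRACTION": "0.5"}},
]

def total_memory_fraction(steps):
    """Sum the per-service CUDA memory fractions on one card."""
    return sum(float(s["config"]["CUDA_MEMORY_FRACTION"]) for s in steps)

total = total_memory_fraction(tgi_steps)
assert total <= 1.0, f"TGI services oversubscribe the GPU: {total}"
print(total)  # 1.0
```

If a third model were added to node2, its fraction would have to come out of the same budget, or the launch would fail exactly as described in the commit message.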
