Commit ba17031

lkk12014402 authored

add tgi bf16 setup on CPU k8s. (#795)

Co-authored-by: root <root@idc708073.jf.intel.com>
Co-authored-by: Eero Tamminen <eero.t.tamminen@intel.com>

1 parent f990f79 · commit ba17031

2 files changed: +1478 −0 lines changed

ChatQnA/kubernetes/intel/README.md

Lines changed: 11 additions & 0 deletions

````diff
@@ -17,6 +17,17 @@ sed -i "s/insert-your-huggingface-token-here/${HUGGINGFACEHUB_API_TOKEN}/g" chat
 kubectl apply -f chatqna.yaml
 ```
 
+Newer CPUs such as Intel Cooper Lake and Sapphire Rapids support the [`bfloat16` data type](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format). If you have such a CPU and the given model supports `bfloat16`, adding the `--dtype bfloat16` argument for the `huggingface/text-generation-inference` server halves its memory usage and speeds it up a bit. To use it, run the following commands:
+
+```
+# label your node so the service is scheduled on it automatically
+kubectl label node 'your-node-name' node-type=node-bfloat16
+
+# add a `nodeSelector` for the `huggingface/text-generation-inference` server in `chatqna_bf16.yaml`
+# create the deployment
+kubectl apply -f chatqna_bf16.yaml
+```
+
 ## Deploy On Gaudi
 
 ```
````
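The added README text tells the reader to add a `nodeSelector` for the TGI server in `chatqna_bf16.yaml`, but that manifest is not shown in this view. Below is a minimal sketch of what the relevant Deployment fragment could look like, assuming a standard Kubernetes Deployment layout; the `chatqna-tgi` name, image tag, and model id are illustrative placeholders, not taken from the commit:

```yaml
# Hypothetical fragment of chatqna_bf16.yaml; the actual manifest in this
# commit may differ in names, image tag, model, and other arguments.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatqna-tgi
spec:
  replicas: 1
  selector:
    matchLabels:
      app: chatqna-tgi
  template:
    metadata:
      labels:
        app: chatqna-tgi
    spec:
      # Schedule only on nodes labeled earlier with:
      #   kubectl label node 'your-node-name' node-type=node-bfloat16
      nodeSelector:
        node-type: node-bfloat16
      containers:
        - name: tgi
          image: ghcr.io/huggingface/text-generation-inference:2.0
          args:
            - "--model-id"
            - "Intel/neural-chat-7b-v3-3"  # illustrative model
            - "--dtype"
            - "bfloat16"  # halves memory use on bf16-capable CPUs
          ports:
            - containerPort: 80
```

With the node labeled as above, the scheduler places this pod only on bf16-capable nodes; if no node carries the `node-type=node-bfloat16` label, the pod stays Pending.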
