Refine LLM Native Microservice (#477)
* refine test script of embedding llama_index
* update code of llm-native
* fix llm native issue
* update param & add readme
* [pre-commit.ci] auto fixes from pre-commit.com hooks (https://pre-commit.ci)

Signed-off-by: letonghan <letong.han@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent cf15b91 · commit b16b14a · 12 changed files with 883 additions and 293 deletions
# LLM Native Microservice

The LLM Native microservice uses [optimum-habana](https://github.com/huggingface/optimum-habana) for model initialization and warm-up, focusing solely on large language models (LLMs). It operates without serving frameworks such as TGI or vLLM, using PyTorch directly for inference, and supports only non-streaming output. This streamlined approach optimizes performance on Habana hardware.
## 🚀1. Start Microservice

If you start the microservice with Docker, the `docker_compose_llm.yaml` file will start the Native LLM service automatically.
### 1.1 Setup Environment Variables

To start the Native LLM service, you need to set up the following environment variable first:

```bash
export LLM_NATIVE_MODEL="Qwen/Qwen2-7B-Instruct"
```
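If you work behind a proxy, the build and run commands below forward `http_proxy`/`https_proxy` to Docker, so export them as well (the values here are placeholders for your own proxy):

```bash
# Optional: only needed behind a proxy; these are picked up by the
# docker build/run commands in the next steps (placeholder values).
export http_proxy="http://your-proxy:port"
export https_proxy="http://your-proxy:port"
```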
### 1.2 Build Docker Image

```bash
cd ../../../../
docker build -t opea/llm-native:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/native/docker/Dockerfile .
```
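As a quick sanity check (optional), you can confirm the image is now listed locally:

```bash
# Lists the freshly built image; an empty result means the build failed
docker images opea/llm-native:latest
```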
To start a Docker container, you have two options:

- A. Run Docker with CLI
- B. Run Docker with Docker Compose

You can choose one as needed.
### 1.3 Run Docker with CLI (Option A)

```bash
docker run -d --runtime=habana --name="llm-native-server" -p 9000:9000 \
  -e https_proxy=$https_proxy -e http_proxy=$http_proxy \
  -e TOKENIZERS_PARALLELISM=false \
  -e HABANA_VISIBLE_DEVICES=all \
  -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
  -e LLM_NATIVE_MODEL=${LLM_NATIVE_MODEL} \
  --cap-add=sys_nice --ipc=host \
  opea/llm-native:latest
```
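Model download and warm-up can take a while on the first start; one way to watch progress (optional) is to follow the container logs:

```bash
# Follow startup logs; model download and warm-up happen on first run
docker logs -f llm-native-server
```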
### 1.4 Run Docker with Docker Compose (Option B)

```bash
cd docker
docker compose -f docker_compose_llm.yaml up -d
```
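You can then verify the container state with Compose (optional):

```bash
# Shows the llm-native-server container and its published port mapping
docker compose -f docker_compose_llm.yaml ps
```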
## 🚀2. Consume LLM Service

### 2.1 Check Service Status

```bash
curl http://${your_ip}:9000/v1/health_check \
  -X GET \
  -H 'Content-Type: application/json'
```
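Since warm-up can delay readiness, a small polling loop is a convenient way to wait for the endpoint (a minimal sketch; it assumes any successful HTTP response from `/v1/health_check` means the service is ready):

```bash
# Poll the health endpoint every 5 seconds until it responds successfully
until curl -sf http://${your_ip}:9000/v1/health_check > /dev/null; do
  echo "waiting for llm-native service..."
  sleep 5
done
echo "service is up"
```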
### 2.2 Consume LLM Service

```bash
curl http://${your_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?"}' \
  -H 'Content-Type: application/json'
```
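For scripting, the same request can be wrapped in a tiny helper; this sketch assumes the service accepts an arbitrary `query` string and returns JSON (pretty-printing via `python -m json.tool` is optional and assumes Python on the client):

```bash
# Send a query from a shell variable and pretty-print the JSON response
QUERY="What is Deep Learning?"
curl -s http://${your_ip}:9000/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d "{\"query\":\"${QUERY}\"}" | python -m json.tool
```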
42 changes: 42 additions & 0 deletions
comps/llms/text-generation/native/docker/Dockerfile
```dockerfile
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# HABANA environment
FROM vault.habana.ai/gaudi-docker/1.16.0/ubuntu22.04/habanalabs/pytorch-installer-2.2.2:latest as hpu

ENV LANG=en_US.UTF-8
ARG REPO=https://github.com/huggingface/optimum-habana.git
ARG REPO_VER=v1.12.1

# -y keeps apt-get non-interactive so the build does not stall on a prompt
RUN apt-get update && \
    apt-get install -y git-lfs && \
    git-lfs install && \
    apt-get install -y --no-install-recommends --fix-missing \
        libgl1-mesa-glx \
        libjemalloc-dev \
        vim

# Run as a non-root user
RUN useradd -m -s /bin/bash user && \
    mkdir -p /home/user && \
    chown -R user /home/user/

USER user

COPY comps /home/user/comps

RUN pip install --upgrade-strategy eager optimum[habana] && \
    pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.17.0

RUN git clone ${REPO} /home/user/optimum-habana && \
    cd /home/user/optimum-habana && git checkout ${REPO_VER} && \
    cd examples/text-generation && pip install -r requirements.txt && \
    cd /home/user/comps/llms/text-generation/native && pip install -r requirements.txt && \
    pip install --upgrade --force-reinstall pydantic

# Make the comps package importable by the service
ENV PYTHONPATH=/root:/home/user

WORKDIR /home/user/comps/llms/text-generation/native

ENTRYPOINT ["python", "llm.py"]
```
28 changes: 28 additions & 0 deletions
comps/llms/text-generation/native/docker/docker_compose_llm.yaml
```yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

version: "3.8"

services:
  llm:
    image: opea/llm-native:latest
    container_name: llm-native-server
    ports:
      - "9000:9000"
    runtime: habana
    cap_add:
      - SYS_NICE
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      LLM_NATIVE_MODEL: ${LLM_NATIVE_MODEL}
      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      TOKENIZERS_PARALLELISM: "false"
    restart: unless-stopped

networks:
  default:
    driver: bridge
```
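Compose interpolates `${LLM_NATIVE_MODEL}` from the shell environment or from an `.env` file next to the YAML, so one way to pin the model without exporting variables is (a sketch using standard Compose interpolation):

```bash
# Pin the model via an .env file that docker compose reads automatically
echo 'LLM_NATIVE_MODEL=Qwen/Qwen2-7B-Instruct' > .env
docker compose -f docker_compose_llm.yaml up -d
```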