Refine LLM Native Microservice (#477)
* refine test script of embedding llama_index

Signed-off-by: letonghan <letong.han@intel.com>

* update

Signed-off-by: letonghan <letong.han@intel.com>

* update code of llm-native

Signed-off-by: letonghan <letong.han@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix llm native issue

Signed-off-by: letonghan <letong.han@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update param & add readme

Signed-off-by: letonghan <letong.han@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: letonghan <letong.han@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
letonghan and pre-commit-ci[bot] authored Aug 16, 2024
1 parent cf15b91 commit b16b14a
Showing 12 changed files with 883 additions and 293 deletions.
12 changes: 6 additions & 6 deletions comps/embeddings/README.md
@@ -45,7 +45,7 @@ First, you need to start a TEI service.
your_port=8090
model="BAAI/bge-large-en-v1.5"
revision="refs/pr/5"
-docker run -p $your_port:80 -v ./data:/data --name tei_server -e http_proxy=$http_proxy -e https_proxy=$https_proxy --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 --model-id $model --revision $revision
+docker run -p $your_port:80 -v ./data:/data --name tei_server -e http_proxy=$http_proxy -e https_proxy=$https_proxy --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 --model-id $model --revision $revision
```

Then you need to test your TEI service using the following commands:
@@ -66,9 +66,6 @@ cd langchain
cd llama_index
export TEI_EMBEDDING_ENDPOINT="http://localhost:$yourport"
export TEI_EMBEDDING_MODEL_NAME="BAAI/bge-large-en-v1.5"
-export LANGCHAIN_TRACING_V2=true
-export LANGCHAIN_API_KEY=${your_langchain_api_key}
-export LANGCHAIN_PROJECT="opea/gen-ai-comps:embeddings"
python embedding_tei.py
```

@@ -92,7 +89,7 @@ First, you need to start a TEI service.
your_port=8090
model="BAAI/bge-large-en-v1.5"
revision="refs/pr/5"
-docker run -p $your_port:80 -v ./data:/data --name tei_server -e http_proxy=$http_proxy -e https_proxy=$https_proxy --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 --model-id $model --revision $revision
+docker run -p $your_port:80 -v ./data:/data --name tei_server -e http_proxy=$http_proxy -e https_proxy=$https_proxy --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 --model-id $model --revision $revision
```

Then you need to test your TEI service using the following commands:
@@ -124,13 +121,16 @@ docker build -t opea/embedding-tei:latest --build-arg https_proxy=$https_proxy -

```bash
cd ../../
-docker build -t opea/embedding-tei:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/llama_index/docker/Dockerfile .
+docker build -t opea/embedding-tei-llama-index:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/llama_index/docker/Dockerfile .
```

## 2.3 Run Docker with CLI

```bash
+# run with langchain docker
docker run -d --name="embedding-tei-server" -p 6000:6000 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e TEI_EMBEDDING_ENDPOINT=$TEI_EMBEDDING_ENDPOINT -e TEI_EMBEDDING_MODEL_NAME=$TEI_EMBEDDING_MODEL_NAME opea/embedding-tei:latest
+# run with llama-index docker
+docker run -d --name="embedding-tei-llama-index-server" -p 6000:6000 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e TEI_EMBEDDING_ENDPOINT=$TEI_EMBEDDING_ENDPOINT -e TEI_EMBEDDING_MODEL_NAME=$TEI_EMBEDDING_MODEL_NAME opea/embedding-tei-llama-index:latest
```

## 2.4 Run Docker with Docker Compose
@@ -5,7 +5,7 @@ version: "3.8"

services:
  embedding:
-   image: opea/embedding-tei:latest
+   image: opea/embedding-tei-llama-index:latest
    container_name: embedding-tei-server
    ports:
      - "6000:6000"
@@ -16,7 +16,6 @@ services:
      https_proxy: ${https_proxy}
      TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
      TEI_EMBEDDING_MODEL_NAME: ${TEI_EMBEDDING_MODEL_NAME}
-     LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
    restart: unless-stopped

networks:
41 changes: 0 additions & 41 deletions comps/llms/text-generation/native/Dockerfile

This file was deleted.

61 changes: 61 additions & 0 deletions comps/llms/text-generation/native/README.md
@@ -0,0 +1,61 @@
# LLM Native Microservice

The LLM Native microservice uses [optimum-habana](https://github.com/huggingface/optimum-habana) for model initialization and warm-up, focusing solely on large language models (LLMs). It runs inference directly with PyTorch rather than through serving frameworks such as TGI or vLLM, and it supports only non-streaming responses. This streamlined approach optimizes performance on Habana Gaudi hardware.

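Conceptually, each request follows a plain load-tokenize-generate-decode path, with the full completion returned at once rather than streamed token by token. The sketch below is illustrative only and is not the service's `llm.py`: it uses vanilla Hugging Face `transformers` on the default device and omits the optimum-habana Gaudi adaptations, chat templating, and warm-up that the real microservice performs.

```python
# Illustrative sketch of the non-streaming flow (not the actual service code).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-7B-Instruct"  # same model used in the examples below
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def generate(query: str, max_new_tokens: int = 128) -> str:
    # Tokenize the prompt, run one generation pass, and decode the whole
    # completion as a single response (no streaming).
    inputs = tokenizer(query, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(generate("What is Deep Learning?"))
```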
## 🚀1. Start Microservice

If you start the microservice with Docker Compose, the `docker_compose_llm.yaml` file will automatically start a Native LLM service in a container.

### 1.1 Setup Environment Variables

To start the Native LLM service, set up the following environment variable first.

```bash
export LLM_NATIVE_MODEL="Qwen/Qwen2-7B-Instruct"
```

### 1.2 Build Docker Image

```bash
cd ../../../../
docker build -t opea/llm-native:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/native/docker/Dockerfile .
```

To start a Docker container, you have two options:

- A. Run Docker with CLI
- B. Run Docker with Docker Compose

You can choose one as needed.

### 1.3 Run Docker with CLI (Option A)

```bash
docker run -d --runtime=habana --name="llm-native-server" -p 9000:9000 -e https_proxy=$https_proxy -e http_proxy=$http_proxy -e TOKENIZERS_PARALLELISM=false -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e LLM_NATIVE_MODEL=${LLM_NATIVE_MODEL} opea/llm-native:latest
```

### 1.4 Run Docker with Docker Compose (Option B)

```bash
cd docker
docker compose -f docker_compose_llm.yaml up -d
```

## 🚀2. Consume LLM Service

### 2.1 Check Service Status

```bash
curl http://${your_ip}:9000/v1/health_check \
  -X GET \
  -H 'Content-Type: application/json'
```

### 2.2 Consume LLM Service

```bash
curl http://${your_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?"}' \
  -H 'Content-Type: application/json'
```
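For reference, the same request can also be sent from Python. This is a minimal sketch assuming the `requests` package is installed and the service is reachable at the host and port used above; it is not part of the microservice itself.

```python
# Minimal Python client mirroring the curl call above (assumes `requests`
# is installed; replace localhost with ${your_ip} if calling remotely).
import requests

response = requests.post(
    "http://localhost:9000/v1/chat/completions",
    json={"query": "What is Deep Learning?"},
    headers={"Content-Type": "application/json"},
    timeout=120,
)
response.raise_for_status()
print(response.text)  # print the raw body; parse JSON if the service returns it
```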
42 changes: 42 additions & 0 deletions comps/llms/text-generation/native/docker/Dockerfile
@@ -0,0 +1,42 @@


# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# HABANA environment
FROM vault.habana.ai/gaudi-docker/1.16.0/ubuntu22.04/habanalabs/pytorch-installer-2.2.2:latest as hpu

ENV LANG=en_US.UTF-8
ARG REPO=https://github.com/huggingface/optimum-habana.git
ARG REPO_VER=v1.12.1

RUN apt-get update && \
    apt-get install git-lfs && \
    git-lfs install && \
    apt-get install -y --no-install-recommends --fix-missing \
        libgl1-mesa-glx \
        libjemalloc-dev \
        vim

RUN useradd -m -s /bin/bash user && \
    mkdir -p /home/user && \
    chown -R user /home/user/

USER user

COPY comps /home/user/comps

RUN pip install --upgrade-strategy eager optimum[habana] && \
    pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.17.0

RUN git clone ${REPO} /home/user/optimum-habana && \
    cd /home/user/optimum-habana && git checkout ${REPO_VER} && \
    cd examples/text-generation && pip install -r requirements.txt && \
    cd /home/user/comps/llms/text-generation/native && pip install -r requirements.txt && \
    pip install --upgrade --force-reinstall pydantic

ENV PYTHONPATH=/root:/home/user

WORKDIR /home/user/comps/llms/text-generation/native

ENTRYPOINT ["python", "llm.py"]
28 changes: 28 additions & 0 deletions comps/llms/text-generation/native/docker/docker_compose_llm.yaml
@@ -0,0 +1,28 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

version: "3.8"

services:
  llm:
    image: opea/llm-native:latest
    container_name: llm-native-server
    ports:
      - "9000:9000"
    runtime: habana
    cap_add:
      - SYS_NICE
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      LLM_NATIVE_MODEL: ${LLM_NATIVE_MODEL}
      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      TOKENIZERS_PARALLELISM: false
    restart: unless-stopped

networks:
  default:
    driver: bridge