Refine LLM Native Microservice (#477)
* refine test script of embedding llama_index

Signed-off-by: letonghan <letong.han@intel.com>

* update

Signed-off-by: letonghan <letong.han@intel.com>

* update code of llm-native

Signed-off-by: letonghan <letong.han@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix llm native issue

Signed-off-by: letonghan <letong.han@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update param & add readme

Signed-off-by: letonghan <letong.han@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: letonghan <letong.han@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
letonghan and pre-commit-ci[bot] authored Aug 16, 2024
1 parent cf15b91 commit b16b14a
Showing 12 changed files with 883 additions and 293 deletions.
12 changes: 6 additions & 6 deletions comps/embeddings/README.md
@@ -45,7 +45,7 @@ First, you need to start a TEI service.
your_port=8090
model="BAAI/bge-large-en-v1.5"
revision="refs/pr/5"
-docker run -p $your_port:80 -v ./data:/data --name tei_server -e http_proxy=$http_proxy -e https_proxy=$https_proxy --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 --model-id $model --revision $revision
+docker run -p $your_port:80 -v ./data:/data --name tei_server -e http_proxy=$http_proxy -e https_proxy=$https_proxy --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 --model-id $model --revision $revision
```

Then you need to test your TEI service using the following commands:
@@ -66,9 +66,6 @@ cd langchain
cd llama_index
export TEI_EMBEDDING_ENDPOINT="http://localhost:$yourport"
export TEI_EMBEDDING_MODEL_NAME="BAAI/bge-large-en-v1.5"
-export LANGCHAIN_TRACING_V2=true
-export LANGCHAIN_API_KEY=${your_langchain_api_key}
-export LANGCHAIN_PROJECT="opea/gen-ai-comps:embeddings"
python embedding_tei.py
```

@@ -92,7 +89,7 @@ First, you need to start a TEI service.
your_port=8090
model="BAAI/bge-large-en-v1.5"
revision="refs/pr/5"
-docker run -p $your_port:80 -v ./data:/data --name tei_server -e http_proxy=$http_proxy -e https_proxy=$https_proxy --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 --model-id $model --revision $revision
+docker run -p $your_port:80 -v ./data:/data --name tei_server -e http_proxy=$http_proxy -e https_proxy=$https_proxy --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 --model-id $model --revision $revision
```

Then you need to test your TEI service using the following commands:
@@ -124,13 +121,16 @@ docker build -t opea/embedding-tei:latest --build-arg https_proxy=$https_proxy -

```bash
cd ../../
-docker build -t opea/embedding-tei:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/llama_index/docker/Dockerfile .
+docker build -t opea/embedding-tei-llama-index:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/llama_index/docker/Dockerfile .
```

## 2.3 Run Docker with CLI

```bash
+# run with langchain docker
docker run -d --name="embedding-tei-server" -p 6000:6000 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e TEI_EMBEDDING_ENDPOINT=$TEI_EMBEDDING_ENDPOINT -e TEI_EMBEDDING_MODEL_NAME=$TEI_EMBEDDING_MODEL_NAME opea/embedding-tei:latest
+# run with llama-index docker
+docker run -d --name="embedding-tei-llama-index-server" -p 6000:6000 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e TEI_EMBEDDING_ENDPOINT=$TEI_EMBEDDING_ENDPOINT -e TEI_EMBEDDING_MODEL_NAME=$TEI_EMBEDDING_MODEL_NAME opea/embedding-tei-llama-index:latest
```

## 2.4 Run Docker with Docker Compose
@@ -5,7 +5,7 @@ version: "3.8"

services:
  embedding:
-   image: opea/embedding-tei:latest
+   image: opea/embedding-tei-llama-index:latest
    container_name: embedding-tei-server
    ports:
      - "6000:6000"
@@ -16,7 +16,6 @@ services:
      https_proxy: ${https_proxy}
      TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
      TEI_EMBEDDING_MODEL_NAME: ${TEI_EMBEDDING_MODEL_NAME}
-     LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
    restart: unless-stopped

networks:
41 changes: 0 additions & 41 deletions comps/llms/text-generation/native/Dockerfile

This file was deleted.

61 changes: 61 additions & 0 deletions comps/llms/text-generation/native/README.md
@@ -0,0 +1,61 @@
# LLM Native Microservice

The LLM Native microservice uses [optimum-habana](https://github.com/huggingface/optimum-habana) for model initialization and warm-up, focusing solely on large language models (LLMs). It runs inference directly with PyTorch rather than through serving frameworks such as TGI or vLLM, and it supports only non-streaming responses. This streamlined approach optimizes performance on Habana Gaudi hardware.

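Conceptually, each request follows a plain load-tokenize-generate-decode path, with the full completion returned at once rather than streamed token by token. The sketch below is illustrative only and is not the service's `llm.py`: it uses vanilla Hugging Face `transformers` on the default device and omits the optimum-habana Gaudi adaptations, chat templating, and warm-up that the real microservice performs.

```python
# Illustrative sketch of the non-streaming flow (not the actual service code).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-7B-Instruct"  # same model used in the examples below
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def generate(query: str, max_new_tokens: int = 128) -> str:
    # Tokenize the prompt, run one generation pass, and decode the whole
    # completion as a single response (no streaming).
    inputs = tokenizer(query, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(generate("What is Deep Learning?"))
```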
## 🚀1. Start Microservice

If you start the microservice with Docker Compose, the `docker_compose_llm.yaml` file will automatically start a Native LLM service in a container.

### 1.1 Setup Environment Variables

To start the Native LLM service, set up the following environment variable first.

```bash
export LLM_NATIVE_MODEL="Qwen/Qwen2-7B-Instruct"
```

### 1.2 Build Docker Image

```bash
cd ../../../../
docker build -t opea/llm-native:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/native/docker/Dockerfile .
```

To start a Docker container, you have two options:

- A. Run Docker with CLI
- B. Run Docker with Docker Compose

You can choose one as needed.

### 1.3 Run Docker with CLI (Option A)

```bash
docker run -d --runtime=habana --name="llm-native-server" -p 9000:9000 -e https_proxy=$https_proxy -e http_proxy=$http_proxy -e TOKENIZERS_PARALLELISM=false -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e LLM_NATIVE_MODEL=${LLM_NATIVE_MODEL} opea/llm-native:latest
```

### 1.4 Run Docker with Docker Compose (Option B)

```bash
cd docker
docker compose -f docker_compose_llm.yaml up -d
```

## 🚀2. Consume LLM Service

### 2.1 Check Service Status

```bash
curl http://${your_ip}:9000/v1/health_check \
  -X GET \
  -H 'Content-Type: application/json'
```

### 2.2 Consume LLM Service

```bash
curl http://${your_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?"}' \
  -H 'Content-Type: application/json'
```
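For reference, the same request can also be sent from Python. This is a minimal sketch assuming the `requests` package is installed and the service is reachable at the host and port used above; it is not part of the microservice itself.

```python
# Minimal Python client mirroring the curl call above (assumes `requests`
# is installed; replace localhost with ${your_ip} if calling remotely).
import requests

response = requests.post(
    "http://localhost:9000/v1/chat/completions",
    json={"query": "What is Deep Learning?"},
    headers={"Content-Type": "application/json"},
    timeout=120,
)
response.raise_for_status()
print(response.text)  # print the raw body; parse JSON if the service returns it
```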
42 changes: 42 additions & 0 deletions comps/llms/text-generation/native/docker/Dockerfile
@@ -0,0 +1,42 @@


# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# HABANA environment
FROM vault.habana.ai/gaudi-docker/1.16.0/ubuntu22.04/habanalabs/pytorch-installer-2.2.2:latest as hpu

ENV LANG=en_US.UTF-8
ARG REPO=https://github.com/huggingface/optimum-habana.git
ARG REPO_VER=v1.12.1

RUN apt-get update && \
    apt-get install git-lfs && \
    git-lfs install && \
    apt-get install -y --no-install-recommends --fix-missing \
        libgl1-mesa-glx \
        libjemalloc-dev \
        vim

RUN useradd -m -s /bin/bash user && \
    mkdir -p /home/user && \
    chown -R user /home/user/

USER user

COPY comps /home/user/comps

RUN pip install --upgrade-strategy eager optimum[habana] && \
    pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.17.0

RUN git clone ${REPO} /home/user/optimum-habana && \
    cd /home/user/optimum-habana && git checkout ${REPO_VER} && \
    cd examples/text-generation && pip install -r requirements.txt && \
    cd /home/user/comps/llms/text-generation/native && pip install -r requirements.txt && \
    pip install --upgrade --force-reinstall pydantic

ENV PYTHONPATH=/root:/home/user

WORKDIR /home/user/comps/llms/text-generation/native

ENTRYPOINT ["python", "llm.py"]
28 changes: 28 additions & 0 deletions comps/llms/text-generation/native/docker/docker_compose_llm.yaml
@@ -0,0 +1,28 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

version: "3.8"

services:
  llm:
    image: opea/llm-native:latest
    container_name: llm-native-server
    ports:
      - "9000:9000"
    runtime: habana
    cap_add:
      - SYS_NICE
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      LLM_NATIVE_MODEL: ${LLM_NATIVE_MODEL}
      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      TOKENIZERS_PARALLELISM: false
    restart: unless-stopped

networks:
  default:
    driver: bridge