# Start the APISIX gateway stack (compose file lives in ./apisix).
cd apisix
# -d: run the gateway containers detached in the background.
docker compose up -d
# Launch the SGLang inference server for Llama 3.1 70B (FP8) on all GPUs.
# Replace <YOUR_HUGGING_FACE_TOKEN> with a real token before running.
docker run \
  --detach \
  --runtime nvidia \
  --gpus all \
  --publish 4001:30000 \
  --volume /data/models/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<YOUR_HUGGING_FACE_TOKEN>" \
  --ipc=host \
  --network ai \
  --restart always \
  --name sglang-neuralmagic-Meta-Llama-3.1-70B-Instruct-FP8 \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8 \
    --host 0.0.0.0 --port 30000 \
    --mem-fraction-static 0.5 \
    --tp 2
# Send a test completion request through the APISIX gateway.
# Both headers are grouped before the payload for readability.
curl http://localhost/llm/neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 1234" \
  -d '{
  "model": "neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8",
  "prompt": "What is the LLM?",
  "temperature": 0.7
}'
Create the following `Dockerfile`:
# NOTE(review): `:latest` is unpinned and non-reproducible (hadolint DL3007);
# kept here to match the run command above — pin a release tag for production.
FROM lmsysorg/sglang:latest

# python-multipart is required by FastAPI/Starlette for multipart form parsing.
# --no-cache-dir avoids baking the pip download cache into the layer (DL3042).
# TODO(review): pin an exact version (e.g. python-multipart==0.0.9) for
# reproducible builds (DL3013).
RUN pip install --no-cache-dir python-multipart
Build the image:
docker build -t lotuss/sglang .