Description
I have been trying to get vLLM running on Intel B60 cards with an Arrow Lake CPU. The procedure I followed is as follows; it is taken from the page linked below. I have 2 B60 cards in the machine, running Ubuntu 24.04.
```
Linux ARLS-ASRK-02 6.14.0-33-generic #33~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 19 17:02:30 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
```
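Both B60s are enumerated on the host; a quick sanity check (plain lspci, nothing vLLM-specific):

```bash
# List display controllers on the host; both B60 cards should show up here.
lspci -nn | grep -i -E 'vga|display'
```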
https://docs.vllm.ai/en/stable/getting_started/installation/gpu.html#build-wheel-from-source
```bash
git clone https://github.com/vllm-project/vllm.git
cd vllm
docker build -f docker/Dockerfile.xpu -t vllm-xpu-env --shm-size=4g .
```
```bash
docker run -it \
    --rm \
    --network=host \
    --device /dev/dri \
    -v /dev/dri/by-path:/dev/dri/by-path \
    vllm-xpu-env
```
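Inside the container I sanity-check that the runtime actually sees both GPUs. This is a sketch assuming the image's PyTorch build exposes the torch.xpu backend:

```bash
# Inside the vllm-xpu-env container: report whether XPU is available
# and how many devices PyTorch can see (expecting 2 for two B60 cards).
python -c "import torch; print(torch.xpu.is_available(), torch.xpu.device_count())"
```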
```bash
python -m vllm.entrypoints.openai.api_server \
    --model=facebook/opt-13b \
    --dtype=bfloat16 \
    --max_model_len=1024 \
    --distributed-executor-backend=mp \
    --pipeline-parallel-size=2 \
    -tp=8
```
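As far as I understand, vLLM requires tensor_parallel_size x pipeline_parallel_size to match the number of available devices, so --pipeline-parallel-size=2 together with -tp=8 would ask for 16 GPUs while I only have 2. If that is right, the invocation for two cards should look more like this (a sketch, keeping my other flags unchanged):

```bash
# Two B60 cards: TP x PP must equal the device count (here 2 x 1 = 2).
python -m vllm.entrypoints.openai.api_server \
    --model=facebook/opt-13b \
    --dtype=bfloat16 \
    --max_model_len=1024 \
    --distributed-executor-backend=mp \
    --tensor-parallel-size=2
```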
Should I be using llm-scaler for 2 Intel B60 cards with an Arrow Lake CPU?