Describe the feature
```shell
CUDA_VISIBLE_DEVICES=0,1 \
NPROC_PER_NODE=2 \
MAX_PIXELS=1003520 \
swift deploy \
    --model model/Qwen/Qwen2.5-VL-7B-Instruct \
    --adapters output/Qwen2.5-VL-7B-SFT-LoRA-seq/v8-20251010-061551/checkpoint-30000 \
    --infer_backend vllm \
    --merge_lora true \
    --gpu_memory_utilization 0.95 \
    --max_model_len 32768 \
    --max_new_tokens 2048 \
    --served_model_name Qwen2.5-VL-7B-Instruct
```
This follows the `examples/infer/vllm/mllm_ddp.sh` script. When I run the command above, why does only one GPU start up? (I want data parallelism.)
Possible solutions
- Multi-instance Data Parallel (DP): use `CUDA_VISIBLE_DEVICES=0 swift deploy ... --port 8000` and `CUDA_VISIBLE_DEVICES=1 swift deploy ... --port 8001` to launch multiple service processes, each binding one GPU to one port. Then put a load balancer such as nginx in front to distribute traffic across the backends, achieving near-linear throughput scaling (see the sketch after this list).
- Is there a more elegant way for `swift deploy` to automatically leverage multiple GPUs for data parallelism during service deployment, e.g. by supporting parameters like `NPROC_PER_NODE`? Or is there an officially recommended multi-GPU service deployment approach?
- Are there plans to support directly launching multi-process, multi-GPU Data Parallel via `swift deploy`?
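Concretely, the multi-instance approach could look like the sketch below. This is a minimal, illustrative script, not an official ms-swift recipe: the port numbers (8000/8001), the nginx listen port 9000, the nginx config path, and the log file names are all assumptions to adjust for your environment.

```shell
#!/bin/bash
# Minimal sketch of multi-instance DP: one swift deploy process per GPU,
# each bound to its own port. Ports and log paths are illustrative choices.
for GPU in 0 1; do
  MAX_PIXELS=1003520 CUDA_VISIBLE_DEVICES=$GPU swift deploy \
    --model model/Qwen/Qwen2.5-VL-7B-Instruct \
    --adapters output/Qwen2.5-VL-7B-SFT-LoRA-seq/v8-20251010-061551/checkpoint-30000 \
    --infer_backend vllm \
    --merge_lora true \
    --gpu_memory_utilization 0.95 \
    --max_model_len 32768 \
    --max_new_tokens 2048 \
    --served_model_name Qwen2.5-VL-7B-Instruct \
    --port $((8000 + GPU)) > "deploy_gpu${GPU}.log" 2>&1 &
done

# Illustrative nginx round-robin config in front of both backends
# (the conf path and listen port are hypothetical; adjust to your setup):
cat > /etc/nginx/conf.d/swift_dp.conf <<'EOF'
upstream swift_backends {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
}
server {
    listen 9000;
    location / {
        proxy_pass http://swift_backends;
    }
}
EOF
nginx -s reload

wait
```

Clients then target the single load-balanced endpoint (e.g. `http://127.0.0.1:9000/v1/chat/completions`), and nginx spreads requests round-robin across both GPU-backed instances.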