
Data parallel for deploy. #6097

@YushunXiang

Description


Describe the feature

CUDA_VISIBLE_DEVICES=0,1 \
NPROC_PER_NODE=2 \
MAX_PIXELS=1003520 \
swift deploy \
    --model model/Qwen/Qwen2.5-VL-7B-Instruct \
    --adapters output/Qwen2.5-VL-7B-SFT-LoRA-seq/v8-20251010-061551/checkpoint-30000 \
    --infer_backend vllm \
    --merge_lora true \
    --gpu_memory_utilization 0.95 \
    --max_model_len 32768 \
    --max_new_tokens 2048 \
    --served_model_name Qwen2.5-VL-7B-Instruct

This follows the examples/infer/vllm/mllm_ddp.sh script. When I run the above command, why does only one GPU start up? (I want data parallelism.)
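
For reference, a service launched this way exposes an OpenAI-compatible API; a minimal single-request smoke test (assuming the default port 8000) might look like:

# Smoke test against the deployed service. The port (8000) and endpoint
# path follow the usual OpenAI-compatible convention; adjust if your
# deployment binds elsewhere.
curl http://127.0.0.1:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen2.5-VL-7B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64
    }'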

Possible solutions

  • Multi-instance Data Parallel (DP): Launch multiple service processes, e.g. CUDA_VISIBLE_DEVICES=0 swift deploy ... --port 8000 and CUDA_VISIBLE_DEVICES=1 swift deploy ... --port 8001, each binding one GPU to one port. Then use a load balancer such as nginx to distribute traffic across the backends, achieving near-linear throughput scaling (see the sketch after this list).
  • Is there a more elegant way for swift deploy to automatically leverage multiple GPUs for data parallelism during service deployment? For example, supporting parameters like NPROC_PER_NODE, or is there an officially recommended multi-GPU deployment approach?
  • Are there plans for swift deploy to directly support launching multi-process, multi-GPU Data Parallel?
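
A minimal sketch of the multi-instance approach from the first bullet. The ports (8000/8001/9000), the nginx config path, and the least_conn policy are illustrative choices, not anything prescribed by swift or vLLM; the adapter/merge flags from the original command are omitted for brevity.

# One swift deploy process per GPU, each on its own port:
CUDA_VISIBLE_DEVICES=0 MAX_PIXELS=1003520 swift deploy \
    --model model/Qwen/Qwen2.5-VL-7B-Instruct \
    --infer_backend vllm \
    --served_model_name Qwen2.5-VL-7B-Instruct \
    --port 8000 &

CUDA_VISIBLE_DEVICES=1 MAX_PIXELS=1003520 swift deploy \
    --model model/Qwen/Qwen2.5-VL-7B-Instruct \
    --infer_backend vllm \
    --served_model_name Qwen2.5-VL-7B-Instruct \
    --port 8001 &

# nginx upstream that balances requests across both backends
# (config path is an illustrative choice):
cat > /etc/nginx/conf.d/swift_lb.conf <<'EOF'
upstream swift_backends {
    least_conn;                # send each request to the less-loaded instance
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
}
server {
    listen 9000;
    location / {
        proxy_pass http://swift_backends;
        proxy_read_timeout 600s;   # long generations need a generous timeout
    }
}
EOF
nginx -s reload

Clients then talk to port 9000 as if it were a single service, and nginx spreads requests over the two GPU-bound instances.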
