Describe the feature
```shell
CUDA_VISIBLE_DEVICES=0,1 \
NPROC_PER_NODE=2 \
MAX_PIXELS=1003520 \
swift deploy \
    --model model/Qwen/Qwen2.5-VL-7B-Instruct \
    --adapters output/Qwen2.5-VL-7B-SFT-LoRA-seq/v8-20251010-061551/checkpoint-30000 \
    --infer_backend vllm \
    --merge_lora true \
    --gpu_memory_utilization 0.95 \
    --max_model_len 32768 \
    --max_new_tokens 2048 \
    --served_model_name Qwen2.5-VL-7B-Instruct
```
This follows the `examples/infer/vllm/mllm_ddp.sh` script. When I run the command above, why does only one GPU start up? (I want data parallelism.)
Possible solutions
- Multi-instance Data Parallel (DP): use `CUDA_VISIBLE_DEVICES=0 swift deploy ... --port 8000` and `CUDA_VISIBLE_DEVICES=1 swift deploy ... --port 8001` to launch multiple service processes, each binding one GPU to one port. Then put a load balancer such as nginx in front to distribute traffic across the backends, achieving near-linear throughput scaling (see the sketch after this list).
- Is there a more elegant way for `swift deploy` to automatically leverage multiple GPUs for data parallelism during service deployment, e.g. by supporting parameters like `NPROC_PER_NODE`? Or is there an officially recommended multi-GPU service deployment approach?
- Are there plans to support directly launching multi-process, multi-GPU Data Parallel via `swift deploy`?
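Concretely, the multi-instance approach could look like the sketch below. This is a minimal, illustrative script, not an official ms-swift recipe: the port numbers (8000/8001), the nginx listen port 9000, the nginx config path, and the log file names are all assumptions to adjust for your environment.

```shell
#!/bin/bash
# Minimal sketch of multi-instance DP: one swift deploy process per GPU,
# each bound to its own port. Ports and log paths are illustrative choices.
for GPU in 0 1; do
  MAX_PIXELS=1003520 CUDA_VISIBLE_DEVICES=$GPU swift deploy \
    --model model/Qwen/Qwen2.5-VL-7B-Instruct \
    --adapters output/Qwen2.5-VL-7B-SFT-LoRA-seq/v8-20251010-061551/checkpoint-30000 \
    --infer_backend vllm \
    --merge_lora true \
    --gpu_memory_utilization 0.95 \
    --max_model_len 32768 \
    --max_new_tokens 2048 \
    --served_model_name Qwen2.5-VL-7B-Instruct \
    --port $((8000 + GPU)) > "deploy_gpu${GPU}.log" 2>&1 &
done

# Illustrative nginx round-robin config in front of both backends
# (the conf path and listen port are hypothetical; adjust to your setup):
cat > /etc/nginx/conf.d/swift_dp.conf <<'EOF'
upstream swift_backends {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
}
server {
    listen 9000;
    location / {
        proxy_pass http://swift_backends;
    }
}
EOF
nginx -s reload

wait
```

Clients then target the single load-balanced endpoint (e.g. `http://127.0.0.1:9000/v1/chat/completions`), and nginx spreads requests round-robin across both GPU-backed instances.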