[DOCS] Update the document of HETERO pipeline parallelism #24470

WeldonWangwang · 2024-05-11T03:43:00Z

Details:

Update the document of HETERO pipeline parallelism

Tickets:

CVS-138199

...cles_en/openvino-workflow/running-inference/inference-devices-and-modes/hetero-execution.rst

kblaszczak-intel

That is nice, thanks for aligning the header underscores :)

songbell · 2024-05-16T09:35:57Z

...cles_en/openvino-workflow/running-inference/inference-devices-and-modes/hetero-execution.rst

+Pipeline parallelism
+------------------------
+
+The pipeline parallelism is set via ``ov::hint::model_distribution_policy``, This mode is an efficient technique to infer large models on multiple devices. The model is split into multiple stages, and each stage is assigned to a different device (``dGPU``, ``iGPU``, ``CPU``, etc.). This mode assign operations to different devices as reasonably as possible, ensuring that different stages can be executed in sequence and minimizing the amount of data transfer between different devices.


Suggested change

-The pipeline parallelism is set via ``ov::hint::model_distribution_policy``, This mode is an efficient technique to infer large models on multiple devices. The model is split into multiple stages, and each stage is assigned to a different device (``dGPU``, ``iGPU``, ``CPU``, etc.). This mode assign operations to different devices as reasonably as possible, ensuring that different stages can be executed in sequence and minimizing the amount of data transfer between different devices.
+The pipeline parallelism is set via ``ov::hint::model_distribution_policy``. This mode is an efficient technique to infer large models on multiple devices. The model is split into multiple stages, and each stage is assigned to a different device (``dGPU``, ``iGPU``, ``CPU``, etc.). This mode assign operations to different devices as reasonably as possible, ensuring that different stages can be executed in sequence and minimizing the amount of data transfer between different devices.

Updated, thanks.

songbell · 2024-05-16T09:40:15Z

...cles_en/openvino-workflow/running-inference/inference-devices-and-modes/hetero-execution.rst

+The pipeline parallelism is set via ``ov::hint::model_distribution_policy``, This mode is an efficient technique to infer large models on multiple devices. The model is split into multiple stages, and each stage is assigned to a different device (``dGPU``, ``iGPU``, ``CPU``, etc.). This mode assign operations to different devices as reasonably as possible, ensuring that different stages can be executed in sequence and minimizing the amount of data transfer between different devices.
+
+For large models which don’t fit on a single first priority device, model pipeline parallelism is employed where certain parts of the model are placed on different devices to ensure that the device has enough memory to infer these operations, and assign other operations to next priority device.
+
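To make the staging idea in the hunk above concrete, here is a minimal sketch in plain Python. It is not OpenVINO's implementation and does not call the OpenVINO API: the greedy memory-based rule, the `Op` type, the `split_into_stages` function, and all memory figures are hypothetical, chosen only to illustrate splitting a model into sequential stages, one per device in priority order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A hypothetical operation with an illustrative memory footprint in MB."""
    name: str
    mem_mb: int


def split_into_stages(ops, devices):
    """Greedily assign consecutive ops to devices in priority order.

    ops: list of Op in execution order.
    devices: list of (device_name, capacity_mb), highest priority first.
    Returns a list of (device_name, [op names]) stages.
    Raises ValueError if the ops do not fit on the listed devices.
    """
    stages = []
    idx = 0  # index of the next op to place
    for dev_name, capacity in devices:
        used = 0
        assigned = []
        # Fill this device with consecutive ops until the next one no longer fits.
        while idx < len(ops) and used + ops[idx].mem_mb <= capacity:
            used += ops[idx].mem_mb
            assigned.append(ops[idx].name)
            idx += 1
        if assigned:
            stages.append((dev_name, assigned))
        if idx == len(ops):
            break
    if idx < len(ops):
        raise ValueError(f"op {ops[idx].name!r} does not fit on any remaining device")
    return stages


# Illustrative numbers only; device names mirror the ones mentioned in the doc text.
ops = [Op("embed", 300), Op("block0", 400), Op("block1", 400), Op("head", 200)]
devices = [("dGPU", 800), ("iGPU", 600), ("CPU", 4096)]
stages = split_into_stages(ops, devices)
for dev, names in stages:
    print(dev, names)
```

Because each stage holds a contiguous run of operations, data crosses a device boundary only once per pair of adjacent stages, which is the property the documentation text describes as minimizing data transfer. A real partitioner would also have to account for graph branches and per-device operation support, which this sketch ignores.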


Maybe this part could be reorganized? The wording around "different devices" and "next priority device" is a bit confusing.

Yes, the description that may cause confusion has been removed.

Update the doc to HETERO pipeline parallelism

aea7a1b

WeldonWangwang requested review from peterchen-intel, wangleis, songbell and yangwang201911 May 11, 2024 03:43

WeldonWangwang requested review from a team as code owners May 11, 2024 03:43

WeldonWangwang requested review from akopytko and removed request for a team May 11, 2024 03:43

github-actions bot added the category: docs and category: docs_snippets labels May 11, 2024

WeldonWangwang added 2 commits May 11, 2024 12:03

Update the snippets of hetero code

0a549ee

Update the doc

2722a49

zhaixuejun1993 approved these changes May 13, 2024

zhaixuejun1993 reviewed May 13, 2024

...cles_en/openvino-workflow/running-inference/inference-devices-and-modes/hetero-execution.rst

peterchen-intel approved these changes May 14, 2024

kblaszczak-intel approved these changes May 15, 2024

songbell reviewed May 16, 2024

WeldonWangwang added 2 commits May 17, 2024 10:12

Update the doc

1dab6c7

Merge branch 'master' into wangwang/add_hetero_pipeline_parallel_doc

780ecc0

github-actions bot removed the category: docs_snippets label May 17, 2024

Assets move

2617d6c

songbell approved these changes May 17, 2024

peterchen-intel assigned wangleis May 17, 2024

wangleis approved these changes May 17, 2024

wangleis added this pull request to the merge queue May 17, 2024

Merged via the queue into openvinotoolkit:master with commit 03aad66 May 17, 2024
80 checks passed

WeldonWangwang added a commit to WeldonWangwang/openvino that referenced this pull request May 23, 2024

Revert "[DOCS] Update the document of HETERO pipeline parallelism (op…

3b00729

…envinotoolkit#24470)" This reverts commit 03aad66.
