fix model parallel device mismatch issue in create_bidirectional_mask#46221

Merged
vasqu merged 1 commit into huggingface:main from kaixuanliu:bidirection_mask_device on May 26, 2026
Conversation

@kaixuanliu
Contributor

In the create_bidirectional_mask function, we need to keep the device aligned with input_embeds.
For example, in this test:
tests/models/t5gemma/test_modeling_t5gemma.py::T5GemmaModelTest::test_model_parallel_beam_search,
encoder_hidden_states may live on a different device than inputs_embeds (e.g. cross-attention from a decoder to encoder states) in a model-parallel setup. I think this may be a common bug across the encoder-decoder model family, and this PR tries to fix it.
@Cyrilvallez @vasqu pls help review, thx!
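
To illustrate the idea, here is a minimal sketch, not the actual diff of this PR and not the transformers implementation. The helper name sketch_bidirectional_mask and its simplified mask layout are assumptions made for illustration only. PyTorch's "meta" device stands in for a second accelerator so the snippet runs without multiple GPUs. The shapes come from encoder_hidden_states, while the device comes from input_embeds.

```python
import torch


def sketch_bidirectional_mask(input_embeds, attention_mask, encoder_hidden_states=None):
    """Hypothetical helper: build a boolean padding mask on input_embeds' device."""
    # The resulting mask should live where the queries (input_embeds) live.
    device = input_embeds.device
    batch, q_len = input_embeds.shape[:2]

    # For cross-attention, the key/value length comes from the encoder states,
    # which may sit on a different device in a model-parallel setup.
    kv_source = encoder_hidden_states if encoder_hidden_states is not None else input_embeds
    kv_len = kv_source.shape[1]

    # Align the padding mask with input_embeds before building the 4D mask.
    padding = attention_mask.to(device).to(torch.bool)
    return padding[:, None, None, :].expand(batch, 1, q_len, kv_len)


# Decoder inputs on one device ("meta" as a stand-in), encoder tensors on another ("cpu").
decoder_embeds = torch.zeros(2, 3, 8, device="meta")
encoder_states = torch.zeros(2, 5, 8, device="cpu")
encoder_padding = torch.ones(2, 5, dtype=torch.long, device="cpu")

mask = sketch_bidirectional_mask(decoder_embeds, encoder_padding, encoder_states)
```

In this sketch the returned mask follows decoder_embeds' device even though the padding mask and encoder states started elsewhere, which is the alignment described above.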

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Contributor

@vasqu left a comment

Thanks! This indeed is an edge case we never considered before

@vasqu enabled auto-merge May 26, 2026 15:16
@vasqu added this pull request to the merge queue May 26, 2026
Merged via the queue into huggingface:main with commit 710742f May 26, 2026
31 checks passed
@kaixuanliu deleted the bidirection_mask_device branch May 27, 2026 02:07
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request May 28, 2026
…k` (huggingface#46221)

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
kashif pushed a commit to kashif/transformers that referenced this pull request Jun 1, 2026
…k` (huggingface#46221)

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2 participants