fix(asr): resolve tensor device mismatch in multi-GPU environments#260
Open
JasonOA888 wants to merge 2 commits into microsoft:main from
Conversation
When using device_map=auto for multi-GPU inference, tensors from different model components may reside on different devices, causing two failures: 1. speech_masks indexing acoustic/semantic features fails when the masks are on a different device than the features. 2. acoustic_input_mask indexing inputs_embeds fails when the mask is on a different device than the embeddings. Root cause: accelerate's device_map=auto distributes model sublayers across GPUs. The acoustic_tokenizer and semantic_tokenizer may end up on different devices than the language_model layers, while speech_masks and acoustic_input_mask (usually on CPU or cuda:0) are not moved to match. Fix: ensure all masks are moved to the same device as the tensors they index before any indexing operation. Fixes microsoft#240
When using device_map=auto, multi-GPU inference fails because index/mask tensors reside on different devices than the data they index into. Three locations fixed: 1. encode_speech(): speech_masks moved to the features device before indexing. 2. forward(): acoustic_input_mask moved to the inputs_embeds device. 3. encode_speech(): speech_semantic_tensors moved to the semantic_connector device. All mask/index tensors are now moved to the target device before use, preventing "indices should be either on cpu or on the same device as the indexed tensor" errors. Fixes microsoft#240
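The mask-alignment fix described in these commits follows a standard PyTorch pattern: move the boolean mask to the device of the tensor it indexes before any advanced indexing. A minimal sketch of that pattern (`apply_speech_mask` is a hypothetical helper for illustration, not the actual model code):

```python
import torch

def apply_speech_mask(features: torch.Tensor, speech_masks: torch.Tensor) -> torch.Tensor:
    """Select rows of `features` with a boolean mask, aligning devices first.

    Hypothetical helper illustrating the pattern; not the repository's code.
    """
    # Under device_map="auto", the mask may live on CPU or cuda:0 while
    # `features` was produced on another GPU. `.to()` makes the indexing safe
    # and is a no-op when the devices already match (e.g. single-GPU runs).
    speech_masks = speech_masks.to(features.device)
    return features[speech_masks]

features = torch.arange(12, dtype=torch.float32).reshape(3, 4)
mask = torch.tensor([True, False, True])
print(apply_speech_mask(features, mask).shape)  # torch.Size([2, 4])
```

Because `.to()` returns the same tensor when the device already matches, this defensive move adds no copy on single-device setups.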
Summary
Fixes #240
Problem
When using `device_map=auto` with multiple GPUs, inference fails with:

```
indices should be either on cpu or on the same device as the indexed tensor
```

This occurs at `modeling_vibevoice_asr.py:335`.

Root Cause
accelerate's `device_map=auto` distributes model sublayers across GPUs. The `acoustic_tokenizer` may end up on `cuda:0` while `semantic_connector` layers are on `cuda:6`, but `speech_masks` (created on CPU or `cuda:0`) is used to index tensors on those other devices. Similarly, `acoustic_input_mask` indexes `inputs_embeds` without device alignment.

Solution
Move all mask/index tensors to the target device before indexing:
- `speech_masks.to(acoustic_features.device)` in `encode_speech()`
- `acoustic_input_mask.to(inputs_embeds.device)` in `forward()`
- `speech_masks.to(audio_features.device)` in `modeling_vibevoice.py` `forward_speech_features()`
- `speech_semantic_tensors` device alignment before passing to `semantic_connector`

Testing
On a single device the change is harmless (`.to()` is a no-op when the tensor is already on the target device).

Files Changed
- `vibevoice/modular/modeling_vibevoice_asr.py` - fixed 2 device mismatch locations
- `vibevoice/modular/modeling_vibevoice.py` - fixed 2 device mismatch locations
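The `speech_semantic_tensors` item under Solution applies the same idea to a submodule input: align the tensor with the device of the submodule's own parameters before calling it. A minimal sketch, assuming the connector is an ordinary `nn.Module` with parameters (`run_connector` is an illustrative helper, not the repository's code):

```python
import torch
import torch.nn as nn

def run_connector(connector: nn.Module, semantic_tensors: torch.Tensor) -> torch.Tensor:
    """Call a submodule after moving the input onto the submodule's device.

    Illustrative helper; not the repository's code.
    """
    # With device_map="auto", the connector's weights may sit on a different
    # GPU than the tensors produced upstream, so align the input first.
    target_device = next(connector.parameters()).device
    return connector(semantic_tensors.to(target_device))

connector = nn.Linear(4, 8)
out = run_connector(connector, torch.randn(2, 4))
print(out.shape)  # torch.Size([2, 8])
```

Reading the device from `next(connector.parameters())` avoids hard-coding a device and works whether accelerate placed the module on CPU or any GPU.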