
Inquiry on Sequence Parallel Support for VocabParallelEmbedding #389

Open

qinxiangyujiayou opened this issue May 18, 2024 · 0 comments
qinxiangyujiayou commented May 18, 2024

Dear Team,

Thank you for your excellent work. I have a question about using VocabParallelEmbedding together with sequence parallelism. I noticed the following in the code:

if self.sequence_parallel:
    # already partitioned sequence, do not need scatter_to_sequence_parallel_region
    # embeddings = tensor_parallel.scatter_to_sequence_parallel_region(embeddings)

It appears that the data fed to the embedding layer is already partitioned along the sequence dimension. However, VocabParallelEmbedding does not seem to support consuming sequence-partitioned input directly, since its embedding parameters are sharded along the vocabulary dimension across GPUs, so each rank needs to see the full token sequence for the partial lookups to be combined correctly. Is this a bug, or is there a misunderstanding on my part? I look forward to your clarification.
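To illustrate my understanding, here is a minimal single-process sketch (my own simplification, not this project's actual implementation) of how a vocab-parallel embedding lookup works: each rank holds a slice of the vocabulary, looks up only the ids that fall in its slice, and the partial results are summed, which stands in for the all-reduce in the real code. This is why the token ids seem to need to be the full, unpartitioned sequence on every tensor-parallel rank:

import torch

vocab_size, hidden, world_size = 8, 4, 2
tokens = torch.tensor([[1, 5, 2, 7]])           # full, unpartitioned token sequence
full_weight = torch.randn(vocab_size, hidden)   # unsharded embedding table, for reference

partials = []
shard = vocab_size // world_size
for rank in range(world_size):
    start, end = rank * shard, (rank + 1) * shard
    local_weight = full_weight[start:end]        # this rank's vocabulary shard
    mask = (tokens < start) | (tokens >= end)    # ids that live on other ranks
    local_ids = (tokens - start).clamp(0, shard - 1)
    out = local_weight[local_ids]                # lookup in the local shard only
    out[mask] = 0.0                              # out-of-range ids contribute nothing
    partials.append(out)

combined = sum(partials)                         # stands in for the all-reduce across ranks
assert torch.allclose(combined, full_weight[tokens])

Under sequence parallelism, my understanding is that this all-reduce is replaced by a reduce-scatter along the sequence dimension, so the input ids would still be the full sequence on every rank while the output embeddings come out sequence-partitioned.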

Thank you very much.

Best regards
