[FAW] parallel FreqAwareEmbedding #1424
Conversation
    per_sample_weights, self.include_last_offset, self.padding_idx)

if shape_hook is not None:
    output_shard = shape_hook(output_shard)
The shape hook might introduce some tensor view and squeeze operations.
I suppose we should be aware of that when using ColoParameter.
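A minimal sketch of the concern raised above. The names `make_shape_hook` and the list-based "tensor" are purely illustrative, not the ColossalAI API: the point is that a shape hook is just a callable applied to the output shard, so it can silently reshape (view/squeeze) the result in ways a distributed parameter's spec may not expect.

```python
# Hypothetical sketch, not ColossalAI code: a shape hook that reshapes
# the output shard. In the real code this would use tensor.view()/squeeze();
# here a flat list is reshaped into nested rows to keep the example
# dependency-free.
def make_shape_hook(target_shape):
    rows, cols = target_shape

    def shape_hook(output_shard):
        flat = list(output_shard)
        # A view must preserve the total number of elements.
        assert len(flat) == rows * cols
        return [flat[i * cols:(i + 1) * cols] for i in range(rows)]

    return shape_hook

hook = make_shape_hook((2, 3))
print(hook([1, 2, 3, 4, 5, 6]))  # [[1, 2, 3], [4, 5, 6]]
```

Because the hook runs after the embedding lookup, any sharding metadata attached to the output would need to remain valid across this reshape.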
OK, can you update it in another PR? I copied this line from your code.
ColoTensor's spec had trouble with view operations before.
Does it support ColoTensor.view and ColoTensor.transpose now?
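To illustrate why view support is tricky for a spec-carrying tensor, here is a hedged, hypothetical sketch (the class `SpecTensor` and its fields are invented for illustration and are not the ColoTensor API): a reshape can change which axis is sharded, so the distribution spec must either be revalidated or dropped.

```python
# Hypothetical illustration only -- not the ColossalAI ColoTensor API.
# A tensor that records which axis is sharded; view() keeps the spec only
# when the sharded axis survives the reshape unchanged.
class SpecTensor:
    def __init__(self, shape, shard_dim=None):
        self.shape = tuple(shape)
        self.shard_dim = shard_dim  # axis sharded across processes, if any

    def view(self, *new_shape):
        # Keep the spec only if the sharded dimension still exists and
        # has the same extent; otherwise the spec is no longer meaningful.
        keep = (self.shard_dim is not None
                and self.shard_dim < len(new_shape)
                and new_shape[self.shard_dim] == self.shape[self.shard_dim])
        return SpecTensor(new_shape, self.shard_dim if keep else None)

t = SpecTensor((4, 8), shard_dim=0)
print(t.view(4, 2, 4).shard_dim)  # 0: sharded axis preserved
print(t.view(8, 4).shard_dim)     # None: spec dropped after reshape
```

A real implementation would likely need to reconcile the new shape with the process group layout rather than simply dropping the spec, which is presumably why view operations caused trouble before.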
No description provided.