Hi,
I was exploring using Tensor Parallel when training, and I was wondering if you had any input on the correct use of RowParallelLinear for the feedforward output projection.
For example:
Column Parallel over q, k, v, and ff inner.
Row Parallel over attn out.
I am not 100% sure whether this should be Row Parallel as well.
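To make sure I'm describing the layout I have in mind correctly, here is a minimal NumPy sketch (my own illustration, not from this repo; a plain loop stands in for the ranks and the final sum stands in for the all-reduce) of why column-parallel q/k/v with a row-parallel attention output projection reproduces the unsharded result:

```python
import numpy as np

rng = np.random.default_rng(1)
d, heads, world = 8, 4, 2          # model dim, attention heads, simulated ranks
dh = d // heads                    # per-head dim
seq = 5

x = rng.normal(size=(seq, d))
wq = rng.normal(size=(d, d))
wk = rng.normal(size=(d, d))
wv = rng.normal(size=(d, d))
wo = rng.normal(size=(d, d))       # attention output projection

def softmax(a):
    e = np.exp(a - a.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def head_out(h):
    # Column-parallel qkv: each head's q/k/v slice lives entirely on one rank,
    # so the attention itself needs no communication.
    q = x @ wq[:, h * dh:(h + 1) * dh]
    k = x @ wk[:, h * dh:(h + 1) * dh]
    v = x @ wv[:, h * dh:(h + 1) * dh]
    return softmax(q @ k.T / np.sqrt(dh)) @ v

# Unsharded reference: concat heads, then the full output projection.
full = np.concatenate([head_out(h) for h in range(heads)], axis=-1) @ wo

# Sharded: each rank owns heads // world heads; the out projection is
# row-parallel, so each rank produces a partial sum (all-reduce in practice).
per = heads // world
out = np.zeros_like(full)
for r in range(world):
    local = np.concatenate(
        [head_out(h) for h in range(r * per, (r + 1) * per)], axis=-1
    )
    out += local @ wo[r * per * dh:(r + 1) * per * dh]

assert np.allclose(full, out)
```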
Normally I would just do Column Parallel, SwiGLU, Row Parallel in a standard FeedForward, but it is not super clear to me how to handle this case when it comes to the fused attn ff and ff tail. Any input would be greatly appreciated.
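For reference, this is the standard pattern I mean, again as a minimal NumPy sketch (my own names; a loop simulates the ranks and the running sum simulates the all-reduce): both SwiGLU projections are column-parallel so the gating stays local, and the output projection is row-parallel:

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d, h, world = 8, 16, 4             # model dim, hidden dim, simulated ranks

x = rng.normal(size=(d,))
w_gate = rng.normal(size=(h, d))   # column-parallel (sharded over the hidden dim)
w_val = rng.normal(size=(h, d))    # column-parallel, sharded the same way
w_out = rng.normal(size=(d, h))    # row-parallel (sharded over its input dim)

# Unsharded reference: SwiGLU feedforward.
ref = w_out @ (silu(w_gate @ x) * (w_val @ x))

# Sharded: each rank holds a contiguous slice of the hidden dim. Because gate
# and value are sharded identically, the elementwise SwiGLU gating is local;
# the row-parallel w_out yields partial sums (all-reduce in practice).
out = np.zeros(d)
for r in range(world):
    sl = slice(r * h // world, (r + 1) * h // world)
    hidden_r = silu(w_gate[sl] @ x) * (w_val[sl] @ x)
    out += w_out[:, sl] @ hidden_r

assert np.allclose(ref, out)
```

My question is essentially whether the same row-parallel treatment carries over to the ff tail when the block is fused with attention.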
Thank you,
Enrico