Awesome project! Thank you for publishing it! I was just curious about the following:
Why does the SegMoE SD 4x2 model place Mixture of Experts (MoE) layers within its attention heads, while most other models, including the Hugging Face tutorial (https://huggingface.co/blog/moe), typically use MoE layers in the feedforward network (FFN)? What's the distinction between these two approaches? A rough sketch of what I mean is below.
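To make the distinction concrete, here is a minimal sketch of the two placements I'm asking about, using a simple top-1 router. All class and function names here are hypothetical illustrations for the question, not SegMoE's actual modules or its routing scheme.

```python
# Hypothetical sketch contrasting MoE-in-FFN vs MoE-in-attention-projections.
# Not SegMoE's implementation; names and the top-1 router are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top1Router(nn.Module):
    """Assigns each token to a single expert via a linear gate."""
    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) -> expert index per token, shape (batch, tokens)
        return self.gate(x).argmax(dim=-1)


class MoEFeedForward(nn.Module):
    """Typical placement (as in the HF blog post): the FFN block is replaced by experts."""
    def __init__(self, dim: int, hidden: int, num_experts: int):
        super().__init__()
        self.router = Top1Router(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        idx = self.router(x)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):  # dispatch each token to its expert FFN
            mask = idx == e
            if mask.any():
                out[mask] = expert(x[mask])
        return out


class MoEAttentionProjections(nn.Module):
    """Alternative placement: the q/k/v projections are the experts,
    while the attention computation itself stays shared."""
    def __init__(self, dim: int, num_heads: int, num_experts: int):
        super().__init__()
        self.router = Top1Router(dim, num_experts)
        self.q = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.k = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.v = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.num_heads = num_heads

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        idx = self.router(x)
        q, k, v = torch.zeros_like(x), torch.zeros_like(x), torch.zeros_like(x)
        for e in range(len(self.q)):  # expert-specific projections per token
            mask = idx == e
            if mask.any():
                q[mask] = self.q[e](x[mask])
                k[mask] = self.k[e](x[mask])
                v[mask] = self.v[e](x[mask])
        b, t, d = x.shape
        h = self.num_heads
        q, k, v = (z.view(b, t, h, d // h).transpose(1, 2) for z in (q, k, v))
        attn = F.scaled_dot_product_attention(q, k, v)  # shared attention over expert projections
        return attn.transpose(1, 2).reshape(b, t, d)
```

Is the distinction essentially this: in the first case the experts specialize the token-wise transformation after attention, while in the second case they specialize how queries, keys, and values are formed before attention?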