Question about the model design. #8
Comments
Hi. Unidirectional (single-direction) Mamba is generally believed to perform worse than bidirectional Mamba. Different scan strategies can further improve generation performance; see the discussions in the ZigMa [1] and DiM [2] papers. For simplicity, we used bidirectional Mamba here. However, it is worth noting that there is increasing interest in autoregressive generation, such as LlamaGen [3] and Kaiming He's recent work [4].
[1] ZigMa: A DiT-style Zigzag Mamba Diffusion Model
[2] DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis
[3] Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
[4] Autoregressive Image Generation without Vector Quantization
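To make the unidirectional vs. bidirectional distinction concrete, here is a minimal sketch. It is not this repository's implementation: `causal_scan` is a hypothetical toy linear recurrence standing in for a Mamba layer, and summing the forward and backward passes is only one assumed way to combine the two directions. It shows that with a single causal scan the first token never sees later tokens, while the bidirectional variant does.

```python
import numpy as np

def causal_scan(x, decay=0.9):
    """Toy causal recurrence h_t = decay * h_{t-1} + x_t.

    A stand-in for a state-space layer, NOT a real Mamba block.
    x has shape (seq_len, dim); output has the same shape.
    """
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = decay * h + x[t]
        out[t] = h
    return out

def bidirectional_scan(x, decay=0.9):
    """Scan left-to-right and right-to-left, then sum (assumed combination)."""
    fwd = causal_scan(x, decay)
    # Flip the sequence, scan, and flip back so positions line up again.
    bwd = causal_scan(x[::-1], decay)[::-1]
    return fwd + bwd

rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 4))   # 16 tokens (e.g. image patches), 4 channels
perturbed = tokens.copy()
perturbed[-1] += 1.0                # change only the last token

# Does the output at position 0 react to a change at the last position?
uni_changed = not np.allclose(causal_scan(tokens)[0], causal_scan(perturbed)[0])
bi_changed = not np.allclose(bidirectional_scan(tokens)[0],
                             bidirectional_scan(perturbed)[0])
print(uni_changed, bi_changed)
```

For image tokens, which have no natural left-to-right order, this limited receptive field is one commonly cited reason to prefer bidirectional or multi-directional scans.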
Thanks! I have noticed these papers. 😊
费政聪 ***@***.***> wrote on Mon, Jul 15, 2024 at 11:16:
… Hi. Unidirectional (single-direction) Mamba is generally believed to
perform worse than bidirectional Mamba. Different scan strategies can
further improve generation performance; see the discussions in the
ZigMa [1] and DiM [2] papers. For simplicity, we used bidirectional
Mamba here.
However, it is worth noting that there is increasing interest in
autoregressive generation, such as LlamaGen [3] and Kaiming He's recent
work [4].
[1] ZigMa: A DiT-style Zigzag Mamba Diffusion Model
[2] DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis
[3] Autoregressive Model Beats Diffusion: Llama for Scalable Image
Generation
[4] Autoregressive Image Generation without Vector Quantization
Hi, this is great work!
I would like to know why you use bidirectional Mamba. Did a unidirectional Mamba cause any problems in your experiments?