Result changes drastically depending on initialization #420

Open
nonconvexopt opened this issue Jun 25, 2024 · 2 comments

Comments

nonconvexopt commented Jun 25, 2024

I have found that, compared to Transformers, the Mamba model shows large variance in performance with respect to model initialization. Have you also noticed this?

Collaborator

tridao commented Jun 25, 2024

For language modeling, the performance is stable. Which initialization do you mean?

Author

Just random initialization. I am applying it to financial time series, but the results have more variance than with a Transformer.
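
One way to make "more variance" concrete is to repeat the same training run under several random seeds and report the spread of the final metric for each architecture. The snippet below is only a minimal sketch of that bookkeeping: `seed_sweep` and `toy_train_and_eval` are hypothetical names, and the toy function is a stand-in that returns a fake validation loss, not a real Mamba or Transformer run.

```python
import random
import statistics
from typing import Callable, Sequence


def seed_sweep(train_and_eval: Callable[[int], float], seeds: Sequence[int]) -> dict:
    """Run one training job per seed and summarize the spread of the final metric."""
    scores = [train_and_eval(seed) for seed in seeds]
    return {
        "scores": scores,
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),  # sample standard deviation, needs >= 2 seeds
        "min": min(scores),
        "max": max(scores),
    }


def toy_train_and_eval(seed: int) -> float:
    """Hypothetical stand-in for a real run: returns a fake validation loss."""
    # A real version would seed the framework, build the model, train it,
    # and return the validation metric. Here we only draw a noisy number.
    rng = random.Random(seed)
    return 1.0 + rng.gauss(0.0, 0.1)


summary = seed_sweep(toy_train_and_eval, seeds=range(5))
print(summary)
```

Running the same sweep once per architecture, with identical data, seeds, and hyperparameters, would give two `std` values that can be compared directly.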
