
does forward/eval from a trained mamba model require cuda as well? #89

Open
shadowleaves opened this issue Jan 1, 2024 · 5 comments

@shadowleaves

The code in selective_scan_fwd() of selective_scan.cpp seems to suggest that even a forward pass from a trained model requires CUDA, which can be inconvenient when running models in production environments. Any idea how to do a model forward pass on a CPU-only machine? Thanks

@tridao
Collaborator

tridao commented Jan 1, 2024

Yup, it's only implemented for CUDA for now. You can look at selective_scan_ref for the pure PyTorch implementation, which should run on CPU (though probably quite slowly).
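For intuition, the recurrence that selective_scan_ref evaluates in pure PyTorch can be sketched in the scalar case (function name and signature here are hypothetical; the real implementation operates on batched multi-dimensional tensors):

```python
import math

def selective_scan_scalar(u, delta, A, B, C, D):
    # Scalar sketch of the selective-scan recurrence:
    #   h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * u_t
    #   y_t = C_t * h_t + D * u_t
    h = 0.0
    ys = []
    for u_t, d_t, B_t, C_t in zip(u, delta, B, C):
        h = math.exp(d_t * A) * h + d_t * B_t * u_t  # discretized state update
        ys.append(C_t * h + D * u_t)                 # output projection + skip
    return ys
```

Because each step depends on the previous state, this loop is inherently sequential, which is why the CPU path is slow compared to the fused CUDA kernel.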

@shadowleaves
Author

thanks, will look into it

@kroggen

kroggen commented Jan 3, 2024

You can check this fork. It works on CPU

@JulienSiems

@kroggen Thanks for the CPU version. It would be nice if you added this as a PR; I'm currently using your code for debugging.

@kroggen

kroggen commented Jan 14, 2024

Inference of Mamba models in pure C

https://github.com/kroggen/mamba.c

Recurrent mode only, for simplicity

Faster than PyTorch (in default mode) on CPU
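Recurrent mode means the model consumes one token at a time, carrying only a fixed-size hidden state between tokens, which keeps a CPU implementation simple and memory-constant. A scalar sketch of a single step (the function name and scalar signature are hypothetical; a real implementation updates a state vector per channel):

```python
import math

def mamba_step(h, x_t, delta_t, A, B_t, C_t, D):
    # One recurrent step: update the hidden state, emit one output token.
    h = math.exp(delta_t * A) * h + delta_t * B_t * x_t  # state update
    y = C_t * h + D * x_t                                # output + skip
    return h, y
```

Calling this in a loop over tokens reproduces the full scan while storing only the current state, rather than activations for the whole sequence.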


4 participants