
does forward/eval from a trained mamba model require cuda as well? #89

Open
shadowleaves opened this issue Jan 1, 2024 · 5 comments

@shadowleaves

The code in selective_scan_fwd() of selective_scan.cpp seems to suggest that even a forward pass from a trained model requires CUDA, which can be inconvenient when running models in production environments. Any idea how to do a model forward pass on a CPU-only machine? Thanks

@tridao
Collaborator

tridao commented Jan 1, 2024

Yup, it's only implemented for CUDA for now. You can look at selective_scan_ref for the pure PyTorch implementation, which should run on CPU (though probably quite slowly).
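For intuition, the recurrence that selective_scan_ref evaluates in pure PyTorch can be sketched in the scalar case (function name and signature here are hypothetical; the real implementation operates on batched multi-dimensional tensors):

```python
import math

def selective_scan_scalar(u, delta, A, B, C, D):
    # Scalar sketch of the selective-scan recurrence:
    #   h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * u_t
    #   y_t = C_t * h_t + D * u_t
    h = 0.0
    ys = []
    for u_t, d_t, B_t, C_t in zip(u, delta, B, C):
        h = math.exp(d_t * A) * h + d_t * B_t * u_t  # discretized state update
        ys.append(C_t * h + D * u_t)                 # output projection + skip
    return ys
```

Because each step depends on the previous state, this loop is inherently sequential, which is why the CPU path is slow compared to the fused CUDA kernel.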

@shadowleaves
Author

thanks, will look into it

@kroggen

kroggen commented Jan 3, 2024

You can check this fork. It works on CPU

@JulienSiems

@kroggen Thanks for the CPU version. It would be nice if you added this as a PR; I'm currently using your code for debugging.

@kroggen

kroggen commented Jan 14, 2024

Inference of Mamba models in pure C

https://github.com/kroggen/mamba.c

Recurrent mode only, for simplicity

Faster than PyTorch (in default mode) on CPU
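Recurrent mode means the model consumes one token at a time, carrying only a fixed-size hidden state between tokens, which keeps a CPU implementation simple and memory-constant. A scalar sketch of a single step (the function name and scalar signature are hypothetical; a real implementation updates a state vector per channel):

```python
import math

def mamba_step(h, x_t, delta_t, A, B_t, C_t, D):
    # One recurrent step: update the hidden state, emit one output token.
    h = math.exp(delta_t * A) * h + delta_t * B_t * x_t  # state update
    y = C_t * h + D * x_t                                # output + skip
    return h, y
```

Calling this in a loop over tokens reproduces the full scan while storing only the current state, rather than activations for the whole sequence.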


4 participants