Adds Causal Conv 1D kernel for mamba models #40765
Conversation
Imo, users should have both options: the hub kernel and/or the original CUDA kernel.
(This will break for users that used the original kernels.)
Also, it could be done for more models: mamba2, bamba, jamba, ...

Indeed, I guess I will keep the other import of the original `causal-conv1d` package as well.
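A minimal sketch of keeping both options: prefer the original causal-conv1d package, and fall back to the hub kernel via the `kernels` library. The attribute names exposed by the hub kernel are assumptions here, not confirmed by this PR:

```python
# Sketch only: prefer the original CUDA package, fall back to the hub kernel.
try:
    # Original package from https://github.com/Dao-AILab/causal-conv1d
    from causal_conv1d import causal_conv1d_fn, causal_conv1d_update
except ImportError:
    causal_conv1d_fn, causal_conv1d_update = None, None

if causal_conv1d_fn is None:
    try:
        from kernels import get_kernel

        # Assumes the hub kernel exposes the same function names as the package.
        _hub_kernel = get_kernel("kernels-community/causal-conv1d")
        causal_conv1d_fn = _hub_kernel.causal_conv1d_fn
        causal_conv1d_update = _hub_kernel.causal_conv1d_update
    except ImportError:
        pass  # neither backend available; callers fall back to the eager path
```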
Thank you, lgtm overall
Not now, but in the future it would be nice to have lazy loading like flash attention
" is None. Falling back to the mamba.py backend. To install follow https://github.com/state-spaces/mamba/#installation and" | ||
" https://github.com/Dao-AILab/causal-conv1d" | ||
" is None. Falling back to the mamba.py backend. To install follow https://github.com/state-spaces/mamba/#installation for mamba-ssm and" | ||
" install the kernels library using `pip install kernels`" |
I think we need to revamp the message, just so people know that both options are possible.
Yes, done. But for v5 I guess we need to start pushing people to use kernels extensively, since it's a lot better.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks. The `_lazy_loading_kernel` is counter-intuitive and IMO not needed, as it just wraps around `get_kernel`:
```python
def _lazy_loading_kernel(kernel_name: str):
    kernel = get_kernel(kernel_name)
    return kernel
```
Not sure how this one is actually lazy? But it does not really seem needed.
Hmm, not taking a careful look, but the why for lazy loading on causal-conv1d is the same as for flash attn:
- the builds are not always stable and can easily lead to torch version mismatches
- if someone doesn't use this, then they shouldn't be affected by it (i.e. it should not crash)

At some point, I hope the build system on FA etc. changes to something like manylinux 😢 it's a pain on the original packages
No, but here we are just "naming" it `_lazy_loading_kernel` while we are not changing `get_kernel`: tl;dr it is not helping.
I am glad to have lazy loading, but only if we properly do lazy loading haha
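For what it's worth, a genuinely lazy variant would defer both the import and the hub download to the first call and cache the result; a sketch, not the PR's actual code:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def _get_causal_conv1d_kernel():
    # The import and the hub download only happen on the first call, and the
    # result is cached, so users who never hit this path are unaffected.
    from kernels import get_kernel

    return get_kernel("kernels-community/causal-conv1d")
```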
Ah ok, yeah, makes sense. Didn't take a proper look at the function, oops.
" is None. Falling back to the sequential implementation of Mamba, as use_mambapy is set to False. To install follow https://github.com/state-spaces/mamba/#installation and" | ||
" https://github.com/Dao-AILab/causal-conv1d. For the mamba.py backend, follow https://github.com/alxndrTL/mamba.py." | ||
" is None. Falling back to the sequential implementation of Mamba, as use_mambapy is set to False. To install follow https://github.com/state-spaces/mamba/#installation for mamba-ssm and" | ||
" https://github.com/Dao-AILab/causal-conv1d or install kernels for causal-conv1d. For the mamba.py backend, follow https://github.com/alxndrTL/mamba.py." |
" https://github.com/Dao-AILab/causal-conv1d or install kernels for causal-conv1d. For the mamba.py backend, follow https://github.com/alxndrTL/mamba.py." | |
" https://github.com/Dao-AILab/causal-conv1d or `pip install kernels` for causal-conv1d. For the mamba.py backend, follow https://github.com/alxndrTL/mamba.py." |
[For maintainers] Suggested jobs to run (before merge): run-slow: falcon_mamba, mamba

Thank you all for the reviews 🤗
* add kernel
* make style
* keep causal-conv1d
* small fix
* small fix
* fix modular converter
* modular fix + lazy loading
* revert changes modular
* nit
* hub kernels update
* update
* small nit
What does this PR do?
Adds the https://huggingface.co/kernels-community/causal-conv1d kernel to the mamba model.
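As a rough illustration of what the kernel provides (a fused causal depthwise conv + activation) and how a dispatch with an eager fallback can look; the function names and the silu activation are assumptions for the sketch, not the PR's exact code:

```python
import torch
import torch.nn.functional as F

try:
    from kernels import get_kernel

    # Assumes the hub kernel exposes `causal_conv1d_fn` like the original package.
    causal_conv1d_fn = get_kernel("kernels-community/causal-conv1d").causal_conv1d_fn
except ImportError:
    causal_conv1d_fn = None


def causal_conv1d(x: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    """x: (batch, channels, seq_len); weight: (channels, kernel_size) depthwise filter."""
    if causal_conv1d_fn is not None and x.is_cuda:
        # Fused kernel: causal depthwise conv and SiLU in a single launch.
        return causal_conv1d_fn(x, weight, bias, activation="silu")
    # Eager fallback: left-pad so the conv stays causal, depthwise via groups.
    kernel_size = weight.shape[-1]
    out = F.conv1d(
        F.pad(x, (kernel_size - 1, 0)),
        weight.unsqueeze(1),  # (channels, 1, kernel_size)
        bias,
        groups=x.shape[1],
    )
    return F.silu(out)
```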