Does Mamba support variable-length input or cu_seqlens like FlashAttention? #180
We know that FlashAttention supports `cu_seqlens`, which removes padding for variable-length inputs in a batch and stores only the real tokens. This can be useful for computational efficiency when packing multiple short sequences. So, does Mamba also have a mechanism like variable-length input or `cu_seqlens` in FlashAttention?
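For reference, here is a minimal sketch of how FlashAttention's varlen interface is called. The `flash_attn_varlen_func` import and its `cu_seqlens` arguments are the actual flash-attn 2.x API; the lengths, head count, and head dim below are invented for illustration.

```python
# Sketch of FlashAttention's varlen call path. The function and its
# cu_seqlens arguments are the real flash-attn interface; lengths,
# head count, and head dim are made up for this example.
import itertools
import torch
from flash_attn import flash_attn_varlen_func

nheads, headdim = 8, 64
lengths = [5, 3, 7]          # three sequences packed back to back, no padding
total = sum(lengths)         # 15 tokens total

# Cumulative boundaries of each sequence in the packed batch: [0, 5, 8, 15].
cu_seqlens = torch.tensor([0, *itertools.accumulate(lengths)],
                          dtype=torch.int32, device="cuda")

# Packed q/k/v have shape (total_tokens, nheads, headdim): no batch dim,
# no padding tokens ever materialized.
q = torch.randn(total, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=max(lengths), max_seqlen_k=max(lengths),
    causal=True,  # attention stays within each sequence's boundaries
)
# out: (total, nheads, headdim)
```

The tokens are stored back to back with no padding, and `cu_seqlens` tells the kernel where one sequence ends and the next begins; that boundary information is exactly what the question asks Mamba to support.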
Comments

Yes, there should be ways to deal with variable length. It's not implemented yet, however.
Got it. Thank you, Tri Dao!
Sorry, but I still have some confusion: is it theoretically possible for Mamba to provide a variable-length API like FlashAttention's?
From my understanding, since … (not sure if my understanding is correct). Just curious whether it is possible to feed in one packed sequence as input.
Just have another question: could Mamba be parallelized over …?
It's theoretically possible to process variable lengths / packed sequences, but the implementation will be a bit tricky.
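To make "a bit tricky" concrete, below is a minimal unfused sketch of the semantics such an API would need: reset the SSM state at every sequence boundary, which this reference does by simply running each packed segment separately. The `Mamba` block constructor follows the repo README; `mamba_varlen_reference` and all sizes are hypothetical.

```python
# Unfused reference for varlen semantics: a real cu_seqlens kernel would make
# one pass over the packed tokens; here we reset state at each boundary by
# running every segment separately. mamba_varlen_reference and the sizes are
# illustrative; the Mamba block import follows the repo README.
import itertools
import torch
from mamba_ssm import Mamba

def mamba_varlen_reference(block, x_packed, cu_seqlens):
    """x_packed: (1, total_tokens, d_model); cu_seqlens: [0, ..., total_tokens]."""
    outs = []
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        # Each segment starts from a fresh (zero) SSM state, so nothing
        # leaks across sequence boundaries.
        outs.append(block(x_packed[:, start:end]))
    return torch.cat(outs, dim=1)

d_model = 16
block = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2).cuda()

lengths = [5, 3, 7]
cu_seqlens = [0, *itertools.accumulate(lengths)]   # [0, 5, 8, 15]
x = torch.randn(1, sum(lengths), d_model, device="cuda")
y = mamba_varlen_reference(block, x, cu_seqlens)   # (1, 15, d_model)
```

A fused kernel would have to perform the same resets inside a single scan over the packed tokens, which is presumably where the implementation gets tricky.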
In practice, depending on your setting, you may be able to simply concatenate the sequences and pass the whole sequence in (without enforcing state resetting at sequence boundaries). I've used this in the past, and it has worked fine in some settings.
It is often done that way, but it does cause sample cross-contamination during training, and that is usually not desirable.
Yes. I'm just saying sometimes it's also fine :)
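A small sketch of the cross-contamination point above (assuming the `Mamba` block from mamba_ssm; all names and sizes are made up): with naive packing, the second sequence's outputs depend on the first, so they differ from processing it alone.

```python
# Naive packing vs. separate processing: under packing, seq_b's tokens see
# state carried over from seq_a, so its outputs change. Sizes here are
# invented; the Mamba block import follows the repo README.
import torch
from mamba_ssm import Mamba

torch.manual_seed(0)
block = Mamba(d_model=16, d_state=16, d_conv=4, expand=2).cuda().eval()

seq_a = torch.randn(1, 5, 16, device="cuda")
seq_b = torch.randn(1, 7, 16, device="cuda")

with torch.no_grad():
    packed = torch.cat([seq_a, seq_b], dim=1)   # no state reset at the seam
    packed_b = block(packed)[:, 5:]             # seq_b's slice of the output
    solo_b = block(seq_b)                       # seq_b processed on its own

# A nonzero gap means seq_a contaminated seq_b's representations.
print((packed_b - solo_b).abs().max())
```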
Hi @albertfgu @tridao, I just have another confusion about Mamba. Does that mean the selective SSM mechanism can learn the boundary patterns by …?
I also saw blog posts on together.ai and cartesia.ai where the next steps show that variable-length training is on the future roadmap.
Update: