
Question: does Mamba support variable-length input or cu_seqlens like flash attention? #180

Open
zigzagcai opened this issue Feb 20, 2024 · 12 comments

Comments

@zigzagcai
Contributor

zigzagcai commented Feb 20, 2024

We know that flash attention supports cu_seqlens, which removes padding for variable-length input in a batch and stores only the real tokens. This is useful for improving computational efficiency when packing multiple short sequences together.

So, does Mamba also have a mechanism for variable-length input, such as cu_seqlens in flash attention?

@tridao
Collaborator

tridao commented Feb 20, 2024

Yes, there should be ways to deal with variable length. It's not implemented yet, however.

@zigzagcai
Contributor Author

Got it. Thank you Tri Dao!

@zigzagcai
Contributor Author

zigzagcai commented Feb 21, 2024

Yes, there should be ways to deal with variable length. It's not implemented yet, however.

Sorry, but I still have some confusion:

Is it theoretically possible for Mamba to provide a variable-length API like Flash-Attention's flash_attn_varlen_qkvpacked_func (Dao-AILab/flash-attention#432 (comment))?
In most cases, for computing efficiency, we want to concatenate short samples into one packed sample. For Transformer-based models, we can use the flash-attention API, which provides cu_seqlens to process packed samples.
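For reference, here is a minimal sketch of how the flash-attention varlen API is typically used with cu_seqlens (shapes and the exact signature may differ between flash-attn releases, so please double-check against the installed version):

```python
import torch
from flash_attn import flash_attn_varlen_qkvpacked_func

# Pack three sequences of lengths 5, 3 and 7 into one tensor with no padding.
# cu_seqlens holds the exclusive prefix sums of the sequence lengths.
cu_seqlens = torch.tensor([0, 5, 8, 15], dtype=torch.int32, device="cuda")
total_tokens, nheads, headdim = 15, 8, 64
qkv = torch.randn(total_tokens, 3, nheads, headdim,
                  dtype=torch.float16, device="cuda")

out = flash_attn_varlen_qkvpacked_func(
    qkv, cu_seqlens, max_seqlen=7, causal=True
)
# out: (total_tokens, nheads, headdim); each packed sequence attends only to itself.
```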

@zigzagcai
Contributor Author

zigzagcai commented Feb 21, 2024

From my understanding, since the conv1d and the parallel associative scan in the Mamba block are linear operations, in theory we could make the Mamba block capable of processing packed sequences with the help of an attention mask or cu_seqlens.
For example, we would want the Mamba block to process (packed_sequence, hidden_size) instead of (batch_size, seq_length, hidden_size), as flash attention does.

I'm not sure whether my understanding is correct. I'm just curious whether it is possible to feed one packed sequence of shape (packed_sequence, hidden_size) into the Mamba block, as has been done for LSTMs (here) or Transformer blocks. A rough sketch of the state-reset idea follows below.
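To make the idea concrete, here is a purely sequential, reference-only sketch of what cu_seqlens-style packing could look like for a diagonal linear recurrence, with the hidden state reset to zero at every sequence boundary. This only illustrates the intended semantics and is not the actual Mamba kernel; the function name packed_linear_scan is made up for this example:

```python
import torch

def packed_linear_scan(a, b, cu_seqlens):
    """Reference scan over a packed sequence of total length T.

    Computes h_t = a_t * h_{t-1} + b_t element-wise and resets h to zero at
    every boundary given by cu_seqlens (flash-attention-style prefix sums).
    a, b: (T, hidden_size); cu_seqlens: (num_seqs + 1,) int tensor.
    """
    out = torch.empty_like(b)
    for start, end in zip(cu_seqlens[:-1].tolist(), cu_seqlens[1:].tolist()):
        h = torch.zeros(b.shape[-1], dtype=b.dtype, device=b.device)
        for t in range(start, end):
            h = a[t] * h + b[t]
            out[t] = h
    return out

# Two sequences of lengths 3 and 2 packed into a single (5, hidden) tensor.
a = torch.full((5, 4), 0.9)
b = torch.randn(5, 4)
out = packed_linear_scan(a, b, torch.tensor([0, 3, 5]))
```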

@zigzagcai
Contributor Author

zigzagcai commented Feb 26, 2024

Just another question: could Mamba be parallelized over the seq_len dimension, like what has been done in flash-attention?

@tridao
Collaborator

tridao commented Feb 26, 2024

It's theoretically possible to process variable lengths / packed sequences, but the implementation will be a bit tricky.
Parallelizing over the seq_len dimension reduces to how one would parallelize an associative scan (e.g. with the Blelloch scan).
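As a small illustration of why the recurrence parallelizes over seq_len: the scan operation is associative, so it can be computed in O(log L) parallel steps. The sketch below uses a simple log-step doubling scheme (Hillis-Steele style) rather than the work-efficient Blelloch scan mentioned above, purely to show the idea; it is not the kernel used in this repo:

```python
import torch
import torch.nn.functional as F

def parallel_linear_scan(a, b):
    """Inclusive scan for h_t = a_t * h_{t-1} + b_t over the last dimension.

    Uses O(log L) doubling steps; each step composes every element with the
    element `shift` positions to its left under the associative operation
    (a1, b1) o (a2, b2) = (a2 * a1, a2 * b1 + b2).
    a, b: (..., L) tensors.
    """
    L = a.shape[-1]
    shift = 1
    while shift < L:
        a_prev = F.pad(a[..., :-shift], (shift, 0), value=1.0)  # identity: a = 1
        b_prev = F.pad(b[..., :-shift], (shift, 0), value=0.0)  # identity: b = 0
        a, b = a * a_prev, a * b_prev + b
        shift *= 2
    return b  # b[..., t] now equals h_t (with h_0 = 0)

# Check against the sequential recurrence on a small example.
a, b = torch.rand(2, 8), torch.randn(2, 8)
h, ref = torch.zeros(2), []
for t in range(8):
    h = a[:, t] * h + b[:, t]
    ref.append(h)
assert torch.allclose(parallel_linear_scan(a, b), torch.stack(ref, dim=-1), atol=1e-6)
```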

@albertfgu
Contributor

In practice, depending on your setting, you may be able to simply concatenate the sequences and pass the whole sequence in (without enforcing state resetting at sequence boundaries). I've used this in the past where it has worked fine in some settings.

@deroholic
Contributor

In practice, depending on your setting, you may be able to simply concatenate the sequences and pass the whole sequence in (without enforcing state resetting at sequence boundaries). I've used this in the past where it has worked fine in some settings.

It is often done that way, but it does cause cross-contamination between samples during training, and that is usually not desirable.

@albertfgu
Contributor

Yes. I'm just saying sometimes it's also fine :)

@zigzagcai
Contributor Author

zigzagcai commented Mar 4, 2024

In practice, depending on your setting, you may be able to simply concatenate the sequences and pass the whole sequence in (without enforcing state resetting at sequence boundaries). I've used this in the past where it has worked fine in some settings.

Hi @albertfgu @tridao, I have another point of confusion about Mamba. Does that mean the selective SSM mechanism can learn the boundary patterns through delta, or that we can set delta -> inf to manually specify the sequence boundaries in a packed (cumulative) sequence input?
I found the following description in Section 3.5.2 of the Mamba paper:
[screenshot of the relevant passage from Section 3.5.2 of the Mamba paper]
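As a quick numerical illustration of the reset intuition (my own sketch, not code from this repo): under ZOH discretization A_bar = exp(delta * A), and with A negative a very large delta drives A_bar toward 0, so the recurrence h_t = A_bar * h_{t-1} + B_bar * x_t effectively forgets the previous state at that position:

```python
import torch

# With A < 0, A_bar = exp(delta * A) -> 0 as delta grows, so the carried term
# A_bar * h_{t-1} vanishes and the SSM effectively resets its state at that token.
A = torch.tensor(-1.0)
for delta in [0.1, 1.0, 10.0, 100.0]:
    print(f"delta={delta:6.1f}  A_bar={torch.exp(delta * A).item():.3e}")
# A_bar drops from roughly 0.9 at delta=0.1 to about 4.5e-05 at delta=10,
# and is numerically negligible by delta=100.
```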

@zigzagcai
Contributor Author

zigzagcai commented Mar 5, 2024

I also saw blog posts on together.ai and on cartesia.ai, where the listed next steps show that variable-length training is on the future roadmap.
It would be fantastic if Mamba could provide such a feature, like Transformers do, in the future!
[screenshot of the blog's roadmap section]

@zigzagcai
Contributor Author

zigzagcai commented Mar 14, 2024

Update:
Variable-length sequences are now supported in Mamba via #244.
