Registering (forward/backward) hooks on intermediate Mamba layers #59

Closed
SrGonao opened this issue Dec 15, 2023 · 7 comments

Comments

@SrGonao

SrGonao commented Dec 15, 2023

I'm trying to look at the internal activations of a MambaLMHeadModel (e.g. the one you get when loading a pretrained model). If I look through the modules, I can find all the layers, and each of the modules of the Mamba block.

If I register a forward hook to "backbone.layers.0.mixer" (or any other layer) I can get an activation every forward pass. On the other hand, if I hook on "backbone.layers.0.mixer.in_proj" or "backbone.layers.0.mixer.conv1d", they don't get called during the forward pass of the model.

Is this the expected behaviour?
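
For reference, a minimal sketch of the hook setup in question (the checkpoint name, the from_pretrained call, and the device handling here are assumptions for illustration, not taken from the original report):

    import torch
    from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

    # Load a pretrained model (checkpoint name is only an example).
    model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-130m", device="cuda")

    activations = {}

    def save_output(name):
        def hook(module, inputs, output):
            activations[name] = output
        return hook

    mixer = model.backbone.layers[0].mixer
    # This hook fires on every forward pass.
    mixer.register_forward_hook(save_output("mixer"))
    # These hooks may never fire when the fused fast path is taken (see the reply below).
    mixer.in_proj.register_forward_hook(save_output("in_proj"))
    mixer.conv1d.register_forward_hook(save_output("conv1d"))

    input_ids = torch.randint(0, 100, (1, 16), device="cuda")
    model(input_ids)
    print(activations.keys())  # typically only "mixer" shows up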

@albertfgu
Contributor

In the fused fast implementations, the inner modules might not be getting called directly. Instead, we use their weights (e.g. backbone.layers.0.mixer.conv1d.weight) and pass them into a different function (e.g. from the causal-conv1d package).
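
To illustrate the general mechanism (plain PyTorch, not mamba code): forward hooks only fire when the module itself is called, not when its weight tensor is read and passed to a functional kernel.

    import torch
    import torch.nn.functional as F

    lin = torch.nn.Linear(4, 4)
    lin.register_forward_hook(lambda module, inputs, output: print("hook fired"))

    x = torch.randn(2, 4)
    lin(x)                             # prints "hook fired": the module is called
    F.linear(x, lin.weight, lin.bias)  # silent: only the weights are used, the hook is bypassed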

@SrGonao
Author

SrGonao commented Dec 15, 2023

I tried setting the fast-path variable of the layers to False, but I still couldn't get it to work. Is it even possible in the pretrained versions?

@albertfgu
Contributor

Is it even possible in the pretrained versions?

It should be. Pretrained versions just provide a set of weights, which are the same regardless of the computation path.

I tried setting the fast path variable of the layers to false but I still couldn't get it to work.

You should double-check that it's running the path you intend and that the modules are being called directly. For example, you want to hit the following line so that the conv module is called, instead of the fast causal_conv1d_fn. If you've already pip installed causal-conv1d, it probably isn't hitting this path, because the fast conv1d function is used automatically whenever it's available.

https://github.com/state-spaces/mamba/blob/main/mamba_ssm/modules/mamba_simple.py#L169
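
A quick way to see which path will be taken (a sketch reusing the model from the earlier snippet; the attribute and module names below come from mamba_ssm.modules.mamba_simple as of this thread and may differ in other versions):

    import mamba_ssm.modules.mamba_simple as mamba_simple

    mixer = model.backbone.layers[0].mixer
    print("use_fast_path:", mixer.use_fast_path)
    print("causal-conv1d installed:", mamba_simple.causal_conv1d_fn is not None)
    # The nn.Conv1d module (and hence its hooks) is only called when the fused path
    # is skipped AND causal_conv1d_fn is None, i.e. causal-conv1d is not importable.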

@SrGonao
Author

SrGonao commented Dec 16, 2023

Thank you, I hadn't seen that logic there. Because it doesn't seem feasible to guarantee that causal-conv1d is not installed, I will probably have to do some hacky workaround.
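
One possible workaround (a sketch only; it relies on the module-level causal_conv1d_fn name in mamba_ssm.modules.mamba_simple, which may change between versions): null out the imported symbol so the forward pass falls back to the nn.Conv1d module, and disable the fully fused path as well.

    import mamba_ssm.modules.mamba_simple as mamba_simple
    from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

    # Force the slow conv path even though causal-conv1d is installed.
    mamba_simple.causal_conv1d_fn = None

    model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-130m", device="cuda")
    for layer in model.backbone.layers:
        layer.mixer.use_fast_path = False  # also skip the fused mamba_inner_fn path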

@albertfgu
Contributor

It should be fairly easy to add another flag to the init function and change the line

            if causal_conv1d_fn is None:

to something like

            if causal_conv1d_fn is None or not self.use_fast_conv:

@SrGonao
Author

SrGonao commented Dec 17, 2023

Yes, that's what I was thinking.

@SrGonao SrGonao closed this as completed Dec 17, 2023
@zjq0455

zjq0455 commented Nov 2, 2024


Hi! I'm facing the same issue when using register_forward_hook to get the inputs and outputs of the linear layers in Mamba blocks. Did you solve the problem?
