Registering (forward/backward) hooks on intermediate Mamba layers #59
In the fused fast implementations, the inner modules might not be getting called directly. Instead, we use their weights (e.g. …).
I tried setting the fast path variable of the layers to false, but I still couldn't get it to work. Is it even possible in the pretrained versions?
It should be. Pretrained versions just give a set of weights, which are the same no matter the computation path. You should double-check that it's running the path you intend and that the modules are being called directly. For example, you want to hit the following line to call the conv module, instead of using the fused path: https://github.com/state-spaces/mamba/blob/main/mamba_ssm/modules/mamba_simple.py#L169
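A minimal sketch of checking that (the checkpoint name and shapes are illustrative, not from this thread):

```python
import torch
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# Illustrative: load any pretrained Mamba LM (checkpoint name is just an example).
model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-130m", device="cuda")

# Disable the fused fast path per layer so forward() takes the slow branch.
for layer in model.backbone.layers:
    layer.mixer.use_fast_path = False

hits = []
model.backbone.layers[0].mixer.conv1d.register_forward_hook(
    lambda module, inputs, output: hits.append(output)
)

model(torch.randint(0, 1000, (1, 16), device="cuda"))

# If causal-conv1d is installed, the slow branch still calls causal_conv1d_fn with
# conv1d's weight instead of conv1d.forward(), so this can still print 0.
print(len(hits))
```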
Thank you, I hadn't seen that logic there. Since it does not seem feasible to make sure that causal-conv1d is not installed, I will probably have to use some hacky workaround to get around that.
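One such workaround (purely an illustration, not something proposed in the thread) would be to null out the `causal_conv1d_fn` / `causal_conv1d_update` names that `mamba_simple.py` imports at module level, so its `is None` checks fall back to calling the conv module directly:

```python
import mamba_ssm.modules.mamba_simple as mamba_simple

# Hacky workaround sketch: pretend causal-conv1d is not installed so the reference
# conv path (self.conv1d(...)) is taken. This affects every Mamba layer built from
# this module and slows the forward pass down.
mamba_simple.causal_conv1d_fn = None
mamba_simple.causal_conv1d_update = None

# Combine with layer.mixer.use_fast_path = False on each layer so the fused
# mamba_inner_fn path is skipped as well.
```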
It should be fairly easy to add another flag to the init function and change that line to something like the sketch below.
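For instance (the flag name and default are hypothetical, not part of the library), the relevant lines in mamba_ssm/modules/mamba_simple.py could become something like:

```python
# Hypothetical sketch of the change, inside Mamba.__init__:
#     def __init__(self, ..., use_fast_path=True, force_module_calls=False, ...):
self.force_module_calls = force_module_calls  # new, illustrative flag

# And the conv branch around L169 of mamba_simple.py:
if causal_conv1d_fn is None or self.force_module_calls:
    # Calls self.conv1d directly, so forward hooks registered on it fire.
    x = self.act(self.conv1d(x)[..., :seqlen])
else:
    x = causal_conv1d_fn(
        x=x,
        weight=rearrange(self.conv1d.weight, "d 1 w -> d w"),
        bias=self.conv1d.bias,
        activation=self.activation,
    )
```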
Yes, that's what I was thinking.
Hi! I'm also facing the same issue with register_forward_hook when trying to get the input and output of linear layers in Mamba blocks. Did you solve the problem?
I'm trying to look at the internal activations of a MambaLMHeadModel (e.g. the one you get when you load pretrained). If I look through the modules, I can find all the layers, and each of the modules of the Mamba block.
If I register a forward hook to "backbone.layers.0.mixer" (or any other layer) I can get an activation every forward pass. On the other hand, if I hook on "backbone.layers.0.mixer.in_proj" or "backbone.layers.0.mixer.conv1d", they don't get called during the forward pass of the model.
Is this the expected behaviour?
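A minimal sketch of the setup described above (the checkpoint name is illustrative):

```python
import torch
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# Checkpoint name is just an example; any pretrained Mamba LM shows the same behaviour.
model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-130m", device="cuda")

activations = {}

def save_output(name):
    def hook(module, inputs, output):
        activations[name] = output
    return hook

# Fires on every forward pass:
model.backbone.layers[0].mixer.register_forward_hook(save_output("mixer"))
# These may never fire when the fused fast path is used, because only their
# weights are read, not their forward() methods:
model.backbone.layers[0].mixer.in_proj.register_forward_hook(save_output("in_proj"))
model.backbone.layers[0].mixer.conv1d.register_forward_hook(save_output("conv1d"))

model(torch.randint(0, 1000, (1, 16), device="cuda"))
print(activations.keys())  # typically only dict_keys(['mixer'])
```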