
Speed difference between the two algorithms 'blocks' and 'lstm' #1

Closed
toco2270853 opened this issue Jan 23, 2021 · 1 comment

Comments

@toco2270853

Hello, thanks for the good code.
I find that when running the CIFAR10 experiments, there is about a 40x speed difference between 'blocks' and 'lstm'. Is this normal?

@sarthmit
Owner

sarthmit commented Jan 23, 2021

Thanks for your interest in the paper!

I am not sure of the exact margin of the slowdown, but RIMs and BRIMs are in general considerably slower than an LSTM. This is because of the following issues -

  1. We sparsify the RNN weight matrices after each gradient update to make sure that the modules stay independent
    self.model.blockify_params()
  2. There is a top-k selection step that is quite a speed bottleneck (see the sketch after this list)
    bottomk_indices = torch.topk(iatt[:,:,0], dim=1,
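For illustration, here is a minimal, hypothetical sketch of both steps in PyTorch: a block-diagonal masking pass standing in for what blockify_params is described as doing, and a top-k module selection built on torch.topk. The module count, module size, and the shape of the score tensor are assumptions made for the example, not the repository's actual code.

    import torch
    import torch.nn as nn

    def blockify(weight: torch.Tensor, num_modules: int, module_size: int) -> None:
        # Zero every cross-module block so each module's recurrent weights
        # only connect that module's own hidden units (hypothetical sketch
        # of the sparsification described in point 1).
        mask = torch.zeros_like(weight)
        for m in range(num_modules):
            s = m * module_size
            mask[s:s + module_size, s:s + module_size] = 1.0
        with torch.no_grad():
            weight.mul_(mask)

    def active_module_mask(scores: torch.Tensor, k: int) -> torch.Tensor:
        # scores: (batch, num_modules) relevance scores; keep only the
        # k highest-scoring modules active (hypothetical sketch of point 2).
        topk_indices = torch.topk(scores, k=k, dim=1).indices
        mask = torch.zeros_like(scores)
        mask.scatter_(1, topk_indices, 1.0)
        return mask  # 1.0 for active modules, 0.0 for inactive ones

    # Usage sketch: re-blockify after every optimizer step and pick
    # active modules at every timestep from the attention scores.
    num_modules, module_size, k = 6, 32, 4
    rnn_weight = nn.Parameter(torch.randn(num_modules * module_size,
                                          num_modules * module_size))
    blockify(rnn_weight, num_modules, module_size)
    scores = torch.rand(8, num_modules)  # batch of 8 score vectors
    mask = active_module_mask(scores, k)

Both the masking loop over modules and the per-timestep top-k selection add work that a plain LSTM never does, which is where the slowdown comes from.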

I do encourage you to check out the following implementation of RIMs (https://github.com/dido1998/Recurrent-Independent-Mechanisms), which attempts to speed up RIM computations by addressing the issue illustrated in (1) with a grouped RNN.
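That repository's code is not reproduced here, but the general idea of grouping the modules can be sketched as follows: instead of one large shared weight matrix that has to be re-sparsified after every update, each module keeps its own weight block and all modules are updated in a single batched matrix multiplication. The shapes and names below are assumptions for the sketch, not the linked implementation.

    import torch

    # Hypothetical grouped recurrent update: one weight block per module,
    # applied to all modules at once with a batched matmul, so no
    # re-sparsification of a shared matrix is needed.
    num_modules, module_size, batch = 6, 32, 8
    weights = torch.randn(num_modules, module_size, module_size)  # per-module blocks
    hidden = torch.randn(batch, num_modules, module_size)

    # (num_modules, batch, module_size) x (num_modules, module_size, module_size)
    new_hidden = torch.bmm(hidden.transpose(0, 1), weights).transpose(0, 1)
    # new_hidden has shape (batch, num_modules, module_size)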
