
Speed difference between the two algorithms 'blocks' and 'lstm' #1

Closed
toco2270853 opened this issue Jan 23, 2021 · 1 comment

Comments

@toco2270853

Hello, thanks for the good code.
I find that when running the CIFAR10 experiments, there is about a 40x speed difference between 'blocks' and 'lstm'. Is this normal?

@sarthmit
Owner

sarthmit commented Jan 23, 2021

Thanks for your interest in the paper!

I am not sure of the exact margin of the slowdown, but RIMs and BRIMs are in general considerably slower than an LSTM. This is because of the following issues -

  1. We sparsify the RNN weight matrices after each gradient update to make sure that the modules stay independent
    self.model.blockify_params()
  2. There is a top-k selection step that is quite a speed bottleneck (see the sketch after this list)
    bottomk_indices = torch.topk(iatt[:,:,0], dim=1,
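For illustration, here is a minimal, hypothetical sketch of both steps in PyTorch: a block-diagonal masking pass standing in for what blockify_params is described as doing, and a top-k module selection built on torch.topk. The module count, module size, and the shape of the score tensor are assumptions made for the example, not the repository's actual code.

    import torch
    import torch.nn as nn

    def blockify(weight: torch.Tensor, num_modules: int, module_size: int) -> None:
        # Zero every cross-module block so each module's recurrent weights
        # only connect that module's own hidden units (hypothetical sketch
        # of the sparsification described in point 1).
        mask = torch.zeros_like(weight)
        for m in range(num_modules):
            s = m * module_size
            mask[s:s + module_size, s:s + module_size] = 1.0
        with torch.no_grad():
            weight.mul_(mask)

    def active_module_mask(scores: torch.Tensor, k: int) -> torch.Tensor:
        # scores: (batch, num_modules) relevance scores; keep only the
        # k highest-scoring modules active (hypothetical sketch of point 2).
        topk_indices = torch.topk(scores, k=k, dim=1).indices
        mask = torch.zeros_like(scores)
        mask.scatter_(1, topk_indices, 1.0)
        return mask  # 1.0 for active modules, 0.0 for inactive ones

    # Usage sketch: re-blockify after every optimizer step and pick
    # active modules at every timestep from the attention scores.
    num_modules, module_size, k = 6, 32, 4
    rnn_weight = nn.Parameter(torch.randn(num_modules * module_size,
                                          num_modules * module_size))
    blockify(rnn_weight, num_modules, module_size)
    scores = torch.rand(8, num_modules)  # batch of 8 score vectors
    mask = active_module_mask(scores, k)

Both the masking loop over modules and the per-timestep top-k selection add work that a plain LSTM never does, which is where the slowdown comes from.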

I do encourage you to check out the following implementation of RIMs (https://github.com/dido1998/Recurrent-Independent-Mechanisms), which attempts to speed up RIM computations by addressing the issue illustrated in (1) with a grouped RNN.
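That repository's code is not reproduced here, but the general idea of grouping the modules can be sketched as follows: instead of one large shared weight matrix that has to be re-sparsified after every update, each module keeps its own weight block and all modules are updated in a single batched matrix multiplication. The shapes and names below are assumptions for the sketch, not the linked implementation.

    import torch

    # Hypothetical grouped recurrent update: one weight block per module,
    # applied to all modules at once with a batched matmul, so no
    # re-sparsification of a shared matrix is needed.
    num_modules, module_size, batch = 6, 32, 8
    weights = torch.randn(num_modules, module_size, module_size)  # per-module blocks
    hidden = torch.randn(batch, num_modules, module_size)

    # (num_modules, batch, module_size) x (num_modules, module_size, module_size)
    new_hidden = torch.bmm(hidden.transpose(0, 1), weights).transpose(0, 1)
    # new_hidden has shape (batch, num_modules, module_size)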
