
Question #15

Closed
tchaton opened this issue Jan 5, 2019 · 5 comments

Comments

@tchaton

tchaton commented Jan 5, 2019

Hello guys,

Very nice piece of work.
I was wondering why you didn't use an einsum implementation of the bilinear attention in order to speed up training.

[image: the bilinear attention equation]

This equation is a perfect fit for it. You should see a significant speedup, and it would be nice for once to have highly optimized code available on GitHub.

Best,
T.C
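
For context, a minimal sketch of what an einsum-based low-rank bilinear attention map could look like in PyTorch; the tensor shapes, the projection matrices `U` and `V`, and the pooling vector `p` are illustrative assumptions (any nonlinearity on the projections is omitted for brevity), not the repository's actual code:

```python
import torch

# Assumed shapes: x (B, N_x, D_x), y (B, N_y, D_y),
# projections U (D_x, K), V (D_y, K), pooling vector p (K,).
def bilinear_attention_logits(x, y, U, V, p):
    x_proj = torch.einsum('bnd,dk->bnk', x, U)   # (B, N_x, K)
    y_proj = torch.einsum('bmd,dk->bmk', y, V)   # (B, N_y, K)
    # logits[b, i, j] = sum_k p[k] * x_proj[b, i, k] * y_proj[b, j, k]
    return torch.einsum('bik,bjk,k->bij', x_proj, y_proj, p)
```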

@tchaton
Author

tchaton commented Jan 5, 2019

Hello there,

I have implemented the module with einsum in order to optimize the code.
Could you tell me whether you think it is correct? The tensor shapes work out, but the output values are huge, which suggests something is wrong.

[image: einsum implementation of the module]

@tchaton
Author

tchaton commented Jan 5, 2019

Hello guys,

I have corrected the code a bit.

[image: corrected einsum implementation]

[image: corrected einsum implementation, continued]

However, I had to normalize by the max to keep the softmax from saturating.

Could you please correct this version, and also tell me how you solved the softmax issue?

Best,
T.C
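
On the saturation point: subtracting the per-row maximum before exponentiating leaves the softmax output mathematically unchanged while preventing overflow, and PyTorch's built-in softmax already applies this trick internally, so persistently huge values usually point to a scaling problem in the logits themselves. A minimal sketch of the max-subtraction, written here only to illustrate the trick:

```python
import torch

def stable_softmax(logits, dim=-1):
    # Subtracting the max along `dim` does not change the softmax result,
    # but keeps exp() from overflowing when the logits are large.
    shifted = logits - logits.max(dim=dim, keepdim=True).values
    return torch.softmax(shifted, dim=dim)
```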

@tchaton tchaton mentioned this issue Jan 9, 2019
@linjieli222

Softmax is applied to both dimensions in attention.py, not only the columns (dim = 1).
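
A minimal sketch of normalizing the attention map jointly over both dimensions, i.e. flattening the two attention axes before the softmax instead of normalizing each column independently; the `(B, N_x, N_y)` logit shape is an assumption for illustration:

```python
import torch

def joint_softmax(logits):
    # logits: (B, N_x, N_y); softmax over all N_x * N_y entries at once,
    # rather than over a single dimension.
    B, N_x, N_y = logits.shape
    attn = torch.softmax(logits.view(B, -1), dim=1)
    return attn.view(B, N_x, N_y)
```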

@tchaton
Author

tchaton commented Feb 4, 2019

Ok, I will change that :)

https://github.com/Zhaoyi-Yan/Shift-Net_pytorch/blob/sobel/models/bilinear_shift_net/innerBilinearShiftTripleModule.py#L68

I have implemented the module there. Could you please have a quick look and tell me whether you think it is correct?
Using einsum, it is very fast.

Best,
T.C

@jnhwkim
Owner

jnhwkim commented Mar 12, 2019

@tchaton in our implementation, the computational speed is not much different from the previous. I doubt that the other factors, missing nonlinear activations, regularization, and the other implementational details, may be related. However, we found that memory consumption is hugely reduced (around 30%) due to its efficiency in dealing with computational temporaries. Please refer to our implementation in #23. Thank you for the heads-up.
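
To illustrate where such a memory saving can come from (a sketch, not the authors' code): a naive broadcasted implementation materializes a `(B, N_x, N_y, K)` temporary before reducing over the rank dimension, whereas folding the pooling vector into one operand and letting einsum contract that dimension avoids the intermediate entirely:

```python
import torch

def logits_naive(x_proj, y_proj, p):
    # Materializes a (B, N_x, N_y, K) temporary before reducing over K.
    joint = x_proj.unsqueeze(2) * y_proj.unsqueeze(1)
    return (joint * p).sum(dim=-1)

def logits_fused(x_proj, y_proj, p):
    # Same result, but K is contracted directly; no (B, N_x, N_y, K) tensor.
    return torch.einsum('bik,bjk->bij', x_proj * p, y_proj)
```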

@jnhwkim jnhwkim closed this as completed Mar 12, 2019