The value in ComplexMultiply_backward function #6

Closed
kaiyuyue opened this issue Feb 11, 2018 · 8 comments
kaiyuyue commented Feb 11, 2018

Hi @gdlg, thanks for this nice work. I'm confused about the backward procedure of the complex multiplication, so I hope you can help me figure it out.

In the forward pass,

$$Z = XY = (R_x + i I_x)(R_y + i I_y) = (R_x R_y - I_x I_y) + i (I_x R_y + R_x I_y) = R_z + i I_z$$

In the backward pass, according to the chain rule, we would have

$$\frac{\partial L}{\partial X} = \frac{\partial L}{\partial Z} \cdot \frac{\partial Z}{\partial X} = \mathrm{grad}_Z \cdot Y = (R_{gz} + i I_{gz})(R_y + i I_y) = (R_{gz} R_y - I_{gz} I_y) + i (I_{gz} R_y + R_{gz} I_y)$$

So, why is this line implemented using the value = 1 for the real part and value = -1 for the imaginary part?

Is there something wrong in my reasoning? Thanks.

gdlg commented Feb 11, 2018

Hi @kaiyuyue, thank you for diving into the code and checking its correctness.

I believe that the code is actually correct. In your calculation, you are assuming that differentiating a holomorphic function works the same way as for a real-valued function; however, the correct approach is not as straightforward.

To get to the correct result, you need to either use Wirtinger derivatives (https://en.wikipedia.org/wiki/Wirtinger_derivatives#Chain_rule) or treat every function as a function of two real-valued variables instead of one complex variable. In the second case, the main difference is that you have to differentiate w.r.t. Re(x) and Im(x) separately.

Here is a not-so-short proof using Wirtinger derivatives.
If we write $L = f(z)$ and $z = g(x, y) = x \cdot y$:

The Wirtinger derivatives of $g$ w.r.t. $x$ are:

$$\frac{\partial g}{\partial x} = y, \qquad \frac{\partial g}{\partial \bar{x}} = 0$$

We can compute $\partial L / \partial \operatorname{Re}(x)$ and $\partial L / \partial \operatorname{Im}(x)$ via the Wirtinger chain rule:

$$\frac{\partial L}{\partial \operatorname{Re}(x)} = \frac{\partial L}{\partial x} + \frac{\partial L}{\partial \bar{x}} = \frac{\partial L}{\partial z}\, y + \frac{\partial L}{\partial \bar{z}}\, \bar{y}, \qquad \frac{\partial L}{\partial \operatorname{Im}(x)} = i \left( \frac{\partial L}{\partial x} - \frac{\partial L}{\partial \bar{x}} \right) = i \left( \frac{\partial L}{\partial z}\, y - \frac{\partial L}{\partial \bar{z}}\, \bar{y} \right)$$

Then, substituting $\frac{\partial L}{\partial z} = \frac{1}{2} \left( \frac{\partial L}{\partial \operatorname{Re}(z)} - i \frac{\partial L}{\partial \operatorname{Im}(z)} \right)$ (and its conjugate for $\frac{\partial L}{\partial \bar{z}}$, since $L$ is real-valued) and regrouping:

$$\frac{\partial L}{\partial \operatorname{Re}(x)} = \operatorname{Re}(\mathrm{grad}_Z) \operatorname{Re}(y) + \operatorname{Im}(\mathrm{grad}_Z) \operatorname{Im}(y)$$

Thus `grad_X_re = torch.addcmul(grad_Z_re * Y_re, 1, grad_Z_im, Y_im)`

Similarly:

$$\frac{\partial L}{\partial \operatorname{Im}(x)} = \operatorname{Im}(\mathrm{grad}_Z) \operatorname{Re}(y) - \operatorname{Re}(\mathrm{grad}_Z) \operatorname{Im}(y)$$

Thus `grad_X_im = torch.addcmul(grad_Z_im * Y_re, -1, grad_Z_re, Y_im)`

Assuming that the calculus above is correct, the code should be as well. I will add a gradcheck test to prove the correctness of the multiplication.
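
For illustration, such a gradcheck might look like the following sketch (a hypothetical standalone ComplexMul written with the torch.autograd.Function API, not the repository's ComplexMultiply code):

```python
import torch
from torch.autograd import Function, gradcheck

class ComplexMul(Function):
    """Element-wise complex product on (re, im) tensor pairs."""

    @staticmethod
    def forward(ctx, x_re, x_im, y_re, y_im):
        ctx.save_for_backward(x_re, x_im, y_re, y_im)
        return x_re * y_re - x_im * y_im, x_im * y_re + x_re * y_im

    @staticmethod
    def backward(ctx, grad_z_re, grad_z_im):
        x_re, x_im, y_re, y_im = ctx.saved_tensors
        # The formulas derived above:
        # grad_X_re = grad_Z_re * Y_re + grad_Z_im * Y_im
        # grad_X_im = grad_Z_im * Y_re - grad_Z_re * Y_im
        grad_x_re = grad_z_re * y_re + grad_z_im * y_im
        grad_x_im = grad_z_im * y_re - grad_z_re * y_im
        # The same formulas with the roles of X and Y swapped.
        grad_y_re = grad_z_re * x_re + grad_z_im * x_im
        grad_y_im = grad_z_im * x_re - grad_z_re * x_im
        return grad_x_re, grad_x_im, grad_y_re, grad_y_im

# gradcheck compares the analytic backward against finite differences
# (double precision is required for the numerical comparison).
inputs = tuple(torch.randn(8, dtype=torch.double, requires_grad=True)
               for _ in range(4))
print(gradcheck(ComplexMul.apply, inputs))  # prints True
```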

This backward function is actually not used in the computation of the compact bilinear pooling gradient, because I tried to rearrange the computation of the multiplication and FFT gradients to decrease the memory footprint; however, the formula is the same.

kaiyuyue commented

Very enlightening. Thanks for correcting my errors.

kaiyuyue reopened this Feb 12, 2018

kaiyuyue commented Feb 12, 2018

Does "this backward function is actually not used" mean that the memory usage of that part is the same as in the following approach?

```python
import pytorch_fft.fft.autograd as afft

sketch_x = CountSketchFn_forward(h1, s1, output_size, x)
sketch_y = CountSketchFn_forward(h2, s2, output_size, y)

x_re, x_im = afft.Fft()(sketch_x, sketch_x.new(*sketch_x.size()).zero_())
y_re, y_im = afft.Fft()(sketch_y, sketch_y.new(*sketch_y.size()).zero_())

prod_re, prod_im = x_re.mul(y_re), x_im.mul(y_im)
```


gdlg commented Feb 12, 2018

I am not entirely sure what you mean.
In the current implementation, I don't use the backward function ComplexMultiply_backward; however, I have interleaved the FFTs and part of the complex multiplication here and there to compute grad_x and then grad_y, to avoid having a bunch of temporaries.
I have found that it is difficult to really benchmark memory use in PyTorch, so I am not sure how much memory was saved.

Regarding your snippet:

```python
x_re, x_im = afft.Fft()(sketch_x, sketch_x.new(*sketch_x.size()).zero_())
```

It is more efficient to use Rfft, which exploits the symmetry of the FFT of a real-valued function and returns an array half the size of the output of Fft. This saves both memory and computation.
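
For example (a sketch assuming the pytorch_fft autograd API, where Rfft()(x) takes a single real-valued tensor and returns the real and imaginary halves of its spectrum):

```python
import pytorch_fft.fft.autograd as afft

# No explicit zero imaginary input is needed for a real-to-complex FFT,
# and the output is about half the size of Fft's, since the spectrum of
# a real signal satisfies X[k] = conj(X[N - k]).
x_re, x_im = afft.Rfft()(sketch_x)
y_re, y_im = afft.Rfft()(sketch_y)
```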

```python
prod_re, prod_im = x_re.mul(y_re), x_im.mul(y_im)
```

This is not the true complex multiplication, but I am not really sure that it matters in the implementation of compact bilinear pooling.
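
For reference, the true element-wise complex product of the two spectra, in the same notation as the snippet above, would be:

```python
# (a + ib)(c + id) = (ac - bd) + i(ad + bc), applied element-wise
prod_re = x_re * y_re - x_im * y_im
prod_im = x_re * y_im + x_im * y_re
```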


kaiyuyue commented Feb 22, 2018

[Screenshot: the Tensor Sketch (TS) algorithm from the Compact Bilinear Pooling paper]

Here is the original TS algorithm as given in the Compact Bilinear Pooling paper. The last step says to do an element-wise multiplication between the two FFT outputs. Does that mean the true complex multiplication, or the element-wise products of the real and imaginary parts separately, as in `prod_re, prod_im = x_re.mul(y_re), x_im.mul(y_im)`?


gdlg commented Feb 22, 2018

Thanks @kaiyuyue. I didn't spot that in the paper. I will correct the implementation to match the original algorithm.


gdlg commented Feb 22, 2018

Actually, I think that the implementation is correct.

The "where ◦ denotes element-wise multiplication" in the paper refers to the fact that the multiplication should be element-wise with respect to the arrays; however, each element-wise product is still a complex multiplication.

This multiplication stems from Section 3.2: "Moreover, one can show that Ψ(x ⊗ y, h, s) = Ψ(x, h, s) ∗ Ψ(y, h, s), i.e. the count sketch of two vectors' outer product is the convolution of the individual count sketches."

The Caffe implementation by Gao et al. is also using a complex multiplication.
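
As a quick sanity check of that identity (a NumPy sketch, independent of the repository code): the element-wise complex product of two FFTs equals the FFT of the circular convolution, which is why the "element-wise multiplication" in the TS algorithm has to be a complex multiplication.

```python
import numpy as np

n = 8
rng = np.random.default_rng(0)
a, b = rng.standard_normal(n), rng.standard_normal(n)

# Circular convolution computed directly from its definition...
direct = np.array([sum(a[j] * b[(k - j) % n] for j in range(n))
                   for k in range(n)])
# ...and via the convolution theorem: a complex product in the FFT domain.
via_fft = np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

print(np.allclose(direct, via_fft))  # True
```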

kaiyuyue commented

Oh! My bad. Thanks.
