The value in ComplexMultiply_backward function #6

Closed
kaiyuyue opened this issue Feb 11, 2018 · 8 comments
kaiyuyue commented Feb 11, 2018

Hi @gdlg, thanks for this nice work. I'm confused about the backward procedure of the complex multiplication, so I hope you can help me figure it out.

In the forward pass,

$$Z = XY = (R_x + i I_x)(R_y + i I_y) = (R_x R_y - I_x I_y) + i (I_x R_y + R_x I_y) = R_z + i I_z$$

In the backward pass, according to the chain rule, we would have

$$\frac{\partial L}{\partial X} = \frac{\partial L}{\partial Z} \cdot \frac{\partial Z}{\partial X} = \mathrm{grad}_Z \cdot Y = (R_{gz} + i I_{gz})(R_y + i I_y) = (R_{gz} R_y - I_{gz} I_y) + i (I_{gz} R_y + R_{gz} I_y)$$

So, why is this line implemented using the value = 1 for the real part and value = -1 for the imaginary part?

Is there something wrong in my reasoning? Thanks.

gdlg commented Feb 11, 2018

Hi @kaiyuyue, thank you for diving into the code and checking its correctness.

I believe that the code is actually correct. In your calculation, you are assuming that differentiating a holomorphic function works the same way as for a real-valued function; however, the correct approach is not as straightforward.

To get to the correct result, you need to either use Wirtinger derivatives (https://en.wikipedia.org/wiki/Wirtinger_derivatives#Chain_rule) or treat every function as a function of two real-valued variables instead of one complex variable. In the second case, the main difference is that you have to differentiate w.r.t. Re(x) and Im(x) separately.

Here is a not-so-short proof using Wirtinger derivatives.
If we write $L = f(z)$ and $z = g(x, y) = x \cdot y$:

The Wirtinger derivatives of $g$ w.r.t. $x$ are:

$$\frac{\partial g}{\partial x} = y, \qquad \frac{\partial g}{\partial \bar{x}} = 0$$

We can compute $\partial L / \partial \operatorname{Re}(x)$ and $\partial L / \partial \operatorname{Im}(x)$ via the Wirtinger chain rule:

$$\frac{\partial L}{\partial \operatorname{Re}(x)} = \frac{\partial L}{\partial x} + \frac{\partial L}{\partial \bar{x}} = \frac{\partial L}{\partial z}\, y + \frac{\partial L}{\partial \bar{z}}\, \bar{y}, \qquad \frac{\partial L}{\partial \operatorname{Im}(x)} = i \left( \frac{\partial L}{\partial x} - \frac{\partial L}{\partial \bar{x}} \right) = i \left( \frac{\partial L}{\partial z}\, y - \frac{\partial L}{\partial \bar{z}}\, \bar{y} \right)$$

Then, substituting $\frac{\partial L}{\partial z} = \frac{1}{2} \left( \frac{\partial L}{\partial \operatorname{Re}(z)} - i \frac{\partial L}{\partial \operatorname{Im}(z)} \right)$ (and its conjugate for $\frac{\partial L}{\partial \bar{z}}$, since $L$ is real-valued) and regrouping:

$$\frac{\partial L}{\partial \operatorname{Re}(x)} = \operatorname{Re}(\mathrm{grad}_Z) \operatorname{Re}(y) + \operatorname{Im}(\mathrm{grad}_Z) \operatorname{Im}(y)$$

Thus `grad_X_re = torch.addcmul(grad_Z_re * Y_re, 1, grad_Z_im, Y_im)`

Similarly:

$$\frac{\partial L}{\partial \operatorname{Im}(x)} = \operatorname{Im}(\mathrm{grad}_Z) \operatorname{Re}(y) - \operatorname{Re}(\mathrm{grad}_Z) \operatorname{Im}(y)$$

Thus `grad_X_im = torch.addcmul(grad_Z_im * Y_re, -1, grad_Z_re, Y_im)`

Assuming that the calculus above is correct, the code should be as well. I will add a gradcheck test to prove the correctness of the multiplication.
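
For illustration, such a gradcheck might look like the following sketch (a hypothetical standalone ComplexMul written with the torch.autograd.Function API, not the repository's ComplexMultiply code):

```python
import torch
from torch.autograd import Function, gradcheck

class ComplexMul(Function):
    """Element-wise complex product on (re, im) tensor pairs."""

    @staticmethod
    def forward(ctx, x_re, x_im, y_re, y_im):
        ctx.save_for_backward(x_re, x_im, y_re, y_im)
        return x_re * y_re - x_im * y_im, x_im * y_re + x_re * y_im

    @staticmethod
    def backward(ctx, grad_z_re, grad_z_im):
        x_re, x_im, y_re, y_im = ctx.saved_tensors
        # The formulas derived above:
        # grad_X_re = grad_Z_re * Y_re + grad_Z_im * Y_im
        # grad_X_im = grad_Z_im * Y_re - grad_Z_re * Y_im
        grad_x_re = grad_z_re * y_re + grad_z_im * y_im
        grad_x_im = grad_z_im * y_re - grad_z_re * y_im
        # The same formulas with the roles of X and Y swapped.
        grad_y_re = grad_z_re * x_re + grad_z_im * x_im
        grad_y_im = grad_z_im * x_re - grad_z_re * x_im
        return grad_x_re, grad_x_im, grad_y_re, grad_y_im

# gradcheck compares the analytic backward against finite differences
# (double precision is required for the numerical comparison).
inputs = tuple(torch.randn(8, dtype=torch.double, requires_grad=True)
               for _ in range(4))
print(gradcheck(ComplexMul.apply, inputs))  # prints True
```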

This backward function is actually not used in the computation of the compact bilinear pooling gradient, because I tried to rearrange the computation of the multiplication and FFT gradients to decrease the memory footprint; however, the formula is the same.

kaiyuyue commented

Very enlightening. Thanks for correcting my errors.

kaiyuyue reopened this Feb 12, 2018

kaiyuyue commented Feb 12, 2018

Does "this backward function is actually not used" mean that the memory usage of that part is the same as in the following approach?

```python
import pytorch_fft.fft.autograd as afft

sketch_x = CountSketchFn_forward(h1, s1, output_size, x)
sketch_y = CountSketchFn_forward(h2, s2, output_size, y)

x_re, x_im = afft.Fft()(sketch_x, sketch_x.new(*sketch_x.size()).zero_())
y_re, y_im = afft.Fft()(sketch_y, sketch_y.new(*sketch_y.size()).zero_())

prod_re, prod_im = x_re.mul(y_re), x_im.mul(y_im)
```


gdlg commented Feb 12, 2018

I am not entirely sure what you mean.
In the current implementation, I don't use the backward function ComplexMultiply_backward; however, I have interleaved the FFTs and part of the complex multiplication here and there to compute grad_x and then grad_y, to avoid having a bunch of temporaries.
I have found that it is difficult to really benchmark memory use in PyTorch, so I am not sure how much memory was saved.

Regarding your snippet:

```python
x_re, x_im = afft.Fft()(sketch_x, sketch_x.new(*sketch_x.size()).zero_())
```

It is more efficient to use Rfft, which exploits the symmetry of the FFT of a real-valued function and returns an array half the size of the output of Fft. This saves both memory and computation.
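
For example (a sketch assuming the pytorch_fft autograd API, where Rfft()(x) takes a single real-valued tensor and returns the real and imaginary halves of its spectrum):

```python
import pytorch_fft.fft.autograd as afft

# No explicit zero imaginary input is needed for a real-to-complex FFT,
# and the output is about half the size of Fft's, since the spectrum of
# a real signal satisfies X[k] = conj(X[N - k]).
x_re, x_im = afft.Rfft()(sketch_x)
y_re, y_im = afft.Rfft()(sketch_y)
```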

```python
prod_re, prod_im = x_re.mul(y_re), x_im.mul(y_im)
```

This is not the true complex multiplication, but I am not really sure that it matters in the implementation of compact bilinear pooling.
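
For reference, the true element-wise complex product of the two spectra, in the same notation as the snippet above, would be:

```python
# (a + ib)(c + id) = (ac - bd) + i(ad + bc), applied element-wise
prod_re = x_re * y_re - x_im * y_im
prod_im = x_re * y_im + x_im * y_re
```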


kaiyuyue commented Feb 22, 2018

[Screenshot: the Tensor Sketch (TS) algorithm from the Compact Bilinear Pooling paper]

Here is the original TS algorithm as given in the Compact Bilinear Pooling paper. The last step says to do an element-wise multiplication between the two FFT outputs. Does that mean the true complex multiplication, or the element-wise products of the real and imaginary parts separately, as in `prod_re, prod_im = x_re.mul(y_re), x_im.mul(y_im)`?


gdlg commented Feb 22, 2018

Thanks @kaiyuyue. I didn't spot that in the paper. I will correct the implementation to match the original algorithm.


gdlg commented Feb 22, 2018

Actually, I think that the implementation is correct.

The "where ◦ denotes element-wise multiplication" in the paper refers to the fact that the multiplication should be element-wise with respect to the arrays; however, each element-wise product is still a complex multiplication.

This multiplication stems from Section 3.2: "Moreover, one can show that Ψ(x ⊗ y, h, s) = Ψ(x, h, s) ∗ Ψ(y, h, s), i.e. the count sketch of two vectors' outer product is the convolution of the individual count sketches."

The Caffe implementation by Gao et al. is also using a complex multiplication.
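
As a quick sanity check of that identity (a NumPy sketch, independent of the repository code): the element-wise complex product of two FFTs equals the FFT of the circular convolution, which is why the "element-wise multiplication" in the TS algorithm has to be a complex multiplication.

```python
import numpy as np

n = 8
rng = np.random.default_rng(0)
a, b = rng.standard_normal(n), rng.standard_normal(n)

# Circular convolution computed directly from its definition...
direct = np.array([sum(a[j] * b[(k - j) % n] for j in range(n))
                   for k in range(n)])
# ...and via the convolution theorem: a complex product in the FFT domain.
via_fft = np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

print(np.allclose(direct, via_fft))  # True
```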

kaiyuyue commented

Oh! My bad. Thanks.
