
Complex Backprop and Learning speed #2

Closed
pfeatherstone opened this issue Apr 27, 2020 · 7 comments
Labels
question Further information is requested

Comments

@pfeatherstone

Question: In the real domain, you only require differentiability for back-propagation to work. In the complex domain, you need holomorphy. Now, pytorch doesn't check this because it doesn't natively support complex numbers. Do you think back-propagation could run into learning/training problems if some of the functions don't satisfy the Cauchy-Riemann equations?

Question: In complex analysis, a function f(z) has two derivatives: df/dz and df/dz*. If the forward passes are implemented correctly, as you have done, is back-propagation well defined? Specifically, are both derivatives back-propagated?

@ivannz ivannz added the question Further information is requested label May 1, 2020
@ivannz
Owner

ivannz commented May 1, 2020

Differentiability in the complex domain requires that a C to C function satisfy the Cauchy-Riemann conditions, which imply that any holomorphic C to R function is constant: if f = u + j v is holomorphic with v identically zero, the Cauchy-Riemann equations force both partial derivatives of u to vanish. Therefore there cannot be an end-to-end holomorphic learning problem, since we ultimately use some sort of non-constant real-valued fitness function L, e.g. empirical/theoretical risk or negative log-likelihood, which makes L o f non-holomorphic.

To be able to learn complex-valued networks we have to forgo the C-differentiability requirement. This can be done with C-R (Wirtinger) calculus, which extends holomorphic calculus to non-holomorphic functions by treating a C-domain function as a function on R^2 with independent z and conj(z); it still gives the correct derivatives in the holomorphic case. Since CR calculus treats a complex parameter and its conjugate as independent variables, it is possible to equivalently parameterize the function f(z) = F(z, conj(z)) as f(z) = f(a + j b) = H(a, b) -- an R^2 to C function. Thus complex-valued arithmetic is emulated within a double-real network through a skew-symmetry constraint on the linear operations.

Via CR calculus we get df/dz = 1/2 (dH/da - j dH/db) and df/d conj(z) = 1/2 (dH/da + j dH/db) for f from C to C. The gradient of a real-valued loss f w.r.t. z, i.e. the direction of maximum increase, is df/d conj(z), not df/dz. For an R-valued loss this is consistent: conj(df/dz) = d conj(f)/d conj(z) = df/d conj(z), which is effectively what pytorch's autodiff computes. See Hunger (2007).
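To make this concrete, here is a minimal sketch in plain pytorch (not cplxmodule itself, and it assumes a recent pytorch with complex tensor support, used only for the final cross-check): the product w * z is emulated with four real multiplications in the skew-symmetric double-real form, a real-valued loss is back-propagated through the real and imaginary parts, and the resulting (dL/da, dL/db) pair is assembled into the Wirtinger gradient dL/d conj(z).

```python
import torch

torch.manual_seed(0)

# complex parameter z = a + j b, stored as two independent real tensors
a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

# a fixed complex weight w = u + j v
u, v = torch.randn(3), torch.randn(3)

# the product w * z emulated with 4 real multiplications -- the
# skew-symmetric [[u, -v], [v, u]] structure of a complex linear operation
re = u * a - v * b
im = u * b + v * a

# a non-constant real-valued loss L : C -> R (hence non-holomorphic)
loss = (re ** 2 + im ** 2).sum()
loss.backward()

# assemble the Wirtinger gradient dL/d conj(z) = 1/2 (dL/da + j dL/db);
# for a real-valued L this equals conj(dL/dz), the direction of steepest ascent
grad_conj_z = 0.5 * (a.grad + 1j * b.grad)

# cross-check against the closed form: L = |w z|^2  =>  dL/d conj(z) = conj(w) w z
z = (a + 1j * b).detach()
w = u + 1j * v
print(torch.allclose(grad_conj_z, w.conj() * w * z))  # True up to float error
```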

Other references include Trabelsi et al. (2018), Zhang et al. (2014), Adali et al. (2011), Benvenuto and Piazza (1992), appendix D of Nazarov and Burnaev (2020) or a discussion here.

@ivannz ivannz pinned this issue May 2, 2020
@ivannz ivannz unpinned this issue May 2, 2020
@pfeatherstone
Author

@ivannz Thank you for the detailed answer.

@pfeatherstone
Author

Might need to brush up on some complex analysis

@pfeatherstone
Author

@ivannz I thought I might carry on asking questions here rather than opening separate issues, hence why I've edited the title.

I've ported a mobilenetv2 network from torchvision to use modules from this repo, namely CplxConv1d, CplxBatchNorm1d and CplxAdaptiveModReLU. I'm training this network to classify some complex-valued 1D data, but it is very, very slow to learn. Have you noticed the same thing when applying these modules to your datasets? I've only tried this in one domain, so it's hard to tell whether it's the nature of the modules or the problem domain.
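For reference, the blocks in the ported network look roughly like this -- a sketch only, with import paths and constructor signatures assumed to mirror torch.nn and the cplxmodule README rather than copied from my actual code:

```python
import torch.nn as nn

# import paths assumed from the cplxmodule README -- please double-check
from cplxmodule import Cplx
from cplxmodule.nn import CplxConv1d, CplxBatchNorm1d, CplxAdaptiveModReLU


class CplxConvBlock(nn.Module):
    """conv -> batch-norm -> modReLU on complex-valued 1D features."""

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.conv = CplxConv1d(in_channels, out_channels, kernel_size,
                               padding=kernel_size // 2)
        self.bn = CplxBatchNorm1d(out_channels)
        self.act = CplxAdaptiveModReLU(out_channels)

    def forward(self, z):
        # z is a Cplx holding real/imag tensors of shape (batch, channels, time)
        return self.act(self.bn(self.conv(z)))


# usage: wrap the real and imaginary parts of the signal into a Cplx container
# x = Cplx(x_re, x_im)            # two float tensors, shape (batch, channels, time)
# y = CplxConvBlock(2, 16, 5)(x)
```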

@pfeatherstone pfeatherstone reopened this May 12, 2020
@pfeatherstone pfeatherstone changed the title Holomorphism and back-propagation Learning speed May 12, 2020
@ivannz
Owner

ivannz commented May 13, 2020

I have used this module for both classification and regression tasks:

  • purely complex data in a regression task on high-frequency radio signals
  • complex Fourier transforms of waveform data for annotation (multi-output classification)
  • image classification experiments on MNIST-like and CIFAR10 datasets

If by learning speed you mean overall arithmetic complexity, then yes -- a complex network uses 4 times as many multiplications as a real-valued network with the same number of intermediate features, i.e. the same number of linear outputs or convolutional channels, not the same overall parameter count. Even if one compares a complex network to a real one with the same number of floating-point parameters, the complex network is still slower, though not dramatically. Please also bear in mind the discussion in issue #1 -- cplxmodule is a pure python extension for pytorch and, although it offloads all computation to torch, there is still some, albeit insignificant, overhead.
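A back-of-the-envelope multiplication count behind these two comparisons (dense linear layers only, ignoring additions, biases and everything else):

```python
import math

def real_linear_mults(n_in, n_out):
    # one real multiplication per real weight
    return n_in * n_out

def complex_linear_mults(n_in, n_out):
    # (a + jb)(c + jd) = (ac - bd) + j(ad + bc): 4 real multiplications per complex weight
    return 4 * n_in * n_out

# same number of features (channels): 4x the multiplications
print(complex_linear_mults(128, 64) / real_linear_mults(128, 64))  # 4.0

# same number of floating-point parameters: each complex weight stores 2 floats,
# so the real layer gets sqrt(2)-wider inputs and outputs, and the ratio drops to ~2x
s = math.sqrt(2.0)
print(complex_linear_mults(128, 64) / real_linear_mults(int(128 * s), int(64 * s)))  # ~2.0
```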

If by learning speed you mean the convergence rate of gradient descent or the rate at which the training loss decreases, I haven't measured it per se, but I also haven't noticed anything suggesting that complex nets are slower to learn. The test performance depends on the dataset, but in my experience complex-valued networks seldom outperformed real-valued ones and were mostly on par. See Nazarov and Burnaev (2020) and the references therein.

PS: I'd rather you created another issue to keep unrelated discussions separate.

@pfeatherstone
Author

@ivannz Again, thank you for your response; I will make sure to open new issues for separate discussions.
Regarding classification, did you use the RadioML dataset? If so, I found that the modulation labels are wrong. If not, did you use another publicly available dataset?

@ivannz
Owner

ivannz commented May 13, 2020

I used a private dataset for a digital predistortion task, i.e. approximating a perturbation of the input signal so that a power amplifier operates in its linear gain regime.

Repository owner locked as resolved and limited conversation to collaborators May 25, 2020
@ivannz ivannz closed this as completed May 25, 2020
@ivannz ivannz pinned this issue Jun 8, 2020
@ivannz ivannz changed the title Learning speed Complex Backprop and Learning speed Jun 8, 2020