ENH: statistics and distribution for complex valued random variables #9064

Open
josef-pkt opened this issue Nov 15, 2023 · 2 comments

josef-pkt commented Nov 15, 2023

related to #3528 but more general or different focus

We could add statistics and models for complex random variables with almost the usual coverage (mean, cov, params, ...).

I did not find much on standard statistics, standard errors, cov_params, or asymptotic distributions for hypothesis tests.
However, complex normal distributions are defined through the bivariate representation of their real and imaginary parts.
So we can use all the standard statistics and hypothesis tests for the real and imaginary parts.

If the random variable is second order circular (alias proper), then this imposes restrictions on the covariance of the real and imaginary parts. That covariance has to be constructed specifically for those proper cases. (This case can also be computed directly with "standard" complex computation.)
If the complex random variable is improper, then it corresponds to a bivariate normal distribution without restrictions on cov (except positive semidefiniteness).
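
A minimal sketch of that correspondence, assuming observations in rows and the conventions cov = E[z z^H], pcov = E[z z^T]. The helper names `cov_pcov` and `real_cov_from_complex` are made up for illustration; they are not existing statsmodels functions.

```python
import numpy as np


def cov_pcov(z):
    """Sample cov E[z z^H] and pseudo-cov E[z z^T]; rows are observations."""
    zc = z - z.mean(axis=0)
    nobs = z.shape[0]
    cov = zc.T @ zc.conj() / nobs
    pcov = zc.T @ zc / nobs
    return cov, pcov


def real_cov_from_complex(cov, pcov):
    """Covariance of the stacked real vector [real, imag] implied by (cov, pcov)."""
    c_xx = 0.5 * (cov + pcov).real   # cov of real parts
    c_yy = 0.5 * (cov - pcov).real   # cov of imaginary parts
    c_yx = 0.5 * (cov + pcov).imag   # cross-cov E[y x']
    return np.block([[c_xx, c_yx.T], [c_yx, c_yy]])


rng = np.random.default_rng(12345)
nobs = 5000
# improper sample: real and imaginary parts differ in scale and are correlated
x = rng.normal(size=(nobs, 2))
y = 0.5 * x + 0.3 * rng.normal(size=(nobs, 2))
z = x + 1j * y

cov, pcov = cov_pcov(z)
rcov = real_cov_from_complex(cov, pcov)
# direct estimate from the bivariate real representation
rcov_direct = np.cov(np.column_stack([z.real, z.imag]), rowvar=False, ddof=0)
print(np.allclose(rcov, rcov_direct))
```

In the proper case pcov is zero, so the real covariance is restricted to c_xx = c_yy = cov.real / 2 and c_yx = -c_yx.T = cov.imag / 2. In the improper case all blocks are free.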

what we need:

  • functions that work directly for proper complex random variables
  • functions, classes for bivariate random variables, e.g. MultivariateOLS
  • inference for both single hypotheses (on the real or imaginary part of a parameter), e.g. use cov_params for [p.real, p.imag], and joint hypotheses for a complex parameter (i.e. using the joint distribution of its real and imaginary parts)
  • wrapper function/class
    • internally convert to whichever representation is convenient
    • double results for both complex parameters and bivariate representation of parameters, methods to specify hypothesis on complex parameter and methods for hypothesis on [real, imag] parts of complex parameters.
  • tools:
    • conversion between different representations (complex, bivariate-real, extended complex)
    • descriptive statistics, e.g. cov, pcov
    • specific functions, e.g. test for second order circularity, Taylor estimate of complex scatter
  • specific models, OLSComplexCircular, OLSComplexImproper and MultivariateLS (multivariate complex endog) versions
    • models corresponding to nonlinear LS might be more frequently used than linear models
  • multivariate statistics (maybe eventually)
    • independent component analysis ICA seems to be popular
    • CCA, e.g. used for canonical correlation for x and x.conj()
    • hypothesis tests for patterned cov, (and pcov ?)
    • ... ?
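
A rough sketch of the "internally convert" idea for a linear model, not a proposed API. It assumes fixed regressors and errors that are i.i.d. across observations with an unrestricted 2x2 covariance of (real, imag) parts. All names (`A1`, `A2`, `cov_r`, `wald_complex`) are hypothetical. The joint test of one complex parameter is a 2 degrees of freedom Wald test on [real, imag]; the chi2(2) survival function is exp(-stat / 2), so scipy is not needed here.

```python
import numpy as np

rng = np.random.default_rng(0)
nobs, k = 2000, 2
X = rng.normal(size=(nobs, k)) + 1j * rng.normal(size=(nobs, k))
beta = np.array([1.0 - 0.5j, 0.0])
# improper noise: real and imaginary parts have different variances
e = rng.normal(size=nobs) + 0.3j * rng.normal(size=nobs)
y = X @ beta + e

# least squares directly on the complex data
b_c = np.linalg.lstsq(X, y, rcond=None)[0]

# equivalent real system for the stacked parameter [b.real, b.imag]
A1 = np.hstack([X.real, -X.imag])   # rows that explain y.real
A2 = np.hstack([X.imag, X.real])    # rows that explain y.imag
A = np.vstack([A1, A2])
v = np.concatenate([y.real, y.imag])
b_r = np.linalg.lstsq(A, v, rcond=None)[0]

# 2x2 covariance of (real, imag) residual parts, no circularity imposed
resid = y - X @ b_c
R = np.column_stack([resid.real, resid.imag])
S = R.T @ R / nobs

# sandwich cov_params for [b.real, b.imag]
bread = np.linalg.inv(A.T @ A)
meat = (S[0, 0] * A1.T @ A1 + S[1, 1] * A2.T @ A2
        + S[0, 1] * (A1.T @ A2 + A2.T @ A1))
cov_r = bread @ meat @ bread


def wald_complex(j):
    """Joint Wald test that complex parameter j is zero (2 restrictions)."""
    idx = [j, k + j]
    d = b_r[idx]
    stat = d @ np.linalg.solve(cov_r[np.ix_(idx, idx)], d)
    pvalue = np.exp(-stat / 2)   # survival function of chi2 with 2 df
    return stat, pvalue


print(np.allclose(b_c, b_r[:k] + 1j * b_r[k:]))
print(wald_complex(0), wald_complex(1))
```

The complex and the stacked real least squares estimates coincide, so the wrapper is free to pick either representation for estimation. Only the covariance of the estimate depends on whether circularity is imposed.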

definitions

  • circular: distribution is rotation invariant, i.e. assumption on full distribution, all moments (original definition)
  • second order circular (proper): assumption on second moments (pseudo-cov is zero), no assumption on higher moments
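
A small numerical illustration of the second order part for a scalar variable, as a sketch under my own naming (`second_moments`, `ratios`). Rotating z by exp(i theta) leaves the variance unchanged but multiplies the pseudo-variance by exp(2i theta), so rotation invariance of the second moments requires pseudo-variance zero. The ratio |pvar / var| lies in [0, 1] and is zero for a proper variable; I think this is what Ollila 2008 calls the circularity quotient, but I have not checked the exact definition.

```python
import numpy as np

rng = np.random.default_rng(1)
nobs = 20000
# proper: i.i.d. real and imaginary parts with equal variance
z_proper = (rng.normal(size=nobs) + 1j * rng.normal(size=nobs)) / np.sqrt(2)
# improper: imaginary part has a smaller variance
z_improper = rng.normal(size=nobs) + 0.3j * rng.normal(size=nobs)


def second_moments(z):
    """Sample variance E|z|^2 and pseudo-variance E[z^2] after centering."""
    zc = z - z.mean()
    return np.mean(np.abs(zc) ** 2), np.mean(zc ** 2)


theta = 0.7
ratios = {}
for name, z in [("proper", z_proper), ("improper", z_improper)]:
    var, pvar = second_moments(z)
    var_rot, pvar_rot = second_moments(np.exp(1j * theta) * z)
    # variance is rotation invariant, pseudo-variance picks up exp(2j * theta)
    assert np.isclose(var, var_rot)
    assert np.isclose(pvar_rot, np.exp(2j * theta) * pvar)
    ratios[name] = abs(pvar / var)

print(ratios)
```

This says nothing about higher moments, so it only checks second order circularity, not circularity of the full distribution.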

disadvantage
I guess this will not have a large user base. Signal processing literature does not have much traditional statistics (including inference on mean parameters), and econometrics in signal processing is more time series analysis and forecasting.

main references (not complete)

Adali, Tülay, Peter J. Schreier, and Louis L. Scharf. “Complex-Valued Signal Processing: The Proper Way to Deal With Impropriety.” IEEE Transactions on Signal Processing 59, no. 11 (November 2011): 5101–25. https://doi.org/10.1109/TSP.2011.2162954.

Ollila, E., and V. Koivunen. “Generalized Complex Elliptical Distributions.” In Sensor Array and Multichannel Signal Processing Workshop Proceedings, 2004, 460–64, 2004. https://doi.org/10.1109/SAM.2004.1502990.

Ollila, Esa. “On the Circularity of a Complex Random Variable.” IEEE Signal Processing Letters 15 (2008): 841–44. https://doi.org/10.1109/LSP.2008.2005050.

Ollila, Esa, Jan Eriksson, and Visa Koivunen. “Complex Elliptically Symmetric Random Variables—Generation, Characterization, and Circularity Tests.” IEEE Transactions on Signal Processing 59, no. 1 (January 2011): 58–69. https://doi.org/10.1109/TSP.2010.2083655.

Ollila, Esa, Visa Koivunen, and H. Vincent Poor. “Complex-Valued Signal Processing — Essential Models, Tools and Statistics.” In 2011 Information Theory and Applications Workshop, 1–10, 2011. https://doi.org/10.1109/ITA.2011.5743596.

Ollila, Esa, David E. Tyler, Visa Koivunen, and H. Vincent Poor. “Complex Elliptically Symmetric Distributions: Survey, New Results and Applications.” IEEE Transactions on Signal Processing 60, no. 11 (November 2012): 5597–5625. https://doi.org/10.1109/TSP.2012.2212433.

Picinbono, B. “On Circularity.” IEEE Transactions on Signal Processing 42, no. 12 (December 1994): 3473–82. https://doi.org/10.1109/78.340781.

———. “Second-Order Complex Random Vectors and Normal Distributions.” IEEE Transactions on Signal Processing 44, no. 10 (October 1996): 2637–40. https://doi.org/10.1109/78.539051.

Picinbono, B., and P. Bondon. “Second-Order Statistics of Complex Signals.” IEEE Transactions on Signal Processing 45, no. 2 (February 1997): 411–20. https://doi.org/10.1109/78.554305.


josef-pkt commented Nov 15, 2023

finally, I found some literature on cov_params (I have not looked at details yet.)

The signal processing literature treats cov_params under the term "Cramér-Rao Bound",
i.e. cov_params for MLE (Fisher information matrix, expected OPG).

Fortunati et al 2016 includes the sandwich form for misspecified likelihood and M-estimators, but only for proper complex r.v.
Ollila et al 2008 includes the CRB for improper complex r.v.
Fortunati 2017 also includes misspecified non-circular/improper models. Section 5 has an example of misspecified circularity when estimating a simple (i.e. not widely linear) model.

Fortunati, Stefano. “Misspecified Cramér-Rao Bounds for Complex Unconstrained and Constrained Parameters.” In 2017 25th European Signal Processing Conference (EUSIPCO), 1644–48, 2017. https://doi.org/10.23919/EUSIPCO.2017.8081488.

Fortunati, Stefano, Fulvio Gini, and Maria S. Greco. “The Misspecified Cramer-Rao Bound and Its Application to Scatter Matrix Estimation in Complex Elliptically Symmetric Distributions.” IEEE Transactions on Signal Processing 64, no. 9 (May 2016): 2387–99. https://doi.org/10.1109/TSP.2016.2526961.

Ollila, Esa, Visa Koivunen, and Jan Eriksson. “On the Cramér-Rao Bound for the Constrained and Unconstrained Complex Parameters.” In 2008 5th IEEE Sensor Array and Multichannel Signal Processing Workshop, 414–18, 2008. https://doi.org/10.1109/SAM.2008.4606902.


josef-pkt commented Nov 16, 2023

detail for constrained cov estimation

Ollila et al 2012, theorem 6 explanation: the proof relies on the equivalence between the real bivariate and the complex representation; however, it is for the second order circular case. (So I got confused.)
The stacked data has length 2*nobs.

It looks like they are stacking the bivariate real vector: for z = x + j y, the stacked data is v = [[x, y], [-y, x]].
Then they estimate cov or scatter using Tyler's scatter estimate.
There are small differences to the empirical covariance (estimated from the complex data, which ignores pseudo-cov).

y2 = np.vstack((y, np.column_stack((- y[:, -2:], y[:, :2]))))
y2.mean(0)
array([-0.00071712, -0.00750972, -0.00359257,  0.00438998])

# implied cov and pcov
c2 = np.cov(y2.T, ddof=0)
cov_from_rvec(c2)
(array([[1.97897467+0.j        , 0.00516927-0.00691401j],
        [0.00516927+0.00691401j, 2.00438595+0.j        ]]),
 array([[ 1.23923196e-05+0.j        , -2.11567142e-05-0.00691401j],
        [-2.11567142e-05+0.00691401j, -3.71239976e-05+0.j        ]]))

# sample estimates from complex variables
covx, pcovx
(array([[1.97896125+0.j        , 0.00517966-0.00694414j],
        [0.00517966+0.00694414j, 2.00431028+0.j        ]]),
 array([[0.00404063-0.00518913j, 0.00051496+0.00378722j],
        [0.00051496+0.00378722j, 0.01313544+0.00295734j]]))

np.cov(x, rowvar=False, ddof=0)
array([[1.97896125+0.j        , 0.00517966-0.00694414j],
       [0.00517966+0.00694414j, 2.00431028+0.j        ]])

nobs is 50000, so there might also be precision issues when computing statistics in different ways

This vertical stacking looks like an interesting and useful tool to impose equal cov across sub-matrices.
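
A self-contained sketch of why the stacking imposes the restriction, using simulated data and a hypothetical helper `complex_cov_from_real` (not the `cov_from_rvec` used above, whose conventions I do not assume here). Unlike the session above, the data is centered before stacking, so that the stacked sample has exactly zero mean. Under that assumption the implied cov equals the complex sample cov and the implied pcov is exactly zero, even though the underlying sample is improper.

```python
import numpy as np


def complex_cov_from_real(rcov):
    """(cov, pcov) of z = x + 1j*y implied by the covariance of [x, y]."""
    p = rcov.shape[0] // 2
    c_xx, c_xy = rcov[:p, :p], rcov[:p, p:]
    c_yx, c_yy = rcov[p:, :p], rcov[p:, p:]
    cov = c_xx + c_yy + 1j * (c_yx - c_xy)
    pcov = c_xx - c_yy + 1j * (c_yx + c_xy)
    return cov, pcov


rng = np.random.default_rng(2)
nobs = 1000
x = rng.normal(size=(nobs, 2))
y = 0.5 * x + rng.normal(size=(nobs, 2))   # improper on purpose
xy = np.column_stack([x, y])
xy -= xy.mean(axis=0)                      # center before stacking
x, y = xy[:, :2], xy[:, 2:]

# stacked data [[x, y], [-y, x]] with 2 * nobs rows
v = np.vstack([xy, np.column_stack([-y, x])])
rcov_stacked = v.T @ v / v.shape[0]
cov_s, pcov_s = complex_cov_from_real(rcov_stacked)

# sample moments computed directly from the complex data
z = x + 1j * y
cov_z = z.T @ z.conj() / nobs
pcov_z = z.T @ z / nobs

print(np.allclose(cov_s, cov_z), np.allclose(pcov_s, 0))
print(np.abs(pcov_z).max())
```

The stacked real covariance has equal diagonal blocks and an antisymmetric off-diagonal block by construction, which is the proper structure. Any real covariance or scatter estimator applied to `v` then inherits that constraint as long as it respects the symmetry of the stacked sample.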

Ollila et al only look at Tyler's estimator or M-estimation of the scatter matrix for the second order circular case.
However,

  • if we use the original bivariate real data cov([x, y]), then we should get the M-estimator for the unrestricted cov_real and from it the implied M-estimator for cov and pcov of complex variable
  • we can compare this to separately computing M-estimator of cov and pcov from the complex variables.

Extending this to non-Gaussian scatter matrices requires the Tyler/M-estimation in PR #8129.
Ollila et al have weight functions for Tyler's M-estimator for several elliptically circular distributions. The PR only includes the weight function for the t-distribution, AFAIR.
