ANM, lack of Gamma in "Gamma HSIC" #58

Open
ArnoVel opened this issue Jan 13, 2020 · 4 comments

@ArnoVel

ArnoVel commented Jan 13, 2020

Hi,
This might simply be a conceptual problem, or a lack of knowledge on my part.
Usually, two ANM candidates can be compared with HSIC either by comparing the test statistics directly or by computing the corresponding p-values.
However, to compute a p-value one needs some notion of the distribution of HSIC under the null.
The classic paper from Gretton et al. proposes a Gamma approximation, giving specific plug-in values for the two Gamma parameters in terms of the expectation and variance of the HSIC.
If I had to compute the p-value myself, I would use the above approximation for the gamma distribution, and then use the gamma CDF parametrized by the above values.
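
For concreteness, here is a minimal sketch of that computation. It assumes a Gaussian kernel with a median-heuristic bandwidth, and the names `rbf_gram` and `hsic_gamma_pvalue` are hypothetical. The moment estimates are modeled on the Gretton et al. reference code and should be checked against the paper before being relied on; this is not the code used in this repository.

```python
import numpy as np
from scipy.stats import gamma


def rbf_gram(v):
    """Gaussian Gram matrix; bandwidth from the median heuristic (an assumption)."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    sq = np.sum(v ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * v @ v.T, 0.0)
    pos = d2[d2 > 0]
    width2 = 0.5 * np.median(pos) if pos.size else 1.0
    return np.exp(-d2 / (2.0 * width2))


def hsic_gamma_pvalue(x, y):
    """Return (m * HSIC_b, p-value) using a moment-matched Gamma null."""
    m = len(x)
    if m < 6:
        raise ValueError("the variance estimate below needs at least 6 samples")
    K, L = rbf_gram(x), rbf_gram(y)
    H = np.eye(m) - np.ones((m, m)) / m          # centering matrix
    Kc, Lc = H @ K @ H, H @ L @ H
    stat = np.sum(Kc * Lc) / m                   # m * HSIC_b

    # Plug-in variance of HSIC_b under the null.
    v = (Kc * Lc / 6.0) ** 2
    var = (np.sum(v) - np.trace(v)) / m / (m - 1)
    var *= 72.0 * (m - 4) * (m - 5) / m / (m - 1) / (m - 2) / (m - 3)

    # Plug-in mean of HSIC_b under the null (off-diagonal kernel averages).
    mu_x = (K.sum() - np.trace(K)) / m / (m - 1)
    mu_y = (L.sum() - np.trace(L)) / m / (m - 1)
    mean = (1.0 + mu_x * mu_y - mu_x - mu_y) / m

    # Moment matching: the Gamma law is placed on m * HSIC_b.
    shape = mean ** 2 / var
    scale = var * m / mean
    p_value = gamma.sf(stat, shape, scale=scale)  # upper-tail probability
    return float(stat), float(p_value)
```

Calling `hsic_gamma_pvalue(x, residuals)` on an input sample and the regression residuals would then give a p-value for the independence hypothesis, which is what an ANM check needs.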

I am aware there might be other ways to do such a thing; however, your snippet in the anm method does not seem to compute p-values, only test statistics.
I might be wrong about this, but the variable names as well as the description of the method suggest otherwise.

Am I wrong? Right? If either, how so?

Thanks for any additional information on this topic.
Ideally, I would like to design a test which detects whether a model satisfies an ANM, with low Type I and Type II error.

@ArnoVel
Author

ArnoVel commented Jan 22, 2020

For future reference: this test essentially compares the test statistics m*HSIC_b for the two candidate directions. It is called "Gamma HSIC" not because the Gamma approximation is computed here, but because the reference paper applies the Gamma approximation to that same quantity (m*HSIC_b).
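
To illustrate what comparing the statistics means, here is a rough sketch. It assumes a cubic polynomial fit as a stand-in regressor and a Gaussian kernel with median-heuristic bandwidth. All function names are hypothetical and this is not the repository's implementation.

```python
import numpy as np


def rbf_gram(v):
    """Gaussian Gram matrix of a 1-D sample, median-heuristic bandwidth."""
    v = np.asarray(v, dtype=float).ravel()
    d2 = (v[:, None] - v[None, :]) ** 2
    pos = d2[d2 > 0]
    width2 = 0.5 * np.median(pos) if pos.size else 1.0
    return np.exp(-d2 / (2.0 * width2))


def m_hsic_b(a, b):
    """m * HSIC_b: the biased HSIC estimate scaled by the sample size m."""
    m = len(a)
    H = np.eye(m) - np.ones((m, m)) / m
    # sum((H K H) * L) equals trace(K H L H) for symmetric K and L.
    return float(np.sum((H @ rbf_gram(a) @ H) * rbf_gram(b)) / m)


def residuals(cause, effect, degree=3):
    """Residuals of a polynomial regression of effect on cause (stand-in regressor)."""
    coeffs = np.polyfit(cause, effect, degree)
    return effect - np.polyval(coeffs, cause)


def compare_anm_directions(x, y):
    """Return (statistic for x -> y, statistic for y -> x); lower means more independent."""
    stat_xy = m_hsic_b(x, residuals(x, y))
    stat_yx = m_hsic_b(y, residuals(y, x))
    return stat_xy, stat_yx


# Toy data generated from an additive noise model x -> y.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=300)
y = x ** 3 + rng.normal(size=300)
stat_xy, stat_yx = compare_anm_directions(x, y)
print("x -> y:", stat_xy, " y -> x:", stat_yx)
```

The direction with the smaller statistic is preferred. No null distribution is involved, so the comparison ranks the two candidates but does not by itself control Type I error.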

@diviyank
Collaborator

Hi,
You are correct: only the test statistic is computed, not the p-value (ref: authors' code here: http://web.math.ku.dk/~peters/code.html). We might want to include the p-value computation, at least as additional information for users.

Feel free to make a pull request; it might take some time before I can look into it.
Best regards,
Diviyan

@ArnoVel
Author

ArnoVel commented Jan 31, 2020

Hi,
I am a little bit busy atm, however I can point to two possible sources for an easy implementation:

  • a Python copy of the original Gretton et al. MATLAB code; this uses NumPy and vectorises on CPU only.
  • my PyTorch (GPU-compatible) update. This, however, resorts to SciPy for the inverse CDF, so while most of the computations can be performed on GPU, there is a limitation there. Also, I have a nonstandard way to specify kernels, but that can be changed easily!
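
On the inverse CDF step mentioned in the second item: the quantile function turns a significance level into a rejection threshold, which amounts to the same decision as comparing a p-value with that level. A tiny sketch follows. The shape, scale and statistic values are made up for illustration; in practice they would come from the moment-matched Gamma fit and the data.

```python
from scipy.stats import gamma

# Illustrative values only (assumptions, not estimates from any data set).
shape, scale = 2.0, 0.15
alpha = 0.05
stat = 1.2  # stands in for an observed m * HSIC_b

# Inverse CDF (quantile) route: reject independence if stat exceeds the threshold.
threshold = gamma.ppf(1.0 - alpha, shape, scale=scale)
reject_by_threshold = stat > threshold

# Survival-function route: reject independence if the p-value is below alpha.
p_value = gamma.sf(stat, shape, scale=scale)
reject_by_pvalue = p_value < alpha
```

Both routes need only a single scalar evaluation once the statistic and the two Gamma parameters are known, so running this step on CPU should cost little even when the kernel matrices are computed on GPU.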

@diviyank
Collaborator

Hi,
Thanks! I'll look into it.
Best regards,
Diviyan
