Allow different numbers of points for P and Q in MMD/HSIC #1918

Closed
karlnapf opened this issue Mar 1, 2014 · 8 comments

Comments

@karlnapf
Member

karlnapf commented Mar 1, 2014

And in general! This requires changing some minor bits of the implementation.
Affected classes: CQuadraticTimeMMD, CHSIC, CLinearTimeMMD (the last one requires some thought, as the expression is not in the reference papers).

@lambday
Member

lambday commented Mar 5, 2014

@karlnapf for QuadraticTimeMMD should the statistic returned still be m*MMD when m and n are different?

@karlnapf
Member Author

karlnapf commented Mar 6, 2014

Good point! I don't really know to be honest. Have to talk to @sejdino on that.

@sejdino

sejdino commented Mar 6, 2014

Hey @lambday, in the m!=n case, it ends up being (m+n)*MMD that has a nice distribution, so this is what we should be returning. Have a look at this paper: eq (3) is the unbiased MMD, (5) is biased (ignore the square root though--we are always working with the squared MMD) and eq (10) gives the asymptotic distribution. Conveniently, the permutation test will work in exactly the same way: merge everything, split randomly into m and n, recompute. For spectral approximation, you will need to take into account ratios rho_x and rho_y like in eq. (10). Gamma approximation might be messy, so I think we should do that only for m=n for now.

BTW, I would hold off changing anything in CLinearTimeMMD (and in B-test) for now - since it's not clear how best to reconcile m!=n with the streaming facility - we will need to think about this a bit.

Also, no need to do anything in HSIC - independence testing doesn't make sense with different numbers of samples.
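
To illustrate the m != n permutation test described in this comment (merge everything, split randomly into m and n, recompute), here is a minimal NumPy sketch. It is not the Shogun implementation: the function names, the Gaussian kernel, and its bandwidth `sigma` are assumptions made for the example. The statistic is (m+n) times the biased squared MMD.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian kernel values between rows of A and rows of B.
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def biased_mmd2(K, m):
    # K is the kernel matrix on the merged sample; the first m rows are X.
    n = K.shape[0] - m
    Kxx, Kyy, Kxy = K[:m, :m], K[m:, m:], K[:m, m:]
    return Kxx.sum() / m**2 + Kyy.sum() / n**2 - 2.0 * Kxy.sum() / (m * n)

def permutation_test(X, Y, num_perms=200, sigma=1.0, seed=0):
    # Hypothetical helper: returns ((m+n) * MMD_b^2, permutation p-value).
    rng = np.random.default_rng(seed)
    m, n = len(X), len(Y)
    Z = np.vstack([X, Y])                 # merge everything
    K = gaussian_kernel(Z, Z, sigma)      # computed once, re-indexed below
    stat = (m + n) * biased_mmd2(K, m)
    null = np.empty(num_perms)
    for i in range(num_perms):
        p = rng.permutation(m + n)        # random split into sizes m and n
        null[i] = (m + n) * biased_mmd2(K[np.ix_(p, p)], m)
    p_value = float(np.mean(null >= stat))
    return stat, p_value
```

Because the kernel matrix on the merged sample is computed once and only permuted afterwards, unequal m and n need no special handling beyond splitting at index m.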

@lambday
Member

lambday commented Mar 6, 2014

Thanks @sejdino. Going through the paper to make sure I get these things right.

@lambday
Member

lambday commented Mar 16, 2014

@karlnapf @sejdino Hi, I wanted to work on this and ran into some confusion regarding the spectral approximation when m != n for quadratic time MMD. The current implementation uses the formula in this paper to approximate m*MMD_b^2 as 2 * sum_l [1/(2m) * nu_l * z_l^2], where nu_l are the eigenvalues of the centered Gram matrix. I thought of using eq. (10) from the "A kernel two-sample test" paper with rho_x and rho_y, but I am not sure what I should use as an estimator of the eigenvalues in that equation when m and n are different. Would that be 1/(m+n) times the eigenvalues of the centered matrix?

@sejdino

sejdino commented Mar 16, 2014

Precisely, these are 1/(m+n) times the eigenvalues of the centred kernel matrix on the merged samples. These can then just be plugged into the eq. (10) from "A kernel two-sample test" with rho_x=m/(m+n) and rho_y=n/(m+n).
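
A rough sketch of this recipe in NumPy, for illustration only. It assumes that the limit in eq. (10) has the form sum_l lambda_l * [(a_l/sqrt(rho_x) - b_l/sqrt(rho_y))^2 - 1/(rho_x*rho_y)] with a_l, b_l independent standard normals, and that the biased statistic (m+n)*MMD_b^2 drops the constant term; both should be checked against the paper. The function names are hypothetical and not part of Shogun.

```python
import numpy as np

def null_eigenvalues(K):
    # Estimated eigenvalues: 1/(m+n) times the eigenvalues of the centred
    # kernel matrix on the merged samples (K has shape (m+n, m+n)).
    t = K.shape[0]
    H = np.eye(t) - np.ones((t, t)) / t
    Kc = H @ K @ H
    Kc = (Kc + Kc.T) / 2.0                      # remove rounding asymmetry
    nu = np.clip(np.linalg.eigvalsh(Kc), 0.0, None)
    return nu / t

def sample_spectral_null(K, m, num_samples=1000, seed=0):
    # Draws from the approximate null of (m+n) * MMD_b^2 (assumed form).
    rng = np.random.default_rng(seed)
    t = K.shape[0]
    rho_x, rho_y = m / t, (t - m) / t           # rho_x = m/(m+n), rho_y = n/(m+n)
    lam = null_eigenvalues(K)
    a = rng.standard_normal((num_samples, lam.size))
    b = rng.standard_normal((num_samples, lam.size))
    return ((a / np.sqrt(rho_x) - b / np.sqrt(rho_y)) ** 2) @ lam
```

As a sanity check on the assumed form: with m = n, rho_x = rho_y = 1/2, each squared term has variance 4, and dividing the (m+n)-scaled samples by 2 gives 2 * sum_l [1/(2m) * nu_l * z_l^2], the m*MMD_b^2 formula quoted earlier in the thread.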

@lambday
Member

lambday commented Mar 16, 2014

@sejdino thanks a lot. I am hoping to finish this by tomorrow :)

@karlnapf
Member Author

And done!
@lambday You will have a long list of resolved issues for your proposal ;)
