
MMD two sample tests with 1 instance in one sample #4631

Open
amir-rahnama opened this issue May 12, 2019 · 6 comments

@amir-rahnama

I noticed that when I run a quadratic-time MMD two-sample test in a case where one sample has a single instance, the results all come back NaN. Now, this could be due to an assumption in MMD itself, but I was wondering whether I can still interpret mmd.perform_test(alpha) or not?

@karlnapf
Member

Thanks for reporting! I think this should raise an assertion error, @lambday ?

@ambodi it won't return anything sensible. The test statistic might be OK (depending on whether you use the biased or unbiased estimator), but the p-value will be nonsense (a permutation test makes no sense with one sample, and the other approximations will also fail). Just curious, why are you interested in this case?

@amir-rahnama
Author

@karlnapf it does not raise an error but returns NaNs.

Great question. I have this case: one function outputs a single vector and another function outputs 5000 vectors, and I wanted to evaluate the divergence between these two functions. Now I can definitely run the functions over a range of values, let's say 700, but then the problem is that Shogun runs out of memory for 5000*700 samples.
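Just as a rough back-of-the-envelope (assuming the quadratic-time test materialises the full pooled kernel matrix, which is my guess for where the memory goes):

```python
# Hypothetical sizes: 5000 vectors per input from one function, 1 per input from
# the other, evaluated over 700 inputs, pooled for a quadratic-time MMD test.
pooled = 5000 * 700 + 1 * 700      # ~3.5 million samples in total
kernel_bytes = pooled ** 2 * 8     # full float64 kernel matrix
print(kernel_bytes / 1e12)         # ~98 TB, far beyond any machine's memory
```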

Any suggestions?

@lambday
Member

lambday commented May 14, 2019

@karlnapf yeah maybe there should be an assertion, but we've left it up to the user. For the unbiased case we should at least have that check.

@ambodi The NaN values come from how we compute the MMD^2 estimates in the unbiased case (see http://shogun.ml/api/latest/classshogun_1_1CMMD.html). It's the (n-1) term in the denominator that is messing things up. The unbiased estimator is the default; if you want to change it, you can use set_statistic_type().
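To illustrate, here is a minimal NumPy sketch of the two estimators with a Gaussian kernel (not Shogun's actual implementation; the data and bandwidth are placeholders):

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    # Gaussian kernel matrix between the rows of A and B.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2(X, Y, sigma=1.0, unbiased=True):
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = rbf(X, X, sigma), rbf(Y, Y, sigma), rbf(X, Y, sigma)
    if unbiased:
        # U-statistic: off-diagonal terms only, divided by m*(m-1) and n*(n-1).
        # With n == 1 the n*(n-1) denominator is zero, hence the NaN.
        xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
        yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    else:
        # Biased V-statistic: keep the diagonal, divide by m^2 and n^2.
        xx, yy = Kxx.mean(), Kyy.mean()
    return xx + yy - 2 * Kxy.mean()

X = np.random.randn(100, 3)
Y = np.random.randn(1, 3)                 # one sample has a single instance
print(mmd2(X, Y, unbiased=True))          # nan (0/0 from the n*(n-1) term)
print(mmd2(X, Y, unbiased=False))         # finite value
```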

It's interesting that Shogun runs out of memory for 5000x700 samples. What's the dimension? Can you post your code somewhere so that we can try it out?

[EDIT: fixed incorrect statement about scaling]

@karlnapf
Member


@ambodi Statistically, this problem seems pretty much impossible to solve with a test. You might want to try an outlier detection algorithm, like a one-class SVM: train it on the 5000 vectors and then apply it to the single vector to see whether it is similar enough or not.
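Roughly along these lines, as a minimal sketch with scikit-learn's OneClassSVM (the array shapes, dimension, and the nu/gamma values here are placeholders):

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Placeholder data: 5000 reference vectors from one function, one query vector
# from the other (dimension 10 chosen arbitrarily).
reference = np.random.randn(5000, 10)
query = np.random.randn(1, 10)

# Fit the one-class SVM on the large sample only.
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
clf.fit(reference)

print(clf.predict(query))            # +1: looks like the reference data, -1: outlier
print(clf.decision_function(query))  # signed distance to the boundary (higher = more typical)
```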

@stale

stale bot commented Feb 26, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Feb 26, 2020
@bhavukkalra
Contributor

Hello @karlnapf, I am using the scikit-learn outlier detection tools for this task:
https://scikit-learn.org/stable/modules/outlier_detection.html
Could you please clarify which 5000 vectors you were referring to above, so that I can use them in the training process? We could also use try blocks to detect the cases that interfere with the (n-1) term in the denominator.
What do you suggest?
