MMD two sample tests with 1 instance in one sample #4631
Comments
Thanks for reporting! I think this should raise an assertion error, @lambday? @ambodi, it won't return anything sensible. The test statistic might be OK (depending on whether you use the biased or the unbiased estimator), but the p-value will be nonsense (a permutation test makes no sense with one sample, and the other approximations will also fail). Just curious, why are you interested in this case?
@karlnapf it does not raise an error but returns NaNs. Great question. My case is this: I have one function that outputs a single vector and another function that outputs 5000 vectors, and I wanted to evaluate the divergence between the two. I could certainly run the functions over a range of values, let's say 700, but then Shogun runs out of memory for 5000*700 samples. Any suggestions?
@karlnapf yeah maybe there should be an assertion, but we've left it up to the user. For the unbiased case we should at least have that check. @ambodi The NaN values come because of how we It's interesting that Shogun runs out of memory for 5000x700 samples. What's the dimension? Can you post your code somewhere so that we can try it out? [EDIT: fixed incorrect statement about scaling]
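To illustrate where a NaN can come from in the unbiased case, here is a minimal NumPy sketch of the textbook quadratic-time MMD² estimators. This is not Shogun's code: the helper names `gaussian_kernel` and `mmd2`, the kernel width, and the data are assumptions made for illustration. The unbiased estimator divides the within-sample kernel sum by m(m-1), which is 0/0 when a sample has a single instance, while the biased estimator divides by m² and stays finite.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmd2(x, y, unbiased=True, sigma=1.0):
    """Textbook quadratic-time MMD^2 estimate (illustrative, not Shogun's implementation)."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    if unbiased:
        # Drop the diagonal and normalise by m(m-1); this is 0/0 when m == 1.
        with np.errstate(divide="ignore", invalid="ignore"):
            term_x = (kxx.sum() - np.trace(kxx)) / np.float64(m * (m - 1))
            term_y = (kyy.sum() - np.trace(kyy)) / np.float64(n * (n - 1))
    else:
        # Keep the diagonal and normalise by m^2; defined even when m == 1.
        term_x = kxx.mean()
        term_y = kyy.mean()
    return term_x + term_y - 2.0 * kxy.mean()

rng = np.random.default_rng(0)
x_single = rng.normal(size=(1, 5))   # one instance
y_many = rng.normal(size=(200, 5))   # many instances

stat_unbiased = mmd2(x_single, y_many, unbiased=True)
stat_biased = mmd2(x_single, y_many, unbiased=False)
print(stat_unbiased, stat_biased)
```

Even where the biased statistic is finite, that says nothing about the p-value, which is the part of the test that breaks down with a single instance.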
Statistically, this problem seems pretty much impossible to solve with a test. You might want to try an outlier detection algorithm, like a one-class SVM: train it on the 5000 vectors and then apply it to the single vector to see whether it is similar enough or not.
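A rough sketch of that suggestion using scikit-learn's `OneClassSVM`. The synthetic data, the dimension, and the `nu` and `gamma` settings are assumptions chosen for illustration, not values from this thread; in practice they would need tuning on the real vectors.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Stand-in for the 5000 vectors produced by the second function (assumed 10-dimensional).
reference = rng.normal(size=(5000, 10))

# Stand-ins for the single vector produced by the first function.
typical_vector = np.zeros((1, 10))       # sits in the bulk of the reference data
atypical_vector = np.full((1, 10), 10.0)  # sits far away from the reference data

# nu is an upper bound on the fraction of training points treated as outliers.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
model.fit(reference)

# predict returns +1 for inliers and -1 for outliers;
# decision_function returns a signed score (higher means more typical).
typical_score = model.decision_function(typical_vector)[0]
atypical_score = model.decision_function(atypical_vector)[0]
atypical_label = model.predict(atypical_vector)[0]
print(typical_score, atypical_score, atypical_label)
```

The signed score from `decision_function` is often more useful than the hard label, since it lets you rank how typical the single vector is relative to the reference set.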
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hello @karlnapf, I am using this sklearn model to do the outlier detection task.
I noticed that when I run a quadratic-time MMD two-sample test in a case where one of the samples has a single instance, the results all come back as NaN. Now, this could be due to an assumption in MMD itself, but I was wondering whether I can still interpret the output of
mmd.perform_test(alpha)
or not?