MMD unbiased estimate issue #4595

Closed

saeidnp opened this issue Mar 29, 2019 · 12 comments

@saeidnp

saeidnp commented Mar 29, 2019

I ran the following code, which computes the unbiased MMD estimate multiple times. Since both distributions are Normal(0,1) and the estimate is unbiased, I expect the outputs to average to 0. However, it always prints the exact same number (-0.6577018139068969).

import numpy as np
from torch.distributions.normal import Normal
import shogun as sg

for _ in range(100):
    X = Normal(0,1).sample((7000,)).numpy().reshape(-1,1)
    Y = Normal(0,1).sample((7000,)).numpy().reshape(-1,1)
    mmd = sg.QuadraticTimeMMD()
    mmd.set_p(sg.RealFeatures(X.T.astype(np.float64)))
    mmd.set_q(sg.RealFeatures(Y.T.astype(np.float64)))
    mmd.set_kernel(sg.GaussianKernel(32, 1))
    mmd.set_statistic_type(sg.ST_UNBIASED_FULL)
    stat = mmd.compute_statistic()
    print('stat = ', stat)
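
(For reference, the quantity the snippet above is meant to estimate can be cross-checked with a plain numpy implementation of the unbiased MMD^2 estimator of Gretton et al. The sketch below is an independent sanity check, not Shogun's code: the kernel k(x, y) = exp(-||x - y||^2 / width) and the assumption that it matches the width convention of sg.GaussianKernel are mine, and compute_statistic may additionally rescale the estimate, so only the sign and average behaviour should be compared.)

import numpy as np

def unbiased_mmd2(X, Y, width=1.0):
    # Unbiased MMD^2 estimate with a Gaussian kernel.
    # X: (m, d) array, Y: (n, d) array. Everything is accumulated in float64.
    def gram(A, B):
        sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
        return np.exp(-sq / width)   # assumed kernel parameterisation
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = gram(X, X), gram(Y, Y), gram(X, Y)
    # Drop the diagonal terms of Kxx and Kyy to keep the estimate unbiased.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.sum() / (m * n)

# Under H0 (both samples from N(0, 1)) this averages to ~0 over repetitions.
X = np.random.randn(2000, 1)
Y = np.random.randn(2000, 1)
print(unbiased_mmd2(X, Y))
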
@lambday
Member

lambday commented Mar 29, 2019

Hi @saeidnp. Thanks for letting us know. I'll check and get back to you on this.

@karlnapf
Member

Any updates, @lambday?

@karlnapf
Member

karlnapf commented Apr 13, 2019

I just tried and I get

import numpy as np
import shogun as sg

stats = []
for _ in range(100):
    N = 2000
    X = np.random.randn(1,N)
    Y = np.random.randn(1,N)
    mmd = sg.QuadraticTimeMMD()
    mmd.set_p(sg.RealFeatures(X.astype(np.float64)))
    mmd.set_q(sg.RealFeatures(Y.astype(np.float64)))
    mmd.set_kernel(sg.GaussianKernel(32, 1))
    mmd.set_statistic_type(sg.ST_UNBIASED_FULL)
    stat = mmd.compute_statistic()
    print('stat = ', stat)
    stats += [stat]
    print('Average so far:', np.mean(stats))
...
('stat = ', 0.09728989243740216)
('Average so far:', -0.061675398767896017)
('stat = ', 0.5960679263807833)
('Average so far:', -0.046726686832698761)
('stat = ', -0.24397649394813925)
('Average so far:', -0.051110015879708551)
('stat = ', 1.8716953927651048)
('Average so far:', -0.0093098983004734764)
('stat = ', -0.37008902290835977)
('Average so far:', -0.016986049887875315)
('stat = ', -0.15827194147277623)
('Average so far:', -0.01992950596256075)
('stat = ', 0.9893259266391397)
('Average so far:', 0.00066754368237191238)
('stat = ', -0.36889349576085806)
('Average so far:', -0.006723677106492687)
...

I guess there is an issue with your random number generator?

@saeidnp
Author

saeidnp commented Apr 19, 2019

@karlnapf your code uses fewer samples than mine.
When I run your code as is, I get the expected output. But when I tried it with N = 7000, I got that magic number (-0.6577018139068969) again!

@karlnapf
Member

That seems strange to me but I’ll check!

@karlnapf
Member

I get exactly the same magic number. Crazy! This is a bug and I will dig deeper once I have a moment. @lambday might also be able to help

@karlnapf
Member

This is an overflow issue caused by a .sum() call on an Eigen3 block. Will fix it next.
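
A rough numpy illustration of the effect (not the Eigen code path itself): a 7000 x 7000 kernel matrix has 49 million entries, and a single-precision accumulator stops growing once it reaches 2**24 ≈ 1.68e7, which is why the statistic collapses to the same constant regardless of the data.

import numpy as np

# float32 carries 24 mantissa bits: once an accumulator reaches 2**24,
# adding terms of order 1 is rounded away completely.
acc = np.float32(2**24)
print(acc + np.float32(1.0) == acc)   # True

# Sequential float32 accumulation of 20 million ones saturates near 2**24
# instead of reaching 20 000 000 (np.cumsum accumulates in float32 here).
ones = np.ones(20_000_000, dtype=np.float32)
print(np.cumsum(ones)[-1])            # 16777216.0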

karlnapf added a commit to karlnapf/shogun that referenced this issue May 2, 2019
karlnapf added a commit to karlnapf/shogun that referenced this issue May 2, 2019
@karlnapf
Member

karlnapf commented May 8, 2019

(13:52:45) Heiko: block.cast<float64>().sum()
(13:52:49) Heiko: is that lazy evaluated?
(13:52:59) Heiko: or is a new matrix allocated
(13:52:59) Heiko: ?
(13:53:43) ChriSopht: the casting is done lazily. but this does not get vectorized
(13:54:00) Heiko: i see
(13:54:18) Heiko: so if I want to have it vectorized I need to allocate a new matrix with the casted values
(13:55:09) ChriSopht: HeikoS: that would vectorize the summation but not the conversion. so likely not worth the overhead
(13:55:44) ChriSopht: eventually this needs to be implemented in Eigen, but it is not too trivial to do generically
(13:55:56) Heiko: ChriSopht: alright thanks
(13:56:07) Heiko: well I can think about changing the original array in the first place maybe
(13:56:12) Heiko: but this is helpful, thanks!
(13:57:07) ChriSopht: sure that is possible, depends of course if that is beneficial (more memory/cache throughput).
(13:58:14) ChriSopht: you could try to implement a custom SIMD-cast+summation function (it is not very hard if you just consider one target architecture)
(13:58:40) Heiko: ah that might be an idea
(13:58:51) Heiko: do you have any pointers for that?
(14:01:32) ChriSopht: you need one cvtps_pd (depending on your architecture): https://software.intel.com/sites/landingpage/IntrinsicsGuide/#cats=Convert&techs=SSE2,AVX,AVX_512&expand=1762&text=cvtps_pd and then one add_pd in your main-loop
(14:02:13) ChriSopht: depending on your block-sizes you can also use two or four accumulators (better throughput)
(14:03:22) ChriSopht: last reduction can be done by Eigen::internal::predux
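
(A rough numpy analogue of the options discussed above, not the Eigen code itself: the block can either be materialised as a float64 copy before summing, the analogue of allocating the casted matrix, or converted on the fly while accumulating in double precision, the analogue of the lazy block.cast<float64>().sum(). The 7000 x 7000 size and the constant 0.5 entries are stand-ins, not real kernel values.)

import numpy as np

# Stand-in for a float32 kernel block (~200 MB of data).
K = np.full((7000, 7000), 0.5, dtype=np.float32)

# Option 1: allocate a float64 copy, then sum. The reduction vectorizes over
# the copy, but the conversion costs an extra ~400 MB allocation.
print(K.astype(np.float64).sum())   # 24500000.0

# Option 2: accumulate in float64 while reducing the float32 data directly.
# No extra full-size array; the cast happens as part of the reduction.
print(K.sum(dtype=np.float64))      # 24500000.0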

@gbaydin

gbaydin commented May 10, 2019

@saeidnp can you please confirm whether the fix for this closed issue resolved the "magic number" problem? The issue is closed, but I don't see a message here saying that the bug is fixed.

@karlnapf
Member

@gbaydin yes, it is solved via my hotfix. Could you confirm whether you are using it?

It will (unfortunately) now be a tiny bit slower than before due to the lack of low-level vectorization in Eigen3 when summing the kernel matrix: it is now a C++ loop rather than SIMD summations of the columns. But that should only be noticeable when sampling from the null distribution (repeated computation of the test). We will run benchmarks and potentially try to improve it.

@saeidnp
Author

saeidnp commented May 10, 2019

@gbaydin @karlnapf yes it passes my tests.

@karlnapf
Member

Great, thanks
