The linear time MMD implements a nonparametric statistical hypothesis test to reject the null hypothesis that two distributions p and q, each only observed via n samples, are the same, i.e. H0 : p = q.
The (unbiased) statistic is given by
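As a sketch in the notation of gretton2012kernel, assume n is even, x_1, ..., x_n are the samples from p, y_1, ..., y_n are the samples from q, and k is the chosen kernel. These symbols are assumptions, since this text does not fix them. The linear time estimate then averages n/2 independent terms:

```latex
\mathrm{MMD}_l^2 = \frac{2}{n} \sum_{i=1}^{n/2} h_i,
\qquad
h_i = k(x_{2i-1}, x_{2i}) + k(y_{2i-1}, y_{2i})
      - k(x_{2i-1}, y_{2i}) - k(x_{2i}, y_{2i-1})
```

Each sample enters exactly one term, so a single pass over the stream suffices and the cost is linear in n.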
See gretton2012kernel for a detailed introduction.
Imagine we have samples from p and q. As the linear time MMD is a streaming statistic, we need to pass it CStreamingFeatures. Here, we use synthetic data generators, but it is possible to construct CStreamingFeatures from (large) files.
linear_time_mmd.sg:create_features
We create an instance of CLinearTimeMMD, passing it the data and the kernel to use:
linear_time_mmd.sg:create_instance
An important parameter for controlling the efficiency of the linear time MMD is the block size, i.e. the number of samples that are processed at once. As a guideline, set it as large as memory allows.
linear_time_mmd.sg:set_burst
Computing the statistic is done as
linear_time_mmd.sg:estimate_mmd
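To make the roles of the block size and the statistic concrete, here is a minimal pure-Python sketch of a linear time MMD estimate computed block by block from two streams. It is an illustration under stated assumptions, not the library's implementation: scalar samples, a Gaussian kernel, and the helper names `gaussian_kernel`, `h_terms`, `linear_time_mmd` and `gaussian_stream` are all hypothetical.

```python
import math
import random


def gaussian_kernel(a, b, width=1.0):
    """Gaussian kernel exp(-(a - b)^2 / (2 * width^2)) for scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))


def h_terms(xs, ys, kernel=gaussian_kernel):
    """Per-pair terms over consecutive sample pairs of one block.

    xs and ys must have the same even length.
    """
    terms = []
    for i in range(0, len(xs) - 1, 2):
        x1, x2, y1, y2 = xs[i], xs[i + 1], ys[i], ys[i + 1]
        terms.append(kernel(x1, x2) + kernel(y1, y2)
                     - kernel(x1, y2) - kernel(x2, y1))
    return terms


def linear_time_mmd(stream_p, stream_q, n, block_size, kernel=gaussian_kernel):
    """Average of the per-pair terms, reading n samples per stream in blocks."""
    if n % 2 or block_size % 2:
        raise ValueError("n and block_size must be even")
    total, count, remaining = 0.0, 0, n
    while remaining > 0:
        size = min(block_size, remaining)
        # Only one block per distribution is held in memory at a time.
        xs = [next(stream_p) for _ in range(size)]
        ys = [next(stream_q) for _ in range(size)]
        terms = h_terms(xs, ys, kernel)
        total += sum(terms)
        count += len(terms)
        remaining -= size
    return total / count


def gaussian_stream(mean, rng):
    """Endless stream of N(mean, 1) samples (a stand-in data generator)."""
    while True:
        yield rng.gauss(mean, 1.0)


rng = random.Random(0)
same = linear_time_mmd(gaussian_stream(0.0, rng), gaussian_stream(0.0, rng),
                       n=2000, block_size=200)
shifted = linear_time_mmd(gaussian_stream(0.0, rng), gaussian_stream(2.0, rng),
                          n=2000, block_size=200)
print(f"p = q : {same:.4f}")
print(f"p != q: {shifted:.4f}")
```

In this sketch the block size changes only how many samples are held in memory at once; because pairs never straddle a block boundary, the estimate itself does not depend on it up to floating-point rounding.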
We can perform the hypothesis test by computing the rejection threshold
linear_time_mmd.sg:perform_test_threshold
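The library computes the threshold internally. As a hedged illustration of one standard way to obtain such a threshold, the sketch below uses a Gaussian approximation of the null distribution: the statistic is a mean of independent per-pair terms, so under H0 it is approximately normal with mean zero and variance var(h) divided by the number of terms. Whether the call above uses exactly this approximation is an assumption; `h_terms` and `rejection_threshold` are hypothetical helper names, and the data is synthetic.

```python
import math
import random
from statistics import NormalDist, mean, variance


def h_terms(xs, ys, width=1.0):
    """Per-pair terms of the linear time MMD with a Gaussian kernel."""
    def k(a, b):
        return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))

    return [k(xs[i], xs[i + 1]) + k(ys[i], ys[i + 1])
            - k(xs[i], ys[i + 1]) - k(xs[i + 1], ys[i])
            for i in range(0, len(xs) - 1, 2)]


def rejection_threshold(terms, alpha=0.05):
    """Upper (1 - alpha) quantile of the approximate null N(0, var(h) / m)."""
    std_err = math.sqrt(variance(terms) / len(terms))
    return NormalDist().inv_cdf(1.0 - alpha) * std_err


# Synthetic example: p = N(0, 1) and q = N(1, 1), so H0 is false.
rng = random.Random(1)
n = 2000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [rng.gauss(1.0, 1.0) for _ in range(n)]

terms = h_terms(xs, ys)
statistic = mean(terms)
threshold = rejection_threshold(terms, alpha=0.05)
print(f"statistic={statistic:.4f}  threshold={threshold:.4f}  "
      f"reject H0: {statistic > threshold}")
```

H0 is rejected at level alpha when the statistic exceeds the threshold; a smaller alpha yields a larger threshold and therefore a more conservative test.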
Alternatively, we can compute the p-value for the above value of the statistic
linear_time_mmd.sg:perform_test_p_value
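For intuition, the p-value can be sketched under a Gaussian approximation of the null distribution: it is the probability that a zero-mean normal variable with the estimated standard error exceeds the observed statistic. This is a minimal illustration with synthetic data, not a description of the library call above; `h_terms` and `p_value` are hypothetical helper names.

```python
import math
import random
from statistics import NormalDist, mean, variance


def h_terms(xs, ys, width=1.0):
    """Per-pair terms of the linear time MMD with a Gaussian kernel."""
    def k(a, b):
        return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))

    return [k(xs[i], xs[i + 1]) + k(ys[i], ys[i + 1])
            - k(xs[i], ys[i + 1]) - k(xs[i + 1], ys[i])
            for i in range(0, len(xs) - 1, 2)]


def p_value(terms):
    """One-sided p-value of mean(terms) under the null N(0, var(h) / m)."""
    std_err = math.sqrt(variance(terms) / len(terms))
    return 1.0 - NormalDist().cdf(mean(terms) / std_err)


# Synthetic example: p = N(0, 1) and q = N(1, 1), so H0 is false.
rng = random.Random(2)
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ys = [rng.gauss(1.0, 1.0) for _ in range(2000)]

p = p_value(h_terms(xs, ys))
alpha = 0.05
print(f"p-value={p:.3g}  reject H0 at alpha={alpha}: {p < alpha}")
```

Rejecting when the p-value falls below alpha is equivalent to rejecting when the statistic exceeds the (1 - alpha) threshold of the same null approximation.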
../../references.bib