Numerical instability of Sinkhorn (even with lse_mode=True) & weird behavior with lse_mode=False #6
Comments
Hi @theouscidda6 , When I run your code snippet with the latest OTT release I get: epsilon = 1: regularised optimal transport cost = 0.7936503887176514. So, I would suggest you download the latest release (which might not be the one you get with pip install). The overflow is related to a previous issue. I agree with you, though, that for epsilon <= 0.001 with lse_mode=False, instead of getting a NaN (so that we know there was a numerical issue) we get extremely low values for the OT objective. It would be great if this could be fixed.
Hi both, Thanks a lot for raising this issue! As @ersisimou said, it is likely you get the NaNs in lse_mode=True because you do not have the latest version of OTT. Regarding the very small reg_ot_cost values, thanks a lot for the comment; we will see what we can do to indicate numerical issues. In fact, with such small values of epsilon (< 1e-4) and lse_mode=False, you can see that the kernel matrix (geom.kernel_matrix) is identically zero. Meanwhile, it is possible to follow other outputs of sinkhorn (such as the vector of marginal errors, or the converged flag) to verify whether the algorithm has actually converged. For example:
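The example code did not survive extraction from the page. To illustrate the kernel-underflow point in a self-contained way, here is a plain NumPy sketch (not OTT's API; the cost matrix values are made up): for small enough epsilon, the Gibbs kernel exp(-C/eps) underflows to exactly zero in float64, so kernel-mode iterations silently lose all information instead of producing a NaN.

```python
import numpy as np

# Hypothetical cost matrix, standing in for geom.cost_matrix.
C = np.array([[0.5, 1.0], [1.0, 0.5]])

for eps in [1e-1, 1e-3, 1e-4]:
    K = np.exp(-C / eps)  # analogue of geom.kernel_matrix
    print(f"eps={eps:g}  max kernel entry={K.max():.3e}  "
          f"all zero: {bool((K == 0).all())}")
```

Float64 underflows to zero below roughly exp(-745), so with this C the kernel is still (astronomically small but) nonzero at eps=1e-3 and exactly zero at eps=1e-4, which is why monitoring the marginal errors rather than the cost alone is the safer convergence check.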
Nevertheless, we will see how to better indicate numerical issues.
Hi Théo,
Generate two empirical measures:
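The original code block was stripped from the page; a plausible NumPy reconstruction (sizes, seed, and variable names are my own guesses) of two empirical measures with random supports in [0,1]^5 and random normalized weights:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 100, 120, 5

# Support points drawn uniformly in [0, 1]^5.
x = rng.random((n, d))
y = rng.random((m, d))

# Random (non-uniform) weights, normalized to sum to one.
a = rng.random(n)
a /= a.sum()
b = rng.random(m)
b /= b.sum()
```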
Compute the regularized Wasserstein distance for decreasing epsilons with lse_mode=True:
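The snippet itself did not survive extraction. Below is a minimal log-domain Sinkhorn in plain NumPy, a sketch of the idea behind lse_mode=True rather than OTT's implementation (the function names and the fixed iteration count are my own):

```python
import numpy as np

def logsumexp(A, axis):
    """Numerically stable log-sum-exp along one axis (max-shift trick)."""
    m = A.max(axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(A - m), axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn_lse(C, a, b, eps, n_iter=1000):
    """Log-domain Sinkhorn updates (the idea behind lse_mode=True)."""
    f = np.zeros_like(a)
    g = np.zeros_like(b)
    for _ in range(n_iter):
        # Alternate dual updates, entirely in log space: no Gibbs kernel
        # is ever materialized, so tiny eps cannot underflow it.
        f = eps * np.log(a) - eps * logsumexp((g[None, :] - C) / eps, axis=1)
        g = eps * np.log(b) - eps * logsumexp((f[:, None] - C) / eps, axis=0)
    P = np.exp((f[:, None] + g[None, :] - C) / eps)  # transport plan
    return float((P * C).sum()), P
```

Note that (P * C).sum() is the transport cost <P, C>, which is not exactly the reg_ot_cost OTT reports (that one comes from the dual objective), but it behaves the same way as eps decreases.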
Compute the regularized Wasserstein distance for decreasing epsilons with lse_mode=False:
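Again the code block was stripped; here is a matching NumPy sketch of the matrix-scaling iterations behind lse_mode=False (illustrative, not OTT's code). With very small eps the Gibbs kernel K underflows to zero, and the u/v scalings degenerate, which is consistent with the meaningless near-zero costs discussed below:

```python
import numpy as np

def sinkhorn_kernel(C, a, b, eps, n_iter=1000):
    """Matrix-scaling Sinkhorn (the idea behind lse_mode=False)."""
    K = np.exp(-C / eps)  # Gibbs kernel; underflows to exactly 0 for tiny eps
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        # Classical scaling updates; each divides by a kernel-vector
        # product, which breaks down silently once K has underflowed.
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]
    return float((P * C).sum()), P
```

For moderate eps this agrees with the log-domain version; for eps small enough that C.min()/eps exceeds roughly 745, K is exactly the zero matrix in float64 and nothing meaningful can come out.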
Comments:
Using the logsumexp mode (lse_mode=True), I get an overflow (i.e. NaN) from epsilon = 1e-3 onwards. This is quite strange: since the support points are drawn uniformly in [0,1]^5, the squared distance between a support point of the first measure and one of the second is at most 5, so ||C||_inf / eps is on the order of 5e3, which is moderate, and the logsumexp should therefore not overflow.
Moreover, when not using the logsumexp mode (lse_mode=False), I don't get any overflow (down to epsilon = 1e-9). This is strange, since logsumexp is supposed to be more stable than the version that multiplies vectors against the Gibbs kernel. On the other hand, as epsilon decreases, the regularized Wasserstein distance tends to 0 (9.99e-10 for epsilon = 1e-9). But as epsilon becomes very small, the regularized Wasserstein distance should tend towards the unregularized Wasserstein distance, and there is no reason a priori for the Wasserstein distance between these two measures to be zero. Indeed, even though the support points of each measure are drawn from the same law U([0,1]^5), the weights are not uniform: they are also drawn randomly and then normalized. The quantity we compute is therefore not a priori an estimator of the regularized Wasserstein distance between two measures following the same law U([0,1]^5), which is the case in which tending towards 0 would make sense.
I hope this remark is helpful, and thank you very much for developing OTT, which really facilitates the use of numerical optimal transport, especially for differentiating OT metrics.