Replies: 2 comments 1 reply
-
Interesting. You could use a gradient-free optimizer for model fitting and acquisition optimization (e.g. L-BFGS-B with finite differences, CMA-ES). You could also consider smooth, differentiable relaxations of the Manhattan distance.
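For the latter, a minimal sketch of one such relaxation in plain PyTorch (the function name and the epsilon value are just illustrative placeholders): replace each $|d_i|$ with $\sqrt{d_i^2 + \varepsilon}$, which is differentiable everywhere and approaches the Manhattan distance as $\varepsilon \to 0$.

```python
import torch

def smooth_manhattan(x, y, eps=1e-8):
    """Smooth relaxation of the Manhattan (l1) distance.

    Replaces |d| with sqrt(d**2 + eps), which is differentiable
    everywhere and approaches sum(|x - y|) as eps -> 0.
    """
    d = x - y
    return torch.sqrt(d * d + eps).sum(dim=-1)

x = torch.randn(5, requires_grad=True)
y = x.detach().clone()   # identical inputs: the exact l1 distance has a kink here
dist = smooth_manhattan(x, y)
dist.backward()
print(dist.item(), x.grad)  # small positive distance, finite gradients (no NaNs)
```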
I don't quite understand why this is …
-
Presumably this is because the ℓ1 norm is differentiable almost everywhere (i.e. everywhere except where some coordinate of the difference is zero, a set of measure zero). If differentiability is a concern, have you tried a differentiable approximation to the ℓ1 norm? I.e. a $p$-norm with $p$ slightly larger than 1?
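Something along these lines, for example (a rough sketch; the exponent $p = 1.1$ is just an illustrative choice):

```python
import torch

def p_norm_distance(x, y, p=1.1):
    """l1-like distance via a p-norm with p slightly above 1.

    For p > 1, |t|**p is differentiable at t = 0, so the kinks of the
    l1 norm along the coordinate axes disappear; only the single point
    x == y remains non-smooth.
    """
    return torch.linalg.vector_norm(x - y, ord=p, dim=-1)

a = torch.tensor([0.3, 0.0, -0.2], requires_grad=True)
b = torch.tensor([0.3, 0.1, 0.5])   # first coordinate of a - b is exactly 0
p_norm_distance(a, b).backward()
print(a.grad)                        # finite, no NaNs
```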
-
Hi all,
I am currently experimenting with a 1-norm ($p = 1$) distance function that measures the difference between two monotonically increasing shapes (for monotonically increasing shapes, the 1-norm distance is identical to the Wasserstein distance).
The Wasserstein distance $W(x_i, x_j)$ is then used in an absolute exponential kernel, $k(x_i, x_j) = \exp\left(-\frac{W(x_i, x_j)}{\ell}\right)$ with lengthscale $\ell$.
So far, this kernel gives the best predictive performance on my problem. Its drawback is that it is non-smooth, but switching to an RBF or Matérn 3/2 style kernel results in a non-positive-definite kernel.
Using the absolute exponential kernel in optimization leads to frequently occurring gradient errors (NaN gradients). Surprisingly, though, they do not always occur: in many scenarios the acquisition function seems to be differentiable with respect to X, whereas I was expecting non-differentiability in every case.
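For reference, a stripped-down sketch of the setup in plain PyTorch (the function names, lengthscale, and data are placeholders, not my actual GPyTorch model), including the case where the distance hits exactly zero:

```python
import torch

def wasserstein_1d(f, g):
    """1-Wasserstein distance between two monotonically increasing
    shapes sampled on the same grid; for such shapes it reduces to
    the l1 distance between the curves."""
    return (f - g).abs().sum(dim=-1)

def abs_exp_kernel(f, g, lengthscale=1.0):
    """Absolute exponential kernel on the Wasserstein distance."""
    return torch.exp(-wasserstein_1d(f, g) / lengthscale)

# Two identical shapes, i.e. the distance is exactly 0:
f = torch.linspace(0.0, 1.0, 10).requires_grad_()
g = torch.linspace(0.0, 1.0, 10)

abs_exp_kernel(f, g).backward()
print(f.grad)   # zeros: torch.abs uses a 0 subgradient at 0

f.grad = None
torch.exp(-torch.sqrt((f - g) ** 2).sum()).backward()
print(f.grad)   # NaNs: the derivative of sqrt blows up at 0
```

Whether a zero distance produces a zero gradient or a NaN seems to depend on how the absolute value / distance is computed internally, which might be related to why the errors only show up sometimes.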
Any ideas on this?
Best,
Johannes