Replies: 2 comments 1 reply
-
Interesting. You could use a gradient-free optimizer for model fitting and acquisition optimization (e.g. L-BFGS-B with finite differences, CMA-ES). You could also consider smooth, differentiable relaxations of the Manhattan distance.
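For the latter, a minimal sketch of one such relaxation in plain PyTorch (the function name and the epsilon value are just illustrative placeholders): replace each $|d_i|$ with $\sqrt{d_i^2 + \varepsilon}$, which is differentiable everywhere and approaches the Manhattan distance as $\varepsilon \to 0$.

```python
import torch

def smooth_manhattan(x, y, eps=1e-8):
    """Smooth relaxation of the Manhattan (l1) distance.

    Replaces |d| with sqrt(d**2 + eps), which is differentiable
    everywhere and approaches sum(|x - y|) as eps -> 0.
    """
    d = x - y
    return torch.sqrt(d * d + eps).sum(dim=-1)

x = torch.randn(5, requires_grad=True)
y = x.detach().clone()   # identical inputs: the exact l1 distance has a kink here
dist = smooth_manhattan(x, y)
dist.backward()
print(dist.item(), x.grad)  # small positive distance, finite gradients (no NaNs)
```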
I don't quite understand why this is …
-
Presumably this is because the ℓ1 norm is differentiable almost everywhere (i.e. everywhere except where some coordinate of the difference is zero, a set of measure zero). If differentiability is a concern, have you tried a differentiable approximation to the ℓ1 norm? I.e. a $p$-norm with $p$ slightly larger than 1?
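Something along these lines, for example (a rough sketch; the exponent $p = 1.1$ is just an illustrative choice):

```python
import torch

def p_norm_distance(x, y, p=1.1):
    """l1-like distance via a p-norm with p slightly above 1.

    For p > 1, |t|**p is differentiable at t = 0, so the kinks of the
    l1 norm along the coordinate axes disappear; only the single point
    x == y remains non-smooth.
    """
    return torch.linalg.vector_norm(x - y, ord=p, dim=-1)

a = torch.tensor([0.3, 0.0, -0.2], requires_grad=True)
b = torch.tensor([0.3, 0.1, 0.5])   # first coordinate of a - b is exactly 0
p_norm_distance(a, b).backward()
print(a.grad)                        # finite, no NaNs
```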
-
Hi all,
I am currently experimenting with a 1-norm ($p = 1$) distance function that measures the difference between two monotonically increasing shapes (for monotonically increasing shapes, the 1-norm distance is identical to the Wasserstein distance).
The Wasserstein distance $W(x_i, x_j)$ is then used in an absolute exponential kernel, $k(x_i, x_j) = \exp\left(-\frac{W(x_i, x_j)}{\ell}\right)$ with lengthscale $\ell$.
So far, this kernel gives the best predictive performance on my problem. Its drawback is that it is non-smooth, but switching to an RBF or Matérn 3/2 style kernel results in a non-positive-definite kernel.
Using the absolute exponential kernel in optimization leads to frequently occurring gradient errors (NaN gradients). Surprisingly, though, they do not always occur: in many scenarios the acquisition function seems to be differentiable with respect to X, whereas I was expecting non-differentiability in every case.
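For reference, a stripped-down sketch of the setup in plain PyTorch (the function names, lengthscale, and data are placeholders, not my actual GPyTorch model), including the case where the distance hits exactly zero:

```python
import torch

def wasserstein_1d(f, g):
    """1-Wasserstein distance between two monotonically increasing
    shapes sampled on the same grid; for such shapes it reduces to
    the l1 distance between the curves."""
    return (f - g).abs().sum(dim=-1)

def abs_exp_kernel(f, g, lengthscale=1.0):
    """Absolute exponential kernel on the Wasserstein distance."""
    return torch.exp(-wasserstein_1d(f, g) / lengthscale)

# Two identical shapes, i.e. the distance is exactly 0:
f = torch.linspace(0.0, 1.0, 10).requires_grad_()
g = torch.linspace(0.0, 1.0, 10)

abs_exp_kernel(f, g).backward()
print(f.grad)   # zeros: torch.abs uses a 0 subgradient at 0

f.grad = None
torch.exp(-torch.sqrt((f - g) ** 2).sum()).backward()
print(f.grad)   # NaNs: the derivative of sqrt blows up at 0
```

Whether a zero distance produces a zero gradient or a NaN seems to depend on how the absolute value / distance is computed internally, which might be related to why the errors only show up sometimes.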
Any ideas on this?
Best,
Johannes