
autodiff gaussian width parameter #4782

Closed
wants to merge 9 commits

Conversation

gf712
Member

@gf712 gf712 commented Oct 23, 2019

@karlnapf I was testing out autodiff with Stan for the Gaussian kernel and it seems to work fine.
The issue is that the forward pass isn't thread safe, so we might have to rethink a bit how kernels compute values.
We could then have a virtual method that defines the kernel using arrays rather than scalars, which would be called from the forward and backward functions. The linalg parallelism would be handled entirely by Eigen, and we wouldn't need to worry about the cache, etc.
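For context, a minimal standalone sketch of the Stan approach (assuming Stan Math is on the include path; the distance value and log width below are made-up constants, not Shogun's actual code): reverse-mode autodiff of a single kernel value wrt log_width.

#include <stan/math.hpp>
#include <iostream>

int main()
{
    const double dist = 2.3;          // placeholder for distance(j, k)
    stan::math::var log_width = 0.5;  // the parameter we differentiate wrt

    // k = exp(-dist / (exp(2 * log_width) * 2))
    stan::math::var k =
        stan::math::exp(-dist / (stan::math::exp(2.0 * log_width) * 2.0));

    k.grad();                         // reverse pass, fills the adjoints
    std::cout << "k = " << k.val()
              << ", dk/dlog_width = " << log_width.adj() << "\n";

    stan::math::recover_memory();     // release Stan's autodiff arena
    return 0;
}

Note that every var lives in Stan's arena allocator, which is exactly the per-entry heap traffic discussed further down.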

@gf712 gf712 requested a review from karlnapf October 23, 2019 11:12
// manual derivative (previous code):
float64_t element=distance(j, k);
derivative(j, k) = std::exp(-element) * element * 2.0;
// stan reverse-mode autodiff (this PR):
auto f = exp(-CShiftInvariantKernel::distance(j, k) / constant_part);
f.grad();
Member

Wow, it is quite simple to do this for the bandwidth ... I guess I tried to diff wrt the features last time, which raised problems regarding passing the matrices to Stan ... but this is cool!

I assume this gives the same results, since there are tests covering it?
What are the computational differences compared to the previous line? Maybe a benchmark would be good?

Member Author

Yes, the results are correct and are tested in unit tests, mostly by the GP stuff.
The computation is very slow... Basically it allocates a new variable on the heap in the inner loop, and then redoes the gradient calculation for every entry.
What we need is a way to build the "formula" of the gradient once, outside the loop, and then reuse it... Not sure how to do this in Stan, or whether it is even possible.

@karlnapf
Member

karlnapf commented Nov 1, 2019

Rethinking the kernels might be a good idea to parallelise things anyway ... The single-kernel-value approach we have is more suited to very expensive kernel functions (string kernels), as is the kernel cache (useless for things like the Gaussian kernel, where vectorization is much more useful).

Any ideas how to go about this concretely?

@gf712
Member Author

gf712 commented Nov 1, 2019

I noticed that, for example, for the Euclidean distance this is really inefficient, as we compute one value at a time and do some checks before each computation. I think shogun does the calculation fast, but it could be better with some refactoring.
I think the best way is to write these things lazily, like in Eigen. So I would call get_lazy_distance_matrix, which would return a lazily evaluated computation of the distance, and then use that to calculate the Gaussian kernel. Basically we need to build around a computational graph that is executed once, without any if/else conditions or casts in the loop.
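A rough sketch of that lazy idea (the function name gaussian_kernel_matrix is hypothetical, not Shogun API; features are assumed to be stored one example per column):

#include <Eigen/Dense>
#include <cmath>

// Full Gaussian kernel matrix in one shot: the squared distances and the
// element-wise exponential are Eigen expressions evaluated in a single pass,
// with no per-entry virtual calls or checks.
inline Eigen::MatrixXd gaussian_kernel_matrix(
    const Eigen::MatrixXd& lhs, const Eigen::MatrixXd& rhs, double log_width)
{
    const double denom = std::exp(log_width * 2.0) * 2.0;
    Eigen::MatrixXd sq_dist =
        lhs.colwise().squaredNorm().transpose().replicate(1, rhs.cols())
        + rhs.colwise().squaredNorm().replicate(lhs.cols(), 1)
        - 2.0 * lhs.transpose() * rhs;
    return (-sq_dist / denom).array().exp().matrix();
}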

@karlnapf
Member

karlnapf commented Nov 1, 2019

> I noticed that, for example, for the Euclidean distance this is really inefficient, as we compute one value at a time and do some checks before each computation. I think shogun does the calculation fast, but it could be better with some refactoring.
> I think the best way is to write these things lazily, like in Eigen. So I would call get_lazy_distance_matrix, which would return a lazily evaluated computation of the distance, and then use that to calculate the Gaussian kernel. Basically we need to build around a computational graph that is executed once, without any if/else conditions or casts in the loop.

I think in addition to computing a single kernel value, we would need to offer an API that gets you the whole matrix. But not via simply looping over single values, rather directly in the lazy way you suggest ...
The computational-graph refactoring you mentioned basically requires rewriting shogun in a TensorFlow style I guess ... not even sure what good first steps would be.

@gf712
Member Author

gf712 commented Nov 5, 2019

@karlnapf turns out Eigen does what we want! This would fit nicely with using more Eigen and lazily evaluating the distance in the kernel! The latest commit is as fast as the manual derivative!
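A minimal standalone sketch of the Eigen version (using Eigen's unsupported AutoDiff module; the distance is a made-up constant, and the parameterisation matches the exp(m_log_width * 2.0) * 2.0 used in the diff):

#include <unsupported/Eigen/AutoDiff>
#include <Eigen/Core>
#include <iostream>

int main()
{
    using EigenScalar = Eigen::Matrix<double, 1, 1>;
    using ADScalar = Eigen::AutoDiffScalar<EigenScalar>;

    const double dist = 2.3;  // placeholder for distance(j, k)

    // seed d(log_width)/d(log_width) = 1 in derivative slot 0
    ADScalar log_width(0.5, EigenScalar::Unit(1, 0));

    // k = exp(-dist / (exp(2 * log_width) * 2))
    ADScalar k = exp(-dist / (exp(log_width * 2.0) * 2.0));

    std::cout << "k = " << k.value()
              << ", dk/dlog_width = " << k.derivatives()(0) << "\n";
    return 0;
}

Because the derivative vector is a fixed-size Eigen::Matrix<double, 1, 1>, everything stays on the stack and can be inlined, which is presumably where the speed-up over the Stan version comes from.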

compile time differentiation is much faster
Revert "autodiff gaussian width parameter"

This reverts commit e9bced7.
@gf712
Member Author

gf712 commented Nov 5, 2019

@karlnapf I am thinking we could write our own version of AutoDiffScalar, an AutoDiffArray, that does the same compile-time optimisation as the current autodiff but over arrays, so it should handle parallelism better. And then we could go through Stan and add more ops than Eigen has right now!
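A very rough, purely hypothetical sketch of what such an AutoDiffArray could look like (this type exists in neither Eigen nor Shogun); the point is just that the value and its derivative travel together as whole arrays, so element-wise ops stay vectorised:

#include <Eigen/Core>

struct AutoDiffArray
{
    Eigen::ArrayXd value; // primal values, e.g. all kernel entries at once
    Eigen::ArrayXd deriv; // d(value)/d(parameter), seeded by the caller
};

// chain rule for exp, applied element-wise: d/dp exp(v) = exp(v) * dv/dp
inline AutoDiffArray exp(const AutoDiffArray& x)
{
    Eigen::ArrayXd v = x.value.exp();
    return {v, x.deriv * v};
}

// scaling by a constant scales value and derivative alike
inline AutoDiffArray operator*(const AutoDiffArray& x, double c)
{
    return {x.value * c, x.deriv * c};
}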

@gf712
Member Author

gf712 commented Nov 5, 2019

Benchmark of calculating the Gaussian kernel gradient with different numbers of training examples and feature dimensions:

--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
BM_Autodiff_eigen/64/4         0.203 ms        0.146 ms         5107
BM_Autodiff_eigen/256/4        0.451 ms        0.389 ms         1825
BM_Autodiff_eigen/1024/4        1.54 ms         1.25 ms          568
BM_Autodiff_eigen/2048/4        2.84 ms         2.35 ms          301
BM_Autodiff_eigen/64/16        0.468 ms        0.403 ms         1721
BM_Autodiff_eigen/256/16        1.39 ms         1.35 ms          460
BM_Autodiff_eigen/1024/16       4.84 ms         4.60 ms          149
BM_Autodiff_eigen/2048/16       9.12 ms         8.82 ms           74
BM_Manualdiff/64/4             0.198 ms        0.198 ms         2901
BM_Manualdiff/256/4            0.501 ms        0.499 ms         1369
BM_Manualdiff/1024/4            1.44 ms         1.40 ms          439
BM_Manualdiff/2048/4            2.77 ms         2.65 ms          269
BM_Manualdiff/64/16            0.506 ms        0.468 ms         1559
BM_Manualdiff/256/16            1.70 ms         1.46 ms          509
BM_Manualdiff/1024/16           5.27 ms         5.22 ms          100
BM_Manualdiff/2048/16           9.16 ms         9.14 ms           72

// manual derivative (previous code):
derivative(j, k) = std::exp(-element) * element * 2.0;
// Eigen::AutoDiffScalar version (this PR): seed d(log_width)/d(log_width) = 1,
// then evaluate the kernel once; its derivative is carried along automatically
eigen_log_width.derivatives() = EigenScalar::Unit(1,0);
auto el = CShiftInvariantKernel::distance(j, k);
Eigen::AutoDiffScalar<EigenScalar> kernel = exp(-el / (exp(eigen_log_width * 2.0) * 2.0));
Member

It kinda sucks that the code for the kernel needs to be in here as well as in the kernel itself, don't you think?

  require(lhs, "Left hand side features must be set!");
- require(rhs, "Rightt hand side features must be set!");
+ require(rhs, "Right hand side features must be set!");

if (!strcmp(param->m_name, "log_width"))
Member

So I guess a next step could be to start thinking about getting rid of this explicit code, and instead offering this derivative automatically through registering something in the ctors ...

Member Author

I think for that we should maybe have a base class for classes with parameters that we can take the derivative wrt. This class would register the gradient parameters in some vector, and then we can get the index from there.
Basically, when we do watch_param(...), this would add the variable to such a vector if it has the GRADIENT flag.
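A self-contained sketch of that registry idea (hypothetical helper class, not existing Shogun code; in Shogun the registration would presumably hook into watch_param(...) when the GRADIENT flag is set):

#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Tracks which parameters can be differentiated wrt and assigns each one an
// index, which autodiff code can later use to seed/read the right adjoint slot.
class DifferentiableParams
{
public:
    std::size_t register_gradient_param(const std::string& name)
    {
        auto it = m_index.find(name);
        if (it != m_index.end())
            return it->second;
        const std::size_t idx = m_names.size();
        m_names.push_back(name);
        m_index.emplace(name, idx);
        return idx;
    }

    std::size_t index_of(const std::string& name) const { return m_index.at(name); }
    std::size_t num_params() const { return m_names.size(); }

private:
    std::vector<std::string> m_names;
    std::unordered_map<std::string, std::size_t> m_index;
};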

using EigenScalar = Eigen::Matrix<float64_t, 1, 1>;
Eigen::AutoDiffScalar<EigenScalar> kernel = kernel_function(j, k);
// 0 is the index of the width parameter
derivative(j, k) = kernel.derivatives()(0);
Member

this seems pretty automatable to me (at least say for scalar valued kernels with scalar parameters)

auto CGaussianKernel::kernel_function(int32_t idx_a, int32_t idx_b)
{
// this could be written as Eigen::Matrix<float64_t, n_differentiable_params, 1>;
m_eigen_log_width = Eigen::AutoDiffScalar<Eigen::VectorXd>(m_log_width);
Member Author

@karlnapf this should fix the temporary being deleted. Ideally we would replace m_log_width with this... because as it is, this isn't thread safe.

Member Author

And for the distance matrix we wouldn't have this issue, because the features object holds a reference to the data!

// previous code:
float64_t result=distance(idx_a, idx_b);
return std::exp(-result);
// updated code (this PR): the autodiff path is guarded by a critical section
float64_t result;
#pragma omp critical
Member Author

@karlnapf the issue now is that this is not thread safe anymore

Member Author

Because I assign the value of m_eigen_log_width inside kernel_function. Basically I would have to do the assignment outside the function, but if a user does ->put("log_width", ..) there is no way of tracking this... hence the gradient-dependent parameters would have to be added to the parameter framework somehow. Or AutoDiffScalar would have to hold a reference instead of a value...
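One possible workaround, sketched against the PR's own kernel_function (a sketch, not a definitive fix): seed a local AutoDiffScalar with a fixed-size derivative vector on every call, so nothing shared is written and nothing is heap-allocated inside the parallel loop; reading m_log_width each time also means a ->put("log_width", ...) is picked up automatically.

using EigenScalar = Eigen::Matrix<float64_t, 1, 1>;

Eigen::AutoDiffScalar<EigenScalar>
CGaussianKernel::kernel_function(int32_t idx_a, int32_t idx_b)
{
    // per-call seed: d(log_width)/d(log_width) = 1 in slot 0
    Eigen::AutoDiffScalar<EigenScalar> log_width(
        m_log_width, EigenScalar::Unit(1, 0));
    const float64_t el = CShiftInvariantKernel::distance(idx_a, idx_b);
    return exp(-el / (exp(log_width * 2.0) * 2.0));
}

The trade-off is that the seed is rebuilt for every entry instead of once per gradient computation.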

@@ -174,6 +185,7 @@ class CGaussianKernel: public CShiftInvariantKernel
protected:
/** width */
float64_t m_log_width;
Eigen::AutoDiffScalar<Eigen::VectorXd> m_eigen_log_width;
Member

This idea is actually not too bad: just make the parameters one can diff wrt class members. They just wrap the other member, if I understand correctly? And these two getting out of sync is the problem you mean with the user putting a parameter? The callback stuff might help? Should be possible to automate something for that, or?

Member

@karlnapf karlnapf Nov 7, 2019

A more consistent version of this would be to make all model parameters live in Eigen, and have all the "model" functions simply return composed versions of those ... but I guess that goes too far ;)

@@ -123,6 +125,15 @@ class CGaussianKernel: public CShiftInvariantKernel
return std::exp(m_log_width * 2.0) * 2.0;
}

#ifndef SWIG
/**
* Returns a lazily evaluated Eigen expression template
Collaborator

If it is already evaluated, then it won't be lazily evaluated (at least not in the sense of deferring evaluation until after this method has executed).

What about dropping the "lazily evaluated" part, or rephrasing it to "to be lazily evaluated" if you really want to keep it?

Member Author

Yes, good point, I'll change that.

// this could be written as
// eigen_log_width.derivatives() = EigenScalar::Unit(n_differentiable_params, i);
// where i is the idx of the adjoint
m_eigen_log_width.derivatives() = Eigen::VectorXd::Unit(1,0);
Collaborator

This is the derivative of the log width wrt the log width itself?

Member Author

This is the derivative of the kernel function wrt the log width, df/dw. It gives the same result as the manual derivative that was in the loop in get_parameter_gradient before.

Collaborator

Then the kernel function should be equal to the log width (for the derivative to be one), no?

Member Author

What do you mean? The derivative is set to one so that, when you apply the chain rule, you get the derivative wrt this parameter, i.e. you initialise the dependent variable. Like shown here: https://eigen.tuxfamily.org/bz_attachmentbase/attachment.cgi?id=395. The same happens in Stan's gradient function: https://github.com/stan-dev/math/blob/b42294a57318a27d2b6723b96a0b620485ba83e0/stan/math/rev/core/grad.hpp#L39
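For reference, writing out the chain rule for the parameterisation used here shows why the seed of one reproduces the manual expression from get_parameter_gradient (with $w$ the log width and $d = \mathrm{distance}(j,k)$):

$$k(w) = \exp(-u), \qquad u = \frac{d}{2\exp(2w)}, \qquad \frac{\partial u}{\partial w} = -2u$$

$$\frac{\partial k}{\partial w} = -\exp(-u)\,\frac{\partial u}{\partial w} = 2\,u\,\exp(-u)$$

which is exactly the manual std::exp(-element) * element * 2.0; seeding $\partial w/\partial w = 1$ is what makes the chain rule accumulate that product.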

@gf712
Member Author

gf712 commented Feb 15, 2020

Closing this in favour of the work being done on graphs.

@gf712 gf712 closed this Feb 15, 2020