autodiff gaussian width parameter #4782

Closed · wants to merge 9 commits
37 changes: 31 additions & 6 deletions src/shogun/kernel/GaussianKernel.cpp
@@ -12,6 +12,7 @@
#include <shogun/distance/EuclideanDistance.h>
#include <shogun/mathematics/Math.h>


using namespace shogun;

CGaussianKernel::CGaussianKernel() : CShiftInvariantKernel()
@@ -86,23 +87,42 @@ void CGaussianKernel::set_width(float64_t w)
{
require(w>0, "width ({}) must be positive",w);
m_log_width = std::log(w / 2.0) / 2.0;
m_eigen_log_width.value() = m_log_width;
}

auto CGaussianKernel::kernel_function(int32_t idx_a, int32_t idx_b)
{
// this could be written as Eigen::Matrix<float64_t, n_differentiable_params, 1>;
m_eigen_log_width.value() = m_log_width;

// this could be written as
// eigen_log_width.derivatives() = EigenScalar::Unit(n_differentiable_params, i);
// where i is the idx of the adjoint
m_eigen_log_width.derivatives() = Eigen::VectorXd::Unit(1,0);
Collaborator: This is the derivative of the log width wrt the log width itself?

Member Author: This is the derivative of the kernel function wrt the log width, df/dw. It gives the same result as the manual derivative that was computed in the loop in get_parameter_gradient before.

Collaborator: Then the kernel function should be equal to the log width (for the derivative to be one), no?

Member Author: What do you mean? The derivative is set to one so that, when the chain rule is applied, you get the gradient wrt this parameter, i.e. you initialise (seed) the variable you are differentiating with respect to, as shown here: https://eigen.tuxfamily.org/bz_attachmentbase/attachment.cgi?id=395. The same happens in Stan's gradient function: https://github.com/stan-dev/math/blob/b42294a57318a27d2b6723b96a0b620485ba83e0/stan/math/rev/core/grad.hpp#L39
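
A minimal, standalone sketch of that seeding pattern using Eigen's AutoDiffScalar directly (illustrative only: the distance value and variable names are made up, and this is not Shogun code):

#include <iostream>
#include <Eigen/Core>
#include <unsupported/Eigen/AutoDiff>

int main()
{
    using ADScalar = Eigen::AutoDiffScalar<Eigen::VectorXd>;

    const double dist = 1.5;   // stand-in for the (fixed) squared distance
    ADScalar log_width(0.3);   // the variable we differentiate with respect to

    // Seed d(log_width)/d(log_width) = 1; every expression built from
    // log_width then carries its derivative wrt log_width via the chain rule.
    log_width.derivatives() = Eigen::VectorXd::Unit(1, 0);

    ADScalar k = exp(-dist / (exp(log_width * 2.0) * 2.0));

    std::cout << "k        = " << k.value() << "\n";
    std::cout << "dk/dlogw = " << k.derivatives()(0) << "\n";
    // analytic check: dk/dlogw = k * dist / exp(2 * log_width)
    return 0;
}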


auto el = CShiftInvariantKernel::distance(idx_a, idx_b);
return exp(-el / (exp(m_eigen_log_width * 2.0) * 2.0));
}

SGMatrix<float64_t> CGaussianKernel::get_parameter_gradient(const TParameter* param, index_t index)
{
using std::exp;

require(lhs, "Left hand side features must be set!");
require(rhs, "Rightt hand side features must be set!");
require(rhs, "Right hand side features must be set!");

if (!strcmp(param->m_name, "log_width"))
Member: So I guess a next step could be to start thinking about getting rid of this explicit code, and instead offering this derivative automatically by registering something in the ctors...

Member Author: I think for that we should maybe have a base class for classes with parameters that we can take derivatives wrt. This class registers the gradient parameters in some vector, and then we can get the index from there. Basically, when we do watch_param(...), this would add the variable to such a vector if it has the GRADIENT flag.
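
A hypothetical sketch of such a registry (not Shogun's actual parameter framework; the class and method names are invented, and watch_param / GRADIENT are only referenced by analogy):

#include <algorithm>
#include <string>
#include <vector>

// Base-class idea: kernel ctors register every differentiable parameter,
// and get_parameter_gradient can later look up its adjoint index by name.
class DifferentiableParams
{
public:
    void register_gradient_param(const std::string& name)
    {
        m_gradient_params.push_back(name);
    }

    // Index to seed, e.g. derivatives() = VectorXd::Unit(num_gradient_params(), idx).
    int gradient_index(const std::string& name) const
    {
        auto it = std::find(m_gradient_params.begin(), m_gradient_params.end(), name);
        return it == m_gradient_params.end()
                   ? -1
                   : static_cast<int>(it - m_gradient_params.begin());
    }

    int num_gradient_params() const
    {
        return static_cast<int>(m_gradient_params.size());
    }

private:
    std::vector<std::string> m_gradient_params;
};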

{
SGMatrix<float64_t> derivative=SGMatrix<float64_t>(num_lhs, num_rhs);

for (int k=0; k<num_rhs; k++)
{
#pragma omp parallel for
// #pragma omp parallel for
for (int j=0; j<num_lhs; j++)
{
float64_t element=distance(j, k);
derivative(j, k) = std::exp(-element) * element * 2.0;
auto kernel = kernel_function(j, k);
// 0 is the index of the width parameter
derivative(j, k) = kernel.derivatives()(0);
Member: This seems pretty automatable to me (at least, say, for scalar-valued kernels with scalar parameters).
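
A hedged sketch of what that automation might look like (illustrative, not Shogun API): a helper templated on any callable that returns a seeded Eigen::AutoDiffScalar; parameter_gradient, kernel_fn and param_idx are invented names.

#include <Eigen/Core>

// kernel_fn(j, k) is assumed to return an Eigen::AutoDiffScalar whose
// derivatives() vector was already seeded for the parameters of interest.
// param_idx selects the adjoint, e.g. 0 for the log width as in this PR.
template <typename KernelFn>
Eigen::MatrixXd parameter_gradient(KernelFn&& kernel_fn,
                                   int num_lhs, int num_rhs, int param_idx)
{
    Eigen::MatrixXd derivative(num_lhs, num_rhs);
    for (int k = 0; k < num_rhs; k++)
        for (int j = 0; j < num_lhs; j++)
            derivative(j, k) = kernel_fn(j, k).derivatives()(param_idx);
    return derivative;
}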

}
}
return derivative;
@@ -116,9 +136,14 @@ SGMatrix<float64_t> CGaussianKernel::get_parameter_gradient(const TParameter* pa

float64_t CGaussianKernel::compute(int32_t idx_a, int32_t idx_b)
{
float64_t result=distance(idx_a, idx_b);
return std::exp(-result);
float64_t result;
#pragma omp critical
Member Author: @karlnapf the issue now is that this is not thread-safe anymore.

Member Author: That's because I assign the value of m_eigen_log_width in kernel_function. Basically, the assignment would have to happen outside the function, but if a user does ->put("log_width", ..) there is no way of tracking this, so the gradient-dependent parameters would have to be added to the parameter framework somehow. Or AutoDiffScalar would have to hold a reference instead of a value.
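
One possible way around that race, sketched standalone under the assumption that the seeded scalar can be passed in read-only (gaussian_kernel, fill_kernel_matrix and sq_dists are invented names; this is not the PR's code):

#include <Eigen/Core>
#include <unsupported/Eigen/AutoDiff>

using ADScalar = Eigen::AutoDiffScalar<Eigen::VectorXd>;

// sq_dist stands in for CShiftInvariantKernel::distance(idx_a, idx_b).
ADScalar gaussian_kernel(double sq_dist, const ADScalar& log_width)
{
    return exp(-sq_dist / (exp(log_width * 2.0) * 2.0));
}

void fill_kernel_matrix(Eigen::MatrixXd& K, const Eigen::MatrixXd& sq_dists,
                        double log_width)
{
    ADScalar w(log_width);
    w.derivatives() = Eigen::VectorXd::Unit(1, 0); // seeded once, then only read

    const int rows = static_cast<int>(K.rows());
    const int cols = static_cast<int>(K.cols());

#pragma omp parallel for
    for (int j = 0; j < rows; j++)
        for (int k = 0; k < cols; k++)
            K(j, k) = gaussian_kernel(sq_dists(j, k), w).value();
}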

{
auto kernel = kernel_function(idx_a, idx_b);
result = kernel.value();
}
return result;
}

void CGaussianKernel::load_serializable_post() noexcept(false)
{
12 changes: 12 additions & 0 deletions src/shogun/kernel/GaussianKernel.h
@@ -11,6 +11,8 @@

#include <shogun/lib/config.h>
#include <shogun/kernel/ShiftInvariantKernel.h>
#include <shogun/mathematics/eigen3.h>
#include <unsupported/Eigen/AutoDiff>

namespace shogun
{
@@ -123,6 +125,15 @@ class CGaussianKernel: public CShiftInvariantKernel
return std::exp(m_log_width * 2.0) * 2.0;
}

#ifndef SWIG
/**
* Returns a lazily evaluated Eigen expression template
Collaborator: If it is already evaluated, then it won't be lazily evaluated (at least not in the sense of deferring evaluation until after this method has executed). What about dropping the "lazily evaluated" part, or rewording it to "to be lazily evaluated" if you really want to keep it?

Member Author: Yes, good point, I'll change that.

* that tracks the gradient of differentiable hyperparameters
*
* @return expression template of kernel function
*/
auto kernel_function(int32_t idx_a, int32_t idx_b);
#endif
/** return derivative with respect to specified parameter
*
* @param param the parameter
Expand Down Expand Up @@ -174,6 +185,7 @@ class CGaussianKernel: public CShiftInvariantKernel
protected:
/** width */
float64_t m_log_width;
Eigen::AutoDiffScalar<Eigen::VectorXd> m_eigen_log_width;
Member: This idea is actually not too bad: just make the parameters one can diff wrt into class members. They just wrap the other member, if I understand correctly? These two getting out of sync is the problem you mean with the user putting a parameter? Callback stuff might help? It should be possible to automate something for that, or?

Member (karlnapf, Nov 7, 2019): A more thoroughgoing version of this would be to make all model parameters live in Eigen, and have all the "model" functions simply return composed versions of those... but I guess that goes too far ;)
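
A hypothetical illustration of the callback idea raised in this thread (not Shogun's parameter framework; SyncedParam and on_change are invented names): every write to the plain value goes through one hook that also refreshes the Eigen copy, so setters and put-style updates cannot drift apart.

#include <functional>
#include <utility>

class SyncedParam
{
public:
    SyncedParam(double value, std::function<void(double)> on_change)
        : m_value(value), m_on_change(std::move(on_change))
    {
        m_on_change(m_value);
    }

    // Every write path (setter, put-style update, deserialization) funnels here.
    void set(double v)
    {
        m_value = v;
        m_on_change(v); // e.g. refresh the wrapped AutoDiffScalar's value()
    }

    double get() const { return m_value; }

private:
    double m_value;
    std::function<void(double)> m_on_change;
};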

};

}