New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace lapack with linalg in MahalanobisDistance #4085
Conversation
Great! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks for that.
Is the mahalanobis covered by an integration test? would be something that matches integration_meta_cpp*mahalanobis*
I think we could also use linalg in ridge regression, rather than pure eigen3
cmake/FindMetaExamples.cmake
Outdated
@@ -19,7 +19,6 @@ function(get_excluded_meta_examples) | |||
read_config_header(${shogun_INCLUDE_DIR}/shogun/lib/config.h) | |||
is_set(${SHOGUN_CONFIG} HAVE_NLOPT) | |||
is_set(${SHOGUN_CONFIG} USE_GPL_SHOGUN) | |||
is_set(${SHOGUN_CONFIG} HAVE_LAPACK) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we want to keep that, there might be other examples
cmake/FindMetaExamples.cmake
Outdated
@@ -37,13 +36,6 @@ function(get_excluded_meta_examples) | |||
) | |||
ENDIF() | |||
|
|||
IF(NOT HAVE_LAPACK) | |||
LIST(APPEND EXCLUDED_META_EXAMPLES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please keep the block, so we can add things there that pop up in the future, just don't append anything
@@ -8,7 +8,6 @@ | |||
*/ | |||
#include <shogun/lib/config.h> | |||
|
|||
#ifdef HAVE_LAPACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah so this guard was just outdated?
I think we should use linalg rather than eigen3 directly inside though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, just outdated. I will look into that to find the possibility of using pure linalg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found eigen_kernel_matrix.diagonal() += eigen_tau;
. We are not able to implement adding a vector to a diagonal with SGMatrix or linalg. Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leave it as it is for now (you can already change the rest)
Then you could create this method in linalg, and then send another PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, you could just add a loop for now.
both |
I think the LAPACK guard was still necessary. The windows build doesnt have lapack installed, and here you go:
|
It is this line: But actually this code is bad generally (not your fault). One should never ever store matrix inverse. Instead, it should store a cholesky factorization of the covariance matrix. Then, rather than multiplying the inverse covariance matrix to a vector in the |
The matrix is PSD indeed, so LDLT would be more stable. I think adding an argument to cholesky would be good, so that the factorization can operate in both modes, something like or Good catch! |
Seems simply adding a |
You are absolutely right. In fact I now remember a discussion we previously had about this some years back. I think a clean solution would be to expose a new method and a few solver method as well. Everything else is too messy. |
We did have some discussion years ago. The best way seems to be having separate methods for |
I see. I'll try to implement that. |
d2ad7c8
to
b2206e6
Compare
* @see <a href="http://en.wikipedia.org/wiki/Mahalanobis_distance"> | ||
* Wikipedia: Mahalanobis Distance</a> | ||
*/ | ||
class CMahalanobisDistance : public CRealDistance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW why this reformatting?
Pls don't put this into a PR that changes small bits as this one. Makes everything much harder to review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems the style checker changed other codes than mine. I'm using command like git clang-format-3.8 --commit xxx
, is it right ?
eigen_kernel_matrix = eigen_feats_matrix*eigen_feats_matrix.transpose(); | ||
|
||
eigen_kernel_matrix.diagonal() += eigen_tau; | ||
linalg::matrix_prod(feats_matrix, feats_matrix, kernel_matrix, false, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice much cleaner
return false; | ||
} | ||
eigen_y = llt.solve(eigen_y); | ||
auto decomposition = linalg::cholesky_factor(kernel_matrix); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you change the indentation? Is this the style cheker?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noticed that the original code is indented with spaces.
@@ -0,0 +1,58 @@ | |||
/* | |||
* Copyright (c) The Shogun Machine Learning Toolbox | |||
* Written (w) 2018 Wuwei Lin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We changed the license header recently, to something much shorter. Check other files for the template ...
Could you pls split the PR into two: one for regression and one for the distance. |
yep, you can just force update this one now |
But you will need to fix the "inverse" problem (see above) first. |
a2211df
to
caa2902
Compare
caa2902
to
fb42c9a
Compare
|
||
if ( l == r) | ||
{ | ||
mean = ((CDenseFeatures<float64_t>*) l)->get_mean(); | ||
icov = ((CDenseFeatures<float64_t>*) l)->get_cov(); | ||
cov = ((CDenseFeatures<float64_t>*)l)->get_cov(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while you are changing these lines, could you please replace these explicit casts to dynamic_cast
at least plz? :)
or even better do a REQUIRED
assertation for the type
} | ||
else | ||
{ | ||
mean = ((CDenseFeatures<float64_t>*)l) | ||
->compute_mean((CDotFeatures*)lhs, (CDotFeatures*)rhs); | ||
icov = CDotFeatures::compute_cov((CDotFeatures*) lhs, (CDotFeatures*) rhs); | ||
cov = CDotFeatures::compute_cov((CDotFeatures*)lhs, (CDotFeatures*)rhs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another casting
|
||
float64_t result = cblas_ddot(v.vlen, v.vector, 1, diff.vector, 1); | ||
auto v = linalg::ldlt_solver(chol_cov_L, chol_cov_d, chol_cov_p, diff); | ||
float64_t result = linalg::dot(v, diff); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be auto or? :)
@@ -39,20 +37,43 @@ bool CMahalanobisDistance::init(CFeatures* l, CFeatures* r) | |||
{ | |||
CRealDistance::init(l, r); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this call covers all the checks, so we don't need them again in this call (just convolutes the code)
@vigsterkr both REAL and DENSE is checked. That was the whole point of having the base class here iirc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do a cast after the call and add a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would actually change the design of this and have instead of init returning a boolean the actual typecasted values available... as i dont see really the point of an assertation failure that will just throw an exception and in case of success have a true :)
this function call should actually tell us that yes the types are cool and here you go this is what you wanted anyways
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vinx13 you can ignore most of this... just make sure that you do an ASSERT(CRealDistance::init(l, r)); and add a //FIXME: above with reference to the above comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vigsterkr I mean this whole type system in the features is messed up. I would just rely on dynamic_cast
everywhere. The bool is also pointless. And the enums are even worse (too many types, I would make those strings or even better drop them)
"Right hand side (was %s) has to be CDenseFeatures instance.\n", | ||
rhs->get_name()); | ||
|
||
REQUIRE( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and actually this is implicitly tested in CRealDistance::init
https://github.com/vinx13/shogun/blob/39e7dd2f390d13e665678be23b0b0221f1212033/src/shogun/distance/RealDistance.h#L34
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha I wrote the same thing above ;) sorry didnt see
@@ -63,9 +84,10 @@ void CMahalanobisDistance::cleanup() | |||
|
|||
float64_t CMahalanobisDistance::compute(int32_t idx_a, int32_t idx_b) | |||
{ | |||
auto feat_l = static_cast<CDenseFeatures<float64_t>*>(lhs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is reliable due to old code design principles by sören
@@ -39,20 +37,43 @@ bool CMahalanobisDistance::init(CFeatures* l, CFeatures* r) | |||
{ | |||
CRealDistance::init(l, r); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of the REQUIRED sections it should be just check the return type of CRealDistance::init(l, r)
as it checks for everything you need want....
diff.vlen, diff.vector, 1, 0.0, v.vector, 1); | ||
|
||
float64_t result = cblas_ddot(v.vlen, v.vector, 1, diff.vector, 1); | ||
auto v = linalg::ldlt_solver(chol_cov_L, chol_cov_d, chol_cov_p, diff); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is much much from a numerical perspective!
@karlnapf could you plz take a look |
@@ -37,22 +35,31 @@ CMahalanobisDistance::~CMahalanobisDistance() | |||
|
|||
bool CMahalanobisDistance::init(CFeatures* l, CFeatures* r) | |||
{ | |||
CRealDistance::init(l, r); | |||
// FIXME: See comments in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixme?
* Fix broken link in doc * Use linalg in MahalanobisDistance * Apply autoformatter * Remove MehalanobisDistance from LAPACK guarded list * Add lapack guard back * Update license header style * Save ldlt instead of inverse of cov * Replace explict casts and add type check * Remove unnecessary type checks
This PR replaces lapack usage with linalg in
MahalanobisDistance
.A unit test of
MahalanobisDistance
is added (reference computed with Matlab)