Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace lapack with linalg in MahalanobisDistance #4085

Merged
merged 9 commits into from Feb 19, 2018

Conversation

vinx13
Copy link
Member

@vinx13 vinx13 commented Jan 16, 2018

This PR replaces lapack usage with linalg in MahalanobisDistance.
A unit test of MahalanobisDistance is added (reference computed with Matlab)

@karlnapf
Copy link
Member

Great!

Copy link
Member

@karlnapf karlnapf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for that.

Is the mahalanobis covered by an integration test? would be something that matches integration_meta_cpp*mahalanobis*

I think we could also use linalg in ridge regression, rather than pure eigen3

@@ -19,7 +19,6 @@ function(get_excluded_meta_examples)
read_config_header(${shogun_INCLUDE_DIR}/shogun/lib/config.h)
is_set(${SHOGUN_CONFIG} HAVE_NLOPT)
is_set(${SHOGUN_CONFIG} USE_GPL_SHOGUN)
is_set(${SHOGUN_CONFIG} HAVE_LAPACK)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we want to keep that, there might be other examples

@@ -37,13 +36,6 @@ function(get_excluded_meta_examples)
)
ENDIF()

IF(NOT HAVE_LAPACK)
LIST(APPEND EXCLUDED_META_EXAMPLES
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please keep the block, so we can add things there that pop up in the future, just don't append anything

@@ -8,7 +8,6 @@
*/
#include <shogun/lib/config.h>

#ifdef HAVE_LAPACK
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah so this guard was just outdated?

I think we should use linalg rather than eigen3 directly inside though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just outdated. I will look into that to find the possibility of using pure linalg.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found eigen_kernel_matrix.diagonal() += eigen_tau;. We are not able to implement adding a vector to a diagonal with SGMatrix or linalg. Any suggestions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave it as it is for now (you can already change the rest)
Then you could create this method in linalg, and then send another PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, you could just add a loop for now.

@vinx13
Copy link
Member Author

vinx13 commented Jan 16, 2018

both MahalanobisDistance and LinearRidgeRegression are covered by integration tests.

@karlnapf
Copy link
Member

I think the LAPACK guard was still necessary.
BTW if you remove such guards, make sure that Shogun compiles if the library that you guarded is not available. Really important.

The windows build doesnt have lapack installed, and here you go:

[00:29:53]   C:\projects\shogun\src\shogun\distance\MahalanobisDistance.cpp(56): error C2039: 'inverse': is not a member of 'shogun::SGMatrix<float64_t>' [C:\projects\shogun\build\src\shogun\libshogun.vcxproj]
[00:29:53]   C:\projects\shogun\src\shogun\distance\MahalanobisDistance.cpp(56): error C3861: 'inverse': identifier not found [C:\projects\shogun\build\src\shogun

@karlnapf
Copy link
Member

karlnapf commented Jan 17, 2018

It is this line:
https://github.com/shogun-toolbox/shogun/blob/develop/src/shogun/distance/MahalanobisDistance.cpp#L58

But actually this code is bad generally (not your fault). One should never ever store matrix inverse. Instead, it should store a cholesky factorization of the covariance matrix. Then, rather than multiplying the inverse covariance matrix to a vector in the CMahalanobisDistance::compute method, you should do a cholesky (triangular) solve. See (https://github.com/shogun-toolbox/shogun/blob/develop/src/shogun/classifier/LDA.cpp#L180 for an example how to do this in linalg

@vinx13
Copy link
Member Author

vinx13 commented Jan 18, 2018

@karlnapf Since cov matrix is PSD, we should use LDLT (linalg is using LLT) for Cholesky decomposition, right? Shall I add it to linalg?

@karlnapf
Copy link
Member

The matrix is PSD indeed, so LDLT would be more stable.
https://eigen.tuxfamily.org/dox/group__TutorialLinearAlgebra.html

I think adding an argument to cholesky would be good, so that the factorization can operate in both modes, something like or use_ldlt which by default is false. Also make sure to update the docs appropriately so it is clear what is used.

Good catch!

@vinx13
Copy link
Member Author

vinx13 commented Jan 18, 2018

Seems simply adding a use_ldlt option will not work. In LDLT, we want both L and D. One solution is to overload cholesky_factor as SGMatrix<T> cholesky_factor(const SGMatrix<T>& A, SGMatrix<T> &D). Since D is diagonal, alternatively we could use a SGVector. Besides, we may need to add a LDLT solver.

@karlnapf
Copy link
Member

You are absolutely right. In fact I now remember a discussion we previously had about this some years back. I think a clean solution would be to expose a new method and a few solver method as well. Everything else is too messy.

@OXPHOS
Copy link
Member

OXPHOS commented Jan 20, 2018

We did have some discussion years ago. The best way seems to be having separate methods for LLT and LDLT. And since now the linalg methods are split into several files, having more than one cholesky decomposition or solver would be less messy...

@vinx13
Copy link
Member Author

vinx13 commented Jan 20, 2018

I see. I'll try to implement that.

* @see <a href="http://en.wikipedia.org/wiki/Mahalanobis_distance">
* Wikipedia: Mahalanobis Distance</a>
*/
class CMahalanobisDistance : public CRealDistance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW why this reformatting?
Pls don't put this into a PR that changes small bits as this one. Makes everything much harder to review

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems the style checker changed other codes than mine. I'm using command like git clang-format-3.8 --commit xxx , is it right ?

eigen_kernel_matrix = eigen_feats_matrix*eigen_feats_matrix.transpose();

eigen_kernel_matrix.diagonal() += eigen_tau;
linalg::matrix_prod(feats_matrix, feats_matrix, kernel_matrix, false, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice much cleaner

return false;
}
eigen_y = llt.solve(eigen_y);
auto decomposition = linalg::cholesky_factor(kernel_matrix);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you change the indentation? Is this the style cheker?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed that the original code is indented with spaces.

@@ -0,0 +1,58 @@
/*
* Copyright (c) The Shogun Machine Learning Toolbox
* Written (w) 2018 Wuwei Lin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We changed the license header recently, to something much shorter. Check other files for the template ...

@karlnapf
Copy link
Member

karlnapf commented Feb 5, 2018

Could you pls split the PR into two: one for regression and one for the distance.
You still haven't checked whether shogun compiles with the guard remove and no lapack installed.
Pls don't remove guards without checking whether things still work: https://ci.appveyor.com/project/vigsterkr/shogun/build/1873#L6478

@vinx13
Copy link
Member Author

vinx13 commented Feb 5, 2018

@karlnapf Created a new PR for LinearRidgeRegression (#4149 ). I will wait until #4130 be merged before I can continue with the refactor of MahalanobisDistance. Can I force update this PR now to drop commits with regression ?

@karlnapf
Copy link
Member

karlnapf commented Feb 5, 2018

yep, you can just force update this one now

@karlnapf
Copy link
Member

karlnapf commented Feb 5, 2018

But you will need to fix the "inverse" problem (see above) first.

@vinx13 vinx13 changed the title Replace lapack with linalg Replace lapack with linalg in MahalanobisDistance Feb 5, 2018

if ( l == r)
{
mean = ((CDenseFeatures<float64_t>*) l)->get_mean();
icov = ((CDenseFeatures<float64_t>*) l)->get_cov();
cov = ((CDenseFeatures<float64_t>*)l)->get_cov();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while you are changing these lines, could you please replace these explicit casts to dynamic_cast at least plz? :)
or even better do a REQUIRED assertation for the type

}
else
{
mean = ((CDenseFeatures<float64_t>*)l)
->compute_mean((CDotFeatures*)lhs, (CDotFeatures*)rhs);
icov = CDotFeatures::compute_cov((CDotFeatures*) lhs, (CDotFeatures*) rhs);
cov = CDotFeatures::compute_cov((CDotFeatures*)lhs, (CDotFeatures*)rhs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another casting


float64_t result = cblas_ddot(v.vlen, v.vector, 1, diff.vector, 1);
auto v = linalg::ldlt_solver(chol_cov_L, chol_cov_d, chol_cov_p, diff);
float64_t result = linalg::dot(v, diff);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be auto or? :)

@shogun-toolbox shogun-toolbox deleted a comment from vigsterkr Feb 6, 2018
@shogun-toolbox shogun-toolbox deleted a comment from vigsterkr Feb 6, 2018
@@ -39,20 +37,43 @@ bool CMahalanobisDistance::init(CFeatures* l, CFeatures* r)
{
CRealDistance::init(l, r);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this call covers all the checks, so we don't need them again in this call (just convolutes the code)
@vigsterkr both REAL and DENSE is checked. That was the whole point of having the base class here iirc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do a cast after the call and add a comment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would actually change the design of this and have instead of init returning a boolean the actual typecasted values available... as i dont see really the point of an assertation failure that will just throw an exception and in case of success have a true :)
this function call should actually tell us that yes the types are cool and here you go this is what you wanted anyways

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vinx13 you can ignore most of this... just make sure that you do an ASSERT(CRealDistance::init(l, r)); and add a //FIXME: above with reference to the above comment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vigsterkr I mean this whole type system in the features is messed up. I would just rely on dynamic_cast everywhere. The bool is also pointless. And the enums are even worse (too many types, I would make those strings or even better drop them)

"Right hand side (was %s) has to be CDenseFeatures instance.\n",
rhs->get_name());

REQUIRE(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha I wrote the same thing above ;) sorry didnt see

@@ -63,9 +84,10 @@ void CMahalanobisDistance::cleanup()

float64_t CMahalanobisDistance::compute(int32_t idx_a, int32_t idx_b)
{
auto feat_l = static_cast<CDenseFeatures<float64_t>*>(lhs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is reliable due to old code design principles by sören

@@ -39,20 +37,43 @@ bool CMahalanobisDistance::init(CFeatures* l, CFeatures* r)
{
CRealDistance::init(l, r);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of the REQUIRED sections it should be just check the return type of CRealDistance::init(l, r) as it checks for everything you need want....

diff.vlen, diff.vector, 1, 0.0, v.vector, 1);

float64_t result = cblas_ddot(v.vlen, v.vector, 1, diff.vector, 1);
auto v = linalg::ldlt_solver(chol_cov_L, chol_cov_d, chol_cov_p, diff);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is much much from a numerical perspective!

@vinx13
Copy link
Member Author

vinx13 commented Feb 18, 2018

@karlnapf could you plz take a look

@@ -37,22 +35,31 @@ CMahalanobisDistance::~CMahalanobisDistance()

bool CMahalanobisDistance::init(CFeatures* l, CFeatures* r)
{
CRealDistance::init(l, r);
// FIXME: See comments in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixme?

@karlnapf karlnapf merged commit bb72427 into shogun-toolbox:develop Feb 19, 2018
ktiefe pushed a commit to ktiefe/shogun that referenced this pull request Jul 30, 2019
* Fix broken link in doc

* Use linalg in MahalanobisDistance

* Apply autoformatter

* Remove MehalanobisDistance from LAPACK guarded list

* Add lapack guard back

* Update license header style

* Save ldlt instead of inverse of cov

* Replace explict casts and add type check

* Remove unnecessary type checks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants