Simplify LRR via using CDenseFeatures interface for cov, gram, sum #4384

Merged
merged 1 commit on Aug 8, 2018

Conversation

karlnapf
Member

No description provided.

@karlnapf karlnapf requested a review from iglesias July 23, 2018 14:46
@@ -1001,6 +1001,42 @@ template< class ST > CDenseFeatures< ST >* CDenseFeatures< ST >::obtain_from_gen
return (CDenseFeatures< ST >*) base_features;
}

template <typename ST>
Member Author

I think it would be even cooler if we moved those methods to CDotFeatures and then specialized them here (in CDenseFeatures) for batch computation when no subsets are set (otherwise call CDotFeatures::cov, which has an iterative implementation).
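A self-contained illustration of that idea (plain C++ with made-up names, not the actual Shogun classes): the dot-feature base class keeps a generic, iterative cov(), and the dense subclass overrides it with a batch path that is only taken when no subset is active.

```cpp
// Hypothetical sketch of the suggested dispatch, not real Shogun code.
#include <cstddef>
#include <vector>

struct DotFeaturesLike
{
    virtual ~DotFeaturesLike() = default;
    virtual std::size_t num_vectors() const = 0;
    virtual std::size_t num_features() const = 0;
    virtual double entry(std::size_t feat, std::size_t vec) const = 0;

    // Iterative fallback: respects subsets, touches one entry at a time,
    // accumulates the (uncentered) D x D matrix X * X^T.
    virtual std::vector<double> cov() const
    {
        const std::size_t D = num_features(), N = num_vectors();
        std::vector<double> C(D * D, 0.0);
        for (std::size_t i = 0; i < N; ++i)
            for (std::size_t b = 0; b < D; ++b)
                for (std::size_t a = 0; a < D; ++a)
                    C[a + b * D] += entry(a, i) * entry(b, i);
        return C;
    }
};

struct DenseFeaturesLike : DotFeaturesLike
{
    std::vector<double> matrix; // column-major, D x N
    std::size_t D = 0, N = 0;
    bool subset_active = false;

    std::size_t num_vectors() const override { return N; }
    std::size_t num_features() const override { return D; }
    double entry(std::size_t f, std::size_t v) const override
    {
        return matrix[f + v * D];
    }

    std::vector<double> cov() const override
    {
        if (subset_active)
            return DotFeaturesLike::cov(); // iterative path, no matrix copy
        // Batch path on the raw buffer; in Shogun this would be a single
        // linalg/BLAS call on the whole feature matrix.
        std::vector<double> C(D * D, 0.0);
        for (std::size_t i = 0; i < N; ++i)
            for (std::size_t b = 0; b < D; ++b)
                for (std::size_t a = 0; a < D; ++a)
                    C[a + b * D] += matrix[a + i * D] * matrix[b + i * D];
        return C;
    }
};
```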

linalg::scale(x_bar, x_bar, 1.0 / ((float64_t)x_bar.size()));
auto intercept = linalg::mean(lab) - linalg::dot(lab, x_bar);
set_bias(intercept);
auto feats = data->as<CDenseFeatures<float64_t>>();
Member Author

The idea of this patch is that inside the algorithm, we only work against a very high-level interface of CFeatures (which deals with subsets etc.).
For labels, we always get the vector (which creates a copy for subsets). Since it is just a vector, I think this is fine, although we could also think about adding an interface to CDot/DenseFeatures that computes the dot product with a CDenseLabels, which would then be batch without subsets and iterative (no copy) if there are... probably too complicated for the marginal benefit of avoiding a single vector copy.


SGVector<float64_t> w;
if (N >= D)
w = solve_using_covariance<float64_t>(feats, y, m_tau);
Member Author

also generalise LRR for cases where D>N
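For reference, a rough sketch of what the D > N (gram-based) branch looks like, pieced together from the linalg calls that appear elsewhere in this diff; the name solve_using_gram and the exact signatures of gram() and dot() are assumptions, not the committed code.

```cpp
#include <shogun/features/DenseFeatures.h>
#include <shogun/mathematics/linalg/LinalgNamespace.h>

using namespace shogun;

// Sketch only: dual formulation of ridge regression against the N x N
// gram matrix instead of the D x D covariance.
template <typename T>
SGVector<T> solve_using_gram(CDenseFeatures<T>* feats, const SGVector<T>& y, T tau)
{
    auto gram = feats->gram();               // N x N, uncentered X^T X (assumed API)
    linalg::add_ridge(gram, tau);            // gram + tau * I_N
    auto L = linalg::cholesky_factor(gram);
    auto b = linalg::cholesky_solver(L, y);  // b = (X^T X + tau I)^{-1} y
    return feats->dot(b);                    // w = X b, mapped back to D dimensions
}
```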


set_w(w);

if (m_use_bias)
Member Author

Make bias computation optional, which is in line with other shogun methods.

{
auto x_bar = feats->sum();
linalg::scale(x_bar, x_bar, ((float64_t)1.0) / N);
auto intercept = linalg::mean(y) - linalg::dot(w, x_bar);
Collaborator

For consistency between the computations for features and labels, would it be possible to do some linalg::mean of the features?

Member Author

I changed this to CDenseFeatures::mean (locally, will push soon).

But the point here is: labels are directly used via SGVector (with a copy happening if there is a subset), while for the features, we don't touch the feature matrix.

auto N = feats->get_num_vectors();
auto D = feats->get_num_features();

SGVector<float64_t> w;
Collaborator

So the model contains both w and b, and what we are doing is first finding w, ignoring that b exists, and then finding b given the found w.

Is this standard practice in linear regression? I am a bit surprised that the equations to find w do not depend on b, since the gradient of the loss with respect to w contains b.

Member Author

If you derive it by differentiating the squared loss with respect to w and b, you will get that.
See e.g. https://are.berkeley.edu/courses/EEP118/current/derive_ols.pdf

Collaborator

OK, my question was probably wrong. Even if b does not appear explicitly in the expression for w, that does not mean it is not in there; it has just been substituted.

However, then the expression for w should in principle be different for the two cases: (1) with bias; (2) without bias. No?

Member Author

You are right! The expression for w was wrong; it assumed no bias. I am fixing it right now.
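For the record, the standard derivation (textbook least-squares algebra, not taken from the patch): differentiating the penalised squared loss with respect to both w and b gives

\[
\min_{w,b}\ \sum_{i=1}^N \big(y_i - w^\top x_i - b\big)^2 + \tau\,\lVert w\rVert^2
\;\;\Rightarrow\;\;
w = \big(X_c X_c^\top + \tau I\big)^{-1} X_c\, y_c,
\qquad
b = \bar y - w^\top \bar x,
\]

where \(\bar x, \bar y\) are the sample means, \(X_c\) is the feature matrix with \(\bar x\) subtracted from every column, and \(y_c = y - \bar y\,\mathbf{1}\). Without a bias term the same expression holds with the uncentered \(X\) and \(y\), which is why the two cases need different expressions for w.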

@iglesias
Collaborator

iglesias commented Jul 25, 2018 via email

linalg::add_ridge(cov, m_tau);
auto L = linalg::cholesky_factor(cov);

if (m_use_bias)
Member Author

@iglesias voila! :)
If you could do the math and confirm this, that would be neat! :)

@karlnapf karlnapf force-pushed the feature/lrr branch 5 times, most recently from e204643 to b135e00 on July 26, 2018 19:12
@iglesias
Collaborator

iglesias commented Jul 27, 2018 via email

@karlnapf karlnapf force-pushed the feature/lrr branch 3 times, most recently from 799788d to 4641206 on July 31, 2018 12:40
@@ -19,6 +19,19 @@
#include <algorithm>
#include <string.h>

#define ASSERT_FLOATING_POINT \
Member Author

@lisitsyn @iglesias @vigsterkr our template style for features doesn't allow me to use type traits to define member functions of templated features, as all template types are instantiated. This is why I added this runtime check for floating point numbers. We can remove it once the features are cleaned up a bit. For now it just stops callers from doing nonsense (computing the mean of bools or ints).
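A hypothetical body for such a guard (illustrative only, not necessarily the committed macro): the check happens at runtime, so every instantiation of CDenseFeatures<ST> still compiles, but calling e.g. cov() or mean() on bool or integer features fails with a clear error.

```cpp
#include <type_traits>

// Sketch: ST is the class template parameter of CDenseFeatures, REQUIRE is
// Shogun's runtime error macro (used elsewhere in this diff).
#define ASSERT_FLOATING_POINT                                                  \
	REQUIRE(                                                                   \
	    std::is_floating_point<ST>::value,                                     \
	    "Only available for floating point feature types.\n");
```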

{
REQUIRE(m_labels,"No labels set\n");

if (!data)
Member Author

Thanks to @shubham808's work, we can now delete such redundant code while making algorithms fully templated.

*/
class CLinearRidgeRegression : public CLinearMachine
class CLinearRidgeRegression
Member Author

@shubham808 tada! :)

Contributor

looks smooth :)

@karlnapf karlnapf force-pushed the feature/lrr branch 2 times, most recently from 5969cad to 96989cd on July 31, 2018 13:19
auto N = feats->get_num_vectors();
auto D = feats->get_num_features();

auto y = regression_labels(m_labels)->get_labels()->as<T>();
Member Author

This is the price we pay for mixing templated and non-templated types (here, features and labels).

auto b = linalg::cholesky_solver(L, y);
w = feats->dot(b);
}
set_w(w->as<float64_t>());
Member Author

This is the price we pay for mixing templated and non-templated types (here, LinearRidgeRegression and features).

@karlnapf karlnapf force-pushed the feature/lrr branch 2 times, most recently from 6ddc11b to af49430 on July 31, 2018 14:57
auto N = feats->get_num_vectors();
auto D = feats->get_num_features();

auto y = regression_labels(m_labels)->get_labels().as<T>();
Member Author

Copying the labels vector to get the correctly typed version of y (a consequence of float64 labels vs the templated algorithm implementation).

auto b = linalg::cholesky_solver(L, y);
w = feats->dot(b);
}
set_w(w.as<float64_t>());
Member Author

Copying the weight vector to get the correctly typed version for set_w (a consequence of float64 w vs the templated algorithm implementation).

@karlnapf
Member Author

Need to ignore the style checker, since class_list.py assumes that classes are defined on a single line and so cannot handle this: https://travis-ci.org/shogun-toolbox/shogun/jobs/410368055#L746

@karlnapf karlnapf force-pushed the feature/lrr branch 12 times, most recently from 587aa7d to 3afc831 on August 1, 2018 18:39
* feature vectors of dimension \f$D\f$.
*/
SGMatrix<ST> cov() const;
/** Computes the \f$NxN\f$ (uncentered, un-normalized) gram matrix of
Collaborator

Completely minor: missing newline.

* where \f$X\f$ is the \f$DxN\f$ dimensional feature matrix with \f$N\f$
* feature vectors of dimension \f$D\f$.
*/
SGMatrix<ST> cov() const;
Collaborator

Would it make sense to call it noncentered_cov or something similar that makes the fact that it is not centered explicit?

Member Author

Since I first met @lisitsyn I am in favour of short function names.
I was thinking about giving it a boolean flag for centring at first, but then didn't do it. We could re-introduce it if a case comes up where it is needed.
Generally, I would prefer to move towards making clear cuts of functionality, i.e. have a data-centring module come first and then do the cov.
A preprocessor could also be responsible for this (at the cost of speedy bulk operations); then the code becomes less convoluted (the mean is computed/removed in many places in the code).
What do you think?

Collaborator

I think having a function called covariance that does not compute the covariance can lead to confusion :-P

Member Author

Well, it computes the covariance, just for a certain case of input.
It is also not called covariance_for_column_major_matrix ;)
If you feel strongly about it, I can change it; otherwise I would just move the mean computations into separate code.
On another note, I want to mimic the dot API of DotFeatures, but templated. What are your thoughts on that?

@iglesias
Collaborator

iglesias commented Aug 4, 2018 via email

@karlnapf
Member Author

karlnapf commented Aug 5, 2018

We have virtual float64_t CDotFeatures::dot = 0, which is quite a cool design, as we can write code and algorithms that work with these dot-product features independently of the feature type.
The problem, however, is the return type. Think about linear regression and cov with 32-bit features. We could of course write an iterative cov implementation that fills a 32-bit cov matrix using the above API and elementwise casting. We cannot, however, do a batch version without copying the matrix. That is the first issue. The second issue is that the API is not const, due to the on-the-fly features that are computed in a non-const, in-place way.
So maybe we can build a second version of the API (keeping the old one intact) that serves a similar purpose but is both const and templated. Thoughts?
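One possible shape for that second API, purely as a sketch of the proposal (the class name, signatures and pure-virtual layout below are assumptions, not existing Shogun code):

```cpp
#include <shogun/lib/SGMatrix.h>
#include <shogun/lib/common.h>

namespace shogun
{
	// Hypothetical const + templated companion to the existing
	// type-erased virtual float64_t CDotFeatures::dot(...) interface.
	template <typename T>
	class CTypedDotFeatures
	{
	public:
		virtual ~CTypedDotFeatures() = default;

		// typed and const: no float64_t round-trip, no in-place mutation
		virtual T dot(index_t vec_idx1, const CTypedDotFeatures<T>& other,
		              index_t vec_idx2) const = 0;

		// batch helpers that a dense implementation can fill without
		// copying or casting the feature matrix
		virtual SGMatrix<T> cov() const = 0;   // D x D, uncentered
		virtual SGMatrix<T> gram() const = 0;  // N x N, uncentered
	};
}
```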

@karlnapf
Member Author

karlnapf commented Aug 5, 2018

@vinx13 do you have any idea on this error: https://travis-ci.org/shogun-toolbox/shogun/jobs/410945599#L2476

@vinx13
Member

vinx13 commented Aug 6, 2018

I haven't seen that before, but I will try to fix it.

@karlnapf
Member Author

karlnapf commented Aug 6, 2018

The thing is, I cannot reproduce this locally or on the buildbot. Similar errors occur when gtest.h is not included as the first include in the unit tests.

@vinx13
Member

vinx13 commented Aug 6, 2018

I can reproduce locally with g++ 5.4.1

@iglesias
Collaborator

iglesias commented Aug 6, 2018 via email

@vinx13
Member

vinx13 commented Aug 7, 2018 via email

@karlnapf
Member Author

karlnapf commented Aug 7, 2018

Aaaah it is probably the include order .... Can you try moving it around?

#include <shogun/lib/SGMatrix.h>
#include <shogun/lib/common.h>
#include <shogun/mathematics/linalg/LinalgNamespace.h>
Member

This can be moved to the .cpp file so that the inclusion-order problem is fixed.
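I.e. roughly this split (file names assumed from the surrounding hunks, shown only to illustrate the suggestion):

```cpp
// DenseFeatures.h -- keep only what the declarations themselves need
#include <shogun/lib/SGMatrix.h>
#include <shogun/lib/common.h>

// DenseFeatures.cpp -- the order-sensitive include stays out of the header
#include <shogun/mathematics/linalg/LinalgNamespace.h>
```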

@iglesias
Collaborator

iglesias commented Aug 7, 2018 via email

@karlnapf
Member Author

karlnapf commented Aug 8, 2018

Ah yes, of course; this was because I had some of the implementations in the header for a short time.
@iglesias it is scary indeed. I think this is due to the non-standard template stuff we are doing in CDenseFeatures... it is definitely not a guard problem. Same with gtest.h, by the way: it has to be included first, otherwise it blows up.
@vinx13 thanks for the tip, I have updated, let's see :)

@karlnapf karlnapf merged commit 41888fe into shogun-toolbox:develop Aug 8, 2018
ktiefe pushed a commit to ktiefe/shogun that referenced this pull request Jul 30, 2019
…hogun-toolbox#4384)

Also fix compile errors on gcc 6.3.0 via moving gtest.h include to beginning