Simplify LRR via using CDenseFeatures interface for cov, gram, sum #4384
Conversation
@@ -1001,6 +1001,42 @@ template< class ST > CDenseFeatures< ST >* CDenseFeatures< ST >::obtain_from_gen
    return (CDenseFeatures< ST >*) base_features;
}

template <typename ST>
I think it would be even cooler if we moved those methods to CDotFeatures and then specialized them here for batch computation if no subsets are set (otherwise calling CDotFeatures::cov, which has an iterative implementation).
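A minimal sketch of that layering, assuming a subset check on the feature object and a hypothetical iterative fallback (neither is the PR's actual code):

```cpp
template <class ST>
SGMatrix<ST> CDenseFeatures<ST>::cov() const
{
    // subsets require per-vector access: defer to a generic,
    // iterative implementation (hypothetical helper)
    if (m_subset_stack->has_subsets())
        return iterative_cov();

    // no subset: a single batch product over the raw D x N matrix
    auto X = get_feature_matrix();
    return linalg::matrix_prod(X, X, false, true); // X X^T
}
```

One wrinkle with hoisting this into CDotFeatures is the return type: the existing dot API is pinned to float64_t, which is exactly the tension discussed further down this thread.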
linalg::scale(x_bar, x_bar, 1.0 / ((float64_t)x_bar.size()));
auto intercept = linalg::mean(lab) - linalg::dot(lab, x_bar);
set_bias(intercept);
auto feats = data->as<CDenseFeatures<float64_t>>();
the idea of this patch is that inside the algorithm, we only work against a very high-level interface of CFeatures (which deals with subsets etc.).
For labels, we always get the vector (which creates a copy if there are subsets). I think that since it is just a vector, this is fine, although we could also think about adding an interface to CDot/DenseFeatures that computes the dot product with a CDenseLabels, which would then be batch without subsets and iterative (no copy) if there are some... probably too complicated for the marginal benefit of avoiding a single vector copy.
SGVector<float64_t> w;
if (N >= D)
    w = solve_using_covariance<float64_t>(feats, y, m_tau);
Also generalise LRR for the case where D > N.
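A minimal sketch of that dual branch, pieced together from the diffs later in this thread (gram() as per the PR title; the exact code may differ):

```cpp
// D > N: solve the N x N dual system (X^T X + tau I) b = y and map
// back to primal weights via w = X b. This yields the same w as the
// covariance path, but factorizes the smaller of the two matrices.
SGMatrix<T> gram = feats->gram();       // X^T X, N x N
linalg::add_ridge(gram, m_tau);
auto L = linalg::cholesky_factor(gram);
auto b = linalg::cholesky_solver(L, y);
w = feats->dot(b);                      // w = X b, D-dimensional
```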
set_w(w);

if (m_use_bias)
Make bias computation optional, which is in line with other shogun methods.
{
    auto x_bar = feats->sum();
    linalg::scale(x_bar, x_bar, ((float64_t)1.0) / N);
    auto intercept = linalg::mean(y) - linalg::dot(w, x_bar);
For consistency in the computations on features and labels, is it possible to do some linalg::mean of the features?
I changed this to CDenseFeatures::mean (locally, will push soon).
But the point here is: labels are directly used via SGVector (with a copy happening if there is a subset), while for the features, we don't touch the feature matrix.
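A minimal sketch of how CDenseFeatures::mean can be expressed via the sum() interface this PR adds (the actual implementation may differ):

```cpp
template <class ST>
SGVector<ST> CDenseFeatures<ST>::mean() const
{
    ASSERT_FLOATING_POINT

    // sum of all feature vectors, scaled by 1/N
    auto result = sum();
    linalg::scale(result, result, (ST)1.0 / get_num_vectors());
    return result;
}
```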
auto N = feats->get_num_vectors();
auto D = feats->get_num_features();

SGVector<float64_t> w;
So the model contains both w and b, and what we are doing is first finding w, ignoring that b exists, and then finding b given the found w.
Is this standard practice in linear regression? I am a bit surprised that the equations for w do not depend on b, since the gradient of the loss w.r.t. w contains b.
If you derive it by differentiating the squared loss w.r.t. w and b, you will get exactly that,
e.g. https://are.berkeley.edu/courses/EEP118/current/derive_ols.pdf
Ok, my question was probably wrong. Even if b does not appear explicitly in the expression for w, that does not mean it is not in there; it has just been substituted.
However, the expression for w should then in principle be different for the two cases: (1) with bias; (2) without bias. No?
You are right! The expression for w was wrong; it assumed no bias. I am fixing it right now.
Great stuff!
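For reference, a sketch of the with-bias derivation (notation mine): minimizing

\f[ L(w, b) = \|X^\top w + b\mathbf{1} - y\|^2 + \tau\|w\|^2 \f]

and setting \f$\partial L/\partial b = 0\f$ gives \f$b = \bar{y} - w^\top\bar{x}\f$. Substituting back yields the centered normal equations

\f[ (X_c X_c^\top + \tau I)\,w = X_c\,y_c, \qquad X_c = X - \bar{x}\mathbf{1}^\top, \quad y_c = y - \bar{y}\mathbf{1}, \f]

whereas the no-bias case solves \f$(X X^\top + \tau I)\,w = X y\f$, so the two expressions for \f$w\f$ indeed differ, as suspected above.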
linalg::add_ridge(cov, m_tau);
auto L = linalg::cholesky_factor(cov);

if (m_use_bias)
@iglesias voila! :)
If you could do the math and confirm this, that would be neat! :)
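A sketch of the with-bias covariance branch, following the derivation above (the rank-one centering updates and the exact call sequence are inferred from the math, not copied from the PR):

```cpp
SGVector<T> x_bar = feats->mean();  // mean feature vector
T y_bar = linalg::mean(y);          // mean label
SGMatrix<T> cov = feats->cov();     // X X^T, uncentered, D x D
SGVector<T> Xy = feats->dot(y);     // X y, D-dimensional
if (m_use_bias)
{
    // center both terms via rank-one updates:
    // X_c X_c^T = X X^T - N * x_bar x_bar^T
    // X_c y_c   = X y   - N * y_bar * x_bar
    for (index_t i = 0; i < D; ++i)
    {
        for (index_t j = 0; j < D; ++j)
            cov(i, j) -= N * x_bar[i] * x_bar[j];
        Xy[i] -= N * y_bar * x_bar[i];
    }
}
linalg::add_ridge(cov, m_tau);
auto L = linalg::cholesky_factor(cov);
w = linalg::cholesky_solver(L, Xy);
// with bias enabled, the intercept then follows as y_bar - dot(w, x_bar)
```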
Even nicer exercise for GSoCers ;-)
It would fit very nicely in a Jupyter notebook.
@@ -19,6 +19,19 @@
#include <algorithm>
#include <string.h>

#define ASSERT_FLOATING_POINT
@lisitsyn @iglesias @vigsterkr our template style for features doesn't allow me to use type traits to define member functions of templated features, as all template types are instantiated. This is why I added this runtime check for floating-point numbers. We can remove it once the features are cleaned up a bit. For now it just stops callers from doing nonsense (computing a mean on bools or ints).
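A minimal sketch of such a guard (the PR's exact macro body may differ):

```cpp
#include <type_traits>

// A runtime rather than a static check: the feature templates are
// instantiated for all types, so a static_assert would fire even for
// instantiations whose mean()/cov() are never called.
#define ASSERT_FLOATING_POINT                                          \
    REQUIRE(                                                           \
        std::is_floating_point<ST>::value,                             \
        "Only defined for floating point feature types.\n");
```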
{
    REQUIRE(m_labels,"No labels set\n");

    if (!data)
Thanks to @shubham808's work, we can now delete such redundant code while making algorithms fully templated.
*/
class CLinearRidgeRegression : public CLinearMachine
class CLinearRidgeRegression
@shubham808 tada! :)
looks smooth :)
auto N = feats->get_num_vectors();
auto D = feats->get_num_features();

auto y = regression_labels(m_labels)->get_labels()->as<T>();
this is what we are paying for mixing templated and non-templated types (here features and labels)
auto b = linalg::cholesky_solver(L, y);
w = feats->dot(b);
}
set_w(w->as<float64_t>());
this is what we are paying for mixing templated and non-templated types (here LinearRidgeRegression and features)
auto N = feats->get_num_vectors();
auto D = feats->get_num_features();

auto y = regression_labels(m_labels)->get_labels().as<T>();
copying the labels vector to get the correctly typed version of y (consequence of float64 labels vs templated algorithm impl)
auto b = linalg::cholesky_solver(L, y);
w = feats->dot(b);
}
set_w(w.as<float64_t>());
copying the w vector to get the float64_t version expected by set_w (consequence of float64 w vs templated algorithm impl)
Need to ignore the stylechecker since class_list.py assumes that classes are defined in a single line, so cannot apply this: https://travis-ci.org/shogun-toolbox/shogun/jobs/410368055#L746
* feature vectors of dimension \f$D\f$.
*/
SGMatrix<ST> cov() const;
/** Computes the \f$NxN\f$ (uncentered, un-normalized) gram matrix of
Completely minor: missing newline.
* where \f$X\f$ is the \f$DxN\f$ dimensional feature matrix with \f$N\f$
* feature vectors of dimension \f$D\f$.
*/
SGMatrix<ST> cov() const;
Would it make sense to call it noncentered_cov, or something similar that makes it explicit that it is not centered?
Since I first met @lisitsyn I am in favour of short function names.
I was thinking about giving it a boolean flag for centring at first, but then didn't do it. We could re-introduce it if a case needs it.
Generally, I would prefer to move towards making clear cuts of functionality, i.e. have a data-centring module come first, and then do the cov.
A preprocessor could also be responsible for this (at the cost of speedy bulk operations); then the code becomes less convoluted (the mean is computed/removed in many places in the code).
What do you think?
I think having a function called covariance that does not compute the covariance can lead to confusion :-P
Well, it computes the covariance, just for a certain case of input.
It is also not called covariance_for_column_major_matrix ;)
If you feel strongly, I can change it; otherwise I would just outsource the mean computations into different code.
On another note, I want to mimic the dot API of DotFeatures, but templated. What are your thoughts on that?
Of course it is good, no strong feelings :-)
I see the DenseFeatures::dot is a template already, and I am guessing the
call to linalg it does is to another template as well. What's your idea?
We have virtual float64_t CDotFeatures::dot=0, which is a quite cool design, as independently of any feature type, we can write code and algorithms that work with these dot-product features.
The problem, however, is the return type. Think about linear regression and cov, and 32-bit features. We could of course write an iterative cov implementation that fills a 32-bit cov matrix using the above API and elementwise casting. We cannot, however, do a batch version without copying the matrix. That is the first issue. The second issue is that the API is not const, due to the on-the-fly features that are computed in a non-const, in-place way.
So maybe we can build a second version of the API (keeping the old one intact) that serves a similar purpose, but that is both const and templated. Thoughts?
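A minimal sketch of what that second API could look like (hypothetical signatures; the existing virtual API stays untouched):

```cpp
template <class ST>
class CDenseFeatures : public CDotFeatures
{
public:
    // existing style: pinned to float64_t and non-const
    virtual float64_t dot(
        int32_t vec_idx1, CDotFeatures* df, int32_t vec_idx2);

    // proposed companion: const and typed in ST, so e.g. 32-bit
    // features feed batch linalg calls without a copy or a cast
    SGVector<ST> dot(const SGVector<ST>& alpha) const; // returns X * alpha
};
```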
@vinx13 do you have any idea on this error: https://travis-ci.org/shogun-toolbox/shogun/jobs/410945599#L2476
I haven't seen that before, but I will try to fix it.
The thing is, I cannot reproduce this locally or on the buildbot. Similar errors occur when gtest.h is not included as the first include in the unit tests.
I can reproduce it locally with g++ 5.4.1.
Sure, that sounds good. At the very least, to start experimenting.
It is the #include <shogun/mathematics/linalg/LinalgNamespace.h> in DenseFeatures.h that causes the error about variant, but I still don't see why.
Aaaah, it is probably the include order... Can you try moving it around?
src/shogun/features/DenseFeatures.h
#include <shogun/lib/SGMatrix.h>
#include <shogun/lib/common.h>
#include <shogun/mathematics/linalg/LinalgNamespace.h>
this can be moved to the .cpp file so that the inclusion-order problem is fixed
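Roughly, the suggested split (a sketch, assuming no other header-level dependency on linalg):

```cpp
// DenseFeatures.h -- keep linalg out of the header
#include <shogun/lib/SGMatrix.h>
#include <shogun/lib/common.h>

// DenseFeatures.cpp -- pull linalg in where the implementations live,
// so unit tests can keep including gtest.h first without clashes
#include <shogun/features/DenseFeatures.h>
#include <shogun/mathematics/linalg/LinalgNamespace.h>
```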
This thing about the include order is scary :-O
Are we missing guards on some header?
Ah yes, of course; this was because I had some of the implementations in the header for a short time.
…hogun-toolbox#4384) Also fix compile errors on gcc 6.3.0 by moving the gtest.h include to the beginning