
Simplify LRR by using the CDenseFeatures interface for cov, gram, sum #4384

Merged · 1 commit · Aug 8, 2018
@@ -2,15 +2,21 @@
Linear Ridge Regression
=======================

A linear ridge regression model can be defined as :math:`y_i = \bf{w}^\top\bf{x_i} + b` where :math:`y_i` is the predicted value, :math:`\bf{x_i}` is a feature vector, :math:`\bf{w}` is the weight vector, and :math:`b` is a bias term.
A linear ridge regression model can be defined as :math:`y_i = \bf{w}^\top\bf{x_i}` where :math:`y_i` is the predicted value, :math:`\bf{x_i}` is a feature vector, and :math:`\bf{w}` is the weight vector.
We aim to find the linear function that best explains the data, i.e. minimizes the squared loss plus an :math:`L_2` regularization term. One can show that the solution can be written as:

.. math::
   {\bf w}=\left(\tau I_{D}+XX^{\top}\right)^{-1}Xy

where :math:`X=\left[{\bf x}_{1},\dots,{\bf x}_{N}\right]\in\mathbb{R}^{D\times N}` is the training data matrix, containing :math:`N` training samples of dimension :math:`D`, :math:`y=[y_{1},\dots,y_{N}]^{\top}\in\mathbb{R}^{N}` are the labels, and :math:`\tau>0` scales the regularization term.
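
This follows from setting the gradient of the regularized objective :math:`\left\Vert y-X^{\top}{\bf w}\right\Vert^{2}+\tau\left\Vert{\bf w}\right\Vert^{2}` to zero:

.. math::
   -2X\left(y-X^{\top}{\bf w}\right)+2\tau{\bf w}=0
   \quad\Longrightarrow\quad
   \left(\tau I_{D}+XX^{\top}\right){\bf w}=Xy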

The bias term is computed as :math:`b=\frac{1}{N}\sum_{i=1}^{N}y_{i}-{\bf w}\cdot\bar{\mathbf{x}}`, where :math:`\bar{\mathbf{x}}=\frac{1}{N}\sum_{i=1}^{N}{\bf x}_{i}`.
Alternatively, if :math:`D>N`, the solution can be written as

.. math::
   {\bf w}=X\left(\tau I_{N}+X^{\top}X\right)^{-1}y
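
Both expressions give the same :math:`{\bf w}`: since :math:`\left(\tau I_{D}+XX^{\top}\right)X=X\left(\tau I_{N}+X^{\top}X\right)`, it follows that

.. math::
   \left(\tau I_{D}+XX^{\top}\right)^{-1}X=X\left(\tau I_{N}+X^{\top}X\right)^{-1},

and multiplying both sides by :math:`y` recovers the two solutions.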

In practice, an additional bias :math:`b=\frac{1}{N}\sum_{i=1}^{N}y_{i}-{\bf w}\cdot\bar{\mathbf{x}}`, where
:math:`\bar{\mathbf{x}}=\frac{1}{N}\sum_{i=1}^{N}{\bf x}_{i}`, can also be included; this effectively centers :math:`X` before
computing the solution.

For the special case when :math:`\tau = 0`, a wrapper class :sgclass:`CLeastSquaresRegression` is available.

@@ -34,6 +40,10 @@ After training, we can extract :math:`{\bf w}` and the bias.

.. sgexample:: linear_ridge_regression.sg:extract_w

We could also have trained without a bias term and set it manually.

.. sgexample:: linear_ridge_regression.sg:manual_bias

Finally, we can evaluate the :sgclass:`CMeanSquaredError`.

.. sgexample:: linear_ridge_regression.sg:evaluate_error
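
For reference, the evaluated quantity is the mean squared error of the :math:`N` predictions :math:`\hat{y}_{i}` against the test labels :math:`y_{i}`:

.. math::
   \mathrm{MSE}=\frac{1}{N}\sum_{i=1}^{N}\left(y_{i}-\hat{y}_{i}\right)^{2}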
11 changes: 10 additions & 1 deletion examples/meta/src/regression/linear_ridge_regression.sg
@@ -24,10 +24,19 @@ real b = lrr.get_real("bias")
RealVector w = lrr.get_real_vector("w")
#![extract_w]

#![manual_bias]
Machine lrr2 = machine("LinearRidgeRegression", tau=0.001, labels=labels_train, use_bias=False)
lrr2.train(features_train)
real my_bias = 0.1
lrr2.put("bias", my_bias)
Labels labels_predict2 = lrr2.apply(features_test)
#![manual_bias]

#![evaluate_error]
Evaluation eval = evaluation("MeanSquaredError")
real mse = eval.evaluate(labels_predict, labels_test)
#![evaluate_error]

# integration testing variables
RealVector output = labels_test.get_real_vector("labels")
RealVector output = labels_predict.get_real_vector("labels")
RealVector output2 = labels_predict2.get_real_vector("labels")
6 changes: 0 additions & 6 deletions src/interfaces/swig/Regression.i
@@ -8,9 +8,6 @@
%rename(Regression) CRegression;
%rename(KernelRidgeRegression) CKernelRidgeRegression;
%rename(KRRNystrom) CKRRNystrom;
%rename(LinearRidgeRegression) CLinearRidgeRegression;
%rename(LeastSquaresRegression) CLeastSquaresRegression;
%rename(LeastAngleRegression) CLeastAngleRegression;
%rename(LibSVR) CLibSVR;
%rename(LibLinearRegression) CLibLinearRegression;
%rename(MKL) CMKL;
@@ -25,9 +22,6 @@
%include <shogun/regression/Regression.h>
%include <shogun/regression/KernelRidgeRegression.h>
%include <shogun/regression/KRRNystrom.h>
%include <shogun/regression/LinearRidgeRegression.h>
%include <shogun/regression/LeastSquaresRegression.h>
%include <shogun/regression/LeastAngleRegression.h>
%include <shogun/regression/svr/LibSVR.h>
%include <shogun/regression/svr/LibLinearRegression.h>
%include <shogun/classifier/mkl/MKL.h>
3 changes: 0 additions & 3 deletions src/interfaces/swig/Regression_includes.i
@@ -5,9 +5,6 @@
#include <shogun/regression/GaussianProcessRegression.h>
#include <shogun/regression/KernelRidgeRegression.h>
#include <shogun/regression/KRRNystrom.h>
#include <shogun/regression/LinearRidgeRegression.h>
#include <shogun/regression/LeastSquaresRegression.h>
#include <shogun/regression/LeastAngleRegression.h>
#include <shogun/classifier/svm/SVM.h>
#include <shogun/classifier/svm/LibSVM.h>
#include <shogun/regression/svr/LibSVR.h>
2 changes: 2 additions & 0 deletions src/interfaces/swig/factory.i
@@ -4,6 +4,8 @@
%include <shogun/util/factory.h>

%template(features) shogun::features<float64_t>;
%template(labels) shogun::labels<float64_t>;


%newobject shogun::string_features(CFile*, EAlphabet alpha = DNA, EPrimitiveType primitive_type = PT_CHAR);
%newobject shogun::transformer(const std::string&);
2 changes: 1 addition & 1 deletion src/shogun/classifier/LDA.h
@@ -174,6 +174,7 @@ class CLDA : public CDenseRealDispatch<CLDA, CLinearMachine>
/** @return object name */
virtual const char* get_name() const { return "LDA"; }

protected:
/** train LDA classifier
*
* @param data training data (parameter can be avoided if distance or
@@ -186,7 +187,6 @@ class CLDA : public CDenseRealDispatch<CLDA, CLinearMachine>
std::is_floating_point<ST>::value>>
bool train_machine_templated(CDenseFeatures<ST>* data);

protected:
/**
* Train the machine with the svd-based solver (@see CFisherLDA).
* @param features training data
60 changes: 60 additions & 0 deletions src/shogun/features/DenseFeatures.cpp
@@ -19,6 +19,19 @@
#include <algorithm>
#include <string.h>

#define ASSERT_FLOATING_POINT \
	switch (get_feature_type()) \
	{ \
	case F_SHORTREAL: \
	case F_DREAL: \
	case F_LONGREAL: \
		break; \
	default: \
		REQUIRE( \
		    false, "Only defined for %s with real type, not for %s.\n", \
		    get_name(), demangled_type<ST>().c_str()); \
	}

Review comment (Member, author): @lisitsyn @iglesias @vigsterkr our template style for features doesn't allow me to use type traits to define member functions of templated features, as all template types are instantiated. This is why I added this runtime check for floating-point numbers. We can remove it once the features are cleaned up a bit. For now it just stops callers from doing nonsense (computing the mean of bools or ints).
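
A minimal sketch (hypothetical, outside Shogun) of the compile-time variant the comment alludes to, and why it conflicts with explicit instantiation:

#include <type_traits>
#include <vector>

// A static_assert in the member body only fires when mean() is instantiated.
// Explicit instantiation of the whole class (as done for CDenseFeatures<bool>,
// <char>, ... at the bottom of this file) instantiates every member, so the
// assert would fire even for types that never call mean(). Hence the runtime
// REQUIRE in the macro above.
template <typename ST>
class ToyFeatures
{
public:
	explicit ToyFeatures(std::vector<ST> data) : m_data(std::move(data)) {}

	ST mean() const
	{
		static_assert(std::is_floating_point<ST>::value,
		              "mean() is only defined for floating-point types");
		ST sum = 0;
		for (ST v : m_data)
			sum += v;
		return sum / static_cast<ST>(m_data.size());
	}

private:
	std::vector<ST> m_data;
};

// ToyFeatures<double>({1.0, 2.0}).mean() compiles and returns 1.5, while
// ToyFeatures<int> only breaks if mean() is actually instantiated.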

namespace shogun {

template<class ST> CDenseFeatures<ST>::CDenseFeatures(int32_t size) : CDotFeatures(size)
@@ -1001,6 +1014,53 @@ template< class ST > CDenseFeatures< ST >* CDenseFeatures< ST >::obtain_from_gen
return (CDenseFeatures< ST >*) base_features;
}

Review comment (Member, author): I think it would be even cooler if we moved these methods to CDotFeatures and then specialized them here for batch computation if no subsets are set (otherwise calling CDotFeatures::cov, which has an iterative implementation).

template <typename ST>
SGVector<ST> CDenseFeatures<ST>::sum() const
{
// TODO optimize non batch mode, but get_feature_vector is non const :(
SGVector<ST> result = linalg::rowwise_sum(get_feature_matrix());
return result;
}

template <typename ST>
SGVector<ST> CDenseFeatures<ST>::mean() const
{
ASSERT_FLOATING_POINT

auto result = sum();
ST scale = ((ST)1.0) / get_num_vectors();
linalg::scale(result, result, scale);
return result;
}

template <typename ST>
SGMatrix<ST> CDenseFeatures<ST>::cov() const
{
// TODO optimize non batch mode, but get_feature_vector is non const :(
auto mat = get_feature_matrix();
return linalg::matrix_prod(mat, mat, false, true);
}

template <typename ST>
SGMatrix<ST> CDenseFeatures<ST>::gram() const
{
// TODO optimize non batch mode, but get_feature_vector is non const :(
auto mat = get_feature_matrix();
return linalg::matrix_prod(mat, mat, true, false);
}

template <typename ST>
SGVector<ST> CDenseFeatures<ST>::dot(const SGVector<ST>& other) const
{
	REQUIRE(
	    get_num_vectors() == other.size(),
	    "Number of feature vectors (%d) must match provided vector's "
	    "size (%d).\n",
	    get_num_vectors(), other.size());
// TODO optimize non batch mode, but get_feature_vector is non const :(
return linalg::matrix_prod(get_feature_matrix(), other, false);
}

template class CDenseFeatures<bool>;
template class CDenseFeatures<char>;
template class CDenseFeatures<int8_t>;
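
For orientation, a rough usage sketch of the batch interface added above. The setup and values are illustrative, not from this PR; method signatures follow the declarations in this diff.

#include <shogun/base/init.h>
#include <shogun/features/DenseFeatures.h>
#include <shogun/lib/SGMatrix.h>
#include <shogun/lib/SGVector.h>

using namespace shogun;

int main()
{
	init_shogun_minimal();

	// D=2 dimensions, N=3 feature vectors; X is stored column-major as 2x3.
	SGMatrix<float64_t> X(2, 3);
	X(0, 0) = 1; X(0, 1) = 2; X(0, 2) = 3;
	X(1, 0) = 4; X(1, 1) = 5; X(1, 2) = 6;
	auto* feats = new CDenseFeatures<float64_t>(X);

	SGVector<float64_t> s = feats->sum();   // row-wise sum of X: [6, 15]
	SGVector<float64_t> m = feats->mean();  // sum / N: [2, 5]
	SGMatrix<float64_t> C = feats->cov();   // X X^T, 2x2, uncentered
	SGMatrix<float64_t> G = feats->gram();  // X^T X, 3x3

	SGVector<float64_t> a(3);               // one entry per feature vector
	a[0] = 1; a[1] = 0; a[2] = -1;
	SGVector<float64_t> d = feats->dot(a);  // X a: [-2, -2]

	SG_UNREF(feats);
	exit_shogun();
	return 0;
}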
47 changes: 43 additions & 4 deletions src/shogun/features/DenseFeatures.h
@@ -11,14 +11,13 @@

#include <shogun/lib/config.h>

#include <shogun/lib/common.h>
#include <shogun/lib/Cache.h>
#include <shogun/io/File.h>
#include <shogun/features/DotFeatures.h>
#include <shogun/features/StringFeatures.h>
#include <shogun/io/File.h>
#include <shogun/lib/Cache.h>
#include <shogun/lib/DataType.h>

#include <shogun/lib/SGMatrix.h>
#include <shogun/lib/common.h>

namespace shogun {
template<class ST> class CStringFeatures;
@@ -303,6 +302,46 @@ template<class ST> class CDenseFeatures: public CDotFeatures
virtual float64_t dot(int32_t vec_idx1, CDotFeatures* df,
int32_t vec_idx2);

/** Computes the sum of all feature vectors
* @return Sum of all feature vectors
*/
SGVector<ST> sum() const;

/** Computes the empirical mean of all feature vectors
* @return Mean of all feature vectors
*/
SGVector<ST> mean() const;

/** Computes the \f$DxD\f$ (uncentered, un-normalized) covariance matrix
*
*\f[
* X X^\top
* \f]
*
* where \f$X\f$ is the \f$DxN\f$ dimensional feature matrix with \f$N\f$
* feature vectors of dimension \f$D\f$.
*/
SGMatrix<ST> cov() const;
Review comment (Collaborator): Would it make sense to call it noncentered_cov or something similar that makes the fact that it is not centered explicit?

Review comment (Member, author): Since I first met @lisitsyn I have been in favour of short function names. I was thinking about giving it a boolean flag for centring at first, but then didn't do it; it could be re-introduced if needed. Generally, I would prefer to move towards clear cuts of functionality, i.e. have a data-centring module come first, and then do the cov. A preprocessor could also be responsible for this (at the cost of speedy bulk operations); then the code becomes less convoluted (the mean is computed/removed in many places in the code). What do you think?

Review comment (Collaborator): I think having a function called covariance that does not compute the covariance can lead to confusion :-P

Review comment (Member, author): Well, it computes the covariance, just for a certain case of input. It is also not called covariance_for_column_major_matrix ;) If you feel strongly, I can change it; otherwise I would just outsource the mean computation into different code. On another note, I want to mimic the dot API of DotFeatures, but templated. What are your thoughts on that?

Review comment (Collaborator): Completely minor: missing newline.

/** Computes the \f$NxN\f$ (uncentered, un-normalized) Gram matrix of
 * pairwise dot products, that is
*
*\f[
* X^\top X
* \f]
*
* where \f$X\f$ is the \f$DxN\f$ dimensional feature matrix with \f$N\f$
* feature vectors of dimension \f$D\f$.
*/
SGMatrix<ST> gram() const;

/** Computes the dot product of the feature matrix with a given vector.
*
* @param other Vector to compute dot products with, size must match number
* of feature vectors
 * @return Vector with as many entries as feature dimensions
*/
SGVector<ST> dot(const SGVector<ST>& other) const;

/** compute dot product between vector1 and a dense vector
*
* possible with subset
9 changes: 9 additions & 0 deletions src/shogun/lib/SGVector.h
@@ -123,6 +123,15 @@ template<class T> class SGVector : public SGReferencedData

/** Wraps an Eigen3 row vector around the data of this vector */
operator EigenRowVectorXtMap() const;

/** @return a (copied) typed vector with same content */
template <class X>
SGVector<X> as() const
{
SGVector<X> v(vlen);
			std::copy(vector, vector + vlen, v.vector);
return v;
}
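		// Example usage (illustrative values, not from this PR). as<X>()
		// copies element-wise with the usual implicit conversion, so
		// float64_t -> int32_t truncates:
		//
		//   SGVector<float64_t> v(3);
		//   v[0] = 1.5; v[1] = 2.5; v[2] = 3.5;
		//   SGVector<int32_t> w = v.as<int32_t>();  // w == [1, 2, 3]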
#endif // SWIG

/** Set vector to a constant
Expand Down
4 changes: 3 additions & 1 deletion src/shogun/machine/LinearMachine.cpp
@@ -37,7 +37,9 @@ void CLinearMachine::init()

SG_ADD(&m_w, "w", "Parameter vector w.", MS_NOT_AVAILABLE);
SG_ADD(&bias, "bias", "Bias b.", MS_NOT_AVAILABLE);
SG_ADD(&features, "features", "Feature object.", MS_NOT_AVAILABLE);
SG_ADD(
(CFeatures**)&features, "features", "Feature object.",
MS_NOT_AVAILABLE);
}


25 changes: 6 additions & 19 deletions src/shogun/regression/LeastSquaresRegression.h
@@ -18,25 +18,12 @@

namespace shogun
{
/** @brief class to perform Least Squares Regression
*
* Internally it is solved via minimizing the following system
*
* \f[
* \frac{1}{2}\left(\sum_{i=1}^N(y_i-{\bf w}\cdot {\bf x}_i)^2\right)
* \f]
*
* which boils down to solving the linear system
*
* \f[
* {\bf w} = \left(\sum_{i=1}^N{\bf x}_i{\bf x}_i^T\right)^{-1}\left(\sum_{i=1}^N y_i{\bf x}_i\right)
* \f]
* where x are the training examples and y the vector of labels.
*
* The expressed solution is a linear method with bias 0 (cf. CLinearMachine).
*/
class CLeastSquaresRegression : public CLinearRidgeRegression
{
/** @brief class to perform Least Squares Regression
*
* Same as CLinearRidgeRegression, but without a regularization term.
*/
class CLeastSquaresRegression : public CLinearRidgeRegression
{
public:
/** problem type */
MACHINE_PROBLEM_TYPE(PT_REGRESSION);
Expand Down