Commit
add psplines + adopt data structure + speed up linear learners
schalkdaniel committed Mar 28, 2018
1 parent 3609641 commit 99d1c58
Showing 14 changed files with 779 additions and 62 deletions.
25 changes: 25 additions & 0 deletions other/notes.txt
@@ -0,0 +1,25 @@
compboost Thesis:
============================

- Predictions on new data only need the source, not the target!
- The compboost_module file is the interface between R and C++ (describe it well)
- Dependencies: not many! Rcpp, RcppArmadillo
- Describe scope! It is often important to pass by reference, because then the wrapper class is not deleted and therefore no destructor is called!!!
- Describe scoping and garbage collection
- Polymorphism!! Almost everywhere an IterationLogger passes as a Logger class!
- For linear fitting, the inverse of XtX is stored so that it is not recomputed in every iteration!

- Explain the documentation on the R side and the C++ side (roxygen vs. doxygen)
-

Examples:
- The logger is flexible enough that several OOB loggers can track simultaneously with different losses

Benchmark:
- Write a function that measures the true RAM usage and benchmark with it!

compboost repo:
============================

- Readme section "For Developer"
- How code is commented (doxygen, roxygen, ...)
20 changes: 19 additions & 1 deletion other/spline_test_vs_mboost.R
@@ -1,9 +1,27 @@
splines.cpp = "C:/Users/schal/OneDrive/github_repos/compboost/other/splines.cpp"
splines.cpp = "E:/OneDrive/github_repos/compboost/other/splines.cpp"

if (file.exists(splines.cpp)) {
Rcpp::sourceCpp(file = splines.cpp, rebuild = TRUE)
}

n.sim = 100000
p.sim = 32

X = matrix(runif(n.sim * p.sim), nrow = n.sim, ncol = p.sim)
pen = matrix(runif(p.sim^2), nrow = p.sim, ncol = p.sim)
y = runif(n.sim)
XtX = t(X) %*% X + 2.5 * pen

# Both C++ helpers expect the penalized cross product XtX as second argument:
all.equal(testSolve(X, XtX, y), testInv(X, XtX, y))

microbenchmark::microbenchmark(
"solve" = testSolve(X, XtX, y),
"inv" = testInv(X, XtX, y),
"R nnls" = nnls::nnls(t(X) %*% X + 2.5 * pen, t(X) %*% y),
times = 10L
)


library(Matrix)

n.sim = 2000
8 changes: 4 additions & 4 deletions other/splines.cpp
@@ -223,13 +223,13 @@ arma::mat estimateSplines (const arma::vec& response, const arma::vec& x,
}

// [[Rcpp::export]]
arma::mat testSparse (arma::sp_mat X, arma::vec y)
arma::mat testSolve (arma::mat X, arma::mat XtX, arma::vec y)
{
return arma::spsolve(X, y, "lapack");
return arma::solve(XtX, X.t() * y);
}

// [[Rcpp::export]]
arma::mat testDense (arma::mat X, arma::vec y)
arma::mat testInv (arma::mat X, arma::mat XtX, arma::vec y)
{
return arma::solve(X, y);
return arma::inv(XtX) * X.t() * y;
}
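
A note on the two helpers above: forming the explicit inverse as in `testInv` mainly pays off when the same `XtX` is reused for many right-hand sides, which is what the stored `XtX_inv` in the linear learners is for. The following is a minimal sketch of that idea in plain C++ for a two-column design matrix. All names (`crossProduct`, `transposeTimes`, `inverse2x2`, `fitWithCachedInverse`) are hypothetical and not part of compboost, and Armadillo is left out on purpose so that the snippet compiles on its own.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 2x2 types, used here instead of arma::mat / arma::vec.
using Mat2 = std::array<std::array<double, 2>, 2>;
using Vec2 = std::array<double, 2>;

// X^T X for a design matrix that is stored row by row.
Mat2 crossProduct (const std::vector<Vec2>& X)
{
  Mat2 xtx{};  // value-initialized to zeros
  for (const Vec2& row : X) {
    for (std::size_t i = 0; i < 2; ++i) {
      for (std::size_t j = 0; j < 2; ++j) {
        xtx[i][j] += row[i] * row[j];
      }
    }
  }
  return xtx;
}

// X^T y for the same row-wise design matrix.
Vec2 transposeTimes (const std::vector<Vec2>& X, const std::vector<double>& y)
{
  assert(X.size() == y.size());
  Vec2 xty{};
  for (std::size_t k = 0; k < X.size(); ++k) {
    xty[0] += X[k][0] * y[k];
    xty[1] += X[k][1] * y[k];
  }
  return xty;
}

// Closed-form inverse of a 2x2 matrix; meant to be computed once and cached.
Mat2 inverse2x2 (const Mat2& a)
{
  const double det = a[0][0] * a[1][1] - a[0][1] * a[1][0];
  assert(std::fabs(det) > 1e-12);  // the matrix has to be invertible
  Mat2 inv{};
  inv[0][0] =  a[1][1] / det;
  inv[0][1] = -a[0][1] / det;
  inv[1][0] = -a[1][0] / det;
  inv[1][1] =  a[0][0] / det;
  return inv;
}

// Parameter estimate from the cached inverse: XtX_inv * X^T y.
Vec2 fitWithCachedInverse (const Mat2& xtx_inv, const Vec2& xty)
{
  return { xtx_inv[0][0] * xty[0] + xtx_inv[0][1] * xty[1],
           xtx_inv[1][0] * xty[0] + xtx_inv[1][1] * xty[1] };
}
```

With this layout, `inverse2x2(crossProduct(X))` is evaluated once, and each new response only costs one `transposeTimes` and one `fitWithCachedInverse` call. For a single right-hand side, solving the system directly as in `testSolve` is usually the numerically safer choice.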
131 changes: 128 additions & 3 deletions src/baselearner.cpp
@@ -162,7 +162,7 @@ arma::mat PolynomialBlearner::instantiateData (const arma::mat& newdata)
void PolynomialBlearner::train (const arma::vec& response)
{
// parameter = arma::solve(data_ptr->getData(), response);
parameter = arma::inv(data_ptr->getData().t() * data_ptr->getData()) * data_ptr->getData().t() * response;
parameter = data_ptr->XtX_inv * data_ptr->getData().t() * response;
}

// Predict the learner:
@@ -178,7 +178,132 @@ arma::mat PolynomialBlearner::predict (data::Data* newdata)
// Destructor:
PolynomialBlearner::~PolynomialBlearner () {}

// CustomBlearner Baselearner:
// PSplineBlearner:
// ----------------------

/**
* \brief Constructor of `PSplineBlearner` class
*
* This constructor sets the members such as n_knots etc. The computationally
* more complex data are stored within the data object, which should be
* initialized first (e.g. in the factory or otherwise).
*
* One note about the knots used. The number of inner knots is specified
* by `n_knots`. These inner knots are then wrapped by the minimal and maximal
* value of the given data. For instance, if we have a feature
* \f[
* x = (1, 2, \dots, 2.5, 6)
* \f]
* and we want to have 3 knots, then the inner knots with boundaries are:
* \f[
* U = (1.00, 2.25, 3.50, 4.75, 6.00)
* \f]
* To get a full basis these knots are wrapped by `degree` (\f$p\f$) numbers
* on either side. If we choose `degree = 2` then we have
* \f$n_\mathrm{knots} + 2(p + 1) = 3 + 2(2 + 1) = 9\f$ final knots:
* \f[
* U = (-1.50, -0.25, 1.00, 2.25, 3.50, 4.75, 6.00, 7.25, 8.50)
* \f]
* Finally we get \f$9 - (p + 1) = 6\f$ splines for which we can calculate the
* basis.
*
* \param data `data::Data*` Target data used for training etc.
* \param identifier `std::string` Identifier for one specific baselearner
* \param degree `unsigned int` Polynomial degree of the splines
* \param n_knots `unsigned int` Number of inner knots used
* \param penalty `double` Regularization parameter. `penalty = 0` yields
* B-splines while a bigger penalty forces the splines into a global
* polynomial form.
* \param differences `unsigned int` Number of differences used for the
* penalty matrix.
*/

PSplineBlearner::PSplineBlearner (data::Data* data, const std::string& identifier,
const unsigned int& degree, const unsigned int& n_knots, const double& penalty,
const unsigned int& differences)
: degree ( degree ),
n_knots ( n_knots ),
penalty ( penalty ),
differences ( differences )
{
// Called from parent class 'Baselearner':
Baselearner::setData(data);
Baselearner::setIdentifier(identifier);
}
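
The knot example in the comment above can be reproduced with a few lines of standalone C++. This is only a sketch under the assumption of equidistant knots, as in the documented example. `sketchKnots` and `numberOfSplines` are hypothetical names and are not the functions from `splines.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Equidistant knots: n_knots inner knots between x_min and x_max, the two
// boundaries, and `degree` additional knots on either side.
std::vector<double> sketchKnots (double x_min, double x_max,
  unsigned int n_knots, unsigned int degree)
{
  assert(x_max > x_min);

  const double spacing = (x_max - x_min) / (n_knots + 1);
  const std::size_t n_total = n_knots + 2 * (degree + 1);

  std::vector<double> knots(n_total);
  for (std::size_t i = 0; i < n_total; ++i) {
    // The first `degree` knots lie to the left of x_min.
    knots[i] = x_min +
      (static_cast<double>(i) - static_cast<double>(degree)) * spacing;
  }
  return knots;
}

// Number of B-spline basis functions for a given knot vector and degree.
std::size_t numberOfSplines (const std::vector<double>& knots, unsigned int degree)
{
  assert(knots.size() > degree + 1);
  return knots.size() - (degree + 1);
}
```

Calling `sketchKnots(1.0, 6.0, 3, 2)` gives the nine knots from the comment, and `numberOfSplines` then returns 6 for `degree = 2`.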

/**
* \brief Clean copy of baselearner
*
* \returns `Baselearner*` An exact copy of the actual baselearner.
*/
Baselearner* PSplineBlearner::clone ()
{
Baselearner* newbl = new PSplineBlearner (*this);
newbl->copyMembers(this->parameter, this->blearner_identifier, this->data_ptr);

return newbl;
}

/**
* \brief Instantiate data matrix (design matrix)
*
* This function is meant to create the design matrix which is then stored
* within the data object. This should be done just once and then reused all
* the time.
*
* Note that this function sets the `data_mat` object of the data object!
*
* \param newdata `arma::mat` Input data which is transformed to the design matrix
*
* \returns `arma::mat` of transformed data
*/
arma::mat PSplineBlearner::instantiateData (const arma::mat& newdata)
{
// The data object has to be created beforehand! That means that data_ptr must
// have initialized knots and penalty matrix!
return createBasis (newdata, degree, data_ptr->knots);
}

/**
* \brief Training of a baselearner
*
* This function sets the `parameter` member of the parent class `Baselearner`.
*
* \param response `arma::vec` Response variable of the training.
*/
void PSplineBlearner::train (const arma::vec& response)
{
parameter = data_ptr->XtX_inv * data_ptr->data_mat.t() * response;
}
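
`train` relies on `XtX_inv`, which for P-splines has to contain the difference penalty. This diff does not show how the data object builds it, so the following is only a sketch of the usual P-spline construction: a difference matrix D of order `differences` and the penalty matrix K = DᵀD, which would then enter as (XᵀX + penalty · K)⁻¹. The names `differenceMatrix` and `penaltyFromDifferences` are hypothetical, and plain `std::vector` is used instead of `arma::sp_mat` to keep the snippet self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Difference matrix of order `differences` for `n_coef` spline coefficients.
// Starting from the identity, each pass replaces the rows by the differences
// of consecutive rows, so the result has (n_coef - differences) rows.
Matrix differenceMatrix (std::size_t n_coef, unsigned int differences)
{
  assert(differences < n_coef);

  Matrix d(n_coef, std::vector<double>(n_coef, 0.0));
  for (std::size_t i = 0; i < n_coef; ++i) {
    d[i][i] = 1.0;
  }
  for (unsigned int k = 0; k < differences; ++k) {
    Matrix next;
    for (std::size_t i = 0; i + 1 < d.size(); ++i) {
      std::vector<double> row(n_coef);
      for (std::size_t j = 0; j < n_coef; ++j) {
        row[j] = d[i + 1][j] - d[i][j];
      }
      next.push_back(row);
    }
    d = next;
  }
  return d;
}

// Penalty matrix K = D^T D, accumulated row by row.
Matrix penaltyFromDifferences (const Matrix& d)
{
  assert(!d.empty());

  const std::size_t n_coef = d[0].size();
  Matrix k(n_coef, std::vector<double>(n_coef, 0.0));
  for (const std::vector<double>& row : d) {
    for (std::size_t i = 0; i < n_coef; ++i) {
      for (std::size_t j = 0; j < n_coef; ++j) {
        k[i][j] += row[i] * row[j];
      }
    }
  }
  return k;
}
```

For four coefficients and second-order differences, the rows of D are (1, -2, 1, 0) and (0, 1, -2, 1). With `penalty = 0` the term drops out and the fit reduces to unpenalized B-splines, which matches the description of the `penalty` parameter.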

/**
* \brief Predict on training data
*
* \returns `arma::mat` of predicted values
*/
arma::mat PSplineBlearner::predict ()
{
return data_ptr->data_mat * parameter;
}

/**
* \brief Predict on newdata
*
* \param newdata `data::Data*` new source data object
*
* \returns `arma::mat` of predicted values
*/
arma::mat PSplineBlearner::predict (data::Data* newdata)
{
return instantiateData(newdata->getData()) * parameter;
}


/// Destructor
PSplineBlearner::~PSplineBlearner () {}


// CustomBlearner:
// -----------------------

CustomBlearner::CustomBlearner (data::Data* data, const std::string& identifier,
@@ -238,7 +363,7 @@ arma::mat CustomBlearner::predict (data::Data* newdata)
CustomBlearner::~CustomBlearner () {}


// CustomCppBlearner Baselearner:
// CustomCppBlearner:
// -----------------------

CustomCppBlearner::CustomCppBlearner (data::Data* data, const std::string& identifier,
73 changes: 44 additions & 29 deletions src/baselearner.h
@@ -44,6 +44,7 @@
#include <string>

#include "data.h"
#include "splines.h"

namespace blearner {

@@ -157,38 +158,52 @@ class PolynomialBlearner : public Baselearner
* Orthogonalization.
*
* Please note that this baselearner is just the dummy object. Most of the
* functionality is done within the corresponding factory object.
* functionality is done while creating the data target, which contains
* most of the objects that are used here.
*
*/

// class PSplineBlearner : public Baselearner
// {
// private:
//
// /// Degree of polynomial functions as base models
// unsigned int degree;
//
// /// Number of inner knots
// unsigned int n_knots;
//
// /// Vector of knots used to create the basis
// arma::vec* knots;
//
// /// Penalty parameter
// double penalty;
//
// /// Degree of freedoms (alternative way to define the penalty parameter)
// double df;
//
// /// Differences of penalty matrix
// unsigned int differences;
//
// /// Penalty matrix for differences
// arma::sp_mat* K;
//
//
//
// };
class PSplineBlearner : public Baselearner
{
private:

/// Degree of polynomial functions as base models
unsigned int degree;

/// Number of inner knots
unsigned int n_knots;

/// Penalty parameter
double penalty;

/// Differences of penalty matrix
unsigned int differences;

public:
/// Default constructor of `PSplineBlearner` class
PSplineBlearner (data::Data*, const std::string&, const unsigned int&,
const unsigned int&, const double&, const unsigned int&);

/// Clean copy of baselearner
Baselearner* clone ();

/// Instantiate data matrix (design matrix)
arma::mat instantiateData (const arma::mat&);

/// Training of a baselearner
void train (const arma::vec&);

/// Predict on training data
arma::mat predict ();

/// Predict on newdata
arma::mat predict (data::Data*);


/// Destructor
~PSplineBlearner ();

};

// CustomBlearner:
// -----------------------