Cost functions now support Stan Math, kept the previous classes for backward compatibility. #4294
Conversation
there are some serious anti-patterns in these commits that need to be addressed; as it stands, this is not going to work.
@@ -0,0 +1,178 @@
/*
you should use the shorter version of the license, see for example SGObject.h
@@ -0,0 +1,173 @@
/*
should use the shorter version of the license
#include <vector>

namespace shogun{
whitespace
#define FIRSTORDERSAGCOSTFUNCTIONINTERFACE_H

#include <stan/math.hpp>
#include <Eigen/Dense>
we never include Eigen directly in shogun headers, only in linalg, and even then it's always via shogun/mathematics/eigen3.h
* where \f$(y_i,x_i)\f$ is the i-th sample,
* \f$y_i\f$ is the label and \f$x_i\f$ is the features
*/
class FirstOrderSAGCostFunctionInterface : public FirstOrderSAGCostFunction{
I guess this class is never going to be part of an SGObject, hence we could make a pass over all the anti-Shogun patterns you've used below
SG_ADD is not used anywhere in this class...
@@ -0,0 +1,184 @@
/*
* Copyright (c) The Shogun Machine Learning Toolbox
use the shorter license

protected:
/** X is the training data in column major matrix format */
Eigen::Matrix<float64_t, Eigen::Dynamic, Eigen::Dynamic>* m_X;
you have not added these to the param framework, and in this form they never can be added... you need to use SGVector for that
Eigen::Matrix<float64_t, Eigen::Dynamic, Eigen::Dynamic>* m_X;

/** y is the ground truth, or the correct prediction */
Eigen::Matrix<float64_t, 1, Eigen::Dynamic>* m_y;
same as above... SGVector
Eigen::Matrix<float64_t, 1, Eigen::Dynamic>* m_y;

/** trainable_parameters are the variables that are optimized for */
Eigen::Matrix<stan::math::var, Eigen::Dynamic, 1>* m_trainable_parameters;
this is currently broken... you won't be able to serialize this class reproducibly
std::vector< std::function < stan::math::var(int32_t) > >* m_cost_for_ith_point;

/** total_cost is the total cost to be minimized, that in this case is a form of sum of cost_for_ith_point*/
std::function < stan::math::var(std::vector<stan::math::var>*) >* m_total_cost;
indentation + problems with serialization
Please run clang-format on your code before pushing to a PR
*
* @return sample gradient of target variables
*/
virtual SGVector<float64_t> get_gradient()=0;

/** Get the cost given current target variables
could you adjust your editor so it doesn't cause all this noise?
Eigen::Matrix<float64_t, 1, Eigen::Dynamic>* y_new
)
{
REQUIRE(X_new!=NULL, "X must be non NULL");
Is this a user-facing error?
If not, pls say "No X provided.\n" (also missing newline)
std::vector< std::function < stan::math::var(int32_t) > >* new_cost_f
)
{
REQUIRE(new_cost_f, "The cost function must be a non NULL vector of stan variables");
see above, and also for all other error msgs
SGVector<float64_t> FirstOrderSAGCostFunctionInterface::get_gradient()
{
int32_t num_of_variables = m_trainable_parameters->rows();
REQUIRE(num_of_variables > 0, "Number of training parameters must be greater than 0");
Pls always print the provided number
"Number of training parameters (%d) must be positive.\n"
https://github.com/shogun-toolbox/shogun/wiki/Assertions
int32_t n = get_sample_size();
REQUIRE(n>0 , "Number of sample must be greater than 0");

for(auto i=0; i<n; ++i)
minor: for (auto i : range(n))
for all range based loops
auto grad = get_gradient();
for(auto j=0; j<params_num; ++j)
{
average_gradients[j]+= (grad[j]/n);
can you use linalg here pls, rather than those vanilla loops

/** Get the SAMPLE gradient value wrt target variables
*
* WARNING
I don't understand this warning thing. Could you make it clearer?
}

TEST(LeastSquareTestCostFunction, ONALINE)
What's ONALINE?
Pls try to use slightly more elaborate test names :)
Sorry, meant that the points are on a line (ONALINE)
points_on_a_line
Just some small comments after a quick look for the time being. Before delving deeper, let's get the relevant points raised in the other reviews addressed.
#include <Eigen/Dense>

using namespace shogun;
using namespace Eigen;
Let's just do this for the shogun namespace. This way it will also be consistent with how Eigen and Stan are treated here. If you really want to save typing some characters for a particular symbol, just bring it into scope individually; for example, using Eigen::Matrix;
{
int32_t params_num = m_X->cols();
SGVector<float64_t> ret(params_num);
for(auto i=0; i<params_num; ++i)
Nitpicking a little bit on style. I would not use auto here. For one-liner scopes, I think the Shogun style used to be to omit the curly braces.
{
int n = 3;
auto X = Matrix<float64_t, Dynamic, Dynamic>();
X.resize(2,3);
What about something like Eigen::MatrixXd X(2,3) instead?
stan::math::var res = wx_y * wx_y;
return res;
};
std::vector< std::function < stan::math::var(int32_t) > > cost_for_ith_point(n, f_i);
Please remove the spaces inside the template parameters; nowadays it is all fine to have consecutive angle brackets like >> in templates :-)
X(0, 2) = 1;
X(1, 0) = 0;
X(1, 1) = 1;
X(1, 2) = 2;
is there a chance of re-using this test data or the general setup? If so it would be good to put it into a function or a fixture. It also makes the test smaller
++
Alright, addressed all changes.
using stan::math::var;
using std::function;

/** This is a temporary fix. The variables are now variables
What does this mean?
we need to figure out how we can store a non-primitive-typed std::vector (like stan::math::var)... hence the temporary story... just add a FIXME/TODO there and then it's good
@iglesias As Viktor said. The original cost function needs a reference to an SGVector<float64_t> of the parameters. The parameters have now been changed in the cost function to SGVector<stan::math::var>, so we're going to have to change the interface of the minimizers before returning an SGVector<stan::math::var>.
*/

#include <shogun/lib/config.h>
#ifndef FIRSTORDERSAGCOSTFUNCTIONINTERFACE_UNITTEST_H
Is there some coding guideline preventing the use of more underscores?
Not sure, to be honest. I looked at the other files and they all had _UNITTEST_H, so I followed the convention. If there is no guideline I'll add more underscores between words.
using Eigen::Dynamic;
using Eigen::Matrix;
using stan::math::var;
using std::function;
It does not feel good to have type aliases in headers. They also become effective in places where the header is included, no?
second that this actually contaminates any file that will include this header
I thought that if you use using std::function; in a header, then you'd need to access it from another file which has the include as FirstOrderSAGCostFunctionInterface::function no? In other words, shouldn't it be under the scope of the .h file?
Actually you're right, I just forgot my c++! Including it in the header file does include it in other files as well. I'll get rid of it.
{
/** @brief The first order stochastic cost function base class for
* implementing
* The SAG Cost function
Style: newline (after implementing) and capitalization (implementing \n The) are odd.
Sorry, I ran clang-format on it and didn't check the styling. Will fix the styling of the comments

/** Default constructor, use setter helpers to set X, y,
*trainable_parameters, and
* cost_function
This is not informative doc :-) I'd remove very general comments like this one, they add noise.

/** total_cost is the total cost to be minimized, that in this case is a
* form of sum of cost_for_ith_point*/
function<var(Matrix<var, Dynamic, 1>*)>* m_total_cost;
Potentially silly question alert! Why have an attribute of this type instead of a method?
@iglesias Not silly at all! I actually thought about this, but I wanted to separate the logic of calculating the cost function from the class so that base class can support any cost function. We can also make a pure virtual method and let the user implement it. Both work.

protected:
/** X is the training data in column major matrix format */
SGMatrix<float64_t>* m_X;
Why are all these members pointers?
@iglesias Just a convention (I try to pass things by reference/pointer instead of by value to avoid copying). But I agree, I will get rid of the SGMatrix pointers since SGMatrix already implements that under the hood.
@@ -31,8 +31,8 @@
As you are already touching this file, it would be nice to do a tiny extra effort and update the Copyright to the more modern shorter version.
* where \f$(y_i,x_i)\f$ is the i-th sample,
* \f$y_i\f$ is the label and \f$x_i\f$ is the features
*/
class FirstOrderSAGCostFunctionInterface : public FirstOrderSAGCostFunction
It seems a bit odd that the interface inherits from the class.
Hmm, how about FirstOrderSAGArbitraryCostFunction?
I have the feeling that @vigsterkr's first general comment is not addressed. @FaroukY, did you discuss it with anybody?
How when it is needed, I think. We can sync in a few days about what
attributes these cost functions should have, which ones make sense to make
part of the public api.
For instance right now I see that the Interface class even contains the
training data.
using Eigen::Matrix;

FirstOrderSAGCostFunctionInterface::FirstOrderSAGCostFunctionInterface(
SGMatrix<float64_t>* X, SGMatrix<float64_t>* y,
these could both be SGMatrix passed by value...
}

void FirstOrderSAGCostFunctionInterface::set_training_data(
SGMatrix<float64_t>* X_new, SGMatrix<float64_t>* y_new)
there was an email from @karlnapf on the mailing list not long ago about SGVector and SGMatrix and how they are passed around
just pass them by value

bool FirstOrderSAGCostFunctionInterface::next_sample()
{
auto num_of_samples = get_sample_size();
there's just no need for this variable on the stack... it's just an extra line of code. Just do:
if (m_index_of_sample >= get_sample_size())
return false;

SGVector<float64_t> FirstOrderSAGCostFunctionInterface::get_gradient()
{
int32_t num_of_variables = m_trainable_parameters->rows();
auto

float64_t FirstOrderSAGCostFunctionInterface::get_cost()
{
int32_t n = get_sample_size();
auto
int32_t params_num = m_trainable_parameters->rows();
SGVector<float64_t> average_gradients(params_num);

int32_t old_index_sample = m_index_of_sample;
auto
SGVector<float64_t> average_gradients(params_num);

int32_t old_index_sample = m_index_of_sample;
int32_t n = get_sample_size();
auto
for (index_t i = 0; i < n; ++i)
{
m_index_of_sample = i;
auto grad = get_gradient();
do we need this variable?
just do average_gradients += get_gradient()

FirstOrderSAGCostFunctionInterface::FirstOrderSAGCostFunctionInterface(
SGMatrix<float64_t>* X, SGMatrix<float64_t>* y,
Matrix<var, Dynamic, 1>* trainable_parameters,
what's the story behind these being pointers instead of references?
I am not sure what the convention in Shogun is, but I have always used pointers in my previous C++ code (just to avoid reference syntax). If you'd like, I think we can change them to references here.
int32_t num_of_variables = m_trainable_parameters->rows();
REQUIRE(
num_of_variables > 0,
"Number of training parameters must be greater than 0");
missing newline. Also pls always print the provided number:
"Number of training parameters (%d) must be positive.\n"
Sure, but in this case I'll just print "Number of training parameters must be greater than 0, you provided 0", since you can't have num_of_variables < 0.
f_i.grad();

for (auto i = 0; i < num_of_variables; ++i)
gradients[i] = (*m_trainable_parameters)(i, 0).adj();
can't this be done without a loop, using std::transform or something?
Didn't know this, but Eigen has unaryExpr which allows you to apply any function to all elements in the matrix. So I've removed the vanilla loop and replaced it with a call to that function.
{
int32_t n = get_sample_size();
Matrix<var, Dynamic, 1> cost_argument(n);
for (auto i = 0; i < n; ++i)
as above, it would be better to do this without a loop (if easy).
If not, at least do a for (auto i : range(n))
This one is a bit harder to do without a loop. Since we're calling a function inside and passing an argument, Eigen::unaryExpr wouldn't work. I changed it to range(n).
* cost_function */
FirstOrderSAGCostFunctionInterface(
SGMatrix<float64_t>* X, SGMatrix<float64_t>* y,
Matrix<var, Dynamic, 1>* trainable_parameters,
could we maybe alias things like Matrix<var, Dynamic, 1> into something more readable?
* */
virtual bool next_sample();

/** Get the SAMPLE gradient value wrt target variables
why is this SAMPLE? rather than sample
Just wanted to emphasise it. Changed it to lower case.
* where \f$(y_i,x_i)\f$ is the i-th sample,
* \f$y_i\f$ is the label and \f$x_i\f$ is the features
*/
class StanFirstOrderSAGCostFunction : public FirstOrderSAGCostFunction
Not sure if this relationship between StanFirstOrderSAGCostFunction and FirstOrderSAGCostFunction makes sense.
Agreed. I've changed the parent class to be FirstOrderStochasticCostFunction in the next set of commits, since all these implemented functions are directly from it. (FirstOrderSAGCostFunction also inherits from FirstOrderStochasticCostFunction so it makes sense here to inherit from it as the Stan version is an alternative of FirstOrderSAGCostFunction).
SGMatrix<float64_t> m_y;

/** trainable_parameters are the variables that are optimized for */
StanVector& m_trainable_parameters;
I think that having a reference member makes default-constructability odd. This might be an issue for generic features such as the parameter framework, clone, etc.
I was just thinking of it as a design problem, as in aggregation vs composition. The cost function doesn't own the parameters, so I felt aggregation works better here (implemented in C++ as a reference). What do you think?
I think a reference member should not be used but maybe it is ok.
Any news about this concern?
SGMatrix<float64_t> m_X;

/** y is the ground truth, or the correct prediction */
SGMatrix<float64_t> m_y;
FirstOrderSAGCostFunction does not have members for the training data. Why are they needed here?
@iglesias In the unit test of FirstOrderSAGCostFunction, we implement an additional class called CRegressionExample that wraps FirstOrderSAGCostFunction and contains the training data. I've simply removed the need for that class, since it just acted as a wrapper around the cost function and data, and have included the data in the loss function, or at least a reference to the data.
We must be careful and must not add unnecessary relationships. If it turns out that the training data is a member of the cost function, and of the neural network, and so on, that is going to cause a lot of usage confusion.
It does not sound unreasonable for the tests to have a facility to prepare input data and avoid code duplication (e.g. inside a class such as the CRegressionExample you are mentioning). What did you find wrong with it?
…ochasticCostFunction, and made get_gradient() thread safe. [ci skip]
…a neural network layer [ci skip]
…n from logic of layer
…neural network, the logistic/softmax/relu layers will inherit from it [ci skip]
…t aliased instead of A
…lled StanNeuralLogisticLayer and computes activations as required [ci skip]
var x1(0), x2(0), x3(0);
w(0, 0) = x1;
w(1, 0) = x2;
w(2, 0) = x3;
The syntax to initialize this StanVector is a bit awkward. Can you do better? For example, with something std::vector-like it could just look like std::vector<var>{x1, x2, x3}.
Hmm, well StanVector is just an alias for Eigen::Matrix<var, Dynamic, 1>, so I am not sure I can change the constructor unless I define a wrapper class for it.
Please have a look at initializing Eigen::Matrix in the Eigen documentation:
https://eigen.tuxfamily.org/dox/group__TutorialAdvancedInitialization.html
}

function<var(const StanVector&, float64_t)>
cost_for_ith_datapoints(SGMatrix<float64_t>& X, SGMatrix<float64_t>& y)
Refactor.
Will be refactoring the stan tests once the neural network is working
auto fun = new SquareErrorTestCostFunction(
X, y, w, cost_for_ith_point, total_cost);

auto opt = new SGDMinimizer(fun);
Use a smart pointer instead (either Shogun's Some with an SGObject, or std::unique_ptr otherwise). Then the delete at the end goes away.

auto f_i = cost_for_ith_datapoints(X, y);

Matrix<function<var(const StanVector&, float64_t)>, Dynamic, 1>
This is a case where auto really makes sense actually.
I wanted to cast here, since the output is not exactly:
Matrix<function<var(const StanVector&, float64_t)>, Dynamic, 1>
What do you mean?
return total_cost;
};

auto fun = new SquareErrorTestCostFunction( |
No deletion for this one and the ConstLearningRate?
@iglesias They're handled in the minimizer's destructor.
layers The class is composed of parameters, and is only responsible for forward propagation of the layers. The Network will be used to generate a StanMatrix of outputs, which will then be used with Minimizers and StanCostFunctions to minimize the neural network's parameters [ci skip]
* [squared error measure](http://en.wikipedia.org/wiki/Mean_squared_error) is
* used
*/
class StanNeuralLogisticLayer : public StanNeuralLinearLayer |
Perhaps it is only a naming issue, but saying that a LogisticLayer is a LinearLayer sounds odd.
@iglesias Just a naming issue. By Linear Layer I mean a layer with identity activation h(x) = x. The logistic layer then applies the sigmoid \sigma to that output, giving \sigma(h(x)) = \sigma(x), i.e. a sigmoid layer. So it's just a specialisation of the linear layer.
Any plans to fix?
/** default constructor */
StanNeuralLogisticLayer();

/** Constuctor |
Minor typo: constructor.
CDynamicObjectArray* layers);

virtual const char* get_name() const { return "NeuralLogisticLayer"; } |
Missing Stan prefix in the name?
*/
virtual void initialize_neural_network(float64_t sigma = 0.01f);

virtual ~CNeuralNetwork(); |
Did you mean the destructor of StanNeuralNetwork?
/** apply machine to data in means of regression problem */
virtual CRegressionLabels* apply_regression(CFeatures* data);
/** apply machine to data in means of multiclass classification problem */
virtual CMulticlassLabels* apply_multiclass(CFeatures* data); |
I am not sure these ones should be here. Let's check with @karlnapf or @vigsterkr.
}

SGVector<float64_t>* StanNeuralNetwork::get_layer_parameters(int32_t i) |
This looks strange. The signature in the header is StanVector& get_layer_parameters(int32_t i);
…nd initialize_parameters to pass indices of start and end indices of the stan vector of parameters
…orairly and replace it with indices for stan vector
…ation, but there is still some syntax errors that are being fixed, once they are addressed, we can start the testing
…tax errors apart from compute_activations(input) which has been silenced for now, needs discussion before fixing; 3) Got rid of get_section() logic and replaced it with other logic that doesn't have to copy the vector; 4) Got rid of get_larger_activation logic since it wasn't needed ; 5) Fixed all signature errors of compute_activations(params, i, j, layers) and its specializations; 6) Got rid of some typos, still many typos that need addressing TODO
… done, the next part is to test it to check implementation details
This PR has already become unreasonably large. It is difficult to keep track (even Chromium is struggling here to load the full set of changes). Could it be an idea to separate it into isolated yet meaningful parts?
Please try to address the issues that have been pointed out in previous reviews. At least get back to the comments and explain why you are not addressing them.
* [squared error measure](http://en.wikipedia.org/wiki/Mean_squared_error) is
* used
*/
class StanNeuralLogisticLayer : public StanNeuralLinearLayer |
Any plans to fix?
SGMatrix<float64_t> m_X;

/** y is the ground truth, or the correct prediction */
SGMatrix<float64_t> m_y; |
We must be careful and must not add unnecessary relationships. If it turns out that the training data is a member both in the cost function, and in the neural network, and in etc. etc., that is going to cause a lot of usage confusion.
It does not sound unreasonable for the tests to have a packaged facility to prepare input data and avoid code duplication (e.g. inside a class such as the CRegressionExample you are mentioning). What did you find wrong with it?
SGMatrix<float64_t> m_y;

/** trainable_parameters are the variables that are optimized for */
StanVector& m_trainable_parameters; |
Any news about this concern?
I previously discussed with Viktor that I'd first finish writing the whole Neural Network module (which is already done now), then write the unit tests for it and make sure the logic is correct, and then go back and refactor all the points raised previously. I've already addressed some of the points mentioned, but the main focus was getting an MVP of the neural network running on Stan only for gradient calculation. I'm not ignoring the points; I'm just more concerned right now with finishing the actual logic (then I can refactor, rename, and make the other required changes). I'm currently writing the unit tests for the whole neural network code, and this will be the last file added to this PR. The result should be a standalone PR that switches a tested NN to use Stan only.
Hi @FaroukY! That's all right, though I find that a ~2000-line patch for a so-called MVP smells a bit like overkill. Still, I do suggest you keep in mind the comments that have already been made. You don't want to build upon something that will need to be completely changed from its foundations. I am not necessarily saying that is the case here, but I believe you should keep it in mind.
We can now write arbitrary cost functions in Stan Math and get the gradients with respect to all variables, regardless of how complex the cost function is. This PR adds the class FirstOrderSAGCostFunctionInterface, which acts as an interface for building stochastic average cost functions. I've also added a unit test to show how to use the class. All reference values were validated against TensorFlow.