
Cost functions now support Stan Math; kept the previous classes for backward compatibility. #4294

Closed · wants to merge 39 commits

Conversation

@FaroukY (Contributor) commented May 22, 2018:

We can now write arbitrary cost functions in Stan Math and get the gradients with respect to all variables, regardless of how complex the cost function is. This PR adds the class FirstOrderSAGCostFunctionInterface, which acts as an interface for building stochastic average cost functions. I've also added a unit test to show how to use the class. All reference values were validated in TensorFlow.
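For context, the mechanism underneath is Stan Math's reverse-mode automatic differentiation: the cost is built out of stan::math::var values, a reverse pass propagates adjoints, and each variable's gradient is read off with adj(). A minimal standalone sketch of that pattern (the variable names are illustrative, not from this PR):

#include <stan/math.hpp>

int main()
{
	// Independent variable and an arbitrary cost expression built on it.
	stan::math::var w = 2.0;
	stan::math::var cost = w * w + 3.0 * w;

	// Reverse pass: propagates adjoints to every var in the expression tree.
	cost.grad();

	// d(cost)/dw = 2*w + 3, so 7 at w = 2.
	double dcost_dw = w.adj();

	// Release the global autodiff stack before building another expression.
	stan::math::recover_memory();
	return dcost_dw == 7.0 ? 0 : 1;
}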

@vigsterkr (Member) left a comment:

There are some serious anti-patterns in these commits that need to be addressed; in its current state this is not going to work.

@@ -0,0 +1,178 @@
/*
Member: You should use the shorter version of the license; see for example SGObject.h.

@@ -0,0 +1,173 @@
/*
Member: Should use the shorter version of the license.

#include <vector>


namespace shogun{
Member: Whitespace.

#define FIRSTORDERSAGCOSTFUNCTIONINTERFACE_H

#include <stan/math.hpp>
#include <Eigen/Dense>
Member: We never include Eigen directly in Shogun headers, only in linalg, and even then always via shogun/mathematics/eigen3.h.
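In other words, the wrapper header is the one include Shogun code is expected to use:

#include <shogun/mathematics/eigen3.h> // Shogun convention, instead of #include <Eigen/Dense>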

* where \f$(y_i,x_i)\f$ is the i-th sample,
* \f$y_i\f$ is the label and \f$x_i\f$ is the features
*/
class FirstOrderSAGCostFunctionInterface : public FirstOrderSAGCostFunction{
Member: I guess this class is never going to be part of an SGObject, hence we could make a pass over all the anti-Shogun-patterns used below.

Member: SG_ADD is not used anywhere in this class...
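For reference, registering members with the parameter framework looks roughly like the sketch below; it only works with Shogun container types such as SGVector/SGMatrix, which is why the raw Eigen members criticized further down cannot be registered as-is. The exact SG_ADD signature has varied across Shogun versions, so treat this as an assumption:

// In the constructor, after initializing the members (hypothetical):
SG_ADD(&m_X, "X", "Training data matrix.", MS_NOT_AVAILABLE);
SG_ADD(&m_y, "y", "Ground truth labels.", MS_NOT_AVAILABLE);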

@@ -0,0 +1,184 @@
/*
* Copyright (c) The Shogun Machine Learning Toolbox
Member: Use the shorter license.


protected:
/** X is the training data in column major matrix format */
Eigen::Matrix<float64_t, Eigen::Dynamic, Eigen::Dynamic>* m_X;
Member: You have not added these to the param framework, and as raw Eigen members they never can be; you need to use SGVector for that.

Eigen::Matrix<float64_t, Eigen::Dynamic, Eigen::Dynamic>* m_X;

/** y is the ground truth, or the correct prediction */
Eigen::Matrix<float64_t, 1, Eigen::Dynamic>* m_y;
Member: Same as above... SGVector.

Eigen::Matrix<float64_t, 1, Eigen::Dynamic>* m_y;

/** trainable_parameters are the variables that are optimized for */
Eigen::Matrix<stan::math::var, Eigen::Dynamic, 1>* m_trainable_parameters;
Member: This is currently broken; you won't be able to serialize this class reproducibly.

std::vector< std::function < stan::math::var(int32_t) > >* m_cost_for_ith_point;

/** total_cost is the total cost to be minimized, that in this case is a form of sum of cost_for_ith_point*/
std::function < stan::math::var(std::vector<stan::math::var>*) >* m_total_cost;
Member: Indentation, plus the serialization problems noted above.

@vigsterkr (Member): Please run clang-format on your code before pushing to a PR.

*
* @return sample gradient of target variables
*/
virtual SGVector<float64_t> get_gradient()=0;

/** Get the cost given current target variables
/** Get the cost given current target variables
Member: Could you adjust your editor so it doesn't cause all this noise?

Eigen::Matrix<float64_t, 1, Eigen::Dynamic>* y_new
)
{
REQUIRE(X_new!=NULL, "X must be non NULL");
Member: Is this a user-facing error? If not, please say "No X provided.\n" (also note the missing newline).
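Following that convention, the check would read something like:

REQUIRE(X_new, "No X provided.\n");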

std::vector< std::function < stan::math::var(int32_t) > >* new_cost_f
)
{
REQUIRE(new_cost_f, "The cost function must be a non NULL vector of stan variables");
Member: See above, and likewise for all the other error messages.

SGVector<float64_t> FirstOrderSAGCostFunctionInterface::get_gradient()
{
int32_t num_of_variables = m_trainable_parameters->rows();
REQUIRE(num_of_variables > 0, "Number of training parameters must be greater than 0");
Member: Please always print the provided number:
"Number of training parameters (%d) must be positive.\n"
https://github.com/shogun-toolbox/shogun/wiki/Assertions

int32_t n = get_sample_size();
REQUIRE(n>0 , "Number of sample must be greater than 0");

for(auto i=0; i<n; ++i)
Member: Minor: use for (auto i : range(n)) for all range-based loops.

auto grad = get_gradient();
for(auto j=0; j<params_num; ++j)
{
average_gradients[j]+= (grad[j]/n);
Member: Can you use linalg here, please, rather than these vanilla loops?
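A possible linalg-based form of this accumulation, assuming the linalg::add(a, b, alpha, beta) overload that returns alpha*a + beta*b (a sketch, not the PR's code):

#include <shogun/mathematics/linalg/LinalgNamespace.h>

// average_gradients += grad / n, without the element-wise loop
average_gradients = linalg::add(average_gradients, get_gradient(), 1.0, 1.0 / n);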


/** Get the SAMPLE gradient value wrt target variables
*
* WARNING
Member: I don't understand this warning. Could you make it clearer?

}


TEST(LeastSquareTestCostFunction, ONALINE)
Member: What's ONALINE? Please try to use slightly more descriptive test names :)

FaroukY (Contributor, Author): Sorry, I meant that the points are on a line (ONALINE).

Member: points_on_a_line

@iglesias (Collaborator) left a comment:

Just some small comments after a quick look for the time being. Before delving deeper, let's start by addressing the relevant points raised in the other reviews.

#include <Eigen/Dense>

using namespace shogun;
using namespace Eigen;
Collaborator: Let's just do this for the shogun namespace. That way it also becomes consistent here between Eigen and Stan. If you really want to save typing a few characters for some particular symbol, just bring it into scope individually, for example using Eigen::Matrix;.

{
int32_t params_num = m_X->cols();
SGVector<float64_t> ret(params_num);
for(auto i=0; i<params_num; ++i)
Collaborator: Nitpicking a little on style: I would not use auto here. For one-liner scopes, the Shogun style used to be to omit the curly braces.

{
int n = 3;
auto X = Matrix<float64_t, Dynamic, Dynamic>();
X.resize(2,3);
Collaborator: What about something like Eigen::MatrixXd X(2, 3) instead?

stan::math::var res = wx_y * wx_y;
return res;
};
std::vector< std::function < stan::math::var(int32_t) > > cost_for_ith_point(n, f_i);
Collaborator: Please remove the spaces inside the template parameters; nowadays it is fine to have << or >> in templates :-)

X(0, 2) = 1;
X(1, 0) = 0;
X(1, 1) = 1;
X(1, 2) = 2;
Member: Is there a chance of reusing this test data or the general setup? If so, it would be good to put it into a function or a fixture; it also makes the test smaller.

Collaborator: ++

@FaroukY (Contributor, Author) commented May 24, 2018:

Alright, addressed all the changes.

using stan::math::var;
using std::function;

/** This is a temporary fix. The variables are now variables
Collaborator: What does this mean?

Member: We need to figure out how we can store a std::vector of a non-primitive type (like stan::math::var); hence the "temporary" story. Just add a FIXME/TODO there and then it's good.

FaroukY (Contributor, Author): @iglesias As Viktor said. The original cost function needs a reference to an SGVector<float64_t> of the parameters. The parameters have now been changed in the cost function to SGVector<stan::math::var>, so we're going to have to change the interface of the minimizers before returning an SGVector<stan::math::var>.

*/

#include <shogun/lib/config.h>
#ifndef FIRSTORDERSAGCOSTFUNCTIONINTERFACE_UNITTEST_H
Collaborator: Is there some coding guideline preventing the use of more underscores?

FaroukY (Contributor, Author): Not sure, to be honest. I looked at the other files and they all had _UNITTEST_H, so I followed the convention. If there is no guideline, I'll add more underscores between words.

using Eigen::Dynamic;
using Eigen::Matrix;
using stan::math::var;
using std::function;
Collaborator: It does not feel good to have type aliases in headers. They also become effective in the places where the header is included, no?

Member: Seconded; this actually contaminates any file that includes this header.

FaroukY (Contributor, Author): I thought that if you use using std::function; in a header, then you'd need to access it from another file that has the include as FirstOrderSAGCostFunctionInterface::function, no? In other words, shouldn't it be under the scope of the .h file?

FaroukY (Contributor, Author): Actually you're right, I just forgot my C++! Including it in the header file does bring it into other files as well. I'll get rid of it.

{
/** @brief The first order stochastic cost function base class for
* implementing
* The SAG Cost function
Collaborator: Style: the newline (after "implementing") and the capitalization ("implementing \n The") are odd.

FaroukY (Contributor, Author): Sorry, I ran clang-format on it and didn't check the styling. I will fix the styling of the comments.


/** Default constructor, use setter helpers to set X, y,
*trainable_parameters, and
* cost_function
Collaborator: This is not informative documentation :-) I'd remove very general comments like this one; they add noise.


/** total_cost is the total cost to be minimized, that in this case is a
* form of sum of cost_for_ith_point*/
function<var(Matrix<var, Dynamic, 1>*)>* m_total_cost;
Collaborator: Potentially silly question alert! Why have an attribute of this type instead of a method?

FaroukY (Contributor, Author): @iglesias Not silly at all! I actually thought about this, but I wanted to separate the logic of calculating the cost function from the class, so that the base class can support any cost function. We could also make it a pure virtual method and let the user implement it. Both work.


protected:
/** X is the training data in column major matrix format */
SGMatrix<float64_t>* m_X;
Collaborator: Why are all these members pointers?

FaroukY (Contributor, Author): @iglesias Just a convention (I try to pass things by reference/pointer instead of by value to avoid copying). But I agree; I will get rid of the SGMatrix pointers, since SGMatrix already implements that under the hood.

@@ -31,8 +31,8 @@

Collaborator: As you are already touching this file, it would be nice to make the tiny extra effort and update the copyright to the more modern, shorter version.

* where \f$(y_i,x_i)\f$ is the i-th sample,
* \f$y_i\f$ is the label and \f$x_i\f$ is the features
*/
class FirstOrderSAGCostFunctionInterface : public FirstOrderSAGCostFunction
Collaborator: It seems a bit odd that the interface inherits from the class.

FaroukY (Contributor, Author): Hmm, how about FirstOrderSAGArbitraryCostFunction?

@iglesias (Collaborator): I have the feeling that @vigsterkr's first general comment has not been addressed. @FaroukY, did you discuss it with anybody?

@iglesias (Collaborator) commented May 25, 2018 via email.

using Eigen::Matrix;

FirstOrderSAGCostFunctionInterface::FirstOrderSAGCostFunctionInterface(
SGMatrix<float64_t>* X, SGMatrix<float64_t>* y,
Member: These could both be SGMatrix passed by value...

}

void FirstOrderSAGCostFunctionInterface::set_training_data(
SGMatrix<float64_t>* X_new, SGMatrix<float64_t>* y_new)
Member: There was an email from @karlnapf on the mailing list not long ago about SGVector and SGMatrix and how they are passed around.

Member: Just pass them by value.


bool FirstOrderSAGCostFunctionInterface::next_sample()
{
auto num_of_samples = get_sample_size();
Member: There's just no need for this variable on the stack; it's an extra line of code. Just do:

if (m_index_of_sample >= get_sample_size())
	return false;


SGVector<float64_t> FirstOrderSAGCostFunctionInterface::get_gradient()
{
int32_t num_of_variables = m_trainable_parameters->rows();
Member: auto


float64_t FirstOrderSAGCostFunctionInterface::get_cost()
{
int32_t n = get_sample_size();
Member: auto

int32_t params_num = m_trainable_parameters->rows();
SGVector<float64_t> average_gradients(params_num);

int32_t old_index_sample = m_index_of_sample;
Member: auto

SGVector<float64_t> average_gradients(params_num);

int32_t old_index_sample = m_index_of_sample;
int32_t n = get_sample_size();
Member: auto

for (index_t i = 0; i < n; ++i)
{
m_index_of_sample = i;
auto grad = get_gradient();
Member: Do we need this variable?

Member: Just do average_gradients += get_gradient().

using Eigen::Dynamic;
using Eigen::Matrix;
using stan::math::var;
using std::function;
Member: Seconded; this actually contaminates any file that includes this header.


FirstOrderSAGCostFunctionInterface::FirstOrderSAGCostFunctionInterface(
SGMatrix<float64_t>* X, SGMatrix<float64_t>* y,
Matrix<var, Dynamic, 1>* trainable_parameters,
Member: What's the story behind these being pointers instead of references?

FaroukY (Contributor, Author): I am not sure what the convention in Shogun is, but I have always used pointers in my previous C++ code (just to avoid the reference syntax). If you'd like, I think we can change them to references here.

int32_t num_of_variables = m_trainable_parameters->rows();
REQUIRE(
num_of_variables > 0,
"Number of training parameters must be greater than 0");
Member: Missing newline. Also, please always print the provided number:
"Number of training parameters (%d) must be positive.\n"

FaroukY (Contributor, Author): Sure, but in this case I'll just print "Number of training parameters must be greater than 0, you provided 0", since you can't have num_of_variables < 0.

f_i.grad();

for (auto i = 0; i < num_of_variables; ++i)
gradients[i] = (*m_trainable_parameters)(i, 0).adj();
Member: Can't this be done without a loop, using std::transform or something?

FaroukY (Contributor, Author): I didn't know this, but Eigen has unaryExpr, which allows you to apply any function to all elements of a matrix. I've removed the vanilla loop and replaced it with a call to that function.
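The pattern described here would look roughly like this (a sketch; the member name follows this PR):

// After f_i.grad(), every trainable var carries its adjoint. unaryExpr
// applies the lambda elementwise, yielding a double-valued expression.
Eigen::Matrix<float64_t, Eigen::Dynamic, 1> gradients =
	m_trainable_parameters->unaryExpr(
		[](const stan::math::var& v) { return v.adj(); });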

{
int32_t n = get_sample_size();
Matrix<var, Dynamic, 1> cost_argument(n);
for (auto i = 0; i < n; ++i)
Member: As above, it would be better to do this without a loop (if easy). If not, at least use for (auto i : range(n)).

FaroukY (Contributor, Author) (May 26, 2018): This one is a bit harder to do without a loop. Since we're calling a function inside and passing an argument, Eigen::unaryExpr wouldn't work. I changed it to range(n).

* cost_function */
FirstOrderSAGCostFunctionInterface(
SGMatrix<float64_t>* X, SGMatrix<float64_t>* y,
Matrix<var, Dynamic, 1>* trainable_parameters,
Member: Could we maybe alias things like Matrix<var, Dynamic, 1> into something more readable?
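Later commits do exactly this; as discussed further down, StanVector is such an alias:

using StanVector = Eigen::Matrix<stan::math::var, Eigen::Dynamic, 1>;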

* */
virtual bool next_sample();

/** Get the SAMPLE gradient value wrt target variables
Member: Why is this SAMPLE rather than sample?

FaroukY (Contributor, Author): I just wanted to emphasize it. Changed it to lower case.

* where \f$(y_i,x_i)\f$ is the i-th sample,
* \f$y_i\f$ is the label and \f$x_i\f$ is the features
*/
class StanFirstOrderSAGCostFunction : public FirstOrderSAGCostFunction
Collaborator: Not sure this relationship between StanFirstOrderSAGCostFunction and FirstOrderSAGCostFunction makes sense.

FaroukY (Contributor, Author) (Jun 16, 2018): Agreed. I've changed the parent class to FirstOrderStochasticCostFunction in the next set of commits, since all these implemented functions come directly from it. (FirstOrderSAGCostFunction also inherits from FirstOrderStochasticCostFunction, so it makes sense to inherit from it here, as the Stan version is an alternative to FirstOrderSAGCostFunction.)

SGMatrix<float64_t> m_y;

/** trainable_parameters are the variables that are optimized for */
StanVector& m_trainable_parameters;
Collaborator: I think that having a reference member makes default-constructibility odd. This might be an issue for generic features such as the parameter framework, clone, etc.

FaroukY (Contributor, Author): I was just thinking of it as a design problem, as in aggregation vs. composition. The cost function doesn't own the parameters, so I felt aggregation works better here (implemented in C++ as a reference). What do you think?

Collaborator: I think a reference member should not be used, but maybe it is OK.

Collaborator: Any news about this concern?

SGMatrix<float64_t> m_X;

/** y is the ground truth, or the correct prediction */
SGMatrix<float64_t> m_y;
Collaborator: FirstOrderSAGCostFunction does not have members for the training data. Why are they needed here?

FaroukY (Contributor, Author): @iglesias In the unit test of FirstOrderSAGCostFunction, we implement an additional class called CRegressionExample that wraps FirstOrderSAGCostFunction and contains the training data. I've simply removed the need for that class, since it just acted as a wrapper around the cost function and data, and have included the data (or at least a reference to it) in the loss function.

@iglesias (Collaborator) (Jul 2, 2018): We must be careful not to add unnecessary relationships. If it turns out that the training data is a member of the cost function, and of the neural network, and so on, that is going to cause a lot of usage confusion.

It does not sound unreasonable for the tests to have a packaged facility for preparing input data and avoiding code duplication (e.g. a class such as the CRegressionExample you mention). What did you find wrong with it?

var x1(0), x2(0), x3(0);
w(0, 0) = x1;
w(1, 0) = x2;
w(2, 0) = x3;
Collaborator: The syntax to initialize this StanVector is a bit awkward. Can you do better? For example, with something std::vector-like it could just look like std::vector<var>{x1, x2, x3}.

FaroukY (Contributor, Author): Hmm, well, StanVector is just an alias for Eigen::Matrix<var, Dynamic, 1>, so I am not sure I can change the constructor unless I define a wrapper class for it.

Collaborator: Please have a look at initializing Eigen::Matrix in the Eigen documentation:

https://eigen.tuxfamily.org/dox/group__TutorialAdvancedInitialization.html
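With Eigen's comma initializer, the setup above collapses to (sketch, using the PR's StanVector alias):

StanVector w(3);
w << x1, x2, x3; // equivalent to the three element-wise assignments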

}

function<var(const StanVector&, float64_t)>
cost_for_ith_datapoints(SGMatrix<float64_t>& X, SGMatrix<float64_t>& y)
Collaborator: Refactor.

FaroukY (Contributor, Author): I will refactor the Stan tests once the neural network is working.

auto fun = new SquareErrorTestCostFunction(
X, y, w, cost_for_ith_point, total_cost);

auto opt = new SGDMinimizer(fun);
Collaborator: Use a smart pointer instead (either Shogun's Some with an SGObject, or std::unique_ptr otherwise). Then the delete at the end goes away.
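A sketch of the std::unique_ptr variant (SquareErrorTestCostFunction and SGDMinimizer are the types used in this test; whether Shogun's Some<> applies depends on their SGObject ancestry):

#include <memory>

auto fun = std::make_unique<SquareErrorTestCostFunction>(
	X, y, w, cost_for_ith_point, total_cost);
auto opt = std::make_unique<SGDMinimizer>(fun.get());
// ... exercise the minimizer ...
// no trailing delete: both objects are released at end of scope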


auto f_i = cost_for_ith_datapoints(X, y);

Matrix<function<var(const StanVector&, float64_t)>, Dynamic, 1>
Collaborator: This is a case where auto really makes sense, actually.

FaroukY (Contributor, Author): I wanted to cast here, since the output is not exactly Matrix<function<var(const StanVector&, float64_t)>, Dynamic, 1>.

Collaborator: What do you mean?

return total_cost;
};

auto fun = new SquareErrorTestCostFunction(
@iglesias (Collaborator) (Jun 19, 2018): No deletion for this one and the ConstLearningRate?

FaroukY (Contributor, Author): @iglesias They're handled in the minimizer's destructor.

layers

The class is composed of parameters, and is only responsible for forward
propagation of the layers.

The Network will be used to generate a StanMatrix of outputs, which will
then be used with Minimizers and StanCostFunctions to optimize the
neural network's parameters

[ci skip]
* [squared error measure](http://en.wikipedia.org/wiki/Mean_squared_error) is
* used
*/
class StanNeuralLogisticLayer : public StanNeuralLinearLayer
Collaborator: Perhaps it is only a naming issue, but saying that a LogisticLayer is a LinearLayer sounds odd.

FaroukY (Contributor, Author): @iglesias Just a naming issue. By "linear layer" I mean a layer with activation h(x) = x. The logistic layer then applies the sigmoid \sigma to its output to get \sigma(h(x)) = \sigma(x), i.e. a sigmoid layer. So it is just a specialization of the linear layer.

Collaborator: Any plans to fix?

/** default constructor */
StanNeuralLogisticLayer();

/** Constuctor
Collaborator: Minor typo: constructor.

CDynamicObjectArray* layers);


virtual const char* get_name() const { return "NeuralLogisticLayer"; }
Collaborator: Missing Stan prefix in the name?

*/
virtual void initialize_neural_network(float64_t sigma = 0.01f);

virtual ~CNeuralNetwork();
Collaborator: Did you mean the destructor of StanNeuralNetwork?

/** apply machine to data in means of regression problem */
virtual CRegressionLabels* apply_regression(CFeatures* data);
/** apply machine to data in means of multiclass classification problem */
virtual CMulticlassLabels* apply_multiclass(CFeatures* data);
Collaborator: I am not sure these ones should be here. Let's check with @karlnapf or @vigsterkr.

}


SGVector<float64_t>* StanNeuralNetwork::get_layer_parameters(int32_t i)
Collaborator: This looks strange. The signature in the header is StanVector& get_layer_parameters(int32_t i);.

…nd initialize_parameters to pass indices of start and end indices of the stan vector of parameters
…orairly and replace it with indices for stan vector
…ation, but there is still some syntax errors that are being fixed, once they are addressed, we can start the testing
…tax errors apart from compute_activations(input) which has been silenced for now, needs discussion before fixing; 3) Got rid of get_section() logic and replaced it with other logic that doesn't have to copy the vector; 4) Got rid of get_larger_activation logic since it wasn't needed ; 5) Fixed all signature errors of compute_activations(params, i, j, layers) and its specializations; 6) Got rid of some typos, still many typos that need addressing TODO
… done, the next part is to test it to check implementation details
@iglesias (Collaborator) left a comment:

This PR has already become unreasonably large. It is difficult to keep track (even Chromium is struggling to load the full set of changes). Could it be an idea to separate it into isolated yet meaningful parts?

Please try to address the issues that have been pointed out in previous reviews. At the very least, get back to the comments and explain why you are not addressing them.


@FaroukY (Contributor, Author) commented Jul 4, 2018:

@iglesias I previously discussed with Viktor that I'd first finish writing the whole neural network module (which is done now), then write the unit tests for it and make sure the logic is correct, and then go back and refactor all the points raised previously. I've already addressed some of them, but the main focus has been getting an MVP of the neural network running on Stan alone for gradient calculation. I'm not ignoring the points; I'm just more concerned right now with finishing the actual logic (then I can refactor, rename, and make the other required changes).

I'm currently writing the unit tests for the whole neural network code, and that will be the last file added to this PR. It should be a standalone PR moving to a tested NN that uses Stan only.

@iglesias (Collaborator) commented Jul 4, 2018:

Hi @FaroukY! That's all right, though I find that a ~2000-line patch has a bit of an overkill smell for a so-called MVP.

Still, I suggest you keep in mind the comments that have already been made. You don't want to build upon something that will need to be completely changed from its foundations. I am not necessarily saying that is the case here, but you should keep it in mind.

@iglesias closed this on Jul 11, 2018.