
Cost functions now support Stan Math; kept the previous classes for backward compatibility. #4294

Closed · wants to merge 39 commits

Conversation

@FaroukY (Contributor) commented May 22, 2018:

We can now write arbitrary cost functions in Stan Math and get the gradients with respect to all variables, regardless of how complex the cost function is. This PR adds the class FirstOrderSAGCostFunctionInterface, which acts as an interface for building stochastic average cost functions. I've also added a unit test to show how to use the class. All reference values were validated in TensorFlow.
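For context, the mechanism underneath is Stan Math's reverse-mode automatic differentiation: the cost is built out of stan::math::var values, a reverse pass propagates adjoints, and each variable's gradient is read off with adj(). A minimal standalone sketch of that pattern (the variable names are illustrative, not from this PR):

#include <stan/math.hpp>

int main()
{
	// Independent variable and an arbitrary cost expression built on it.
	stan::math::var w = 2.0;
	stan::math::var cost = w * w + 3.0 * w;

	// Reverse pass: propagates adjoints to every var in the expression tree.
	cost.grad();

	// d(cost)/dw = 2*w + 3, so 7 at w = 2.
	double dcost_dw = w.adj();

	// Release the global autodiff stack before building another expression.
	stan::math::recover_memory();
	return dcost_dw == 7.0 ? 0 : 1;
}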

@vigsterkr (Member) left a comment:

There are some serious anti-patterns in these commits that need to be addressed; in its current state this is not going to work.

@@ -0,0 +1,178 @@
/*
Member: You should use the shorter version of the license; see for example SGObject.h.

@@ -0,0 +1,173 @@
/*
Member: Should use the shorter version of the license.

#include <vector>


namespace shogun{
Member: Whitespace.

#define FIRSTORDERSAGCOSTFUNCTIONINTERFACE_H

#include <stan/math.hpp>
#include <Eigen/Dense>
Member: We never include Eigen directly in Shogun headers, only in linalg, and even then always via shogun/mathematics/eigen3.h.
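In other words, the wrapper header is the one include Shogun code is expected to use:

#include <shogun/mathematics/eigen3.h> // Shogun convention, instead of #include <Eigen/Dense>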

* where \f$(y_i,x_i)\f$ is the i-th sample,
* \f$y_i\f$ is the label and \f$x_i\f$ is the features
*/
class FirstOrderSAGCostFunctionInterface : public FirstOrderSAGCostFunction{
Member: I guess this class is never going to be part of an SGObject, hence we could make a pass over all the anti-Shogun-patterns used below.

Member: SG_ADD is not used anywhere in this class...
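For reference, registering members with the parameter framework looks roughly like the sketch below; it only works with Shogun container types such as SGVector/SGMatrix, which is why the raw Eigen members criticized further down cannot be registered as-is. The exact SG_ADD signature has varied across Shogun versions, so treat this as an assumption:

// In the constructor, after initializing the members (hypothetical):
SG_ADD(&m_X, "X", "Training data matrix.", MS_NOT_AVAILABLE);
SG_ADD(&m_y, "y", "Ground truth labels.", MS_NOT_AVAILABLE);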

@@ -0,0 +1,184 @@
/*
* Copyright (c) The Shogun Machine Learning Toolbox
Member: Use the shorter license.


protected:
/** X is the training data in column major matrix format */
Eigen::Matrix<float64_t, Eigen::Dynamic, Eigen::Dynamic>* m_X;
Member: You have not added these to the param framework, and as raw Eigen members they never can be; you need to use SGVector for that.

Eigen::Matrix<float64_t, Eigen::Dynamic, Eigen::Dynamic>* m_X;

/** y is the ground truth, or the correct prediction */
Eigen::Matrix<float64_t, 1, Eigen::Dynamic>* m_y;
Member: Same as above... SGVector.

Eigen::Matrix<float64_t, 1, Eigen::Dynamic>* m_y;

/** trainable_parameters are the variables that are optimized for */
Eigen::Matrix<stan::math::var, Eigen::Dynamic, 1>* m_trainable_parameters;
Member: This is currently broken; you won't be able to serialize this class reproducibly.

std::vector< std::function < stan::math::var(int32_t) > >* m_cost_for_ith_point;

/** total_cost is the total cost to be minimized, that in this case is a form of sum of cost_for_ith_point*/
std::function < stan::math::var(std::vector<stan::math::var>*) >* m_total_cost;
Member: Indentation, plus the serialization problems noted above.

@vigsterkr (Member): Please run clang-format on your code before pushing to a PR.

*
* @return sample gradient of target variables
*/
virtual SGVector<float64_t> get_gradient()=0;

/** Get the cost given current target variables
/** Get the cost given current target variables
Member: Could you adjust your editor so it doesn't cause all this noise?

Eigen::Matrix<float64_t, 1, Eigen::Dynamic>* y_new
)
{
REQUIRE(X_new!=NULL, "X must be non NULL");
Member: Is this a user-facing error? If not, please say "No X provided.\n" (also note the missing newline).
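Following that convention, the check would read something like:

REQUIRE(X_new, "No X provided.\n");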

std::vector< std::function < stan::math::var(int32_t) > >* new_cost_f
)
{
REQUIRE(new_cost_f, "The cost function must be a non NULL vector of stan variables");
Member: See above, and likewise for all the other error messages.

SGVector<float64_t> FirstOrderSAGCostFunctionInterface::get_gradient()
{
int32_t num_of_variables = m_trainable_parameters->rows();
REQUIRE(num_of_variables > 0, "Number of training parameters must be greater than 0");
Member: Please always print the provided number:
"Number of training parameters (%d) must be positive.\n"
https://github.com/shogun-toolbox/shogun/wiki/Assertions

int32_t n = get_sample_size();
REQUIRE(n>0 , "Number of sample must be greater than 0");

for(auto i=0; i<n; ++i)
Member: Minor: use for (auto i : range(n)) for all range-based loops.

auto grad = get_gradient();
for(auto j=0; j<params_num; ++j)
{
average_gradients[j]+= (grad[j]/n);
Member: Can you use linalg here, please, rather than these vanilla loops?
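A possible linalg-based form of this accumulation, assuming the linalg::add(a, b, alpha, beta) overload that returns alpha*a + beta*b (a sketch, not the PR's code):

#include <shogun/mathematics/linalg/LinalgNamespace.h>

// average_gradients += grad / n, without the element-wise loop
average_gradients = linalg::add(average_gradients, get_gradient(), 1.0, 1.0 / n);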


/** Get the SAMPLE gradient value wrt target variables
*
* WARNING
Member: I don't understand this warning. Could you make it clearer?

}


TEST(LeastSquareTestCostFunction, ONALINE)
Member: What's ONALINE? Please try to use slightly more descriptive test names :)

FaroukY (Contributor, Author): Sorry, I meant that the points are on a line (ONALINE).

Member: points_on_a_line

@iglesias (Collaborator) left a comment:

Just some small comments after a quick look for the time being. Before delving deeper, let's start by addressing the relevant points raised in the other reviews.

#include <Eigen/Dense>

using namespace shogun;
using namespace Eigen;
Collaborator: Let's just do this for the shogun namespace. That way it also becomes consistent here between Eigen and Stan. If you really want to save typing a few characters for some particular symbol, just bring it into scope individually, for example using Eigen::Matrix;.

{
int32_t params_num = m_X->cols();
SGVector<float64_t> ret(params_num);
for(auto i=0; i<params_num; ++i)
Collaborator: Nitpicking a little on style: I would not use auto here. For one-liner scopes, the Shogun style used to be to omit the curly braces.

{
int n = 3;
auto X = Matrix<float64_t, Dynamic, Dynamic>();
X.resize(2,3);
Collaborator: What about something like Eigen::MatrixXd X(2, 3) instead?

stan::math::var res = wx_y * wx_y;
return res;
};
std::vector< std::function < stan::math::var(int32_t) > > cost_for_ith_point(n, f_i);
Collaborator: Please remove the spaces inside the template parameters; nowadays it is fine to have << or >> in templates :-)

X(0, 2) = 1;
X(1, 0) = 0;
X(1, 1) = 1;
X(1, 2) = 2;
Member: Is there a chance of reusing this test data or the general setup? If so, it would be good to put it into a function or a fixture; it also makes the test smaller.

Collaborator: ++

@FaroukY (Contributor, Author) commented May 24, 2018:

Alright, addressed all the changes.

using stan::math::var;
using std::function;

/** This is a temporary fix. The variables are now variables
Collaborator: What does this mean?

Member: We need to figure out how we can store a std::vector of a non-primitive type (like stan::math::var); hence the "temporary" story. Just add a FIXME/TODO there and then it's good.

FaroukY (Contributor, Author): @iglesias As Viktor said. The original cost function needs a reference to an SGVector<float64_t> of the parameters. The parameters have now been changed in the cost function to SGVector<stan::math::var>, so we're going to have to change the interface of the minimizers before returning an SGVector<stan::math::var>.

*/

#include <shogun/lib/config.h>
#ifndef FIRSTORDERSAGCOSTFUNCTIONINTERFACE_UNITTEST_H
Collaborator: Is there some coding guideline preventing the use of more underscores?

FaroukY (Contributor, Author): Not sure, to be honest. I looked at the other files and they all had _UNITTEST_H, so I followed the convention. If there is no guideline, I'll add more underscores between words.

using Eigen::Dynamic;
using Eigen::Matrix;
using stan::math::var;
using std::function;
Collaborator: It does not feel good to have type aliases in headers. They also become effective in the places where the header is included, no?

Member: Seconded; this actually contaminates any file that includes this header.

FaroukY (Contributor, Author): I thought that if you use using std::function; in a header, then you'd need to access it from another file that has the include as FirstOrderSAGCostFunctionInterface::function, no? In other words, shouldn't it be under the scope of the .h file?

FaroukY (Contributor, Author): Actually you're right, I just forgot my C++! Including it in the header file does bring it into other files as well. I'll get rid of it.

{
/** @brief The first order stochastic cost function base class for
* implementing
* The SAG Cost function
Collaborator: Style: the newline (after "implementing") and the capitalization ("implementing \n The") are odd.

FaroukY (Contributor, Author): Sorry, I ran clang-format on it and didn't check the styling. I will fix the styling of the comments.


/** Default constructor, use setter helpers to set X, y,
*trainable_parameters, and
* cost_function
Collaborator: This is not informative documentation :-) I'd remove very general comments like this one; they add noise.


/** total_cost is the total cost to be minimized, that in this case is a
* form of sum of cost_for_ith_point*/
function<var(Matrix<var, Dynamic, 1>*)>* m_total_cost;
Collaborator: Potentially silly question alert! Why have an attribute of this type instead of a method?

FaroukY (Contributor, Author): @iglesias Not silly at all! I actually thought about this, but I wanted to separate the logic of calculating the cost function from the class, so that the base class can support any cost function. We could also make it a pure virtual method and let the user implement it. Both work.


protected:
/** X is the training data in column major matrix format */
SGMatrix<float64_t>* m_X;
Collaborator: Why are all these members pointers?

FaroukY (Contributor, Author): @iglesias Just a convention (I try to pass things by reference/pointer instead of by value to avoid copying). But I agree; I will get rid of the SGMatrix pointers, since SGMatrix already implements that under the hood.

@@ -31,8 +31,8 @@

Collaborator: As you are already touching this file, it would be nice to make the tiny extra effort and update the copyright to the more modern, shorter version.

* where \f$(y_i,x_i)\f$ is the i-th sample,
* \f$y_i\f$ is the label and \f$x_i\f$ is the features
*/
class FirstOrderSAGCostFunctionInterface : public FirstOrderSAGCostFunction
Collaborator: It seems a bit odd that the interface inherits from the class.

FaroukY (Contributor, Author): Hmm, how about FirstOrderSAGArbitraryCostFunction?

@iglesias (Collaborator): I have the feeling that @vigsterkr's first general comment has not been addressed. @FaroukY, did you discuss it with anybody?

@iglesias (Collaborator) commented May 25, 2018 via email.

using Eigen::Matrix;

FirstOrderSAGCostFunctionInterface::FirstOrderSAGCostFunctionInterface(
SGMatrix<float64_t>* X, SGMatrix<float64_t>* y,
Member: These could both be SGMatrix passed by value...

}

void FirstOrderSAGCostFunctionInterface::set_training_data(
SGMatrix<float64_t>* X_new, SGMatrix<float64_t>* y_new)
Member: There was an email from @karlnapf on the mailing list not long ago about SGVector and SGMatrix and how they are passed around.

Member: Just pass them by value.


bool FirstOrderSAGCostFunctionInterface::next_sample()
{
auto num_of_samples = get_sample_size();
Member: There's just no need for this variable on the stack; it's an extra line of code. Just do:

if (m_index_of_sample >= get_sample_size())
	return false;


SGVector<float64_t> FirstOrderSAGCostFunctionInterface::get_gradient()
{
int32_t num_of_variables = m_trainable_parameters->rows();
Member: auto


float64_t FirstOrderSAGCostFunctionInterface::get_cost()
{
int32_t n = get_sample_size();
Member: auto

int32_t params_num = m_trainable_parameters->rows();
SGVector<float64_t> average_gradients(params_num);

int32_t old_index_sample = m_index_of_sample;
Member: auto

SGVector<float64_t> average_gradients(params_num);

int32_t old_index_sample = m_index_of_sample;
int32_t n = get_sample_size();
Member: auto

for (index_t i = 0; i < n; ++i)
{
m_index_of_sample = i;
auto grad = get_gradient();
Member: Do we need this variable?

Member: Just do average_gradients += get_gradient().

using Eigen::Dynamic;
using Eigen::Matrix;
using stan::math::var;
using std::function;
Member: Seconded; this actually contaminates any file that includes this header.


FirstOrderSAGCostFunctionInterface::FirstOrderSAGCostFunctionInterface(
SGMatrix<float64_t>* X, SGMatrix<float64_t>* y,
Matrix<var, Dynamic, 1>* trainable_parameters,
Member: What's the story behind these being pointers instead of references?

FaroukY (Contributor, Author): I am not sure what the convention in Shogun is, but I have always used pointers in my previous C++ code (just to avoid the reference syntax). If you'd like, I think we can change them to references here.

int32_t num_of_variables = m_trainable_parameters->rows();
REQUIRE(
num_of_variables > 0,
"Number of training parameters must be greater than 0");
Member: Missing newline. Also, please always print the provided number:
"Number of training parameters (%d) must be positive.\n"

FaroukY (Contributor, Author): Sure, but in this case I'll just print "Number of training parameters must be greater than 0, you provided 0", since you can't have num_of_variables < 0.

f_i.grad();

for (auto i = 0; i < num_of_variables; ++i)
gradients[i] = (*m_trainable_parameters)(i, 0).adj();
Member: Can't this be done without a loop, using std::transform or something?

FaroukY (Contributor, Author): I didn't know this, but Eigen has unaryExpr, which allows you to apply any function to all elements of a matrix. I've removed the vanilla loop and replaced it with a call to that function.
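The pattern described here would look roughly like this (a sketch; the member name follows this PR):

// After f_i.grad(), every trainable var carries its adjoint. unaryExpr
// applies the lambda elementwise, yielding a double-valued expression.
Eigen::Matrix<float64_t, Eigen::Dynamic, 1> gradients =
	m_trainable_parameters->unaryExpr(
		[](const stan::math::var& v) { return v.adj(); });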

{
int32_t n = get_sample_size();
Matrix<var, Dynamic, 1> cost_argument(n);
for (auto i = 0; i < n; ++i)
Member: As above, it would be better to do this without a loop (if easy). If not, at least use for (auto i : range(n)).

FaroukY (Contributor, Author) (May 26, 2018): This one is a bit harder to do without a loop. Since we're calling a function inside and passing an argument, Eigen::unaryExpr wouldn't work. I changed it to range(n).

* cost_function */
FirstOrderSAGCostFunctionInterface(
SGMatrix<float64_t>* X, SGMatrix<float64_t>* y,
Matrix<var, Dynamic, 1>* trainable_parameters,
Member: Could we maybe alias things like Matrix<var, Dynamic, 1> into something more readable?
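Later commits do exactly this; as discussed further down, StanVector is such an alias:

using StanVector = Eigen::Matrix<stan::math::var, Eigen::Dynamic, 1>;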

* */
virtual bool next_sample();

/** Get the SAMPLE gradient value wrt target variables
Member: Why is this SAMPLE rather than sample?

FaroukY (Contributor, Author): I just wanted to emphasize it. Changed it to lower case.

* where \f$(y_i,x_i)\f$ is the i-th sample,
* \f$y_i\f$ is the label and \f$x_i\f$ is the features
*/
class StanFirstOrderSAGCostFunction : public FirstOrderSAGCostFunction
Collaborator: Not sure this relationship between StanFirstOrderSAGCostFunction and FirstOrderSAGCostFunction makes sense.

FaroukY (Contributor, Author) (Jun 16, 2018): Agreed. I've changed the parent class to FirstOrderStochasticCostFunction in the next set of commits, since all these implemented functions come directly from it. (FirstOrderSAGCostFunction also inherits from FirstOrderStochasticCostFunction, so it makes sense to inherit from it here, as the Stan version is an alternative to FirstOrderSAGCostFunction.)

SGMatrix<float64_t> m_y;

/** trainable_parameters are the variables that are optimized for */
StanVector& m_trainable_parameters;
Collaborator: I think that having a reference member makes default-constructibility odd. This might be an issue for generic features such as the parameter framework, clone, etc.

FaroukY (Contributor, Author): I was just thinking of it as a design problem, as in aggregation vs. composition. The cost function doesn't own the parameters, so I felt aggregation works better here (implemented in C++ as a reference). What do you think?

Collaborator: I think a reference member should not be used, but maybe it is OK.

Collaborator: Any news about this concern?

SGMatrix<float64_t> m_X;

/** y is the ground truth, or the correct prediction */
SGMatrix<float64_t> m_y;
Collaborator: FirstOrderSAGCostFunction does not have members for the training data. Why are they needed here?

FaroukY (Contributor, Author): @iglesias In the unit test of FirstOrderSAGCostFunction, we implement an additional class called CRegressionExample that wraps FirstOrderSAGCostFunction and contains the training data. I've simply removed the need for that class, since it just acted as a wrapper around the cost function and data, and have included the data (or at least a reference to it) in the loss function.

@iglesias (Collaborator) (Jul 2, 2018): We must be careful not to add unnecessary relationships. If it turns out that the training data is a member of the cost function, and of the neural network, and so on, that is going to cause a lot of usage confusion.

It does not sound unreasonable for the tests to have a packaged facility for preparing input data and avoiding code duplication (e.g. a class such as the CRegressionExample you mention). What did you find wrong with it?

var x1(0), x2(0), x3(0);
w(0, 0) = x1;
w(1, 0) = x2;
w(2, 0) = x3;
Collaborator: The syntax to initialize this StanVector is a bit awkward. Can you do better? For example, with something std::vector-like it could just look like std::vector<var>{x1, x2, x3}.

FaroukY (Contributor, Author): Hmm, well, StanVector is just an alias for Eigen::Matrix<var, Dynamic, 1>, so I am not sure I can change the constructor unless I define a wrapper class for it.

Collaborator: Please have a look at initializing Eigen::Matrix in the Eigen documentation:

https://eigen.tuxfamily.org/dox/group__TutorialAdvancedInitialization.html
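With Eigen's comma initializer, the setup above collapses to (sketch, using the PR's StanVector alias):

StanVector w(3);
w << x1, x2, x3; // equivalent to the three element-wise assignments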

}

function<var(const StanVector&, float64_t)>
cost_for_ith_datapoints(SGMatrix<float64_t>& X, SGMatrix<float64_t>& y)
Collaborator: Refactor.

FaroukY (Contributor, Author): I will refactor the Stan tests once the neural network is working.

auto fun = new SquareErrorTestCostFunction(
X, y, w, cost_for_ith_point, total_cost);

auto opt = new SGDMinimizer(fun);
Collaborator: Use a smart pointer instead (either Shogun's Some with an SGObject, or std::unique_ptr otherwise). Then the delete at the end goes away.
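A sketch of the std::unique_ptr variant (SquareErrorTestCostFunction and SGDMinimizer are the types used in this test; whether Shogun's Some<> applies depends on their SGObject ancestry):

#include <memory>

auto fun = std::make_unique<SquareErrorTestCostFunction>(
	X, y, w, cost_for_ith_point, total_cost);
auto opt = std::make_unique<SGDMinimizer>(fun.get());
// ... exercise the minimizer ...
// no trailing delete: both objects are released at end of scope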


auto f_i = cost_for_ith_datapoints(X, y);

Matrix<function<var(const StanVector&, float64_t)>, Dynamic, 1>
Collaborator: This is a case where auto really makes sense, actually.

FaroukY (Contributor, Author): I wanted to cast here, since the output is not exactly Matrix<function<var(const StanVector&, float64_t)>, Dynamic, 1>.

Collaborator: What do you mean?

return total_cost;
};

auto fun = new SquareErrorTestCostFunction(
@iglesias (Collaborator) (Jun 19, 2018): No deletion for this one and the ConstLearningRate?

FaroukY (Contributor, Author): @iglesias They're handled in the minimizer's destructor.

layers

The class is composed of parameters, and is only responsible for forward
propagation of the layers.

The Network will be used to generate a StanMatrix of outputs, which will
then be used with Minimizers and StanCostFunctions to optimize the
neural network's parameters

[ci skip]
* [squared error measure](http://en.wikipedia.org/wiki/Mean_squared_error) is
* used
*/
class StanNeuralLogisticLayer : public StanNeuralLinearLayer
Collaborator: Perhaps it is only a naming issue, but saying that a LogisticLayer is a LinearLayer sounds odd.

FaroukY (Contributor, Author): @iglesias Just a naming issue. By "linear layer" I mean a layer with activation h(x) = x. The logistic layer then applies the sigmoid \sigma to its output to get \sigma(h(x)) = \sigma(x), i.e. a sigmoid layer. So it is just a specialization of the linear layer.

Collaborator: Any plans to fix?

/** default constructor */
StanNeuralLogisticLayer();

/** Constuctor
Collaborator: Minor typo: constructor.

CDynamicObjectArray* layers);


virtual const char* get_name() const { return "NeuralLogisticLayer"; }
Collaborator: Missing Stan prefix in the name?

*/
virtual void initialize_neural_network(float64_t sigma = 0.01f);

virtual ~CNeuralNetwork();
Collaborator: Did you mean the destructor of StanNeuralNetwork?

/** apply machine to data in means of regression problem */
virtual CRegressionLabels* apply_regression(CFeatures* data);
/** apply machine to data in means of multiclass classification problem */
virtual CMulticlassLabels* apply_multiclass(CFeatures* data);
Collaborator: I am not sure these ones should be here. Let's check with @karlnapf or @vigsterkr.

}


SGVector<float64_t>* StanNeuralNetwork::get_layer_parameters(int32_t i)
Collaborator: This looks strange. The signature in the header is StanVector& get_layer_parameters(int32_t i);.

…nd initialize_parameters to pass indices of start and end indices of the stan vector of parameters
…orairly and replace it with indices for stan vector
…ation, but there is still some syntax errors that are being fixed, once they are addressed, we can start the testing
…tax errors apart from compute_activations(input) which has been silenced for now, needs discussion before fixing; 3) Got rid of get_section() logic and replaced it with other logic that doesn't have to copy the vector; 4) Got rid of get_larger_activation logic since it wasn't needed ; 5) Fixed all signature errors of compute_activations(params, i, j, layers) and its specializations; 6) Got rid of some typos, still many typos that need addressing TODO
… done, the next part is to test it to check implementation details
@iglesias (Collaborator) left a comment:

This PR has already become unreasonably large. It is difficult to keep track (even Chromium is struggling to load the full set of changes). Could it be an idea to separate it into isolated yet meaningful parts?

Please try to address the issues that have been pointed out in previous reviews. At the very least, get back to the comments and explain why you are not addressing them.


@FaroukY (Contributor, Author) commented Jul 4, 2018:

@iglesias I previously discussed with Viktor that I'd first finish writing the whole neural network module (which is done now), then write the unit tests for it and make sure the logic is correct, and then go back and refactor all the points raised previously. I've already addressed some of them, but the main focus has been getting an MVP of the neural network running on Stan alone for gradient calculation. I'm not ignoring the points; I'm just more concerned right now with finishing the actual logic (then I can refactor, rename, and make the other required changes).

I'm currently writing the unit tests for the whole neural network code, and that will be the last file added to this PR. It should be a standalone PR moving to a tested NN that uses Stan only.

@iglesias (Collaborator) commented Jul 4, 2018:

Hi @FaroukY! That's all right, though I find that a ~2000-line patch has a bit of an overkill smell for a so-called MVP.

Still, I suggest you keep in mind the comments that have already been made. You don't want to build upon something that will need to be completely changed from its foundations. I am not necessarily saying that is the case here, but you should keep it in mind.

@iglesias closed this on Jul 11, 2018.