Cost functions now support Stan Math; kept the previous classes for backward compatibility. #4294

Closed · wants to merge 39 commits (changes shown from 20 commits)

Commits
220b11d
Changed ordering of includes to avoid boost issues
FaroukY May 22, 2018
ff2dd1a
Built a class on stan that can define any arbitrary cost function wit…
FaroukY May 22, 2018
3cbecec
Wrote two unit tests to the new class to demonstrate how to write a n…
FaroukY May 22, 2018
110337f
Changed the order of including the headers
FaroukY May 22, 2018
8829625
Clang format
FaroukY May 23, 2018
7737c07
Adjusted editor to get rid of all the noise
FaroukY May 24, 2018
74dddef
Required changes in the review addressed
FaroukY May 24, 2018
1e8002c
Added the required changes from iteration 2 on #4294, several changes…
FaroukY May 26, 2018
afc5327
Changed unit tests so they work with the new style including a few typ…
FaroukY May 26, 2018
bc16178
Changed file names from FirstOrderSAGCostFunctionInterface to StanFir…
FaroukY Jun 2, 2018
9dd2d19
Changed the way I check for empty vector
FaroukY Jun 2, 2018
34d8aaa
Got rid of setters for changing behavior of cost function and paramet…
FaroukY Jun 2, 2018
8d89e4a
Added safeguards in constructor of StanFirstOrderSAGCostFunction to m…
FaroukY Jun 2, 2018
1bc7318
Shortened the names of very long variables using template typedefs
FaroukY Jun 2, 2018
a926972
Clang formatting
FaroukY Jun 2, 2018
1c60921
Remove memory from headers as it's not required
FaroukY Jun 2, 2018
4e8947b
Changed the interface of the Cost function so that it works fine with …
FaroukY Jun 8, 2018
c29fb06
Wrote new unit tests to test SGD Minimizer with the cost function tha…
FaroukY Jun 8, 2018
21a9b7f
updated old unit tests to work with new updated interface of cost fun…
FaroukY Jun 8, 2018
b719131
clang-formatting on all edited files [ci skip]
FaroukY Jun 8, 2018
cdadb8d
Addressed some changes from the reviews
FaroukY Jun 15, 2018
87a663e
Changed the unit test to suit the changed interface of the cost functi…
FaroukY Jun 15, 2018
cf2e3f1
changed parent class of StanFirstOrderSAGCostFunction to FirstOrderSt…
FaroukY Jun 16, 2018
d79f878
Created a new class StanNeuralLayer which will be the base class for …
FaroukY Jun 17, 2018
6ef7f67
Defined StanMatrix just like StanVector
FaroukY Jun 17, 2018
11f8c1b
updated the API of StanNeuralLayer and got rid of gradient computatio…
FaroukY Jun 17, 2018
272144d
Wrote the class StanNeuralLinearLayer which is a linear layer in the …
FaroukY Jun 17, 2018
91cfef9
fix bug in StanNeuralLinearLayer class where m_stan_activations wasn'…
FaroukY Jun 17, 2018
cd56095
Created a Logistic layer for the stan Neural network, the class is ca…
FaroukY Jun 17, 2018
6592f3f
removed regularization temporarily from neural layer until we have a…
FaroukY Jun 20, 2018
5ba8e0e
Started creating the neural network class that uses the stan neural
FaroukY Jun 20, 2018
955a289
removed apply_multiclass since it's not needed in this case
FaroukY Jun 25, 2018
11f6dbd
Fixed a typo in Stan and changed interface of compute_activations a…
FaroukY Jun 25, 2018
88b2c3c
adapted the logistic linear layer to the new API and fixed a few typos
FaroukY Jun 25, 2018
42c626f
Changed initialize parameters interface to remove regularization temp…
FaroukY Jun 25, 2018
8ae7fac
[ci skip] refactored some code in neural net, finished the implement…
FaroukY Jun 25, 2018
3398d7d
[ci skip] 1) Added set_batch_size and implemented it 2) Fixed all syn…
FaroukY Jun 26, 2018
a1a5b41
Added the input layer headers and implementation using stan
FaroukY Jul 2, 2018
aea5dfa
[ci skip] various updates to interfaces, neural network module is now…
FaroukY Jul 2, 2018
39 changes: 7 additions & 32 deletions src/shogun/optimization/FirstOrderSAGCostFunction.h
@@ -1,38 +1,13 @@
/*
* Copyright (c) The Shogun Machine Learning Toolbox
* Written (w) 2015 Wu Lin
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright notice, this
* list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright notice,
* this list of conditions and the following disclaimer in the documentation
* and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
* ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* The views and conclusions contained in the software and documentation are those
* of the authors and should not be interpreted as representing official policies,
* either expressed or implied, of the Shogun Development Team.
*
*/
/*
* This software is distributed under BSD 3-clause license (see LICENSE file).
*
* Authors: Wu Lin
*/

Collaborator: As you are already touching this file, it would be nice to make a tiny extra effort and update the copyright to the more modern, shorter version.

#ifndef FIRSTORDERSAGCOSTFUNCTION_H
#define FIRSTORDERSAGCOSTFUNCTION_H
#include <shogun/lib/config.h>
#include <shogun/optimization/FirstOrderStochasticCostFunction.h>
namespace shogun
{
/** @brief The class is about a stochastic cost function for stochastic average minimizers.
@@ -106,7 +81,7 @@ class FirstOrderSAGCostFunction
*
* For least squares cost function, that is the value of
* \f$\frac{\partial f_i(w) }{\partial w}\f$ given \f$w\f$ is known
* where the index \f$i\f$ is obtained by next_sample()
*
* @return sample gradient of target variables
*/
3 changes: 2 additions & 1 deletion src/shogun/optimization/FirstOrderStochasticMinimizer.h
@@ -30,8 +30,9 @@
*/
#ifndef FIRSTORDERSTOCHASTICMINIMIZER_H
#define FIRSTORDERSTOCHASTICMINIMIZER_H
#include <shogun/optimization/FirstOrderStochasticCostFunction.h>
#include <shogun/optimization/FirstOrderMinimizer.h>
#include <shogun/optimization/DescendUpdater.h>
#include <shogun/optimization/LearningRate.h>
namespace shogun
140 changes: 140 additions & 0 deletions src/shogun/optimization/StanFirstOrderSAGCostFunction.cpp
@@ -0,0 +1,140 @@
/*
* This software is distributed under BSD 3-clause license (see LICENSE file).
*
* Authors: Elfarouk
*/

#include <shogun/optimization/StanFirstOrderSAGCostFunction.h>
#include <shogun/base/range.h>
#include <shogun/mathematics/Math.h>
using namespace shogun;
using stan::math::var;
using std::function;
using Eigen::Matrix;
using Eigen::Dynamic;

StanFirstOrderSAGCostFunction::StanFirstOrderSAGCostFunction(
SGMatrix<float64_t> X, SGMatrix<float64_t> y,
StanVector* trainable_parameters,
Member: still don't get why we need pointers.

StanFunctionsVector<float64_t>* cost_for_ith_point,
FunctionReturnsStan<StanVector*>* total_cost)
{
REQUIRE(X.size() > 0, "Empty X provided");
REQUIRE(y.size() > 0, "Empty y provided");
auto num_of_variables = trainable_parameters->rows();
REQUIRE(
    num_of_variables > 0,
    "Provided %d parameters; at least one trainable parameter is required",
    num_of_variables);
REQUIRE(cost_for_ith_point != NULL, "Cost for ith point is not provided");
REQUIRE(total_cost != NULL, "Total cost function is not provided");
m_X = X;
m_y = y;
m_trainable_parameters = trainable_parameters;
m_cost_for_ith_point = cost_for_ith_point;
m_total_cost = total_cost;
m_ref_trainable_parameters = SGVector<float64_t>(num_of_variables);
for (auto i : range(num_of_variables))
{
m_ref_trainable_parameters[i] = (*m_trainable_parameters)(i, 0).val();
}
}

void StanFirstOrderSAGCostFunction::set_training_data(
SGMatrix<float64_t> X_new, SGMatrix<float64_t> y_new)
{
REQUIRE(X_new.size() > 0, "Empty X provided");
REQUIRE(y_new.size() > 0, "Empty y provided");
this->m_X = X_new;
this->m_y = y_new;
}

StanFirstOrderSAGCostFunction::~StanFirstOrderSAGCostFunction()
{
}

void StanFirstOrderSAGCostFunction::begin_sample()
{
m_index_of_sample = -1;
}

bool StanFirstOrderSAGCostFunction::next_sample()
{
++m_index_of_sample;
return m_index_of_sample < get_sample_size();
}

void StanFirstOrderSAGCostFunction::update_stan_vectors_to_reference_values()
{
auto num_of_variables = m_trainable_parameters->rows();
for (auto i : range(num_of_variables))
{
(*m_trainable_parameters)(i, 0) = m_ref_trainable_parameters[i];
}
}
SGVector<float64_t> StanFirstOrderSAGCostFunction::get_gradient()
{
auto num_of_variables = m_trainable_parameters->rows();
REQUIRE(
    num_of_variables > 0,
    "Number of parameters must be greater than 0; no trainable parameters provided");

update_stan_vectors_to_reference_values();
var f_i = (*m_cost_for_ith_point)(m_index_of_sample, 0)(
m_trainable_parameters, m_index_of_sample);

stan::math::set_zero_all_adjoints();
f_i.grad();

SGVector<float64_t>::EigenVectorXt gradients =
m_trainable_parameters->unaryExpr(
[](stan::math::var x) -> float64_t { return x.adj(); });
// clone needed because gradients is local variable
return SGVector<float64_t>(gradients).clone();
Member: you could just simply wrap the EigenVectorXt with SGVector and not do the cloning, namely

    SGVector<float64_t>(gradients.data(), gradients.size())

but I reckon it'd be better to simply create an

    SGVector<float64_t> gradients(size);
    SGVector<float64_t>::EigenVectorXt mapped_gradients = gradients;
    // now you can use mapped_gradients as an eigen vector

    return gradients;

Contributor Author: @vigsterkr But the data was being destroyed since the variable was local. I tried the second one, but when I returned gradients, it just contained garbage values?

Member: if you return SGVector<float64_t>, that should trigger a copy-ctor that should ++ the ref counter and hence shouldn't delete the data itself.

Member: there should be many examples where this is used in shogun, e.g. https://github.com/shogun-toolbox/shogun/blob/develop/src/shogun/machine/gp/GaussianLikelihood.cpp#L238
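Putting the reviewers' suggestion together, a minimal sketch of what get_gradient could return without clone(), assuming SGVector's reference-counted copy semantics and its public vector/vlen members; num_of_variables and m_trainable_parameters follow the surrounding function:

    // allocate the result first, then write the adjoints through an Eigen map
    SGVector<float64_t> gradients(num_of_variables);
    Eigen::Map<SGVector<float64_t>::EigenVectorXt> mapped_gradients(
        gradients.vector, gradients.vlen);
    for (index_t i = 0; i < num_of_variables; ++i)
        mapped_gradients[i] = (*m_trainable_parameters)(i, 0).adj();
    // returning by value copy-constructs the SGVector header and bumps its
    // ref count, so the underlying buffer survives this scope
    return gradients;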

}

float64_t StanFirstOrderSAGCostFunction::get_cost()
{
auto n = get_sample_size();
StanVector cost_argument(n);

update_stan_vectors_to_reference_values();
for (auto i : range(n))
{
cost_argument(i, 0) =
(*m_cost_for_ith_point)(i, 0)(m_trainable_parameters, i);
}
var cost = (*m_total_cost)(&cost_argument);
return cost.val();
}

index_t StanFirstOrderSAGCostFunction::get_sample_size()
{
return m_X.num_cols;
}

SGVector<float64_t> StanFirstOrderSAGCostFunction::get_average_gradient()
{
int32_t params_num = m_trainable_parameters->rows();
Member: use auto.
SGVector<float64_t> average_gradients(params_num);

auto old_index_sample = m_index_of_sample;
auto n = get_sample_size();
REQUIRE(
    n > 0,
    "Number of samples must be greater than 0, you provided no samples");

for (index_t i = 0; i < n; ++i)
{
m_index_of_sample = i;
Member: mmmm this feels very, very strange... this definitely makes the whole cost function not thread-safe. I'm not saying that it should be, but I'm not convinced atm that this is actually required, or the only way to solve what you want.

Contributor Author: This is one of the things I'm discussing with you today.

Contributor Author: This should be addressed in the next set of commits.
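One hedged way around the shared-state mutation flagged above (our sketch, not the PR's eventual fix): pass the sample index explicitly through a hypothetical helper, so get_average_gradient never mutates m_index_of_sample. Note that stan's reverse-mode autodiff keeps a global tape, so this alone would not make the class fully thread-safe:

    // hypothetical helper (not in the PR): gradient for an explicit sample
    // index, instead of reading the mutable member m_index_of_sample
    SGVector<float64_t> gradient_for_sample(index_t sample_index);

    SGVector<float64_t> StanFirstOrderSAGCostFunction::get_average_gradient()
    {
        auto n = get_sample_size();
        REQUIRE(n > 0, "Number of samples must be greater than 0");
        SGVector<float64_t> average_gradients(m_trainable_parameters->rows());
        average_gradients.zero();
        for (index_t i = 0; i < n; ++i)
            average_gradients += gradient_for_sample(i); // no member mutation
        average_gradients.scale(1.0 / n);
        return average_gradients;
    }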

average_gradients += get_gradient();
}
average_gradients.scale(1.0 / n);
Member: plz use linalg::scale()
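A hedged sketch of the requested change, assuming shogun's linalg::scale overload that returns a scaled copy (declared in <shogun/mathematics/linalg/LinalgNamespace.h>):

    // replaces average_gradients.scale(1.0 / n);
    average_gradients = linalg::scale(average_gradients, 1.0 / n);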

m_index_of_sample = old_index_sample;
return average_gradients;
}

SGVector<float64_t> StanFirstOrderSAGCostFunction::obtain_variable_reference()
{
return m_ref_trainable_parameters;
}
155 changes: 155 additions & 0 deletions src/shogun/optimization/StanFirstOrderSAGCostFunction.h
@@ -0,0 +1,155 @@
/*
* This software is distributed under BSD 3-clause license (see LICENSE file).
*
* Authors: Elfarouk
*/

#ifndef StanFirstOrderSAGCostFunction_H
#define StanFirstOrderSAGCostFunction_H

#include <stan/math.hpp>
#include <functional>
#include <shogun/lib/SGMatrix.h>
#include <shogun/lib/SGVector.h>
#include <shogun/lib/config.h>
#include <shogun/mathematics/eigen3.h>
#include <shogun/optimization/FirstOrderSAGCostFunction.h>
using StanVector = Eigen::Matrix<stan::math::var, Eigen::Dynamic, 1>;
template <class T>
using FunctionReturnsStan = std::function<stan::math::var(T)>;
template <class T>
using FunctionStanVectorArg = std::function<stan::math::var(StanVector*, T)>;
template <class S>
using StanFunctionsVector =
Eigen::Matrix<FunctionStanVectorArg<S>, Eigen::Dynamic, 1>;
namespace shogun
{
/** @brief The first order stochastic cost function base class for
* implementing the SAG Cost function
*
* The class gives the implementation used in first order stochastic
* minimizers
*
* The cost function must be written as a finite, sample-specific sum of
* costs.
* For example, the least squares cost function,
* \f[
* f(w)=\frac{ \sum_i{ (y_i-w^T x_i)^2 } }{2}
* \f]
* where \f$(y_i,x_i)\f$ is the i-th sample,
* \f$y_i\f$ is the label and \f$x_i\f$ is the features
*/
class StanFirstOrderSAGCostFunction : public FirstOrderSAGCostFunction
Collaborator: Not sure if this relationship between StanFirstOrderSAGCostFunction and FirstOrderSAGCostFunction makes sense.

Contributor Author (FaroukY, Jun 16, 2018): Agreed. I've changed the parent class to FirstOrderStochasticCostFunction in the next set of commits, since all these implemented functions come directly from it. (FirstOrderSAGCostFunction also inherits from FirstOrderStochasticCostFunction, so it makes sense to inherit from it here, as the Stan version is an alternative to FirstOrderSAGCostFunction.)

{
public:
StanFirstOrderSAGCostFunction(
SGMatrix<float64_t> X, SGMatrix<float64_t> y,
StanVector* trainable_parameters,
StanFunctionsVector<float64_t>* cost_for_ith_point,
FunctionReturnsStan<StanVector*>* total_cost);

StanFirstOrderSAGCostFunction(){};

/** Setter for the training data X */
virtual void
set_training_data(SGMatrix<float64_t> X_new, SGMatrix<float64_t> y_new);

virtual ~StanFirstOrderSAGCostFunction();

/** Initialize to generate a sample sequence
*
*/
virtual void begin_sample();

/** Get next sample
*
* @return false if reach the end of the sample sequence
* */
virtual bool next_sample();

/** Get the sample gradient value wrt target variables
*
* WARNING
* This method does return
* \f$ \frac{\partial f_i(w) }{\partial w} \f$,
* instead of
* \f$\sum_i{ \frac{\partial f_i(w) }{\partial w} }\f$
*
* For least squares cost function, that is the value of
* \f$\frac{\partial f_i(w) }{\partial w}\f$ given \f$w\f$ is known
* where the index \f$i\f$ is obtained by next_sample()
*
* @return sample gradient of variables
*/
virtual SGVector<float64_t> get_gradient();

/** Get the cost given current target variables
*
* For least squares, that is the value of \f$f(w)\f$.
*
* @return cost
*/
virtual float64_t get_cost();

/** Get the sample size
*
* @return the sample size
*/
virtual index_t get_sample_size();

/** Get the average gradient value wrt target variables
*
* Note that the average gradient is the mean of sample gradient from
* get_gradient()
* if samples are generated (uniformly) at random.
*
* WARNING
* This method returns
* \f$ \frac{\sum_i^n{ \frac{\partial f_i(w) }{\partial w} }}{n}\f$
*
* For least squares, that is the value of
* \f$ \frac{\frac{\partial f(w) }{\partial w}}{n} \f$ given \f$w\f$ is
* known
* where \f$f(w)=\frac{ \sum_i^n{ (y_i-w^t x_i)^2 } }{2}\f$
*
* @return average gradient of target variables
*/
virtual SGVector<float64_t> get_average_gradient();

virtual SGVector<float64_t> obtain_variable_reference();

/** Updates m_trainable_parameters values to m_ref_trainable_parameters
*/
void update_stan_vectors_to_reference_values();

protected:
/** X is the training data in column major matrix format */
SGMatrix<float64_t> m_X;

/** y is the ground truth, or the correct prediction */
SGMatrix<float64_t> m_y;
Collaborator: FirstOrderSAGCostFunction does not have members for the training data. Why are they needed here?

Contributor Author: @iglesias In the unit test of FirstOrderSAGCostFunction, we implement an additional class called CRegressionExample that wraps FirstOrderSAGCostFunction and contains the training data. I've simply removed the necessity for that class, since it just acted as a wrapper around the cost function and data, and included the data in the loss function, or at least a reference to the data.

Collaborator (iglesias, Jul 2, 2018): We must be careful and must not add unnecessary relationships. If it turns out that the training data is a member in the cost function, and in the neural network, and so on, that is going to cause a lot of usage confusion.

It does not sound unreasonable that the tests have a facility packaged to prepare input data and avoid code duplication (e.g. inside a class such as the CRegressionExample you are mentioning). What did you find wrong with it?


/** trainable_parameters are the variables that are optimized for */
StanVector* m_trainable_parameters;

/** cost_for_ith_point is the cost contributed by each point in the
* training data */

StanFunctionsVector<float64_t>* m_cost_for_ith_point;

/** total_cost is the total cost to be minimized, that in this case is a
* form of sum of cost_for_ith_point*/
// std::function<stan::math::var(StanVector*)>* m_total_cost;
FunctionReturnsStan<StanVector*>* m_total_cost;

/** Reference values for trainable_parameters so that minimizers can
* perform inplace updates */
SGVector<float64_t> m_ref_trainable_parameters;

/** index_of_sample is the index of the column in X for the current
* sample */
index_t m_index_of_sample;
};
}

#endif /* StanFirstOrderSAGCostFunction_H */
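To make the moving parts above concrete, here is a hedged usage sketch (ours, not from the PR): a one-parameter least squares fit wired through the StanVector, StanFunctionsVector, and FunctionReturnsStan aliases defined in this header. The data values and variable names are illustrative only:

    #include <shogun/optimization/StanFirstOrderSAGCostFunction.h>

    using namespace shogun;
    using stan::math::var;

    int main()
    {
        // column-major training data: X is (dim x n), here 1 x 3
        SGMatrix<float64_t> X(1, 3);
        SGMatrix<float64_t> y(1, 3);
        for (index_t i = 0; i < 3; ++i)
        {
            X(0, i) = i;     // x_i = 0, 1, 2
            y(0, i) = 2 * i; // y_i = 2 * x_i, so the optimum is w = 2
        }

        StanVector w(1); // one trainable parameter
        w(0, 0) = 0.0;

        // per-sample cost f_i(w) = (y_i - w * x_i)^2 / 2
        StanFunctionsVector<float64_t> cost_for_ith_point(3);
        for (index_t i = 0; i < 3; ++i)
        {
            cost_for_ith_point(i, 0) =
                [&X, &y](StanVector* v, float64_t idx) -> var {
                    auto j = static_cast<index_t>(idx);
                    var err = y(0, j) - (*v)(0, 0) * X(0, j);
                    return err * err / 2.0;
                };
        }

        // the total cost is the plain sum of the per-sample costs
        FunctionReturnsStan<StanVector*> total_cost = [](StanVector* v) -> var {
            return v->sum();
        };

        StanFirstOrderSAGCostFunction cost(
            X, y, &w, &cost_for_ith_point, &total_cost);
        // cost.get_cost(), cost.get_gradient(), ... can now drive a
        // FirstOrderStochasticMinimizer such as SGDMinimizer
        return 0;
    }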