Implement Flexible ReLU #1341

Merged
merged 25 commits into mlpack:master on Apr 11, 2018

Conversation

@Manthan-R-Sheth
Contributor

Manthan-R-Sheth commented Mar 30, 2018

This is a continuation of #1281.
Function call overheads have been removed, and the code is simplified so that the calculation is done directly in Forward() and Backward().
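
For reference, here is a minimal standalone sketch of the FReLU math the layer implements (f(x) = max(x, 0) + alpha, with alpha trainable), written in plain Armadillo rather than the actual mlpack layer code; the matrix sizes and the alpha value are just illustrative.

#include <armadillo>
#include <cfloat>

int main()
{
  const double alpha = 0.1;                        // trainable shift parameter
  arma::mat input = arma::randn<arma::mat>(3, 4);  // example pre-activations

  // Forward(): clip negative values to zero, then add alpha.
  arma::mat output = arma::clamp(input, 0.0, DBL_MAX) + alpha;

  // Backward(): the derivative w.r.t. the input is 1 where input > 0 and 0
  // elsewhere, so the incoming error is masked element-wise.
  arma::mat error = arma::randn<arma::mat>(3, 4);
  arma::mat derivative = arma::clamp(arma::sign(input), 0.0, 1.0);
  arma::mat g = error % derivative;

  output.print("Forward() output");
  g.print("Backward() output");
  return 0;
}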

@sourabhvarshney111

Everything else looks good. I think @zoq and @rcurtin can review this now.

* author = {Suo Qiu, Xiangmin Xu and Bolun Cai},
* title = {FReLU: Flexible Rectified Linear Units for Improving
* Convolutional Neural Networks}
* journal = {arxiv preprint},

@sourabhvarshney111

sourabhvarshney111 Mar 30, 2018

Contributor

If possible, can you add the URL too?

@sourabhvarshney111

Contributor

sourabhvarshney111 commented Mar 30, 2018

@Manthan-R-Sheth I have added very minor changes to this code. Can you please create a pull request from my repo?

@sourabhvarshney111

Contributor

sourabhvarshney111 commented Mar 31, 2018

@Manthan-R-Sheth I think this is complete. Is it?

@Manthan-R-Sheth

Contributor

Manthan-R-Sheth commented Mar 31, 2018

Let's wait for @zoq and @rcurtin.

* For more information, read the following paper:
*
* @code
* @article{

@zoq

zoq Apr 1, 2018

Member

This is missing a name; also, do you mind moving this into the class description block? That way, Doxygen can pick it up.

* title = {FReLU: Flexible Rectified Linear Units for Improving
* Convolutional Neural Networks}
* journal = {arxiv preprint},
* URL = {https://arxiv.org/pdf/1706.08098.pdf},

@zoq

zoq Apr 1, 2018

Member

Can you link to the arXiv abstract page here instead of using the PDF link?

*
*@tparam OutputDataType Type of the output data (arma::colvec, arma::mat,
* arma::sp_mat or arma::cube)
*

@zoq

zoq Apr 1, 2018

Member

Looks like we could remove the extra line here.

*
*@tparam InputDataType Type of the input data ( arma::colvec, arma::mar,
* arma::sp_mat or arma::cube)
*

@zoq

zoq Apr 1, 2018

Member

Do you mind removing the extra line and the extra space before arma::colvec?

@rcurtin

Thanks for taking over this PR; I'm glad to see it updated. Since alpha should be trainable, shouldn't we also have a Gradient() function and Parameters()? I also left a few comments about efficiency.

const InputType&& input, OutputType&& output)
{
output = arma::max(arma::zeros<InputType>(input.n_rows, input.n_cols), input)
+ alpha;

@rcurtin

rcurtin Apr 2, 2018

Member

A minor style issue---the second line here should be doubly indented (four spaces, not two). Also I think you might be able to do this with a lambda passed to .transform(), which could make it a little faster.
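
As a rough sketch of what the .transform() idea could look like (illustrative only, not the code that ended up in the PR): .transform() works in place, so the output is first seeded with a copy of the input and each element is then rewritten as max(x, 0) + alpha.

#include <algorithm>
#include <armadillo>

int main()
{
  arma::mat alpha(1, 1);
  alpha.fill(0.1);                                 // trainable parameter
  arma::mat input = arma::randn<arma::mat>(3, 4);

  // Copy the input, then overwrite every element in place.
  arma::mat output = input;
  output.transform([&](double x) { return std::max(x, 0.0) + alpha(0); });

  output.print("FReLU output");
  return 0;
}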

DataType derivative;
//! Compute the first derivative of FlexibleReLU function.
derivative = input;

@rcurtin

rcurtin Apr 2, 2018

Member

You can save a copy here if we do derivative.set_size(input.n_rows, input.n_cols), which will just set the size.

for (size_t i = 0; i < input.n_elem; i++)
{
derivative(i) = input(i) > 0? 1 : 0;
}

@rcurtin

rcurtin Apr 2, 2018

Member

You could also replace this with a call to .transform().

@Manthan-R-Sheth

Contributor

Manthan-R-Sheth commented Apr 3, 2018

@rcurtin
Yes, you are right: since there is a trainable parameter, Gradient() and Parameters() should be implemented. I have done that, and I have optimized the code too.
Do you think a CheckGradient() test should be added here now?

@rcurtin

Yeah, even though Gradient() is simple I think it is still a good idea to add a CheckGradient() test. I had a few other comments, sorry if they contradict what I wrote earlier about transform().

}
arma::mat zeros = arma::zeros<arma::Mat<eT>>(input.n_rows, input.n_cols);
gradient(0) = arma::accu(error % arma::min(zeros, input)) / input.n_cols;

@rcurtin

rcurtin Apr 3, 2018

Member

It would be a lot better here if we could avoid allocating zeros; that will be time-consuming.
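
For illustration, the zeros temporary in an expression like the one quoted above can be avoided with arma::clamp, since capping every element at 0 from above is the same as an element-wise min with zero. This sketch only demonstrates the allocation-free rewrite; the gradient formula itself is revisited later in the review.

#include <armadillo>
#include <cfloat>
#include <iostream>

int main()
{
  arma::mat input = arma::randn<arma::mat>(4, 3);
  arma::mat error = arma::randn<arma::mat>(4, 3);

  // With the explicit zeros temporary.
  arma::mat zeros = arma::zeros<arma::mat>(input.n_rows, input.n_cols);
  double withTemp = arma::accu(error % arma::min(zeros, input)) / input.n_cols;

  // Without the temporary: clamp caps each element at 0 from above.
  double noTemp =
      arma::accu(error % arma::clamp(input, -DBL_MAX, 0.0)) / input.n_cols;

  std::cout << withTemp << " == " << noTemp << std::endl;
  return 0;
}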

int i = -1;
output = arma::zeros<InputType>(input.n_rows, input.n_cols);
output.transform([input, &i, this](double val) { ++i;
return (std::max(input(i), 0.0) + alpha(0)); } );

@rcurtin

rcurtin Apr 3, 2018

Member

Ack, I did not realize that transform() is in-place. Why not use something like arma::clamp() instead then? i.e. output = arma::clamp(input, 0.0, DBL_MAX) + alpha(0);

@Manthan-R-Sheth

Manthan-R-Sheth Apr 3, 2018

Contributor

Yes, we just needed a function like that.

derivative.set_size(input.n_rows, input.n_cols);
int i = -1;
derivative.transform([input, &i](double val) { ++i;
return (input(i) > 0? 1 : 0); } );

@rcurtin

rcurtin Apr 3, 2018

Member

Here I think we could just use a boolean expression: derivative = (input > 0); (or some variant like that)
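
As a side note on the boolean-expression idea: in Armadillo, a comparison like (input > 0) yields an arma::umat of 0/1 flags, so a conversion is needed before it can be stored in a double matrix. A minimal sketch, with an arbitrary example matrix:

#include <armadillo>

int main()
{
  arma::mat input = arma::randn<arma::mat>(3, 3);

  // (input > 0.0) produces 0/1 flags as an arma::umat; converting it back to
  // arma::mat gives the FReLU derivative w.r.t. the input.
  arma::mat derivative = arma::conv_to<arma::mat>::from(input > 0.0);

  derivative.print("derivative");
  return 0;
}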

@Manthan-R-Sheth

Manthan-R-Sheth Apr 3, 2018

Contributor

I have used this:
derivative = arma::sign(input);
derivative.elem(arma::find(derivative < 0.0)) += 1;

Should this be good to go, or was the lambda already faster?

@Manthan-R-Sheth

Contributor

Manthan-R-Sheth commented Apr 3, 2018

@rcurtin
I have updated the code according to the suggestions.

@rcurtin

Member

rcurtin commented Apr 4, 2018

Looks like the gradient test is failing; can you look into it please?

/home/travis/build/mlpack/mlpack/src/mlpack/tests/ann_layer_test.cpp(677): fatal error in "GradientFlexibleReLULayerTest": critical check CheckGradient(function) <= 1e-4 failed [0.45575305843508807 > 0.0001]

Manthan-R-Sheth added some commits Apr 4, 2018

@Manthan-R-Sheth

Contributor

Manthan-R-Sheth commented Apr 5, 2018

@rcurtin @zoq
Can you have a look at the Backward() and Gradient() functions? They look right to me.
Maybe understanding what is wrong will make CheckGradient() pass too.
In fact, it is passing on my local system. Is it the random seed issue again?

@zoq

Member

zoq commented Apr 5, 2018

I don't think that's the case here; I'll have to take a closer look at the code, especially the Gradient step.

@Manthan-R-Sheth

Contributor

Manthan-R-Sheth commented Apr 5, 2018

@zoq @rcurtin
I gave the same input for these two tests:

  1. bin/mlpack_test -t ANNLayerTest
  2. bin/mlpack_test -t ANNLayerTest/GradientFlexibleReLULayerTest

and I found that the errors after model->Evaluate() are different, and so are the gradients. This is why it passes in 1 but fails in 2. In fact, on printing the parameter values for both cases, I found that case 2 has an estimated and original gradient of 0, unlike case 1.
Any ideas on what could be the cause of this?

DataType derivative;
//! Compute the first derivative of FlexibleReLU function.
derivative = arma::sign(input);
derivative.elem(arma::find(derivative < 0.0)) += 1;

@zoq

zoq Apr 5, 2018

Member

We could do something like derivative.elem(find(input > 0)).ones(); here; I think it's easier to follow. Let me know what you think.

@Manthan-R-Sheth

Manthan-R-Sheth Apr 6, 2018

Contributor

I think I will have to initialize derivative with zeros() or with input before using derivative.elem(find(input > 0)).ones();.
So isn't the present implementation faster than initializing the matrix and then operating on it?
Tell me if I missed anything.

@zoq

zoq Apr 6, 2018

Member

You are right.

@rcurtin

rcurtin Apr 10, 2018

Member

It would be possible to also do derivative = arma::clamp(arma::sign(input), 0.0, 1.0); but I don't think this will currently give any acceleration. It may be slightly easier to read though.

{
gradient = arma::zeros<arma::Mat<eT>>(1, 1);
}
gradient(0) = arma::accu(error) / input.n_cols;

@zoq

zoq Apr 5, 2018

Member

Shouldn't this be gradient(0) = arma::sum(error)?

@Manthan-R-Sheth

Manthan-R-Sheth Apr 6, 2018

Contributor

sum() would give a row vector (the error for each instance in the input), but we need a double value for alpha (the trainable parameter), so I used accu().
It is accu(error) because the derivative of the output with respect to alpha is 1 in the case of flexible ReLU.
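
A minimal standalone sketch of that reasoning (matrix sizes are made up for illustration):

#include <armadillo>

int main()
{
  // f(x) = max(x, 0) + alpha and d f / d alpha = 1 for every element, so by
  // the chain rule the gradient for alpha is the accumulated error, averaged
  // over the columns (the batch): accu(error) / n_cols.
  arma::mat error = arma::randn<arma::mat>(5, 8);  // error from the next layer
  arma::mat gradient;
  gradient.set_size(1, 1);                         // alpha is a single scalar

  gradient(0) = arma::accu(error) / error.n_cols;

  gradient.print("gradient w.r.t. alpha");
  return 0;
}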

@zoq

zoq Apr 6, 2018

Member

Ah, right, will run some tests later today.

@Manthan-R-Sheth

Manthan-R-Sheth Apr 6, 2018

Contributor

Sure, @zoq, thanks.

{
if (gradient.n_elem == 0)
{
gradient = arma::zeros<arma::Mat<eT>>(1, 1);

@zoq

zoq Apr 5, 2018

Member

Let's use set_size here, to make this step somewhat faster.

@zoq

Member

zoq commented Apr 7, 2018

Sorry for the slow response on this one. The main issue is that the output is clipped (zeroed if < 0), so a really small perturbation in the positive or negative direction ends up with the same result and has no effect. An easy solution is to rely on positive weights, which isn't perfect, since we don't cover the complete range of the FReLU function. So, in addition, we could compare the gradient against a precomputed one with positive/negative weights. Let me know what you think.
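
As a standalone illustration of that clipping effect (hypothetical values, not part of the PR):

#include <algorithm>
#include <cstdio>

int main()
{
  const double alpha = 0.05;
  const double eps = 1e-6;

  // FReLU for a single pre-activation value.
  auto frelu = [&](double x) { return std::max(x, 0.0) + alpha; };

  // For a pre-activation that is already negative, nudging it by +/- eps
  // leaves max(x, 0) unchanged, so the central-difference estimate is
  // exactly zero and the numerical check learns nothing from this unit.
  const double x = -0.3;
  const double numericalGrad = (frelu(x + eps) - frelu(x - eps)) / (2 * eps);

  std::printf("numerical gradient at x = %g: %g\n", x, numericalGrad);
  return 0;
}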

Here is the modified test:

/**
 * Flexible ReLU layer numerically gradient test.
 */
BOOST_AUTO_TEST_CASE(GradientFlexibleReLULayerTest)
{
  // Add function gradient instantiation.
  struct GradientFunction
  {
    GradientFunction()
    {
      input = arma::randu(2, 1);
      target = arma::mat("1");

      model = new FFN<NegativeLogLikelihood<>, RandomInitialization>(
          NegativeLogLikelihood<>(), RandomInitialization(0.1, 0.5));

      model->Predictors() = input;
      model->Responses() = target;
      model->Add<LinearNoBias<> >(2, 5);
      model->Add<FlexibleReLU<> >(0.05);
      model->Add<LogSoftMax<> >();
    }

    ~GradientFunction()
    {
      delete model;
    }

    double Gradient(arma::mat& gradient) const
    {
      arma::mat output;
      double error = model->Evaluate(model->Parameters(), 0, 1);
      model->Gradient(model->Parameters(), 0, gradient, 1);
      return error;
    }

    arma::mat& Parameters() { return model->Parameters(); }

    FFN<NegativeLogLikelihood<>, RandomInitialization>* model;
    arma::mat input, target;
  } function;

  BOOST_REQUIRE_LE(CheckGradient(function), 1e-4);
}
@zoq

zoq approved these changes Apr 8, 2018

Thanks for looking into all the issues; I'll go ahead and merge this in 3 days to leave time for any other comments and I will fix some remaining minor style issues afterwards.

@rcurtin

Looks good to me; I'm looking forward to seeing this merged. Thanks for the contribution. :)

@Manthan-R-Sheth

Contributor

Manthan-R-Sheth commented Apr 10, 2018

@rcurtin @zoq
I have made the changes. Thanks.

@zoq zoq merged commit 0c5b9e6 into mlpack:master Apr 11, 2018

5 checks passed:

Memory Checks
Static Code Analysis Checks: Build finished.
Style Checks: Build finished.
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
@zoq

Member

zoq commented Apr 11, 2018

@Manthan-R-Sheth thanks for the great contribution.
