Implement Flexible ReLU #1341

Merged: 25 commits into mlpack:master on Apr 11, 2018

Conversation

Manthan-R-Sheth (Contributor)

This is a continuation of #1281.
Function call overhead has been removed, and the code has been simplified to perform the calculation directly in Forward() and Backward().

@sourabhvarshney111 (Contributor) left a comment

Everything else looks good. I think @zoq and @rcurtin can review this now.

* author = {Suo Qiu, Xiangmin Xu and Bolun Cai},
* title = {FReLU: Flexible Rectified Linear Units for Improving
* Convolutional Neural Networks}
* journal = {arxiv preprint},
Contributor:

If possible, can you add the URL too?

@sourabhvarshney111 (Contributor)

@Manthan-R-Sheth I have added very minor changes to this code. Can you please create a pull request from my repo?

@sourabhvarshney111 (Contributor)

@Manthan-R-Sheth I think this is complete. Is it?

@Manthan-R-Sheth (Author)

Let's wait for @zoq and @rcurtin.

* For more information, read the following paper:
*
* @code
* @article{
Member:

This is missing a name. Also, do you mind moving this into the class description block? That way, Doxygen can pick it up.

* title = {FReLU: Flexible Rectified Linear Units for Improving
* Convolutional Neural Networks}
* journal = {arxiv preprint},
* URL = {https://arxiv.org/pdf/1706.08098.pdf},
Member:

Can you link to the arXiv abstract page here instead of the PDF link?
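Putting the reviewers' suggestions together, the class-description citation could end up looking roughly like this; the BibTeX key Qiu2017 is only a placeholder, and the abstract-page URL is derived from the PDF link quoted above:

 * For more information, read the following paper:
 *
 * @code
 * @article{Qiu2017,
 *   author  = {Suo Qiu, Xiangmin Xu and Bolun Cai},
 *   title   = {FReLU: Flexible Rectified Linear Units for Improving
 *              Convolutional Neural Networks},
 *   journal = {arxiv preprint},
 *   url     = {https://arxiv.org/abs/1706.08098}
 * }
 * @endcode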

*
*@tparam OutputDataType Type of the output data (arma::colvec, arma::mat,
* arma::sp_mat or arma::cube)
*
Member:

Looks like we could remove the extra line here.

*
*@tparam InputDataType Type of the input data ( arma::colvec, arma::mar,
* arma::sp_mat or arma::cube)
*
Member:

Do you mind removing the extra line and the extra space before arma::colvec?

@rcurtin (Member) left a comment

Thanks for taking over this PR; I'm glad to see it updated. Since alpha should be trainable, shouldn't we also have a Gradient() function and Parameters()? I also left a few comments about efficiency.
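As a rough illustration only (not code from this PR; it assumes the accessor pattern other mlpack ann layers commonly expose, with alpha stored as a 1x1 matrix so the optimizer can update it):

  //! Get the trainable parameter (alpha, stored as a 1x1 matrix).
  OutputDataType const& Parameters() const { return alpha; }
  //! Modify the trainable parameter.
  OutputDataType& Parameters() { return alpha; }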

const InputType&& input, OutputType&& output)
{
output = arma::max(arma::zeros<InputType>(input.n_rows, input.n_cols), input)
+ alpha;
Member:

A minor style issue---the second line here should be doubly indented (four spaces, not two). Also I think you might be able to do this with a lambda passed to .transform(), which could make it a little faster.
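For illustration, a minimal sketch of the .transform() idea, assuming alpha is a plain double at this point (the function name FreluForward is only illustrative):

#include <algorithm>
#include <armadillo>

// Apply the element-wise FReLU rule with a lambda instead of building a
// zeros matrix for arma::max().
arma::mat FreluForward(const arma::mat& input, const double alpha)
{
  arma::mat output = input;
  output.transform([alpha](double x) { return std::max(x, 0.0) + alpha; });
  return output;
}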

DataType derivative;

//! Compute the first derivative of FlexibleReLU function.
derivative = input;
Member:

You can save a copy here if we do derivative.set_size(input.n_rows, input.n_cols), which only sets the size without copying the input's elements.

for (size_t i = 0; i < input.n_elem; i++)
{
derivative(i) = input(i) > 0? 1 : 0;
}
Member:

You could also replace this with a call to .transform().
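A sketch of what the .transform() version of this loop could look like (illustrative names, not the PR's code):

#include <armadillo>

// Map each element to 1 where the input is positive and to 0 elsewhere.
arma::mat FreluDerivative(const arma::mat& input)
{
  arma::mat derivative = input;
  derivative.transform([](double x) { return (x > 0.0) ? 1.0 : 0.0; });
  return derivative;
}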

@Manthan-R-Sheth (Author)

@rcurtin
Yes, you are right: since there is a trainable parameter, Gradient() and Parameters() should be implemented. I have done that and optimized the code too.
Do you think a CheckGradient() test should be added here now?

@rcurtin (Member) left a comment

Yeah, even though Gradient() is simple, I think it is still a good idea to add a CheckGradient() test. I had a few other comments; sorry if they contradict what I wrote earlier about transform().

}

arma::mat zeros = arma::zeros<arma::Mat<eT>>(input.n_rows, input.n_cols);
gradient(0) = arma::accu(error % arma::min(zeros, input)) / input.n_cols;
Member:

It would be a lot better here if we could avoid allocating zeros; that will be time-consuming.
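One way to write the same expression without the explicit zeros matrix, sketched with illustrative names (clamping from above at zero gives min(input, 0) element-wise):

#include <armadillo>
#include <limits>

// Same quantity as the line above, but without allocating a zeros matrix.
double AlphaGradientTerm(const arma::mat& error, const arma::mat& input)
{
  const arma::mat clipped =
      arma::clamp(input, -std::numeric_limits<double>::max(), 0.0);
  return arma::accu(error % clipped) / input.n_cols;
}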

int i = -1;
output = arma::zeros<InputType>(input.n_rows, input.n_cols);
output.transform([input, &i, this](double val) { ++i;
return (std::max(input(i), 0.0) + alpha(0)); } );
Member:

Ack, I did not realize that transform() is in-place. Why not use something like arma::clamp() instead then? i.e. output = arma::clamp(input, 0.0, DBL_MAX) + alpha(0);
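A self-contained sketch of that suggestion (DBL_MAX comes from <cfloat>; the variable names are only illustrative):

#include <armadillo>
#include <cfloat>

int main()
{
  arma::mat input = arma::randn(3, 4);  // example activations
  arma::mat alpha(1, 1);
  alpha(0) = 0.05;                      // the trainable shift

  // FReLU forward pass: ReLU via clamp, then shift every element by alpha.
  arma::mat output = arma::clamp(input, 0.0, DBL_MAX) + alpha(0);

  output.print("frelu output:");
  return 0;
}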

Contributor Author:

Yes, that is exactly the kind of function we needed.

derivative.set_size(input.n_rows, input.n_cols);
int i = -1;
derivative.transform([input, &i](double val) { ++i;
return (input(i) > 0? 1 : 0); } );
Member:

Here I think we could just use a boolean expression: derivative = (input > 0); (or some variant like that).
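One such variant, sketched with an illustrative name: (input > 0) yields an unsigned integer matrix in Armadillo, so it needs a conversion back to the floating-point element type.

#include <armadillo>

// Derivative mask: 1.0 where the input is positive, 0.0 elsewhere.
arma::mat FreluDerivative(const arma::mat& input)
{
  return arma::conv_to<arma::mat>::from(input > 0);
}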

Contributor Author:

I have used this:
derivative = arma::sign(input);
derivative.elem(arma::find(derivative < 0.0)) += 1;

Should this be good to go, or was the lambda already faster?

@Manthan-R-Sheth (Author)

@rcurtin
I have updated the code according to the suggestions.

@rcurtin (Member)

rcurtin commented Apr 4, 2018

Looks like the gradient test is failing; can you look into it please?

/home/travis/build/mlpack/mlpack/src/mlpack/tests/ann_layer_test.cpp(677): fatal error in "GradientFlexibleReLULayerTest": critical check CheckGradient(function) <= 1e-4 failed [0.45575305843508807 > 0.0001]

@Manthan-R-Sheth (Author)

Manthan-R-Sheth commented Apr 5, 2018

@rcurtin @zoq
Can you have a look at the Backward() and Gradient() functions? They look right to me.
Maybe understanding what is wrong will make CheckGradient() pass too.
In fact, it passes on my local system. Is it the random seed issue again?

@zoq (Member)

zoq commented Apr 5, 2018

I don't think that's the case here; I'll have to take a closer look at the code, especially the Gradient step.

@Manthan-R-Sheth (Author)

Manthan-R-Sheth commented Apr 5, 2018

@zoq @rcurtin
I gave the same input to these two tests:

  1. bin/mlpack_test -t ANNLayerTest
  2. bin/mlpack_test -t ANNLayerTest/GradientFlexibleReLULayerTest

and found that the errors after model->Evaluate() are different, and so are the gradients. This is why it passes in 1 but fails in 2. In fact, printing the parameter values for both cases shows that in case 2 both the estimated and the original gradient are 0, unlike in case 1.
Any ideas on what could be causing this?

DataType derivative;
//! Compute the first derivative of FlexibleReLU function.
derivative = arma::sign(input);
derivative.elem(arma::find(derivative < 0.0)) += 1;
Member:

We could do something like derivative.elem(arma::find(input > 0)).ones(); here; I think it's easier to follow. Let me know what you think.

Contributor Author:

I think I would have to initialize derivative with zeros() or with input before using derivative.elem(find(input > 0)).ones().
So isn't the present implementation faster than initializing the matrix and then operating on it?
Tell me if I missed anything.

Member:

You are right.

Member:

It would be possible to also do derivative = arma::clamp(arma::sign(input), 0.0, 1.0); but I don't think this will currently give any acceleration. It may be slightly easier to read though.

{
gradient = arma::zeros<arma::Mat<eT>>(1, 1);
}
gradient(0) = arma::accu(error) / input.n_cols;
Member:

Shouldn't this be gradient(0) = arma::sum(error)?

Contributor Author:

sum() would give a row vector (the error for each instance in the input), but we need a single double value for alpha (the trainable parameter), so I used accu().
It is accu(error) because the derivative of the output with respect to alpha is 1 in the case of the flexible ReLU.
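A minimal sketch of that computation (illustrative name; error has one column per data point):

#include <armadillo>

// Since d(output)/d(alpha) = 1 for every element, the gradient of alpha is
// the sum of all entries of the backpropagated error, averaged over the
// batch (columns).
double AlphaGradient(const arma::mat& error)
{
  return arma::accu(error) / error.n_cols;
}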

Member:

Ah, right, will run some tests later today.

Contributor Author:

Sure, @zoq, thanks.

{
if (gradient.n_elem == 0)
{
gradient = arma::zeros<arma::Mat<eT>>(1, 1);
Member:

Let's use set_size here, to make this step somewhat faster.

@zoq (Member)

zoq commented Apr 7, 2018

Sorry for the slow response on this one, the main issue is that the output is clipped (zeroed if < 0), so a really small perturbation in the positive/negative direction ends up with the same result and has no effect. An easy solution is to rely on positive weights, which isn't perfect since we don't cover the complete range of the frelu function. So, in addition, we could compare the gradient against a precomputed one with positive/negative weights. Let me know what you think.

Here is the modified test:

/**
 * Flexible ReLU layer numerically gradient test.
 */
BOOST_AUTO_TEST_CASE(GradientFlexibleReLULayerTest)
{
  // Add function gradient instantiation.
  struct GradientFunction
  {
    GradientFunction()
    {
      input = arma::randu(2, 1);
      target = arma::mat("1");

      model = new FFN<NegativeLogLikelihood<>, RandomInitialization>(
          NegativeLogLikelihood<>(), RandomInitialization(0.1, 0.5));

      model->Predictors() = input;
      model->Responses() = target;
      model->Add<LinearNoBias<> >(2, 5);
      model->Add<FlexibleReLU<> >(0.05);
      model->Add<LogSoftMax<> >();
    }

    ~GradientFunction()
    {
      delete model;
    }

    double Gradient(arma::mat& gradient) const
    {
      arma::mat output;
      double error = model->Evaluate(model->Parameters(), 0, 1);
      model->Gradient(model->Parameters(), 0, gradient, 1);
      return error;
    }

    arma::mat& Parameters() { return model->Parameters(); }

    FFN<NegativeLogLikelihood<>, RandomInitialization>* model;
    arma::mat input, target;
  } function;

  BOOST_REQUIRE_LE(CheckGradient(function), 1e-4);
}

@zoq (Member) left a comment

Thanks for looking into all the issues; I'll go ahead and merge this in 3 days to leave time for any other comments, and I will fix some remaining minor style issues afterwards.

@rcurtin (Member) left a comment

Looks good to me; I'm looking forward to seeing this merged. Thanks for the contribution. :)

@Manthan-R-Sheth (Author)

@rcurtin @zoq
I have made the changes. Thanks.

@zoq merged commit 0c5b9e6 into mlpack:master on Apr 11, 2018
@zoq (Member)

zoq commented Apr 11, 2018

@Manthan-R-Sheth thanks for the great contribution.
