New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cross entropy with logit #1104

Closed
wants to merge 8 commits into
base: master
from

Conversation

Projects
None yet
5 participants
@kris-singh
Contributor

kris-singh commented Aug 26, 2017

No description provided.

@kris-singh

This comment has been minimized.

Show comment
Hide comment
@kris-singh

kris-singh Aug 27, 2017

Contributor

Not sure about the error from travis why this happened.

Contributor

kris-singh commented Aug 27, 2017

Not sure about the error from travis why this happened.

@kris-singh

This comment has been minimized.

Show comment
Hide comment
@kris-singh

kris-singh Aug 27, 2017

Contributor

@lozhnikov This works fine should we merge this. Or just discard this.

Contributor

kris-singh commented Aug 27, 2017

@lozhnikov This works fine should we merge this. Or just discard this.

@lozhnikov

We found that the layer works better than SigmoidLayer + CrossEntropyError sometimes. But later we realized that we used incorrect parameters and these two approaches give equal results on correct parameters. So, I am not sure that the layer is needed except some special cases.
P.S. Sorry for that, that was my mistake.

kris-singh added some commits Aug 29, 2017

@kris-singh

This comment has been minimized.

Show comment
Hide comment
@kris-singh

kris-singh Aug 29, 2017

Contributor

The tests pass locally. Checked thrice.

Contributor

kris-singh commented Aug 29, 2017

The tests pass locally. Checked thrice.

@lozhnikov

This comment has been minimized.

Show comment
Hide comment
@lozhnikov

lozhnikov Aug 29, 2017

Contributor
  8/114 Test   #9: ANNLayerTest .....................***Failed    2.91 sec
Running 31 test cases...
unknown location(0): fatal error in "SimpleSigmoidCrossEntropyLayerTest": memory access violation at address: 0x00000000: no mapping at fault address
/home/travis/build/mlpack/mlpack/src/mlpack/tests/ann_layer_test.cpp(1044): last checkpoint

The tests pass locally. Checked thrice.

I think valgrind could help.

Contributor

lozhnikov commented Aug 29, 2017

  8/114 Test   #9: ANNLayerTest .....................***Failed    2.91 sec
Running 31 test cases...
unknown location(0): fatal error in "SimpleSigmoidCrossEntropyLayerTest": memory access violation at address: 0x00000000: no mapping at fault address
/home/travis/build/mlpack/mlpack/src/mlpack/tests/ann_layer_test.cpp(1044): last checkpoint

The tests pass locally. Checked thrice.

I think valgrind could help.

SoftplusFunction::Deriv(static_cast<arma::mat>(-arma::abs(input(positive))),
temp);
output(positive) = 1 - target(positive) - temp;

This comment has been minimized.

@zoq

zoq Aug 29, 2017

Member

Looks like we should set the size of output first, if we like to index it here.

@zoq

zoq Aug 29, 2017

Member

Looks like we should set the size of output first, if we like to index it here.

@kris-singh

This comment has been minimized.

Show comment
Hide comment
@kris-singh

kris-singh Aug 29, 2017

Contributor

Finally, all tests pass. Let me know if anything else needs to be done.

Contributor

kris-singh commented Aug 29, 2017

Finally, all tests pass. Let me know if anything else needs to be done.

@kris-singh

This comment has been minimized.

Show comment
Hide comment
@kris-singh

kris-singh Aug 31, 2017

Contributor

Should we merge this if there are no more comments.

Contributor

kris-singh commented Aug 31, 2017

Should we merge this if there are no more comments.

kris-singh added some commits Sep 1, 2017

@kris-singh

This comment has been minimized.

Show comment
Hide comment
@kris-singh

kris-singh Sep 3, 2017

Contributor

@zoq All your comments were fixed.

Contributor

kris-singh commented Sep 3, 2017

@zoq All your comments were fixed.

@zoq

zoq approved these changes Sep 3, 2017

Looks good to me, just made some minor issues abut the style.

Show outdated Hide outdated src/mlpack/methods/ann/layer/sigmoid_cross_entropy_error.hpp
Show outdated Hide outdated src/mlpack/methods/ann/layer/sigmoid_cross_entropy_error.hpp
BOOST_REQUIRE_EQUAL(output.n_rows, input2.n_rows);
BOOST_REQUIRE_EQUAL(output.n_cols, input2.n_cols);
}

This comment has been minimized.

@zoq

zoq Sep 3, 2017

Member

Another pedantic style issue, can you remove the extra line here?

@zoq

zoq Sep 3, 2017

Member

Another pedantic style issue, can you remove the extra line here?

kris-singh added some commits Sep 3, 2017

@kris-singh

This comment has been minimized.

Show comment
Hide comment
@kris-singh

kris-singh Sep 3, 2017

Contributor

Okay i fixed all the comments.

Contributor

kris-singh commented Sep 3, 2017

Okay i fixed all the comments.

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Sep 3, 2017

Member

Thanks, that was fast, looks good for me. I think this is ready to be merged.

Member

zoq commented Sep 3, 2017

Thanks, that was fast, looks good for me. I think this is ready to be merged.

@rcurtin

rcurtin approved these changes Sep 5, 2017

Looks good to me, just a few comments about optimization and comments. I think maybe I found a bug in Backward() too, please check on that. Otherwise I think it's ready.

/**
* The SigmoidCrossEntropyError performance function measures the network's
* performance according to the cross-entropy function.
* between the input and target distributions.

This comment has been minimized.

@rcurtin

rcurtin Sep 5, 2017

Member

Maybe this is an extra sentence, or the period from the previous line is in the wrong place?

@rcurtin

rcurtin Sep 5, 2017

Member

Maybe this is an extra sentence, or the period from the previous line is in the wrong place?

* The functions is much more numerically stable as can be found from the
* formula below.
* \f$max(x, 0) - x * z + \log(1 + e^{-|x|})\f$
* For more detail look here goo.gl/tRjS6j

This comment has been minimized.

@rcurtin

rcurtin Sep 5, 2017

Member

Ah, I don't know how long that link will be good, so we should avoid using it, put the details here, or maybe find a paper that we can cite.

@rcurtin

rcurtin Sep 5, 2017

Member

Ah, I don't know how long that link will be good, so we should avoid using it, put the details here, or maybe find a paper that we can cite.

arma::mat output;
SoftplusFunction::Fn(static_cast<arma::mat>(-arma::abs(input)), output);
double loss = arma::accu(input(positive)) - arma::accu(input % target) +
arma::accu(output);

This comment has been minimized.

@rcurtin

rcurtin Sep 5, 2017

Member

I think you could optimize this to avoid the positive vector, maybe like this:

double loss = arma::accu(arma::max(input, 0)) - arma::accu(input % target) + arma::accu(output);

I am not 100% sure that arma::accu(arma::max(input, 0)) will avoid creating a temporary, but if it doesn't, you could also just loop over the input matrix.

@rcurtin

rcurtin Sep 5, 2017

Member

I think you could optimize this to avoid the positive vector, maybe like this:

double loss = arma::accu(arma::max(input, 0)) - arma::accu(input % target) + arma::accu(output);

I am not 100% sure that arma::accu(arma::max(input, 0)) will avoid creating a temporary, but if it doesn't, you could also just loop over the input matrix.

This comment has been minimized.

@kris-singh

kris-singh Sep 16, 2017

Contributor

Could you explain arma::accu(arma::max(input, 0)). This would return a max number along every column whereas arma::find(input > 0) find the indicies of every non-negative input. So, how could one replace another.

@kris-singh

kris-singh Sep 16, 2017

Contributor

Could you explain arma::accu(arma::max(input, 0)). This would return a max number along every column whereas arma::find(input > 0) find the indicies of every non-negative input. So, how could one replace another.

This comment has been minimized.

@rcurtin

rcurtin Sep 16, 2017

Member

Ah yeah, sorry, I used the wrong function here. Instead of arma::accu(arma::max(input, 0)) I meant arma::accu(arma::clamp(input, 0, std::numeric_limits<double>::max())).

@rcurtin

rcurtin Sep 16, 2017

Member

Ah yeah, sorry, I used the wrong function here. Instead of arma::accu(arma::max(input, 0)) I meant arma::accu(arma::clamp(input, 0, std::numeric_limits<double>::max())).

This comment has been minimized.

@ShikharJ

ShikharJ Jan 18, 2018

Member

@rcurtin arma::clamp does create a copy of the original matrix. How do you wish to proceed here?

@ShikharJ

ShikharJ Jan 18, 2018

Member

@rcurtin arma::clamp does create a copy of the original matrix. How do you wish to proceed here?

This comment has been minimized.

@rcurtin

rcurtin Jan 18, 2018

Member

@ShikharJ: are you sure that the combination of accu(clamp(...)) will create a temporary matrix? It looks like clamp() returns an arma::mtOp<...> which signifies that the clamp operation should be done, but I think then accu() will use an arma::Proxy<> object, which I think in turn will create a temporary for an mtOp<>. The best way to check for sure would be to create a simple program, compile with -DARMA_EXTRA_DEBUG, and parse through the output to see if the constructor for arma::Mat<eT> is called during the accu(clamp()) operation.

In any case, if you do confirm that this makes a copy, you can just write a loop over the input matrix and manually clamp the elements to be greater than zero as you sum them.

@rcurtin

rcurtin Jan 18, 2018

Member

@ShikharJ: are you sure that the combination of accu(clamp(...)) will create a temporary matrix? It looks like clamp() returns an arma::mtOp<...> which signifies that the clamp operation should be done, but I think then accu() will use an arma::Proxy<> object, which I think in turn will create a temporary for an mtOp<>. The best way to check for sure would be to create a simple program, compile with -DARMA_EXTRA_DEBUG, and parse through the output to see if the constructor for arma::Mat<eT> is called during the accu(clamp()) operation.

In any case, if you do confirm that this makes a copy, you can just write a loop over the input matrix and manually clamp the elements to be greater than zero as you sum them.

This comment has been minimized.

@ShikharJ

ShikharJ Jan 19, 2018

Member

The Armadillo documentation mentions this about clamp():

Create a copy of X with each element clamped to be between min_val and max_val

accu() should just return the sum of the intended elements. So all in all, accu(clamp(...)) will internally use a temporary, but just return the sum in the end. Am I misunderstanding the context of your concern?

@ShikharJ

ShikharJ Jan 19, 2018

Member

The Armadillo documentation mentions this about clamp():

Create a copy of X with each element clamped to be between min_val and max_val

accu() should just return the sum of the intended elements. So all in all, accu(clamp(...)) will internally use a temporary, but just return the sum in the end. Am I misunderstanding the context of your concern?

This comment has been minimized.

@rcurtin

rcurtin Jan 21, 2018

Member

Yes, so the documentation is not misleading but it doesn't mention that Armadillo is sometimes able to take advantage of expressions and avoid copies. Based on what I saw in the Armadillo code, Armadillo is not currently able to avoid a copy in this case, so if it is ok with you, I think it would be better to manually write a loop to calculate the sum here.

@rcurtin

rcurtin Jan 21, 2018

Member

Yes, so the documentation is not misleading but it doesn't mention that Armadillo is sometimes able to take advantage of expressions and avoid copies. Based on what I saw in the Armadillo code, Armadillo is not currently able to avoid a copy in this case, so if it is ok with you, I think it would be better to manually write a loop to calculate the sum here.

This comment has been minimized.

@rcurtin

rcurtin Jan 21, 2018

Member

Ah, I see in #1200 that you already wrote the loop. Thanks! :)

@rcurtin

rcurtin Jan 21, 2018

Member

Ah, I see in #1200 that you already wrote the loop. Thanks! :)

output(positive) = 1 - target(positive) - temp;
SoftplusFunction::Deriv(static_cast<arma::mat>(-arma::abs(input(negative))),
temp);
output(negative) = -(target(negative)) - negative;

This comment has been minimized.

@rcurtin

rcurtin Sep 5, 2017

Member

Do you think you could implement this all as a single loop over the target matrix? i.e.

for (size_t i = 0; i < output.n_elem; ++i)
{
  const double val = target[i];
  if (val > 0)
  {
    output[i] = 1 - val - Deriv(-std::abs(input[i]));
  }
  else
  {
    output[i] = -val - Deriv(-std::abs(input[i]));
  }
}

Also, is -(target(negative)) - negative correct, or should that be -(target(negative)) - temp?

@rcurtin

rcurtin Sep 5, 2017

Member

Do you think you could implement this all as a single loop over the target matrix? i.e.

for (size_t i = 0; i < output.n_elem; ++i)
{
  const double val = target[i];
  if (val > 0)
  {
    output[i] = 1 - val - Deriv(-std::abs(input[i]));
  }
  else
  {
    output[i] = -val - Deriv(-std::abs(input[i]));
  }
}

Also, is -(target(negative)) - negative correct, or should that be -(target(negative)) - temp?

This comment has been minimized.

@lozhnikov

lozhnikov Sep 5, 2017

Contributor

I think -(target(negative)) + temp is correct i.e. the sign of the second term was incorrect.
If x < 0 then the derivative of log(1+exp(-abs(x))) is equal to the derivative of log(1 + exp(x)).

@lozhnikov

lozhnikov Sep 5, 2017

Contributor

I think -(target(negative)) + temp is correct i.e. the sign of the second term was incorrect.
If x < 0 then the derivative of log(1+exp(-abs(x))) is equal to the derivative of log(1 + exp(x)).

// Test the Forward function on a user generator input and compare it against
// the manually calculated result.
input1 = arma::mat("0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5");

This comment has been minimized.

@rcurtin

rcurtin Sep 5, 2017

Member

Since the behavior is different depending on whether the input is positive or negative, should you add some tests with negative inputs?

@rcurtin

rcurtin Sep 5, 2017

Member

Since the behavior is different depending on whether the input is positive or negative, should you add some tests with negative inputs?

@rcurtin

This comment has been minimized.

Show comment
Hide comment
@rcurtin

rcurtin Sep 14, 2017

Member

@kris-singh: did you want to handle the issues I pointed out? Once those are done I think that everyone agrees this is ready for merge (@zoq or @lozhnikov correct me if I am wrong there please).

Member

rcurtin commented Sep 14, 2017

@kris-singh: did you want to handle the issues I pointed out? Once those are done I think that everyone agrees this is ready for merge (@zoq or @lozhnikov correct me if I am wrong there please).

@kris-singh

This comment has been minimized.

Show comment
Hide comment
@kris-singh

kris-singh Sep 14, 2017

Contributor
Contributor

kris-singh commented Sep 14, 2017

@rcurtin

This comment has been minimized.

Show comment
Hide comment
@rcurtin

rcurtin Sep 14, 2017

Member

Of course, look when you have a chance. You did a lot of hard work over the summer so I want to make sure we can merge as much as we can. :)

Member

rcurtin commented Sep 14, 2017

Of course, look when you have a chance. You did a lot of hard work over the summer so I want to make sure we can merge as much as we can. :)

@lozhnikov

This comment has been minimized.

Show comment
Hide comment
@lozhnikov

lozhnikov Sep 14, 2017

Contributor

@rcurtin Let me look closely into the test. I don't understand why the test didn't point out this issue (#1104 (comment) #1104 (comment)).

Contributor

lozhnikov commented Sep 14, 2017

@rcurtin Let me look closely into the test. I don't understand why the test didn't point out this issue (#1104 (comment) #1104 (comment)).

@rcurtin

This comment has been minimized.

Show comment
Hide comment
@rcurtin

rcurtin Sep 15, 2017

Member

@lozhnikov: I think that the issue wasn't pointed out by the tests because the Forward() function is only being tested with positive inputs:

+  // Test the Forward function on a user generator input and compare it against
+  // the manually calculated result.
+  input1 = arma::mat("0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5");
Member

rcurtin commented Sep 15, 2017

@lozhnikov: I think that the issue wasn't pointed out by the tests because the Forward() function is only being tested with positive inputs:

+  // Test the Forward function on a user generator input and compare it against
+  // the manually calculated result.
+  input1 = arma::mat("0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5");

@ShikharJ ShikharJ referenced this pull request Jan 17, 2018

Merged

Cross Entropy With Logits #1200

2 of 2 tasks complete
@ShikharJ

This comment has been minimized.

Show comment
Hide comment
@ShikharJ

ShikharJ Feb 1, 2018

Member

@zoq I think this PR can also be closed now. Thanks @kris-singh for your work. This has been incorporated in #1200.

Member

ShikharJ commented Feb 1, 2018

@zoq I think this PR can also be closed now. Thanks @kris-singh for your work. This has been incorporated in #1200.

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Feb 1, 2018

Member

Correct, thanks @kris-singh and @ShikharJ for the great work.

Member

zoq commented Feb 1, 2018

Correct, thanks @kris-singh and @ShikharJ for the great work.

@zoq zoq closed this Feb 1, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment