New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
normal_lcdf and normal_lccdf give infinite gradients for infinite inputs. #2881
Comments
Thank you for opening this issue, @jsocolar. |
@jsocolar, thanks for bringing this up. I'm putting up a minimal example in C++ to walk through it. If you pop this into a file called #include <stan/math.hpp>
#include <gtest/gtest.h>
TEST(normal_lcdf, inf_y) {
double y = stan::math::INFTY;
stan::math::var mu = 0.0;
stan::math::var sigma = 1.0;
stan::math::var target = stan::math::normal_lcdf(y, mu, sigma);
EXPECT_FLOAT_EQ(0.0, target.val());
target.grad();
EXPECT_FLOAT_EQ(0.0, mu.adj());
EXPECT_TRUE(isnan(sigma.adj()));
stan::math::recover_memory();
}
Just curious... how'd you come to this conclusion? And which gradient did you think was "infinite"? (The gradient has two elements.) It'd be good to know how you're thinking about what the math library does and how it's generating gradients, especially at the boundary conditions. To get into the example, if you look at the test:
( I think you're expecting I didn't trace through to figure out why it's having trouble computing that term, but I'm sure it could be addressed. I'd lean towards throwing an exception at the boundaries, but I think that would change the behavior of Stan in a way that we'd have to have a larger discussion to implement. Thoughts? |
@syclik Thanks for taking a look!
I concluded that the gradient should be zero because that's the well defined limit of the gradient with respect to both
and getting back a bunch of
Based on your follow-up, I assume that the problem is the partial wrt sigma, that the The use case for returning zero here is given in the discourse thread linked from the OP. It shouldn't be particularly burdensome to code around the current behavior, which I agree isn't necessarily wrong, but if there's appetite on the Stan side for returning zero for the partial wrt sigma then that'd also solve the original discourse issue. |
Description
Consider a program:
When
y
is infinite (and given thatmu
andsigma
are finite, here ensured by declaring them as parameters), then the gradient is zero. However, it seems that Stan yields infinite gradients here.This came up "in the wild" here:
https://discourse.mc-stan.org/t/conditional-truncation-including-inf/30627
If possible, it would be nice to special-case infinite
y
and return the correct gradients. However, there's some question beyond my expertise of whether this special-casing would play nicely with vectorization in opencl.The text was updated successfully, but these errors were encountered: