cauchy_ few fixes (1) check gamma > 0 (2) better dtype error log #93314
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/93314. Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 7370c0a. This comment was automatically generated by Dr. CI and updates every 15 minutes.
The entry point to these functions for both CPU and CUDA is the function called cauchy_. Why don't you put all these checks there? The other common place for CPU and CUDA is cauchy_impl_, so either should work.

Also, in the future, consider using ghstack for submitting these related PRs.
```diff
@@ -322,6 +322,9 @@ Tensor& exponential_impl_(Tensor& self, double lambda, c10::optional<Generator>
 template<template<typename> class cauchy_kernel, typename RNG>
 Tensor& cauchy_impl_(Tensor& self, double median, double sigma, c10::optional<Generator> gen) {
+  // TODO: instead of variable name 'sigma', use 'gamma' or 'scale'
+  // the variance, squared sigma, is undefined for cauchy distribution
+  TORCH_CHECK(sigma > 0.0, "cauchy_ expects sigma > 0.0, but found sigma=", sigma);
```
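The C++ `TORCH_CHECK` above can be mirrored in a few lines of Python. This is an illustrative sketch only; `check_cauchy_args` is a hypothetical helper name, not a real PyTorch function.

```python
# Hypothetical pure-Python sketch of the argument validation this PR adds
# in C++ via TORCH_CHECK; not an actual PyTorch API.
def check_cauchy_args(median: float, sigma: float) -> None:
    # `not sigma > 0.0` (rather than `sigma <= 0.0`) also rejects NaN,
    # matching how a strict positivity check behaves
    if not sigma > 0.0:
        raise RuntimeError(
            f"cauchy_ expects sigma > 0.0, but found sigma={sigma}")
```

Note that the strict `> 0.0` comparison is exactly the behavior under debate in the review thread below.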
Why restrict to sigma > 0 rather than sigma >= 0? I would expect .cauchy_(median, sigma=0) to behave as .fill_(median).
Thanks @fritzo, I have two comments regarding sigma = 0.

(1) lim_{sigma -> 0} F^{-1}(x; sigma, median) = median. This is perfectly sound as sigma -> 0.

(2) Both the PDF and CDF of the Cauchy distribution are not well-defined when sigma = 0.

Considering the two above, sure, we can define sigma = 0 as the behavior at the limit sigma -> 0, but that is not the same as sigma = 0, and the PDF and CDF collapse for sigma = 0.
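The limit in point (1) is easy to see from the quantile function itself. A small sketch of the standard inverse-CDF form (background math, not PyTorch code):

```python
import math

def cauchy_icdf(u: float, median: float, sigma: float) -> float:
    # Quantile function F^{-1}(u; median, sigma) of the Cauchy distribution,
    # for u in (0, 1): median + sigma * tan(pi * (u - 1/2))
    return median + sigma * math.tan(math.pi * (u - 0.5))

# For any fixed u in (0, 1), the quantile approaches `median` as sigma -> 0,
# which is the limiting behavior discussed above.
```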
OTOH, this is a sampler, and the distribution of samples converges. I'd imagine this function would be used for initialization, and it's sure nice if I can set sigma=0 rather than add if ... else logic to decide whether to inject noise.

(I think this is less important, but the CDF does converge pointwise almost everywhere to the step function.)
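The pointwise-convergence claim can be sanity-checked by writing the Cauchy CDF down directly (background math, not PyTorch code):

```python
import math

def cauchy_cdf(x: float, median: float, sigma: float) -> float:
    # F(x; median, sigma) = atan((x - median) / sigma) / pi + 1/2,
    # defined only for sigma > 0
    return math.atan((x - median) / sigma) / math.pi + 0.5

# As sigma -> 0, F(x) tends to 0 below the median and 1 above it,
# i.e. the step function at the median; at x == median it stays 1/2.
```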
> this is a sampler, and the distribution of samples converges

The Cauchy distribution doesn't obey the law of large numbers: the sample mean doesn't converge to a finite number, reflecting that the expected value is undefined for Cauchy; instead it converges trivially to a Cauchy random variable. I don't think it's sound to return .fill_(median) for sigma = 0 and have mean({x_1, x_2, ..., x_n}) = median.

> I'd imagine this function would be used for initialization, and it's sure nice if I can set sigma=0 rather than add if ... else logic to decide whether to inject noise.

I'm not sure which applications popularly use Cauchy random variables, but I would guess many don't soundly set sigma = 0. Intuitively, a distribution of cut distances that are not cut?

More importantly, the derivation of the Cauchy CDF from tan θ = x / b isn't sound for b = 0. What would we return for the CDF?

So I don't think it's sound, intuitively or physically, to define behavior for sigma = 0. What do you think?

[reference] https://mathworld.wolfram.com/CauchyDistribution.html eq. 1-9
https://en.wikipedia.org/wiki/Cauchy_distribution#/media/File:Mean_estimator_consistency.gif
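The sample-mean point can be illustrated with a small pure-Python sampler. This is an illustrative sketch using inverse-transform sampling, not PyTorch's implementation:

```python
import math
import random
import statistics

def sample_cauchy(n: int, median: float = 0.0, sigma: float = 1.0,
                  seed: int = 0) -> list:
    # Inverse-transform sampling: median + sigma * tan(pi * (U - 1/2)),
    # with U ~ Uniform(0, 1)
    rng = random.Random(seed)
    return [median + sigma * math.tan(math.pi * (rng.random() - 0.5))
            for _ in range(n)]

samples = sample_cauchy(100_001)
# The sample median concentrates around the location parameter, while the
# sample mean does not settle down as n grows: it is itself
# Cauchy-distributed, reflecting the undefined expectation noted above.
```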
I guess my main motivation is to continue satisfying an equation like

x.cauchy_(median, sigma) == x.cauchy_().mul_(sigma).add_(median)  # in distribution

for any median and sigma >= 0, as the current code satisfies. I'm worried if we add a check for sigma > 0, then we'll bloat user code like this:

- x.cauchy_(median, sigma)
+ if sigma > 0:
+     x.cauchy_(median, sigma)
+ else:
+     x.zero_()

but maybe that's ok 🤔

I guess in the bigger picture, this is a sampler and not a Distribution object, so I'd think it's less important to talk about the CDF or central limit theorem. I do agree with you that torch.distributions.Cauchy should restrict scale > 0, since that Distribution object should support methods like .log_prob() that are undefined at scale == 0. Maybe as a policy we could require stricter checks in torch.distributions wrappers and use looser checks in .{distribution}_() methods?
> but maybe that's ok 🤔

Thanks @fritzo, I think this is OK : ). At the end of the day, whether torch.Tensor.cauchy_ is a sampler or torch.distributions.Cauchy is a distribution, the generated random variable should follow the Cauchy distribution.

If X follows the standard Cauchy distribution, median is a real number, and sigma > 0, then Y = median + sigma * X follows the Cauchy distribution. And we won't guarantee any behavior for parameterizations outside that range. What do you think?
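The location-scale property stated here is easy to check empirically in pure Python. An illustrative sketch (`standard_cauchy` and `quantile` are hypothetical helpers, not PyTorch APIs), using the fact that the quartiles of Cauchy(median, sigma) are median ∓ sigma:

```python
import math
import random

def standard_cauchy(n: int, seed: int = 0) -> list:
    # Inverse-transform sampling of the standard Cauchy distribution
    rng = random.Random(seed)
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

def quantile(xs, q: float) -> float:
    # Crude nearest-rank empirical quantile; fine for large samples
    s = sorted(xs)
    return s[int(q * (len(s) - 1))]

# Y = median + sigma * X should follow Cauchy(median, sigma):
x = standard_cauchy(100_001)
y = [3.0 + 2.0 * v for v in x]  # median = 3, sigma = 2
# the quartiles of Cauchy(3, 2) are 3 - 2 = 1 and 3 + 2 = 5,
# and the empirical quartiles of y should land near those values
```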
Sounds good to me. Thanks for caring about design choices, I think it's worth the debate 😄
haha, thanks @fritzo, enjoyed our discussion/debate : )
@jgong5, accidentally clicked re-request from you. Please feel free to ignore.
@pytorchbot merge
Merge failed. Reason: Approval needed from one of the following (Rule 'superuser'). Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Related #92047

(1) torch.Tensor.cauchy_ is missing a check for gamma > 0 (torch.distributions.cauchy.Cauchy correctly checks gamma > 0).
(2) Add a better error log on dtype, similar to exponential_.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10