Logcumsumexp for CPU #93153
Conversation
🔗 Helpful links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/93153
Note: links to docs will display an error until the docs builds have completed.
✅ No failures as of commit e787da3. This comment was automatically generated by Dr. CI and updates every 15 minutes.
```cpp
c10::complex<scalar_t> _logcumsumexp_minmax(c10::complex<scalar_t> x, c10::complex<scalar_t> y, bool min) {
  scalar_t xr = std::real(x);
  scalar_t yr = std::real(y);
  if (std::isnan(yr) || (std::isnan(std::imag(y)))) {
```
I'm not an expert in the complex primitives that are already in c10, but wouldn't it be nicer to implement something like

```cpp
namespace std {
template <typename scalar_t>
bool isnan(const c10::complex<scalar_t>& v) {
  return std::isnan(v.real()) || std::isnan(v.imag());
}
} // namespace std
```

and then use it throughout the codebase:

```diff
- if (std::isnan(yr) || (std::isnan(std::imag(y)))) {
+ if (std::isnan(y)) {
```
I'm not an expert in c10 either, and I'm not sure whether this function is already available. It's better to discuss it with the c10 experts.
I see no harm in implementing such a template in https://github.com/pytorch/pytorch/blob/master/c10/util/complex_utils.h.
But at the very least, create a convenience template in this code and use it instead of the explicit `isnan(x.real()) || isnan(x.imag())`.
I implemented it in the newest commit. I was hesitant at first because there is no agreed-upon behaviour for `std::isnan`
on complex numbers, but `torch.isnan`
defines a complex number to be NaN if either its real or imaginary part is NaN.
I was tempted to suggest using `at::_isnan`,
but the dispatch complexity would probably be overkill. Thank you for addressing the suggested change. Please fix the lint, though.
Pushed the lint fix. If it compiles, we should ship it.
I hope we can add this to CUDA too, with the help of jiterator or something similar.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Requesting permission to start a discussion about jiterator for logcumsumexp here. I've spent some time looking at the code that uses jiterator and related functions such as cumprod, cumsum, etc., and here are my observations:
Does it mean that in order to use jiterator for logcumsumexp, we need to write the
Yes, that's correct, and also, given that
Hopefully fixes #89205. This is another version of #90847, which was reverted because it increased the compile time significantly. From my discussion with @ngimel in #93153 (comment), it seems the jiterator option would be very tricky, if not impossible. So what I did was to optimize the compile time on my machine.

To measure the build time, I first compiled PyTorch as a whole, then changed only the `LogcumsumexpKernel.cu` file to see how it affects the compile time. Here are my results for the compilation time of only the `LogcumsumexpKernel.cu` file on my machine:

- Original version (without any complex implementations): 56 s (about 1 minute)
- The previous PR (#90847): 13 m 57 s (about 14 minutes)
- This PR: 3 m 35 s (about 3.5 minutes)

If the previous PR increased the build time by 30 minutes on PyTorch's machines, then this PR reduces the build-time increase to about 6 minutes. Hopefully this is an acceptable level of build-time increase.

What I did (sorted from the most significant reduction in build time to the least):

- Substituting `log(x)` with `log1p(x - 1)`. This is applied in the infinite case, so we don't really care about precision.
- Implementing the complex exponential manually.

tag: @malfet, @albanD
Pull Request resolved: #94310
Approved by: https://github.com/Skylion007, https://github.com/malfet
Continuation of PR #93153, where I implemented logaddexp for complex but didn't expose it to `torch.logaddexp`. This PR exposes the complex logaddexp through `torch.logaddexp`.
Pull Request resolved: #95717
Approved by: https://github.com/lezcano
Partial work from #90847, in the direction of solving #89205.
Most of the content is from #90847, but this is only for CPU, so hopefully it does not increase the build time by a lot.
tag: @albanD, @malfet
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10