
Logcumsumexp for CPU #93153

Closed · wants to merge 4 commits

Conversation

@mfkasim1 (Contributor) commented Jan 27, 2023

Partial work from #90847, in the direction of solving #89205.
Most of the content is from #90847, but this one is CPU-only, so hopefully it does not increase the build time by much.

tag: @albanD, @malfet

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
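
For context, `logcumsumexp` computes `out[i] = log(sum_{j <= i} exp(x[j]))` along a dimension. Below is a minimal, real-valued C++ sketch of the numerically stable accumulation; the helper names are made up for illustration and this is not the kernel code added by this PR (the complex case handled here additionally has to deal with NaN and infinite real parts, which is what the `_logcumsumexp_minmax` helper discussed in the review thread below is about):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Numerically stable logaddexp: log(exp(a) + exp(b)) without overflow.
double logaddexp(double a, double b) {
  if (std::isinf(a) && a == b) {
    return a;  // both +inf or both -inf; avoids inf - inf below
  }
  double m = std::max(a, b);
  return m + std::log1p(std::exp(std::min(a, b) - m));
}

// logcumsumexp over a 1-D array: out[i] = log(exp(x[0]) + ... + exp(x[i])).
std::vector<double> logcumsumexp_1d(const std::vector<double>& x) {
  std::vector<double> out(x.size());
  double acc = -std::numeric_limits<double>::infinity();
  for (std::size_t i = 0; i < x.size(); ++i) {
    acc = logaddexp(acc, x[i]);
    out[i] = acc;
  }
  return out;
}
```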

@pytorch-bot (bot) commented Jan 27, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/93153

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e787da3:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions github-actions bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Jan 27, 2023
Review thread on aten/src/ATen/native/cpu/ReduceOpsKernel.cpp:
c10::complex<scalar_t> _logcumsumexp_minmax(c10::complex<scalar_t> x, c10::complex<scalar_t> y, bool min) {
  scalar_t xr = std::real(x);
  scalar_t yr = std::real(y);
  if (std::isnan(yr) || (std::isnan(std::imag(y)))) {
Contributor:
I'm not an expert in the complex primitives that are already in c10, but wouldn't it be nicer to implement something like

namespace std {
template <typename scalar_t>
bool isnan(const c10::complex<scalar_t>& v) {
  return std::isnan(v.real()) || std::isnan(v.imag());
}
} // namespace std

and then use the following throughout the codebase:

Suggested change:
-if (std::isnan(yr) || (std::isnan(std::imag(y)))) {
+if (std::isnan(y)) {

@mfkasim1 (Contributor, Author):

I'm not an expert in c10 either, and I'm not sure if this function is already available. It's better to discuss it with the c10 experts.

@malfet (Contributor) commented Jan 27, 2023:

I see no harm in implementing such a template in https://github.com/pytorch/pytorch/blob/master/c10/util/complex_utils.h. But at the very least, create a convenience template in this code and use it instead of the explicit isnan(x.real()) || isnan(x.imag()).

@mfkasim1 (Contributor, Author):

I implemented it in the newest commit. I was hesitant at first because there is no agreed-upon behaviour of std::isnan for complex numbers, but torch.isnan defines a complex number to be NaN if either its real or imaginary part is NaN.
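
For reference, a minimal sketch of what such a convenience helper and its use could look like. The name `isnan_complex` is hypothetical; the actual overload added in c10/util/complex_utils.h may be named and placed differently:

```cpp
#include <cmath>
#include <c10/util/complex.h>

// Sketch only: follow torch.isnan semantics -- a complex value is NaN
// if either its real or its imaginary part is NaN.
template <typename T>
bool isnan_complex(const c10::complex<T>& v) {
  return std::isnan(v.real()) || std::isnan(v.imag());
}

// The check in the reviewed snippet would then reduce to:
//   if (isnan_complex(y)) { ... }
```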

Contributor:

I was tempted to suggest using at::_isnan, but the dispatch complexity would probably be overkill. Thank you for addressing the suggested change. Please fix the lint, though.

Contributor:

Pushed the change fixing the lint. If it compiles, we should ship it.

Further review threads on aten/src/ATen/native/cpu/ReduceOpsKernel.cpp and c10/util/complex_utils.h (outdated, resolved).
@Skylion007 (Collaborator) left a comment:

I hope we can add this to CUDA too with the help of Jiterator or such.

@Skylion007 Skylion007 added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 27, 2023
@mfkasim1 (Contributor, Author):

@pytorchbot merge

@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@mfkasim1 (Contributor, Author):

I'd like to start a discussion about jiterator for logcumsumexp here. I've spent some time looking at the code that uses jiterator and at related functions such as cumprod, cumsum, etc., and here are my observations:

  1. None of the accumulating functions (cumprod, cumsum, etc.) use jiterator. They use the standard dispatcher and the scan_dim function.
  2. It seems that binary and unary functions use jitted_gpu_kernel and reduce functions use jitted_gpu_reduce_kernel, but I can't find anything like jitted_gpu_accum_kernel or jitted_gpu_scan_kernel.

Does that mean that, in order to use jiterator for logcumsumexp, we would need to write a jitted_gpu_accum_kernel?

@ngimel (Collaborator) commented Jan 30, 2023:

Yes, that's correct. Also, given that scan_dim uses cub underneath, to be honest I don't know how feasible it is to write jitted_gpu_accum_kernel, given that nvrtc needs all the headers at runtime and we don't ship the cub headers.
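
For readers unfamiliar with scan kernels: a scan (which is roughly what scan_dim provides) applies an associative binary operation cumulatively along a dimension. A CPU-side illustration using std::inclusive_scan with the logaddexp operation as the binary op — this is a sketch of the concept only, not PyTorch's scan_dim or a jiterator kernel:

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
  std::vector<double> x = {0.5, -1.0, 2.0, 0.0};
  std::vector<double> out(x.size());
  // Inclusive scan with a stable logaddexp as the binary op:
  // out[i] = log(exp(x[0]) + ... + exp(x[i])).
  std::inclusive_scan(x.begin(), x.end(), out.begin(), [](double a, double b) {
    double m = std::max(a, b);
    return m + std::log1p(std::exp(std::min(a, b) - m));
  });
  for (double v : out) {
    std::cout << v << ' ';
  }
  std::cout << '\n';
  return 0;
}
```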

pytorchmergebot pushed a commit that referenced this pull request Feb 13, 2023
Hopefully fixes #89205.
This is another version of #90847, which was reverted because it increased the compile time significantly.
From my discussion with @ngimel in #93153 (comment), it seems the jiterator option would be very tricky, if not impossible.
So what I did instead was to optimize the compile time on my computer.

To optimize the build time, I first compiled PyTorch as a whole, then changed only the `LogcumsumexpKernel.cu` file to see how it affects the compile time.
Here are my results for the compilation time of only the `LogcumsumexpKernel.cu` file on my computer:

- Original version (without any complex implementations): 56 s (about 1 minute)
- The previous PR (#90847): 13 min 57 s (about 14 minutes)
- This PR: 3 min 35 s (about 3.5 minutes)

If the previous PR increased the build time by 30 minutes on PyTorch's build machines, then this PR reduces that increase to about 6 minutes. Hopefully this is an acceptable level of build-time increase.

What I did (sorted from the most to the least significant reduction in build time):

- Substituting `log(x)` with `log1p(x - 1)`. This is applied in the infinite case, so we don't really care about precision there.
- Implementing the complex exponential manually (see the sketch below).

tag: @malfet, @albanD
Pull Request resolved: #94310
Approved by: https://github.com/Skylion007, https://github.com/malfet
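
As a side note on the second optimization in the commit message above: a manual complex exponential relies on the identity exp(a + bi) = e^a (cos b + i sin b). A minimal illustrative sketch with a hypothetical helper name, not the actual kernel code from #94310:

```cpp
#include <cmath>
#include <c10/util/complex.h>

// Illustrative only: exp(z) for a c10::complex value via
// exp(a + bi) = e^a * (cos b + i sin b).
template <typename T>
c10::complex<T> exp_manual(const c10::complex<T>& z) {
  T e = std::exp(z.real());
  return c10::complex<T>(e * std::cos(z.imag()), e * std::sin(z.imag()));
}
```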
pytorchmergebot pushed a commit that referenced this pull request Mar 1, 2023
Continuation of PR #93153, where I implemented logaddexp for complex numbers but didn't expose it through `torch.logaddexp`. This PR exposes the complex logaddexp through `torch.logaddexp`.

Pull Request resolved: #95717
Approved by: https://github.com/lezcano
Labels: ciflow/trunk, Merged, module: cpu, open source · 6 participants