
Improve torch.fft n-dimensional transforms #46911

Closed
wants to merge 19 commits

Conversation

@peterbell10 (Collaborator) commented Oct 27, 2020

Stack from ghstack:

Differential Revision: D25420647

@dr-ci (bot) commented Oct 27, 2020

💊 CI failures summary and remediations

As of commit e42964b (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@peterbell10 (Collaborator Author)

This replaces the use of _fft_with_size in torch.fft with three new operators: _fft_c2c, _fft_r2c, and _fft_c2r. These are separated so you can't pass invalid parameter combinations (e.g. complex_input=True, complex_output=True is possible with _fft_with_size but is meaningless).

The new operators can transform arbitrary dims and almost never make an extra tensor copy, because the backends have been reworked to make fewer assumptions about the format of the input. They also work directly with complex dtypes instead of going through view_as_real as a prior step.

I can't remove _fft_with_size yet because (as far as I know) torch.complex32 isn't fully working yet, so the real version is the only way to access cuFFT's half-precision FFTs.
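
As a quick illustration of the mapping at the Python API level (a sketch of which primitive each public call lowers to, not code from this diff):

```python
import torch

x = torch.randn(4, 5, 6)

# Real-to-complex over arbitrary dims: handled by _fft_r2c.
X = torch.fft.rfftn(x, dim=(0, 2))

# Complex-to-complex directly on a complex dtype, with no
# view_as_real round-trip: handled by _fft_c2c.
y = torch.fft.fftn(x.to(torch.complex64), dim=(0, 2))

# Complex-to-real inverse; s recovers the original length of the
# halved last transformed dim: handled by _fft_c2r.
x_back = torch.fft.irfftn(X, s=(4, 6), dim=(0, 2))
assert torch.allclose(x, x_back, atol=1e-5)
```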

@facebook-github-bot (Contributor)

Hi @peterbell10!

Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but we do not have a signature on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@mruberry (Collaborator) commented Nov 2, 2020

> I can't remove _fft_with_size yet because (as far as I know) torch.complex32 isn't fully working yet, so the real version is the only way to access cuFFT's half-precision FFTs.

Do we really support torch.complex32, though? If it's simpler not to, then I'd rather we not bother.

@@ -370,44 +300,36 @@ Tensor fft_ifftn(const Tensor& self, c10::optional<IntArrayRef> s,

 Tensor fft_rfftn(const Tensor& self, c10::optional<IntArrayRef> s,
                  c10::optional<IntArrayRef> dim,
-                 c10::optional<std::string> norm) {
+                 c10::optional<std::string> norm_str) {
   TORCH_CHECK(!self.is_complex(), "Expected a real input tensor to rfftn");
Collaborator:

"Expected a real input tensor to rfftn" -> "rfftn expects a real-valued input tensor, but got {dtype}!"

@peterbell10 marked this pull request as ready for review November 27, 2020 16:25
@peterbell10 (Collaborator Author)

I'm happy with this PR, so I've marked it ready for review. The benchmark PR review seemed to stall, but I think @robieta was okay with the benchmark itself and the discussion was more around where to put the files. So I think the results can be trusted even though it's not merged yet.

I have, however, marked #46912 as draft and moved it to the end of the stack. I wasn't seeing a consistent improvement, and in some cases saw a huge performance regression, so it might need to be reconsidered. The other two PRs in this stack should be good, though.

@mruberry (Collaborator)

> I'm happy with this PR, so I've marked it ready for review. The benchmark PR review seemed to stall, but I think @robieta was okay with the benchmark itself and the discussion was more around where to put the files. So I think the results can be trusted even though it's not merged yet.

Sounds good. What did you decide to do about complex half support and _fft_with_size?

> I have, however, marked #46912 as draft and moved it to the end of the stack. I wasn't seeing a consistent improvement, and in some cases saw a huge performance regression, so it might need to be reconsidered. The other two PRs in this stack should be good, though.

OK. Let's discuss those after this PR. Would it be helpful to take a break from this stack and refocus on tasks preparing torch.fft for 1.8? That is, removing the torch.fft function and importing the torch.fft module by default (and updating the documentation to reflect this change). That should be a simple change, but we should make it sooner rather than later to see if there are issues with the deprecation.

@peterbell10 (Collaborator Author)

> What did you decide to do about complex half support and _fft_with_size?

I don't think it's blocking for this PR. I have no idea if it's widely used, and it's certainly not needed for numpy/scipy compatibility. So, just dropping support might be okay. If not, then I think this code should already call cuFFT correctly for complex half, and the main issue for support is the other aten operators that get called during the FFT. For example, rfft(x) works if I remove the TORCH_CHECK(dtype == kFloat || dtype == kDouble, ...) but rfftn(x) fails with:

RuntimeError: "copy_" not implemented for 'ComplexHalf'
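
For context, that experiment reads roughly like this (a hypothetical sketch: it assumes a CUDA build with the float/double TORCH_CHECK relaxed, which stock builds don't have):

```python
import torch

x = torch.randn(4, 4, device="cuda", dtype=torch.half)

# 1-D path: reaches cuFFT's half-precision kernels directly once the
# dtype check is removed, so this succeeds.
torch.fft.rfft(x)

# n-D path: the post-processing calls aten ops that lack ComplexHalf
# kernels, so this raises:
# RuntimeError: "copy_" not implemented for 'ComplexHalf'
torch.fft.rfftn(x)
```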

> Would it be helpful to take a break from this stack and refocus on tasks preparing torch.fft for 1.8? That is, removing the torch.fft function and importing the torch.fft module by default (and updating the documentation to reflect this change). That should be a simple change, but we should make it sooner rather than later to see if there are issues with the deprecation.

Sure thing, I can start work on that tomorrow.

@mruberry (Collaborator)

> I don't think it's blocking for this PR. I have no idea if it's widely used, and it's certainly not needed for numpy/scipy compatibility. So, just dropping support might be okay.

Let's protect ourselves by disabling any complex half support we currently have and simplifying the code appropriately. If we decide to support complex half in the future, we should do a holistic review.

With complex half unsupported, would there be any changes you'd like to make in this PR still? Earlier you suggested that not supporting complex half would let this PR remove _fft_with_size?

@peterbell10 (Collaborator Author)

> With complex half unsupported, would there be any changes you'd like to make in this PR still? Earlier you suggested that not supporting complex half would let this PR remove _fft_with_size?

Ignoring half issues, _fft_with_size could be replaced by the new operators. However, if we're good to remove torch.{fft,rfft,ifft,irfft} now then it makes more sense to remove _fft_with_size in the same PR, so _fft_with_size wouldn't have any callers to rewrite.

@mruberry (Collaborator)

> Ignoring half issues, _fft_with_size could be replaced by the new operators. However, if we're good to remove torch.{fft,rfft,ifft,irfft} now then it makes more sense to remove _fft_with_size in the same PR, so _fft_with_size wouldn't have any callers to rewrite.

That makes sense. So you're saying let's keep this as-is, and then in the "1.8 readiness" PR described above we remove these functions and _fft_with_size? Sounds like a plan to me.

@mruberry (Collaborator) commented Dec 2, 2020

ASAN failure looks real:

Nov 27 18:57:01   test_fft_backward_cpu_float64 (__main__.TestFFTCPU) ... /var/lib/jenkins/workspace/aten/src/ATen/native/mkl/SpectralOps.cpp:88:19: runtime error: addition of unsigned offset to 0x619001e07ca0 overflowed to 0x619001e07c90

@peterbell10 (Collaborator Author)

Very weird ASAN error, since all the stride calculation there is done in signed arithmetic specifically to allow negative stride offsets. I've just added __ubsan_ignore_undefined__ and the false positive is silenced. @mruberry PTAL.

@@ -61,7 +73,7 @@ void _fft_fill_with_conjugate_symmetry_slice(
   // We explicitly loop over one row, then use this lambda to iterate over
   // n-dimensions. This advances iter_index by one row, while updating in_ptr
   // and out_ptr to point to the new row of data.
-  auto advance_index = [&] {
+  auto advance_index = [&] () __ubsan_ignore_undefined__ {
Collaborator:

This shouldn't exhibit undefined behavior. Let's investigate some more. Maybe start by typing the loop variable correctly as iter_index's size type. The loop is currently doing an unsigned vs signed comparison.

Collaborator Author:

> This shouldn't exhibit undefined behavior.

It doesn't; UBSAN just doesn't seem to like this line:

out_ptr += out_strides[i];

because out_strides can be negative. It seems to be treating the offset as if it were unsigned for some reason.

> Maybe start by typing the loop variable correctly as iter_index's size type.

DimVector::size_type is just size_t, which is already the loop variable's type.

Collaborator:

My mistake, I thought it was int64_t.

Collaborator Author:

Either this is a false positive, or the error message is just misleading. It specifically says unsigned offset, but IntArrayRef is int64_t, so out_strides is definitely a signed offset.

Collaborator:

Let's assume the error message is misleading, but there's a valid UBSAN issue.

Could this be complaining, for example, that

out_ptr += (signal_half_sizes[i] - 1) * out_strides[i];

is upsetting the sanitizer because it's applying a negative offset to out_ptr?

The message states that address 0x619001e07ca0 "overflows" to 0x619001e07c90. Let's focus on the last two digits, where the values differ: 0xa0 = 160 and 0x90 = 144, so the final address is 16 bytes lower. So maybe the sanitizer thinks there was an overflow because it doesn't expect a negative offset to be added (+=) to the pointer to arrive at that lower address?

Collaborator Author:

Yes, that's what I've been saying. The sanitizer doesn't seem to like that the calculated offset is negative. However, it's entirely expected to be negative because it's using negative strides. And as far as I'm aware, there's no undefined behaviour there.
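
For readers unfamiliar with negative strides, here is a standalone illustration of the access pattern under discussion (in NumPy, since PyTorch tensors don't expose negative strides at the Python level):

```python
import numpy as np

a = np.arange(8, dtype=np.float32)
rev = a[::-1]          # a reversed view over the same buffer
print(rev.strides)     # (-4,): each step moves the data pointer back 4 bytes

# At the C level, iterating over rev is exactly `ptr += negative_stride`;
# every address visited stays inside a's allocation, so nothing overflows.
# The pointer just walks backwards, in bounds.
print(rev[0], rev[7])  # prints: 7.0 0.0
```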

  }

  const auto value_type = c10::toValueType(input.scalar_type());
  out.resize_(batched_out_sizes, MemoryFormat::Contiguous);
Collaborator:

This is OK for now since the fft operations don't take out=, but in the future this will need to be updated to use resize_output().

Collaborator Author:

This won't actually match the expected output shape, though. I'm not sure the cuFFT backend could implement an out argument much better than just copying into it at the end.

Collaborator:

Interesting. Luckily a problem for tomorrow ;)

@mruberry self-requested a review December 8, 2020 08:32
@mruberry (Collaborator) left a review:

Thanks @peterbell10! Just one small nit on the formatting of a string. Let me know when this is ready to be merged.

@peterbell10 (Collaborator Author)

@mruberry the rfftn error message has been updated.

@facebook-github-bot (Contributor)

@mruberry merged this pull request in fc0a3a1.

@facebook-github-bot deleted the gh/peterbell10/22/head branch December 13, 2020 15:17
peterbell10 added a commit to peterbell10/pytorch that referenced this pull request Dec 15, 2020
ghstack-source-id: 6da7a2d097a4b5b8dc1b4dd709b945f62302cd8f
Pull Request resolved: pytorch#46911
@peterbell10 mentioned this pull request Dec 16, 2020