Add torch.nansum
#38628
Conversation
💊 CI failures summary (as of commit c325b99): 2 ongoing upstream failures on ci.pytorch.org, probably caused by upstream breakages that are not fixed yet. (Automated comment from Dr. CI.)
Force-pushed from 1eb3021 to 292325f
Hey @kshitij12345! Let me know when you're ready for a review.
@mruberry Sure. Sorry, I forgot to tag this as [WIP]. I'll ping you once it's ready or if I have any doubts.
It is ready for review now. Please review :)
Have addressed the comments. Please review :) Thanks!
@mruberry Gentle ping :)
Was just about to update this! It's going to take me a few days to get to because I have to go through the sum/prod changes extremely carefully.
Sure. Thanks! Actually I am a bit sceptical about … Also, would it be okay if I ping you on Slack about a doubt on another PR?
Yes of course. |
This is still on my radar and I should get to it very soon. |
typename out_t = scalar_t>
typename OpFunctor,
typename GeneralDispatcher>
static void reduce_dispatch(TensorIterator& iter, GeneralDispatcher op) {
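For readers without the full diff, the design being reviewed can be sketched roughly as follows. This is a hypothetical Python analogue, not the actual ATen C++ template code: each op builds a small callable "functor" object and hands it to one shared `reduce_dispatch` that owns the common logic.

```python
import math

# Hypothetical analogue of the functor-struct design under review.
class SumFunctor:
    def __call__(self, xs):
        return sum(xs)

class NanSumFunctor:
    def __call__(self, xs):
        # nansum semantics: NaN elements are skipped (treated as zero)
        return sum(x for x in xs if not math.isnan(x))

def reduce_dispatch(xs, op_functor):
    # The common part (standing in for dtype checks and dispatch) lives
    # here exactly once; the op-specific reduction is delegated.
    xs = [float(x) for x in xs]
    return op_functor(xs)

print(reduce_dispatch([1, 2, 3], SumFunctor()))               # 6.0
print(reduce_dispatch([1.0, float("nan")], NanSumFunctor()))  # 1.0
```

The performance question raised above is whether constructing such functor objects on every call costs anything; in C++ a trivially constructible struct with an inlined call operator generally compiles away.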
The code here looks elegant, but is there a performance impact on the existing functions, like sum, by building the callable structs each time this function is called?
Follow-up question from looking at the templates: the functions of interest all have the same signature, right?
Could this be written, for example, as one function for each op that calls a helper that handles the common part (lines 54-67 below) and then implements the op-specific dispatch? Further, can the helper avoid being a function template by using the common signature of these functions to specify its function pointer argument?
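The alternative the reviewer is proposing can be sketched like this (again a hypothetical Python illustration, not the actual C++): the shared helper takes a plain callable with one fixed signature instead of being a template, and each op gets its own thin entry point.

```python
from typing import Callable, List

# Hypothetical sketch of the suggested refactor: a non-template helper
# whose argument type is pinned to the ops' common signature.
def _reduce_common(xs: List[float],
                   reduce_fn: Callable[[List[float]], float]) -> float:
    xs = [float(x) for x in xs]  # the shared setup lives here exactly once
    return reduce_fn(xs)

def sum_op(xs):
    # op-specific entry point; only the reduction differs
    return _reduce_common(xs, sum)

def prod_op(xs):
    def prod(v):
        out = 1.0
        for x in v:
            out *= x
        return out
    return _reduce_common(xs, prod)

print(sum_op([1, 2, 3]))   # 6.0
print(prod_op([2, 3, 4]))  # 24.0
```

In C++ terms, pinning the helper's parameter to a function-pointer type of the common signature would avoid instantiating the helper once per op.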
I think it should not be an issue, as the struct's methods will get inlined and construction of the struct is trivial.
Reference: https://stackoverflow.com/a/18753022/5602957
Also tried simulating similar code on Compiler Explorer:
https://godbolt.org/z/7WTWhe
The compiler is able to deduce the final value, so I don't think this structure should hinder compiler optimizations.
Follow-up question from looking at the templates: the functions of interest all have the same signature, right?
Could this be written, for example, as one function for each op that calls a helper that handles the common part (lines 54-67 below) and then implements the op-specific dispatch? Further, can the helper avoid being a function template by using the common signature of these functions to specify its function pointer argument?
Slightly confused, could you please give some sample code?
Thanks for looking into it.
Thanks for investigating. I think that addresses my concern so this should be fine.
Hey @kshitij12345, thank you for being so patient. I took a close look and overall things look very good. I have a question about the organization of the common function to call prod, sum, and nansum. The code is very elegant, but I'm a little concerned about its effect on performance and wonder if its template logic can be further simplified. I look forward to hearing your thoughts! I should be much more responsive now, so we'll get this in quickly!
* add with_extremal for general case. * minor refactor of test code.
Test looks great, thanks @kshitij12345!
Nice work, @kshitij12345! This is the first "nan*" function in PyTorch!
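As a minimal model of what this new function computes (matching numpy.nansum's convention, which this PR follows): NaN elements are treated as zero, so an all-NaN input sums to zero rather than NaN. A pure-Python sketch:

```python
import math

# Minimal model of nansum semantics: skip NaN elements when summing.
def nansum(xs):
    return sum(x for x in xs if not math.isnan(x))

nan = float("nan")
print(nansum([1.0, nan, 2.0]))  # 3.0
print(nansum([nan, nan]))       # 0
```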
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Update: this triggered some internal perf warnings. Rerunning some tests now to verify.
Tests came back negative again. Going to try one more time. We may need to refactor this.
Thanks for the heads-up. Let me know if there are any changes needed :)
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Follow-up perf runs suggest the initial failure was flakiness in the benchmark. Initiating land process.
Reference: #38349