Add torch.nansum
#38628
Conversation
💊 CI failures summary (as of commit c325b99): 2 ongoing upstream failures on ci.pytorch.org, probably caused by upstream breakages that are not fixed yet. (Automated comment from Dr. CI.)
Force-pushed from 1eb3021 to 292325f
Hey @kshitij12345! Let me know when you're ready for a review.
@mruberry Sure. Sorry, I forgot to tag this as [WIP]. I'll ping you once it's ready or if I have any doubts.
It is ready for review now. Please review :)
Have addressed the comments. Please review :) Thanks!
@mruberry Gentle ping :)
Was just about to update this! It's going to take me a few days to get to because I have to go through the sum/prod changes extremely carefully.
Sure. Thanks! Actually I am a bit sceptical about … Also, would it be okay if I ping you on Slack about a doubt on another PR?
Yes of course. |
This is still on my radar and I should get to it very soon. |
typename out_t = scalar_t>
typename OpFunctor,
typename GeneralDispatcher>
static void reduce_dispatch(TensorIterator& iter, GeneralDispatcher op) {
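For readers without the full diff, the design being reviewed can be sketched roughly as follows. This is a hypothetical Python analogue, not the actual ATen C++ template code: each op builds a small callable "functor" object and hands it to one shared `reduce_dispatch` that owns the common logic.

```python
import math

# Hypothetical analogue of the functor-struct design under review.
class SumFunctor:
    def __call__(self, xs):
        return sum(xs)

class NanSumFunctor:
    def __call__(self, xs):
        # nansum semantics: NaN elements are skipped (treated as zero)
        return sum(x for x in xs if not math.isnan(x))

def reduce_dispatch(xs, op_functor):
    # The common part (standing in for dtype checks and dispatch) lives
    # here exactly once; the op-specific reduction is delegated.
    xs = [float(x) for x in xs]
    return op_functor(xs)

print(reduce_dispatch([1, 2, 3], SumFunctor()))               # 6.0
print(reduce_dispatch([1.0, float("nan")], NanSumFunctor()))  # 1.0
```

The performance question raised above is whether constructing such functor objects on every call costs anything; in C++ a trivially constructible struct with an inlined call operator generally compiles away.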
The code here looks elegant, but is there a performance impact on the existing functions, like sum, by building the callable structs each time this function is called?
Follow-up question from looking at the templates: the functions of interest all have the same signature, right?
Could this be written, for example, as one function for each op that calls a helper that handles the common part (lines 54-67 below) and then implements the op-specific dispatch? Further, can the helper avoid being a function template by using the common signature of these functions to specify its function pointer argument?
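The alternative the reviewer is proposing can be sketched like this (again a hypothetical Python illustration, not the actual C++): the shared helper takes a plain callable with one fixed signature instead of being a template, and each op gets its own thin entry point.

```python
from typing import Callable, List

# Hypothetical sketch of the suggested refactor: a non-template helper
# whose argument type is pinned to the ops' common signature.
def _reduce_common(xs: List[float],
                   reduce_fn: Callable[[List[float]], float]) -> float:
    xs = [float(x) for x in xs]  # the shared setup lives here exactly once
    return reduce_fn(xs)

def sum_op(xs):
    # op-specific entry point; only the reduction differs
    return _reduce_common(xs, sum)

def prod_op(xs):
    def prod(v):
        out = 1.0
        for x in v:
            out *= x
        return out
    return _reduce_common(xs, prod)

print(sum_op([1, 2, 3]))   # 6.0
print(prod_op([2, 3, 4]))  # 24.0
```

In C++ terms, pinning the helper's parameter to a function-pointer type of the common signature would avoid instantiating the helper once per op.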
I think it should not be an issue, as the struct's methods will get inlined and construction of the struct is trivial.
Reference: https://stackoverflow.com/a/18753022/5602957
Also tried simulating similar code on Compiler Explorer:
https://godbolt.org/z/7WTWhe
The compiler is able to deduce the final value, so I don't think this structure should hinder compiler optimizations.
Follow-up question from looking at the templates: the functions of interest all have the same signature, right?
Could this be written, for example, as one function for each op that calls a helper that handles the common part (lines 54-67 below) and then implements the op-specific dispatch? Further, can the helper avoid being a function template by using the common signature of these functions to specify its function pointer argument?
Slightly confused, could you please give some sample code?
Thanks for looking into it.
Thanks for investigating. I think that addresses my concern so this should be fine.
Hey @kshitij12345, thank you for being so patient. I took a close look and overall things look very good. I have a question about the organization of the common function to call prod, sum, and nansum. The code is very elegant, but I'm a little concerned about its effect on performance and wonder if its template logic can be further simplified. I look forward to hearing your thoughts! I should be much more responsive now, so we'll get this in quickly!
* add with_extremal for general case. * minor refactor of test code.
Test looks great, thanks @kshitij12345!
Nice work, @kshitij12345! This is the first "nan*" function in PyTorch!
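As a minimal model of what this new function computes (matching numpy.nansum's convention, which this PR follows): NaN elements are treated as zero, so an all-NaN input sums to zero rather than NaN. A pure-Python sketch:

```python
import math

# Minimal model of nansum semantics: skip NaN elements when summing.
def nansum(xs):
    return sum(x for x in xs if not math.isnan(x))

nan = float("nan")
print(nansum([1.0, nan, 2.0]))  # 3.0
print(nansum([nan, nan]))       # 0
```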
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Update: this triggered some internal perf warnings. Rerunning some tests now to verify.
Tests came back negative again. Going to try one more time. We may need to refactor this.
Thanks for the heads-up. Let me know if there are any changes needed :)
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Follow-up perf runs suggest the initial failure was flakiness in the benchmark. Initiating land process.
Reference: #38349