forked from pytorch/pytorch
Calculate nonzero count inside nonzero op #260
Merged
This will incur two hard syncs: one from `item()`, and one from the following `out_.copy_()`.

Later (not now), I suggest we create a dedicated MPSGraph for this part. We pre-allocate `out_` with the same size as `self` (so we don't overflow the buffer when resizing), and do the zero-counting and the resizing of the output in a single MPSGraph op. We can discuss that later.
Doing that might allocate a new buffer and change the pointer of the out buffer (causing a failure in the test). E.g., if the user has a pre-allocated buffer from a previous nonzero op (and they know the exact number of nonzeros), calling `nonzero(input, out=preallocated_out)` again and resizing the output at the beginning of the function to match the input could allocate new memory if the input's number of elements is larger than the output's (`resize_`).

Previously, the op was calling into `count_nonzero` at the beginning of the function to get the number of elements; this change makes it get the number of elements from the same graph as `nonzero` (which seemed a little faster in testing than the previous method). We can do it in the graph (both `nonzero` and `count_nonzero` happen in the same graph now), but the shape of the returned Tensor we've preallocated at the beginning (not the `MPSGraphTensor*`) would still be wrong, and we'd need to sync at the end (the `.item()` part) to get the number of nonzeros and resize it correctly. And if we've preallocated the output at the beginning, it would hit the issue from above: preallocating the output up front would work for 99% of the tests, but would fail where they're passing a preallocated output to us.