Calculate nonzero count inside nonzero op #260

DenisVieriu97 · 2023-01-26T22:58:08Z

No description provided.

razarmehr · 2023-01-27T17:12:10Z

aten/src/ATen/native/mps/operations/Indexing.mm

-  if (!contiguous_output) {
-    out = at::native::empty_mps(
-           out_.sizes(),
+  // int64_t total_nonzero = at::count_nonzero(self).item<int64_t>();


Do we want to upstream this block of commented-out code? If not, I suggest you remove it and keep it locally.

Thanks. Yes, this needs to be removed (it is part of the old code). I'll update it.

razarmehr · 2023-01-27T17:19:21Z

aten/src/ATen/native/mps/operations/Indexing.mm

  }

+  int32_t total_nonzero = count_nonzero.item<int32_t>();


This will incur two hard syncs: one from item(), and one from the following out_.copy_().
Later (not now), I suggest we create a dedicated MPSGraph for this part. We pre-allocate out_ with the same size as self (so we don't overflow the buffer when resizing), and do the zero-counting and resizing of the output in a single MPSGraph op. We can discuss that later.
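To make the suggestion above concrete, here is a minimal CPU-only sketch in plain C++. It is a toy model, not ATen or MPSGraph code: `NonzeroResult` and `nonzero_1d` are hypothetical names introduced only for illustration. It allocates the output at the worst-case size (one slot per input element), counts and collects the nonzero indices in a single pass, and trims the output once at the end.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy types; they stand in for tensors and are not part of ATen.
struct NonzeroResult {
    std::vector<int64_t> indices;  // positions of nonzero elements
    std::size_t count;             // number of nonzero elements
};

// Single pass over a 1-D input: counting and index collection happen together.
NonzeroResult nonzero_1d(const std::vector<float>& self) {
    // Worst case: every element is nonzero, so writes can never overflow.
    std::vector<int64_t> out(self.size());
    std::size_t count = 0;
    for (std::size_t i = 0; i < self.size(); ++i) {
        if (self[i] != 0.0f) {
            out[count++] = static_cast<int64_t>(i);
        }
    }
    // Trim to the real count once it is known.
    out.resize(count);
    return {std::move(out), count};
}
```

On a GPU backend the final trim is where the count has to be read back on the host, which is the synchronization point discussed in this thread.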

> We pre-allocate out_ with the same size as self (so we don't overflow the buffer when resizing)

Doing that might allocate a new buffer and change the pointer of the out buffer (causing a failure in the test).
For example, suppose the user has a pre-allocated buffer from a previous nonzero op (and they know the exact number of nonzeros) and calls nonzero(input, out=preallocated_out) again. Resizing the output at the beginning of the function to match the input could allocate new memory if the input's number of elements is larger than the output's number of elements (resize_).
Previously the op called count_nonzero at the beginning of the function to get the number of elements. This change gets the number of elements from the same graph as nonzero (it seemed a little faster in testing than the previous method).

> and do the zero-counting and resizing of output in a single MPSGraph op. We can discuss that later.

We can do it in the graph (both nonzero and count_nonzero happen in the same graph now), but the shape of the returned Tensor that we preallocated at the beginning (not the MPSGraphTensor*) would still be wrong, and we'd need to sync at the end (the .item() part) to get the number of nonzeros and resize it correctly. And if we preallocated the output at the beginning, it would hit the issue above (preallocating the output at the beginning would work for 99% of the tests, but would fail where they pass a preallocated output to us).
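The pointer-stability concern can be illustrated with a small stdlib-only C++ sketch. `ToyTensor` is a hypothetical stand-in, not an ATen type; it only assumes resize_-like behavior in which storage is reallocated when the requested element count exceeds the current capacity and is left alone otherwise.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical toy tensor with resize_-like storage semantics (not ATen).
class ToyTensor {
public:
    explicit ToyTensor(std::size_t numel)
        : storage_(new int64_t[numel]()), capacity_(numel), numel_(numel) {}

    // Returns true if the storage had to be reallocated.
    bool resize(std::size_t numel) {
        bool reallocated = false;
        if (numel > capacity_) {
            // Growing past capacity: the data pointer changes here.
            std::unique_ptr<int64_t[]> bigger(new int64_t[numel]());
            storage_ = std::move(bigger);
            capacity_ = numel;
            reallocated = true;
        }
        numel_ = numel;  // shrinking (or fitting) keeps the same storage
        return reallocated;
    }

    const int64_t* data_ptr() const { return storage_.get(); }
    std::size_t numel() const { return numel_; }

private:
    std::unique_ptr<int64_t[]> storage_;
    std::size_t capacity_;
    std::size_t numel_;
};
```

Under this model, a caller-provided output sized to the exact nonzero count keeps its pointer when resized to that count, but loses it if the op first grows the output to the input's element count.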

* Calculate output shape inside nonzero op
* nonzero optimizations
* Fix lintrunner

DenisVieriu97 added 2 commits January 26, 2023 00:19

Calculate output shape inside nonzero op

1478377

nonzero optimizations

88d9d6e

DenisVieriu97 requested review from kulinseth and razarmehr January 26, 2023 22:58

razarmehr reviewed Jan 27, 2023

Fix lintrunner

dc3f0d6

kulinseth approved these changes Jan 27, 2023

DenisVieriu97 merged commit 0a17c0b into master Jan 27, 2023

kulinseth pushed a commit that referenced this pull request Feb 6, 2023

Calculate nonzero count inside nonzero op (#260)

dce8fe9

* Calculate output shape inside nonzero op
* nonzero optimizations
* Fix lintrunner

DenisVieriu97 added the Upstreamed (change has been upstreamed to PyTorch master) and DV: In Progress labels and removed the DV: In Progress label Feb 9, 2023
