Optimize slicing when possible by copying bigger blocks at once #13261
yuslepukhin merged 7 commits into main
This pull request introduces 1 alert when merging 5e8f3aa into 25c0a66 (view on LGTM.com).
This pull request introduces 1 alert when merging bb1ca40 into b2353fa (view on LGTM.com).
Branch updated from bb1ca40 to 73f72db.
This pull request introduces 1 alert when merging 73f72db into 6895918 (view on LGTM.com).
```cpp
if (dim < steps_size && steps[dim] != 1) {
  break;
}
max_copyable_elements *= extents_[dim];
```
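A minimal standalone sketch of the idea behind the loop fragment in the diff above, not the actual SliceIterator code: the function `MaxCopyableElements` and its parameters `input_dims`, `extents`, and `steps` are hypothetical, and a row-major layout with one step per axis is assumed. Starting from the innermost axis, it multiplies in every axis with step 1 and stops after the first axis that is only partially taken, because the next outer axis would no longer be contiguous in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the ONNX Runtime API): how many elements of a slice
// can be copied with a single memcpy, assuming a row-major input layout.
int64_t MaxCopyableElements(const std::vector<int64_t>& input_dims,
                            const std::vector<int64_t>& extents,
                            const std::vector<int64_t>& steps) {
  assert(extents.size() == input_dims.size() && steps.size() == input_dims.size());
  int64_t block = 1;
  // Walk from the innermost (fastest-varying) axis outward.
  for (size_t dim = input_dims.size(); dim-- > 0;) {
    if (steps[dim] != 1) {
      break;  // strided axis: its selected elements are not adjacent in memory
    }
    block *= extents[dim];
    if (extents[dim] != input_dims[dim]) {
      break;  // partially taken axis: the next outer axis would leave gaps
    }
  }
  return block;
}
```

For example, an input of shape `[2, 3, 5, 4]` sliced with `[:, :, 2:, :]` has extents `[2, 3, 3, 4]`, and the sketch returns 12, i.e. `(H-i)*W`, instead of the 4 elements of the innermost dimension alone.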
Is there code coverage of this from the existing unit tests? Not sure if you validated with those or a specific model.
Yes, Slice now takes advantage of the new interface and I have validated it with the customer model. The new code is covered.
Is it covered by a unit test I can look at? E.g., one that has inputs similar to the Slice in the customer model.
Wondering if it would make sense to store a list of block sizes instead of having a single max_copying_elements_block_ value - but I don't have an example of the Slice params you were testing with to figure out if that would be of value.
This is a good example of coverage.
The list of copyable block sizes is stored in `extents_`.
Ignore the part about storing a list - that would only be useful if an axis could be repeated in axes which isn't allowed.
Wouldn't Slice5D_LargeStep behave the same without your change? It slices on axis 1 and takes only 1 element, so it picks the same amount of data that flattening the last 3 dims would have.
If so, could we add a test where the new code is (a) used for multiple slices and (b) uses a block size that differs from what the existing code would use?
Added test SliceTest.Slice5D_CopyAxis2LargeBlock, which copies blocks of 8 elements rather than 4 but still does so in multiple slices because axis 1 is also sliced.
This pull request introduces 1 alert when merging 93372a3 into e398241 (view on LGTM.com).
Description
Currently, SliceIterator copies at most one innermost dimension's worth of elements at a time.
However, in many slices several inner dimensions can be copied at once.
Furthermore, even if a dimension is sliced, it may use step 1, in which case it forms a contiguous block together with the fully taken inner dimensions, and that block can be copied at once.
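The idea described above can be illustrated with a minimal standalone sketch. It is not the SliceIterator implementation: `SliceCopy` and its parameters are hypothetical, the tensor is a row-major `float` buffer, and steps are assumed positive with starts already clamped to the valid range. It folds the innermost step-1 axes (up to and including the first partially taken one) into a single block, then walks the remaining outer axes and issues one `memcpy` per block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch (not SliceIterator): copies a slice of a row-major float
// tensor with one memcpy per contiguous block and returns the number of calls.
// Assumes positive steps and starts already clamped to the valid range.
size_t SliceCopy(const std::vector<float>& input,
                 const std::vector<int64_t>& dims,
                 const std::vector<int64_t>& starts,
                 const std::vector<int64_t>& extents,
                 const std::vector<int64_t>& steps,
                 std::vector<float>& output) {
  const size_t rank = dims.size();
  assert(starts.size() == rank && extents.size() == rank && steps.size() == rank);

  // Row-major strides of the input tensor.
  std::vector<int64_t> strides(rank, 1);
  for (size_t d = rank; d-- > 1;) strides[d - 1] = strides[d] * dims[d];

  // Fold inner step-1 axes into one block; axes [k, rank) end up in the block.
  int64_t block = 1;
  size_t k = rank;
  while (k > 0 && steps[k - 1] == 1) {
    --k;
    block *= extents[k];
    if (extents[k] != dims[k]) break;  // a partial axis ends the contiguous run
  }

  int64_t total = 1;
  for (int64_t e : extents) total *= e;
  output.resize(static_cast<size_t>(total));
  if (total == 0) return 0;

  // Odometer over the outer axes [0, k); each position yields one block.
  std::vector<int64_t> idx(k, 0);
  float* dst = output.data();
  size_t calls = 0;
  for (;;) {
    int64_t src = (k < rank) ? starts[k] * strides[k] : 0;
    for (size_t j = 0; j < k; ++j) src += (starts[j] + idx[j] * steps[j]) * strides[j];
    std::memcpy(dst, input.data() + src, static_cast<size_t>(block) * sizeof(float));
    dst += block;
    ++calls;

    // Advance the odometer, innermost outer axis first.
    size_t j = k;
    while (j > 0) {
      --j;
      if (++idx[j] < extents[j]) break;
      idx[j] = 0;
      if (j == 0) return calls;  // every outer axis wrapped: done
    }
    if (k == 0) return calls;  // no outer axes: the whole slice was one block
  }
}
```

On a 2x3x4 tensor, slicing `[:, 1:, :]` takes 2 `memcpy` calls of 8 elements each, a full copy takes a single call, and a strided slice such as `[:, :, ::2]` falls back to 12 single-element copies.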
Motivation and Context
For example, take an input of shape `[N, C, H, W]` sliced with `[:, :, i:, :]`, producing an output of shape `[N, C, H-i, W]`. That is, we slice along a single axis with step = 1. For each batch entry, the current implementation does `C * (H-i)` memcpy calls with `W` elements each. With this change, we can do `C` memcpy calls with `(H-i)*W` elements each. The optimization produces ~11% savings on certain internal models.
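Counting over all N batch entries, the call counts in this example work out as follows. Both variants move the same number of elements; only the number of memcpy calls and the size of each call change:

```latex
\begin{aligned}
\text{before:}\quad & N \cdot C \cdot (H - i)\ \text{calls of}\ W\ \text{elements each},\\
\text{after:}\quad & N \cdot C\ \text{calls of}\ (H - i) \cdot W\ \text{elements each},\\
\text{total in both cases:}\quad & N \cdot C \cdot (H - i) \cdot W\ \text{elements}.
\end{aligned}
```

With illustrative values (assumed, not taken from the models mentioned above) C = 64, H = W = 56 and i = 28, each batch entry goes from 1792 calls of 56 elements to 64 calls of 1568 elements, 100352 elements either way.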