implement sum over multiple dimensions (fixes #2006) #6152
Conversation
Can you update the doc and add test cases for this please?
@pytorchbot test this please
@pytorchbot retest this please
(Not a complete review. Just a few comments)
`aten/src/ATen/WrapDimUtils.h` (outdated):

```cpp
@@ -27,6 +27,17 @@ static inline int64_t maybe_wrap_dim(int64_t dim, int64_t dim_post_expr, bool wrap_scalar=true) {
  return dim;
}

static inline std::vector<bool> dim_list_to_vector(IntList dims, int64_t ndims, bool wrap_scalar=true) {
  std::vector<bool> seen(ndims, false);
```
`aten/src/ATen/native/ReduceOps.cpp` (outdated):

```cpp
  }
}
size_t ndims = self.dim();
std::vector<bool> seen(ndims, false);
```
`aten/src/ATen/native/ReduceOps.cpp` (outdated):

```cpp
// MULTI DIM REDUCE ###########################################################

Tensor sum(const Tensor &self, IntList dims_, bool keepdim) {
```
@pytorchbot retest this please (can I do this?)
@pytorchbot add to whitelist
Hi, I could use a hint on how to resolve the ambiguity that the Windows compile stumbles over (error C2666 regarding the use of the bitfield in WrapDimUtils.h). Thank you, Thomas
A solution may be to define a flag in
`aten/src/ATen/native/ReduceOps.cpp` (outdated):

```cpp
}
size_t ndims = self.dim();
AT_ASSERT(ndims <= 64, "tensor dimension must be <= 64 for multiple dims");
std::bitset<64> seen;
```
aten/src/ATen/WrapDimUtils.h
Outdated
// non-explicit half conversion in THCUNN/THCHalfAutoNumerics.cuh | ||
// so this is host-code only | ||
|
||
static inline std::bitset<64> dim_list_to_vector(IntList dims, int64_t ndims, bool wrap_scalar=true) { |
So at last the Windows build works, after moving the bitset-using functions into a different header (that is not included by
`aten/src/ATen/WrapDimUtilsMulti.h` (outdated):

```cpp
constexpr size_t dim_bitset_size = 64;

static inline std::bitset<dim_bitset_size> dim_list_to_vector(IntList dims, int64_t ndims, bool wrap_scalar=true) {
  AT_ASSERT(ndims <= (int64_t) dim_bitset_size, "tensor dimension must be <= %zu for multiple dims", dim_bitset_size);
```
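The semantics of the helper above can be modeled in Python (a hypothetical sketch for illustration, not code from this PR; the bitmask representation and the `[-ndims, ndims)` wrapping convention follow the C++ snippets in this thread):

```python
def dim_list_to_bitset(dims, ndims, bitset_size=64):
    # Python model of the C++ helper: wrap negative dims into
    # [0, ndims) and collect them into a bitmask, rejecting repeats.
    assert ndims <= bitset_size, "tensor dimension must be <= 64 for multiple dims"
    seen = 0
    for d in dims:
        # mimic maybe_wrap_dim: accept dims in [-ndims, ndims)
        assert -ndims <= d < ndims, "dimension out of range"
        d %= ndims  # wraps negatives, e.g. -1 -> ndims - 1
        assert not (seen >> d) & 1, "repeated dim"
        seen |= 1 << d
    return seen

mask = dim_list_to_bitset([0, -1], ndims=3)  # bits 0 and 2 set -> 0b101
```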
`aten/src/ATen/WrapDimUtilsMulti.h` (outdated):

```cpp
for (size_t i = 0; i < dims.size(); i++) {
  size_t dim = maybe_wrap_dim(dims[i], ndims);
  if (seen[dim])
    AT_ERROR("repeated dim");
```
`aten/src/ATen/WrapDimUtilsMulti.h` (outdated):

```cpp
constexpr size_t dim_bitset_size = 64;

static inline std::bitset<dim_bitset_size> dim_list_to_vector(IntList dims, int64_t ndims, bool wrap_scalar=true) {
```
`aten/src/ATen/native/ReduceOps.cpp` (outdated):

```cpp
    AT_ERROR("repeated dim");
  seen[dim] = true;
  result = reduce_1(result, dim, true);
}
```
`aten/src/ATen/native/ReduceOps.cpp` (outdated):

```cpp
auto dim = maybe_wrap_dim(dims_[i], ndims);
if (seen[dim])
  AT_ERROR("repeated dim in sum");
seen[dim] = true;
```
```diff
@@ -611,13 +611,10 @@
   CPU: _sum_cpu
   CUDA: _sum_cuda

-- func: sum(Tensor self, int64_t dim, bool keepdim=False) -> Tensor
+- func: sum(Tensor self, IntList[1] dim, bool keepdim=False) -> Tensor
```
So far I have assumed that the user prescribes the order of summation. The obvious alternative is to use ascending or descending order of the dims by iterating over the bitset.

Even more radically, one could consider permuting and reshaping the reduced axes together. Then one would only sum once and not have intermediate results...
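The permute-and-reshape alternative can be sketched in NumPy (a hypothetical illustration under the assumption of a commutative, associative reduction; `sum_via_reshape` is an invented name, not code from this PR):

```python
import numpy as np

def sum_via_reshape(x, dims):
    # Move the reduced axes to the end, flatten them into one axis,
    # and reduce once, avoiding intermediate results from repeated
    # single-dim sums.
    dims = [d % x.ndim for d in dims]
    kept = [d for d in range(x.ndim) if d not in dims]
    y = x.transpose(kept + dims).reshape([x.shape[d] for d in kept] + [-1])
    return y.sum(axis=-1)

x = np.arange(24.0).reshape(2, 3, 4)
assert np.allclose(sum_via_reshape(x, (0, 2)), x.sum(axis=(0, 2)))
```

The trade-off is that the reshape after the transpose may need a copy (a `contiguous` call) when the reduced axes are not adjacent in memory.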
LGTM, but I think there are still minor things that could be improved. Should be good to go after this
`aten/src/ATen/WrapDimUtilsMulti.h` (outdated):

```cpp
std::bitset<dim_bitset_size> seen;
for (size_t i = 0; i < dims.size(); i++) {
  size_t dim = maybe_wrap_dim(dims[i], ndims);
  AT_ASSERT(!seen[dim], "dim %zu appears multiple times in the list of reduced dims", dim);
```
`aten/src/ATen/native/ReduceOps.cpp` (outdated):

```cpp
  return self;
}
size_t ndims = self.dim();
std::bitset<dim_bitset_size> seen = dim_list_to_bitset(dims_, ndims);
```
`aten/src/ATen/native/ReduceOps.cpp` (outdated):

```cpp
Tensor result = self;
for (size_t i = 0; i < dims_.size(); i++) {
  size_t dim = maybe_wrap_dim(dims_[i], ndims);
  result = reduce_1(result, dim, true);
```
`aten/src/ATen/native/ReduceOps.cpp` (outdated):

```cpp
template <Tensor (reduce_1)(const Tensor &, int64_t, bool),
          Tensor& (reduce_1_out)(Tensor& result, const Tensor &, int64_t, bool)>
inline Tensor& reduce_multi_out(Tensor &result, const Tensor &self, IntList dims_, bool keepdim) {
```
`test/test_torch.py` (outdated):

```python
res1 = torch.sum(x, (2, 1))
res2 = torch.Tensor()
torch.sum(x, (2, 1), out=res2)
self.assertEqual(res1, res2)
```
The disadvantage here is that you need to meddle with the backwards, while this would be automatic if you could separate them.

Best regards
Thomas

On 4 April 2018 17:31:10 CEST, Adam Paszke <notifications@github.com> wrote:

> apaszke commented on this pull request.
> @@ -611,13 +611,10 @@
>   CPU: _sum_cpu
>   CUDA: _sum_cuda
> -- func: sum(Tensor self, int64_t dim, bool keepdim=False) -> Tensor
> +- func: sum(Tensor self, IntList[1] dim, bool keepdim=False) -> Tensor
> But this works too (and I think I prefer it)!
So I looked into the handling of keeping or not keeping dimensions, per @apaszke's comment.
I'm pretty sure we'll never really want to use the order in which the dimensions were given; take this for an example:

This makes sense, because reducing in order starting from the innermost dim is good for data locality, as earlier dimensions will have lower strides once you get to them. (NB: the difference only grows when I use fewer cores.) An additional improvement would be to collapse the pseudo-contiguous pairs of dimensions (

We can implement those later, but I feel like sorting dimensions would let us simplify the code a bit.
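The sorting idea can be sketched in NumPy (a hypothetical illustration, not the PR's implementation; `sum_multi` is an invented name). Reducing the highest-numbered, i.e. innermost, dims first means the dims reduced later keep their small strides:

```python
import numpy as np

def sum_multi(x, dims, keepdim=False):
    wrapped = sorted(d % x.ndim for d in dims)
    # Reduce innermost-first; keepdims=True preserves the rank, so the
    # remaining axis indices stay valid throughout the loop.
    for d in reversed(wrapped):
        x = x.sum(axis=d, keepdims=True)
    return x if keepdim else x.squeeze(axis=tuple(wrapped))

x = np.ones((2, 3, 4))
assert sum_multi(x, (2, 0)).shape == (3,)                  # dims 0 and 2 summed away
assert sum_multi(x, (2, 0), keepdim=True).shape == (1, 3, 1)
```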
I have one comment about reduction over multiple axes: I think we should follow NumPy behavior. For many functions the order of operations doesn't matter (like `sum`), but it seems that NumPy performs the operations differently than what is implemented here:

```python
a = np.random.rand(3, 3, 3)
m1 = np.median(a, axis=[0, 2])
# perform a single median, after putting the
# reduction dimensions in together
m2 = np.median(a.transpose((1, 0, 2)).reshape(3, -1), axis=1)
# independently perform the reductions
# on each different axis
m3 = np.median(np.median(a, axis=0), axis=1)
m4 = np.median(np.median(a, axis=2), axis=0)
print(np.all(m1 == m2))  # True
print(np.all(m1 == m3))  # False
print(np.all(m1 == m4))  # False
```

It might be good to benchmark it, but I have the feeling that it might also be faster to perform
@fmassa good point, the current implementation requires that the function is effectively commutative and associative, but it is OK for `sum`.
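The distinction can be checked directly (a small NumPy sketch of the point, not code from this thread): for a commutative, associative reduction like `sum`, every reduction order agrees, unlike `median`:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((3, 3, 3))

s1 = a.sum(axis=(0, 2))            # reduce both axes at once
s2 = a.sum(axis=0).sum(axis=1)     # axis 0 first; old axis 2 is now axis 1
s3 = a.sum(axis=2).sum(axis=0)     # the reverse order
# All orders agree up to floating-point rounding.
assert np.allclose(s1, s2) and np.allclose(s1, s3)
```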
Indeed, in some cases the cost of permute + contiguous outweighs the single execution of

Btw, I believe in your first
Right, my code was incorrect in the first case, but changing it doesn't affect the final run time for me.
Thank you both for your input!

So to have a plan:

1) Use a fixed order of reduction.
2) Check whether permute and reshape works: how? a) permute and use compute_stride, or b) avoid permuting and mimic compute_stride.
3) Either a) rename the reduce function to indicate the associative and commutative requirement, or b) include a force_reshape template option.
4) While we are at it: a) keep backwards on a case-by-case basis (sum_backward has different inputs than prod_backward), b) split IntList vs. int64_t for jit dispatch to automatically have backwards for multi-dim, or c) have a grand reduce_multi_backward template.

I'm leaning towards 1), 2a), 3b), 4a) for this PR and revisiting 4 when implementing a few more ops, but I'll gladly follow your advice.

Best regards
Thomas
CC @zdevito @apaszke @jamesr66a I'm not sure you should try a different approach; it might just be that we need to support this in the JIT.
Accidental update of the gloo submodule
So while rebasing this, would you have a pointer to what

```python
def f(arg1, arg2, *, kwonlyarg1):
    pass
```
So after a rebase against master:

So there might be something to @ezyang's comment that one might consider a change in the JIT here.

Best regards
Thomas
Repinging @zdevito, @apaszke, @jamesr66a on the JIT interaction.
So now that the JIT handles
So comparing the PR with the master commit it is rebased on (

(It gets warnings about caching, but I don't think that influences whether there is a measurable difference between before and after the PR.)
The failed build seems to say something about virtual memory. Is that me or the CI? My own build with gcc 7.3 and py 3.6 seems to work...
@t-vi @ezyang The specific failed test case is here: https://github.com/onnxbot/onnx-fb-universe/blob/master/test/test_operators.py#L310
@bddppq I can offer https://github.com/t-vi/pytorch/tree/fix_onnx_sum
Hello,
this implements summing over multiple dimensions as an ATen native function.
I'll add a test and adapt the docs, but I'd appreciate feedback on the approach.
This patch addresses #2006 and would supersede #2116.
Of course, there is a ton of other ops (prod, mean, squeeze, unsqueeze) that could be handled similarly.
Best regards
Thomas
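For a sense of the intended semantics, here is the NumPy analog of the proposed call (a usage sketch; the tensor and shapes are invented for illustration):

```python
import numpy as np

x = np.arange(24.0).reshape(2, 3, 4)

# NumPy analog of the proposed torch.sum(x, (0, 2)):
r = x.sum(axis=(0, 2))
assert r.shape == (3,)

# The keepdim=True analog retains the reduced axes with size 1:
rk = x.sum(axis=(0, 2), keepdims=True)
assert rk.shape == (1, 3, 1)
```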