Should matmul have a decomposed batch rule or an actual one? #16

Open
zou3519 opened this issue May 6, 2021 · 0 comments

zou3519 (Contributor) commented May 6, 2021

Right now the batch rule for matmul is decomposed (https://github.com/zou3519/functorch/blob/53144b92d33d6d796359c97764ee68743f5463bf/functorch/csrc/BatchingRegistrations.cpp#L1254).
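(For context, "decomposed" here means vmap doesn't get a dedicated kernel for matmul: the rule re-expresses the batched matmul in terms of existing ops and lets matmul's own batch-dim broadcasting do the work. A minimal sketch of the idea, with hypothetical names, not the actual code from BatchingRegistrations.cpp:)

```python
import torch

# Hypothetical sketch of a decomposed batch rule for matmul: move each
# input's vmap dim to the front so it becomes an ordinary batch dim,
# then defer to plain matmul, which broadcasts over leading batch dims.
# (Illustrative only; not the code in BatchingRegistrations.cpp.)
def matmul_batch_rule_decomposed(a, a_bdim, b, b_bdim):
    if a_bdim is not None:
        a = a.movedim(a_bdim, 0)  # expose the vmap dim as a batch dim
    if b_bdim is not None:
        b = b.movedim(b_bdim, 0)
    # Batching comes "for free" from broadcasting, at the cost of whatever
    # expansion matmul performs internally on mismatched batch dims.
    return torch.matmul(a, b), 0  # result is batched at dim 0
```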

My worry is that we might transform some code into inefficient code. For example, if B0 and B1 are vmap dimensions and we are matrix-multiplying tensors of size [B0, 5, 5] and [B1, 5, 5], we don't want to multiply tensors of size [B0, 1, 5, 5] and [1, B1, 5, 5]. If that happens, then internally, matmul will expand both tensors to [B0, B1, 5, 5] and materialize them in memory, which can be quite slow. (The ideal way to multiply these tensors is to reshape them into [B0 * 5, 5] and [5, B1 * 5], multiply those together, and then view the [B0 * 5, B1 * 5] result as [B0, 5, B1, 5] and permute it to [B0, B1, 5, 5].)
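To make the concern concrete, here is a small PyTorch sketch (my own illustration, not functorch code) contrasting the broadcast-and-expand path with the reshape-based multiplication described above:

```python
import torch

B0, B1, N = 32, 64, 5
A = torch.randn(B0, N, N)  # batched over vmap dim B0
B = torch.randn(B1, N, N)  # batched over vmap dim B1

# Broadcast path (the worry): matmul expands [B0, 1, N, N] and
# [1, B1, N, N] to [B0, B1, N, N], materializing B0 * B1 copies.
naive = torch.matmul(A.unsqueeze(1), B.unsqueeze(0))  # [B0, B1, N, N]

# Reshape path (the ideal): a single [B0*N, N] @ [N, B1*N] matmul
# with no expanded intermediates.
lhs = A.reshape(B0 * N, N)                    # [B0*N, N]
rhs = B.permute(1, 0, 2).reshape(N, B1 * N)   # [N, B1*N]
fast = (lhs @ rhs).reshape(B0, N, B1, N).permute(0, 2, 1, 3)

assert torch.allclose(naive, fast, atol=1e-5)
```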

This issue is probably just a code-reading exercise: check whether the above can actually happen in the decomposed matmul code. I was in the middle of writing a non-decomposed matmul here: https://gist.github.com/zou3519/ddd4b2d4aacc98bf20d114f26b27b082
