Should matmul have a decomposed batch rule or an actual one? #16

Open
zou3519 opened this issue May 6, 2021 · 0 comments

zou3519 (Contributor) commented May 6, 2021

Right now the batch rule for matmul is decomposed (https://github.com/zou3519/functorch/blob/53144b92d33d6d796359c97764ee68743f5463bf/functorch/csrc/BatchingRegistrations.cpp#L1254).
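(For context, "decomposed" here means vmap doesn't get a dedicated kernel for matmul: the rule re-expresses the batched matmul in terms of existing ops and lets matmul's own batch-dim broadcasting do the work. A minimal sketch of the idea, with hypothetical names, not the actual code from BatchingRegistrations.cpp:)

```python
import torch

# Hypothetical sketch of a decomposed batch rule for matmul: move each
# input's vmap dim to the front so it becomes an ordinary batch dim,
# then defer to plain matmul, which broadcasts over leading batch dims.
# (Illustrative only; not the code in BatchingRegistrations.cpp.)
def matmul_batch_rule_decomposed(a, a_bdim, b, b_bdim):
    if a_bdim is not None:
        a = a.movedim(a_bdim, 0)  # expose the vmap dim as a batch dim
    if b_bdim is not None:
        b = b.movedim(b_bdim, 0)
    # Batching comes "for free" from broadcasting, at the cost of whatever
    # expansion matmul performs internally on mismatched batch dims.
    return torch.matmul(a, b), 0  # result is batched at dim 0
```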

My worry is that we might transform some code into inefficient code. For example, if B0 and B1 are vmap dimensions and we are matrix-multiplying tensors of size [B0, 5, 5] and [B1, 5, 5], we don't want to multiply tensors of size [B0, 1, 5, 5] and [1, B1, 5, 5]. If that happens, then internally, matmul will expand both tensors to [B0, B1, 5, 5] and materialize them in memory, which can be quite slow. (The ideal way to multiply these tensors is to reshape them into [B0 * 5, 5] and [5, B1 * 5], multiply those together, and then view the [B0 * 5, B1 * 5] result as [B0, 5, B1, 5] and permute it to [B0, B1, 5, 5].)
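To make the concern concrete, here is a small PyTorch sketch (my own illustration, not functorch code) contrasting the broadcast-and-expand path with the reshape-based multiplication described above:

```python
import torch

B0, B1, N = 32, 64, 5
A = torch.randn(B0, N, N)  # batched over vmap dim B0
B = torch.randn(B1, N, N)  # batched over vmap dim B1

# Broadcast path (the worry): matmul expands [B0, 1, N, N] and
# [1, B1, N, N] to [B0, B1, N, N], materializing B0 * B1 copies.
naive = torch.matmul(A.unsqueeze(1), B.unsqueeze(0))  # [B0, B1, N, N]

# Reshape path (the ideal): a single [B0*N, N] @ [N, B1*N] matmul
# with no expanded intermediates.
lhs = A.reshape(B0 * N, N)                    # [B0*N, N]
rhs = B.permute(1, 0, 2).reshape(N, B1 * N)   # [N, B1*N]
fast = (lhs @ rhs).reshape(B0, N, B1, N).permute(0, 2, 1, 3)

assert torch.allclose(naive, fast, atol=1e-5)
```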

This issue is probably just a code-reading exercise: check whether the above can actually happen in the decomposed matmul code. I was in the middle of writing a non-decomposed matmul here: https://gist.github.com/zou3519/ddd4b2d4aacc98bf20d114f26b27b082
