Add CONV_TRANSPOSE_2D for Metal #16542
base: master
Conversation
Add the necessary requirements for the input tensors here:
llama.cpp/ggml/src/ggml-metal/ggml-metal-device.m
Lines 649 to 651 in a190a9d
case GGML_OP_CONV_TRANSPOSE_1D:
case GGML_OP_CONV_TRANSPOSE_2D:
    return true;
For example, the implementation assumes that src0 and src1 are contiguous.
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Added the checks for type, and
I wonder if it would be more efficient to have KH x KW threads in each threadgroup, instead of just 1.
Would you like to try that in this PR or in a follow-up PR?
if (in_y >= args.IH) continue;
for (int64_t kw = 0; kw<args.KW; kw++) {
Suggested change:
-    for (int64_t kw = 0; kw<args.KW; kw++) {
+    for (int64_t kw = 0; kw < args.KW; kw++) {
This PR adds a Metal-based implementation of the CONV_TRANSPOSE_2D operation (#14909).
TODO: