
Custom primitive + RoPE fat op #676

Merged
merged 12 commits into main from extensions Feb 14, 2024

Conversation

awni
Member

@awni awni commented Feb 12, 2024

Proposed changes

  • add RoPE kernel
  • test transforms of custom primitive
  • benchmarks
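For reference, the traditional rotary embedding the kernel computes can be sketched in NumPy. This is a hedged reference implementation, not the MLX code; the parameter names mirror the bindings discussed in this PR, and `rope_ref` is an illustrative name:

```python
import numpy as np

def rope_ref(x, offset=0, base=10000.0, scale=1.0):
    """Reference rotary position embedding over the last axis of x.

    x has shape (..., T, D) with D even. In the "traditional" layout,
    adjacent pairs (x[..., 2i], x[..., 2i+1]) are rotated together by a
    per-position, per-frequency angle.
    """
    T, D = x.shape[-2], x.shape[-1]
    # Per-pair inverse frequencies and per-position angles.
    inv_freq = base ** (-np.arange(0, D, 2) / D)                  # (D/2,)
    theta = scale * (offset + np.arange(T))[:, None] * inv_freq   # (T, D/2)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Since each pair is a 2D rotation, the transform is the identity at position 0 (with `offset=0`) and preserves the norm along the last axis.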

@awni awni marked this pull request as draft February 12, 2024 22:24
@angeloskath
Member

Some benchmarks of the kernel on my M2 air

Before

Timing rope_vec ... 3.99837 msec
Timing rope_mat ... 63.36270 msec

After

Timing rope_vec ... 0.61199 msec
Timing rope_mat ... 6.72898 msec

The tests fail on float16 and bfloat16, but only due to numerical issues. Tomorrow I will do a quick check on the performance if we do all of the computation in float32 in the kernel, since it probably doesn't matter at all performance-wise.
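The half-precision failures are unsurprising: the angle `position * inv_freq` grows large and loses precision quickly in float16, which is why computing in float32 inside the kernel (or loosening test tolerances) is a common fix. A small NumPy illustration with hypothetical values, not taken from the actual tests:

```python
import numpy as np

# Angle for a large position index at some inverse frequency.
pos = np.float32(4096.0)
inv_freq = np.float32(0.37)  # illustrative value
theta32 = pos * inv_freq

# The same computation carried out in float16.
theta16 = np.float16(pos) * np.float16(inv_freq)

# float16 has ~10 mantissa bits, so both the inverse frequency and the
# product get rounded; sin/cos then amplify the error because the angle
# is far from zero.
err = abs(np.float32(theta16) - theta32)
print(err)
```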

if (dims_ != in.shape(-1)) {
  throw std::runtime_error("[RoPE] Partial RoPE application not supported");
}
if (in.flags().row_contiguous && in.is_donatable()) {
Member Author

We need a contiguity and copy check before this, right?

Member

Not sure what the copy check is. Also, row_contiguous is stricter than contiguous, is it not? I.e., all row_contiguous arrays are contiguous but not the other way around.

Member Author

I meant, if it's not contiguous, we should make a contiguous copy

Member Author

It does not appear to me that your kernel handles non-contiguous inputs, but maybe I missed something...

Member Author
@awni awni Feb 13, 2024

Actually, I think I missed it: I was looking for elem_to_loc, but you hardcoded the strides, so it should be ok.

Member Author

Do we need to check here, though, that the input has the same size as the output? If it's broadcasted, e.g. along the last axis, it would be incorrect to donate, right?

Member

Yeah, I hardcoded the strides because the grid is launched with half the last dimension, so the indexing can't be delegated to a simple elem_to_loc. I would have to do something like multiply pos.x by 2 and then pass it to elem_to_loc, etc. I think this is equally readable, but I am open to suggestions :-)
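The hardcoded indexing amounts to something like the following Python sketch of the grid-to-offset math (the names here are hypothetical; the actual kernel is Metal):

```python
# A thread at grid position (x, y, z) handles one pair of elements in a
# row-contiguous array of shape (N, T, D), with the grid launched over
# (D // 2, T, N) -- i.e. pos.x indexes *pairs*, half the last dimension.
def pair_offsets(x, y, z, T, D):
    # Row-major strides for a contiguous (N, T, D) array of unit elements.
    row_offset = z * T * D + y * D
    # The traditional layout rotates adjacent elements 2x and 2x + 1,
    # which is why pos.x must be doubled before indexing.
    return row_offset + 2 * x, row_offset + 2 * x + 1
```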

Regarding broadcasting, a broadcasted array wouldn't be row_contiguous so this check should be fine donation-wise, right?
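On the donation point: a broadcasted view carries a zero stride along the broadcast axis, so it fails any row-contiguity check. A NumPy analogue (NumPy strides are in bytes; `row_contiguous` is an illustrative helper, not the MLX flag):

```python
import numpy as np

x = np.zeros((4, 1, 8), dtype=np.float32)
b = np.broadcast_to(x, (4, 16, 8))  # broadcast along the middle axis

def row_contiguous(a):
    # Row-contiguous: strides equal the C-order strides for this shape.
    expected = []
    s = a.itemsize
    for dim in reversed(a.shape):
        expected.append(s)
        s *= dim
    return list(a.strides) == list(reversed(expected))

print(row_contiguous(x), row_contiguous(b))  # True False
```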

Member Author

Oh of course! Let me quietly exit this thread before I say anything else incorrect

@awni
Member Author

awni commented Feb 13, 2024

Wow, that's so fast! We can also increase the tolerance for the lower precision tests if that's simpler.

@awni awni marked this pull request as ready for review February 13, 2024 19:16
@awni
Member Author

awni commented Feb 13, 2024

Making this a real PR since I think we are almost done.

Member
@angeloskath angeloskath left a comment

This looks great. I really like the Custom primitive.

"traditional"_a,
"base"_a,
"scale"_a,
"offset"_a,
Member

Do you think we should make the above keyword only? It would be verbose but error free...

Member Author

I think so, yes.
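Keyword-only parameters prevent exactly the positional mix-ups being worried about here, e.g. silently swapping `base` and `scale`. A plain-Python sketch of the idea (the actual binding would use the C++ argument annotations shown above):

```python
def rope(x, dims, *, traditional=False, base=10000.0, scale=1.0, offset=0):
    """Everything after * must be passed by keyword."""
    return (dims, traditional, base, scale, offset)

rope([1.0], 32, traditional=True, offset=7)  # fine: explicit keywords
try:
    rope([1.0], 32, True, 10000.0)           # positional: rejected
except TypeError:
    print("TypeError")
```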

@awni awni merged commit ccf1645 into main Feb 14, 2024
2 checks passed
@awni awni deleted the extensions branch February 14, 2024 22:04
awni added a commit that referenced this pull request Feb 15, 2024
* extensions start

* rope custom op

* fix build

* docs + rope benchmark

* fix test

* Add a Metal kernel for RoPE

* Fix position of traditional

* transform tests

* Move rope computation to float and fix tests

* Fix the test and a typo

* change to fast

* fix no metal build

---------

Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>