-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DGEMM MatMul Speed up #8037
DGEMM MatMul Speed up #8037
Conversation
@@ -218,6 +226,14 @@ object Region { | |||
case _: PFloat64 => v => storeDouble(dest, coerce[Double](v)) | |||
} | |||
|
|||
def storePrimitiveUnstaged(typ: PType, dest: Long): AnyVal => Unit = typ match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize you're just replicating the above, but the currying here is odd. It seems like either it should be Type => (Long, AnyVal) => Unit
or (Type, Long, AnyVal) => Unit
, just not this configuration. Why would we restore values in the same destination multiple times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's always immediately applied. Seems like a worthwhile refactor while we're in here to move to either of the other configurations. I prefer the former option (the first arg list is just the type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I bet this was because I originally had an unstaged version and I wanted the type system to infer whether I wanted the unstaged or staged version. My apologies ;).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is also going to be super slow -- you have to serialize the type, and do this dispatch. I think the faster thing to do might actually be to just write an approach that copies bytes around (using Region.copyMemory) where you parameterize with the byte size (known on ptype at compile time. this will also work for pointers, too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's faster but still slower than I'd like. Didn't see another way to do it. I'll do what you suggested instead though, that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed what Tim said, and I just deleted storePrimitiveUnstaged
and loadPrimitiveUnstaged
since I am no longer using it. Whether we should refactor the other two now seems like a problem for another PR since this one doesn't interact with that code anymore.
See here for the problem I'm solving with this PR: https://dev.hail.is/t/ndarray-matmul-performance-improvements/176
cc @danking