Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DGEMM MatMul Speed up #8037

Merged
merged 6 commits into from
Feb 5, 2020
Merged

Conversation

johnc1231
Copy link
Contributor

See here for the problem I'm solving with this PR: https://dev.hail.is/t/ndarray-matmul-performance-improvements/176

cc @danking

danking
danking previously requested changes Feb 4, 2020
hail/python/test/hail/expr/test_ndarrays.py Show resolved Hide resolved
@@ -218,6 +226,14 @@ object Region {
case _: PFloat64 => v => storeDouble(dest, coerce[Double](v))
}

def storePrimitiveUnstaged(typ: PType, dest: Long): AnyVal => Unit = typ match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize you're just replicating the above, but the currying here is odd. It seems like either it should be Type => (Long, AnyVal) => Unit or (Type, Long, AnyVal) => Unit, just not this configuration. Why would we restore values in the same destination multiple times?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always immediately applied. Seems like a worthwhile refactor while we're in here to move to either of the other configurations. I prefer the former option (the first arg list is just the type).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I bet this was because I originally had an unstaged version and I wanted the type system to infer whether I wanted the unstaged or staged version. My apologies ;).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also going to be super slow -- you have to serialize the type, and do this dispatch. I think the faster thing to do might actually be to just write an approach that copies bytes around (using Region.copyMemory) where you parameterize with the byte size (known on ptype at compile time. this will also work for pointers, too)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's faster but still slower than I'd like. Didn't see another way to do it. I'll do what you suggested instead though, that makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed what Tim said, and I just deleted storePrimitiveUnstaged and loadPrimitiveUnstaged since I am no longer using it. Whether we should refactor the other two now seems like a problem for another PR since this one doesn't interact with that code anymore.

@danking danking merged commit 1438e6c into hail-is:master Feb 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants