[query] Add tree_matmul, matrix multiply in case of large inner dimension #9063
This is absolutely awesome John! Great work
self._assert_eq(m.T.tree_matmul(m, 2, new_temp_file()), nm.T @ nm)
self._assert_eq(m.T.tree_matmul(nm, 2, new_temp_file()), nm.T @ nm)
self._assert_eq(row.T.tree_matmul(row, 2, new_temp_file()), nrow.T @ nrow)
self._assert_eq(row.T.tree_matmul(nrow, 2, new_temp_file()), nrow.T @ nrow)
Call me paranoid, but it would be nice to have at least one example of `x @ y` where `x` and `y` are both block matrices and neither is the transpose of the other.
Heh, I just copied the current matmul tests plus that looping of different block sizes, but you're right, these are only transposes. I'll add another test.
header: Option[String],
addIndex: Boolean,
compression: Option[String],
customFilenames: Option[Array[String]]): Unit = {
This is an odd formatting change. Maybe add a newline before the `)` to get the desired formatting from IntelliJ?
Thanks, this was an accident.
@@ -1427,6 +1459,47 @@ def __matmul__(self, b):

        return BlockMatrix(BlockMatrixDot(self._bmir, b._bmir))

    @typecheck_method(b=oneof(np.ndarray, block_matrix_type), split_on_inner=int, path_prefix=str)
    def tree_matmul(self, b, split_on_inner, path_prefix):
        """Matrix multiplication in situations with large inner dimension. This function splits a single matrix
We generally prefer a one-sentence description on its own line, followed by a full description. Some Python tools use the first line in tooltips.
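For reference, the layout being asked for looks roughly like the sketch below. `tree_matmul_stub` is a hypothetical stand-in, not the method in this PR, and the longer description is placeholder text; only the summary sentence is taken from the diff.

```python
import inspect


def tree_matmul_stub(b, split_on_inner, path_prefix):
    """Matrix multiplication in situations with large inner dimension.

    The full description goes here, separated from the summary line by a
    blank line. (Placeholder text; this stub only shows the layout.)
    """
    raise NotImplementedError("illustrative stub only")


# Tooltips and similar tools typically show just the first docstring line.
summary = inspect.getdoc(tree_matmul_stub).splitlines()[0]
print(summary)
```

The blank line after the summary is what lets tools separate the short description from the rest.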
@@ -1427,6 +1459,47 @@ def __matmul__(self, b):

        return BlockMatrix(BlockMatrixDot(self._bmir, b._bmir))

    @typecheck_method(b=oneof(np.ndarray, block_matrix_type), split_on_inner=int, path_prefix=str)
    def tree_matmul(self, b, split_on_inner, path_prefix):
I think adding a default argument for `path_prefix` of `None` and creating a temp file in that case will improve usability.
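A minimal sketch of that suggestion, assuming a local temporary directory is an acceptable stand-in. `resolve_path_prefix` is a hypothetical helper, and `tempfile.mkdtemp` is used here only for illustration; the PR itself would presumably use Hail's own temp-file utility.

```python
import os
import tempfile


def resolve_path_prefix(path_prefix=None):
    """Return path_prefix unchanged, or a fresh temp directory if it is None."""
    if path_prefix is None:
        # Assumption: a local temp directory stands in for whatever
        # temp-file helper the real implementation uses.
        return tempfile.mkdtemp(prefix="tree_matmul_")
    return path_prefix


print(resolve_path_prefix("some/explicit/prefix"))
print(os.path.isdir(resolve_path_prefix()))
```

Callers who do not care where the intermediates go can then omit the argument entirely.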
@@ -1427,6 +1459,47 @@ def __matmul__(self, b):

        return BlockMatrix(BlockMatrixDot(self._bmir, b._bmir))

    @typecheck_method(b=oneof(np.ndarray, block_matrix_type), split_on_inner=int, path_prefix=str)
    def tree_matmul(self, b, split_on_inner, path_prefix):
I think we should make either the last or the last two parameters keyword-only arguments, so: `(self, b, *, split_on_inner, path_prefix)`.
I'd be willing to make `path_prefix` a keyword-only argument, but can you explain why you'd prefer that? I think I'll leave `split_on_inner` the way it is, since you always have to specify it. Giving `path_prefix` a default value of `None`, as your comment above suggests, makes the keyword-only argument sound better.
I addressed most of your comments. I'm not sure why you want the keyword-only arguments, though.
It's kind of a gut reaction for me. There's some discussion in the rationale of PEP 3102. For me, everything is about backwards compatibility and the two cases are:
In this concrete example, I worry about two possible evolutions of this function.
def tree_matmul(self, *others, split_on_inner, path_prefix)
def tree_matmul(self, b, split_on_inner=None, path_prefix=None) # now path prefix *must* be optional as well.
I might also call
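To make the backwards-compatibility argument concrete, here is a small sketch. Both functions are hypothetical stand-ins that just echo their arguments; they are not the code in this PR.

```python
def tree_matmul_v1(b, *, split_on_inner, path_prefix=None):
    """Hypothetical current version: arguments after `*` are keyword-only."""
    return ((b,), split_on_inner, path_prefix)


def tree_matmul_v2(*others, split_on_inner, path_prefix=None):
    """Hypothetical later version accepting several right-hand operands."""
    return (others, split_on_inner, path_prefix)


# A caller written against v1 keeps working, unchanged, against v2.
print(tree_matmul_v1("b", split_on_inner=2))
print(tree_matmul_v2("b", split_on_inner=2))

# A positional call is rejected up front instead of silently binding
# the integer to the wrong parameter.
try:
    tree_matmul_v1("b", 2)
except TypeError as err:
    print("rejected:", err)
```

Because every existing call site already names `split_on_inner`, the signature can later grow `*others` without breaking anyone.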
@danking Should I review this too, or do you feel like you covered it?
@patrick-schultz I only looked at the Python interface, so I would much appreciate your thoughts on the Scala stuff.
Tagging WIP because I still have two tiny changes to make that Dan asked for (argument name and keyword-only arguments).
…sion (hail-is#9063)
* Set up some range computations
* Did most of the python work, stuck now because I need to be able to write many RDDs in parallel
* Kept Dan's coschedule actions thing somewhere I won't lose it
* Fixed bug in copy of BlockMatrixMultiWrite
* Python side BlockMatrixMultiWriter exists
* Python to scala connection for block matrix native writer is working
* Metadata and success files being written, just need to write partition files
* Organized correctly now
* Write multiple works
* Split multiplying works
* Fixed last_rows and last_cols computation
* Added a print
* Now it's tree_matmul
* Began documenting the new functions
* Delete coscheduleActions comment
* Delete old comments
* Python half of supporting stage_locally
* Deleted unused partitionURI
* WIP
* Updated to use using fs.create
* Some typechecks fixed
* Added tests for tree_matmul
* Refactored, passing tests
* Refactored to pass along ExecuteContext to get localTmpDir
* Deleted print
* pylint fixes
* Restored tempfile import
* Fixed indentation
* Removed more accidental formatting changes
* Added default path
* Update the tests
* Rename split_on_inner to splits
* Keyword only arguments
* Fix typecheck on splits
This PR introduces `tree_matmul`, a method on `BlockMatrix` that allows for greater parallelism when multiplying two large matrices that result in a small matrix (i.e. the inner dimension is much larger than the outer dimensions).
To do this, `tree_matmul` takes a `split_on_inner` parameter, which defines how many subdivisions to break the two matrices into. Corresponding subdivisions are multiplied, written to disk, then read back in and summed.
For example, if you are multiplying a 4k by 500k matrix by its transpose, your answer will be a 4k by 4k block. With normal matrix multiply, you'd get no parallelism at all, since the result is only a single partition. If you instead use `tree_matmul` with `split_on_inner = 5`, Hail will do the following:
This PR also introduces a `write_block_matrices` function that writes out a list of `BlockMatrix` in parallel. This was necessary to write out all of the intermediate matrices in `tree_matmul`.
@konradjk
@danking
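The split, multiply, and sum idea from the description can be sketched in plain Python on lists of rows. This is only an illustration of the arithmetic, not the Hail implementation: `matmul` and `split_matmul` are hypothetical helpers, the partial products are kept in a list rather than written to disk, and nothing here runs in parallel.

```python
def matmul(a, b):
    """Naive product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def split_matmul(a, b, splits):
    """Multiply a (n x k) by b (k x m) by splitting the inner dimension k."""
    if splits < 1:
        raise ValueError("splits must be at least 1")
    k = len(b)
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions must match")
    # Boundaries of `splits` roughly equal chunks of the inner dimension.
    bounds = [k * i // splits for i in range(splits + 1)]
    partials = []
    for lo, hi in zip(bounds, bounds[1:]):
        if lo == hi:
            continue  # empty chunk when splits exceeds k
        a_cols = [row[lo:hi] for row in a]  # column slice of a
        b_rows = b[lo:hi]                   # matching row slice of b
        # Each partial product is independent of the others, which is
        # where the extra parallelism would come from.
        partials.append(matmul(a_cols, b_rows))
    # Sum the partial products elementwise to get the full product.
    n, m = len(a), len(b[0])
    return [[sum(p[i][j] for p in partials) for j in range(m)]
            for i in range(n)]


x = [[1, 2, 3, 4, 5, 6],
     [0, 1, 0, 1, 0, 1]]
y = [[1, 0], [2, 1], [0, 3], [1, 1], [4, 0], [2, 2]]
print(split_matmul(x, y, 3) == matmul(x, y))
```

Each partial product has the same small shape as the final result, so summing them is cheap compared with the multiplications.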