Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Block Matrix filter rows/cols edge issues #7146

Merged
merged 9 commits into from Sep 27, 2019

Conversation

@johnc1231
Copy link
Contributor

commented Sep 27, 2019

BlockMatrix.filterCols and BlockMatrix.filterRows has a bug where the smaller blocks located on the edge of the matrix are not always correctly handled when they're of a smaller size than the rest of the blocks. The source of this problem was the fact that BlockMatrix.scala line 1515 below, which was calling blockDims(split.index). This was a problem because split.index is a partition index, not a block index. The fix was to convert it to a block index and then call blockDims. The rest of this PR is some white space fixes, a spelling error, and an additional test case for BlockMatrix filtering that would fail in current master because of this bug.

Copy link
Collaborator

left a comment

good catch.

@danking danking merged commit c5928e5 into hail-is:master Sep 27, 2019
1 check passed
1 check passed
ci-test success
Details
@danking

This comment has been minimized.

Copy link
Collaborator

commented Sep 27, 2019

Hmm. @johnc1231, We should probably pick some time to go through all the BM functionality and verify that we have adequate tests for it all. It feels like we've had a never ending series of bugs discovered by users.

@johnc1231

This comment has been minimized.

Copy link
Contributor Author

commented Sep 27, 2019

Yeah, agreed. I've never taken inventory of the full test suite. This was the first bug that caused an error message though, previous ones have just been about things being too slow. So we'll need benchmarks to catch that stuff.

@johnc1231

This comment has been minimized.

Copy link
Contributor Author

commented Sep 27, 2019

Also I want to profile if it's worth it to have this separate BlockMatrixFilterColsRDD because I have doubts it makes a big difference but it definitely adds a lot of developer overhead and leads to bugs in filterRows that don't exist in filter, which was the case here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.