Sparse matrix equality efficient #480
Note: This PR comes from [https://github.com/apache/spark/pull/8960] and [https://issues.apache.org/jira/browse/SPARK-10906].
case _ => false
override def equals(p1: Any) = (this, p1) match {
case (x: CSCMatrix[V], p1: CSCMatrix[_]) =>
x.rows == p1.rows && x.cols == p1.cols && x.activeSize == p1.activeSize &&
I think this is buggy. If there is an implicit 0 in one matrix and a corresponding explicit 0 in the other, then this will fail.
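To make the concern concrete, here is a minimal sketch using a hypothetical model (the names `ExplicitZeroDemo`, `Stored`, `naiveEquals` and `semanticEquals` are illustrative, not Breeze's API): a sparse matrix is reduced to its stored entries keyed by (row, col). Two matrices that hold the same values, one with an implicit zero at (1, 1) and one with an explicit 0.0 stored there, fail an equality check that compares `activeSize` first.

```scala
object ExplicitZeroDemo {
  // Hypothetical model (not Breeze's CSCMatrix): a sparse matrix is just its stored
  // ("active") entries keyed by (row, col); any position not stored is an implicit zero.
  final case class Stored(rows: Int, cols: Int, entries: Map[(Int, Int), Double]) {
    def activeSize: Int = entries.size
    def apply(r: Int, c: Int): Double = entries.getOrElse((r, c), 0.0)
  }

  // Compares activeSize first, like the snippet under review, then the stored values.
  def naiveEquals(x: Stored, y: Stored): Boolean =
    x.rows == y.rows && x.cols == y.cols && x.activeSize == y.activeSize &&
      x.entries.forall { case ((r, c), v) => y(r, c) == v }

  // Reference semantics: every position must hold the same value, stored or not.
  def semanticEquals(x: Stored, y: Stored): Boolean =
    x.rows == y.rows && x.cols == y.cols &&
      (0 until x.rows).forall(r => (0 until x.cols).forall(c => x(r, c) == y(r, c)))

  val implicitZero = Stored(2, 2, Map((0, 0) -> 1.0))                // (1, 1) not stored
  val explicitZero = Stored(2, 2, Map((0, 0) -> 1.0, (1, 1) -> 0.0)) // (1, 1) stored as 0.0
}
```

Here `naiveEquals(implicitZero, explicitZero)` is false only because the active sizes differ (1 vs 2), while `semanticEquals` reports the matrices as equal.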
@jkbradley
Please see my comment below.
If one matrix is dense, I think it's OK to take time linear in the size of the dense matrix (since you have to anyway). The key optimization is for two sparse matrices.
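A minimal sketch of the dense/sparse case under hypothetical types (`DenseVsSparse`, `Dense`, `Sparse` and `denseEqualsSparse` are illustrative names, not Breeze classes): one pass over every dense position, looking each one up in the sparse matrix. Explicit zeros in the sparse matrix need no special handling here, because a stored 0.0 and an unstored position both read back as 0.0.

```scala
object DenseVsSparse {
  // Hypothetical column-major dense matrix (not Breeze's DenseMatrix).
  final class Dense(val rows: Int, val cols: Int, val data: Array[Double]) {
    def apply(r: Int, c: Int): Double = data(c * rows + r)
  }

  // Hypothetical sparse matrix given by its stored entries; unstored positions are 0.0.
  final class Sparse(val rows: Int, val cols: Int, val entries: Map[(Int, Int), Double]) {
    def apply(r: Int, c: Int): Double = entries.getOrElse((r, c), 0.0)
  }

  // One pass over every dense position, looking each one up in the sparse matrix.
  // Explicit zeros in the sparse matrix compare equal to the dense zeros automatically.
  def denseEqualsSparse(d: Dense, s: Sparse): Boolean =
    d.rows == s.rows && d.cols == s.cols &&
      (0 until d.cols).forall { c =>
        (0 until d.rows).forall(r => d(r, c) == s(r, c))
      }
}
```

With a real CSC layout each lookup would be a binary search within a column, so walking the stored entries of each column alongside the dense column would avoid that extra log factor; the map-based model above keeps the sketch short.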
I do not think it is possible to have an explicit zero in a CSCMatrix. Consider the conditional in the update function and the locate function, which uses binarySearch: if the element is not found, ind will be < 0. In that case, if v is zero, we update nothing; if it isn't, we update the array value. I have updated the PR with an additional test that shows that explicit zeros become implicit.
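The update logic described above can be modelled on a single CSC column. This is a simplified, hypothetical sketch (`UpdateDemo`, `SparseColumn` and its hand-written binary search are illustrative, not Breeze's source): a new entry is inserted only when it is non-zero, but an entry that is already stored is overwritten unconditionally. That second branch is one way an explicit zero can still appear: setting an existing entry to 0.0 keeps it in the arrays.

```scala
import scala.collection.mutable.ArrayBuffer

object UpdateDemo {
  // Hypothetical model of one CSC column: sorted row indices plus their values.
  final class SparseColumn {
    val rowIndices = ArrayBuffer.empty[Int]
    val data = ArrayBuffer.empty[Double]

    // Binary search: index of `row` if stored, otherwise -(insertionPoint) - 1.
    private def locate(row: Int): Int = {
      var lo = 0
      var hi = rowIndices.length - 1
      while (lo <= hi) {
        val mid = (lo + hi) >>> 1
        val r = rowIndices(mid)
        if (r < row) lo = mid + 1
        else if (r > row) hi = mid - 1
        else return mid
      }
      -(lo + 1)
    }

    def update(row: Int, v: Double): Unit = {
      val ind = locate(row)
      if (ind >= 0) data(ind) = v // existing entry: overwritten, even with 0.0
      else if (v != 0.0) {        // new entry: inserted only when non-zero
        val at = -(ind + 1)
        rowIndices.insert(at, row)
        data.insert(at, v)
      }
    }

    def activeSize: Int = rowIndices.length
  }
}
```

For example, after `col(2) = 5.0` followed by `col(2) = 0.0`, the column still reports an active size of 1, with 0.0 stored explicitly.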
Is that still true if you use the constructor? I don't think it is.
@jkbradley
I think he means the new CSCMatrix constructor that takes an array. I'm OK with that possible equality breakage so long as we add a warning to … -- David (email reply to rahul palamuttam's message of Wed, Jan 6, 2016 at 2:11 PM)
@dlwh In MLlib, we'd like equality to ensure semantic equality. Would you mind if we did the more careful check in Breeze? If you think it's too expensive, then we can implement it in MLlib instead. @rahulpalamuttam As @dlwh said, I meant that MLlib sometimes creates a Breeze CSCMatrix using a constructor that does not check for explicit zeros. Because of that, we need to assume Breeze matrices could contain explicit zeros. Note also that the update() method could create an explicit 0 if you set an existing entry to 0.
OK, sure. In that case we probably need a loop that walks over columns in … Probably only copy the "smallVectors" for starters unless you're feeling … -- David (email reply to jkbradley's message of Wed, Jan 6, 2016 at 2:32 PM)
(Let me know if you need help with it!) (Email reply to David Hall's message of Wed, Jan 6, 2016 at 2:39 PM)
@dlwh As you suggested, I ended up using a sequence of while loops over both iterators, with additional while loops to skip over the zeros. The last pair of while loops ensures that whichever iterator still has elements contains only zeros. I'm not sure whether there are hidden implications or edge cases here that I didn't consider.
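A minimal sketch of the zero-skipping comparison described above, assuming both sides are iterators of (key, value) pairs for stored entries produced in the same column-major key order, and that the caller has already compared rows and cols. `ZeroSkippingEquals` and `activeEquals` are hypothetical names, and `Iterator.filter` stands in for the nested skip-zero while loops; this is not the code that was merged.

```scala
object ZeroSkippingEquals {
  // Compares two streams of stored entries, ignoring any explicit zeros.
  // Assumes both iterators yield keys in the same (column-major) order.
  def activeEquals[K](xs: Iterator[(K, Double)], ys: Iterator[(K, Double)]): Boolean = {
    // Dropping zero-valued entries up front replaces the inner "skip zeros" loops.
    val x = xs.filter(_._2 != 0.0)
    val y = ys.filter(_._2 != 0.0)
    while (x.hasNext && y.hasNext) {
      val (kx, vx) = x.next()
      val (ky, vy) = y.next()
      if (kx != ky || vx != vy) return false
    }
    // Any leftover non-zero entry on either side means the matrices differ.
    !x.hasNext && !y.hasNext
  }
}
```

Filtering first also covers the case where the last remaining entry on one side is a zero: it is dropped before the final `hasNext` check, so it cannot be mistaken for a non-zero leftover.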
while(ykeyval._2 == 0 && yIter.hasNext) ykeyval = yIter.next()
if(xkeyval != ykeyval) return false
}
if(xIter.hasNext == true && yIter.hasNext == false){
I'd rather write (xIter.hasNext && !yIter.hasNext) here (and likewise on the next one).
fixed with latest commit
nitpick, then lgtm!
x.rows == y.rows && x.cols == y.cols &&
keysIterator.forall(k => x(k) == y(k))
case _ =>
return false
oh this should be super.==(p1)
Oh, this is in Matrix.scala, not CSCMatrix.scala, so I am not sure you want to use the superclass's equals method. Do you want to override the equals method in CSCMatrix? It doesn't have an explicit implementation in CSCMatrix.scala.
oh, blah, sorry. maybe move the CSC/CSC check to CSC itself?
done
well, one other check
How about this? I also went back to matching on the input matrix instead of matching on the pair of matrices.
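A sketch of the structure settled on in this thread, using hypothetical types `Mat` and `SparseMat` inside an illustrative `EqualsPlacement` object (not Breeze's Matrix/CSCMatrix): the generic element-by-element comparison stays in the base type, while the sparse/sparse fast path lives in the sparse class, matches only on the input, and otherwise defers to `super.equals`. `hashCode` is overridden alongside `equals`; it is kept deliberately coarse (dimensions only) so that matrices differing only in explicit zeros still hash equally.

```scala
object EqualsPlacement {
  // Hypothetical base type standing in for a generic matrix.
  trait Mat {
    def rows: Int
    def cols: Int
    def apply(r: Int, c: Int): Double

    // Generic fallback: compare every position, O(rows * cols).
    override def equals(other: Any): Boolean = other match {
      case y: Mat =>
        rows == y.rows && cols == y.cols &&
          (0 until rows).forall(r => (0 until cols).forall(c => apply(r, c) == y(r, c)))
      case _ => false
    }

    // Coarse but consistent with equals: equal matrices always share dimensions,
    // and explicit zeros cannot change the hash.
    override def hashCode(): Int = 31 * rows + cols
  }

  // Hypothetical sparse matrix given by its stored entries.
  final class SparseMat(val rows: Int, val cols: Int, val entries: Map[(Int, Int), Double])
      extends Mat {
    def apply(r: Int, c: Int): Double = entries.getOrElse((r, c), 0.0)

    // Sparse/sparse fast path: explicit zeros are dropped before comparing,
    // so the cost is proportional to the number of stored entries.
    override def equals(other: Any): Boolean = other match {
      case y: SparseMat =>
        rows == y.rows && cols == y.cols &&
          entries.filter(_._2 != 0.0) == y.entries.filter(_._2 != 0.0)
      case _ => super.equals(other)
    }
  }
}
```

Comparing a `SparseMat` with any other `Mat` falls through to the generic check, which keeps equality symmetric in both directions.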
lgtm, thanks!
…icient Sparse matrix equality efficient
This pull request originates from a requirement in Spark for a more efficient SparseMatrix equality check. When comparing two CSCMatrices, it uses activeKeysIterator instead of keysIterator, so only the keys of stored (active) entries are returned and checked, rather than every position in the matrix. An extra conditional makes sure that the active sizes of both SparseMatrices are the same.
I tried to implement the case where equality needs to be checked between a SparseMatrix and a DenseMatrix. However, I couldn't find a way of doing this without traversing the DenseMatrix at least once.