update generator used for sparse matrix vector product #450
Conversation
When I test this on my local machine, the same tests fail with my changes as without them, and I'm not quite sure how to interpret the Travis CI results, so any feedback would be appreciated.
There is a test failure, essentially because the matrix multiplication code would previously add zeros of type mpc to mpf terms, producing mpc's, but now that those terms are skipped, the mpf's are preserved. It's not really a bug; to what extent the types of entries in matrices should be preserved isn't even well-defined (I wouldn't mind having separate real and complex matrix types, to be honest). I'd be happy with just updating the doctest to reflect the new result. By the way, the inner loop should probably be optimized by reading `self.__data[]` and `other.__data[]` directly instead of using `self[]` and `other[]`. (You could also store `xrange(other.__rows)`, `self.__data` and `other.__data` in local variables, but that probably doesn't save many cycles.)
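The type-preservation behavior described above can be illustrated with a minimal standalone sketch. The dicts here are a hypothetical stand-in for the dict-backed matrix storage, not mpmath's internal layout:

```python
from mpmath import mpf, mpc, fdot

# Keys are (row, col); missing keys mean zero.
A = {(0, 0): mpf(2)}                       # (0, 1) is absent, i.e. zero
x = {(0, 0): mpf(3), (1, 0): mpf(5)}

# Old behaviour: missing entries are padded with a zero, and if that zero
# happens to be an mpc, it promotes the whole inner product to mpc.
zero = mpc(0)
old = fdot((A.get((0, k), zero), x.get((k, 0), zero)) for k in range(2))
# old == 6, but its type is mpc

# New behaviour: terms whose key is missing are skipped entirely, so the
# mpf type of the stored entries is preserved.
new = fdot((A[0, k], x[k, 0]) for k in range(2)
           if (0, k) in A and (k, 0) in x)
# new == 6, and its type is mpf
```

This is why the doctest output changes: the value of the product is the same, only its type differs.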
I changed this to directly access
Checking up on the state of this and whether the doctests will be updated. |
It's been a while since I was thinking about this, but I think that `self_get` and `other_get` are not needed with my original commit; we are manually checking whether the key exists and doing nothing if it doesn't, so the keys should always exist.
Looks good, but I would prefer a better commit message and some simple benchmarks (how much worse this PR is in the dense case, how much better in the sparse case).
You can either rebase the PR or just add a comment in the PR thread.
The cost is at most 2x, since the only extra work is checking whether a key exists (which now happens twice when it does exist). In practice, I suspect the mpf arithmetic will be the dominant cost anyway. For sparse matrices, the runtime is almost arbitrarily better depending on the sparsity pattern and the cost of a dot product of mpf's. In the extreme case that each inner product is zero just by the sparsity pattern, the new code does no mpmath evaluation at all. Here is a summary of a quick benchmark:
Dense:
Sparse:
If
Sparse:
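A minimal harness for this kind of comparison might look as follows. This is an illustrative sketch, not the script behind the numbers above; `bench` and its sizes are hypothetical:

```python
import random
import time
from mpmath import matrix, rand

def bench(n, density, repeats=3):
    # Build an n x n mpmath matrix where roughly `density` of the entries
    # are set; unset entries are absent from the dict-backed storage.
    A = matrix(n, n)
    for i in range(n):
        for j in range(n):
            if random.random() < density:
                A[i, j] = rand()
    # Time the matrix product, keeping the best of a few repeats.
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        A * A
        times.append(time.perf_counter() - t0)
    return min(times)

# Illustrative comparison:
# dense_t  = bench(40, 1.0)    # every entry present
# sparse_t = bench(40, 0.05)   # ~5% of entries present
```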
The default matrix class stores its entries in a dictionary, but when computing a matrix product, the current code iterates over the entire inner product of a row of the first matrix and a column of the second. Entries that do not exist in the dictionary are treated as zeros, so many unnecessary terms are computed in this inner product.
We can check that the keys `(i,k)` and `(k,j)` exist in the left and right matrices of the product, respectively, inside the generator that gets fed to `fdot`. Checking whether a key exists is very fast (O(1) average case), so this is a decent way to avoid iterating over unnecessary elements. I'm not sure what types of applications people usually use this library for, so perhaps there are cases where the overhead of checking whether the keys exist will slow down some people's code. That said, I expect the overhead to be small, so it would only be noticeable for small dense matrix multiplications at low precision. As soon as any sparsity is present, the method in this pull request becomes much faster because the number of products of mpmath numbers is reduced.
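A standalone sketch of the idea (mpmath's actual code lives inside the matrix class and uses its private `__data` dict; the plain dicts and the `dot_entry` helper here are hypothetical stand-ins):

```python
from mpmath import fdot, mpf

def dot_entry(a_data, b_data, i, j, inner_dim):
    # Entry (i, j) of the product: the inner product of row i of A with
    # column j of B.  Terms whose key is missing from either dict are
    # zero anyway, so they are skipped instead of multiplied and added.
    return fdot((a_data[i, k], b_data[k, j])
                for k in range(inner_dim)
                if (i, k) in a_data and (k, j) in b_data)

# Example: a 1x2 row times a 2x1 column with one missing (zero) entry.
a = {(0, 0): mpf(2)}                      # (0, 1) absent => zero
b = {(0, 0): mpf(4), (1, 0): mpf(5)}
dot_entry(a, b, 0, 0, 2)                  # only the k=0 term is evaluated
```

If every term is filtered out, `fdot` of the empty generator simply returns zero, which is exactly what the dense loop would have produced.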
This can easily be improved further (a reduction from O(nmk) to O(nnz(self)*k)) by looping only over the nonzero entries in `self`; however, that would change the order of some operations, so the outputs would differ up to machine precision. I have this implemented on my fork, but didn't want to open a pull request for a more significant change until I'm more familiar with this project's workflow.
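That further optimization could look roughly like the following. This is a hypothetical standalone sketch, not the code on the fork; note that grouping terms per output entry changes the summation order relative to the dense loop, which is the source of the machine-precision differences:

```python
from collections import defaultdict
from mpmath import fsum, mpf

def matmul_nnz(a_data, b_data, out_cols):
    # a_data, b_data: dicts mapping (row, col) -> mpf; missing keys are zero.
    # The outer loop runs over the stored entries of the left matrix only,
    # giving O(nnz(a) * out_cols) work instead of O(n * m * out_cols).
    terms = defaultdict(list)
    for (i, k), a_ik in a_data.items():
        for j in range(out_cols):
            if (k, j) in b_data:
                terms[i, j].append(a_ik * b_data[k, j])
    # Sum the collected products for each output entry.
    return {key: fsum(vals) for key, vals in terms.items()}

a = {(0, 0): mpf(1), (0, 1): mpf(2)}
b = {(0, 0): mpf(3), (1, 0): mpf(4)}
matmul_nnz(a, b, 1)      # the only stored entry is (0, 0) -> 1*3 + 2*4
```

Output entries that receive no terms are simply never created, so the result stays as sparse as the inputs allow.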