
update generator used for sparse matrix vector product #450

Merged · 8 commits · Apr 20, 2023

Conversation

@tchen01 (Contributor) commented Mar 12, 2019

While the default matrix class stores its entries as a dictionary, the current code computes a matrix product by running over the entire inner product of a row of the first matrix and a column of the second. Entries that do not exist in the dictionary are treated as zeros, so many unnecessary terms are computed in this inner product.

Instead, in the generator that gets fed to fdot, we can check that the keys (i,k) and (k,j) exist in the left and right matrices respectively. Checking whether a key exists is very fast (O(1) on average), so this is a cheap way to avoid iterating over unnecessary elements.
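A minimal sketch of the idea (standalone dicts stand in for the matrix class's private `__data` attribute, and the function name is made up for illustration):

```python
from mpmath import fdot

def sparse_matmul(A_data, B_data, n, m, p):
    """Multiply an n-by-m matrix by an m-by-p matrix, each stored as a
    dict mapping (row, col) -> mpf, with absent keys meaning zero."""
    C_data = {}
    for i in range(n):
        for j in range(p):
            # fdot accepts an iterable of (a, b) pairs and sums a*b with
            # a single final rounding; keys absent from either dict are
            # implicit zeros and are simply skipped.
            s = fdot((A_data[i, k], B_data[k, j])
                     for k in range(m)
                     if (i, k) in A_data and (k, j) in B_data)
            if s:
                C_data[i, j] = s
    return C_data
```

The generator still visits every k, but an mpf multiplication is only paid for when both keys exist.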

I'm not sure what types of applications people usually use this library for, so perhaps there are cases where the overhead of checking whether the keys exist will slow down some people's code. That said, I expect the overhead to be small, so it should only be noticeable for small, dense matrix multiplications at low precision. As soon as any sparsity is present in the matrix, the method in this pull request becomes much faster because the number of products of mpmath numbers is reduced.

This can easily be improved further (a reduction from O(nmk) to O(nnz(self)*k)) by looping only over the nonzero entries of self, sketched below; however, that would change the order of some operations, so the outputs would differ (up to machine precision). I have this implemented on my fork, but didn't want to make a pull request for a more significant change until I am more familiar with the workflow of this project.
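A rough sketch of that variant, again with plain dicts standing in for the matrix internals; the running accumulation (instead of fdot's single-rounding sum) is exactly where the order of operations changes:

```python
from mpmath import mp

def sparse_matmul_nnz(A_data, B_data, p):
    """O(nnz(A)*p) variant: visit only A's stored (nonzero) entries and
    scatter each contribution into the output dict."""
    C_data = {}
    for (i, k), a in A_data.items():
        for j in range(p):
            b = B_data.get((k, j))
            if b is not None:
                # Running accumulation: the order of additions differs
                # from the fdot-based version, hence tiny rounding
                # differences in the last bits.
                C_data[i, j] = C_data.get((i, j), mp.zero) + a * b
    return C_data
```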

@tchen01 (Contributor, Author) commented Mar 13, 2019

When I test this on my local machine, exactly the same tests fail with my changes as without them, and I'm not quite sure how to interpret the Travis CI results, so any feedback would be appreciated.

@fredrik-johansson (Collaborator) commented

There is a test failure essentially because the matrix multiplication code would previously add zeros of type mpc to mpf terms, creating mpc's, but now that the terms are skipped, it preserves the mpf's.

It's not really a bug; to what extent the types of entries in matrices should be preserved isn't even well-defined (I wouldn't mind having separate real and complex matrix types to be honest). I'd be happy with just updating the doctest to reflect the new result.
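For concreteness, the promotion in question:

```python
from mpmath import mpf, mpc

x = mpf(1)
z = mpc(0)           # a complex zero
print(type(x + z))   # mpc: adding a complex zero promotes the result,
                     # which the old code did and the new code skips
```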

By the way, the inner loop should probably be optimized by reading self.__data[] and other.__data[] directly instead of using self[] and other[].

(You could also store xrange(other.__rows) , self.__data and other.__data in local variables, but that probably doesn't save many cycles.)
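Combining both suggestions, roughly (a sketch from outside the class, so the private attribute appears under its name-mangled spelling; these internals are of course version-dependent):

```python
from mpmath import mp

def matmul_direct(A, B):
    """Multiply two mpmath matrices, reading the private entry dicts
    directly and localizing every name the inner loop touches."""
    A_data = A._matrix__data   # what self.__data mangles to outside the class
    B_data = B._matrix__data
    inner = range(B.rows)      # the code of that era used xrange
    fdot = mp.fdot
    C = mp.matrix(A.rows, B.cols)
    for i in range(A.rows):
        for j in range(B.cols):
            # Two O(1) dict lookups per surviving term, no __getitem__ calls.
            C[i, j] = fdot((A_data[i, k], B_data[k, j])
                           for k in inner
                           if (i, k) in A_data and (k, j) in B_data)
    return C
```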

@tchen01 (Contributor, Author) commented Mar 21, 2019

I changed this to access self.__data and other.__data directly.

@tchen01 (Contributor, Author) commented Jan 12, 2020

Checking in on the state of this, and on whether the doctests will be updated.

@tchen01 (Contributor, Author) left a comment

It's been a while since I was thinking about this, but I believe self_get and other_get are not needed with my original commit; we manually check whether the key exists and do nothing if it doesn't, so the keys should always exist.

@skirpichev (Collaborator) left a comment

Looks good, but I would prefer a better commit message and some simple benchmarks (how much worse this PR is in the dense case, how much better in the sparse case).

You can either rebase the PR or just add a comment in the PR thread.

@tchen01 (Contributor, Author) commented Apr 17, 2023

The cost is at most 2x, since the only extra work is checking whether the key exists (and when it does exist, the lookup now happens twice). In practice, I suspect the mpf arithmetic will be the dominant cost anyway.

The runtime for sparse matrices is almost arbitrarily better, depending on the sparsity pattern and the cost of a dot product of mpf's. In the extreme case where every inner product is zero purely because of the sparsity pattern, the new code performs no mpmath arithmetic at all. For $n\times n$ matrices that are diagonal or nearly diagonal, the cost of the new algorithm can be expected to be roughly $n$, versus $n^3$ for the old algorithm (assuming the cost of mpf arithmetic dominates the lookup cost).

Here is a summary of a quick benchmark (`newmul` denotes the patched multiplication):

```python
from mpmath import mp  # the context object, so mp.dps sets the precision
mp.dps = 100

n = 200

# dense test matrices
A = mp.randmatrix(n)
B = mp.randmatrix(n)

# sparse (diagonal) test matrices built from the first columns of A and B
DA = mp.diag(A[:,0])
DB = mp.diag(B[:,0])
```

Dense:

  • C = A.__mul__(B): 5.62 s ± 63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  • C = A.newmul(B): 6.73 s ± 34.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Sparse:

  • DC = DA.__mul__(DB): 3.83 s ± 24.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  • DC = DA.newmul(DB): 411 ms ± 3.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

With mp.dps = 20 and n = 500:

Dense:

  • C = A.__mul__(B): 1min 52s
  • C = A.newmul(B): 2min 12s

Sparse:

  • DC = DA.__mul__(DB): 1min
  • DC = DA.newmul(DB): 6.33 s

@skirpichev merged commit ae61088 into mpmath:master Apr 20, 2023
6 checks passed