
Faster expm #354

Closed
wants to merge 5 commits into from

Conversation

alexleach

Hi,

As per the discussions on the Scipy-Dev and Scipy-User mailing lists, I've implemented a couple of the EXPOKIT Fortran routines, as an alternative for scipy.linalg.expm.

Merging this into SciPy is still work in progress, but I hope someone can pick up where I left off, and fully integrate this into SciPy.

Work I think is left to be done:-

  1. time_expm.py - Get the sparse matrix routines to agree with scipy.linalg.expm results. Currently it returns only a vector when it should return a sparse matrix.
  2. expowrap.py - When time_expm.py works for sparse matrices, copy the expmwrap function to expowrap.expm() and delete time_expm.py.
  3. Put the EXPOKIT copyright somewhere? (See author's authorisation on SciPy-user discussion.)
  4. Get the appropriate unit tests in scipy.linalg.tests.test_matfuncs to run on EXPOKIT wrapper functions.
  5. Update scons and bento build systems.
  6. Document the shared library API. I tried to do this in the signature files, but failed to add multi-line documentation to the PyMethodDef table. Each function's .doc attribute contains the required argument data types, though.

And probably some other things...
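For item 4, a minimal parity test could look like the sketch below. `expokit_expm` is a hypothetical name for the wrapper under test; it's stood in here with scipy.linalg.expm itself so the sketch runs as-is.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical name for the EXPOKIT wrapper under test; stand it in with
# scipy.linalg.expm so this sketch is runnable on its own.
expokit_expm = expm

def test_expm_parity(n=6, seed=0):
    # Compare the wrapper against the reference implementation on a
    # random dense matrix; tolerances are arbitrary placeholders.
    A = np.random.default_rng(seed).standard_normal((n, n))
    np.testing.assert_allclose(expokit_expm(A), expm(A), rtol=1e-12, atol=1e-12)

test_expm_parity()
```

The real test would import the wrapper instead of aliasing expm, and would loop over the dtypes and matrix classes already covered in test_matfuncs.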

There are Chebyshev routines in Expokit which might offer speed improvements over the SciPy Chebyshev functions, but I haven't needed them, so I've done nothing with them. Some of them are used internally by the sparse matrix functions.

The signature file contains auto-generated signatures for all the functions in Expokit, including the ones whose wrappers I didn't use or modify. It's probably safe to delete the signatures for those.

The only routines I think need to be accessible from Python, for expm to work, are:-

dgpadm - Padé approximation of a dense, double precision matrix
zgpadm - Padé approximation of a dense, complex matrix
dgexpv - Krylov projection of a general sparse, double precision matrix
zgexpv - Krylov projection of a general sparse, complex matrix

However, there are routines optimised for other specific types of matrices, including symmetric and Hermitian dense and sparse matrices. It might be desirable to keep the wrappers for these routines, even though I don't think they are currently available in SciPy's expm[23] functions...
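For orientation, the two dense cases above correspond to what scipy.linalg.expm already computes (the routine names are EXPOKIT's; the calls below are plain SciPy); a quick sanity sketch:

```python
import numpy as np
from scipy.linalg import expm

# Dense real case (dgpadm's job): A is nilpotent, so exp(A) = I + A exactly.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
assert np.allclose(expm(A), [[1.0, 1.0], [0.0, 1.0]])

# Dense complex case (zgpadm's job): exp(i*pi) = -1 on the diagonal.
B = np.diag([1j * np.pi, 0.0])
assert np.allclose(expm(B), np.diag([-1.0, 1.0]))
```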

I hope this helps to get things moving along.

Kind regards,
Alex

@pv
Member

pv commented Nov 22, 2012

Thanks. We might also want to wrap the expokit Krylov routines for supporting sparse and matrix-free operators.

@alexleach
Author

Yes, scipy.linalg.expm also supports sparse matrices, so not supporting them would be a regression. I tried to wrap the matrix-free Krylov routines (dgexpv and zgexpv), in order to support sparse matrices, but as mentioned above, I couldn't get consistent results with the native SciPy functions.

My testing procedure was as follows:-

  1. cd into scipy/linalg
  2. Compile expokit.so with: `f2py [--fcompiler=intelem] -m expokit -c expokit.pyf.src src/expokit.f --link-lapack_opt`
  3. Run time_expm.py, which currently only tests the sparse routines. (other sanity tests commented out; exit() called before the time comparisons are made)
  4. Make changes to expokit.pyf.src and / or time_expm.py
  5. If edited expokit.pyf.src, repeat steps 2-4. If only changed time_expm.py, repeat steps 3-4

@alexleach
Author

It's also worth mentioning that there is an API difference between expowrap.expm(A, t=None) and scipy.linalg.expm(A, q=None).

When I used it, I needed to specify the time t by which to multiply the matrix before exponentiating it, i.e. exp(A*t). This functionality is not available in SciPy's expm function, but I've left it in the expokit wrapper, as I think it could be valuable to others too. If left unspecified, the signature file defaults to t=1.
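With plain scipy.linalg.expm, the same effect as EXPOKIT's t argument is obtained by scaling the matrix before the call; a small sketch:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])  # generator of plane rotations
t = 0.5

# exp(A*t), as EXPOKIT's t argument computes it, is just expm(t * A):
R = expm(t * A)

# For this generator, exp(t*A) is a rotation by t radians.
expected = np.array([[np.cos(t), np.sin(t)],
                     [-np.sin(t), np.cos(t)]])
assert np.allclose(R, expected)
```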

@Tillsten

@rgommers can this go into 0.12?

@pv
Member

pv commented Feb 11, 2013

It's a very good start, but not ready to merge just yet:

  • The expm implementation isn't exposed.
  • expokit.py -> _expokit.py.
  • Need to decide whether this should be the default implementation (in the long run, probably yes) or not, and what to do with the other expm functions. method keyword?
  • The logic in expm for choosing dspadm (symmetric matrix) vs. dgpadm (general matrix) doesn't look right.
  • Would be nice to have also the Krylov routines from Expokit working here, for sparse problems.

@alexleach
Author

Sorry to have left it in this state! It's really below par compared with the rest of SciPy. I was stumped trying to fix the sparse routines and, having no actual need to use them myself, had to divert my attention elsewhere before tidying the rest of it.

I do have some thoughts to add, though:

  • There's no expokit.py module in the above commits. I named the python wrapper expowrap.py, so as not to clash with the expokit shared library.
  • SciPy's current expm functions are all in linalg/matfuncs.py and are imported into the linalg namespace from __init__.py. For now, I think it's best to keep these as they are, because the expokit methods seem to produce different results from the Python implementations. time_expm.py has a hard-coded precision down to which it compares the results. I think these precision differences should be tested quite thoroughly before any replacement is considered.
  • A few weeks ago I thought the fastest way to get this production-ready would be as a separate sub-module, rather than trying to replace current functionality. Expokit has a few extra methods which could be useful on their own, so I thought expokit.py could contain thin, well-documented function wrappers for each routine exposed in _expokit.so. Expokit itself is very well documented, so one can copy documentation straight from its source code. f2py is pretty awesome at auto-generating many of the required Fortran arguments at the C level. I've done most of the work with the dense exponential routines, but my experience with sparse matrices is very limited, so I need some help there. How does that sound to you, anyway?

@pv
Member

pv commented Feb 12, 2013

I think we want to have the routine on top level of scipy.linalg. I don't think adding a submodule for this is good, as I think the aim is primarily to offer expm functionality and not a wrapper to expokit. (Also, expowrap.py -> _expowrap.py to keep the organization private.)

OTOH, we already have expm, expm2, expm3, which starts getting a bit silly. Adding a method keyword to expm might be a better alternative than adding expm4 for this, and would also allow keeping the expokit version as a non-default one.

I think it would be good to understand the precision issues before merging this PR, or at least have certainty that the wrapper is working as expected. It's also possible Scipy's current expm routines aren't as accurate, although Higham's Pade approximants should in principle be good. The more problems that are addressed now as opposed to later, the better --- there have been several examples in Scipy on what happens if you leave things for "later"...

I think support for sparse matrices would be nice, but that can be left also for later. Expokit's routines here take a matvec function, so one needs to tell f2py to have a callback function. The Python side should be written so that any scipy.sparse.linalg.LinearOperator can be passed in.

The present version looks already quite good, but it needs some polish (+ the precision issue).

@Tillsten

@pv Only 3? Still 16 expm functions to go: http://www.cs.cornell.edu/cv/researchpdf/19ways+.pdf .

I really think the ability to provide t as an expm argument would be great in SciPy, but sadly my knowledge of wrapping Fortran code is nonexistent, so I can't help. For precision issues, I would probably believe expokit more than scipy.

@alexleach
Author

So was mine 'til I needed expm. numpy's f2py makes it amazingly simple though!

Just seen a nice-looking tutorial here: http://websrv.cs.umt.edu/isis/index.php/F2py_example
I used the reference manual here, though: http://cens.ioc.ee/projects/f2py2e/usersguide/index.html

To clarify, the .pyf wrappers in the commit were automatically generated with f2py. I manually edited the wrappers for a couple of the routines, listed as "IMPLEMENTED"; the "unimplemented" ones should, I think, be brought up to a similar standard of automatic argument generation and then, of course, tested...

@argriffing argriffing mentioned this pull request May 3, 2013
@ev-br
Member

ev-br commented Sep 21, 2013

What is the status of this --- is it superseded by gh-2441 and further activity, or is there something left here?

@pv
Member

pv commented Sep 21, 2013

It's not clear if EXPOKIT is more effective than the Higham algorithms @argriffing implemented. We could add a method keyword to the expm functions to do this, but I guess this is now at a lower priority.

AFAIK, the feature set of the Krylov expm approximants in EXPOKIT is also covered by the Al-Mohy & Higham algorithm implemented in expm_multiply. I'd need to reread the expokit docs to be sure there's nothing missing.

@argriffing
Contributor

The only routines I think need to be accessible from Python, for expm to work, are:-

dgpadm - Padé approximation of dense, double precision matrix.
zgpadm - Padé approximation of dense, complex matrix
dgexpv - Krylov projection of general sparse, double precision matrix
zgexpv - Krylov projection of general sparse, complex matrix

I think this subset is already covered in scipy, between expm and expm_multiply. I don't know how the scipy functions compare to EXPOKIT regarding accuracy or speed.

@alexleach
Author

I haven't tested this code since the new expm was introduced and I couldn't comment on its accuracy, but I did find EXPOKIT's dgpadm quite a bit faster than SciPy's linalg.expm when I used it. I found most of the heavy lifting was done in BLAS (dgemm) in both implementations, so linking against Nvidia's cublas had some interesting results ;)

I don't have my working copy checked out atm, so can't really be much more help I'm afraid (for the next couple of weeks, at least).

Maybe it would be a good idea to add a module to benchmark the different implementations. Just had a look and perhaps scipy.linalg.benchmarks.bench_expm would make a worthwhile module? time_expm.py in the PR might make a start...
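A bench_expm module could start from something as simple as the sketch below: pure timeit over random dense matrices. The sizes and repeat counts here are arbitrary choices, not the ones used in the PR.

```python
import timeit

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
for n in (5, 20, 50):
    A = rng.standard_normal((n, n))
    # Average wall time per expm call over 20 repetitions.
    dt = timeit.timeit(lambda: expm(A), number=20) / 20
    print(f"n={n:3d}  expm: {dt * 1e3:8.3f} ms")
```

A fuller benchmark would time each implementation side by side and report mean ± standard deviation, as the tables below do.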

@alexleach
Author

Just had a little play with the code and tidied it up somewhat, as per above suggestions by @pv.
I'm stuck with the sparse routines, so it'd be good if someone more experienced with sparse matrices could help with that...

Thought people might like to see some benchmark results, so here they are, for dense, square matrices up to 50x50 in size:

test_times (__main__.TimeExpokit) ... Times averaged over 1000 iterations
Array size               EXPOKIT times            SciPy times
5                         0.070 ± 0.011            0.229 ± 0.007
10                        0.082 ± 0.009            0.258 ± 0.003
15                        0.104 ± 0.006            0.296 ± 0.007
20                        0.151 ± 0.013            0.373 ± 0.004
25                        0.209 ± 0.006            0.436 ± 0.020
30                        0.243 ± 0.016            0.467 ± 0.008
35                        0.332 ± 0.013            0.596 ± 0.025
40                        0.292 ± 0.009            0.656 ± 0.022
45                        0.379 ± 0.007            0.749 ± 0.005
50                        0.405 ± 0.009            0.809 ± 0.025

I remember now that run-time ratios converge as the matrices get larger, but expokit seems to always have a slight lead. I used 200x200 matrices in practice, so if that's a likely use-case:

test_times (__main__.TimeExpokit) ... Times averaged over 1000 iterations
Array size               EXPOKIT times            SciPy times
200                       10.636 ± 0.127           13.504 ± 0.426

This was compiled with Intel Compilers, linked against threaded MKL and run on a 6-core Intel CPU with HT (Intel W3680 @ 3.33GHz).

There are still precision differences, but I couldn't comment on the reasons for that, whether the algorithm or some parameter given to it. I've tried to make bench_expm.py as informative as possible regarding precision differences, though I don't fancy bloating the comments with all the output. The Padé routines seem to match scipy down to 5 or 6 decimal places, for both real and complex dense matrices (dgpadm and zgpadm).

Will push my changes over now...

…name expowrap.py to expokit.py, providing public interface. Turn time_expm.py into a proper benchmark module (with unittests), and removed redundancy between time_expm.py and expowrap.py (now linalg.benchmarks.bench_expm and linalg.expokit). Sparse matrices still not working...
@pv
Member

pv commented Sep 23, 2013

@alexleach: which version of Scipy did you use for the benchmarks? Note that there have been quite big changes in Scipy expm routines recently.

@alexleach
Author

@pv: Yes, good point! The benchmark results are only from my fork, which is about 7 months old by the look of it. I've got release 0.12.0 installed as well, so I'll see how that compares.

@alexleach
Author

Against 0.12.0 release:

Array size               EXPOKIT times            SciPy times
5                         0.073 ± 0.001            0.253 ± 0.012
10                        0.079 ± 0.004            0.274 ± 0.001
15                        0.099 ± 0.002            0.303 ± 0.005
20                        0.149 ± 0.019            0.398 ± 0.027
25                        0.193 ± 0.004            0.445 ± 0.023
30                        0.255 ± 0.008            0.530 ± 0.002
35                        0.343 ± 0.015            0.569 ± 0.021
40                        0.306 ± 0.018            0.689 ± 0.020
45                        0.363 ± 0.014            0.760 ± 0.029
50                        0.402 ± 0.009            0.844 ± 0.012

The compile flags may have been slightly different, but both were built and linked against the same versions of icc, ifort and MKL.

Runtime setup:-

  • Delete the expokit install (from ~/.local/lib/python2.7/..)

  • Install the expokit build to a fake directory (python setup.py install --prefix=$PWD/install)

  • cd into the scipy/linalg/benchmarks directory.

  • Move expokit.py and _expokit.so into the benchmarks directory.

  • Change the import statements in bench_expm.py and expokit.py to import only the expokit module, and run with:-

    PYTHONPATH=$PWD python bench_expm.py

Absolute imports (of scipy.linalg.expm) therefore import from my system scipy install (/usr prefix), which is release 0.12.0.

@argriffing
Contributor

I'd like to investigate this in more detail, but I'll be unavailable for the next couple weeks. Here are some of my thoughts on expm before I leave. In any case, thanks for your work to look at putting EXPOKIT into scipy!

  • The scipy expm changes introduced since 0.12 have increased accuracy for some matrices for which 0.12 had particularly poor accuracy. I don't think I benchmarked the speed impact as much as I should have, so it's possible that the new expm is a lot slower. If so, it's probably because of my sloppy implementation of operator 1-norm estimation and LinearOperator interfacing; the changes to the algorithm itself should not cause more slowness.
  • The scipy expm is designed for 64 bit float, so it may have much worse accuracy or behave unpredictably for other data types (for example float32 and complex), although the tests seem to pass. For 64 bit float, I think the algorithm underlying scipy expm is designed to be about as accurate as the conditioning of the problem allows, but this might not be true or it might not be rigorously proved.
  • I remember Higham having compared his expm (from which scipy expm is implemented) vs. the expm from a publication by the expokit author, but I don't have the details right now.
  • You may know about this or have mentioned it, but these benchmarks exist.
  • Your test matrices were sampled using np.random.rand() which makes entrywise non-negative matrices. For non-negative matrices it's easy to compute the 1-norm of matrix powers, so it may not stress the 1-norm estimation in the scipy expm as much as other matrices would.
  • I would be interested to see if the EXPOKIT expm is OK with the Burkardt test matrix suite in the tests. Unfortunately this suite may be designed to stress 32 bit float instead of 64 bit float, but I'm not sure. I think EXPOKIT may in fact have been a motivation for this test suite so accuracy reports may already be available on the internet, but old results on the internet should be taken with a grain of salt because a few of the expected outputs were buggy and have been corrected or improved in the scipy implementation of the test suite.
  • You guys have already thought about EXPOKIT licensing?
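The point about non-negative test matrices can be made concrete: for entrywise non-negative A there is no cancellation, so the induced 1-norm of A^k is exactly the largest column sum of A^k, recoverable with k cheap matvecs on the all-ones vector, and nothing stresses the norm estimator. A sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
A = rng.random((n, n))  # entrywise non-negative, like np.random.rand

# Column sums of A^2 via two matvecs on the all-ones vector: ones @ A @ A.
ones = np.ones(n)
colsums = ones @ A @ A

# With no cancellation, the largest column sum equals the exact induced
# 1-norm of A^2, so the estimate is trivially sharp.
assert np.isclose(colsums.max(), np.linalg.norm(A @ A, 1))
```

Matrices with mixed signs (e.g. rng.random((n, n)) - 0.5) break this shortcut and exercise the estimator properly.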

@alexleach
Author

I just updated my working copy to scipy master, merged my changes and created a gist report containing the full output of python benchmarks/bench_expm.py -vvv, which shows the precision differences on 5x5 32bit and 64bit matrices. The sparse benchmark just fails.

See https://gist.github.com/alexleach/5e8a38d4e8bb90f4a53f

Would it make a huge mess of this PR if I updated my github fork to scipy's HEAD? Only one (minor) merge was necessary, in linalg/setup.py, but I'm now 2,148 commits ahead of alexleach/master.

@argriffing Licensing was discussed on the scipy-user mailing list. I did request and receive permission from the Expokit author; his full response is at: http://mail.scipy.org/pipermail/scipy-user/2012-November/033600.html

Thanks for linking those benchmarks and tests; I hadn't spotted them before. I might get a chance to test them out in a few weeks, but I'll have to leave this alone for the time being...

@argriffing
Contributor

To quickly check which is more accurate, maybe use something like expm(logm(rand())) and see which roundtrip causes less difference? This isn't perfect because it depends on logm accuracy.
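A sketch of that round trip (the diagonal shift keeps the spectrum away from logm's branch cut; as noted, the measured error conflates logm inaccuracy with expm inaccuracy):

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(1)
# Shift by 4*I so all eigenvalues sit well inside the right half-plane,
# keeping the principal matrix logarithm well-defined.
X = rng.random((4, 4)) + 4.0 * np.eye(4)

# Relative round-trip error of expm(logm(X)) vs X as a crude accuracy probe.
err = np.linalg.norm(expm(logm(X)) - X) / np.linalg.norm(X)
assert err < 1e-8  # the round trip should recover X to near machine precision
```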

@alexleach
Author

I ran 1,000 iterations of single random numbers and 1,000 runs with random 2x2 arrays, with slightly inconsistent results.

I put up the code I ran at https://gist.github.com/alexleach/4814fa84089ba67accb8 as a fairly minimal working example. scipy.linalg.expm seems marginally more accurate on average, but occasionally expokit outperforms linalg.expm.

Results over a run of 1,000 iterations:

Expokit wins    linalg wins    draws
84              151            765

@argriffing
Contributor

Even if this scipy EXPOKIT wrapper would not immediately replace scipy.linalg.expm I'd find value in functions like scipy.linalg.expokit.expm(..) or scipy.sparse.linalg.expokit.expm_multiply(...). Personally I would be OK with having the partial wrapper in scipy, including only the dense expokit expm which has been implemented and not waiting for the sparse expokit expm_multiply which still has wrapping errors.

Would it make a huge mess of this PR if I updated my github fork to scipy's HEAD?

I'm not good with git. The preferred way to do PRs is to make your own branch per PR instead of making changes to the master branch in your fork, and I'm not sure what is the best way to recover. If some version of this expokit wrapper would be merged into scipy master then I might do some more testing and benchmarking, but I'm not up for "cherry picking" or whatever would be required otherwise.

@argriffing
Contributor

@pv how would this fortran interact with your fortran-splitting work? In particular I'm thinking about how to modify the scipy/linalg/setup.py file. I basically don't know anything about fortran or about the scipy build process.

@argriffing
Contributor

Perhaps Fortran errors like https://github.com/alexleach/scipy/blob/1769fdd06b8b1348656bd2a60b7c2c62ad1ed9fa/scipy/linalg/src/expokit.f#L508 should be turned into scipy errors somehow, or maybe there is already a way to do this. The Fortran STOP is potent enough to kill the whole unit-testing process, not just its specific test.

@pv
Member

pv commented Oct 10, 2013

@argriffing: the "fortran-splitting" was just a work-around needed by
the ancient g77 compiler for the specific code issues in interpolative
module. I doubt it's needed here.

@argriffing
Contributor

@alexleach here's a PR where I've played around with your expokit wrapper #2974. It features more integration into scipy, adds some new tests, and currently does not have merge conflicts.

@argriffing
Contributor

For precision issues, i would probably believe expokit more than scipy.

Arbitrary precision matrix exponential seems to have been implemented here:
flintlib/arb@b78e176
This could be useful if we ever want to compare accuracy more rigorously.

@pv pv added the PR label Feb 19, 2014
@pv pv removed the PR label Aug 13, 2014
@argriffing argriffing mentioned this pull request Apr 23, 2015
@pv pv added the needs-work Items that are pending response from the author label Feb 16, 2016
@ajgpitch ajgpitch mentioned this pull request Apr 8, 2016
@auvipy

auvipy commented Apr 18, 2016

is this stalled?

@argriffing
Contributor

@auvipy Yes it's stalled as of two and a half years ago. Some features related to this PR are now available in scipy, but I'm pretty sure that some functions in expokit are better in some ways than the analogous functions that are currently in scipy.

@chiwhalee

chiwhalee commented Aug 5, 2020

The only routines I think need to be accessible from Python, for expm to work, are:-
dgpadm - Padé approximation of dense, double precision matrix.
zgpadm - Padé approximation of dense, complex matrix
dgexpv - Krylov projection of general sparse, double precision matrix
zgexpv - Krylov projection of general sparse, complex matrix

I think this subset is already covered in scipy, between expm and expm_multiply. I don't know how the scipy functions compare to EXPOKIT regarding accuracy or speed.

Some other functions, zhexpv and dsexpv, are also very useful; they may be lacking in scipy.sparse.linalg.expm_multiply. According to the doc of expokit, these two functions, which take symmetric or Hermitian matrices, are much faster than the general ones. So a wrapper of expokit in scipy is still badly needed.

Besides, the expm_multiply function lacks a 'tol' argument.

@ilayn ilayn closed this in #15079 Mar 16, 2022
@ilayn ilayn added this to the 1.9.0 milestone Mar 16, 2022