New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add fast_matrix_market
to scipy.io
#18631
Conversation
Here is why the two dependencies are needed. FMM includes fallbacks to the standard libraries, but apart from being slower those routines also internally lock on the system locale. That kills parallelism. Here is a Here is the same benchmark using Ryu: Note that the Ryu version has a faster sequential speed (p=1), but more importantly it can be parallelized. There is a similar story with |
That circleci
I'm not sure what to do with that information. I do see locally that the |
I believe this was caused by the core module being compiled in one step. I split it into multiple modules that can be compiled in parallel so hopefully that issue is resolved. |
One more design choice: FMM supports two operations that
Together these two features account for about 25% of library size. One or both can dropped. I'll keep them in the stand-alone FMM package. |
Thanks for the PR and detailed explanations on what this does and choices you made @alugowski! A small logistical comment: if it becomes annoying that GitHub Actions CI does not run without someone hitting the "Approve and run" button, you can make a small unrelated PR, some simple doc tweak or whatever, and we can merge that quickly to make the need for approving CI runs go away. |
FMM provides a superset of functionality and is basically much faster/better all around, right? If so, replacing the old implementation is perfectly fine, and preferred over keeping both.
We have no strict standards; SPDX headers are nice to have. I'm hoping we can standardize it at some point with REUSE (https://reuse.software/) tooling.
From an initial reading of the API changes, the |
That'd be nice. I just opened this one: #18640 |
Sounds good to me. The main downside I can think of is the library size, which shouldn't cause issues generally but might in some applications I'm not privvy to. The motivator for writing the library is large matrices, which are downright painful with
Perfect. I already have SPDX headers in there. This makes things easier for me since I can do simple copy from my repo.
If the type is going away then it doesn't make sense to try to get people to start using it. We could keep the ability to write Just curious, what makes you want to drop |
It's https://github.com/nico/ninjatracing. To replicate:
And then load |
Let's try first to trim all of those indeed I'd say.
It's basically a useless alias to |
I've implemented the changes above, and found a few more optimizations. The build time is significantly faster and the library size is also down to about a third of what it was. Curiously the |
Perhaps just a release vs a debug build? When using |
Good point, that sounds likely. I was comparing it to a release build. |
This is very close it looks like. The 32-bit build still seems unhappy:
|
Yes. I've a found a fix for that error, but I'm going through my own test suite first before pushing to this branch. |
Any thoughts on that
I don't see that error when I run |
I can reproduce that locally on macOS with: % python dev.py refguide-check -s io
scipy.io ......................libc++abi: terminating due to uncaught exception of type pybind11::error_already_set: OSError: Can't do nonzero cur-relative seeks It looks like the problem is that in text (rather than binary) mode it's only allowed to seek from the beginning of the file in Python. From https://docs.python.org/3.11/tutorial/inputoutput.html#methods-of-file-objects: "In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with It looks like you are calling back into Python from C++ with |
Thanks! yes I can reproduce it now.
Good lead, it's the |
Ok so the issue only occurred on
|
fast_matrix_market
to scipy.io
Ok, I fixed my extraneous merge snafu. Again apologies for all the folks accidentally added as reviewers. Looks like I don't have the ability to fix that, can someone with appropriate permissions remove all but the first two?
Done. It's also been helpfully pointed out to me that |
@rgommers glad the wheel sizes are not impacted more than expected. If the size becomes an issue in the future I'm happy to revisit. |
A small update: I convinced the threadpoolctl folks to add a way to specify custom controllers. As of threadpoolctl 3.2.0, that is now possible. Now regular FMM can be controlled by threadpoolctl, and so I'm also updating this PR to do the same. SciPy users already use threadpoolctl to control BLAS and OpenMP parallelism, so it's natural to use the same mechanism here. Since this code isn't public yet, I went ahead and dropped the now-redundant |
Looks like you have picked up an unwanted submodule change for |
Thank you! Reverted. |
@rgommers I know it's a big PR, let me know if I can help make it more digestible in any way. |
Argh no, it's perfectly digestible. I just dropped the ball on final review - too much to do in too little time. |
This is quite nice! I played with it a bit, and I'm happy with how this looks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks quite good, CI is happy, performance is great, build time and wheel file size increase seem acceptable and we touched on that on the mailing list before. It's a lot of C++ code and the C++17 usage may perhaps cause an issue on some niche platform - but more review won't catch that, and that this has previously been released as a standalone package should have caught most of the problems here. So let's give this a go!
Very nice work, thanks again @alugowski.
Great! Thank you for the invaluable feedback throughout the process @rgommers ! |
fast_matrix_market
(FMM) is a C++ library with Python bindings to read/write sparse and dense matrices to MatrixMarket.mtx
files. Basically a significantly faster version ofscipy.io.mmread
andscipy.io.mmwrite
. I am the author.It's currently on PyPI and Anaconda, with cibuildwheel wheels for Linux, macOS, and Windows.
I've gotten feedback about contributing it to SciPy, and I'd love for that to happen.
Folks on the mailing list suggested adding to
scipy.io._fast_matrix_market
and go from there. That's what this PR is.To motivate the benefits of FMM, here is a plot comparing FMM and
_mmio
read speeds:Questions:
scipy.io._fast_matrix_market
. Should this replace the scipy.io methods, be released alongside, or ??API extensions over scipy.io._mmio methods:
FMM supports a few things that the existing _mmio methods do not:
mmread
andmmwrite
: optionalparallelism
argument to specify thread pool size. Does this fit with how other scipy packages specify parallelism?mmread
: optionallong_type
boolean. Enables loading values aslongdouble
andlongcomplex
instead offloat64
andcomplex64
. AFAIK the _mmio methods cannot read data that requireslongdouble
precision, but can write it.mmwrite
: thesymmetry
argument has a new default value ofAUTO
._mmio.mmwrite
's default value is None, which looks at the matrix values for any symmetry. This is great, but is extremely slow on large matrices. It can easily slow down a write by 5x. The FMM solution is to still do that if the user explicitly asks for it (by passingNone
), but by default only do it on small matrices. This default can be overridden to match _mmio behavior by settingfmm.ALWAYS_FIND_SYMMETRY = True
. All non-AUTO values behave just like in_mmio.mmwrite
. All the benchmarks presented here usesymmetry='general'
to avoid this performance hit to_mmio.mmwrite
.Two improvements that are not visible in the API:
mmread
: automatically switch to 64-bit indices if the.mtx
file's dimensions do not fit in 32-bit indices._mmio
usesintc
, and on platforms where that is 32-bit_mmio.mmread
simply crashes. Similarly on platforms whereintc
is 64-bit FMM will use 32-bit indices and save memory if the matrix dimensions allow it.mmwrite
: if the matrix being written is acsr_matrix
orcsc_matrix
then a dedicated implementation writes those matrices directly._mmio.mmwrite
converts tocoo_matrix
and writes that instead. This avoided copy is visible in the memory benchmark.Notes:
FMM fills in compiler support gaps with two libraries: fast_float (used under MIT license), and Ryu (used under Boost license). See README.md for details. These compiler gaps are nearly filled in the latest compiler versions, but not the ones used to build wheels.
FMM passes the
test_mmio
test suite, and needs to continue doing so, so instead of adding another test suite I added a pytest fixture totest_mmio
that runs all tests against both implementations. I added a test formmread
'slong_type
argument.I've also added an
io_mm
benchmark that runs for both_mmio
and FMM. It is modeled onio_matlab
and tests memory usage and runtime (on small 10MB matrices).Benchmark outputs:
For the
MemUsage
benchmark the unit is "actual/optimal memory usage ratio", which is what the io_matlab benchmark reports. This the ratio of peak process memory usage and matrix size in bytes.Reading a CSR matrix makes no sense, so that column is 0.
Another run for the IOSpeed benchmark: