ENH: Add fast_matrix_market to scipy.io #18631

Merged
merged 54 commits into from Sep 15, 2023

alugowski
Contributor

fast_matrix_market (FMM) is a C++ library with Python bindings to read/write sparse and dense matrices to MatrixMarket .mtx files. Basically a significantly faster version of scipy.io.mmread and scipy.io.mmwrite. I am the author.
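For readers unfamiliar with the format, here is a minimal pure-Python sketch of what reading a coordinate-format `.mtx` stream involves. It is not FMM's or `_mmio`'s implementation; the function name `read_coordinate` and the sample data are made up for illustration, and only the `coordinate real general` case is handled.

```python
import io


def read_coordinate(stream):
    """Parse a 'coordinate real general' Matrix Market text stream.

    Returns ((nrows, ncols), [(row, col, value), ...]) with 0-based indices.
    """
    header = stream.readline().split()
    if header[:3] != ["%%MatrixMarket", "matrix", "coordinate"]:
        raise ValueError("this sketch only handles coordinate matrices")

    # Skip comment lines, which start with '%'.
    line = stream.readline()
    while line.startswith("%"):
        line = stream.readline()

    # The size line holds: rows, columns, number of stored entries.
    nrows, ncols, nnz = (int(tok) for tok in line.split())

    entries = []
    for _ in range(nnz):
        i, j, value = stream.readline().split()
        # Indices in the file are 1-based.
        entries.append((int(i) - 1, int(j) - 1, float(value)))
    return (nrows, ncols), entries


SAMPLE = """%%MatrixMarket matrix coordinate real general
% a 3x3 matrix with two stored entries
3 3 2
1 1 1.5
3 2 -4.0
"""

shape, entries = read_coordinate(io.StringIO(SAMPLE))
```

The per-line tokenising and number parsing in the loop is the part that dominates on large files, which is where a compiled, parallel reader pays off.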

It's currently on PyPI and Anaconda, with cibuildwheel wheels for Linux, macOS, and Windows.

I've gotten feedback about contributing it to SciPy, and I'd love for that to happen.

Folks on the mailing list suggested adding it as scipy.io._fast_matrix_market and going from there. That's what this PR is.

To motivate the benefits of FMM, here is a plot comparing FMM and _mmio read speeds:
parallel-scaling-python-read

Questions:

  • How best to expose FMM to users? Currently a user would have to import from scipy.io._fast_matrix_market. Should this replace the scipy.io methods, be released alongside, or ??
  • Are there any standards with how to specify authorship in the source files? Since I don't know any better I have kept the headers I use in my own repo, but happy to change to fit any SciPy standards.
  • FMM's API is intentionally interchangeable with _mmio's. I have added a few things that I felt were missing from _mmio. These are listed below. Any thoughts on these are welcome.

API extensions over scipy.io._mmio methods:

FMM supports a few things that the existing _mmio methods do not:

  • both mmread and mmwrite: optional parallelism argument to specify thread pool size. Does this fit with how other scipy packages specify parallelism?
  • mmread: optional long_type boolean. Enables loading values as longdouble and longcomplex instead of float64 and complex128. AFAIK the _mmio methods cannot read data that requires longdouble precision, but can write it.
  • mmwrite: the symmetry argument has a new default value of AUTO. _mmio.mmwrite's default value is None, which looks at the matrix values for any symmetry. This is great, but is extremely slow on large matrices. It can easily slow down a write by 5x. The FMM solution is to still do that if the user explicitly asks for it (by passing None), but by default only do it on small matrices. This default can be overridden to match _mmio behavior by setting fmm.ALWAYS_FIND_SYMMETRY = True. All non-AUTO values behave just like in _mmio.mmwrite. All the benchmarks presented here use symmetry='general' to avoid this performance hit to _mmio.mmwrite.
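The symmetry policy described above can be sketched as a small decision function. This is only an illustration of the described behaviour, not FMM's code: the name `resolve_symmetry`, the `SMALL_MATRIX_NNZ` threshold, and the assumption that AUTO falls back to `'general'` on large matrices are all hypothetical.

```python
AUTO = "AUTO"
SMALL_MATRIX_NNZ = 100  # hypothetical cutoff; the real threshold is not stated here


def resolve_symmetry(symmetry, nnz, detect, always_find=False):
    """Decide which symmetry string to write.

    symmetry    -- None, AUTO, or an explicit value such as 'general'
    nnz         -- number of stored entries in the matrix
    detect      -- callable that scans the values and returns the symmetry found
    always_find -- stands in for the ALWAYS_FIND_SYMMETRY switch
    """
    if symmetry is None:
        # Explicit request: always scan, as _mmio does by default.
        return detect()
    if symmetry == AUTO:
        if always_find or nnz <= SMALL_MATRIX_NNZ:
            return detect()
        # Assumption: skip the expensive scan on large matrices.
        return "general"
    # Any other value is used as given.
    return symmetry


def found_symmetric():
    return "symmetric"


small = resolve_symmetry(AUTO, nnz=10, detect=found_symmetric)
large = resolve_symmetry(AUTO, nnz=10**7, detect=found_symmetric)
forced = resolve_symmetry(None, nnz=10**7, detect=found_symmetric)
```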

Two improvements that are not visible in the API:

  • mmread: automatically switch to 64-bit indices if the .mtx file's dimensions do not fit in 32-bit indices. _mmio uses intc, and on platforms where that is 32-bit _mmio.mmread simply crashes. Similarly on platforms where intc is 64-bit FMM will use 32-bit indices and save memory if the matrix dimensions allow it.
  • mmwrite: if the matrix being written is a csr_matrix or csc_matrix then a dedicated implementation writes those matrices directly. _mmio.mmwrite converts to coo_matrix and writes that instead. This avoided copy is visible in the memory benchmark.
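As a rough sketch of the index-width choice in the first bullet: pick 32-bit indices when the dimensions fit, otherwise 64-bit. The helper names and the exact boundary (`2**31 - 1`) are assumptions for illustration, not taken from the FMM source.

```python
INT32_MAX = 2**31 - 1


def index_bits(nrows, ncols):
    """Return 32 if both dimensions fit in a signed 32-bit index, else 64."""
    return 32 if max(nrows, ncols) <= INT32_MAX else 64


def coo_index_bytes(nnz, bits):
    """Memory used by the row and column index arrays of a COO matrix."""
    return 2 * nnz * bits // 8


bits_small = index_bits(1_000_000, 1_000_000)
bits_huge = index_bits(3_000_000_000, 10)

# With 100 million stored entries, 32-bit indices take half the memory.
saved = coo_index_bytes(100_000_000, 64) - coo_index_bytes(100_000_000, 32)
```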

Notes:

FMM fills in compiler support gaps with two libraries: fast_float (used under MIT license), and Ryu (used under Boost license). See README.md for details. These compiler gaps are nearly filled in the latest compiler versions, but not the ones used to build wheels.

FMM passes the test_mmio test suite, and needs to continue doing so, so instead of adding another test suite I added a pytest fixture to test_mmio that runs all tests against both implementations. I added a test for mmread's long_type argument.

I've also added an io_mm benchmark that runs for both _mmio and FMM. It is modeled on io_matlab and tests memory usage and runtime (on small 10MB matrices).

Benchmark outputs:

For the MemUsage benchmark the unit is "actual/optimal memory usage ratio", which is what the io_matlab benchmark reports. This is the ratio of peak process memory usage to matrix size in bytes.

Reading a CSR matrix makes no sense, so that column is 0.

· Discovering benchmarks
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[  0.00%] ·· Benchmarking existing-py_opt_conda_envs_scipy-dev_bin_python
[ 50.00%] ··· io_mm.MemUsage.track_mmread                                                                                                                                                                                                   1/24 failed
[ 50.00%] ··· ====== ============================== ==================== ==================== =====
              --                                                      matrix_type
              ------------------------------------- -----------------------------------------------
               size          implementation                dense                 coo           csr
              ====== ============================== ==================== ==================== =====
                1M           scipy.io._mmio                92.968               91.932          0
                1M    scipy.io._fast_matrix_market        108.156              108.092          0
               10M           scipy.io._mmio               10.0648              10.0608          0
               10M    scipy.io._fast_matrix_market         15.95                15.454          0
               100M          scipy.io._mmio               1.89012              1.87936          0
               100M   scipy.io._fast_matrix_market         2.6274              2.63188          0
               300M          scipy.io._mmio          1.2808666666666666         failed          0
               300M   scipy.io._fast_matrix_market   1.5505066666666667   1.5493333333333332    0
              ====== ============================== ==================== ==================== =====

[ 50.00%] ···· For parameters: '300M', 'scipy.io._mmio', 'coo'


               asv: benchmark timed out (timeout 240s)

[100.00%] ··· io_mm.MemUsage.track_mmwrite                                                                                                                                                                                                  2/24 failed
[100.00%] ··· ====== ============================== ========= ==================== ====================
              --                                                        matrix_type
              ------------------------------------- ---------------------------------------------------
               size          implementation           dense           coo                  csr
              ====== ============================== ========= ==================== ====================
                1M           scipy.io._mmio           92.84          98.72                98.284
                1M    scipy.io._fast_matrix_market   105.084        107.512              107.684
               10M           scipy.io._mmio          10.1028        10.6968              12.2012
               10M    scipy.io._fast_matrix_market    11.356        11.3576              11.4472
               100M          scipy.io._mmio          1.88588        2.39664               3.8756
               100M   scipy.io._fast_matrix_market    2.0278        2.03196              2.04592
               300M          scipy.io._mmio           failed        1.78268               failed
               300M   scipy.io._fast_matrix_market   1.33392   1.3363733333333334   1.3624133333333333
              ====== ============================== ========= ==================== ====================

[100.00%] ···· For parameters: '300M', 'scipy.io._mmio', 'dense'


               asv: benchmark timed out (timeout 240s)

               For parameters: '300M', 'scipy.io._mmio', 'csr'


               asv: benchmark timed out (timeout 240s)
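To read the MemUsage numbers, it can help to turn the ratios back into absolute peak memory. The arithmetic below uses two `_mmio` dense-read ratios from the first table and assumes that "1M" and "100M" mean 10**6 and 10**8 bytes, which the benchmark output does not state.

```python
MB = 10**6  # assumption: sizes in the tables are decimal megabytes


def peak_bytes(ratio, matrix_bytes):
    """Invert ratio = peak process memory / matrix size."""
    return ratio * matrix_bytes


# _mmio, dense read, from the track_mmread table
peak_1m = peak_bytes(92.968, 1 * MB)
peak_100m = peak_bytes(1.89012, 100 * MB)

peak_1m_mb = round(peak_1m / MB)      # about 93 MB for a 1 MB matrix
peak_100m_mb = round(peak_100m / MB)  # about 189 MB for a 100 MB matrix
```

Under that assumption, the very large ratios for small matrices mostly reflect the fixed memory footprint of the process itself, so the rows for the larger sizes are the more informative ones.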

Another run for the IOSpeed benchmark:

[  0.00%] ·· Benchmarking existing-py_opt_homebrew_Caskroom_miniconda_base_envs_scipy-dev_bin_python
[ 12.50%] ··· Running (io_mm.IOSpeed.time_mmread--)..
[ 62.50%] ··· io_mm.IOSpeed.time_mmread                                                                                                                                                                                                 ok
[ 62.50%] ··· ============================== ============ ============ ============
              --                                          matrix_type
              ------------------------------ --------------------------------------
                      implementation            dense         coo          csr
              ============================== ============ ============ ============
                      scipy.io._mmio           544±2ms      539±2ms     86.6±0.7ns
               scipy.io._fast_matrix_market   48.3±0.8ms   40.9±0.6ms   86.7±0.7ns
              ============================== ============ ============ ============

[ 75.00%] ··· io_mm.IOSpeed.time_mmwrite                                                                                                                                                                                                ok
[ 75.00%] ··· ============================== ============ ========== ==========
              --                                        matrix_type
              ------------------------------ ----------------------------------
                      implementation            dense        coo        csr
              ============================== ============ ========== ==========
                      scipy.io._mmio           839±2ms     489±2ms    636±2ms
               scipy.io._fast_matrix_market   15.1±0.4ms   25.7±4ms   33.9±3ms
              ============================== ============ ========== ==========
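The speedups implied by the IOSpeed tables can be computed directly from the mean timings above. The CSR read column is left out because both entries are no-ops measured in nanoseconds.

```python
# (scipy.io._mmio, scipy.io._fast_matrix_market) mean times in milliseconds
READ_MS = {"dense": (544, 48.3), "coo": (539, 40.9)}
WRITE_MS = {"dense": (839, 15.1), "coo": (489, 25.7), "csr": (636, 33.9)}


def speedups(table):
    """Ratio of old to new time for each matrix type, rounded to one decimal."""
    return {kind: round(old / new, 1) for kind, (old, new) in table.items()}


read_speedup = speedups(READ_MS)
write_speedup = speedups(WRITE_MS)
```

On this run that works out to roughly 11x to 13x for reads and 19x to 56x for writes.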

@alugowski alugowski requested a review from rgommers as a code owner June 5, 2023 05:03
@github-actions github-actions bot added C/C++ Items related to the internal C/C++ code base scipy.io labels Jun 5, 2023
@alugowski
Contributor Author

Here is why the two dependencies are needed. FMM includes fallbacks to the standard libraries, but apart from being slower those routines also internally lock on the system locale. That kills parallelism.

Here is a mmwrite benchmark using standard library double-to-string conversion:

parallel-scaling-python-write-stdlib

Here is the same benchmark using Ryu:

parallel-scaling-python-write-ryu

Note that the Ryu version has a faster sequential speed (p=1), but more importantly it can be parallelized.

There is a similar story with mmread and fast_float.

@alugowski
Contributor Author

The failed circleci build_scipy run has this message:

[1616/1624] Compiling C++ object scipy/optimize/_highs/_highs_wrapper.cpython-39-x86_64-linux-gnu.so.p/meson-generated__highs_wrapper.cpp.o
[1617/1624] Linking target scipy/optimize/_highs/_highs_wrapper.cpython-39-x86_64-linux-gnu.so
[1618/1624] Compiling C object scipy/io/matlab/_mio5_utils.cpython-39-x86_64-linux-gnu.so.p/meson-generated__mio5_utils.c.o
[1619/1624] Linking target scipy/io/matlab/_mio5_utils.cpython-39-x86_64-linux-gnu.so
[1620/1624] Compiling C object scipy/io/_fast_matrix_market/_core.cpython-39-x86_64-linux-gnu.so.p/fast_matrix_market_dependencies_ryu_ryu_f2s.c.o
[1621/1624] Compiling C object scipy/io/_fast_matrix_market/_core.cpython-39-x86_64-linux-gnu.so.p/fast_matrix_market_dependencies_ryu_ryu_d2s.c.o
[1622/1624] Compiling C object scipy/io/_fast_matrix_market/_core.cpython-39-x86_64-linux-gnu.so.p/fast_matrix_market_dependencies_ryu_ryu_d2fixed.c.o
ninja: build stopped: interrupted by user.

Too long with no output (exceeded 10m0s): context deadline exceeded

I'm not sure what to do with that information. I do see locally that the d2fixed.c compile is the very last line in compile_commands.json, so it's not clear to me whether that froze or whatever Ninja did next froze.

@alugowski alugowski requested a review from larsoner as a code owner June 6, 2023 00:18
@alugowski
Contributor Author

ninja: build stopped: interrupted by user.

Too long with no output (exceeded 10m0s): context deadline exceeded

I believe this was caused by the core module being compiled in one step. I split it into multiple modules that can be compiled in parallel so hopefully that issue is resolved.

@alugowski
Contributor Author

One more design choice: FMM supports two operations that _mmio does not:

  • direct csr/csc writes, as mentioned in the OP
  • reading vector Matrix Market files. These are non-standard and very rare, but exist in the wild. _mmio.mmread does not support these.

Together these two features account for about 25% of library size. One or both can be dropped. I'll keep them in the stand-alone FMM package.
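To illustrate what a direct CSR write means, here is a pure-Python sketch that walks the CSR arrays and emits coordinate lines without first building a COO copy. The function name and the plain-list inputs are hypothetical; FMM's dedicated implementation is in C++ and is not shown here.

```python
import io


def write_csr_coordinate(out, shape, indptr, indices, data):
    """Write a CSR matrix as 'coordinate real general' Matrix Market text."""
    nrows, ncols = shape
    out.write("%%MatrixMarket matrix coordinate real general\n")
    out.write(f"{nrows} {ncols} {len(data)}\n")
    for row in range(nrows):
        # indptr[row]:indptr[row + 1] is the slice of entries stored for this row.
        for k in range(indptr[row], indptr[row + 1]):
            # Matrix Market indices are 1-based.
            out.write(f"{row + 1} {indices[k] + 1} {data[k]}\n")


# 2x2 matrix [[1.5, 0], [0, 2.5]] in CSR form
buf = io.StringIO()
write_csr_coordinate(buf, (2, 2), indptr=[0, 1, 2], indices=[0, 1], data=[1.5, 2.5])
text = buf.getvalue()
```

The row index is recovered from `indptr` on the fly, so no second set of index arrays is allocated.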

@rgommers
Member

rgommers commented Jun 6, 2023

Thanks for the PR and detailed explanations on what this does and choices you made @alugowski!

A small logistical comment: if it becomes annoying that GitHub Actions CI does not run without someone hitting the "Approve and run" button, you can make a small unrelated PR, some simple doc tweak or whatever, and we can merge that quickly to make the need for approving CI runs go away.

@rgommers
Member

rgommers commented Jun 6, 2023

How best to expose FMM to users? Currently a user would have to import from scipy.io._fast_matrix_market. Should this replace the scipy.io methods, be released alongside, or ??

FMM provides a superset of functionality and is basically much faster/better all around, right? If so, replacing the old implementation is perfectly fine, and preferred over keeping both.

Are there any standards with how to specify authorship in the source files? Since I don't know any better I have kept the headers I use in my own repo, but happy to change to fit any SciPy standards.

We have no strict standards; SPDX headers are nice to have. I'm hoping we can standardize it at some point with REUSE (https://reuse.software/) tooling.

FMM's API is intentionally interchangeable with _mmio's. I have added a few things that I felt were missing from _mmio. These are listed below. Any thoughts on these are welcome.

From an initial reading of the API changes, the parallelism and symmetry ones sound good. The long_type one I'd like to look more at, since I'd like to get rid of longdouble in NumPy and SciPy sooner rather than later.

@alugowski
Contributor Author

A small logistical comment: if it becomes annoying that GitHub Actions CI does not run without someone hitting the "Approve and run" button, you can make a small unrelated PR, some simple doc tweak or whatever, and we can merge that quickly to make the need for approving CI runs go away.

That'd be nice. I just opened this one: #18640

@rgommers
Member

rgommers commented Jun 8, 2023

Things build now for me. The build is unfortunately quite heavy:

image

The three ryu files are in the 120-240ms range, but the 6 _core ones are all in the 9-20 second range. That seems too much, it'd be good to find ways to trim that down.

@alugowski
Contributor Author

alugowski commented Jun 8, 2023

Things build now for me. The build is unfortunately quite heavy:
image

The three ryu files are in the 120-240ms range, but the 6 _core ones are all in the 9-20 second range. That seems too much, it'd be good to find ways to trim that down.

What tool is that from?

It's likely we're seeing the price of all the type combinations (74). There are so many in order to avoid casting, but there is some low-hanging fruit to trim it down.

  • 6 of those are reading long double, and 10 more for writing.
  • Reading vector files I mentioned previously. These effectively count for about 15, mostly in the _core_write_coo chunk. Again _mmio doesn't support these, this would be new functionality.
  • Writing csr accounts for 20, 4 of those are extra long double. Again this is new functionality. Dropping this would remove the _core_write_csc chunk entirely.

If we went to just matching existing _mmio features then dropping all the above should cut the build drastically.

@alugowski
Contributor Author

How best to expose FMM to users? Currently a user would have to import from scipy.io._fast_matrix_market. Should this replace the scipy.io methods, be released alongside, or ??

FMM provides a superset of functionality and is basically much faster/better all around, right? If so, replacing the old implementation is perfectly fine, and preferred over keeping both.

Sounds good to me. The main downside I can think of is the library size, which shouldn't cause issues generally but might in some applications I'm not privy to. The motivator for writing the library is large matrices, which are downright painful with _mmio. Small ones work fine with the relatively small Python code. There might also be some weird use case somewhere that the pure Python code can handle, like someone using a custom object for the data array or similar hacks.

Are there any standards with how to specify authorship in the source files? Since I don't know any better I have kept the headers I use in my own repo, but happy to change to fit any SciPy standards.

We have no strict standards; SPDX headers are nice to have. I'm hoping we can standardize it at some point with REUSE (https://reuse.software/) tooling.

Perfect. I already have SPDX headers in there. This makes things easier for me since I can do simple copy from my repo.

FMM's API is intentionally interchangeable with _mmio's. I have added a few things that I felt were missing from _mmio. These are listed below. Any thoughts on these are welcome.

From an initial reading of the API changes, the parallelism and symmetry ones sound good. The long_type one I'd like to look more at, since I'd like to get rid of longdouble in NumPy and SciPy sooner rather than later.

If the type is going away then it doesn't make sense to try to get people to start using it. We could keep the ability to write longdouble.

Just curious, what makes you want to drop longdouble?

@rgommers
Member

rgommers commented Jun 8, 2023

What tool is that from?

It's https://github.com/nico/ninjatracing. To replicate:

$ git clean -xdf  # to profile a full build
$ meson setup build  # we can't use dev.py here, because it overwrites .ninja_log when installing
$ ninja -C build
$ python tools/ninjatracing.py build/.ninja_log > trace.json

And then load trace.json into https://ui.perfetto.dev/.
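For a quick look without the trace viewer, the slowest build steps can also be pulled straight from `.ninja_log`. This sketch assumes the tab-separated layout used by ninja log version 5 (start ms, end ms, mtime, output path, command hash). The sample log below is invented for illustration and its timings are not from this PR.

```python
def slowest_targets(log_text, top=3):
    """Return the `top` longest build steps as (output, duration_ms) pairs."""
    durations = {}
    for line in log_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip the '# ninja log vN' header and blank lines
        start_ms, end_ms, _mtime, output, _cmd_hash = line.split("\t")
        # Later entries for the same output replace earlier ones.
        durations[output] = int(end_ms) - int(start_ms)
    return sorted(durations.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical log contents; real logs live in build/.ninja_log.
SAMPLE_LOG = (
    "# ninja log v5\n"
    "0\t150\t0\tlib/a.c.o\tabc123\n"
    "10\t12010\t0\tlib/b.cpp.o\tdef456\n"
    "20\t9020\t0\tlib/c.cpp.o\t789abc\n"
)

slowest = slowest_targets(SAMPLE_LOG, top=2)
```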

@rgommers
Member

rgommers commented Jun 8, 2023

It's likely we're seeing the price of all the type combinations (74). There are so many in order to avoid casting, but there is some low-hanging fruit to trim it down.

Let's try first to trim all of those indeed I'd say.

Just curious, what makes you want to drop longdouble?

It's basically a useless alias to double on both Windows and macOS arm64, there are significant gaps in C99 support for it on more niche platforms (NetBSD and the like), and overall its utility for a small set of use cases on 64-bit Linux is far from enough to justify the maintenance cost. See here for more details.

@alugowski
Contributor Author

I've implemented the changes above, and found a few more optimizations. The build time is significantly faster and the library size is also down to about a third of what it was.

image

Curiously the .so that my scikit-build CMake workflow builds is half the size of the one meson builds, despite basically none of the pruning and an additional dependency.

@rgommers
Member

Curiously the .so that my scikit-build CMake workflow builds is half the size of the one meson builds, despite basically none of the pruning and an additional dependency.

Perhaps just a release vs a debug build? When using dev.py or meson/ninja directly, we use the debugoptimized build type. When going via python -m build or pip, we use a release build type. The difference is typically on the order of 3x-6x

@alugowski
Contributor Author

Curiously the .so that my scikit-build CMake workflow builds is half the size of the one meson builds, despite basically none of the pruning and an additional dependency.

Perhaps just a release vs a debug build? When using dev.py or meson/ninja directly, we use the debugoptimized build type. When going via python -m build or pip, we use a release build type. The difference is typically on the order of 3x-6x

Good point, that sounds likely. I was comparing it to a release build.

@rgommers
Member

This is very close it looks like. The 32-bit build still seems unhappy:

/scipy/test/lib/python3.9/site-packages/pybind11/include/pybind11/detail/../detail/../pytypes.h:1575:69:   required from ‘pybind11::bytes::bytes(const char*, const SzType&) [with SzType = long long int; typename std::enable_if<std::is_integral<_Tp>::value, int>::type <anonymous> = 0]’
../scipy/io/_fast_matrix_market/src/pystreambuf.h:267:41:   required from here
/scipy/test/lib/python3.9/site-packages/pybind11/include/pybind11/detail/../detail/common.h:484:35: error: static assertion failed: Implicit narrowing is not permitted.
  484 |     static_assert(sizeof(IntType) <= sizeof(ssize_t), "Implicit narrowing is not permitted.");
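The static assertion compares the size of the source integer type with `ssize_t`. The same check can be mirrored from Python with `ctypes` to see why it only fails on 32-bit builds. Here `c_int32` stands in for a 32-bit `ssize_t`; that substitution is an assumption made so the example gives the same answer on any machine.

```python
import ctypes


def fits_without_narrowing(src_type, dst_type=ctypes.c_ssize_t):
    """Mirror of the check sizeof(IntType) <= sizeof(ssize_t)."""
    return ctypes.sizeof(src_type) <= ctypes.sizeof(dst_type)


# On the machine running this: True on 64-bit platforms, False on 32-bit ones.
ok_here = fits_without_narrowing(ctypes.c_longlong)

# Simulating a 32-bit ssize_t: an 8-byte long long does not fit in 4 bytes.
ok_on_32bit = fits_without_narrowing(ctypes.c_longlong, ctypes.c_int32)
```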

@alugowski
Contributor Author

This is very close it looks like. The 32-bit build still seems unhappy:

/scipy/test/lib/python3.9/site-packages/pybind11/include/pybind11/detail/../detail/../pytypes.h:1575:69:   required from ‘pybind11::bytes::bytes(const char*, const SzType&) [with SzType = long long int; typename std::enable_if<std::is_integral<_Tp>::value, int>::type <anonymous> = 0]’
../scipy/io/_fast_matrix_market/src/pystreambuf.h:267:41:   required from here
/scipy/test/lib/python3.9/site-packages/pybind11/include/pybind11/detail/../detail/common.h:484:35: error: static assertion failed: Implicit narrowing is not permitted.
  484 |     static_assert(sizeof(IntType) <= sizeof(ssize_t), "Implicit narrowing is not permitted.");

Yes. I've found a fix for that error, but I'm going through my own test suite first before pushing to this branch.

@alugowski
Contributor Author

Any thoughts on that refguide_check failure? This is the output:

Running checks for 36 modules:
scipy.cluster .. scipy.cluster.hierarchy .................................................................. scipy.cluster.vq .......... scipy.constants .............................................................................................................................................................................. scipy.datasets .......Downloading file 'ascent.dat' from 'https://raw.githubusercontent.com/scipy/dataset-ascent/main/ascent.dat' to '/home/circleci/.cache/scipy-data'.
.Downloading file 'ecg.dat' from 'https://raw.githubusercontent.com/scipy/dataset-ecg/main/ecg.dat' to '/home/circleci/.cache/scipy-data'.
.Downloading file 'face.dat' from 'https://raw.githubusercontent.com/scipy/dataset-face/main/face.dat' to '/home/circleci/.cache/scipy-data'.
... scipy.fft .............................................................................. scipy.fftpack ............................................................. scipy.fftpack.convolve ......... scipy.integrate ................................................................ scipy.interpolate ...................................................................................................... scipy.io ......................terminate called after throwing an instance of 'pybind11::error_already_set'
  what():  OSError: Can't do nonzero cur-relative seeks
Aborted (core dumped)
Task Error - refguide-check => Command error: 'env 
PYTHONPATH=/home/circleci/repo/build-install/lib/python3.9/site-packages 
/home/circleci/.pyenv/versions/3.9.17/bin/python 
/home/circleci/repo/tools/refguide_check.py --doctests' returned 134

Exited with code exit status 2

I don't see that error when I run python dev.py python tools/refguide_check.py locally.

@rgommers
Member

I can reproduce that locally on macOS with:

% python dev.py refguide-check -s io
scipy.io ......................libc++abi: terminating due to uncaught exception of type pybind11::error_already_set: OSError: Can't do nonzero cur-relative seeks

It looks like the problem is that in text (rather than binary) mode it's only allowed to seek from the beginning of the file in Python. From https://docs.python.org/3.11/tutorial/inputoutput.html#methods-of-file-objects: "In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2)) and the only valid offset values are those returned from the f.tell(), or zero. Any other offset value produces undefined behaviour."

It looks like you are calling back into Python from C++ with py_seek, and that doesn't handle the above constraint. The mmread docstring contains an example using StringIO, which may highlight the problem.
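The constraint is easy to reproduce with the standard library alone. This small check attempts a seek relative to the current position on a text stream and on a byte stream; the helper name is made up for the example.

```python
import io


def try_relative_seek(stream):
    """Attempt a seek of +1 from the current position and report the outcome."""
    try:
        stream.seek(1, io.SEEK_CUR)
        return "ok"
    except OSError as exc:
        return f"OSError: {exc}"


text_result = try_relative_seek(io.StringIO("abc"))    # text stream: refused
bytes_result = try_relative_seek(io.BytesIO(b"abc"))   # byte stream: allowed
```

On CPython the text-stream case raises the same `OSError` seen in the refguide-check output, while the byte-stream case succeeds.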

@alugowski
Contributor Author

I can reproduce that locally on macOS with:

% python dev.py refguide-check -s io
scipy.io ......................libc++abi: terminating due to uncaught exception of type pybind11::error_already_set: OSError: Can't do nonzero cur-relative seeks

Thanks! Yes, I can reproduce it now.

It looks like the problem is that in text (rather than binary) mode it's only allowed to seek from the beginning of the file in Python. From https://docs.python.org/3.11/tutorial/inputoutput.html#methods-of-file-objects: "In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2)) and the only valid offset values are those returned from the f.tell(), or zero. Any other offset value produces undefined behaviour."

It looks like you are calling back into Python from C++ with py_seek, and that doesn't handle the above constraint. The mmread docstring contains an example using StringIO, which may highlight the problem.

Good lead, it's the mminfo docstring. Looks like pystreambuf, the class that adapts Python stream objects to C++ stream objects, calls py_seek in its sync method. I'll have to investigate why. I do use StringIO extensively in tests and explicitly avoid seeks to enable 1-pass reads, so this one is a surprise.

@alugowski
Contributor Author

Ok so the issue only occurred on mminfo because that method only reads the header then exits. All other methods read the entire stream. test_mmio uses temp files, which FMM opens in C++ and again sidesteps this error.

pystreambuf would seek on sync because its own buffering could make its and the backing Python stream's positions not the same. Luckily I'm already forced to use a string-to-bytes adapter class (because the C++ code only works on bytes). So I overrode the seek method in this adapter to ignore these seeks and the issue is fixed.
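The idea of the fix can be sketched as a thin wrapper that presents a text stream as bytes and swallows seek requests. The class below is a hypothetical illustration, not the adapter used in this PR, and it assumes the consumer only ever reads forward.

```python
import io


class TextToBytesAdapter:
    """Hypothetical wrapper exposing a text stream as a byte stream."""

    def __init__(self, text_stream, encoding="utf-8"):
        self._stream = text_stream
        self._encoding = encoding

    def read(self, size=-1):
        # Read characters from the text stream and hand back encoded bytes.
        return self._stream.read(size).encode(self._encoding)

    def seek(self, offset, whence=io.SEEK_SET):
        # Deliberately a no-op: text streams reject cur-relative seeks,
        # and a one-pass reader does not need the position resynchronised.
        return None


source = io.StringIO("%%MatrixMarket matrix coordinate real general\n2 2 0\n")
adapter = TextToBytesAdapter(source)

header = adapter.read(14)        # first 14 characters, as bytes
adapter.seek(-3, io.SEEK_CUR)    # would raise on the raw StringIO; ignored here
rest = adapter.read()            # continues from where the last read stopped
```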

@rgommers rgommers changed the title ENH: Add fast_matrix_market to scipy.io ENH: Add fast_matrix_market to scipy.io Jun 17, 2023
@alugowski
Contributor Author

Ok, I fixed my extraneous merge snafu.

Again apologies for all the folks accidentally added as reviewers. Looks like I don't have the ability to fix that, can someone with appropriate permissions remove all but the first two?

This is determined by the order of subdir's at the bottom of scipy/meson.build. io is right at the bottom because until now it was a very lightweight module. You can move it up, that should help - between fft and spatial is probably about right given the compile times involved.

Done.

It's also been helpfully pointed out to me that _core probably isn't the best name for the extension module. It's very generic. So I renamed it from _core to _fmm_core to be more clear and searchable.

@alugowski
Contributor Author

@rgommers glad the wheel sizes are not impacted more than expected. If the size becomes an issue in the future I'm happy to revisit.

@alugowski
Contributor Author

A small update:

I convinced the threadpoolctl folks to add a way to specify custom controllers. As of threadpoolctl 3.2.0, that is now possible. Now regular FMM can be controlled by threadpoolctl, and so I'm also updating this PR to do the same.

SciPy users already use threadpoolctl to control BLAS and OpenMP parallelism, so it's natural to use the same mechanism here. Since this code isn't public yet, I went ahead and dropped the now-redundant parallelism arguments to mmread and mmwrite.
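With the parallelism arguments gone, limiting the thread count would look roughly like the sketch below. It assumes a SciPy build that includes this PR and threadpoolctl 3.2.0 or later; the helper name `read_with_thread_limit` and the sample matrix are made up for illustration.

```python
import io

SAMPLE = """%%MatrixMarket matrix coordinate real general
2 2 2
1 1 1.5
2 2 2.5
"""


def read_with_thread_limit(source, max_threads):
    """Read a Matrix Market source while capping thread pools at max_threads."""
    # Imported here so the helper can be defined even if the packages are absent.
    from scipy.io import mmread
    from threadpoolctl import threadpool_limits

    # threadpool_limits is a context manager; the limit applies inside the block.
    with threadpool_limits(limits=max_threads):
        return mmread(source)
```

A call such as `read_with_thread_limit(io.StringIO(SAMPLE), 2)` would then read the sample with at most two threads, using the same mechanism users already apply to BLAS and OpenMP.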

@j-bowhay
Member

Looks like you have picked up an unwanted submodule change for unuran.

@alugowski
Contributor Author

Looks like you have picked up an unwanted submodule change for unuran.

Thank you! Reverted.

@alugowski
Contributor Author

@rgommers I know it's a big PR, let me know if I can help make it more digestible in any way.

@rgommers
Member

Argh no, it's perfectly digestible. I just dropped the ball on final review - too much to do in too little time.

@rgommers
Member

I convinced the threadpoolctl folks to add a way to specify custom controllers. As of threadpoolctl 3.2.0, that is now possible. Now regular FMM can be controlled by threadpoolctl, and so I'm also updating this PR to do the same.

SciPy users already use threadpoolctl to control BLAS and OpenMP parallelism, so it's natural to use the same mechanism here. Since this code isn't public yet, I went ahead and dropped the now-redundant parallelism arguments to mmread and mmwrite.

This is quite nice! I played with it a bit, and I'm happy with how this looks.

Member

@rgommers rgommers left a comment

This looks quite good, CI is happy, performance is great, build time and wheel file size increase seem acceptable and we touched on that on the mailing list before. It's a lot of C++ code and the C++17 usage may perhaps cause an issue on some niche platform - but more review won't catch that, and that this has previously been released as a standalone package should have caught most of the problems here. So let's give this a go!

Very nice work, thanks again @alugowski.

@rgommers rgommers merged commit 7e96eff into scipy:main Sep 15, 2023
24 checks passed
@alugowski
Copy link
Contributor Author

Great! Thank you for the invaluable feedback throughout the process @rgommers !
