Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for NumPy 1.26 #7101

Merged
merged 1 commit into from Sep 7, 2023
Merged

Conversation

jarrodmillman
Copy link
Contributor

@jarrodmillman jarrodmillman commented Aug 18, 2023

@jarrodmillman jarrodmillman added the 🔧 type: Maintenance Refactoring and maintenance of internals label Aug 18, 2023
@lagru
Copy link
Member

lagru commented Aug 19, 2023

Is it possible that gufunc (in this case _umath_linalg.eigvalsh_lo or something in that machinery is not thread-safe? I am struggling to find a cause for this on our side.

@seberg, I hope it's okay to ping you in the hope that you have some insight or know whom to ask about this. 🙏 E.g. might numpy/numpy@7508afd be responsible?

@lagru
Copy link
Member

lagru commented Aug 19, 2023

I'll try my luck with git bisect later. Perhaps I can identify a commit on our or NumPy's side. Otherwise I am out of ideas.

@lagru
Copy link
Member

lagru commented Aug 22, 2023

An even more minimal reproducing example:

import numpy as np
from concurrent.futures import ThreadPoolExecutor

assert np.__version__ == '1.26.0b1'

rng = np.random.default_rng(32)
matrices = (
    rng.random((5, 10, 10, 3, 3)),
    rng.random((5, 10, 10, 3, 3)),
    # rng.random((5, 10, 10, 3, 3)),
)

with ThreadPoolExecutor(max_workers=None) as ex:
    list(ex.map(lambda m: np.linalg.eigvalsh(m), matrices,))

on the pre-build NumPy 1.26.0b1, I get one of these errors:

  • free(): invalid pointer
    free(): invalid pointer
    
    Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
    
  •   File "/home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/linalg/linalg.py", line 1181, in eigvalsh
      w = gufunc(a, signature=signature, extobj=extobj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ValueError: On entry to DSYEVD parameter number 8 had an illegal value
    
  • Sometimes it even passes! 🙈

When building NumPy 1.26.0b1 from source with python setup.py build_ext --inplace -j 4 it passes all the time. So git bisect is no use right now.

Maybe something todo with CBLAS etc.? Not sure how to read the output of numpy.show_config().

@lagru
Copy link
Member

lagru commented Aug 23, 2023

Reported upstream in numpy/numpy#24512.

@lagru lagru added the ⬆️ Upstream Needs help from or involves an upstream project label Aug 23, 2023
@lagru
Copy link
Member

lagru commented Aug 24, 2023

Just noticed, the failing test in eb97c5e is a different one

def test_trainable_segmentation_oo():

compared to the one mentioned in #6970 (comment). But it's the same function np.linalg.eigvalsh that fails. Though I can't reproduce this failure locally. Only the previous one fails for me locally.

@seberg
Copy link
Contributor

seberg commented Sep 4, 2023

Is it possible that gufunc (in this case _umath_linalg.eigvalsh_lo or something in that machinery is not thread-safe? I am struggling to find a cause for this on our side.

Sorry, just on the process of coming back from vacation. I am very sure that the lapack lite stuff is not threadsafe, but if you install wheels they shouldn't be using it, the np.show_runtime() would be useful. You may be using it if you install NumPy yourself locally though.
(show_runtime() is often more helpful than show_config())

Next, the warning state may be unsafe, but this doesn't seem the problem (fixed on NumPy main finally). I am not aware there are other threading issues and I somewhat doubt it, but neither am I quite sure there are not. (Or whether openblas might have issues.)

@seberg
Copy link
Contributor

seberg commented Sep 4, 2023

Ah, hand't seen the NumPy issue where some nice sleuthing found that it is indeed the lapack-lite being used (due to build problems) which were fixed.

@jarrodmillman
Copy link
Contributor Author

Copy link
Member

@lagru lagru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@jarrodmillman jarrodmillman merged commit 5fb4d72 into scikit-image:main Sep 7, 2023
20 checks passed
@stefanv stefanv added this to the 0.22 milestone Sep 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
⬆️ Upstream Needs help from or involves an upstream project 🔧 type: Maintenance Refactoring and maintenance of internals
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants