Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: const signature changes in cython_blas and cython_lapack broke backwards compatibility #18377

Closed
larsoner opened this issue Apr 27, 2023 · 41 comments
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.linalg
Milestone

Comments

@larsoner
Copy link
Member

larsoner commented Apr 27, 2023

Describe your issue.

In MNE-Python we run a pip-pre job with thhe latest scipy-wheels-nightly builds. The latest 1.11.0.dev0+1956.7c74503 pip-pre wheel appears to be buggy:

It worked with 1.11.0.dev0+1926.070b2a8 :

Reproducing Code Example

pip install $STD_ARGS --pre --extra-index-url "https://pypi.anaconda.org/scipy-wheels-nightly/simple" numpy scipy

Then some scipy.linalg stuff... can dig deeper if it's not obvious what the problem is from the traceback

Error message

TypeError: C function scipy.linalg.cython_blas.dasum has wrong signature
________ ERROR at setup of test_make_forward_solution_kit[testing_data] ________
mne/forward/tests/test_make_forward.py:381: in small_surf_src
    src = setup_source_space('sample', 'oct2', subjects_dir=subjects_dir,
<decorator-gen-115>:12: in setup_source_space
    ???
mne/source_space.py:1439: in setup_source_space
    s = _create_surf_spacing(surf, hemi, subject, stype, ico_surf,
mne/surface.py:1066: in _create_surf_spacing
    mmap = _compute_nearest(from_surf['rr'], ico_surf['rr'])
mne/surface.py:503: in _compute_nearest
    tree = _DistanceQuery(xhs, method=method)
mne/surface.py:525: in __init__
    from sklearn.neighbors import BallTree
/opt/hostedtoolcache/Python/3.11.3/x64/lib/python3.11/site-packages/sklearn/neighbors/__init__.py:8: in <module>
    from ._distance_metric import DistanceMetric
/opt/hostedtoolcache/Python/3.11.3/x64/lib/python3.11/site-packages/sklearn/neighbors/_distance_metric.py:4: in <module>
    from ..metrics import DistanceMetric as _DistanceMetric
/opt/hostedtoolcache/Python/3.11.3/x64/lib/python3.11/site-packages/sklearn/metrics/__init__.py:42: in <module>
    from . import cluster
/opt/hostedtoolcache/Python/3.11.3/x64/lib/python3.11/site-packages/sklearn/metrics/cluster/__init__.py:22: in <module>
    from ._unsupervised import silhouette_samples
/opt/hostedtoolcache/Python/3.11.3/x64/lib/python3.11/site-packages/sklearn/metrics/cluster/_unsupervised.py:23: in <module>
    from ..pairwise import pairwise_distances_chunked
/opt/hostedtoolcache/Python/3.11.3/x64/lib/python3.11/site-packages/sklearn/metrics/pairwise.py:40: in <module>
    from ._pairwise_distances_reduction import ArgKmin
/opt/hostedtoolcache/Python/3.11.3/x64/lib/python3.11/site-packages/sklearn/metrics/_pairwise_distances_reduction/__init__.py:89: in <module>
    from ._dispatcher import (
/opt/hostedtoolcache/Python/3.11.3/x64/lib/python3.11/site-packages/sklearn/metrics/_pairwise_distances_reduction/_dispatcher.py:11: in <module>
    from ._base import _sqeuclidean_row_norms32, _sqeuclidean_row_norms64
sklearn/metrics/_pairwise_distances_reduction/_base.pyx:1: in init sklearn.metrics._pairwise_distances_reduction._base
    ???
sklearn/utils/_cython_blas.pyx:1: in init sklearn.utils._cython_blas
    ???
E   TypeError: C function scipy.linalg.cython_blas.dasum has wrong signature (expected __pyx_t_5scipy_6linalg_11cython_blas_d (int *, __pyx_t_5scipy_6linalg_11cython_blas_d *, int *), got __pyx_t_5scipy_6linalg_11cython_blas_d (int const *, __pyx_t_5scipy_6linalg_11cython_blas_d const *, int const *))

SciPy/NumPy/Python version and system information

Platform                Linux-5.15.0-1035-azure-x86_64-with-glibc2.35
Python                  3.11.3 (main, Apr  6 2023, 07:55:46) [GCC 11.3.0]
Executable              /opt/hostedtoolcache/Python/3.11.3/x64/bin/python
├☑ numpy                1.25.0.dev0+1207.gafa98ddef (OpenBLAS 0.3.21 with 1 thread)
├☑ scipy                1.11.0.dev0+1926.070b2a8
@larsoner larsoner added the defect A clear bug or issue that prevents SciPy from being installed or used as expected label Apr 27, 2023
@larsoner
Copy link
Member Author

cc @andyfaff maybe you have some idea as you've been working through some linalg stuff lately

@ilayn
Copy link
Member

ilayn commented Apr 27, 2023

Yes there are a few changes happening simultaneously (new cython stuff and CI changes). I am also identifying them one by one on #18358 but most issues are now gone apart from a unicode problem for cython if you rebase to main

You have to clear the cache I think to get rid of the old Cython signatures.

@larsoner
Copy link
Member Author

Is it possible this is actually a sklearn problem in sklearn.utils._cython_blas? It looks like the difference in expected signature is a int * vs const int *

@ilayn
Copy link
Member

ilayn commented Apr 27, 2023

const addition is done on SciPy side very recently #18247.

@larsoner
Copy link
Member Author

Okay I'll open a scikit-learn issue to see if they want to deal with that at their end

@andyfaff
Copy link
Contributor

The wheel building infrastructure hasn't been touched for a while, so any problems with a wheel is probably coming from a different place. The work in the CI has been transferring jobs from azure to gha, but none of them re responsible for issuing wheels for public consumption.

@larsoner
Copy link
Member Author

Agreed #18247 is probably the culprit because the failure reported there is similar and the signature change is problematic. We'll see what the scikit-learn folks say, more likely they need to change something at their end to accommodate this signature change. I guess older SciPy wheels could have the old signature and newer wheels the new one so they'll have to be able/willing to use either of them.

@betatim
Copy link

betatim commented Apr 28, 2023

Is there a 160 character summary of the benefits of #18247? I had a quick look at the top comment in the PR but couldn't find benchmarks or other hint.

I think for existing scikit-learn releases it is going to be difficult to adjust to this change in the function signature.

For future scikit-learn releases we'd need a way to support both old and new versions of scipy (across this signature change). Does someone have an example of how to do this/existing code to look at?

@jjerphan
Copy link
Contributor

jjerphan commented Apr 28, 2023

Hi all, I think it would have been nice to synchronize with downstream libraries' maintainers before changing those FFI which have been stable for many years.

To me, those seemingly simple changes on those interfaces introduce a significant burden for scikit-learn's maintenance. It might also tbe the case for other libraries'.

Moreover (and independently of a maintenance perspective), I think some users' workflows will break if those changes are released.

Could you please revert this merge commit and could we synchronize and agree on a pathway to resolve #14262? Thank you. 🙏

@ilayn
Copy link
Member

ilayn commented Apr 28, 2023

I want to first understand what the problem would be. These don't change any argument types or number of arguments or their order. They only change the signatures. So are you saying that you redefine the signatures on the scikit-learn side?

These are not backported changes hence each SciPy is consistent within itself. So many things resolve when you invalidate cache and rebuild.

What am I missing here regarding downstream?

@ilayn
Copy link
Member

ilayn commented Apr 28, 2023

For dasum for example this is the new one

d dasum(const int *n, const d *dx, const int *incx)

replacing the old one

d dasum(int *n, d *dx, int *incx)

is this causing a problem ? Excuse my ignorance I am genuinely curious

@betatim
Copy link

betatim commented Apr 28, 2023

scikit-learn/scikit-learn#26290 (comment) has the error and steps to reproduce the problem.

I think what happens is that the scikit-learn nightly wheel is built with scipy 1.10.1 and I think it is then used with the latest scipy.

Another case where I am not sure what will happen is something like scikit-learn 1.2.2 where the wheel will have been built with a version of scipy that contains the old signature. Will the existing scikit-learn 1.2.2 wheel work with the next release of scipy that contains the new signatures?

@jjerphan
Copy link
Contributor

Also, if newer versions of SciPy featuring these changes were to be released, errors might be present with older versions of scikit-learn (i.e. scikit-learn<=1.2.2). In those cases, I do not think we could prevent or correct such errors. 😕

@ilayn
Copy link
Member

ilayn commented Apr 28, 2023

I think it goes the other way. Because in my limited understanding, *type to const *type conversion should be safe. Well a very big emphasis on should

I am not sure if this will leak into wheels but seems like a nightly build problem. But admittedly I can't quite make up my mind yet. Because once you compile the cython files they are not (or shouldn't be) necessarily dynamically linking to scipy but to the BLAS lib. But then I don't have an answer why this is happening. So clearly I don't know enough.

@lysnikolaou @rgommers any thoughts?

@lorentzenchr
Copy link
Contributor

@betatim
@rkern gave a summary here: #14262 (comment)

Because of the way Cython interprets what is an acceptable ndarray to pass to those arguments. Without const, Cython checks the writable flag and forbids read-only ndarrays. Adding const (is supposed to) just tell Cython to allow either read-only or writable ndarrays. So we check manually whether the specific subroutine is expecting to write to the passed array, and if not, then we should add the const.

@lorentzenchr
Copy link
Contributor

I was very happy to see the const added to the signatures. My limited understanding is that this is a build problem, not a runtime problem. So it should not be that hard to fix (on the scikit-learn side). Scipy now is (a tiny bit) more correct 😄

@ev-br
Copy link
Member

ev-br commented Apr 29, 2023

I'm running into this on a bog-standard dev setup on linux with mamba. Basically, git fetch upstream && git rebase upstream/main breaks any import from scipy.

@ilayn
Copy link
Member

ilayn commented Apr 29, 2023

@ev-br could you please try to build from fresh? The prebuilt files before the PR are typically conflicting.

@ev-br
Copy link
Member

ev-br commented Apr 29, 2023

Yes, $ git clean -xdf && python dev.py build does it, thank you.

@larsoner
Copy link
Member Author

larsoner commented May 1, 2023

Also, if newer versions of SciPy featuring these changes were to be released, errors might be present with older versions of scikit-learn (i.e. scikit-learn<=1.2.2

I think this is almost certainly going to be the case. I can check if people really want to know. But that would be too disruptive to the ecosystem I think. FYI it shows up on Windows now, too (haven't tested macOS):

https://dev.azure.com/mne-tools/mne-python/_build/results?buildId=24921&view=logs&jobId=305851a9-a7bb-55db-0042-7e2b6f48aa1c&j=305851a9-a7bb-55db-0042-7e2b6f48aa1c&t=4d37777d-f36a-53aa-9217-6386d15dddcd

My vote is to revert the int const * change until we can figure out a plan with downstream libraries on how to handle this without causing these problems.

@jjerphan
Copy link
Contributor

jjerphan commented May 1, 2023

My vote is to revert the int const * change until we can figure out a plan with downstream libraries on how to handle this without causing these problems.

+1

@ilayn
Copy link
Member

ilayn commented May 1, 2023

Indeed if this is causing issues I agree with that. But I don't see how future will solve this problem other than pinning scipy >= 1.11

@jjerphan
Copy link
Contributor

jjerphan commented May 1, 2023

Unfortunately, pinning scipy>=1.11 will not solve problems.

#14262 (comment) gives details about two failing configurations (there might be more).

@ilayn
Copy link
Member

ilayn commented May 1, 2023

True but at some point we have to bite the bullet by not supporting old SciPy hence the conda or pip install need to upgrade SciPy too otherwise we can't use const for our own use cases. We can backport this to past version too but I don't know how feasible it is @tylerjereddy any thoughts?

@h-vetinari
Copy link
Member

We can backport this to past version too but I don't know how feasible it is

Aside from feasibility, I think it would make the number of failure modes blow up even more.

at some point we have to bite the bullet by not supporting old SciPy hence the conda or pip install need to upgrade SciPy too

conda(-forge) can handle this much more easily (e.g. patching in a constraint scipy <1.11 to already published packages retroactively), so it shouldn't be the main concern. pip is harder because now libraries like sklearn would need to carry an upper bound for scipy at least for a while, so that when a breaking scipy release comes it doesn't blow up existing installs -- that is, unless sklearn throws away all scipy version compatibility and goes for scipy >=1.11 directly.

Assuming such a harsh break is not in the cards, such a path would be quite painful regardless when & how it's rolled out, so I'm wondering if we can turn it around by doing some C-API versioning like numpy does: e.g. if we encounter something compiled against a newer API version that what (an old) scipy has at runtime, then we error out. The main issue is that we'd need to leave these markers in the compiled sources somewhere and then check against them, and of course do that for a while in published versions before we can rely on it.

@ilayn
Copy link
Member

ilayn commented May 2, 2023

First we need confirmation that a sklearn (or any other package) built with new SciPy with const fails to work with an old SciPy at runtime before anything.

In sklearn case they apparently also modified the signatures which is a different story altogether.

@thomasjpfan
Copy link
Contributor

thomasjpfan commented May 2, 2023

In sklearn case they apparently also modified the signatures which is a different story altogether.

When possible, Scikit-learn prefers to use const T * to be compatible with read-only data. When we need to use SciPy's BLAS API, we cast the pointer to T * to work with SciPy's previous non-const signatures. Scikit-learn does not modify SciPy's BLAS signatures.

First we need confirmation that a sklearn (or any other package) built with new SciPy with const fails to work with an old SciPy at runtime before anything.

Using cibuildwheel, I built a scikit-learn wheel with SciPy 1.11.dev0. Everything works when 1.11.dev0 is installed. When I downgrade to 1.10.1, I get the following error:

TypeError: C function scipy.linalg.cython_blas.dasum has wrong signature
(expected __pyx_t_5scipy_6linalg_11cython_blas_d (int const *, __pyx_t_5scipy_6linalg_11cython_blas_d const *, int const *),
got __pyx_t_5scipy_6linalg_11cython_blas_d (int *, __pyx_t_5scipy_6linalg_11cython_blas_d *, int *))

Note, this is the reverse error from mne-python's failing test run:

TypeError: C function scipy.linalg.cython_blas.dasum has wrong signature
(expected __pyx_t_5scipy_6linalg_11cython_blas_d (int *, __pyx_t_5scipy_6linalg_11cython_blas_d *, int *),
got __pyx_t_5scipy_6linalg_11cython_blas_d (int const *, __pyx_t_5scipy_6linalg_11cython_blas_d const *, int const *))

@ilayn
Copy link
Member

ilayn commented May 2, 2023

Using cibuildwheel, I built a scikit-learn wheel with SciPy 1.11.dev0. Everything works when 1.11.dev0 is installed. When I downgrade to 1.10.1, I get the following error:

Thanks @thomasjpfan for verifying. CI machines always do some tricky stuff that we can never be sure.

that is, unless sklearn throws away all scipy version compatibility and goes for scipy >=1.11 directly.

Why would that be a throw-away? We do that all the time with our upstream with Cython or any other package, we don't enforce NumPy because there are multiple cases that lead to problems in the past for jumping the gun. I don't think we have to be such conservative everytime for every case.

By the way, I am not saying we should not revert this, we will do it anyways but at some point SciPy will have to do this. So the challenge here is to find a solution that is easy enough for downstream to accept. Not doing it is not a viable option and just way too problematic for us since it is becoming an issue.

The main issue is that we'd need to leave these markers in the compiled sources somewhere and then check against them, and of course do that for a while in published versions before we can rely on it.

Leaving markers and checking them is not an issue, we can do it. But in case of incompatibility, what would it say? Install a newer scipy version? That won't work because sklearn would still be compiled with the old nonconst signatures because we didn't release it due to compatibility issues. It will just keep on delaying things. At some point we have to address this. Unless we want to keep two copies of the signatures (which is also fine for me) but not very hopeful that it would push people to switch to the new one.

@ilayn
Copy link
Member

ilayn commented May 2, 2023

And having cython breaking changes, pythran breaking changes and this happening at the same weekend is really not helping either 😃

@rth
Copy link
Contributor

rth commented May 2, 2023

Maybe it's worth asking if Cython could have some way to optionally disable that runtime signature check in specific cases (e.g. for adding const)? From what I understand if the runtime check is relaxed, things might just work?

@larsoner
Copy link
Member Author

larsoner commented May 2, 2023

Opened a #18405 to revert the problematic commit until we find a better plan.

@rth agreed it seems like being more lenient on the signatures in the case of const is something that Cython could/should support ideally, even if it's opt-in.

@ilayn
Copy link
Member

ilayn commented May 2, 2023

I'm warming up to the idea that we should keep two copies of this file. And use some sort of cnp.import_array() type of machinery to use for selecting which one to import. I am not fully certain about the details yet but should not be difficult. Let's revert it for now but now that everybody is scanning here, we'll ping everyone when we have a workable plan. All feedback is welcome.

And again, please keep us in the loop when you are working on the signatures downstream. We should be able to move together as this affects almost every C-API using package, as I just painfully learned here.

@rgommers
Copy link
Member

rgommers commented May 2, 2023

The signature changes are reverted and I just started a run of the nightly wheel builds, so the problem for scikit-learn & co should disappear in an hour or so once the new nightlies are uploaded.

@rgommers rgommers changed the title BUG: Latest pip-pre Linux wheel has broken scipy.linalg BUG: const signature changes in cython_blas and cython_lapack broke backwards compatibility May 2, 2023
@rgommers
Copy link
Member

rgommers commented May 2, 2023

@ilayn this issue got quite long pretty quickly. How about we close it, and open a new issue with a good summary and a focus on re-introducing the const changes?

@betatim
Copy link

betatim commented May 5, 2023

Thanks for all the brain power to work this out!

A new issue sounds like a good idea, could you post a message here that links to it once it exists? I think if you just refer to this issue from the new one there won't be a notification to participants of this issue, and I'd be interested in following along.

@ilayn
Copy link
Member

ilayn commented May 5, 2023

How about we close it, and open a new issue with a good summary and a focus on re-introducing the const changes?

Yes I'll do that this evening, thanks everyone. And apologies for wreaking havoc downstream (sigh... again...)

@ilayn ilayn closed this as completed May 5, 2023
@jjerphan
Copy link
Contributor

jjerphan commented May 5, 2023

And apologies for wreaking havoc downstream (sigh... again...)

It's okay, this kind of mistake happens and can be fixed easily. 🙂

I would be happy participating to discussing to best introduce const-qualification while keeping retro compatibility.

@lysnikolaou
Copy link
Contributor

@ilayn Did you maybe get some time to open a new issue with the summary of the findings so far regarding this? Will you be able to do so or would you like me to maybe work on it?

@ilayn
Copy link
Member

ilayn commented May 15, 2023

I forgot. I took a note and will come back to this just trying to stabilize a few open PRs. Please go ahead if you have the idea already otherwise I'll catch up.

@h-vetinari
Copy link
Member

@lysnikolaou, are you still planning to open a new issue here? Or should we just reopen #14262 and continue there?

@lysnikolaou
Copy link
Contributor

@h-vetinari I was planning to wait until we have a clearer picture of what the Cython folks decide on cython/cython#5488 before opening a new issue.

Totally fine to open a new issue now though, if you feel it's needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.linalg
Projects
None yet
Development

No branches or pull requests