Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST Extend tests for scipy.sparse.*array in sklearn/cluster/tests/test_spectral.py #27161

Merged
merged 10 commits into from Nov 27, 2023

Conversation

bharatr21
Copy link
Contributor

@bharatr21 bharatr21 commented Aug 25, 2023

Reference Issues/PRs

Towards #27090

What does this implement/fix? Explain your changes.

Extends tests for scipy.sparse.*array

Any other comments?

I am aware of the test failures, a sample failure is this one:

FAILED sklearn/cluster/tests/test_spectral.py::test_spectral_clustering[kmeans-arpack-csr_array] - AttributeError: 'numpy.ndarray' object has no attribute 'getA1'

All the failures seem to be occurring due to an upstream issue in scipy here

    def _laplacian_sparse(graph, normed, axis, copy, form, dtype, symmetrized):
        # The keyword argument `form` is unused and has no effect here.
        del form
    
        if dtype is None:
            dtype = graph.dtype
    
        needs_copy = False
        if graph.format in ('lil', 'dok'):
            m = graph.tocoo()
        else:
            m = graph
            if copy:
                needs_copy = True
    
        if symmetrized:
            m += m.T.conj()
    
>       w = m.sum(axis=axis).getA1() - m.diagonal()
E       AttributeError: 'numpy.ndarray' object has no attribute 'getA1'

Should we raise an issue/bug upstream in scipy? Or attempt some other alternative fix from our end?

@github-actions
Copy link

github-actions bot commented Aug 25, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 20a61a7. Link to the linter CI: here

@jjerphan jjerphan changed the title TST: Extend tests for scipy.sparse.*array in sklearn/cluster/tests/test_spectral.py TSTExtend tests for scipy.sparse.*array in sklearn/cluster/tests/test_spectral.py` Aug 27, 2023
@jjerphan jjerphan changed the title TSTExtend tests for scipy.sparse.*array in sklearn/cluster/tests/test_spectral.py` TST Extend tests for scipy.sparse.*array in sklearn/cluster/tests/test_spectral.py Aug 27, 2023
@ogrisel
Copy link
Member

ogrisel commented Aug 28, 2023

Should we raise an issue/bug upstream in scipy? Or attempt some other alternative fix from our end?

No, this getA1 function as probably not kept in the new sparse array API on purpose: its name is really not explicit. We need to understand how this same operation can be achieved with the new scipy sparse array API and whether the new code would still work with the old scipy sparse matrix datastructures. If not, we will have to branch the code to handle backward compat as discussed in #26418 (comment) but I would like to make sure that there is no clean way to write code that works both the new and old API first.

@ogrisel
Copy link
Member

ogrisel commented Aug 28, 2023

I believe that one way to achieve that would be to rewrite:

w = m.sum(axis=axis).getA1() - m.diagonal()

as:

# np.asarray + ravel is used for backward compat for scipy sparse matrix while making
# it also work with the scipy sparse array API.
w = np.asarray(m.sum(axis=axis)).ravel() - m.diagonal()

@ogrisel
Copy link
Member

ogrisel commented Aug 28, 2023

Since this PR will actually fix the code of the sklearn library (and not just the tests), it will require a dedicated entry in the changelog (doc/whats_new/v1.4.rst) to document which estimators are updated, similar to what was done for the NMF estimators in #27100.

@bharatr21
Copy link
Contributor Author

I believe that one way to achieve that would be to rewrite:

w = m.sum(axis=axis).getA1() - m.diagonal()

as:

# np.asarray + ravel is used for backward compat for scipy sparse matrix while making
# it also work with the scipy sparse array API.
w = np.asarray(m.sum(axis=axis)).ravel() - m.diagonal()

If we decide to rewrite this, won't this still need to be done in scipy itself? (As the _laplacian_sparsemethod itself is coming from scipy)

@ogrisel
Copy link
Member

ogrisel commented Aug 28, 2023

If we decide to rewrite this, won't this still need to be done in scipy itself? (As the _laplacian_sparsemethod itself is coming from scipy)

Indeed you are right. I read your analysis too quickly. Apparently, this code is still the same in the main branch of scipy:

https://github.com/scipy/scipy/blob/main/scipy/sparse/csgraph/_laplacian.py#L459

and I could not find any related issue by doing a quick search. Let me report it there.

@ogrisel
Copy link
Member

ogrisel commented Aug 28, 2023

Done in scipy/scipy#19149. @bharatr21 depending on the feedback of scipy maintainers, feel free to open a PR upstream.

Once accepted we will have to see how to do a backport or workaround in scikit-learn.

@bharatr21
Copy link
Contributor Author

Done in scipy/scipy#19149. @bharatr21 depending on the feedback of scipy maintainers, feel free to open a PR upstream.

Once accepted we will have to see how to do a backport or workaround in scikit-learn.

Raised the PR: scipy/scipy#19156. Let's wait and see what happens there.
In the meantime, what would be the steps or procedure to backport it here? A bump of the scipy version would be enough once it's released right?

@bharatr21
Copy link
Contributor Author

bharatr21 commented Sep 5, 2023

UPDATE: The above PR in scipy/scipy#19156 is now merged
cc @ogrisel for next steps here

@ogrisel
Copy link
Member

ogrisel commented Sep 6, 2023

I don't think it's worth backporting a temporary fix to scikit-learn. I think we can just mark the tests as xfail based on the version of scipy (that is sp_version < parse_version("1.12.0")) and assume it will be fixed in 1.12.0 (hence not xfail).

@ogrisel
Copy link
Member

ogrisel commented Sep 6, 2023

Note that is probably easier to use pytest.xfail from within the test rather than attempting to use @pytest.mark.xfail with a condition:

https://docs.pytest.org/en/6.2.x/skipping.html#xfail-mark-test-functions-as-expected-to-fail

We should only call pytest.xfail when the scipy version is lower than 1.12 and the sparse container is a sparse array (not a sparse matrix).

@ogrisel
Copy link
Member

ogrisel commented Sep 6, 2023

And by the way, thanks @bharatr21 for fixing the bug upstream!

@glemaitre
Copy link
Member

Uhm, I see that we have the same problem that I earlier encounter today (kudos to @bharatr21 for fixing the bug upstream): #27240

@ogrisel I was proposing to vendor the _laplacian.py file since this a self contained file (we did vendor it in the past) and use to import from our fixes.py. I am not 100% settle since it might not have a huge impact but this is still kind of an easy fix that impact all the spectral algorithm.

@ogrisel
Copy link
Member

ogrisel commented Sep 13, 2023

I am fine with this option then.

@bharatr21
Copy link
Contributor Author

Uhm, I see that we have the same problem that I earlier encounter today (kudos to @bharatr21 for fixing the bug upstream): #27240

@ogrisel I was proposing to vendor the _laplacian.py file since this a self contained file (we did vendor it in the past) and use to import from our fixes.py. I am not 100% settle since it might not have a huge impact but this is still kind of an easy fix that impact all the spectral algorithm.

Don't totally get this, what does "vendoring" a file mean? Importing some specific version in fixes.py ?

@ogrisel
Copy link
Member

ogrisel commented Sep 14, 2023

Making a local copy of the full file in scikit-learn. (Typically under sklear.externals._private_module_name) and exposing the fraction the api we want to use under sklearn.fixes. we do this for the parse_version method of the packaging module for instance.

@ogrisel
Copy link
Member

ogrisel commented Sep 14, 2023

Alternatively, we could skip all spectral related tests with scipy sparse array based on the version of scipy.

@glemaitre
Copy link
Member

The vendoring is already going in #27240
We could try to merge this PR quickly and @bharatr21 could benefit from it.

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo an potential changelog entry adaptation. Thank you @bharatr21.

The fix mentioned by @glemaitre has been integrated in between, resolving the failure here after merging main.

@bharatr21
Copy link
Contributor Author

LGTM modulo an potential changelog entry adaptation. Thank you @bharatr21.

The fix mentioned by @glemaitre has been integrated in between, resolving the failure here after merging main.

May I know in which changelog I should make the entry? Is it still 1.3.x or 1.4?

@jjerphan
Copy link
Member

jjerphan commented Nov 6, 2023

This section from 1.4's.

@bharatr21
Copy link
Contributor Author

bharatr21 commented Nov 7, 2023

This section from 1.4's.

Still not sure what to put in the changelog since these 2 entries are already present

- :func:`manifold.spectral_embedding` in :pr:`27240` by :user:`Yao Xiao <Charlie-XIAO>`;
- :func:`manifold.SpectralEmbedding` in :pr:`27240` by :user:`Yao Xiao <Charlie-XIAO>`;

@jjerphan please re-review the changelog entry (I'm not sure if it's necessary since there's no code changes?)

@bharatr21 bharatr21 closed this Nov 7, 2023
@bharatr21 bharatr21 reopened this Nov 7, 2023
Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

We probably are going to adapt the changelog entry anyway.

@jjerphan jjerphan added the Waiting for Second Reviewer First reviewer is done, need a second one! label Nov 19, 2023
@glemaitre glemaitre self-requested a review November 19, 2023 18:36
@jjerphan jjerphan removed the Waiting for Second Reviewer First reviewer is done, need a second one! label Nov 24, 2023
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @bharatr21

@glemaitre glemaitre enabled auto-merge (squash) November 27, 2023 07:23
@glemaitre glemaitre merged commit 1b7b0a1 into scikit-learn:main Nov 27, 2023
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants