Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MAINT] Resolve sklearn 1.4.dev0 related failure of decoder test #4132

Closed
ymzayek opened this issue Dec 6, 2023 · 9 comments · Fixed by #4188
Closed

[MAINT] Resolve sklearn 1.4.dev0 related failure of decoder test #4132

ymzayek opened this issue Dec 6, 2023 · 9 comments · Fixed by #4188
Labels
Maintenance This issue is related to maintenance work.

Comments

@ymzayek
Copy link
Member

ymzayek commented Dec 6, 2023

The failure still seen here in the numpy2 job: test_decoder_binary_classification_with_masker_object
is actually related to an inconsistency between nilearn and scikit-learn dev version from their nightly releases which we have to use to test against numpy v2.0. All the numpy deprecations so far are dealt with in this PR so I'd rather merge this and deal with the sklearn deprecation separately

Originally posted by @ymzayek in #4130 (comment)

@bthirion
Copy link
Member

bthirion commented Dec 6, 2023

OK. Do you know whether the sklearn team is working on it ?

@ymzayek
Copy link
Member Author

ymzayek commented Dec 7, 2023

Not sure it's a bug from them necessarily. They are now using a private attribute called _estimator_type to do some checks when score is called from an estimator. This attribute can have the values of "regressor" and "classifier". Our Decoder class objects should have type "classifier" but since they inherit from _BaseDecoder which inherits from sklearn's LinearRegression they actually have _estimator_type="regressor".

@ymzayek
Copy link
Member Author

ymzayek commented Dec 7, 2023

@glemaitre wondering if you have any input on the change on the scikit-learn side?

@glemaitre
Copy link
Contributor

We change the internal scoring with a refactoring. Concretely, we check the _estimator_type to ensure that you call a meaningful response method (i.e. predict, predict_proba, decision_function). If you decoded is a regressor then, we only accept to call predict because we don't have other use case in scikit-learn.

This said, I need to understand if the failure would be a real regression or more a design issue. One thing that might potentially make sense in scikit-learn would be to be slightly more leninent and change the else statement with elif is_regressor(self) and do not make any check for other models.

I am under the impression that this will not helped in your case however. Could you tell me what is the reason for calling decision_function on a regressor?

@glemaitre
Copy link
Contributor

Looking at the decoder class, I would overwrite _estimator_type to classifier and for DecoderRegressor should use regressor.

Perfectly, I assume that _estimator_type should be delegated to the _estimator_type from the estimator attribute using a property. But the fact that you inherit from LinearRegression will not allow you to do that.

I don't know enough the code right now to know what is the reason to inherate from LinearRegression. I assume that you want to take profit from some functionalities and I am wondering if there not something better to do here. But I need to go through the code to be sure.

@glemaitre
Copy link
Contributor

I looked a bit closer, I am not sure that this is necessary to inherit indeed. LinearRegression only define a fit and __init__ that you are redefining. Then it inherate itself from LinearModel but it seems that you are not using the private method define also.

So I am wondering if it would be better to remove the inheritance from LinearRegression and instead add:

class Decoder(ClassifierMixin, _BaseDecoder):
    ...

class DecoderRegressor(MultiOutputMixin, RegressorMixin, _BaseDecoder):
    ...

On thing weird is to define a public def decision_function in _BaseDecoder because it is the same than predict for regressor. Typically, in scikit-learn we would define _decision_function that can be used both in the Decoder and DecoderRegressor but only the Decoder will define decision_function.

However, going in this direction will break some backward compatibility but it will be compatible with the scikit-learn rules.

@ymzayek
Copy link
Member Author

ymzayek commented Dec 7, 2023

@glemaitre thanks for the feedback! I'm not sure about the original design intention of some of these choices so I'll have to do some tests to see if we can change the inheritance and I'll check on the defined decision_function in our code as well.

@glemaitre
Copy link
Contributor

You can ping on your PR. I don't mind reviewing if you want.

@Remi-Gau Remi-Gau added the Maintenance This issue is related to maintenance work. label Dec 12, 2023
@ymzayek
Copy link
Member Author

ymzayek commented Jan 4, 2024

The setup that we have is most likely related to having to extract data from nifti images into a compatible 2D array of n_samples x n_features using a mask or masker object. I think LinearRegression is inherited in _BaseDecoder in order to always call our predict function which does the necessary transformation from the mask and handles both regression and classification. With changing the inheritances or just simply setting _estimator_type to "classifier" for Decoder class we run into the problem of calling our decision_function without this extraction in some cases depending on the scoring chosen (the default "roc_auc" for example will fail).

BTW this works in the stable versions of nilearn and scikit-learn because _ThresholdScorer in sklearn has the lines

        if is_regressor(clf):
            y_pred = method_caller(clf, "predict", X)

along with the inheritance of the LinearRegression which makes this statement True for the Decoder.

This hack doesn't work in the dev version of scikit-learn where all the scorer classes inheriting from _BaseScorer have been refactored into _Scorer

I'll look into how to best refactor our code to properly handle the inheritances, transformation of our data, and be compatible with the upcoming changes in scikit-learn

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Maintenance This issue is related to maintenance work.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants