
[BUG] Confound removal with newer sklearn versions results in too many user warnings #152

Closed
fraimondo opened this issue May 5, 2022 · 3 comments
Labels
bug Something isn't working Priority: High High Priority Issue

@fraimondo
Contributor

Describe the bug

A new version of scikit-learn (1.0) introduced a check for feature names. With this version, any julearn model that uses confound removal emits a large number of warnings like this:

/Users/fraimondo/anaconda3/envs/julearn/lib/python3.8/site-packages/sklearn/base.py:443: UserWarning: X has feature names, but LinearRegression was fitted without feature names
  warnings.warn(

To Reproduce
Steps to reproduce the behavior:

"""
Return Confounds in Confound Removal
====================================

In most cases confound removal is a simple operation.
You regress out the confound from the features and only continue working with
these new confound removed features. This is also the default setting for
julearn's `remove_confound` step. But sometimes you want to work with the
confound even after removing it from the features. In this example, we
will discuss the options you have.

"""
# Authors: Sami Hamdan <s.hamdan@fz-juelich.de>
#
# License: AGPL
import warnings  # used by the workaround below

from sklearn.datasets import load_diabetes  # to load data
from julearn import run_cross_validation

# load in the data
df_features, target = load_diabetes(return_X_y=True, as_frame=True)


###############################################################################
# First, we can have a look at our features.
# You can see they include age, BMI, average blood pressure (bp) and six
# blood serum measurements (s1 to s6).
# Furthermore, they include sex, which will be considered a confound in
# this example.
#
print('Features: ', df_features.head())

###############################################################################
# Second, we can have a look at the target
print('Target: ', target.describe())

###############################################################################
# Now, we can put both into one DataFrame:
data = df_features.copy()
data['target'] = target

###############################################################################
# In the following, we will explore different settings of confound removal
# using julearn's pipeline functionality.
#
# Confound Removal Typical Use Case
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Here, we want to deconfound the features and not include the confound as a
# feature into our last model.
# Afterwards, we will transform our features with a PCA and run
# a linear regression.
#
feature_names = list(df_features.drop(columns='sex').columns)

scores, model = run_cross_validation(
    X=feature_names, y='target', data=data,
    confounds='sex', model='linreg', problem_type='regression',
    preprocess_X=['remove_confound', 'pca'],
    return_estimator='final')
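The `remove_confound` step regresses the confound out of the features before the PCA and the final model. As a rough illustration of that idea, the sketch below uses a hypothetical helper, `remove_confound_sketch`, which is not julearn's implementation. It fits one `LinearRegression` per feature on the confound and keeps the residuals. Synthetic data stands in for the diabetes columns so the example is self-contained. If julearn also fits one regression per feature, which is an assumption, each of those models could warn separately in every cross-validation fold, and that might explain the volume of warnings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def remove_confound_sketch(X, confound):
    """Hypothetical helper: residualize each column of X on a single confound."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(confound, dtype=float).reshape(-1, 1)
    residuals = np.empty_like(X)
    for j in range(X.shape[1]):
        # One linear model per feature: feature_j ~ intercept + beta * confound
        reg = LinearRegression().fit(C, X[:, j])
        residuals[:, j] = X[:, j] - reg.predict(C)
    return residuals


rng = np.random.default_rng(42)
sex = rng.integers(0, 2, size=100)  # binary confound, like 'sex'
X = rng.normal(size=(100, 3)) + 2.0 * sex[:, None]  # features shifted by the confound
X_clean = remove_confound_sketch(X, sex)

# After residualization, every feature is (numerically) uncorrelated with the confound.
corrs = [np.corrcoef(sex, X_clean[:, j])[0, 1] for j in range(3)]
print(np.round(corrs, 6))
```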

Additional context

Workaround for the moment:

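with warnings.catch_warnings():
    # "once": show only the first occurrence of each matching warning message.
    # lineno=443 matches warnings raised at line 443 of any module; it is the
    # line of sklearn/base.py in the traceback above and will differ in other
    # scikit-learn versions.
    warnings.simplefilter("once", lineno=443)
    scores, model = run_cross_validation(
        X=feature_names, y='target', data=data,
        confounds='sex', model='linreg', problem_type='regression',
        preprocess_X=['remove_confound', 'pca'],
        return_estimator='final')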
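Because `lineno=443` is tied to one scikit-learn release, a filter keyed on the message text may hold up better across versions. The standard-library sketch below uses `noisy_step`, a hypothetical stand-in that emits the same message as scikit-learn. `warnings.filterwarnings` with a `message` regular expression silences it, while an unrelated warning still gets through. In the real workaround, the `run_cross_validation` call would replace the loop. `"once"` could be used instead of `"ignore"` to keep a single copy of the message.

```python
import warnings


def noisy_step():
    # Hypothetical stand-in for a scikit-learn call that emits the same message.
    warnings.warn(
        "X has feature names, but LinearRegression was fitted without feature names",
        UserWarning,
    )
    return 1


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record everything not matched below
    # The regex is matched against the start of the message (case-insensitive).
    warnings.filterwarnings(
        "ignore",
        message="X has feature names, but .* was fitted without feature names",
        category=UserWarning,
    )
    results = [noisy_step() for _ in range(10)]
    warnings.warn("unrelated warning", UserWarning)

print([str(w.message) for w in caught])
```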

@fraimondo fraimondo added the bug Something isn't working label May 5, 2022
@fraimondo
Contributor Author

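Solution to use when joblib is used, since warning filters set in the main process do not carry over to joblib's separate worker processes: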

import sys

if not sys.warnoptions:
    import os, warnings
    warnings.simplefilter("ignore") # Change the filter in this process
    os.environ["PYTHONWARNINGS"] = "ignore" # Also affect subprocesses
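The environment-variable half of this solution works because a new Python interpreter reads `PYTHONWARNINGS` at startup. Process-based joblib workers started after the variable is set should therefore pick it up, though that is an assumption about how the workers are spawned. The standard-library sketch below checks the behaviour in a child process launched with `subprocess`. It also shows a narrower filter than plain `"ignore"`: the `action:message:category` form matches only warnings whose text starts with the given message. The message part must not contain commas, because `PYTHONWARNINGS` uses commas to separate multiple filters.

```python
import os
import subprocess
import sys

# Code run in the child interpreter: one warning we want to hide, one we keep.
CHILD_CODE = """
import warnings
warnings.warn("X has feature names, but LinearRegression was fitted without feature names", UserWarning)
warnings.warn("something else", UserWarning)
"""

# action:message:category -- the message is a literal, case-insensitive prefix.
env = dict(os.environ, PYTHONWARNINGS="ignore:X has feature names:UserWarning")

child = subprocess.run(
    [sys.executable, "-c", CHILD_CODE],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
print(child.stderr)
```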

@fraimondo
Contributor Author

@samihamdan Is this fixed for the moment? Will it be fixed for 0.3.0?

@fraimondo fraimondo added this to the v0.3.0 milestone Jul 21, 2022
@fraimondo fraimondo added the Priority: High High Priority Issue label Jul 21, 2022
@fraimondo
Contributor Author

Solved in #154 and #183.
