Error when attempting to passthrough transformer step if tuning transformer parameter #930

@blaverty

Description

I am using an imblearn pipeline to perform dimensionality reduction before model training. I would like to try either PCA or skipping the dimensionality reduction step completely (setting the step to None). I am also tuning the number of components used for PCA.

My code throws an error during grid search when the dimensionality_reduction step is set to None but dimensionality_reduction__n_components is set to a float.

Is it possible to allow passthrough of a transformer and ignore any setting of parameters associated with that step?

MWE

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from imblearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

# initialize data set and split
x, y = make_classification() # make dataset
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30) # split into train and test

# initialize hyperparameters
param_log = {}
param_log['dimensionality_reduction'] = [PCA(), None]  # dimensionality reduction options
param_log['dimensionality_reduction__n_components'] = [0.8, 0.9]  # floats: fraction of variance PCA should retain

# define pipeline
pipeline = Pipeline([('dimensionality_reduction', PCA()), 
                     ('classifier', LogisticRegression())]) 

gs_log = GridSearchCV(pipeline, param_log).fit(x_train, y_train) # train

Expected Results

No error is thrown. The dimensionality_reduction__n_components parameter is ignored when the dimensionality_reduction step is set to None.

Actual Results

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/joblib/externals/loky/process_executor.py", line 428, in _process_worker
    r = call_item()
  File "/usr/local/lib/python3.7/dist-packages/joblib/externals/loky/process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/dist-packages/joblib/_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/joblib/parallel.py", line 289, in __call__
    for func, args, kwargs in self.items]
  File "/usr/local/lib/python3.7/dist-packages/joblib/parallel.py", line 289, in <listcomp>
    for func, args, kwargs in self.items]
  File "/usr/local/lib/python3.7/dist-packages/sklearn/utils/fixes.py", line 216, in __call__
    return self.function(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py", line 668, in _fit_and_score
    estimator = estimator.set_params(**cloned_parameters)
  File "/usr/local/lib/python3.7/dist-packages/sklearn/pipeline.py", line 188, in set_params
    self._set_params("steps", **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/sklearn/utils/metaestimators.py", line 54, in _set_params
    super().set_params(**params)
  File "/usr/local/lib/python3.7/dist-packages/sklearn/base.py", line 258, in set_params
    valid_params[key].set_params(**sub_params)
AttributeError: 'NoneType' object has no attribute 'set_params'
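
The traceback shows set_params being called on the None step: a single parameter dict is expanded into every combination, so None gets paired with each n_components value. A possible workaround, sketched here rather than offered as a confirmed fix, is to pass GridSearchCV a list of grids so that n_components is only ever combined with PCA. The sketch uses sklearn.pipeline.Pipeline and the "passthrough" string to stay self-contained. It assumes imblearn.pipeline.Pipeline accepts the same grid, and the param_grid name and random_state values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# build a toy data set and split it (random_state added only for reproducibility)
x, y = make_classification(random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.30, random_state=0
)

pipeline = Pipeline([
    ("dimensionality_reduction", PCA()),
    ("classifier", LogisticRegression()),
])

# one grid per alternative: n_components appears only alongside PCA,
# so it is never applied to the skipped step
param_grid = [
    {
        "dimensionality_reduction": [PCA()],
        "dimensionality_reduction__n_components": [0.8, 0.9],
    },
    {"dimensionality_reduction": ["passthrough"]},
]

gs_log = GridSearchCV(pipeline, param_grid).fit(x_train, y_train)
print(gs_log.best_params_)
```

With this layout the search evaluates three candidates (two PCA settings plus the skipped step) instead of the four combinations produced by the single dict.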

Versions

Linux-5.10.133+-x86_64-with-Ubuntu-18.04-bionic
Python 3.7.14 (default, Sep 8 2022, 00:06:44)
[GCC 7.5.0]
NumPy 1.21.6
SciPy 1.7.3
Scikit-Learn 1.0.2
Imbalanced-Learn 0.8.1

Labels

Type: Bug, Type: Question
