Description
I am using an imblearn pipeline to perform dimensionality reduction before model training. I would like to try either PCA or skipping the dimensionality reduction step entirely (setting the step to None). I am also tuning the number of components used for PCA.
My code throws an error during grid search when the dimensionality_reduction step is set to None but dimensionality_reduction__n_components is set to a floating-point number.
Is it possible to allow passthrough of a transformer and have any parameters set for that step be ignored?
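A possible workaround, sketched below (reusing the pipeline and training data from the MWE that follows, and relying on GridSearchCV accepting a list of parameter grids), is to split the grid so that n_components is only combined with the PCA candidate and never with None:
param_grid = [
    {'dimensionality_reduction': [PCA()],
     'dimensionality_reduction__n_components': [0.8, 0.9]},  # n_components only paired with PCA
    {'dimensionality_reduction': [None]},  # skip the step entirely
]
gs_log = GridSearchCV(pipeline, param_grid).fit(x_train, y_train)
This avoids the failing combination, but it would still be convenient if parameters belonging to a step set to None were simply ignored.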
MWE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from imblearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
# initialize data set and split
x, y = make_classification() # make dataset
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30) # split into train and test
# initialize hyperparameters
param_log = {}
param_log['dimensionality_reduction'] = [PCA(), None]  # dimensionality reduction options
param_log['dimensionality_reduction__n_components'] = [0.8, 0.9]  # number of components for PCA
# define pipeline
pipeline = Pipeline([('dimensionality_reduction', PCA()),
('classifier', LogisticRegression())])
gs_log = GridSearchCV(pipeline, param_log).fit(x_train, y_train)  # train
Expected Results
No error is thrown; the dimensionality_reduction__n_components parameter is ignored because the dimensionality_reduction step is set to None.
Actual Results
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/joblib/externals/loky/process_executor.py", line 428, in _process_worker
r = call_item()
File "/usr/local/lib/python3.7/dist-packages/joblib/externals/loky/process_executor.py", line 275, in __call__
return self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.7/dist-packages/joblib/_parallel_backends.py", line 620, in __call__
return self.func(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/joblib/parallel.py", line 289, in __call__
for func, args, kwargs in self.items]
File "/usr/local/lib/python3.7/dist-packages/joblib/parallel.py", line 289, in <listcomp>
for func, args, kwargs in self.items]
File "/usr/local/lib/python3.7/dist-packages/sklearn/utils/fixes.py", line 216, in __call__
return self.function(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py", line 668, in _fit_and_score
estimator = estimator.set_params(**cloned_parameters)
File "/usr/local/lib/python3.7/dist-packages/sklearn/pipeline.py", line 188, in set_params
self._set_params("steps", **kwargs)
File "/usr/local/lib/python3.7/dist-packages/sklearn/utils/metaestimators.py", line 54, in _set_params
super().set_params(**params)
File "/usr/local/lib/python3.7/dist-packages/sklearn/base.py", line 258, in set_params
valid_params[key].set_params(**sub_params)
AttributeError: 'NoneType' object has no attribute 'set_params'
Versions
Linux-5.10.133+-x86_64-with-Ubuntu-18.04-bionic
Python 3.7.14 (default, Sep 8 2022, 00:06:44)
[GCC 7.5.0]
NumPy 1.21.6
SciPy 1.7.3
Scikit-Learn 1.0.2
Imbalanced-Learn 0.8.1