Output of OneHotEncoder fed into CCA throws ValueError: The output of the '<pipeline_name>' transformer should be 2D (scipy matrix, array, or pandas DataFrame).
#16600
Open
izabella197 opened this issue
Feb 29, 2020
· 3 comments
Describe the bug
Output of OneHotEncoder fed into CCA throws ValueError: The output of the '<pipeline_name>' transformer should be 2D (scipy matrix, array, or pandas DataFrame)
Notes:
Please note that this is not an issue outside a pipeline, i.e. when I call fit_transform on the OneHotEncoder and then pass its output to CCA.fit_transform. The issue occurs only when the entire ColumnTransformer pipeline is used.
Steps/Code to Reproduce
Create a pipeline transformer in which the input is one-hot encoded with OneHotEncoder and the encoder's output is fed to CCA.
Run fit_transform on the data using the pipeline.
Expected Results
Should return a numpy array with the number of features specified in CCA, such that X_test.shape gives
>>> (<num examples>, 300)
Actual Results
D:\Users\<USERNAME>\Anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in fit_transform(self, X, y)
536
537 self._update_fitted_transformers(transformers)
--> 538 self._validate_output(Xs)
539
540 return self._hstack(list(Xs))
D:\Users\<USERNAME>\Anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in _validate_output(self, result)
400 raise ValueError(
401 "The output of the '{0}' transformer should be 2D (scipy "
--> 402 "matrix, array, or pandas DataFrame).".format(name))
403
404 def _validate_features(self, n_features, feature_names):
ValueError: The output of the 'one_hot_encode' transformer should be 2D (scipy matrix, array, or pandas DataFrame).
Hmmmm. Indeed CCA transform methods don't follow our usual API conventions
in what they return. Adding something onto the pipeline to clean up the CCA
output would work, but we should probably reconsider whether CCA's output
is how we want it.
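import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MakeNumpy(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        # CCA.fit_transform(X, y) returns a tuple (x_scores, y_scores); keep x_scores.
        # CCA.transform(X) returns a single array, which must pass through unchanged
        # (indexing it with X[0] would keep only the first row).
        if isinstance(X, tuple):
            X = X[0]
        return np.asarray(X)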
Versions
(These versions are used instead of the newest because of a memmapping problem experienced on Windows.)