#### Description
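With a small two-column dataset (one text column and one numeric column), `ColumnTransformer` requires inconsistent column-selection notation: the text column must be given as a scalar, while the numeric column must be given as a one-item list.
I found it raised here, but I found the error quite confusing even after I had solved the issue.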
#### Steps/Code to Reproduce
```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import Normalizer
from sklearn.compose import ColumnTransformer

dataset = pd.DataFrame({"a": ["word gone wild", "gone with wind"],
                        "c": [1, 2]})

clmn = ColumnTransformer([
    ("tfidf", TfidfVectorizer(min_df=0), "a"),
    ("norm", Normalizer(norm='l1'), "c")  # errors
    # ("norm", Normalizer(norm='l1'), ["c"])  # code executes as expected
])

clmn.fit_transform(dataset)
```
#### Expected Results
```python
array([[0.44943642, 0.6316672 , 0.        , 0.        , 0.6316672 ,
        1.        ],
       [0.44943642, 0.        , 0.6316672 , 0.6316672 , 0.        ,
        1.        ]])
```
#### Actual Results
```
ValueError: 1D data passed to a transformer that expects 2D data. Try to specify the column selection as a list of one item instead of a scalar.
```
#### Versions
```
System:
    python: 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]
    executable: C:\Users\evanmiller\AppData\Local\Continuum\anaconda3\envs\capco\python.exe
    machine: Windows-10-10.0.16299-SP0

BLAS:
    macros:
    lib_dirs:
    cblas_libs: cblas

Python deps:
    pip: 19.1.1
    setuptools: 41.0.1
    sklearn: 0.20.3
    numpy: 1.16.4
    scipy: 1.3.0
    Cython: None
    pandas: 0.24.2
```
Is there an explanation for why this happens? The text column must be passed without brackets (`"a"`), while the numeric column requires them (`["c"]`).
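One thing that may be related is what each kind of selector yields in plain pandas. The snippet below is only a minimal sketch: it assumes that `ColumnTransformer` selects columns the same way direct DataFrame indexing does, which I have not confirmed in the source. The names `scalar_selection` and `list_selection` are mine, just for illustration.

```python
import pandas as pd

dataset = pd.DataFrame({"a": ["word gone wild", "gone with wind"],
                        "c": [1, 2]})

# Scalar selector: pandas returns a 1D Series.
scalar_selection = dataset["a"]

# List selector: pandas returns a 2D DataFrame, even for a single column.
list_selection = dataset[["c"]]

print(type(scalar_selection).__name__, scalar_selection.shape)  # Series (2,)
print(type(list_selection).__name__, list_selection.shape)      # DataFrame (2, 1)
```

If that assumption holds, it would fit the behaviour above: `TfidfVectorizer` consumes a 1D iterable of strings, whereas `Normalizer` expects a 2D array of shape `(n_samples, n_features)`.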
If I understood more about why it's happening I'd be happy to write a more informative error message if you think that's the right call.
Evan