Inconsistent column references with columntransformer for text/numeric cols #14056

@evanmiller29

Description

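A small two-column dataset with one text column and one numeric column requires inconsistent column notation in ColumnTransformer: the text column has to be given as a scalar string, while the numeric column has to be given as a one-item list.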

I found it raised here, but I found the errors quite confusing even after I had solved the issue.

Steps/Code to Reproduce

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import Normalizer
from sklearn.compose import ColumnTransformer

dataset = pd.DataFrame({"a":["word gone wild","gone with wind"],
                        "c":[1,2]})

clmn = ColumnTransformer([
    ("tfidf", TfidfVectorizer(min_df=0), "a"),
    ("norm", Normalizer(norm='l1'), "c") #errors
    #("norm", Normalizer(norm='l1'), ["c"]) #code executes as expected
])
clmn.fit_transform(dataset)

Expected Results

```python
array([[0.44943642, 0.6316672 , 0.        , 0.        , 0.6316672 ,
        1.        ],
       [0.44943642, 0.        , 0.6316672 , 0.6316672 , 0.        ,
        1.        ]])
```

Actual Results

ValueError: 1D data passed to a transformer that expects 2D data. Try to specify the column selection as a list of one item instead of a scalar.
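
The 1D/2D distinction the message refers to can be reproduced with pandas alone. This is a minimal sketch, assuming the column selection behaves like ordinary pandas indexing (the names `scalar_selection` and `list_selection` are mine, for illustration only):

```python
import pandas as pd

dataset = pd.DataFrame({"a": ["word gone wild", "gone with wind"],
                        "c": [1, 2]})

# Selecting with a scalar label gives a 1D Series
scalar_selection = dataset["c"]

# Selecting with a one-item list gives a 2D DataFrame with a single column
list_selection = dataset[["c"]]

print(type(scalar_selection).__name__, scalar_selection.shape)  # Series (2,)
print(type(list_selection).__name__, list_selection.shape)      # DataFrame (2, 1)
```

If that assumption holds, it would fit what I see: TfidfVectorizer takes a 1D sequence of documents, so "a" works as a scalar, while Normalizer wants 2D input, so "c" only works as ["c"].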

Versions

System:
python: 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\evanmiller\AppData\Local\Continuum\anaconda3\envs\capco\python.exe
machine: Windows-10-10.0.16299-SP0

BLAS:
macros:
lib_dirs:
cblas_libs: cblas

Python deps:
pip: 19.1.1
setuptools: 41.0.1
sklearn: 0.20.3
numpy: 1.16.4
scipy: 1.3.0
Cython: None
pandas: 0.24.2

I was wondering if there is any explanation for why this happens. The single text column must be given without brackets ("a"), while the numeric column requires them (["c"]).

If I understood more about why it's happening I'd be happy to write a more informative error message if you think that's the right call.

Evan
