BUG: DataFrame.drop() fails when `columns=` is given as tuple #43978

JBGreisman · 2021-10-11T19:56:37Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  columns=['A', 'B', 'C', 'D'])

print(df.drop(columns=list(["A", "B"])))     # <-- Works
print(df.drop(columns=np.array(["A", "B"]))) # <-- Works
print(df.drop(columns=tuple(["A", "B"])))    # <-- Fails with KeyError

Issue Description

The documentation says that the columns= argument of DataFrame.drop() can take a single label or list-like, but it fails when given a tuple with more than one column name. The method works when the exact same column labels are provided as a list or a np.ndarray.

Just an observation -- a tuple with len()==1 does seem to work successfully here.
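
A minimal sketch of a workaround, assuming the tuple is meant as a sequence of labels and not as a single tuple-valued label: convert it to a list before calling drop(). The variable names `cols`, `raised`, and `result` are illustrative only, and the KeyError is the behaviour reported above for pandas 1.3.3.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  columns=["A", "B", "C", "D"])

cols = ("A", "B")

# Passing the tuple directly: pandas looks for one label ("A", "B"),
# which does not exist here, so a KeyError is the reported outcome.
try:
    df.drop(columns=cols)
    raised = False
except KeyError:
    raised = True

# Converting to a list makes the intent (two separate labels) explicit.
result = df.drop(columns=list(cols))
print(list(result.columns))  # ['C', 'D']
```

The explicit list() call keeps the caller in control of how the tuple is interpreted.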

Expected Behavior

For the above example, df.drop(columns=("A", "B")) should produce the same output as columns=["A", "B"] or columns=np.array(["A", "B"]), resulting in a DataFrame that contains only columns C and D.

Installed Versions

INSTALLED VERSIONS

commit : 73c6825
python : 3.8.11.final.0
python-bits : 64
OS : Darwin
OS-release : 17.7.0
Version : Darwin Kernel Version 17.7.0: Fri Oct 30 13:34:27 PDT 2020; root:xnu-4570.71.82.8~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.3
numpy : 1.21.2
pytz : 2021.1
dateutil : 2.8.2
pip : 21.0.1
setuptools : 58.0.4
Cython : None
pytest : 6.2.5
hypothesis : None
sphinx : 4.2.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.28.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None


erfannariman · 2021-10-11T21:55:04Z

take

erfannariman · 2021-10-11T23:12:25Z

As already described in the draft PR, the ambiguity lies in the fact that we can use tuples to select across a couple of levels, while they can also be used as column names:

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=[("A", "B"), ("C", "D")])
print(df.drop(columns=("A", "B")))

   (C, D)
0       1
1       3
2       5

Or MultiIndex:

idx = pd.MultiIndex.from_tuples([("A", "B"), ("C", "D")])
df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=idx)

print(df)
   A  C
   B  D
0  0  1
1  2  3
2  4  5


df.drop(columns=("A", "B"))
   C
   D
0  1
1  3
2  5

So it's not clear to me whether this is expected behaviour or a bug.

attack68 · 2021-10-12T05:26:42Z

In another issue I have proposed prohibiting tuple-sequence input, since a tuple has the ambiguous possibility of an individual label. If a list is used there is no ambiguity. This has greater implications in the .loc method, for example, but it would be useful to be consistent here also. Therefore, I believe this should fail if the input is given as a tuple, and a tuple label does not exist.

Can't find the exact post but this is similar: #42329 (comment)

erfannariman · 2021-10-12T08:16:33Z

In another issue I have proposed prohibiting tuple-sequence input

Would tuples as column names also fall under this?

attack68 · 2021-10-12T16:52:15Z

In another issue I have proposed prohibiting tuple-sequence input

Would tuples as column names also fall under this?
   (C, D)
0       1
1       3
2       5

Index Labels are allowed to be tuples, they are not allowed to be lists.

So:

[(0,1)] is a list of one label, which is (0,1).
(0,1) is a single index label, which may ambiguously be a single-level identifier or two level identifiers.
(0, 1) is not a sequence of the two labels 0,1 and is not equivalent to [0, 1].

If you permit (0, 1) as a sequence of two labels it opens up a host of unnecessary problems when it is more explicit, anyway, to use lists.
In my opinion.
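
The distinction above can be sketched as follows, assuming a flat (non-MultiIndex) column index whose labels are tuples. The frame and the variable names are made up for illustration; `tupleize_cols=False` is passed so that pd.Index keeps the tuples as plain labels instead of building a MultiIndex.

```python
import pandas as pd

# Flat Index with two tuple-valued labels: (0, 1) and (2, 3).
cols = pd.Index([(0, 1), (2, 3)], tupleize_cols=False)
df = pd.DataFrame([[10, 20], [30, 40]], columns=cols)

# (0, 1) is read as ONE label.
single_label = df.drop(columns=(0, 1))

# [(0, 1)] is a list containing that one label: same result.
list_of_one = df.drop(columns=[(0, 1)])

# [0, 1] is a list of TWO labels, 0 and 1, neither of which exists.
try:
    df.drop(columns=[0, 1])
    two_labels_found = True
except KeyError:
    two_labels_found = False

print(list(single_label.columns))  # [(2, 3)]
print(two_labels_found)            # False
```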

jreback · 2021-10-21T02:03:44Z

Tuples are for MultiIndexes; yes, if it's unambiguous you could just listify it, but -1 on changing here as it's not worth it.
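
For anyone who wants the "listify" behaviour in their own code, here is a rough sketch. `drop_listified` is a hypothetical helper, not part of pandas, and it assumes that a tuple which is not itself an existing column label was meant as a sequence of labels.

```python
import pandas as pd


def drop_listified(df, cols):
    """Hypothetical helper (not a pandas API): drop columns, treating a
    tuple as a list of labels only when it is not itself a column label."""
    if isinstance(cols, tuple) and cols not in df.columns:
        # The tuple is not an existing label, so interpret it as a sequence.
        cols = list(cols)
    return df.drop(columns=cols)


df = pd.DataFrame([[0, 1, 2, 3]], columns=["A", "B", "C", "D"])
out = drop_listified(df, ("A", "B"))
print(list(out.columns))  # ['C', 'D']
```

When the tuple does match an existing label (tuple column names or a MultiIndex key), the helper passes it through unchanged, so the current pandas behaviour is preserved in that case.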

JBGreisman added the Bug and Needs Triage labels Oct 11, 2021

github-actions bot assigned erfannariman Oct 11, 2021

erfannariman mentioned this issue Oct 11, 2021

BUG: DataFrame.drop fails when columns argument given tuple #43985

Closed

4 tasks

mroeschke added the Needs Discussion and Nested Data labels and removed the Needs Triage label Oct 16, 2021

jorisvandenbossche mentioned this issue Mar 31, 2023

BUG: pd.isnull treats list and tuple input differently #52283

Open

3 tasks
