DEPR: DataFrame dropna accepting multiple axes #20987

Closed
kunalgosar opened this Issue May 9, 2018 · 2 comments

@kunalgosar
Contributor

kunalgosar commented May 9, 2018

Code Sample, a copy-pastable example if possible

This is the relevant code extract from pandas source:

if isinstance(axis, (tuple, list)):
    result = self
    for ax in axis:
        result = result.dropna(how=how, thresh=thresh, subset=subset, axis=ax)

This is the output from the function call:

In [7]: df
Out[7]: 
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5

In [8]: df.dropna(axis=[0, 1], subset=['A', 'C'])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-8-4c4a476386f3> in <module>()
----> 1 df.dropna(axis=[0, 1], subset=['A', 'C'])

~/dev/pandas/pandas/core/frame.py in dropna(self, axis, how, thresh, subset, inplace)
   4265             for ax in axis:
   4266                 result = result.dropna(how=how, thresh=thresh, subset=subset,
-> 4267                                        axis=ax)
   4268         else:
   4269             axis = self._get_axis_number(axis)

~/dev/pandas/pandas/core/frame.py in dropna(self, axis, how, thresh, subset, inplace)
   4276                 check = indices == -1
   4277                 if check.any():
-> 4278                     raise KeyError(list(np.compress(check, subset)))
   4279                 agg_obj = self.take(indices, axis=agg_axis)
   4280 

KeyError: ['A', 'C']
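
The failure can be reproduced one step at a time by splitting the multi-axis call into the two single-axis calls the loop above makes. The sketch below is not from the issue itself: it rebuilds the frame from the session and assumes a pandas version where single-axis dropna(axis=..., subset=...) behaves as in the traceback.

```python
import numpy as np
import pandas as pd

# Rebuild the frame shown in the session above.
df = pd.DataFrame({
    "A": [np.nan, 3.0, np.nan],
    "B": [2.0, 4.0, np.nan],
    "C": [np.nan, np.nan, np.nan],
    "D": [0, 1, 5],
})
subset = ["A", "C"]

# Step 1 (axis=0): subset names *columns*. Every row is NaN in column C,
# so with the default how="any" every row is dropped.
step1 = df.dropna(axis=0, subset=subset)
print(step1.shape)  # (0, 4)

# Step 2 (axis=1): the same labels are now looked up in the *index*,
# which holds row labels rather than "A"/"C", so pandas raises KeyError.
error = None
try:
    step1.dropna(axis=1, subset=subset)
except KeyError as exc:
    error = exc
print(type(error).__name__, error)
```

The first call succeeds; only the second call, which reuses the column labels against the row axis, raises.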

Problem description

Subset selects labels from the "other" axis: column labels when dropping rows (axis=0) and row labels when dropping columns (axis=1). Passing the same subset of labels to both per-axis calls therefore does not make sense: unless every label in subset is present on both axes, the call will always raise a KeyError.
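
A small illustration of which labels subset refers to for each axis, using the same frame as the session above (rebuilt here so the snippet runs on its own):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [np.nan, 3.0, np.nan],
    "B": [2.0, 4.0, np.nan],
    "C": [np.nan, np.nan, np.nan],
    "D": [0, 1, 5],
})

# axis=0 drops rows; subset lists *column* labels to inspect.
rows_kept = df.dropna(axis=0, subset=["A"])
print(list(rows_kept.index))  # [1]

# axis=1 drops columns; subset lists *row* (index) labels to inspect.
cols_kept = df.dropna(axis=1, subset=[1])
print(list(cols_kept.columns))  # ['A', 'B', 'D']
```

Column labels such as "A" are only meaningful for axis=0, and row labels such as 1 only for axis=1, which is why one shared subset cannot serve both axes.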

I would expect subset to accept one list of labels per axis when axis is list-like. If this is agreeable, I'm happy to submit a PR with this change.
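
A minimal sketch of the proposed behaviour, written as a standalone helper rather than a change to pandas. The name dropna_per_axis and its signature are hypothetical and are not part of any pandas API; it simply pairs each axis with its own subset and applies single-axis dropna calls in order.

```python
import numpy as np
import pandas as pd


def dropna_per_axis(frame, axes, subsets, how="any"):
    """Hypothetical helper (not a pandas API): apply dropna once per axis,
    pairing each axis with its own subset of labels from the other axis."""
    if len(axes) != len(subsets):
        raise ValueError("need exactly one subset per axis")
    result = frame
    for ax, sub in zip(axes, subsets):
        # sub may be None to consider every label on the other axis
        result = result.dropna(axis=ax, how=how, subset=sub)
    return result


df = pd.DataFrame({
    "A": [np.nan, 3.0, np.nan],
    "B": [2.0, 4.0, np.nan],
    "C": [np.nan, np.nan, np.nan],
    "D": [0, 1, 5],
})

# Rows: keep those with a value in column "A"; then
# columns: drop those with a NaN in row 1.
out = dropna_per_axis(df, [0, 1], [["A"], [1]])
print(out.shape)  # (1, 3)
```

Because the axes are processed in order, each subset must name labels that still exist after the earlier steps have run.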

Output of pd.show_versions()

INSTALLED VERSIONS

commit: bd4332f
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0rc2+13.gbd4332f4b
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@kunalgosar kunalgosar changed the title from "subset parameter in dropna function does not make sense for multiple axis" to "DataFrame dropna function's subset parameter does not make sense for multiple axis" May 9, 2018

@jreback

Contributor

jreback commented May 9, 2018

Rather than change this, let's deprecate passing multiple axes; we don't support this in any other pandas function.
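
A sketch of what such a deprecation path could look like from the caller's side. dropna_compat is a hypothetical wrapper, not the code from the PR mentioned below, and the warning text is an assumption; it shows the common pattern of still honouring a list of axes while emitting a FutureWarning, and compares the result with the non-deprecated spelling of one dropna call per axis.

```python
import warnings

import numpy as np
import pandas as pd


def dropna_compat(frame, axis=0, **kwargs):
    """Hypothetical wrapper: a list/tuple of axes still works but emits a
    FutureWarning and is handled by chaining single-axis dropna calls."""
    if isinstance(axis, (tuple, list)):
        warnings.warn(
            "passing multiple axes to dropna is deprecated; "
            "call dropna once per axis instead",
            FutureWarning,
            stacklevel=2,
        )
        result = frame
        for ax in axis:
            result = result.dropna(axis=ax, **kwargs)
        return result
    return frame.dropna(axis=axis, **kwargs)


df = pd.DataFrame({
    "A": [np.nan, 3.0, np.nan],
    "B": [2.0, 4.0, np.nan],
    "C": [np.nan, np.nan, np.nan],
    "D": [0, 1, 5],
})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = dropna_compat(df, axis=[0, 1], how="all")

# The non-deprecated spelling: one axis per call, each with its own arguments.
chained = df.dropna(axis=0, how="all").dropna(axis=1, how="all")
print(legacy.equals(chained))  # True
```

Chaining also removes the ambiguity from the original report, since each call can be given a subset that matches its own axis.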

@jreback jreback changed the title from "DataFrame dropna function's subset parameter does not make sense for multiple axis" to "DEPR: DataFrame dropna accepting multiple axes" May 9, 2018

@jreback jreback added the Deprecate label May 9, 2018

@jreback jreback added this to the 0.24.0 milestone May 9, 2018

@kunalgosar

Contributor

kunalgosar commented May 9, 2018

Opened a PR to deprecate this as suggested. Given the timing, I've targeted 0.23.0 for now; happy to move it to 0.24.0 if it can't be merged in time.

@jreback jreback modified the milestones: 0.24.0, 0.23.0 May 10, 2018
