Add example usage to DataFrame.filter #12399

Closed
4 participants
Contributor

cswarth commented Feb 20, 2016

Updates doc comments for DataFrame.filter and adds usage examples.
Fixed errors identified by flake8 and correctly rebased my branch before issuing the PR.

DataFrame.filter(items=None, like=None, regex=None, axis=None)

Subset rows or columns of dataframe according to labels in the index.

Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index. This method is a thin veneer on top of DataFrame.select.

Parameters:

items : list-like

List of info axis labels to restrict to (they need not all be present)

like : string

Keep info axis where “arg in col == True”

regex : string (regular expression)

Keep info axis with re.search(regex, col) == True

axis : int or None

The axis to filter on.

Returns:

same type as input object with filtered info axis

Notes

The items, like, and regex parameters should be mutually exclusive, but this is not checked.

axis defaults to the info axis that is used when indexing with [].

Examples

>>> df
        one  two  three
mouse     1    2      3
rabbit    4    5      6
>>> # select columns by name
>>> df.filter(items=['one', 'three'])  
        one  three
mouse     1      3
rabbit    4      6
>>> # select columns by regular expression
>>> df.filter(regex='e$', axis=1)
        one  three
mouse     1      3
rabbit    4      6
>>> # select rows containing 'bbi'
>>> df.filter(like='bbi', axis=0)
        one  two  three
rabbit    4    5      6
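
The three selection modes in the examples above can be modeled in plain Python. This is a minimal sketch of the documented label-matching semantics only, not pandas' implementation; `filter_labels` is a hypothetical helper that operates on a plain list of labels rather than on a DataFrame.

```python
import re


def filter_labels(labels, items=None, like=None, regex=None):
    """Hypothetical helper: return the labels each mode would keep.

    A simplified model of the documented semantics, not pandas' code.
    """
    if items is not None:
        # Keep the requested labels that exist; missing ones are ignored.
        return [label for label in items if label in labels]
    if like is not None:
        # Substring match on the label text.
        return [label for label in labels if like in str(label)]
    if regex is not None:
        # re.search matches anywhere in the label, not only at the start.
        pattern = re.compile(regex)
        return [label for label in labels if pattern.search(str(label))]
    raise TypeError("Must pass either `items`, `like`, or `regex`")


columns = ["one", "two", "three"]
index = ["mouse", "rabbit"]

print(filter_labels(columns, items=["one", "three"]))  # ['one', 'three']
print(filter_labels(columns, regex="e$"))              # ['one', 'three']
print(filter_labels(index, like="bbi"))                # ['rabbit']
```

In every mode only the labels are inspected; the cell values never enter the decision.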

jreback added this to the 0.18.0 milestone Feb 20, 2016

@jreback jreback commented on an outdated diff Feb 20, 2016

pandas/core/generic.py
"""
- Restrict the info axis to set of items or wildcard
+
+ Subset rows or columns of dataframe according to labels in the index.
+
+ Note that this routine does not filter a dataframe on its
+ contents. The filter is applied to the labels of the index.
+ This method is a thin veneer on top of :ref:`DateFrame Select
+ <DataFrame.select>`
@jreback

jreback Feb 20, 2016

Contributor

this last sentence is not necessary.

@jreback jreback commented on an outdated diff Feb 20, 2016

pandas/core/generic.py
@@ -2324,15 +2331,44 @@ def filter(self, items=None, like=None, regex=None, axis=None):
regex : string (regular expression)
Keep info axis with re.search(regex, col) == True
axis : int or None
@jreback

jreback Feb 20, 2016

Contributor

these axis args should be generated the way we do for many other methods; see .reindex or .fillna

@jreback

jreback Apr 19, 2016

Contributor

can you make this change

@jreback

jreback Jun 3, 2016

Contributor

can you update the axis parameters as well (can be a string)

@jreback jreback and 1 other commented on an outdated diff Feb 20, 2016

pandas/core/generic.py
Notes
-----
- Arguments are mutually exclusive, but this is not checked for
+ The ``items``, ``like``, and ``regex`` parameters should be
@jreback

jreback Feb 20, 2016

Contributor

this is not checked? are you sure? if so then that's a much bigger problem. deferring to documentation is not a good idea.

@cswarth

cswarth Feb 22, 2016

Contributor

Yes, mutual exclusion of parameters really is not checked. That line was already in the docs, I think I just reworded it a bit. As @MaximilianR said of this routine, "it seems to do a lot of stuff not very well"

@jreback

jreback Apr 19, 2016

Contributor

can you put in a check for this and a couple of tests?

@jreback jreback commented on an outdated diff Feb 20, 2016

pandas/core/generic.py
Notes
-----
- Arguments are mutually exclusive, but this is not checked for
+ The ``items``, ``like``, and ``regex`` parameters should be
+ mutually exclusive, but this is not checked.
@jreback

jreback Feb 20, 2016

Contributor

you can add a see also section to direct to .select

@jreback jreback modified the milestone: 0.18.1, 0.18.0 Feb 22, 2016

Contributor

cswarth commented Feb 22, 2016

Removed the "thin veneer" comment and added a See Also referring to pandas.DataFrame.select.
The point is to demonstrate that this routine does not filter on the contents of the dataframe, but on the index. Hopefully this will help those coming from R who might be expecting DataFrame.filter to act like dplyr::filter
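
The distinction between filtering on labels and filtering on contents can be illustrated without pandas. This is a rough sketch that uses a plain dict of rows as a stand-in for the example frame; the `rows` structure and both selections are illustrative assumptions, not pandas or dplyr code.

```python
# Stand-in for the example frame: row label -> {column label: value}.
rows = {
    "mouse": {"one": 1, "two": 2, "three": 3},
    "rabbit": {"one": 4, "two": 5, "three": 6},
}

# Label-based selection, analogous to df.filter(like='bbi', axis=0):
# only the row labels are inspected.
by_label = {label: row for label, row in rows.items() if "bbi" in label}

# Content-based selection, the behavior a dplyr::filter user would expect:
# the cell values are inspected, the labels are not.
by_content = {label: row for label, row in rows.items() if row["one"] < 4}

print(list(by_label))    # ['rabbit']
print(list(by_content))  # ['mouse']
```

In pandas itself, content-based selection is usually written with boolean indexing, for example `df[df['one'] < 4]`.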

@jreback jreback modified the milestone: 0.18.2, 0.18.1 Apr 18, 2016

Contributor

jreback commented Apr 18, 2016

can you update

Contributor

cswarth commented Apr 18, 2016

Sorry for being a newbie, but update in what way? Do you want me to rebase my doc/df_filter development branch onto upstream/master and push that to GitHub? It looks like that will push 201 commits, so I want to make sure that is the right thing to do.

Contributor

jreback commented Apr 18, 2016

you need to rebase on master first

Contributor

cswarth commented Apr 19, 2016

I updated my branch, but 24hrs later the CI builds are still not complete. Is that unusual or expected?

Also the CI build for python 3.5 failed but I don't see how it has anything to do with changes I created.
https://travis-ci.org/pydata/pandas/jobs/124031500

Contributor

jreback commented Apr 19, 2016

Travis was having some issues; it's ok now

the 3.5 build on numpy master is currently failing, but that is ok

Contributor

jreback commented May 7, 2016

can you rebase / update according to comments

Contributor

jreback commented May 20, 2016

can you rebase / update

Contributor

jreback commented May 31, 2016

can you update

codecov-io commented Jun 1, 2016 edited

Current coverage is 83.76%

Merging #12399 into master will decrease coverage by 0.47%

@@             master   #12399   diff @@
========================================
  Files           138      135      -3   
  Lines         50721    49640   -1081   
  Methods           0        0           
  Branches          0        0           
========================================
- Hits          42723    41583   -1140   
- Misses         7998     8057     +59   
  Partials          0        0           

Powered by Codecov. Last updated by 2061e9e...5e492cd

Contributor

cswarth commented Jun 1, 2016

As requested, added a check for mutually exclusive arguments in DataFrame.filter and added unit tests for it.

@jorisvandenbossche jorisvandenbossche and 1 other commented on an outdated diff Jun 1, 2016

pandas/core/generic.py
"""
import re
+ args = locals().copy()
+ nkw = sum(map(lambda x: operator.getitem(args,x)!=None, ['items', 'like', 'regex']))
@jorisvandenbossche

jorisvandenbossche Jun 1, 2016

Owner

This seems a bit complicated for what we want to achieve (the locals is certainly not needed I think).
Using a similar construct: sum(map(lambda x: x is not None, [items, like, regex])) (but a list comprehension is maybe easier to read: sum([x is not None for x in [items, like, regex]]))

@cswarth

cswarth Jun 1, 2016

Contributor

thanks, changing to,

nkw = sum([operator.getitem(args,x) is not None for x in ['items', 'like', 'regex']])
if nkw > 1:
    raise TypeError("filter(): keyword arguments are mutually exclusive")
@cswarth

cswarth Jun 1, 2016 edited

Contributor

Oops, I didn't understand... now using,

nkw = sum([x is not None for x in [items, like, regex]])

@jorisvandenbossche jorisvandenbossche and 1 other commented on an outdated diff Jun 1, 2016

pandas/core/generic.py
"""
import re
+ args = locals().copy()
+ nkw = sum(map(lambda x: operator.getitem(args,x)!=None, ['items', 'like', 'regex']))
+ if nkw == 0:
+     raise TypeError("filter(): must specify at least one keyword argument")
@jorisvandenbossche

jorisvandenbossche Jun 1, 2016

Owner

This one is already raised below

@cswarth

cswarth Jun 1, 2016

Contributor

deleted this exception and updated the tests to expect the latter one.

@jorisvandenbossche jorisvandenbossche and 1 other commented on an outdated diff Jun 1, 2016

pandas/tests/test_generic.py
@@ -1497,6 +1497,40 @@ def test_to_xarray(self):
expected,
check_index_type=False)
+ def test_filter(self):
Contributor

cswarth commented Jun 1, 2016

Enforcing mutually exclusive arguments in frame.filter() is an incompatible change that is going to break some code, somewhere. Should this be called out in the release notes?

I would like to make a case that this API should be deprecated at this time. It adds little value over reindex and select at the expense of an API that is different than just about anything else.

@jreback jreback commented on the diff Jun 3, 2016

pandas/core/generic.py
"""
import re
+ nkw = sum([x is not None for x in [items, like, regex]])
+ if nkw > 1:
+     raise TypeError('Keyword arguments `items`, `like`, or `regex` '
+                     'are mutually exclusive')
+
if axis is None:
    axis = self._info_axis_name
axis_name = self._get_axis_name(axis)
@jreback

jreback Jun 3, 2016

Contributor

e.g. checked here
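
The check in this diff can be exercised in isolation. Below is a minimal sketch that copies the counting logic into a free function; `check_exclusive` is a hypothetical name used only for illustration, since the actual check lives inside `filter` itself.

```python
def check_exclusive(items=None, like=None, regex=None):
    """Hypothetical stand-alone copy of the mutual-exclusion check."""
    # Each comparison yields a bool; summing counts the supplied arguments.
    nkw = sum([x is not None for x in [items, like, regex]])
    if nkw > 1:
        raise TypeError('Keyword arguments `items`, `like`, or `regex` '
                        'are mutually exclusive')
    return nkw


print(check_exclusive(like='bbi'))  # 1

try:
    check_exclusive(items=['one'], regex='e$')
except TypeError as exc:
    print(exc)
```

Comparing with `is not None` rather than relying on truthiness means that an empty list passed as `items` still counts as a supplied argument.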

Contributor

jreback commented Jun 3, 2016

@cswarth yes pls add a whatsnew note about the mutually exclusive args.

Contributor

jreback commented Jun 3, 2016

already slated to deprecate in 0.19.0: pydata#12401

@cswarth cswarth DOC: Add example usage to DataFrame.filter
- add sample usage
- enforce mutual exclusion of keyword arguments
- add note to what's new doc calling out API change
f48e9ff

jreback closed this in 103f7d3 Jun 3, 2016

Contributor

jreback commented Jun 3, 2016

thanks @cswarth
