ENH: Allow where/mask/Indexers to accept callable #12539

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
7 participants
@sinhrks
Member

sinhrks commented Mar 6, 2016

  • closes #12533
  • closes #11485
  • tests added / passed
  • passes git diff upstream/master | flake8 --diff
  • whatsnew entry

Current impl is very simple like assign. Another idea is applying some restriction to return value from callable (for example, only allow aligned DataFrame or scalar).

@sinhrks sinhrks added this to the 0.18.1 milestone Mar 6, 2016

@kawochen

This comment has been minimized.

Show comment
Hide comment
@kawochen

kawochen Mar 6, 2016

Contributor

One potential problem is when you subclass DataFrame and make it callable.

Contributor

kawochen commented Mar 6, 2016

One potential problem is when you subclass DataFrame and make it callable.

@sinhrks

This comment has been minimized.

Show comment
Hide comment
@sinhrks

sinhrks Mar 6, 2016

Member

Do you know such a package? The logic is not specific here and I don't think we have to care ATM.

Member

sinhrks commented Mar 6, 2016

Do you know such a package? The logic is not specific here and I don't think we have to care ATM.

@kawochen

This comment has been minimized.

Show comment
Hide comment
@kawochen

kawochen Mar 6, 2016

Contributor

I don't know anyone that uses it that way. But since the order of consideration is relevant, I think it should be documented that the args are first tested for callability.

Contributor

kawochen commented Mar 6, 2016

I don't know anyone that uses it that way. But since the order of consideration is relevant, I think it should be documented that the args are first tested for callability.

@jreback jreback referenced this pull request Mar 6, 2016

Closed

API: chained reshaping ops #11485

2 of 2 tasks complete
@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Mar 6, 2016

Contributor

let's simultaneously do all of the chained reshaping ops #11485 in 0.18.1, making them consistent.

Contributor

jreback commented Mar 6, 2016

let's simultaneously do all of the chained reshaping ops #11485 in 0.18.1, making them consistent.

@sinhrks sinhrks changed the title from ENH: Allow .where to accept callable as condition to ENH: Allow where/mask/query/Indexers to accept callable Mar 9, 2016

@sinhrks

This comment has been minimized.

Show comment
Hide comment
@sinhrks

sinhrks Mar 9, 2016

Member

Sure, once done and changed the title.

Member

sinhrks commented Mar 9, 2016

Sure, once done and changed the title.

@jreback

View changes

doc/source/whatsnew/v0.18.1.txt
+
+.. ipython:: python
+
+ # callable returns bool indexer

This comment has been minimized.

@jreback

jreback Mar 9, 2016

Contributor

need to have some tests if people try to pass something like:

df.loc[lambda x: ...., list_of_columns] which is a bit odd, but I suppose ok, otherwise we need to interpret this

@jreback

jreback Mar 9, 2016

Contributor

need to have some tests if people try to pass something like:

df.loc[lambda x: ...., list_of_columns] which is a bit odd, but I suppose ok, otherwise we need to interpret this

This comment has been minimized.

@sinhrks

sinhrks Mar 15, 2016

Member

OK, added some tests for mixture of lamda and scalar/list.

@sinhrks

sinhrks Mar 15, 2016

Member

OK, added some tests for mixture of lamda and scalar/list.

@jreback

View changes

doc/source/whatsnew/v0.18.1.txt
+Method chaininng improvements
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Following methods / ``Indexer`` now accept ``callable`` as input. It is

This comment has been minimized.

@jreback

jreback Mar 15, 2016

Contributor

The following methods / indexers now acccept a callable

@jreback

jreback Mar 15, 2016

Contributor

The following methods / indexers now acccept a callable

@jreback

View changes

doc/source/whatsnew/v0.18.1.txt
+
+Following methods / ``Indexer`` now accept ``callable`` as input. It is
+intended to make these more useful in the method chain.
+(:issue:`11485`, :issue:`12533`)

This comment has been minimized.

@jreback

jreback Mar 15, 2016

Contributor

in method chains.

@jreback

jreback Mar 15, 2016

Contributor

in method chains.

@jreback

View changes

doc/source/whatsnew/v0.18.1.txt
+
+- ``.where`` and ``.mask``
+- ``.query``
+- ``.loc``, ``iloc`` and ``.ix``

This comment has been minimized.

@jreback

jreback Mar 15, 2016

Contributor

do:

.where() and .mask()
.query()
.loc[], .iloc[], .ix[]

@jreback

jreback Mar 15, 2016

Contributor

do:

.where() and .mask()
.query()
.loc[], .iloc[], .ix[]

@jreback

View changes

doc/source/whatsnew/v0.18.1.txt
+- ``.query``
+- ``.loc``, ``iloc`` and ``.ix``
+
+When ``callable`` is passed, it is evaluated on the caller

This comment has been minimized.

@jreback

jreback Mar 15, 2016

Contributor

a callable

@jreback

jreback Mar 15, 2016

Contributor

a callable

@jreback

View changes

doc/source/whatsnew/v0.18.1.txt
+
+.. note::
+
+ Thus, if input is callable ``np.ndarray`` or ``Series/DataFrame`` class

This comment has been minimized.

@jreback

jreback Mar 15, 2016

Contributor

what does this mean?

@jreback

jreback Mar 15, 2016

Contributor

what does this mean?

This comment has been minimized.

@sinhrks

sinhrks Mar 20, 2016

Member

Removed, as it likely to be a noise. Tried to explain what will happen if input is callable Series (unlikely).

@sinhrks

sinhrks Mar 20, 2016

Member

Removed, as it likely to be a noise. Tried to explain what will happen if input is callable Series (unlikely).

@jreback

View changes

doc/source/whatsnew/v0.18.1.txt
+ Thus, if input is callable ``np.ndarray`` or ``Series/DataFrame`` class
+ which defined on your own, it is evaluated on the caller
+ ``Series/DataFrame``, not regarded as values.
+

This comment has been minimized.

@jreback

jreback Mar 15, 2016

Contributor

make this a Furthermore and put .where/.mask after .query and indexers (make it a sub-section)

@jreback

jreback Mar 15, 2016

Contributor

make this a Furthermore and put .where/.mask after .query and indexers (make it a sub-section)

@jreback

View changes

doc/source/whatsnew/v0.18.1.txt
+ 'C': [7, 8, 9]})
+ df.where(lambda x: x > 4, lambda x: x + 10)
+
+``.query`` can now accept callable as expr argument.

This comment has been minimized.

@jreback

jreback Mar 15, 2016

Contributor

a callable

@jreback

jreback Mar 15, 2016

Contributor

a callable

@jreback

View changes

doc/source/whatsnew/v0.18.1.txt
+
+ df.query(lambda x: x.A >= 2)
+
+``.loc``, ``.iloc`` and ``.ix`` can now accept callable, and tuple of callable

This comment has been minimized.

@jreback

jreback Mar 15, 2016

Contributor

a callable

@jreback

jreback Mar 15, 2016

Contributor

a callable

@jreback

View changes

doc/source/whatsnew/v0.18.1.txt
+ df.loc[lambda x: x.A >= 2, lambda x: x.sum() > 10]
+
+ # callable returns list of labels
+ df.loc[lambda x: [1, 3], lambda x: ['A', 'B']]

This comment has been minimized.

@jreback

jreback Mar 15, 2016

Contributor

will need to add a similar section to indexing.rst (and a ref link from here)

@jreback

jreback Mar 15, 2016

Contributor

will need to add a similar section to indexing.rst (and a ref link from here)

@@ -2135,9 +2137,17 @@ def query(self, expr, inplace=False, **kwargs):
>>> df.query('a > b')
>>> df[df.a > df.b] # same result as the previous expression

This comment has been minimized.

@jreback

jreback Mar 15, 2016

Contributor

can you add an Example as well

@jreback

jreback Mar 15, 2016

Contributor

can you add an Example as well

@jreback

View changes

doc/source/whatsnew/v0.18.1.txt
+ df.loc[lambda x: x.A >= 2, lambda x: x.sum() > 10]
+
+ # callable returns list of labels
+ df.loc[lambda x: [1, 3], lambda x: ['A', 'B']]

This comment has been minimized.

@jreback

jreback Mar 15, 2016

Contributor

the other thing is at the top of the whatsnew we need a motivating example. I would take the example from here. The stats like example. and change .query('h > 0') to .query(lambda x: x.h>0). (and also change the docs in basics.rst)

@jreback

jreback Mar 15, 2016

Contributor

the other thing is at the top of the whatsnew we need a motivating example. I would take the example from here. The stats like example. and change .query('h > 0') to .query(lambda x: x.h>0). (and also change the docs in basics.rst)

@jreback

View changes

doc/source/whatsnew/v0.18.1.txt
@@ -29,6 +29,56 @@ New features
Enhancements
~~~~~~~~~~~~

This comment has been minimized.

@jreback

jreback Mar 15, 2016

Contributor

add a ref tag here (and put this in highlites up top with a ref to here)

@jreback

jreback Mar 15, 2016

Contributor

add a ref tag here (and put this in highlites up top with a ref to here)

@codecov-io

This comment has been minimized.

Show comment
Hide comment
@codecov-io

codecov-io Mar 16, 2016

Current coverage is 83.91%

Merging #12539 into master will increase coverage by +<.01%

@@             master     #12539   diff @@
==========================================
  Files           136        136          
  Lines         49908      49933    +25   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          41876      41901    +25   
  Misses         8032       8032          
  Partials          0          0          

Powered by Codecov. Last updated by 3921b32

codecov-io commented Mar 16, 2016

Current coverage is 83.91%

Merging #12539 into master will increase coverage by +<.01%

@@             master     #12539   diff @@
==========================================
  Files           136        136          
  Lines         49908      49933    +25   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          41876      41901    +25   
  Misses         8032       8032          
  Partials          0          0          

Powered by Codecov. Last updated by 3921b32

@sinhrks

This comment has been minimized.

Show comment
Hide comment
@sinhrks

sinhrks Mar 20, 2016

Member

@jreback Thanks, fixed the points.

Member

sinhrks commented Mar 20, 2016

@jreback Thanks, fixed the points.

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Mar 20, 2016

Contributor

@shoyer @jorisvandenbossche @TomAugspurger

any comments on the idea/impl?

Contributor

jreback commented Mar 20, 2016

@shoyer @jorisvandenbossche @TomAugspurger

any comments on the idea/impl?

@shoyer

This comment has been minimized.

Show comment
Hide comment
@shoyer

shoyer Mar 22, 2016

Member

If we're adding this to indexing methods:

  1. Why not add this to [] too, for consistency? df[lambda x: x.A == 'B'] beats df.loc[lambda x: x.A == 'B']
  2. What's the point of adding to the .query when we can get the exactly same functionality with indexing?
Member

shoyer commented Mar 22, 2016

If we're adding this to indexing methods:

  1. Why not add this to [] too, for consistency? df[lambda x: x.A == 'B'] beats df.loc[lambda x: x.A == 'B']
  2. What's the point of adding to the .query when we can get the exactly same functionality with indexing?
@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Mar 22, 2016

Contributor

I agree with 1)

  1. for consistency.
Contributor

jreback commented Mar 22, 2016

I agree with 1)

  1. for consistency.
@shoyer

This comment has been minimized.

Show comment
Hide comment
@shoyer

shoyer Mar 22, 2016

Member

@jreback this PR overloads indexing with yet another method. That's kind of OK because indexing already does a lot, and this will be quite useful. On the other hand, if df[lambda x: ...] works exactly the same as df.query(lambda x: ...), I don't see the point in overloading query, too, unless we are going to systematically add this to almost every possible dataframe method.

.groupby is another natural method to which to add this support.

Finally, if anyone is wondering whether this is possible without verbosely adding the df argument... why yes it is. If you use this hack that I wrote last time this came up, you can write df[lambda: A == 'B'] instead of df[lambda x: x.A == 'B']. The code that does this is actually remarkably hygienic. We could even restrict it to only working on lambdas by verifying that f.func_name == '<lambda>'.

There are probably good reasons why Python discourages this much magic, but it actually seems pretty safe to me, aside from breaking auto-complete/lint. Maybe we could make it optional. On the other hand, if you prefer the magical X notation of pandas-ply (even by a third party), that could be straightforwardly hooked into either system once you accept callables.

Member

shoyer commented Mar 22, 2016

@jreback this PR overloads indexing with yet another method. That's kind of OK because indexing already does a lot, and this will be quite useful. On the other hand, if df[lambda x: ...] works exactly the same as df.query(lambda x: ...), I don't see the point in overloading query, too, unless we are going to systematically add this to almost every possible dataframe method.

.groupby is another natural method to which to add this support.

Finally, if anyone is wondering whether this is possible without verbosely adding the df argument... why yes it is. If you use this hack that I wrote last time this came up, you can write df[lambda: A == 'B'] instead of df[lambda x: x.A == 'B']. The code that does this is actually remarkably hygienic. We could even restrict it to only working on lambdas by verifying that f.func_name == '<lambda>'.

There are probably good reasons why Python discourages this much magic, but it actually seems pretty safe to me, aside from breaking auto-complete/lint. Maybe we could make it optional. On the other hand, if you prefer the magical X notation of pandas-ply (even by a third party), that could be straightforwardly hooked into either system once you accept callables.

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Mar 22, 2016

Contributor

@shoyer

.groupby already has all of this

df.groupby(lambda x: .....)

I don't see why as a user you have to know that a particular indexing method accepts a callable or not. That seems very weird. yes, .query is just another method of indexing, BUT, it a) cannot be assigned to (like .sel in xarray). and it dispatches to things like numexpr.

one could argue that .loc should accept a string like .query actually (but that introduces some ambiguity on what the user is doing).

Contributor

jreback commented Mar 22, 2016

@shoyer

.groupby already has all of this

df.groupby(lambda x: .....)

I don't see why as a user you have to know that a particular indexing method accepts a callable or not. That seems very weird. yes, .query is just another method of indexing, BUT, it a) cannot be assigned to (like .sel in xarray). and it dispatches to things like numexpr.

one could argue that .loc should accept a string like .query actually (but that introduces some ambiguity on what the user is doing).

@sinhrks

This comment has been minimized.

Show comment
Hide comment
@sinhrks

sinhrks Mar 31, 2016

Member

Added __getitem__ impl.

Member

sinhrks commented Mar 31, 2016

Added __getitem__ impl.

@jreback

View changes

doc/source/indexing.rst
- df1.loc['a','A']
+ df1.loc['a', 'A']
+
+For getting values with ``callable``

This comment has been minimized.

@jreback

jreback Apr 3, 2016

Contributor

need versionadded tag

@jreback

jreback Apr 3, 2016

Contributor

need versionadded tag

@jreback

View changes

pandas/tests/indexing/test_indexing.py
@@ -844,6 +844,143 @@ def compare(result, expected):
result2 = s.loc[0:3]
assert_series_equal(result1, result2)

This comment has been minimized.

@jreback

jreback Apr 3, 2016

Contributor

maybe put in a new class TestCallable?

@jreback

jreback Apr 3, 2016

Contributor

maybe put in a new class TestCallable?

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Apr 3, 2016

Contributor

@shoyer I kind of like your 'hack' (note that 'hack' -> elegant if its cool enough :)

want to do a PR to add (after this one)?

Contributor

jreback commented Apr 3, 2016

@shoyer I kind of like your 'hack' (note that 'hack' -> elegant if its cool enough :)

want to do a PR to add (after this one)?

@shoyer

This comment has been minimized.

Show comment
Hide comment
@shoyer

shoyer Apr 3, 2016

Member

I still feel that adding this to query breaks the rule for how these callables work. Generally, they work by remapping first_arg -> first_arg(self). But the way this has been added to query breaks the rule: df.query(lambda x: x > 5) is not mapped to df.query(df.x > 5) (which would error) but rather df.loc[df.x > 5].

@jreback Yeah, I can put together a PR for adding argument-less lambdas together after we add this. Restricting it to lambdas (at least func.name == '<lambda>') actually makes it pretty clean, because you are prohibited from writing larger functions with no-arguments that could be defined anywhere.

Member

shoyer commented Apr 3, 2016

I still feel that adding this to query breaks the rule for how these callables work. Generally, they work by remapping first_arg -> first_arg(self). But the way this has been added to query breaks the rule: df.query(lambda x: x > 5) is not mapped to df.query(df.x > 5) (which would error) but rather df.loc[df.x > 5].

@jreback Yeah, I can put together a PR for adding argument-less lambdas together after we add this. Restricting it to lambdas (at least func.name == '<lambda>') actually makes it pretty clean, because you are prohibited from writing larger functions with no-arguments that could be defined anywhere.

@TomAugspurger

This comment has been minimized.

Show comment
Hide comment
@TomAugspurger

TomAugspurger Apr 4, 2016

Contributor

+1 on idea and the implementation looks good.

I'm with @shoyer that extending .query is a bit strange. It already seems different enough from the other indexing methods, that I'm not sure it would be strangely inconsistent if it couldn't take a callable. I don't feel strongly about this though.

What are the thoughts on setting with this syntax? Right now

In [13]: df.loc[lambda x: x.b > 0] = 10

In [14]: df
Out[14]:
                                            a          b
0                                    2.252602   0.545462
1                                    1.984207  -2.059620
2                                   -0.392923   1.810029
3                                    0.038127  -2.738341
4                                   -0.299452   1.730878
...                                       ...        ...
6                                    0.164651   1.242235
7                                    0.801972  -0.196810
8                                    0.367329  -0.486618
9                                   -0.928201  -0.120228
<function <lambda> at 0x115266d08>  10.000000  10.000000

Should this raise? At the very least we should probably document that users probably don't want to do that.

Contributor

TomAugspurger commented Apr 4, 2016

+1 on idea and the implementation looks good.

I'm with @shoyer that extending .query is a bit strange. It already seems different enough from the other indexing methods, that I'm not sure it would be strangely inconsistent if it couldn't take a callable. I don't feel strongly about this though.

What are the thoughts on setting with this syntax? Right now

In [13]: df.loc[lambda x: x.b > 0] = 10

In [14]: df
Out[14]:
                                            a          b
0                                    2.252602   0.545462
1                                    1.984207  -2.059620
2                                   -0.392923   1.810029
3                                    0.038127  -2.738341
4                                   -0.299452   1.730878
...                                       ...        ...
6                                    0.164651   1.242235
7                                    0.801972  -0.196810
8                                    0.367329  -0.486618
9                                   -0.928201  -0.120228
<function <lambda> at 0x115266d08>  10.000000  10.000000

Should this raise? At the very least we should probably document that users probably don't want to do that.

@jreback

View changes

doc/source/whatsnew/v0.18.1.txt
+ bb = pd.read_csv('data/baseball.csv', index_col='id')
+ (bb.groupby(['year', 'team']).sum()
+ .loc[lambda df: df.r > 100]
+ .query(lambda df: df.sf > df.sh * 2))

This comment has been minimized.

@jreback

jreback Apr 4, 2016

Contributor

i wouldn't have both .loc and .query here

@jreback

jreback Apr 4, 2016

Contributor

i wouldn't have both .loc and .query here

@sinhrks

This comment has been minimized.

Show comment
Hide comment
@sinhrks

sinhrks Apr 5, 2016

Member

@TomAugspurger thx, good catch. Fixed and added some tests. Because it can't be used in chainnings, raising an error is alternative idea.Thoughts?

Member

sinhrks commented Apr 5, 2016

@TomAugspurger thx, good catch. Fixed and added some tests. Because it can't be used in chainnings, raising an error is alternative idea.Thoughts?

@jreback

View changes

doc/source/whatsnew/v0.18.1.txt
@@ -9,6 +9,8 @@ We recommend that all users upgrade to this version.
Highlights include:
+- Method chaining improvements, see :ref:`here <whatsnew_0181.enhancements.method_chain>`.

This comment has been minimized.

@jreback

jreback Apr 7, 2016

Contributor

can you add the examples you have above in the appropriate sections in indexing.rst?

@jreback

jreback Apr 7, 2016

Contributor

can you add the examples you have above in the appropriate sections in indexing.rst?

This comment has been minimized.

@sinhrks

sinhrks Apr 10, 2016

Member

@jreback OK, reorganized as Selection By Callable section.

@sinhrks

sinhrks Apr 10, 2016

Member

@jreback OK, reorganized as Selection By Callable section.

@jreback

View changes

doc/source/whatsnew/v0.18.1.txt
+ # callable returns list of labels
+ df.loc[lambda x: [1, 2], lambda x: ['A', 'B']]
+
+``__getitem__``

This comment has been minimized.

@jreback

jreback Apr 14, 2016

Contributor

make change __getitem__ to [] indexing

@jreback

jreback Apr 14, 2016

Contributor

make change __getitem__ to [] indexing

This comment has been minimized.

@sinhrks

sinhrks Apr 15, 2016

Member

Yes, fixed the doc.

@sinhrks

sinhrks Apr 15, 2016

Member

Yes, fixed the doc.

@jreback

This comment has been minimized.

Show comment
Hide comment
Contributor

jreback commented Apr 14, 2016

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Apr 18, 2016

Contributor

any comments?

Contributor

jreback commented Apr 18, 2016

any comments?

@jorisvandenbossche

This comment has been minimized.

Show comment
Hide comment
@jorisvandenbossche

jorisvandenbossche Apr 18, 2016

Member

I will take a look in 2-3 hours

Member

jorisvandenbossche commented Apr 18, 2016

I will take a look in 2-3 hours

@jorisvandenbossche

View changes

doc/source/whatsnew/v0.18.1.txt
+These can accept a callable, and tuple of callable as a slicer. The callable
+can return valid ``bool`` indexer or anything which is valid for these indexer's input.
+
+In most use case are covered by callable returns ``bool``

This comment has been minimized.

@jorisvandenbossche

jorisvandenbossche Apr 18, 2016

Member

This is not a correct sentence.

@jorisvandenbossche

jorisvandenbossche Apr 18, 2016

Member

This is not a correct sentence.

@jorisvandenbossche

View changes

doc/source/indexing.rst
@@ -79,6 +79,7 @@ of multi-axis indexing.
- A slice object with labels ``'a':'f'``, (note that contrary to usual python
slices, **both** the start and the stop are included!)
- A boolean array
+ - A ``callable``

This comment has been minimized.

@jorisvandenbossche

jorisvandenbossche Apr 18, 2016

Member

Can you add here a bit more specifics? eg "function with one argument (the calling dataframe/series) and that returns valid output for indexing (one of the above)"

@jorisvandenbossche

jorisvandenbossche Apr 18, 2016

Member

Can you add here a bit more specifics? eg "function with one argument (the calling dataframe/series) and that returns valid output for indexing (one of the above)"

@jorisvandenbossche

View changes

doc/source/indexing.rst
+Selection By Callable
+---------------------
+
+.. versionadded::0.18.1

This comment has been minimized.

@jorisvandenbossche

jorisvandenbossche Apr 18, 2016

Member

missing space before 0.18.1

@jorisvandenbossche

jorisvandenbossche Apr 18, 2016

Member

missing space before 0.18.1

@jorisvandenbossche

View changes

pandas/core/frame.py
+ you can refer to variables in the environment by prefixing them
+ with an '@' character like ``@a + b``.
+ If expr is callable, it is computed on the DataFrame.
+ The callable must not change input DataFrame.

This comment has been minimized.

@jorisvandenbossche

jorisvandenbossche Apr 18, 2016

Member

It is not really clear what the callable should return

@jorisvandenbossche

jorisvandenbossche Apr 18, 2016

Member

It is not really clear what the callable should return

@jorisvandenbossche

This comment has been minimized.

Show comment
Hide comment
@jorisvandenbossche

jorisvandenbossche Apr 18, 2016

Member

@sinhrks Some more doc comments:

  • Can you also update the docstrings of ix/loc/iloc
  • Is there documentation on where/query that needs to be updated (apart from the docstrings, which you did update)?

Further, I have to say I share the reservations about adding this to query. Currently, that is a specific function that has a very different way of usage compared to the indexing methods (i.e. a string that is evaluated). I don't really see the advantage of being able to use query with a callable over just using one of the indexer methods with a callable

For the rest, very nice addition!

Member

jorisvandenbossche commented Apr 18, 2016

@sinhrks Some more doc comments:

  • Can you also update the docstrings of ix/loc/iloc
  • Is there documentation on where/query that needs to be updated (apart from the docstrings, which you did update)?

Further, I have to say I share the reservations about adding this to query. Currently, that is a specific function that has a very different way of usage compared to the indexing methods (i.e. a string that is evaluated). I don't really see the advantage of being able to use query with a callable over just using one of the indexer methods with a callable

For the rest, very nice addition!

@jorisvandenbossche

View changes

pandas/tests/indexing/test_indexing.py
+ # GH 11485
+ df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': list('aabb'),
+ 'C': [1, 2, 3, 4]})
+ # iloc cannot use bool mask

This comment has been minimized.

@jorisvandenbossche

jorisvandenbossche Apr 18, 2016

Member

Why is this? Boolean indexing works in iloc?

@jorisvandenbossche

jorisvandenbossche Apr 18, 2016

Member

Why is this? Boolean indexing works in iloc?

This comment has been minimized.

@sinhrks

sinhrks Apr 27, 2016

Member

We can't use boolean Series for .iloc.

df = pd.DataFrame(np.random.randn(5, 5))
df.iloc[df[0] == 0]
# NotImplementedError: iLocation based boolean indexing on an integer type is not available

# using list / array is OK
df.iloc[[True, False, True, False, True]]
          0         1         2         3         4
0  1.762968 -1.624038 -0.328636 -1.383105  0.515155
2 -1.789459  1.326702 -2.571745  0.127205  0.797887
4 -0.377984  1.015565  0.235313  0.522190 -1.179584
@sinhrks

sinhrks Apr 27, 2016

Member

We can't use boolean Series for .iloc.

df = pd.DataFrame(np.random.randn(5, 5))
df.iloc[df[0] == 0]
# NotImplementedError: iLocation based boolean indexing on an integer type is not available

# using list / array is OK
df.iloc[[True, False, True, False, True]]
          0         1         2         3         4
0  1.762968 -1.624038 -0.328636 -1.383105  0.515155
2 -1.789459  1.326702 -2.571745  0.127205  0.797887
4 -0.377984  1.015565  0.235313  0.522190 -1.179584

This comment has been minimized.

@jreback

jreback Apr 27, 2016

Contributor

hmm, IIRC there is an open issue for this. We should allow this (as we do for DataFrames). If not, can you create an issue? thxs

@jreback

jreback Apr 27, 2016

Contributor

hmm, IIRC there is an open issue for this. We should allow this (as we do for DataFrames). If not, can you create an issue? thxs

This comment has been minimized.

@jorisvandenbossche

jorisvandenbossche Apr 27, 2016

Member

Just looked it up, and there is an extensive discussion of it in this issue: #3631
and then a PR from @jreback (but a long time ago) that implemented the NotImplementedError as shown above: #3635

Summary is: it is not clear whether the boolean series should be realigned (since this is the rule when passing labeled indexers, but on the other hand iloc is position-based and not label-based), therefore it was chosen to raise an error in this case.

@jorisvandenbossche

jorisvandenbossche Apr 27, 2016

Member

Just looked it up, and there is an extensive discussion of it in this issue: #3631
and then a PR from @jreback (but a long time ago) that implemented the NotImplementedError as shown above: #3635

Summary is: it is not clear whether the boolean series should be realigned (since this is the rule when passing labeled indexers, but on the other hand iloc is position-based and not label-based), therefore it was chosen to raise an error in this case.

This comment has been minimized.

@jreback

jreback Apr 27, 2016

Contributor

@sinhrks so maybe just put a link to that issue in the test. I do recall this a bit now.

@jreback

jreback Apr 27, 2016

Contributor

@sinhrks so maybe just put a link to that issue in the test. I do recall this a bit now.

@TomAugspurger

View changes

doc/source/indexing.rst
+.. versionadded::0.18.1
+
+``.loc``, ``.iloc``, ``.ix`` and also ``[]`` indexing can accept a ``callable`` as indexer. The ``callable`` must
+be a function with one argument which accepts caller and returns valid output for indexing.

This comment has been minimized.

@TomAugspurger

TomAugspurger Apr 19, 2016

Contributor

Could you change "one argument which accepts caller" to "one argument, the calling DataFrame or Series"?

@TomAugspurger

TomAugspurger Apr 19, 2016

Contributor

Could you change "one argument which accepts caller" to "one argument, the calling DataFrame or Series"?

This comment has been minimized.

@sinhrks

sinhrks Apr 27, 2016

Member

Fixed as the same as @jorisvandenbossche suggestion.

The callable must be a function with one argument (the calling Series, DataFrame or Panel) and that returns valid output for indexing.

@sinhrks

sinhrks Apr 27, 2016

Member

Fixed as the same as @jorisvandenbossche suggestion.

The callable must be a function with one argument (the calling Series, DataFrame or Panel) and that returns valid output for indexing.

@TomAugspurger

View changes

doc/source/whatsnew/v0.18.1.txt
+- ``[]`` indexing
+
+When a ``callable`` is passed, it is evaluated on the caller
+``Series/DataFrame`` prior to any other logics.

This comment has been minimized.

@TomAugspurger

TomAugspurger Apr 19, 2016

Contributor

What do you mean by this?

@TomAugspurger

TomAugspurger Apr 19, 2016

Contributor

What do you mean by this?

@TomAugspurger

View changes

pandas/core/generic.py
+ cond : boolean %(klass)s, array or callable
+ If cond is callable, it is computed on the %(klass)s and
+ should return boolean %(klass)s or array.
+ The callable must not change input %(klass)s.

This comment has been minimized.

@TomAugspurger

TomAugspurger Apr 19, 2016

Contributor

Perhaps note whether or not pandas checks if the type has changed (I haven't gotten there yet to see).

@TomAugspurger

TomAugspurger Apr 19, 2016

Contributor

Perhaps note whether or not pandas checks if the type has changed (I haven't gotten there yet to see).

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Apr 25, 2016

Contributor

ok, how about for .query() we than raise if the input is NOT a string with a nice mesage (to use .loc)? or could show a warning and just do it anyhow.

Contributor

jreback commented Apr 25, 2016

ok, how about for .query() we than raise if the input is NOT a string with a nice mesage (to use .loc)? or could show a warning and just do it anyhow.

@jorisvandenbossche

This comment has been minimized.

Show comment
Hide comment
@jorisvandenbossche

jorisvandenbossche Apr 26, 2016

Member

You already get an error when passing a callable to query, but I agree a nicer message can be raised.

Member

jorisvandenbossche commented Apr 26, 2016

You already get an error when passing a callable to query, but I agree a nicer message can be raised.

@shoyer

This comment has been minimized.

Show comment
Hide comment
@shoyer

shoyer Apr 26, 2016

Member

I would definitely raise if the input to .query() is not a string. Adding warnings and then doing something anyways is a bad idea.

Member

shoyer commented Apr 26, 2016

I would definitely raise if the input to .query() is not a string. Adding warnings and then doing something anyways is a bad idea.

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Apr 26, 2016

Contributor

ok @sinhrks if you can make those mods would be great. (you might need to adjust docs/example as I think you use .query, but can just change to .loc)

Contributor

jreback commented Apr 26, 2016

ok @sinhrks if you can make those mods would be great. (you might need to adjust docs/example as I think you use .query, but can just change to .loc)

@sinhrks sinhrks changed the title from ENH: Allow where/mask/query/Indexers to accept callable to ENH: Allow where/mask/Indexers to accept callable Apr 27, 2016

@sinhrks

This comment has been minimized.

Show comment
Hide comment
@sinhrks

sinhrks Apr 28, 2016

Member

Thanks for all the feedbacks. Updated. Lmk if I missed something.

Member

sinhrks commented Apr 28, 2016

Thanks for all the feedbacks. Updated. Lmk if I missed something.

@jreback

View changes

pandas/tests/indexing/test_indexing.py
@@ -5489,6 +5489,269 @@ def test_none_coercion_mixed_dtypes(self):
strict_nan=True)
+class TestIndexingCallable(tm.TestCase):

This comment has been minimized.

@jreback

jreback Apr 28, 2016

Contributor

I would make this a separate file: indexing/test_callables.py

@jreback

jreback Apr 28, 2016

Contributor

I would make this a separate file: indexing/test_callables.py

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Apr 28, 2016

Contributor

just minor change otherwise lgtm.

@jorisvandenbossche @shoyer

Contributor

jreback commented Apr 28, 2016

just minor change otherwise lgtm.

@jorisvandenbossche @shoyer

+ df = pd.DataFrame({'A': [1, 2, 3],
+ 'B': [4, 5, 6],
+ 'C': [7, 8, 9]})
+ df.where(lambda x: x > 4, lambda x: x + 10)

This comment has been minimized.

@jorisvandenbossche

This comment has been minimized.

Show comment
Hide comment
@jorisvandenbossche

jorisvandenbossche Apr 28, 2016

Member

@sinhrks Can you also update the docstrings for loc/iloc/ix (eg for loc https://github.com/pydata/pandas/blob/65ed3afd092908130905e7d86ff1b6b9c2002d00/pandas/core/indexing.py#L1314), these are used in the API docs.

For the rest, +1!

Member

jorisvandenbossche commented Apr 28, 2016

@sinhrks Can you also update the docstrings for loc/iloc/ix (eg for loc https://github.com/pydata/pandas/blob/65ed3afd092908130905e7d86ff1b6b9c2002d00/pandas/core/indexing.py#L1314), these are used in the API docs.

For the rest, +1!

@@ -2454,8 +2459,9 @@ def assign(self, **kwargs):
kwargs : keyword, value pairs
keywords are the column names. If the values are
callable, they are computed on the DataFrame and
- assigned to the new columns. If the values are
- not callable, (e.g. a Series, scalar, or array),
+ assigned to the new columns. The callable must not

This comment has been minimized.

@jreback

jreback Apr 29, 2016

Contributor

add a versionadded tag here (and other doc-strings)

@jreback

jreback Apr 29, 2016

Contributor

add a versionadded tag here (and other doc-strings)

This comment has been minimized.

@sinhrks

sinhrks Apr 29, 2016

Member

assign impl is unchanged.

@sinhrks

sinhrks Apr 29, 2016

Member

assign impl is unchanged.

+ cond : boolean %(klass)s, array or callable
+ If cond is callable, it is computed on the %(klass)s and
+ should return boolean %(klass)s or array.
+ The callable must not change input %(klass)s

This comment has been minimized.

@jreback

jreback Apr 29, 2016

Contributor

same

@jreback

jreback Apr 29, 2016

Contributor

same

This comment has been minimized.

@sinhrks

sinhrks Apr 29, 2016

Member

Do you know a format to use versionadded as a part of an argument description? Adding it under cond looks cond argument itself is added in the specified version.

@sinhrks

sinhrks Apr 29, 2016

Member

Do you know a format to use versionadded as a part of an argument description? Adding it under cond looks cond argument itself is added in the specified version.

This comment has been minimized.

@jreback

jreback Apr 29, 2016

Contributor

you can do

<blank line>
<indent here>versionadded: 0.18.1
<blank line>
<indent here>comment here

I think

@jorisvandenbossche

@jreback

jreback Apr 29, 2016

Contributor

you can do

<blank line>
<indent here>versionadded: 0.18.1
<blank line>
<indent here>comment here

I think

@jorisvandenbossche

This comment has been minimized.

@sinhrks

sinhrks Apr 29, 2016

Member

Added tags, and currently rendered like below.

2016-04-30 1 55 14

@sinhrks

sinhrks Apr 29, 2016

Member

Added tags, and currently rendered like below.

2016-04-30 1 55 14

This comment has been minimized.

@jreback

jreback Apr 29, 2016

Contributor

ok looks fine. thxs. going to merge.

@jreback

jreback Apr 29, 2016

Contributor

ok looks fine. thxs. going to merge.

@@ -1326,6 +1338,8 @@ class _LocIndexer(_LocationIndexer):
- A slice object with labels, e.g. ``'a':'f'`` (note that contrary
to usual python slices, **both** the start and the stop are included!).
- A boolean array.
+ - A ``callable`` function with one argument (the calling Series, DataFrame

This comment has been minimized.

@jreback

jreback Apr 29, 2016

Contributor

versionadded tag

@jreback

jreback Apr 29, 2016

Contributor

versionadded tag

+# pylint: disable-msg=W0612,E1101
+import nose
+
+import numpy as np

This comment has been minimized.

@jreback

jreback Apr 29, 2016

Contributor

👍

@jreback

jreback Apr 29, 2016

Contributor

👍

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Apr 29, 2016

Contributor

@sinhrks some very minor doc-changes. ping when pushed.

Contributor

jreback commented Apr 29, 2016

@sinhrks some very minor doc-changes. ping when pushed.

@jreback jreback closed this in 7bbd031 Apr 29, 2016

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Apr 29, 2016

Contributor

thanks @sinhrks !

awesome enhancement!

Contributor

jreback commented Apr 29, 2016

thanks @sinhrks !

awesome enhancement!

@sinhrks sinhrks deleted the sinhrks:where branch Apr 29, 2016

nps added a commit to nps/pandas that referenced this pull request May 17, 2016

ENH: Allow where/mask/Indexers to accept callable
closes #12533
closes #11485

Author: sinhrks <sinhrks@gmail.com>

Closes #12539 from sinhrks/where and squashes the following commits:

6b5d618 [sinhrks] ENH: Allow .where to accept callable as condition
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment