ENH: Allow where/mask/Indexers to accept callable
closes #12533
closes #11485

Author: sinhrks <sinhrks@gmail.com>

Closes #12539 from sinhrks/where and squashes the following commits:

6b5d618 [sinhrks] ENH: Allow .where to accept callable as condition
sinhrks authored and jreback committed Apr 29, 2016
1 parent a615dbe commit 7bbd031
Showing 13 changed files with 588 additions and 23 deletions.
93 changes: 79 additions & 14 deletions doc/source/indexing.rst
@@ -79,6 +79,10 @@ of multi-axis indexing.
- A slice object with labels ``'a':'f'``, (note that contrary to usual python
slices, **both** the start and the stop are included!)
- A boolean array
- A ``callable`` that takes one argument (the calling Series, DataFrame or Panel) and
returns valid output for indexing (one of the above)

.. versionadded:: 0.18.1

See more at :ref:`Selection by Label <indexing.label>`

@@ -93,6 +97,10 @@ of multi-axis indexing.
- A list or array of integers ``[4, 3, 0]``
- A slice object with ints ``1:7``
- A boolean array
- A ``callable`` that takes one argument (the calling Series, DataFrame or Panel) and
returns valid output for indexing (one of the above)

.. versionadded:: 0.18.1

See more at :ref:`Selection by Position <indexing.integer>`

@@ -110,6 +118,8 @@ of multi-axis indexing.
See more at :ref:`Advanced Indexing <advanced>` and :ref:`Advanced
Hierarchical <advanced.advanced_hierarchical>`.

- ``.loc``, ``.iloc``, ``.ix`` and also ``[]`` indexing can accept a ``callable`` as an indexer. See more at :ref:`Selection By Callable <indexing.callable>`.

Getting values from an object with multi-axes selection uses the following
notation (using ``.loc`` as an example, but applies to ``.iloc`` and ``.ix`` as
well). Any of the axes accessors may be the null slice ``:``. Axes left out of
@@ -317,6 +327,7 @@ The ``.loc`` attribute is the primary access method. The following are valid inp
- A list or array of labels ``['a', 'b', 'c']``
- A slice object with labels ``'a':'f'`` (note that contrary to usual python slices, **both** the start and the stop are included!)
- A boolean array
- A ``callable``, see :ref:`Selection By Callable <indexing.callable>`

.. ipython:: python
@@ -340,13 +351,13 @@ With a DataFrame
index=list('abcdef'),
columns=list('ABCD'))
df1
df1.loc[['a','b','d'],:]
df1.loc[['a', 'b', 'd'], :]
Accessing via label slices

.. ipython:: python
df1.loc['d':,'A':'C']
df1.loc['d':, 'A':'C']
For getting a cross section using a label (equiv to ``df.xs('a')``)

@@ -358,15 +369,15 @@ For getting values with a boolean array

.. ipython:: python
df1.loc['a']>0
df1.loc[:,df1.loc['a']>0]
df1.loc['a'] > 0
df1.loc[:, df1.loc['a'] > 0]
For getting a value explicitly (equiv to deprecated ``df.get_value('a','A')``)

.. ipython:: python
# this is also equivalent to ``df1.at['a','A']``
df1.loc['a','A']
df1.loc['a', 'A']
.. _indexing.integer:

@@ -387,6 +398,7 @@ The ``.iloc`` attribute is the primary access method. The following are valid in
- A list or array of integers ``[4, 3, 0]``
- A slice object with ints ``1:7``
- A boolean array
- A ``callable``, see :ref:`Selection By Callable <indexing.callable>`

.. ipython:: python
@@ -416,26 +428,26 @@ Select via integer slicing
.. ipython:: python
df1.iloc[:3]
df1.iloc[1:5,2:4]
df1.iloc[1:5, 2:4]
Select via integer list

.. ipython:: python
df1.iloc[[1,3,5],[1,3]]
df1.iloc[[1, 3, 5], [1, 3]]
.. ipython:: python
df1.iloc[1:3,:]
df1.iloc[1:3, :]
.. ipython:: python
df1.iloc[:,1:3]
df1.iloc[:, 1:3]
.. ipython:: python
# this is also equivalent to ``df1.iat[1,1]``
df1.iloc[1,1]
df1.iloc[1, 1]
For getting a cross section using an integer position (equiv to ``df.xs(1)``)

@@ -471,8 +483,8 @@ returned)
dfl = pd.DataFrame(np.random.randn(5,2), columns=list('AB'))
dfl
dfl.iloc[:,2:3]
dfl.iloc[:,1:3]
dfl.iloc[:, 2:3]
dfl.iloc[:, 1:3]
dfl.iloc[4:6]
A single indexer that is out of bounds will raise an ``IndexError``.
@@ -481,12 +493,52 @@ A list of indexers where any element is out of bounds will raise an

.. code-block:: python
dfl.iloc[[4,5,6]]
dfl.iloc[[4, 5, 6]]
IndexError: positional indexers are out-of-bounds
dfl.iloc[:,4]
dfl.iloc[:, 4]
IndexError: single positional indexer is out-of-bounds
.. _indexing.callable:

Selection By Callable
---------------------

.. versionadded:: 0.18.1

``.loc``, ``.iloc``, ``.ix`` and also ``[]`` indexing can accept a ``callable`` as an indexer.
The ``callable`` must take one argument (the calling Series, DataFrame or Panel) and return valid output for indexing.

.. ipython:: python
df1 = pd.DataFrame(np.random.randn(6, 4),
index=list('abcdef'),
columns=list('ABCD'))
df1
df1.loc[lambda df: df.A > 0, :]
df1.loc[:, lambda df: ['A', 'B']]
df1.iloc[:, lambda df: [0, 1]]
df1[lambda df: df.columns[0]]
Callable indexing also works on a ``Series``.

.. ipython:: python
df1.A.loc[lambda s: s > 0]
Using these methods / indexers, you can chain data selection operations
without using a temporary variable.

.. ipython:: python
bb = pd.read_csv('data/baseball.csv', index_col='id')
(bb.groupby(['year', 'team']).sum()
.loc[lambda df: df.r > 100])
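The callable forms described in this section can be checked on a small, deterministic frame. The following is a minimal sketch rather than part of the commit: the frame contents and variable names are made up for illustration, and ``.ix`` and Panel are left out so that the sketch also runs on pandas releases that no longer provide them.

```python
import pandas as pd

# Hypothetical data, chosen so the expected selections are easy to verify by eye.
df = pd.DataFrame(
    {"A": [1.0, -2.0, 3.0], "B": [-1.0, 5.0, 6.0], "C": [0.5, 0.5, -0.5]},
    index=list("abc"),
)

# Each callable receives the calling object and returns a valid indexer.
rows_pos_a = df.loc[lambda d: d["A"] > 0, :]     # boolean row indexer
cols_ab = df.loc[:, lambda d: ["A", "B"]]        # list of column labels
first_two = df.iloc[:, lambda d: [0, 1]]         # list of column positions
first_col = df[lambda d: d.columns[0]]           # single column label
series_pos = df["A"].loc[lambda s: s > 0]        # callable on a Series

print(rows_pos_a)
print(first_col)
```

In every case the callable is evaluated first, and its return value is then treated exactly like an indexer written out by hand.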
.. _indexing.basics.partial_setting:

Selecting Random Samples
@@ -848,6 +900,19 @@ This is equivalent (but faster than) the following.
df2 = df.copy()
df.apply(lambda x, y: x.where(x>0,y), y=df['A'])
.. versionadded:: 0.18.1

``where`` can accept a callable for both the condition and the ``other`` argument. Each
callable must take one argument (the calling Series or DataFrame) and return valid output
for the condition or the ``other`` argument, respectively.

.. ipython:: python
df3 = pd.DataFrame({'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]})
df3.where(lambda x: x > 4, lambda x: x + 10)
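To make the equivalence concrete, the sketch below compares the callable form with the same call written using explicit arguments. It reuses the ``df3`` data from the example above; the names ``via_callables`` and ``explicit`` are illustrative only.

```python
import pandas as pd

df3 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Callable form: both callables are evaluated on df3 before where() proceeds.
via_callables = df3.where(lambda x: x > 4, lambda x: x + 10)

# The same operation with the condition and replacement spelled out.
explicit = df3.where(df3 > 4, df3 + 10)

print(via_callables)
```

Values greater than 4 are kept, and every other value is replaced by itself plus 10.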
**mask**

``mask`` is the inverse boolean operation of ``where``.
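Since ``mask`` inverts the condition, a callable condition passed to ``mask`` should select the complement of what ``where`` keeps. A minimal sketch, with data and variable names assumed for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# mask() replaces values where the condition is True ...
masked = df.mask(lambda x: x > 4, lambda x: x + 10)

# ... which matches where() called with the negated condition.
inverse = df.where(lambda x: ~(x > 4), lambda x: x + 10)

print(masked)
```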
62 changes: 62 additions & 0 deletions doc/source/whatsnew/v0.18.1.txt
@@ -13,6 +13,8 @@ Highlights include:
- ``pd.to_datetime()`` has gained the ability to assemble dates from a ``DataFrame``, see :ref:`here <whatsnew_0181.enhancements.assembling>`
- Custom business hour offset, see :ref:`here <whatsnew_0181.enhancements.custombusinesshour>`.
- Many bug fixes in the handling of ``sparse``, see :ref:`here <whatsnew_0181.sparse>`
- Method chaining improvements, see :ref:`here <whatsnew_0181.enhancements.method_chain>`.


.. contents:: What's new in v0.18.1
:local:
@@ -94,6 +96,66 @@ Now you can do:

df.groupby('group').resample('1D').ffill()

.. _whatsnew_0181.enhancements.method_chain:

Method chaining improvements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following methods / indexers now accept a ``callable``. This is intended to make
them more useful in method chains, see :ref:`Selection By Callable <indexing.callable>`.
(:issue:`11485`, :issue:`12533`)

- ``.where()`` and ``.mask()``
- ``.loc[]``, ``.iloc[]`` and ``.ix[]``
- ``[]`` indexing

``.where()`` and ``.mask()``
""""""""""""""""""""""""""""

These can accept a callable for both the condition and the ``other``
argument.

.. ipython:: python

df = pd.DataFrame({'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]})
df.where(lambda x: x > 4, lambda x: x + 10)

``.loc[]``, ``.iloc[]``, ``.ix[]``
""""""""""""""""""""""""""""""""""

These can accept a callable, or a tuple of callables, as a slicer. Each callable
can return a valid ``bool`` indexer or anything else that is valid input for these indexers.

.. ipython:: python

# callable returns bool indexer
df.loc[lambda x: x.A >= 2, lambda x: x.sum() > 10]

# callable returns list of labels
df.loc[lambda x: [1, 2], lambda x: ['A', 'B']]
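
The two ``.loc`` calls above can be traced by hand: the column sums of ``df`` are 6, 15 and 24, so only ``B`` and ``C`` pass ``x.sum() > 10``. The sketch below records those results; ``by_bool`` and ``by_label`` are illustrative names, and ``.ix`` is omitted so the sketch also runs on later pandas releases.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Row callable returns a boolean Series; column callable returns a boolean
# Series indexed by column label (A=6, B=15, C=24 compared with 10).
by_bool = df.loc[lambda x: x["A"] >= 2, lambda x: x.sum() > 10]

# Both callables return plain lists of labels.
by_label = df.loc[lambda x: [1, 2], lambda x: ["A", "B"]]

print(by_bool)
print(by_label)
```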

``[]`` indexing
"""""""""""""""

Finally, you can use a callable in ``[]`` indexing of Series, DataFrame and Panel.
The callable must return valid input for ``[]`` indexing depending on its
class and index type.

.. ipython:: python

df[lambda x: 'A']
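
What the callable returns determines which kind of ``[]`` selection takes place. As a small sketch (variable names are assumptions), a callable returning a label selects a column, while one returning a boolean Series filters rows:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Callable returns a column label, so [] selects that column as a Series.
col = df[lambda x: "A"]

# Callable returns a boolean Series, so [] filters rows.
rows = df[lambda x: x["A"] > 1]

print(col)
print(rows)
```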

Using these methods / indexers, you can chain data selection operations
without using a temporary variable.

.. ipython:: python

bb = pd.read_csv('data/baseball.csv', index_col='id')
(bb.groupby(['year', 'team']).sum()
.loc[lambda df: df.r > 100])
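
The chained pattern can also be tried without the ``baseball.csv`` file. The sketch below substitutes a small hypothetical frame with the same ``year``, ``team`` and ``r`` columns; the values are invented purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the baseball data; not the real data set.
bb = pd.DataFrame(
    {
        "year": [2006, 2006, 2007, 2007],
        "team": ["X", "X", "Y", "Z"],
        "r": [60, 70, 30, 120],
    }
)

# The callable sees the grouped-and-summed frame, which has no name of its
# own, so no temporary variable is needed to filter it.
result = (
    bb.groupby(["year", "team"])
    .sum()
    .loc[lambda df: df["r"] > 100]
)

print(result)
```

Without the callable, the intermediate result of ``groupby(...).sum()`` would have to be bound to a name before it could be filtered.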

.. _whatsnew_0181.partial_string_indexing:

Partial string indexing on ``DateTimeIndex`` when part of a ``MultiIndex``
10 changes: 10 additions & 0 deletions pandas/core/common.py
@@ -1843,6 +1843,16 @@ def _get_callable_name(obj):
return None


def _apply_if_callable(maybe_callable, obj, **kwargs):
"""
Evaluate possibly callable input using obj and kwargs if it is callable,
otherwise return it unchanged
"""
if callable(maybe_callable):
return maybe_callable(obj, **kwargs)
return maybe_callable
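
The helper's behaviour can be illustrated without pandas. The following standalone sketch copies the logic shown in the diff under the hypothetical public name ``apply_if_callable`` and uses a plain list in place of a Series or DataFrame:

```python
def apply_if_callable(maybe_callable, obj, **kwargs):
    """Standalone copy of the helper's logic, for illustration only."""
    if callable(maybe_callable):
        return maybe_callable(obj, **kwargs)
    return maybe_callable


values = [1, 2, 3]

# A non-callable argument is returned unchanged.
passthrough = apply_if_callable(10, values)

# A callable argument is evaluated with the object as its only positional argument.
computed = apply_if_callable(lambda v: sum(v), values)

# Extra keyword arguments are forwarded to the callable.
with_kwargs = apply_if_callable(
    lambda v, scale=1: [x * scale for x in v], values, scale=2
)

print(passthrough, computed, with_kwargs)
```

Because non-callable input passes through untouched, callers such as ``__getitem__`` or ``where`` can apply the helper unconditionally.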


_string_dtypes = frozenset(map(_get_dtype_from_object, (compat.binary_type,
compat.text_type)))

16 changes: 9 additions & 7 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -1970,6 +1970,7 @@ def iget_value(self, i, j):
return self.iat[i, j]

def __getitem__(self, key):
key = com._apply_if_callable(key, self)

# shortcut if we are an actual column
is_mi_columns = isinstance(self.columns, MultiIndex)
@@ -2138,6 +2139,9 @@ def query(self, expr, inplace=False, **kwargs):
>>> df.query('a > b')
>>> df[df.a > df.b] # same result as the previous expression
"""
if not isinstance(expr, compat.string_types):
msg = "expr must be a string to be evaluated, {0} given"
raise ValueError(msg.format(type(expr)))
kwargs['level'] = kwargs.pop('level', 0) + 1
kwargs['target'] = None
res = self.eval(expr, **kwargs)
@@ -2336,6 +2340,7 @@ def _box_col_values(self, values, items):
name=items, fastpath=True)

def __setitem__(self, key, value):
key = com._apply_if_callable(key, self)

# see if we can slice the rows
indexer = convert_to_index_sliceable(self, key)
@@ -2454,8 +2459,9 @@ def assign(self, **kwargs):
kwargs : keyword, value pairs
keywords are the column names. If the values are
callable, they are computed on the DataFrame and
assigned to the new columns. If the values are
not callable, (e.g. a Series, scalar, or array),
assigned to the new columns. The callable must not
modify the input DataFrame (pandas does not check this).
If the values are not callable, (e.g. a Series, scalar, or array),
they are simply assigned.
Returns
@@ -2513,11 +2519,7 @@ def assign(self, **kwargs):
# do all calculations first...
results = {}
for k, v in kwargs.items():

if callable(v):
results[k] = v(data)
else:
results[k] = v
results[k] = com._apply_if_callable(v, data)

# ... and then assign
for k, v in sorted(results.items()):
28 changes: 26 additions & 2 deletions pandas/core/generic.py
@@ -4283,8 +4283,26 @@ def _align_series(self, other, join='outer', axis=None, level=None,
Parameters
----------
cond : boolean %(klass)s or array
other : scalar or %(klass)s
cond : boolean %(klass)s, array or callable
If cond is callable, it is computed on the %(klass)s and
should return a boolean %(klass)s or array.
The callable must not modify the input %(klass)s
(pandas does not check this).
.. versionadded:: 0.18.1
A callable can be used as cond.
other : scalar, %(klass)s, or callable
If other is callable, it is computed on the %(klass)s and
should return a scalar or %(klass)s.
The callable must not modify the input %(klass)s
(pandas does not check this).
.. versionadded:: 0.18.1
A callable can be used as other.
inplace : boolean, default False
Whether to perform the operation in place on the data
axis : alignment axis if needed, default None
@@ -4304,6 +4322,9 @@ def _align_series(self, other, join='outer', axis=None, level=None,
def where(self, cond, other=np.nan, inplace=False, axis=None, level=None,
try_cast=False, raise_on_error=True):

cond = com._apply_if_callable(cond, self)
other = com._apply_if_callable(other, self)

if isinstance(cond, NDFrame):
cond, _ = cond.align(self, join='right', broadcast_axis=1)
else:
@@ -4461,6 +4482,9 @@ def where(self, cond, other=np.nan, inplace=False, axis=None, level=None,
@Appender(_shared_docs['where'] % dict(_shared_doc_kwargs, cond="False"))
def mask(self, cond, other=np.nan, inplace=False, axis=None, level=None,
try_cast=False, raise_on_error=True):

cond = com._apply_if_callable(cond, self)

return self.where(~cond, other=other, inplace=inplace, axis=axis,
level=level, try_cast=try_cast,
raise_on_error=raise_on_error)