API/ENH: add method='nearest' to Index.get_indexer/reindex and method to get_loc #9258

Merged
merged 1 commit into from
Feb 23, 2015
24 changes: 11 additions & 13 deletions doc/source/basics.rst
@@ -948,15 +948,9 @@ chosen from the following table:

pad / ffill, Fill values forward
bfill / backfill, Fill values backward
nearest, Fill from the nearest index value

Other fill methods could be added, of course, but these are the two most
commonly used for time series data. In a way they only make sense for time
series or otherwise ordered data, but you may have an application on non-time
series data where this sort of "interpolation" logic is the correct thing to
do. More sophisticated interpolation of missing values would be an obvious
extension.

We illustrate these fill methods on a simple TimeSeries:
We illustrate these fill methods on a simple Series:

.. ipython:: python

@@ -969,18 +963,22 @@ We illustrate these fill methods on a simple TimeSeries:
ts2.reindex(ts.index)
ts2.reindex(ts.index, method='ffill')
ts2.reindex(ts.index, method='bfill')
ts2.reindex(ts.index, method='nearest')

Note these methods require that the indexes are **order increasing**.
These methods require that the indexes are **ordered** increasing or
decreasing.

Note the same result could have been achieved using :ref:`fillna
<missing_data.fillna>`:
Note that the same result could have been achieved using
:ref:`fillna <missing_data.fillna>` (except for ``method='nearest'``) or
:ref:`interpolate <missing_data.interpolation>`:

.. ipython:: python

ts2.reindex(ts.index).fillna(method='ffill')

Note that ``reindex`` will raise a ValueError if the index is not
monotonic. ``fillna`` will not make any checks on the order of the index.
``reindex`` will raise a ValueError if the index is not monotonic increasing or
decreasing. ``fillna`` and ``interpolate`` will not make any checks on the
order of the index.
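
For readers following this hunk, the three fill methods can be compared side by side. This is a minimal sketch that assumes a recent pandas release; the series `ts`, `ts2` and `unordered` are illustrative and are not the ones defined in the elided part of `basics.rst`.

```python
import numpy as np
import pandas as pd

# Eight daily observations; ts2 keeps only every third one.
ts = pd.Series(np.arange(8.0), index=pd.date_range("2000-01-03", periods=8))
ts2 = ts.iloc[[0, 3, 6]]

ffilled = ts2.reindex(ts.index, method="ffill")    # carry the last value forward
bfilled = ts2.reindex(ts.index, method="bfill")    # pull the next value backward
nearest = ts2.reindex(ts.index, method="nearest")  # take the closest label

print(pd.DataFrame({"ffill": ffilled, "bfill": bfilled, "nearest": nearest}))

# A non-monotonic index cannot be searched this way, so reindex raises.
unordered = pd.Series([1.0, 2.0, 3.0], index=[2, 0, 1])
error_message = None
try:
    unordered.reindex([0.5, 1.5], method="ffill")
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The last row of the ``bfill`` column is NaN because no later observation exists to fill from, while ``nearest`` fills every row as long as the source has at least one label.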

.. _basics.drop:

28 changes: 28 additions & 0 deletions doc/source/whatsnew/v0.16.0.txt
@@ -20,6 +20,15 @@ users upgrade to this version.
New features
~~~~~~~~~~~~

- Reindex now supports ``method='nearest'`` for frames or series with a monotonic increasing or decreasing index (:issue:`9258`):

.. ipython:: python

df = pd.DataFrame({'x': range(5)})
df.reindex([0.2, 1.8, 3.5], method='nearest')

This method is also exposed by the lower level ``Index.get_indexer`` and ``Index.get_loc`` methods.
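
As a rough illustration of the lower-level entry point mentioned in this note, the sketch below calls ``Index.get_indexer`` directly and then repeats the lookup through ``reindex``. It assumes a recent pandas release. The target ``3.4`` is used instead of the ``3.5`` from the diff so that no label sits exactly halfway between two index values, which keeps the result independent of any tie-breaking rule.

```python
import pandas as pd

index = pd.Index([0, 1, 2, 3, 4])

# Positions of the index values closest to each target label.
positions = index.get_indexer([0.2, 1.8, 3.4], method="nearest")
print(positions.tolist())  # [0, 2, 3]

# The same lookup through the high-level API.
df = pd.DataFrame({"x": range(5)})
result = df.reindex([0.2, 1.8, 3.4], method="nearest")
print(result["x"].tolist())  # [0, 2, 3]
```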

.. _whatsnew_0160.api:

.. _whatsnew_0160.api_breaking:
@@ -189,6 +198,9 @@ Enhancements

- Added ``StringMethods.find()`` and ``rfind()`` which behave the same as standard ``str`` (:issue:`9386`)

- ``Index.get_indexer`` now supports ``method='pad'`` and ``method='backfill'`` for any target array, not just monotonic targets. These methods also work for monotonic decreasing as well as monotonic increasing indexes (:issue:`9258`).
- ``Index.asof`` now works on all index types (:issue:`9258`).

- Added ``StringMethods.isnumeric`` and ``isdecimal`` which behave the same as standard ``str`` (:issue:`9439`)
- Added ``StringMethods.ljust()`` and ``rjust()`` which behave the same as standard ``str`` (:issue:`9352`)
- ``StringMethods.pad()`` and ``center()`` now accept ``fillchar`` option to specify filling character (:issue:`9352`)
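
The ``pad`` and ``backfill`` semantics for an unordered target, described in the ``Index.get_indexer`` entry above, can be sketched in pure Python with a binary search per label. This is only an illustration of the lookup rule, not the pandas implementation: ``fill_indexer`` is a hypothetical helper, and it assumes the index is sorted in increasing order.

```python
from bisect import bisect_left, bisect_right


def fill_indexer(index, target, method):
    """Return, for each target label, a position in the sorted `index` or -1."""
    positions = []
    for label in target:
        if method == "pad":
            # Last position whose value is <= label; -1 if there is none.
            pos = bisect_right(index, label) - 1
        elif method == "backfill":
            # First position whose value is >= label; -1 if there is none.
            pos = bisect_left(index, label)
            if pos == len(index):
                pos = -1
        else:
            raise ValueError("expected 'pad' or 'backfill', got %r" % method)
        positions.append(pos)
    return positions


index = [10, 20, 30]
target = [25, 12, 31, 5, 20]  # deliberately not sorted

print(fill_indexer(index, target, "pad"))       # [1, 0, 2, -1, 1]
print(fill_indexer(index, target, "backfill"))  # [2, 1, -1, 0, 1]
```

Because each label is searched independently, the order of the target does not matter; only the index has to be monotonic.
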
@@ -244,6 +256,22 @@ Bug Fixes

- Fixed character encoding bug in ``read_stata`` and ``StataReader`` when loading data from a URL (:issue:`9231`).

- Looking up a partial string label with ``DatetimeIndex.asof`` now includes values that match the string, even if they are after the start of the partial string label (:issue:`9258`). Old behavior:

.. ipython:: python
:verbatim:

In [4]: pd.to_datetime(['2000-01-31', '2000-02-28']).asof('2000-02')
Out[4]: Timestamp('2000-01-31 00:00:00')

Fixed behavior:

.. ipython:: python

pd.to_datetime(['2000-01-31', '2000-02-28']).asof('2000-02')

Contributor

I guess this was a bug then?

To reproduce the old behavior, simply add more precision to the label (e.g., use ``2000-02-01`` instead of ``2000-02``).
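
The difference between a partial string label and a fully specified one can be checked with a short script. This is a sketch assuming a recent pandas release; it passes an explicit ``pd.Timestamp`` for the precise label, which is a choice made here rather than something taken from the diff.

```python
import pandas as pd

index = pd.to_datetime(["2000-01-31", "2000-02-28"])

# Partial label: any value inside February 2000 counts as a match.
partial = index.asof("2000-02")

# Fully specified label: only values at or before 2000-02-01 qualify.
precise = index.asof(pd.Timestamp("2000-02-01"))

print(partial)
print(precise)
```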



- Bug in adding ``offsets.Nano`` to other offsets raises ``TypeError`` (:issue:`9284`)
18 changes: 14 additions & 4 deletions pandas/core/common.py
@@ -2682,21 +2682,31 @@ def _astype_nansafe(arr, dtype, copy=True):
return arr.view(dtype)


def _clean_fill_method(method):
def _clean_fill_method(method, allow_nearest=False):
Contributor

This seems kind of hacky. Why don't we just always allow nearest (I know it's not quite supported by fillna)? Or will you just fix this then?

Member Author

yes, when nearest is valid for fillna we should remove this option.

I was considering adding nearest for fillna in this PR but I'd rather save it for another just to minimize the scope here.

if method is None:
return None
method = method.lower()
if method == 'ffill':
method = 'pad'
if method == 'bfill':
method = 'backfill'
if method not in ['pad', 'backfill']:
msg = ('Invalid fill method. Expecting pad (ffill) or backfill '
'(bfill). Got %s' % method)

valid_methods = ['pad', 'backfill']
expecting = 'pad (ffill) or backfill (bfill)'
if allow_nearest:
valid_methods.append('nearest')
expecting = 'pad (ffill), backfill (bfill) or nearest'
if method not in valid_methods:
msg = ('Invalid fill method. Expecting %s. Got %s'
% (expecting, method))
raise ValueError(msg)
return method


def _clean_reindex_fill_method(method):
return _clean_fill_method(method, allow_nearest=True)


def _all_none(*args):
for arg in args:
if arg is not None:
36 changes: 19 additions & 17 deletions pandas/core/generic.py
@@ -1672,10 +1672,12 @@ def sort_index(self, axis=0, ascending=True):
keywords)
New labels / index to conform to. Preferably an Index object to
avoid duplicating data
method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
Method to use for filling holes in reindexed DataFrame
pad / ffill: propagate last valid observation forward to next valid
backfill / bfill: use NEXT valid observation to fill gap
method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}, optional
Method to use for filling holes in reindexed DataFrame:
* default: don't fill gaps
* pad / ffill: propagate last valid observation forward to next valid
* backfill / bfill: use next valid observation to fill gap
* nearest: use nearest valid observations to fill gap
copy : boolean, default True
Return a new object, even if the passed indexes are the same
level : int or name
@@ -1703,7 +1705,7 @@ def reindex(self, *args, **kwargs):

# construct the args
axes, kwargs = self._construct_axes_from_arguments(args, kwargs)
method = com._clean_fill_method(kwargs.get('method'))
method = com._clean_reindex_fill_method(kwargs.get('method'))
level = kwargs.get('level')
copy = kwargs.get('copy', True)
limit = kwargs.get('limit')
@@ -1744,9 +1746,8 @@ def _reindex_axes(self, axes, level, limit, method, fill_value, copy):

axis = self._get_axis_number(a)
obj = obj._reindex_with_indexers(
{axis: [new_index, indexer]}, method=method,
fill_value=fill_value, limit=limit, copy=copy,
allow_dups=False)
{axis: [new_index, indexer]},
fill_value=fill_value, copy=copy, allow_dups=False)
Contributor

Doesn't this need method passed through? I'm shocked the tests don't fail.

Member Author

You'll note that _reindex_with_indexers doesn't actually use method or limit arguments. So I removed them from the function signature below.

Contributor

yep


return obj

@@ -1770,10 +1771,12 @@ def _reindex_multi(self, axes, copy, fill_value):
New labels / index to conform to. Preferably an Index object to
avoid duplicating data
axis : %(axes_single_arg)s
method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
Method to use for filling holes in reindexed object.
pad / ffill: propagate last valid observation forward to next valid
backfill / bfill: use NEXT valid observation to fill gap
method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}, optional
Method to use for filling holes in reindexed DataFrame:
* default: don't fill gaps
* pad / ffill: propagate last valid observation forward to next valid
* backfill / bfill: use next valid observation to fill gap
* nearest: use nearest valid observations to fill gap
copy : boolean, default True
Return a new object, even if the passed indexes are the same
level : int or name
@@ -1802,15 +1805,14 @@ def reindex_axis(self, labels, axis=0, method=None, level=None, copy=True,

axis_name = self._get_axis_name(axis)
axis_values = self._get_axis(axis_name)
method = com._clean_fill_method(method)
method = com._clean_reindex_fill_method(method)
new_index, indexer = axis_values.reindex(labels, method, level,
limit=limit)
return self._reindex_with_indexers(
{axis: [new_index, indexer]}, method=method, fill_value=fill_value,
limit=limit, copy=copy)
{axis: [new_index, indexer]}, fill_value=fill_value, copy=copy)
Contributor

oh, right, method is extraneous here (above too?)


def _reindex_with_indexers(self, reindexers, method=None,
fill_value=np.nan, limit=None, copy=False,
def _reindex_with_indexers(self, reindexers,
fill_value=np.nan, copy=False,
allow_dups=False):
""" allow_dups indicates an internal call here """
