Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix to _get_nearest_indexer for pydata/xarray#3751 #32905

Merged
merged 18 commits into from
Mar 26, 2020

Conversation

spencerkclark
Copy link
Contributor

@spencerkclark spencerkclark commented Mar 22, 2020

@jbrockmendel thanks for encouraging me to submit a PR. I'm totally open to any suggested changes; this is just a sketch of something that would work for our needs and hopefully not break anything in pandas.

To those not familiar, some changes introduced in #31511 broke some functionality of an Index subclass we define in xarray. More details/discussion can be found in pydata/xarray#3751 and pydata/xarray#3764, but the gist of the issue is that for better or for worse, we have been relying on left and right distances in _get_nearest_indexer being determined using NumPy arrays rather than Index objects (as they were prior to #31511).

One way to fix this is to override _get_nearest_indexer within our Index subclass, which we've temporarily done in pydata/xarray#3764, but this potentially fragile. It would be great if we could find a solution here.

  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@spencerkclark
Copy link
Contributor Author

The test I added fails in CI, because of another issue, which I proposed a downstream fix for in pydata/xarray#3874 (comment).

@jorisvandenbossche jorisvandenbossche added this to the 1.0.4 milestone Mar 22, 2020
@jbrockmendel
Copy link
Member

Looks like CFTImeIndex.__new__ needs to instantiate _cache; for now i guess the test will have to monkeypatch

@spencerkclark
Copy link
Contributor Author

It occurred to me we could write a more direct test that circumvents the _cache issue for now; that way we won't need to temporarily monkeypatch.

@jbrockmendel
Copy link
Member

Looks like test failures are happening because we are calling _filter_indexer_tolerance inconsistently (in master, unrelated to this PR).

To fix this, I think we need to go back to your way of doing it: pass target._values to _filter_indexer_tolerance, and we'll have to update the annotation

@spencerkclark
Copy link
Contributor Author

Cool, that seemed to do it; looks like things are all green now. Let me know if you have any further comments.

@@ -15,6 +15,7 @@ dependencies:

# pandas dependencies
- beautifulsoup4
- cftime # Needed for downstream xarray.CFTimeIndex test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls move these near xarray

@@ -14,6 +14,7 @@ dependencies:
# pandas dependencies
- beautifulsoup4
- bottleneck
- cftime # Needed for downstream xarray.CFTimeIndex test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should not be in every CI yaml
rather u skip the test if if it’s not installed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand; I was just following @jbrockmendel's recommendation. Which CI yaml should I put this in?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove from all
except the environment.yaml

pandas/tests/test_downstream.py Show resolved Hide resolved
@jreback jreback added the Dependencies Required and optional dependencies label Mar 23, 2020
@spencerkclark
Copy link
Contributor Author

I think this may be ready now. Does this require a what's new entry? If so, I notice this was tagged for a 1.0.4 release. Should I create a new what's new file for that?

@spencerkclark spencerkclark changed the title Possible fix to _get_nearest_indexer for pydata/xarray#3751 Fix to _get_nearest_indexer for pydata/xarray#3751 Mar 25, 2020
@jreback jreback removed this from the 1.0.4 milestone Mar 26, 2020
@jreback jreback added this to the 1.1 milestone Mar 26, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

impl and tests look good. can you add a whatsnew in 1.1 (I am not sure we are actually doing a 1.0.4). put in bug fixes indexing. you can reference this PR number is fine (if you want to link to the xarray issue that's ok too).

@spencerkclark
Copy link
Contributor Author

Great, thanks @jreback. I added a what's new entry. Let me know if you'd like me to tweak it at all. I appreciate your and @jbrockmendel's help with this.

@@ -310,6 +310,7 @@ Indexing
- Bug in :meth:`DataFrame.iloc.__setitem__` on a :class:`DataFrame` with duplicate columns incorrectly setting values for all matching columns (:issue:`15686`, :issue:`22036`)
- Bug in :meth:`DataFrame.loc:` and :meth:`Series.loc` with a :class:`DatetimeIndex`, :class:`TimedeltaIndex`, or :class:`PeriodIndex` incorrectly allowing lookups of non-matching datetime-like dtypes (:issue:`32650`)
- Bug in :meth:`Series.__getitem__` indexing with non-standard scalars, e.g. ``np.dtype`` (:issue:`32684`)
- :meth:`Index._get_nearest_indexer` now internally uses NumPy arrays or ExtensionArrays to compute left and right distances. This preserves the fix to :issue:`26683` while maintaining compatibility with downstream :class:`Index` subclasses, e.g. xarray's CFTimeIndex (:issue:`32905`). See also `pydata/xarray#3751 <https://github.com/pydata/xarray/issues/3751>`_.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't reference an internal routine, rather only user visible things. You can just mention xarray.

@jreback
Copy link
Contributor

jreback commented Mar 26, 2020

lgtm ping on green.

@spencerkclark
Copy link
Contributor Author

spencerkclark commented Mar 26, 2020

Many thanks @jreback; looks like things are green.

@jreback jreback merged commit 883379c into pandas-dev:master Mar 26, 2020
@jreback
Copy link
Contributor

jreback commented Mar 26, 2020

thanks @spencerkclark very nice! keep em coming!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dependencies Required and optional dependencies
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants