BUG GH23282 calling min on series of NaT returns NaT #23289

Merged (12 commits), Oct 28, 2018

JustinZhengBC (Contributor) commented Oct 23, 2018

For max, NaT values are filled with the lowest possible int64 value; for min, they are filled with the highest possible value. The problem is that only the lowest possible value is recognized as NaT afterwards, so the value substituted for min is never mapped back to NaT. Since nanops.py is responsible for assigning the highest value to NaT when min is called, it should also be responsible for translating that value back to NaT when appropriate.
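A minimal NumPy-only sketch of the mechanism described above, assuming NaT is stored as the int64 minimum (the sentinel both NumPy and pandas use). It fills the missing slots the way the description says and shows that only the max path lands back on the NaT sentinel; this is an illustration, not the nanops code itself.

```python
import numpy as np

INT64_MIN = np.iinfo(np.int64).min  # bit pattern used for NaT
INT64_MAX = np.iinfo(np.int64).max

# An all-NaT datetime64[ns] array, viewed as its underlying int64 storage
values = np.array(["NaT", "NaT"], dtype="datetime64[ns]").view("i8")
mask = values == INT64_MIN

# max: missing slots get the lowest value, which is the NaT sentinel itself
raw_max = np.where(mask, INT64_MIN, values).max()

# min: missing slots get the highest value so they never win the comparison
raw_min = np.where(mask, INT64_MAX, values).min()

print(raw_max == INT64_MIN)  # True: max lands on the NaT sentinel
print(raw_min == INT64_MAX)  # True: min lands on the fill value, which is not NaT
```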

pep8speaks commented Oct 23, 2018

Hello @JustinZhengBC! Thanks for updating the PR.

Comment last updated on October 25, 2018 at 23:07 Hours UTC

WillAyd (Member) left a comment

Please be sure to always add tests first and foremost

WillAyd added the Missing-data label (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) Oct 23, 2018
@@ -718,6 +718,8 @@ def reduction(values, axis=None, skipna=True, mask=None):
result = np.nan
else:
result = getattr(values, meth)(axis)
if is_integer(result) and result == _int64_max:
Member

This looks like it also affects integer dtypes, e.g. pd.Series([_int64_max]).min() / max()?

Contributor Author

Fixed, now it only applies the conversion from _int64_max to NaT if given an appropriate dtype.
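The reviewer's concern can be checked directly: in a plain int64 array the int64 maximum is a legitimate data value, so an unconditional "max int means NaT" rule would corrupt it. The sketch below gates the conversion on the dtype kind; `maybe_nat` is a hypothetical helper written for illustration, not a pandas function.

```python
import numpy as np

INT64_MAX = np.iinfo(np.int64).max


def maybe_nat(result, dtype):
    """Hypothetical helper: map the min() fill value back to NaT only for
    datetime64 ('M') and timedelta64 ('m') dtypes; other dtypes pass through."""
    if dtype.kind in "mM" and result == INT64_MAX:
        unit, _ = np.datetime_data(dtype)  # keep the original resolution, e.g. 'ns'
        if dtype.kind == "M":
            return np.datetime64("NaT", unit)
        return np.timedelta64("NaT", unit)
    return result


# Integer data: INT64_MAX is a real value and must survive untouched
ints = np.array([INT64_MAX], dtype="int64")
print(maybe_nat(ints.min(), ints.dtype))  # 9223372036854775807

# Raw min() result for all-NaT datetimes after NaT slots were filled with INT64_MAX
print(maybe_nat(np.int64(INT64_MAX), np.dtype("datetime64[ns]")))  # NaT
```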

@@ -509,3 +509,8 @@ def test_dt_timetz_accessor(self, tz_naive_fixture):
time(22, 14, tzinfo=tz)])
result = s.dt.timetz
tm.assert_series_equal(result, expected)

def test_minmax_nat(self):
Member

Can you add tests for timedelta dtype and DataFrame (#10390)?

Contributor Author

Added more tests, but this PR does not fix #10390
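A sketch of the kind of test requested above, covering both datetime64 and timedelta64 Series. It is written as plain asserts against current pandas rather than reproducing the exact test added in this PR, and it leaves out the DataFrame case from #10390, which this PR does not fix. `check_minmax_nat` is a hypothetical name.

```python
import pandas as pd


def check_minmax_nat(series):
    # Both reductions over an all-NaT Series should return NaT, not NaN
    assert series.min() is pd.NaT
    assert series.max() is pd.NaT


check_minmax_nat(pd.Series([pd.NaT, pd.NaT], dtype="datetime64[ns]"))
check_minmax_nat(pd.Series([pd.NaT, pd.NaT], dtype="timedelta64[ns]"))
```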

@@ -718,6 +718,9 @@ def reduction(values, axis=None, skipna=True, mask=None):
result = np.nan
else:
result = getattr(values, meth)(axis)
if (is_integer(result) and is_datetime_or_timedelta_dtype(dtype)
Contributor

This needs handling not here but in _wrap_results, where a scalar should be turned into NaT if it is null and of the correct dtype.
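A minimal sketch of the structure suggested here: the reduction runs on raw int64 values, and a separate wrapping step converts a scalar result back into NaT when it equals the fill value used for the missing slots and the dtype is datetime-like. The names `get_values`, `wrap_results` and `nanmin_sketch` are hypothetical stand-ins, not the pandas internals, and only the min path for datetime-like input is shown.

```python
import numpy as np

INT64_MIN = np.iinfo(np.int64).min  # NaT sentinel
INT64_MAX = np.iinfo(np.int64).max


def get_values(values):
    """Hypothetical: view datetime-like data as int64, fill NaT slots for a
    min reduction, and also return the fill value so the wrapper can undo it."""
    dtype = values.dtype
    ints = values.view("i8")
    mask = ints == INT64_MIN
    fill_value = INT64_MAX
    return np.where(mask, fill_value, ints), mask, dtype, fill_value


def wrap_results(result, dtype, fill_value):
    """Hypothetical: box a scalar int64 result back into the original dtype,
    turning the fill value into NaT."""
    if dtype.kind in "mM" and not isinstance(result, np.ndarray):
        if result == fill_value:
            result = INT64_MIN  # the sentinel NumPy reads back as NaT
        return np.array(result, dtype="i8").view(dtype)[()]
    return result


def nanmin_sketch(values):
    filled, _mask, dtype, fill_value = get_values(values)
    return wrap_results(filled.min(), dtype, fill_value)


print(nanmin_sketch(np.array(["NaT", "NaT"], dtype="datetime64[ns]")))  # NaT
print(nanmin_sketch(np.array(["2018-10-23", "NaT"], dtype="datetime64[ns]")))
```

Keeping the conversion in the wrapper means the reduction itself stays dtype-agnostic, and the fill value is only interpreted as NaT where the dtype says it can be.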

pandas/tests/series/test_datetime_values.py (conversation resolved)
codecov bot commented Oct 23, 2018

Codecov Report

Merging #23289 into master will decrease coverage by <.01%.
The diff coverage is 93.75%.

@@            Coverage Diff             @@
##           master   #23289      +/-   ##
==========================================
- Coverage   92.22%   92.22%   -0.01%     
==========================================
  Files         169      169              
  Lines       51258    51266       +8     
==========================================
+ Hits        47274    47281       +7     
- Misses       3984     3985       +1

Flag          Coverage Δ
#multiple     90.66% <93.75%> (-0.01%) ⬇️
#single       42.23% <43.75%> (-0.01%) ⬇️

Impacted Files                        Coverage Δ
pandas/core/nanops.py                 95.05% <93.75%> (-0.15%) ⬇️
pandas/core/series.py                 93.91% <0%> (-0.01%) ⬇️
pandas/core/arrays/sparse.py          91.84% <0%> (ø) ⬆️
pandas/core/arrays/datetimes.py       97.46% <0%> (ø) ⬆️
pandas/core/dtypes/cast.py            89.28% <0%> (+0.05%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 437f31c...a70b903.

JustinZhengBC force-pushed the BUG-23282 branch 2 times, most recently from 3f35609 to 95f3bf6, October 23, 2018 18:40
jreback added this to the 0.24.0 milestone Oct 24, 2018
jreback added the Bug label Oct 24, 2018
jreback (Contributor) commented Oct 24, 2018

I pushed a commit. Have a look.

TomAugspurger (Contributor) left a comment

Probably needs a release note.

@@ -346,7 +350,7 @@ def nanany(values, axis=None, skipna=True, mask=None):
>>> nanops.nanany(s)
False
"""
-values, mask, dtype, _ = _get_values(values, skipna, False, copy=skipna,
+values, mask, dtype, _, _ = _get_values(values, skipna, False, copy=skipna,
mask=mask)
Contributor

Linting error, I would guess.

""" wrap our results if needed """

if is_datetime64_dtype(dtype):
if not isinstance(result, np.ndarray):
if result == fill_value:
Contributor

Is it assumed that fill_value is not NA? If not, this will be wrong, since fill_value will never equal fill_value.

Should we assert that it's not NA?

Contributor

so this can't be for an i8 type by definition. but yes we can assert it.
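Why the suggested assertion matters, in a small pure-Python sketch: if the fill value were NaN, the `result == fill_value` check could never succeed, so the NaT conversion would be skipped silently. `is_fill_value` is a hypothetical helper mirroring that check, not code from this PR.

```python
import math


def is_fill_value(result, fill_value):
    """Hypothetical helper mirroring the `result == fill_value` check above,
    guarded so that an NA fill value fails loudly instead of never matching."""
    assert not (isinstance(fill_value, float) and math.isnan(fill_value)), (
        "fill_value must not be NA: NaN never compares equal, even to itself"
    )
    return result == fill_value


nan = float("nan")
print(nan == nan)  # False, which is why the guard is needed
print(is_fill_value(2**63 - 1, 2**63 - 1))  # True
```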

jreback (Contributor) left a comment

@JustinZhengBC can you add a whatsnew note & some asserts. ping on green.

""" wrap our results if needed """

if is_datetime64_dtype(dtype):
if not isinstance(result, np.ndarray):
if result == fill_value:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this can't be for an i8 type by definition. but yes we can assert it.

@@ -1020,6 +1020,7 @@ Datetimelike
- Bug in :func:`to_datetime` with an :class:`Index` argument that would drop the ``name`` from the result (:issue:`21697`)
- Bug in :class:`PeriodIndex` where adding or subtracting a :class:`timedelta` or :class:`Tick` object produced incorrect results (:issue:`22988`)
- Bug in :func:`date_range` when decrementing a start date to a past end date by a negative frequency (:issue:`23270`)
- Bug in :func:`min` which would return ``NaN`` instead of ``NaT`` when called on a series of ``NaT`` (:issue:`23282`)
Contributor

Was the bug in the builtin min from the standard library, or Series.min? Right now, you're linking to the builtin.

Contributor Author

It was in Series.min. I think it's fixed now.

JustinZhengBC (Contributor Author)

@jreback I added a whatsnew note and an assert in the datetime64 case (adding the assert to the timedelta64 case causes tests to fail)

jreback (Contributor) commented Oct 28, 2018

lgtm. @WillAyd over to you.

WillAyd merged commit 360e727 into pandas-dev:master Oct 28, 2018
WillAyd (Member) commented Oct 28, 2018

Thanks @JustinZhengBC !

thoo added a commit to thoo/pandas that referenced this pull request Oct 30, 2018
…y_tests

* repo_org/master: (52 commits)
  ENH: Allow rename_axis to specify index and columns arguments  (pandas-dev#20046)
  STY: proposed isort settings [ci skip] [skip ci] [ciskip] [skipci] (pandas-dev#23366)
  MAINT: Remove extraneous test.parquet file
  CLN: Follow-up comments to pandas-devgh-23392 (pandas-dev#23401)
  BUG GH23282 calling min on series of NaT returns NaT (pandas-dev#23289)
  unpin openpyxl (pandas-dev#23361)
  REF: collect ops dispatch functions in one place, try to de-duplicate SparseDataFrame methods (pandas-dev#23060)
  CLN: Remove pandas.tools module (pandas-dev#23376)
  CLN: Remove some dtype methods from API (pandas-dev#23390)
  CLN: Cleanup toplevel namespace shims (pandas-dev#23386)
  DOC: fixup whatsnew note for GH21394 (pandas-dev#23355)
  Fix import format at pandas/tests/extension directory (pandas-dev#23365)
  DOC: Remove Series.sortlevel from api.rst (pandas-dev#23395)
  API: Disallow dtypes w/o frequency when casting (pandas-dev#23392)
  BUG/TST/REF: Datetimelike Arithmetic Methods (pandas-dev#23215)
  STYLE: lint
  add np.nan* funcs to cython_table (pandas-dev#22109)
  Run Isort on tests/util single PR (pandas-dev#23347)
  BUG: Fix date_range overflow (pandas-dev#23345)
  Run Isort on tests/arrays single PR (pandas-dev#23346)
  ...
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Labels
Bug, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.min() on a series of NaTs returns nan, while .max() returns NaT
6 participants