BUG: Implement PeriodEngine to fix PeriodIndex truncate bug #17755

Licht-T · 2017-10-02T23:35:04Z

closes .truncate() fails on PeriodIndexes with duplicate values #17717
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

max-sixty · 2017-10-03T00:00:05Z

Does this make sense though?

Even if it fixes that issue - why should a PeriodIndex ever be equal to an Int64Index?

Licht-T · 2017-10-03T00:19:18Z

@MaximilianR We are already treating Period and int as the same.
https://github.com/Licht-T/pandas/blob/a240a734cd33362fb6825fd2ce67163e82f8f9a4/pandas/_libs/period.pyx#L761

jreback · 2017-10-03T01:13:37Z

@Licht-T no, this is adding a well-defined value

In [2]: pd.Period('2012-02', freq='M') + 2
Out[2]: Period('2012-04', 'M')

this is NOT the same thing. We do not treat Periods and integers the same.
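
The distinction drawn above can be illustrated with a minimal sketch, assuming a reasonably recent pandas. The variable names are illustrative and not taken from the PR: adding an integer shifts a Period by that many units of its frequency, whereas comparing a Period with an integer (even its own underlying ordinal) does not report equality.

```python
import pandas as pd

p = pd.Period('2012-02', freq='M')

# Adding an integer moves the period forward by that many months.
shifted = p + 2
print(shifted)

# A Period is not considered equal to a plain integer,
# not even to its own underlying ordinal.
equal_to_ordinal = (p == p.ordinal)
print(equal_to_ordinal)
```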

jreback · 2017-10-03T20:20:53Z

@Licht-T if you want to revise this to address the issue, OK

Licht-T · 2017-10-05T00:22:57Z

@MaximilianR @jreback Thanks for your comments. I am working on another solution.

codecov · 2017-10-05T15:19:24Z

Codecov Report

Merging #17755 into master will decrease coverage by 0.04%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #17755      +/-   ##
==========================================
- Coverage   91.25%   91.21%   -0.05%     
==========================================
  Files         163      163              
  Lines       49856    49856              
==========================================
- Hits        45496    45475      -21     
- Misses       4360     4381      +21

Flag	Coverage Δ
#multiple	`89.01% <ø> (-0.03%)`	⬇️
#single	`40.25% <ø> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/plotting/_converter.py	`63.38% <0%> (-1.82%)`	⬇️
pandas/core/frame.py	`97.73% <0%> (-0.1%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 97fea48...6255812.

codecov · 2017-10-05T15:19:59Z

Codecov Report

Merging #17755 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17755      +/-   ##
==========================================
+ Coverage   91.25%   91.26%   +<.01%     
==========================================
  Files         163      163              
  Lines       50120    50123       +3     
==========================================
+ Hits        45737    45744       +7     
+ Misses       4383     4379       -4

Flag	Coverage Δ
#multiple	`89.07% <100%> (+0.02%)`	⬆️
#single	`40.32% <100%> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/indexes/period.py	`92.87% <100%> (+0.19%)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.75% <0%> (-0.1%)`	⬇️
pandas/core/reshape/merge.py	`94.26% <0%> (ø)`	⬆️
pandas/plotting/_converter.py	`65.2% <0%> (+1.81%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 27bbea7...ec60800.

Licht-T · 2017-10-05T15:26:11Z

@jreback @MaximilianR Now revised. I implemented PeriodEngine.

jreback · 2017-10-05T18:50:37Z

pandas/core/indexes/period.py

@@ -267,6 +269,10 @@ def __new__(cls, data=None, ordinal=None, freq=None, start=None, end=None,
        data = period.extract_ordinals(data, freq)
        return cls._from_ordinals(data, name=name, freq=freq)

+    @cache_readonly


this might already be implemented by a superclass

@jreback Yes, but the first argument is not the same.
https://github.com/Licht-T/pandas/blob/6255812c30978c6a55f1758927f9c300e35cc82b/pandas/core/indexes/base.py#L1558

I see. OK, we should make this much more generic (in other words, just pass in self and have the engine extract what it needs), but that's another issue.

jreback · 2017-10-05T18:51:08Z

pandas/tests/indexes/period/test_indexing.py

+
+    def test_truncate(self):
+        # GH 17717
+        idx1 = pd.PeriodIndex([


these should go with the Series tests for truncate.

jreback · 2017-10-05T18:51:45Z

pandas/tests/indexes/period/test_indexing.py

+            pd.Period('2017-09-03')
+        ])
+        series1 = pd.Series([1, 2, 3], index=idx1)
+        result1 = series1.truncate(after='2017-09-02')


I would like to have a few PeriodIndex-specific tests that exercise the engine. You can look at the tests we have for DatetimeIndex / TimedeltaIndex, for example.
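
For context, the scenario exercised by the test hunk above can be sketched as follows. This is a reconstruction under assumptions: only the last index value and the `truncate` call are visible in the hunk, so the two duplicated leading periods are assumed from the title of issue #17717 (duplicate values).

```python
import pandas as pd

# The first two index values are assumed; the hunk only shows the last one.
idx1 = pd.PeriodIndex([
    pd.Period('2017-09-02'),
    pd.Period('2017-09-02'),
    pd.Period('2017-09-03'),
])
series1 = pd.Series([1, 2, 3], index=idx1)

# Keep every row whose period is on or before 2017-09-02,
# including both duplicated entries.
result1 = series1.truncate(after='2017-09-02')
print(result1)
```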

Licht-T · 2017-10-06T00:20:39Z

@jreback Thank you! I'll add tests for PeriodEngine.

Licht-T · 2017-10-07T08:27:19Z

@jreback Added some tests.

Licht-T · 2017-10-24T15:39:32Z

Fixed conflicts.

jreback · 2017-10-28T00:13:59Z

can you rebase

jreback · 2017-10-28T00:14:55Z

pls add a note for 0.22

Licht-T · 2017-10-29T04:44:40Z

@jreback Rebased & added what's new note.

jreback · 2017-10-29T20:00:21Z

pandas/_libs/index.pyx

+    def _call_monotonic(self, values):
+        return super(PeriodEngine, self)._call_monotonic(values.view('i8'))
+
+    cdef _maybe_get_bool_indexer(self, object val):


is this duplicating some of the super code?

@jreback Almost the same, but the way the values are obtained differs.
https://github.com/Licht-T/pandas/blob/master/pandas/_libs/index_class_helper.pxi.in#L69
https://github.com/pandas-dev/pandas/pull/17755/files#diff-495f440aad68721ad5ca1f35087fc450R520

so provide a helper function to avoid repeating all of this code again.
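
The helper-function suggestion can be sketched in plain Python. This is a toy illustration of the pattern only, not the Cython code in `index.pyx`: `ToyEngine`, `ToyPeriodEngine`, `ToyPeriod`, and `get_bool_indexer` are hypothetical names, while `vgetter` and `_get_index_values` mirror identifiers that appear in the hunks of this PR. The shared lookup logic is written once against a small hook, and the subclass overrides only the hook that fetches the values.

```python
from collections import namedtuple

# Hypothetical stand-in for a period-like value; not a pandas type.
ToyPeriod = namedtuple("ToyPeriod", ["label", "ordinal"])


class ToyEngine:
    """Toy lookup engine; a simplified stand-in, not pandas' IndexEngine."""

    def __init__(self, vgetter):
        # vgetter is a zero-argument callable returning the index values.
        self.vgetter = vgetter

    def _get_index_values(self):
        # Hook: the only step a subclass needs to override.
        return self.vgetter()

    def get_bool_indexer(self, val):
        # Shared logic, written once against the hook above.
        return [v == val for v in self._get_index_values()]


class ToyPeriodEngine(ToyEngine):
    """Compares on integer ordinals instead of the raw objects."""

    def _get_index_values(self):
        return [p.ordinal for p in self.vgetter()]


periods = [
    ToyPeriod("2017-09-02", 1),
    ToyPeriod("2017-09-02", 1),
    ToyPeriod("2017-09-03", 2),
]
engine = ToyPeriodEngine(lambda: periods)
mask = engine.get_bool_indexer(1)
print(mask)  # [True, True, False]
```

With this shape, the subclass no longer repeats the body of the lookup method; it only changes where the values come from.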

jreback · 2017-10-29T20:00:41Z

pandas/_libs/index.pyx

+    def get_indexer(self, values):
+        cdef ndarray[int64_t, ndim=1] ordinals
+
+        self._ensure_mapping_populated()


can you call super in most of these cases?
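
As background for the `ordinals` buffer in the hunk above, here is a small sketch using the public pandas API, assuming a reasonably recent pandas; it is not the engine code itself. Each period in a PeriodIndex is backed by an int64 ordinal, and `get_indexer` reports the position of each target period, or -1 where it is absent.

```python
import pandas as pd

idx = pd.PeriodIndex(['2017-09-01', '2017-09-02', '2017-09-03'], freq='D')
target = pd.PeriodIndex(['2017-09-03', '2017-09-01', '2017-09-05'], freq='D')

# Each period is backed by an int64 ordinal; consecutive days differ by 1.
ordinals = idx.asi8
offsets = ordinals - ordinals[0]

# Position of each target period in idx, or -1 where it is absent.
positions = idx.get_indexer(target)
print(offsets, positions)
```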

jreback · 2017-10-29T20:01:11Z

pandas/_libs/index.pyx

+    cdef _get_index_values(self):
+        return self.vgetter()
+
+    cpdef _call_map_locations(self, values):


Is it necessary to create another way of calling this (e.g. _call_map_locations)?

@jreback So you mean that we can define _call_map_locations with cdef rather than cpdef?

@jreback I tried defining _call_map_locations as cdef, but some tests fail.
This is because _ensure_mapping_populated is inline.

jreback · 2017-10-29T20:03:17Z

pandas/tests/indexes/period/test_indexing.py

+        idx0 = pd.PeriodIndex([p0, p1, p2])
+        expected_idx1_p1 = 1
+        expected_idx1_p2 = 2
+


can you add some comments on these cases
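
The kinds of cases such comments would distinguish can be sketched as follows, assuming a reasonably recent pandas. Only `idx0` and the expected positions 1 and 2 appear in the hunk; the value of `p2` and the contents of `idx1` and `idx2` are assumptions chosen to show how `get_loc` behaves on a unique index, a sorted index with duplicates, and an unsorted index with duplicates.

```python
import pandas as pd

p0 = pd.Period('2017-09-01')
p1 = pd.Period('2017-09-02')
p2 = pd.Period('2017-09-03')  # assumed value

# Unique, monotonic index: get_loc returns an integer position.
idx0 = pd.PeriodIndex([p0, p1, p2])
loc_unique_p1 = idx0.get_loc(p1)
loc_unique_p2 = idx0.get_loc(p2)

# Monotonic index with duplicates (assumed contents): get_loc returns a slice.
idx1 = pd.PeriodIndex([p1, p1, p2])
loc_sorted_dup = idx1.get_loc(p1)

# Non-monotonic index with duplicates (assumed contents):
# get_loc returns a boolean mask.
idx2 = pd.PeriodIndex([p2, p1, p2])
loc_unsorted_dup = idx2.get_loc(p2)

print(loc_unique_p1, loc_unique_p2, loc_sorted_dup, loc_unsorted_dup)
```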

jreback · 2017-10-29T20:03:55Z

pandas/tests/indexes/period/test_indexing.py

+
+        ps0 = [p0, p1, p2]
+        idx0 = pd.PeriodIndex(ps0)
+


cc @jorisvandenbossche @MaximilianR @shoyer these are similar to Interval discussions, though conceptually much simpler.

Yes, and similar to that discussion, I don't like the current (and tested on the lines below) behaviour.

I don't think that:

In [97]: p0 = pd.Period('2017-09-01')
    ...: p1 = pd.Period('2017-09-02')

In [98]: idx = pd.PeriodIndex([p0, p1])

In [99]: idx.contains("2017-09-01 09:00")
Out[99]: True

should be the behaviour of contains. But let's leave that discussion out of this PR, as I don't think the PR is making any changes to the behaviour of __contains__ / contains ? (or does it?)

@jorisvandenbossche @jreback Yes. No changes in __contains__ / contains.

Licht-T · 2017-11-03T18:59:36Z

@jreback Almost fixed, except this.
#17755 (comment)

jreback · 2017-11-03T23:07:05Z

pandas/tests/indexes/period/test_indexing.py

@@ -3,10 +3,11 @@
 import pytest

 import numpy as np
+from numpy import testing as ntm


No, we never use numpy.testing! Remove!

use tm.assert_numpy_array_equal.

Licht-T · 2017-11-04T00:39:45Z

@jreback Removed numpy test module.

jreback · 2017-11-04T17:08:04Z

Thanks @Licht-T. If you'd like to see what can be simplified in index.pyx, whether in the type hierarchy or through general refactoring, that would be welcome.

jreback · 2017-11-06T13:57:51Z

one more indexer issue
https://travis-ci.org/MacPython/pandas-wheels/jobs/297752871

Licht-T force-pushed the period-and-int-comparable branch from 11090de to 165566d Compare October 3, 2017 00:34

jreback added the Period (Period data type) label Oct 3, 2017

Licht-T changed the title ~~BUG: Make Period and int comparable~~ BUG: Implement PeriodEngine to fix PeriodIndex truncate bug Oct 5, 2017

jreback requested changes Oct 5, 2017

jreback added the Bug and Indexing (Related to indexing on series/frames, not to indexes themselves) labels Oct 28, 2017

Licht-T force-pushed the period-and-int-comparable branch 2 times, most recently from a4f971b to f01bd95 Compare October 29, 2017 02:09

jreback requested changes Oct 29, 2017

jreback mentioned this pull request Oct 31, 2017

Can't Select by Label when the amount of PeriodIndex is more than 1,000,000 #18048

Closed

Licht-T added 2 commits November 4, 2017 03:29

BUG: Create PeriodEngine

23f4a50

BUG: Change the PeriodIndex engine to PeriodEngine

6bab80a

Licht-T force-pushed the period-and-int-comparable branch from 7c1f2f3 to 151fa5f Compare November 3, 2017 18:41

Licht-T added 3 commits November 4, 2017 03:57

TST: Add PeriodIndex/PeriodEngine tests

0635b88

DOC: Add whatsnew note of PeriodIndex.truncate bug

a0532c8

BUG: Force calling super methods

88a85b7

Remove duplicate codes

b2a281a

Licht-T force-pushed the period-and-int-comparable branch from 151fa5f to b2a281a Compare November 3, 2017 18:57

jreback requested changes Nov 3, 2017

Remove numpy test module

ec60800

jreback added this to the 0.22.0 milestone Nov 4, 2017

jreback approved these changes Nov 4, 2017

jreback merged commit 5f11353 into pandas-dev:master Nov 4, 2017

1kastner pushed a commit to 1kastner/pandas that referenced this pull request Nov 5, 2017

BUG: Implement PeriodEngine to fix PeriodIndex truncate bug (pandas-dev#17755)

2c3faad

jreback added a commit to jreback/pandas that referenced this pull request Nov 5, 2017

COMPAT: 32-bit indexers compat

6cacf1a

xref pandas-dev#17755

jreback mentioned this pull request Nov 5, 2017

COMPAT: 32-bit indexers compat #18122

Merged

jreback added a commit that referenced this pull request Nov 5, 2017

COMPAT: 32-bit indexers compat (#18122)

bc69dc6

xref #17755

jreback added a commit to jreback/pandas that referenced this pull request Nov 7, 2017

COMPAT: 32-bit indexer

42b526c

xref pandas-dev#17755

jreback mentioned this pull request Nov 7, 2017

COMPAT: 32-bit indexer #18149

Merged

jreback added a commit that referenced this pull request Nov 7, 2017

COMPAT: 32-bit indexer (#18149)

8441f02

xref #17755

watercrossing pushed a commit to watercrossing/pandas that referenced this pull request Nov 10, 2017

COMPAT: 32-bit indexer (pandas-dev#18149)

f8c16f7

xref pandas-dev#17755

reidy-p mentioned this pull request Nov 16, 2017

ERR: Improve error message on non-sorted input with .truncate #17984

Merged

4 tasks

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

BUG: Implement PeriodEngine to fix PeriodIndex truncate bug (pandas-dev#17755)

41d3c96

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

COMPAT: 32-bit indexers compat (pandas-dev#18122)

bbda0e7

xref pandas-dev#17755

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

COMPAT: 32-bit indexer (pandas-dev#18149)

9c9fd6b

xref pandas-dev#17755