Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Implement PeriodEngine to fix PeriodIndex truncate bug #17755

Merged
merged 7 commits into from
Nov 4, 2017

Conversation

Licht-T
Copy link
Contributor

@Licht-T Licht-T commented Oct 2, 2017

@max-sixty
Copy link
Contributor

Does this make sense though?

Even if it fixes that issue - why should a PeriodIndex ever be equal to an Int64Index?

@Licht-T
Copy link
Contributor Author

Licht-T commented Oct 3, 2017

@jreback
Copy link
Contributor

jreback commented Oct 3, 2017

@Licht-T no this is adding a well-defined value

In [2]: pd.Period('2012-02', freq='M') + 2
Out[2]: Period('2012-04', 'M')

this is NOT the same thing. We do not treat Periods and integers the same.

@jreback
Copy link
Contributor

jreback commented Oct 3, 2017

@Licht-T if you want to revise to the issue ok

@jreback jreback added the Period Period data type label Oct 3, 2017
@Licht-T
Copy link
Contributor Author

Licht-T commented Oct 5, 2017

@MaximilianR @jreback Thanks for your comment. I am making another solution.

@codecov
Copy link

codecov bot commented Oct 5, 2017

Codecov Report

Merging #17755 into master will decrease coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17755      +/-   ##
==========================================
- Coverage   91.25%   91.21%   -0.05%     
==========================================
  Files         163      163              
  Lines       49856    49856              
==========================================
- Hits        45496    45475      -21     
- Misses       4360     4381      +21
Flag Coverage Δ
#multiple 89.01% <ø> (-0.03%) ⬇️
#single 40.25% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.38% <0%> (-1.82%) ⬇️
pandas/core/frame.py 97.73% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 97fea48...6255812. Read the comment docs.

@codecov
Copy link

codecov bot commented Oct 5, 2017

Codecov Report

Merging #17755 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17755      +/-   ##
==========================================
+ Coverage   91.25%   91.26%   +<.01%     
==========================================
  Files         163      163              
  Lines       50120    50123       +3     
==========================================
+ Hits        45737    45744       +7     
+ Misses       4383     4379       -4
Flag Coverage Δ
#multiple 89.07% <100%> (+0.02%) ⬆️
#single 40.32% <100%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/period.py 92.87% <100%> (+0.19%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️
pandas/core/reshape/merge.py 94.26% <0%> (ø) ⬆️
pandas/plotting/_converter.py 65.2% <0%> (+1.81%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 27bbea7...ec60800. Read the comment docs.

@Licht-T Licht-T changed the title BUG: Make Period and int comparable BUG: Implement PeriodEngine to fix PeriodIndex truncate bug Oct 5, 2017
@Licht-T
Copy link
Contributor Author

Licht-T commented Oct 5, 2017

@jreback @MaximilianR Now revised. I implemented PeriodEngine.

@@ -267,6 +269,10 @@ def __new__(cls, data=None, ordinal=None, freq=None, start=None, end=None,
data = period.extract_ordinals(data, freq)
return cls._from_ordinals(data, name=name, freq=freq)

@cache_readonly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might already be implemented by a superclass

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I c. ok, we should make this much more generic (IOW just pass in self and have the engine extract what it needs), but that's another issue.


def test_truncate(self):
# GH 17717
idx1 = pd.PeriodIndex([
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should go with the Series tests for truncate.

pd.Period('2017-09-03')
])
series1 = pd.Series([1, 2, 3], index=idx1)
result1 = series1.truncate(after='2017-09-02')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to have a few period index specific tests that exercise the engine. you can look at what tests we have for example for DatetimeIndex / TimedeltaIndex for the same.

@Licht-T
Copy link
Contributor Author

Licht-T commented Oct 6, 2017

@jreback Thank you! I'll add tests for PeriodEngine.

@Licht-T
Copy link
Contributor Author

Licht-T commented Oct 7, 2017

@jreback Added some tests.

@Licht-T
Copy link
Contributor Author

Licht-T commented Oct 24, 2017

Fixed conflicts.

@jreback
Copy link
Contributor

jreback commented Oct 28, 2017

can you rebase

@jreback
Copy link
Contributor

jreback commented Oct 28, 2017

pls add a note for 0.22

@jreback jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves labels Oct 28, 2017
@Licht-T Licht-T force-pushed the period-and-int-comparable branch 2 times, most recently from a4f971b to f01bd95 Compare October 29, 2017 02:09
@Licht-T
Copy link
Contributor Author

Licht-T commented Oct 29, 2017

@jreback Rebased & added what's new note.

def _call_monotonic(self, values):
return super(PeriodEngine, self)._call_monotonic(values.view('i8'))

cdef _maybe_get_bool_indexer(self, object val):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this duplicating some of the super code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so provide a helper function to avoid repeating all of this code again.

def get_indexer(self, values):
cdef ndarray[int64_t, ndim=1] ordinals

self._ensure_mapping_populated()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you call super in most of these cases?

cdef _get_index_values(self):
return self.vgetter()

cpdef _call_map_locations(self, values):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this necessary to create another way of calling this? e.g. _call_map_locations

Copy link
Contributor Author

@Licht-T Licht-T Oct 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback So you mean that we don't have to define _call_map_locations by cpdef, instead of cdef?

Copy link
Contributor Author

@Licht-T Licht-T Nov 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I defined _call_map_locations as cdef, but some tests fail.
This is because _ensure_mapping_populated is inline.

@@ -267,6 +269,10 @@ def __new__(cls, data=None, ordinal=None, freq=None, start=None, end=None,
data = period.extract_ordinals(data, freq)
return cls._from_ordinals(data, name=name, freq=freq)

@cache_readonly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I c. ok, we should make this much more generic (IOW just pass in self and have the engine extract what it needs), but that's another issue.

idx0 = pd.PeriodIndex([p0, p1, p2])
expected_idx1_p1 = 1
expected_idx1_p2 = 2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some comments on these cases


ps0 = [p0, p1, p2]
idx0 = pd.PeriodIndex(ps0)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @jorisvandenbossche @MaximilianR @shoyer these are similar to Interval discussions, though conceptually much simpler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and similar to that discussion, I don't like the current (and tested on the lines below) behaviour.

I don't think that:

In [97]: p0 = pd.Period('2017-09-01')
    ...: p1 = pd.Period('2017-09-02')

In [98]: idx = pd.PeriodIndex([p0, p1])

In [99]: idx.contains("2017-09-01 09:00")
Out[99]: True

should be the behaviour of contains. But let's leave that discussion out of this PR, as I don't think the PR is making any changes to the behaviour of __contains__ / contains ? (or does it?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche @jreback Yes. No changes in __contains__ / contains.

@Licht-T
Copy link
Contributor Author

Licht-T commented Nov 3, 2017

@jreback Almost fixed, except this.
#17755 (comment)

@@ -3,10 +3,11 @@
import pytest

import numpy as np
from numpy import testing as ntm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NO, we never use numpy.testing.! remove!

use tm.assert_numpy_array_equal.

@Licht-T
Copy link
Contributor Author

Licht-T commented Nov 4, 2017

@jreback Removed numpy test module.

@jreback jreback added this to the 0.22.0 milestone Nov 4, 2017
@jreback jreback merged commit 5f11353 into pandas-dev:master Nov 4, 2017
@jreback
Copy link
Contributor

jreback commented Nov 4, 2017

thanks @Licht-T if you'd like to see what can be simplified in index.pyx from the type hierarchy and/or general refactoring would be welcome.

1kastner pushed a commit to 1kastner/pandas that referenced this pull request Nov 5, 2017
jreback added a commit to jreback/pandas that referenced this pull request Nov 5, 2017
jreback added a commit that referenced this pull request Nov 5, 2017
@jreback
Copy link
Contributor

jreback commented Nov 6, 2017

jreback added a commit to jreback/pandas that referenced this pull request Nov 7, 2017
@jreback jreback mentioned this pull request Nov 7, 2017
jreback added a commit that referenced this pull request Nov 7, 2017
watercrossing pushed a commit to watercrossing/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Period Period data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.truncate() fails on PeriodIndexes with duplicate values
4 participants