BUG: DTI.value_counts doesnt preserve tz #7735

Merged
merged 1 commit into from Jul 25, 2014

Conversation

Projects
None yet
2 participants
Member

sinhrks commented Jul 12, 2014

Found 2 problems related to value_counts.

  • DatetimeIndex.value_counts loses tz.
didx = pd.date_range('2011-01-01 09:00', freq='H', periods=3, tz='Asia/Tokyo')
print(didx.value_counts())
#2011-01-01 00:00:00    1
#2011-01-01 01:00:00    1
#2011-01-01 02:00:00    1
# dtype: int64
  • PeriodIndex.value_counts results in Int64Index, and unable to drop NaT.
pidx = pd.PeriodIndex(['2011-01-01 09:00', '2011-01-01 10:00', pd.NaT], freq='H')
print(pidx.value_counts())
#  359410                 1
#  359409                 1
# -9223372036854775808    1
# dtype: int64

@jreback jreback and 1 other commented on an outdated diff Jul 12, 2014

pandas/core/algorithms.py
@@ -186,6 +186,8 @@ def value_counts(values, sort=True, ascending=False, normalize=False,
convenience for pd.cut, only works with numeric data
dropna : boolean, default True
Don't include counts of NaN
+ has_inat : boolean, default False
+ Handle int dtype as datetime representation and remove iNaT using dropna
@jreback

jreback Jul 12, 2014

Contributor

this doesn't seem necessary and is very confusing, just use dropna indicator

@sinhrks

sinhrks Jul 12, 2014

Member

Is it OK to always drop iNaT even if input is normal int64?

@jreback

jreback Jul 12, 2014

Contributor

u know the dtype of the index so u can so this only if ts a date like (period or DatetimeIndex)
eg look in the very next elif

@jreback

jreback Jul 12, 2014

Contributor

and u can simply mask the results (if dropna) then box

@sinhrks

sinhrks Jul 12, 2014

Member

OK, how about this?

Contributor

jreback commented Jul 12, 2014

better

jreback added this to the 0.15.0 milestone Jul 14, 2014

Contributor

jreback commented Jul 19, 2014

rebase this

Contributor

jreback commented Jul 21, 2014

@sinhrks pls rebase

Member

sinhrks commented Jul 21, 2014

@jreback Rebased.

Contributor

jreback commented Jul 21, 2014

can you run a perf test on this?

Member

sinhrks commented Jul 25, 2014

Attached. replace looks irrelevant for this.

reshape_stack_simple                         |   4.2387 |   3.6786 |   1.1522 |
append_frame_single_mixed                    |   3.7584 |   3.1943 |   1.1766 |
replace_replacena                            |   3.1474 |   2.6023 |   1.2094 |
replace_fillna                               |   4.2147 |   3.1803 |   1.3252 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster then the baseline.
Seed used: 1234

Target [01a9e18] : BUG: DTI.value_counts doesnt preserve tz
Base   [bfa9bc5] : Merge pull request #7733 from sinhrks/tsplot_bug
Contributor

jreback commented Jul 25, 2014

yep, agreed. ping when green

Member

sinhrks commented Jul 25, 2014

@jreback Now green

@jreback jreback and 1 other commented on an outdated diff Jul 25, 2014

pandas/core/base.py
@@ -544,3 +544,20 @@ def _add_delta(self, other):
return NotImplemented
+ def value_counts(self, normalize=False, sort=True, ascending=False,
@jreback

jreback Jul 25, 2014

Contributor

didn't want to combine with the regular value_counts (in the OpsMixin)? (too hard?)

@sinhrks

sinhrks Jul 25, 2014

Member

Possible. Actually current logic has if-else clause, not much difference putting it to IndexOpsMixin...

@jreback

jreback Jul 25, 2014

Contributor

k...just trying to avoid dup code as much as possible...lmk

@sinhrks

sinhrks Jul 25, 2014

Member

Yep, fixed. Pls check.

Contributor

jreback commented Jul 25, 2014

perfect, ping when green

@jreback jreback added a commit that referenced this pull request Jul 25, 2014

@jreback jreback Merge pull request #7735 from sinhrks/value_counts_tz
BUG: DTI.value_counts doesnt preserve tz
a23c6c7

@jreback jreback merged commit a23c6c7 into pandas-dev:master Jul 25, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details

sinhrks deleted the sinhrks:value_counts_tz branch Jul 26, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment