Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: DTI.value_counts doesnt preserve tz #7735

Merged
merged 1 commit into from
Jul 25, 2014

Conversation

sinhrks
Copy link
Member

@sinhrks sinhrks commented Jul 12, 2014

Found 2 problems related to value_counts.

  • DatetimeIndex.value_counts loses tz.
didx = pd.date_range('2011-01-01 09:00', freq='H', periods=3, tz='Asia/Tokyo')
print(didx.value_counts())
#2011-01-01 00:00:00    1
#2011-01-01 01:00:00    1
#2011-01-01 02:00:00    1
# dtype: int64
  • PeriodIndex.value_counts results in Int64Index, and unable to drop NaT.
pidx = pd.PeriodIndex(['2011-01-01 09:00', '2011-01-01 10:00', pd.NaT], freq='H')
print(pidx.value_counts())
#  359410                 1
#  359409                 1
# -9223372036854775808    1
# dtype: int64

@@ -186,6 +186,8 @@ def value_counts(values, sort=True, ascending=False, normalize=False,
convenience for pd.cut, only works with numeric data
dropna : boolean, default True
Don't include counts of NaN
has_inat : boolean, default False
Handle int dtype as datetime representation and remove iNaT using dropna
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't seem necessary and is very confusing, just use dropna indicator

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it OK to always drop iNaT even if input is normal int64?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u know the dtype of the index so u can so this only if ts a date like (period or DatetimeIndex)
eg look in the very next elif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and u can simply mask the results (if dropna) then box

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, how about this?

@jreback
Copy link
Contributor

jreback commented Jul 12, 2014

better

@jreback
Copy link
Contributor

jreback commented Jul 19, 2014

rebase this

@jreback
Copy link
Contributor

jreback commented Jul 21, 2014

@sinhrks pls rebase

@sinhrks
Copy link
Member Author

sinhrks commented Jul 21, 2014

@jreback Rebased.

@jreback
Copy link
Contributor

jreback commented Jul 21, 2014

can you run a perf test on this?

@sinhrks
Copy link
Member Author

sinhrks commented Jul 25, 2014

Attached. replace looks irrelevant for this.

reshape_stack_simple                         |   4.2387 |   3.6786 |   1.1522 |
append_frame_single_mixed                    |   3.7584 |   3.1943 |   1.1766 |
replace_replacena                            |   3.1474 |   2.6023 |   1.2094 |
replace_fillna                               |   4.2147 |   3.1803 |   1.3252 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster then the baseline.
Seed used: 1234

Target [01a9e18] : BUG: DTI.value_counts doesnt preserve tz
Base   [bfa9bc5] : Merge pull request #7733 from sinhrks/tsplot_bug

@jreback
Copy link
Contributor

jreback commented Jul 25, 2014

yep, agreed. ping when green

@sinhrks
Copy link
Member Author

sinhrks commented Jul 25, 2014

@jreback Now green

@@ -544,3 +544,20 @@ def _add_delta(self, other):
return NotImplemented


def value_counts(self, normalize=False, sort=True, ascending=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't want to combine with the regular value_counts (in the OpsMixin)? (too hard?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible. Actually current logic has if-else clause, not much difference putting it to IndexOpsMixin...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k...just trying to avoid dup code as much as possible...lmk

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, fixed. Pls check.

@jreback
Copy link
Contributor

jreback commented Jul 25, 2014

perfect, ping when green

jreback added a commit that referenced this pull request Jul 25, 2014
BUG: DTI.value_counts doesnt preserve tz
@jreback jreback merged commit a23c6c7 into pandas-dev:master Jul 25, 2014
@sinhrks sinhrks deleted the value_counts_tz branch July 26, 2014 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Numeric Operations Arithmetic, Comparison, and Logical operations Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants