-
-
Notifications
You must be signed in to change notification settings - Fork 17.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: SparseSeries.value_counts ignores fill_value #12835
Conversation
79164d6
to
56a415a
Compare
@@ -268,69 +267,81 @@ def value_counts(values, sort=True, ascending=False, normalize=False, | |||
|
|||
if com.is_categorical_dtype(values.dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually with only a small rewriting I think you could do:
if com.is_extension_dtype(values) or com.is_datetimelike(values):
result = values.values_counts(dropna)
result.name = name
else:
.....
if we define DateTimeIndex.value_counts
that properly handles localization before/after (instead of in here)
will be much cleaner.
Further it makes sense to define .value_counts()
in tseries/base.py
(and move the inference from core/base.py/value_counts.py
there. To handle the dispatching of period/DatetimeIndex.
a bit of refactoring but moves functionaility where it should be.
81fac4e
to
91c4c2f
Compare
Found current master doesn't work for datetimetz. Let me check.
|
89ab6ea
to
add609f
Compare
Additionally fixing the other
|
c14cbf5
to
f1d62ee
Compare
@sinhrks pls rebase when you can |
f1d62ee
to
456a4a9
Compare
Current coverage is 84.04%@@ master #12835 diff @@
==========================================
Files 136 136
Lines 49908 49931 +23
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 41876 41962 +86
+ Misses 8032 7969 -63
Partials 0 0
|
keys, counts = htable.value_count_scalar64(values, dropna) | ||
dtype = values.dtype | ||
is_period = com.is_period_arraylike(values) | ||
is_datetimetz = com.is_datetimetz(values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this won't happen here as is_extension_type
will do this
456a4a9
to
57f6667
Compare
@@ -1875,7 +1873,6 @@ def check_nunique(df, keys): | |||
check_nunique(frame, ['jim']) | |||
check_nunique(frame, ['jim', 'joe']) | |||
|
|||
@slow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed @slow
as groupby.value_coutns
has separate logic depending on numpy version.
57f6667
to
caeaed9
Compare
caeaed9
to
2392e7c
Compare
if sort: | ||
result = result.sort_values(ascending=ascending) | ||
values = values.view(np.int64) | ||
keys, counts = htable.value_count_scalar64(values, dropna) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, though I still think we should handle in tseries/base
, e.g. define DatetimeIndexOpsMixIN
and just defer to it like we are doing for sparse
/categorical
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sparse/categorical
value_counts
only supports dropna
kwds. It may better to be an internal method.
If we do the same thing using DatetimeIndexOpsMixin
, the impl should be:
DatetimeIndexOpsMixin
's publicvalue_counts
callsalgorithms.py
(as it needs to handle kwds likenormalize
)algorithms.py
convert datetime-likes to datetime-likeIndex
to handleSeries
- Calls
DatetimeIndexOpsMixin
's internal_value_counts
Suppose to be little complex, or do you have anything in your mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah ok, it can ge tricky. ok, we'll keep in mind for later. ideally to put specific logic. merging.
git diff upstream/master | flake8 --diff
Also fixed an issue which categorical
value_counts
resets name.