BUG: SparseSeries.value_counts ignores fill_value #12835

sinhrks · 2016-04-09T05:33:29Z

closes SparseSeries.value_counts doesn't include fill_value counts #6749
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

Also fixed an issue which categorical value_counts resets name.

jreback · 2016-04-09T14:52:10Z

pandas/core/algorithms.py

@@ -268,69 +267,81 @@ def value_counts(values, sort=True, ascending=False, normalize=False,

    if com.is_categorical_dtype(values.dtype):


actually with only a small rewriting I think you could do:

if com.is_extension_dtype(values) or com.is_datetimelike(values): result = values.values_counts(dropna) result.name = name else: .....

if we define DateTimeIndex.value_counts that properly handles localization before/after (instead of in here)
will be much cleaner.

Further it makes sense to define .value_counts() in tseries/base.py (and move the inference from core/base.py/value_counts.py there. To handle the dispatching of period/DatetimeIndex.

a bit of refactoring but moves functionaility where it should be.

sinhrks · 2016-04-09T18:03:26Z

Found current master doesn't work for datetimetz. Let me check.

values = [pd.Timestamp('2011-01-01 09:00', tz='US/Eastern'),
                  pd.Timestamp('2011-01-01 10:00', tz='US/Eastern'),
                  pd.Timestamp('2011-01-01 11:00', tz='US/Eastern'),
                  pd.Timestamp('2011-01-01 09:00', tz='US/Eastern'),
                  pd.Timestamp('2011-01-01 09:00', tz='US/Eastern'),
                  pd.Timestamp('2011-01-01 11:00', tz='US/Eastern')]
s = pd.Series(values, name='xxx')

# NG, index must have tz
s.value_counts()
# 2011-01-01 14:00:00    3
# 2011-01-01 16:00:00    2
# 2011-01-01 15:00:00    1
# Name: xxx, dtype: int64

sinhrks · 2016-04-24T03:30:59Z

Additionally fixing the other Categorical issue.

pd.CategoricalIndex([1, 2, 3, 1, 2, 3]).value_counts(normalize=True)
# UnboundLocalError: local variable 'counts' referenced before assignment

jreback · 2016-04-25T14:48:43Z

@sinhrks pls rebase when you can

codecov-io · 2016-04-28T12:38:58Z

Current coverage is 84.04%

Merging #12835 into master will increase coverage by +0.07%

@@             master     #12835   diff @@
==========================================
  Files           136        136          
  Lines         49908      49931    +23   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          41876      41962    +86   
+ Misses         8032       7969    -63   
  Partials          0          0

File ...ndas/core/groupby.py (not in diff) was modified. more
- Misses -63
- Partials 0
- Hits +63

Powered by Codecov. Last updated by 38a7531

jreback · 2016-04-28T13:12:07Z

pandas/core/algorithms.py

-            keys, counts = htable.value_count_scalar64(values, dropna)
+    dtype = values.dtype
+    is_period = com.is_period_arraylike(values)
+    is_datetimetz = com.is_datetimetz(values)


this won't happen here as is_extension_type will do this

sinhrks · 2016-04-28T13:23:44Z

pandas/tests/test_groupby.py

@@ -1875,7 +1873,6 @@ def check_nunique(df, keys):
            check_nunique(frame, ['jim'])
            check_nunique(frame, ['jim', 'joe'])

-    @slow


I removed @slow as groupby.value_coutns has separate logic depending on numpy version.

https://github.com/pydata/pandas/blob/master/pandas/core/groupby.py#L2892

jreback · 2016-04-29T13:45:44Z

pandas/core/algorithms.py

-    if sort:
-        result = result.sort_values(ascending=ascending)
+        values = values.view(np.int64)
+        keys, counts = htable.value_count_scalar64(values, dropna)



ok, though I still think we should handle in tseries/base, e.g. define DatetimeIndexOpsMixIN and just defer to it like we are doing for sparse/categorical.

sparse/categorical value_counts only supports dropna kwds. It may better to be an internal method.

If we do the same thing using DatetimeIndexOpsMixin, the impl should be:

DatetimeIndexOpsMixin 's public value_counts calls algorithms.py (as it needs to handle kwds like normalize)

algorithms.py convert datetime-likes to datetime-like Index to handle Series

Calls DatetimeIndexOpsMixin 's internal _value_counts

Suppose to be little complex, or do you have anything in your mind?

yeah ok, it can ge tricky. ok, we'll keep in mind for later. ideally to put specific logic. merging.

sinhrks added Sparse Sparse Data Type Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Apr 9, 2016

sinhrks added this to the 0.18.1 milestone Apr 9, 2016

sinhrks added the Bug label Apr 9, 2016

sinhrks force-pushed the sparse_valuecounts branch from 79164d6 to 56a415a Compare April 9, 2016 05:50

jreback reviewed Apr 9, 2016
View reviewed changes

sinhrks force-pushed the sparse_valuecounts branch 2 times, most recently from 81fac4e to 91c4c2f Compare April 9, 2016 18:00

sinhrks force-pushed the sparse_valuecounts branch 2 times, most recently from 89ab6ea to add609f Compare April 24, 2016 01:56

sinhrks added Timezones Timezone data dtype Categorical Categorical Data Type labels Apr 24, 2016

sinhrks force-pushed the sparse_valuecounts branch 2 times, most recently from c14cbf5 to f1d62ee Compare April 24, 2016 03:32

sinhrks force-pushed the sparse_valuecounts branch from f1d62ee to 456a4a9 Compare April 28, 2016 12:38

jreback reviewed Apr 28, 2016
View reviewed changes

sinhrks force-pushed the sparse_valuecounts branch from 456a4a9 to 57f6667 Compare April 28, 2016 13:22

sinhrks reviewed Apr 28, 2016
View reviewed changes

BUG: SparseSeries.value_counts ignores fill_value

2f46f73

sinhrks force-pushed the sparse_valuecounts branch from 57f6667 to caeaed9 Compare April 28, 2016 21:19

Move dtype handling to algorithm

2392e7c

sinhrks force-pushed the sparse_valuecounts branch from caeaed9 to 2392e7c Compare April 28, 2016 22:15

jreback reviewed Apr 29, 2016
View reviewed changes

jreback closed this in 8439d28 Apr 29, 2016

sinhrks deleted the sparse_valuecounts branch April 29, 2016 17:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: SparseSeries.value_counts ignores fill_value #12835

BUG: SparseSeries.value_counts ignores fill_value #12835

sinhrks commented Apr 9, 2016

jreback Apr 9, 2016

sinhrks commented Apr 9, 2016

sinhrks commented Apr 24, 2016

jreback commented Apr 25, 2016

codecov-io commented Apr 28, 2016 •

edited

Loading

jreback Apr 28, 2016

sinhrks Apr 28, 2016 •

edited

Loading

jreback Apr 29, 2016

sinhrks Apr 29, 2016 •

edited

Loading

jreback Apr 29, 2016

		@@ -268,69 +267,81 @@ def value_counts(values, sort=True, ascending=False, normalize=False,

		if com.is_categorical_dtype(values.dtype):

BUG: SparseSeries.value_counts ignores fill_value #12835

BUG: SparseSeries.value_counts ignores fill_value #12835

Conversation

sinhrks commented Apr 9, 2016

jreback Apr 9, 2016

Choose a reason for hiding this comment

sinhrks commented Apr 9, 2016

sinhrks commented Apr 24, 2016

jreback commented Apr 25, 2016

codecov-io commented Apr 28, 2016 • edited Loading

Current coverage is 84.04%

jreback Apr 28, 2016

Choose a reason for hiding this comment

sinhrks Apr 28, 2016 • edited Loading

Choose a reason for hiding this comment

jreback Apr 29, 2016

Choose a reason for hiding this comment

sinhrks Apr 29, 2016 • edited Loading

Choose a reason for hiding this comment

jreback Apr 29, 2016

Choose a reason for hiding this comment

codecov-io commented Apr 28, 2016 •

edited

Loading

sinhrks Apr 28, 2016 •

edited

Loading

sinhrks Apr 29, 2016 •

edited

Loading