Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add base test for Series[ExtensionArray].value_counts with sort=False #23074

Open
TomAugspurger opened this issue Oct 10, 2018 · 4 comments
Open
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff ExtensionArray Extending pandas with custom dtypes or arrays. Testing pandas testing functions or related to the test suite

Comments

@TomAugspurger
Copy link
Contributor

I had a bug in my PeriodArray.sort_values implementation w.r.t sorting.

Here's a generic test that would have caught it.

diff --git a/pandas/tests/extension/base/methods.py b/pandas/tests/extension/base/methods.py
index 4e7886dd2..4550817f0 100644
--- a/pandas/tests/extension/base/methods.py
+++ b/pandas/tests/extension/base/methods.py
@@ -24,6 +24,14 @@ class BaseMethodsTests(BaseExtensionTests):
 
         self.assert_series_equal(result, expected)
 
+    def test_value_counts_no_sort(self, data_for_grouping):
+        # BB..AABC -> BCCCBA
+        data = data_for_grouping.take([0, 7, 7, 7, 0, 4])
+        # B is the second most common, but should come first with sort=False
+        result = pd.Series(data).value_counts(sort=False)
+        expected = pd.Series([1, 2, 3], data_for_grouping.take([4, 0, 7]))
+        self.assert_series_equal(result, expected)
+
     def test_count(self, data_missing):
         df = pd.DataFrame({"A": data_missing})
         result = df.count(axis='columns')

We could add that to base/methods.py. It may have to be skipped for some like Categorical, since they return a CategoricalIndex, and this would build an object index.

@TomAugspurger TomAugspurger added Testing pandas testing functions or related to the test suite Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff ExtensionArray Extending pandas with custom dtypes or arrays. Effort Low labels Oct 10, 2018
akulagrawal added a commit to akulagrawal/pandas that referenced this issue Nov 5, 2018
Add base test for Series[ExtensionArray].sort_values with sort=False
Addresses pandas-dev#23074
@jorisvandenbossche
Copy link
Member

@TomAugspurger you mention sort_values in the title, but the example is using value_counts ?

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Nov 6, 2018

Sorry, that's a mistake. It should be Series[EA].value_counts(sort=False)

@TomAugspurger TomAugspurger changed the title Add base test for Series[ExtensionArray].sort_values with sort=False Add base test for Series[ExtensionArray].value_counts with sort=False Nov 6, 2018
@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Nov 6, 2018

OK, in that case, I would indeed say: let's remove it from the API, so then we also don't need to bother about the tests. But I think the blocker to do that is the ability to store EA in an Index? (although the current result of EA.value_counts also cannot do that right?)

@jbrockmendel
Copy link
Member

Since we now have Index[EA], is this now actionable?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff ExtensionArray Extending pandas with custom dtypes or arrays. Testing pandas testing functions or related to the test suite
Projects
None yet
Development

No branches or pull requests

3 participants