TST: Adding new test case for pivot_table() with Categorical data #21381

uds5501 · 2018-06-08T11:25:31Z

TST: add additional test cases for pivot_table with categorical data #21370

closes TST: add additional test cases for pivot_table with categorical data #21370
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Added a new test case that checks that pivot_table works with the index/column/value combination, per the patch offered in PR #21252.

Note: the _civ suffix in the test name stands for Column, Index and Value (I wasn't being very creative here).

TST: add additional test cases for pivot_table with categorical data pandas-dev#21370

pep8speaks · 2018-06-08T11:25:33Z

Hello @uds5501! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on June 10, 2018 at 20:48 Hours UTC

Fixed PEP-8 issues

jreback

this will fail linting

Fixed linting

WillAyd · 2018-06-08T15:54:29Z

@uds5501 generally ensure that your code contributions follow the PEP8 coding standard before pushing to GitHub. You can find more info on how to do that in the contributing guide:

https://pandas.pydata.org/pandas-docs/stable/contributing.html#python-pep8

fixed linting

uds5501 · 2018-06-08T18:58:44Z

@WillAyd @jreback Fixed the PEP8 issues. And yes, @WillAyd, I will follow this advice from now on.

jschendel

Can you test for the columns/values case in addition to the index/columns/values case?

WillAyd · 2018-06-09T22:18:53Z

pandas/tests/reshape/test_pivot.py

+    def test_pivot_with_non_observable_dropna_civ(self, dropna):
+        # gh-21370
+        arr = [np.nan, 'low', 'high', 'low', 'high', 'high', np.nan]
+        df = pd.DataFrame({"In": pd.Categorical(arr,


Even if this technically passes linting, I find the indentation very difficult to read. Can you try putting each key of the dict at the start of the line to see if that improves readability?
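A sketch of the layout being suggested, with each dict key at the start of its own line. The arr and "Col" values are taken from a later revision of the test in this thread; "Val" as range(1, 6) is an assumption, because the diff excerpt above is cut off.

```python
import numpy as np
import pandas as pd

arr = [np.nan, 'low', 'high', 'low', np.nan]

# Each key of the dict starts its own line, so the Categorical's
# keyword arguments no longer need to be aligned far to the right.
df = pd.DataFrame({
    "In": pd.Categorical(arr, categories=['low', 'high'], ordered=True),
    "Col": ["A", "B", "C", "A", "B"],
    "Val": range(1, 6),  # assumed values; the original excerpt is truncated
})
```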

Alright, I will look into it.

Attempt on making the code readable

uds5501 · 2018-06-10T10:43:57Z

@WillAyd @jreback, I have tried to improve readability in this commit. Please let me know if it needs further improvement.

uds5501 · 2018-06-10T17:16:22Z

@jschendel Actually, I don't know how to recreate columns/values pivots. Help would be appreciated here

WillAyd

Much better - just some more minor style nits

WillAyd · 2018-06-10T19:03:16Z

pandas/tests/reshape/test_pivot.py

+        # gh-21370
+        arr = [np.nan, 'low', 'high', 'low', np.nan]
+        df = pd.DataFrame(
+            {"In": pd.Categorical(arr,


Move the opening curly brace to the line above. Move arr to the next line and move the subsequent kwargs up to that line accordingly (so long as they fit).

WillAyd · 2018-06-10T19:03:26Z

pandas/tests/reshape/test_pivot.py

+        result = df.pivot_table(index="In", columns="Col", values="Val",
+                                dropna=dropna)
+        expected = pd.DataFrame(
+            {


Move this up a line

WillAyd · 2018-06-10T19:04:07Z

pandas/tests/reshape/test_pivot.py

+                'B': [2.0, np.nan],
+                'C': [np.nan, 3.0]},
+            index=pd.Index(
+                pd.Categorical.from_codes([0, 1],


Try moving [0, 1] to the next line and moving the subsequent kwargs up
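A sketch of this layout applied to an index built from codes. The categories ['low', 'high'] and the name 'In' are assumed from other snippets in this thread, since the excerpt above is truncated; from_codes maps each code to the category at that position.

```python
import pandas as pd

# Layout from the review: [0, 1] opens the continuation line and the
# remaining keyword arguments are pulled up next to it.
index = pd.Index(
    pd.Categorical.from_codes(
        [0, 1], categories=['low', 'high'], ordered=True),
    name='In')

# from_codes maps code 0 to 'low' and code 1 to 'high', so this is
# the same index written out by label.
by_label = pd.CategoricalIndex(
    ['low', 'high'], categories=['low', 'high'], ordered=True, name='In')
pd.testing.assert_index_equal(index, by_label)
```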

uds5501 · 2018-06-10T20:49:11Z

@WillAyd I have tried to fix these according to your suggestions. Does it look good now?

jschendel · 2018-06-11T18:19:43Z

pandas/tests/reshape/test_pivot.py

+            "In": pd.Categorical(arr,
+                                 categories=['low', 'high'],
+                                 ordered=True),
+            "Col": ["A", "B", "C", "A", "B"],


This should also be a Categorical with NaN values present in order to recreate the issue.
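A quick sketch of what a Categorical with NaN values present looks like internally: NaN is not one of the categories, and missing entries are stored with the code -1. The values mirror the col list jschendel uses in his next comment.

```python
import numpy as np
import pandas as pd

col = [np.nan, 'A', 'B', np.nan, 'A']
cat = pd.Categorical(col, categories=['A', 'B'], ordered=True)

# NaN is not a category; missing entries get the sentinel code -1.
print(list(cat.categories))  # ['A', 'B']
print(cat.codes.tolist())    # [-1, 0, 1, -1, 0]
```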

jschendel · 2018-06-11T19:05:14Z

> @jschendel Actually, I don't know how to recreate columns/values pivots. Help would be appreciated here

Here's a slightly modified version of your setup that addresses my last comment about 'Col' not being a Categorical with NaN values:

In [2]: pd.__version__
Out[2]: '0.24.0.dev0+85.g4807905'

In [3]: idx = [np.nan, 'low', 'high', 'low', np.nan]
   ...: col = [np.nan, 'A', 'B', np.nan, 'A']
   ...: df = pd.DataFrame({
   ...:     "In": pd.Categorical(idx, categories=['low', 'high'], ordered=True),
   ...:     "Col": pd.Categorical(col, categories=['A', 'B'], ordered=True),
   ...:     "Val": range(1, 6)})

In [4]: df
Out[4]:
     In  Col  Val
0   NaN  NaN    1
1   low    A    2
2  high    B    3
3   low  NaN    4
4   NaN    A    5

Both variations of pivot_table then produce the correct output on master:

In [5]: df.pivot_table(index="In", columns="Col", values="Val")
Out[5]:
Col     A    B
In
low   2.0  NaN
high  NaN  3.0

In [6]: df.pivot_table(columns="Col", values="Val")
Out[6]:
Col    A    B
Val  3.5  3.0

So you'll need to create expected to match the above. Note since the columns are categorical you'll need to explicitly pass something like columns=CategoricalIndex(...) to the DataFrame constructor, like what you did with the index parameter in your current code (or convert the columns to categorical after creation, either works).
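A minimal sketch of how the two expected frames could be built, assuming the data setup above and a pandas version that includes the fix from PR #21252. The test name is hypothetical, dropna is left at its default of True, and pd.testing.assert_frame_equal stands in for the tm helper used inside the pandas test suite.

```python
import numpy as np
import pandas as pd


def test_pivot_table_categorical_nan_civ():  # hypothetical test name
    idx = [np.nan, 'low', 'high', 'low', np.nan]
    col = [np.nan, 'A', 'B', np.nan, 'A']
    df = pd.DataFrame({
        "In": pd.Categorical(idx, categories=['low', 'high'], ordered=True),
        "Col": pd.Categorical(col, categories=['A', 'B'], ordered=True),
        "Val": range(1, 6),
    })

    # Both axes are categorical, so expected needs CategoricalIndex labels.
    in_index = pd.CategoricalIndex(
        ['low', 'high'], categories=['low', 'high'], ordered=True, name='In')
    col_index = pd.CategoricalIndex(
        ['A', 'B'], categories=['A', 'B'], ordered=True, name='Col')

    # index/columns/values case
    result = df.pivot_table(index="In", columns="Col", values="Val")
    expected = pd.DataFrame(
        [[2.0, np.nan], [np.nan, 3.0]], index=in_index, columns=col_index)
    pd.testing.assert_frame_equal(result, expected)

    # columns/values case: the single row is labelled by the values column
    result = df.pivot_table(columns="Col", values="Val")
    expected = pd.DataFrame([[3.5, 3.0]], index=['Val'], columns=col_index)
    pd.testing.assert_frame_equal(result, expected)


test_pivot_table_categorical_nan_civ()
```

The other option mentioned above, converting after creation, would amount to building expected with plain labels and then assigning expected.columns = col_index; either way the column labels end up categorical.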

Just to verify, the code above does indeed replicate the buggy behavior on 0.23.0:

In [2]: pd.__version__
Out[2]: '0.23.0'

In [3]: idx = [np.nan, 'low', 'high', 'low', np.nan]
   ...: col = [np.nan, 'A', 'B', np.nan, 'A']
   ...: df = pd.DataFrame({
   ...:     "In": pd.Categorical(idx, categories=['low', 'high'], ordered=True),
   ...:     "Col": pd.Categorical(col, categories=['A', 'B'], ordered=True),
   ...:     "Val": range(1, 6)})

In [4]: df.pivot_table(index="In", columns="Col", values="Val")
Out[4]:
Col  NaN    A
In
NaN  2.0  NaN
low  NaN  3.0

In [5]: df.pivot_table(columns="Col", values="Val")
Out[5]:
Col  NaN    A
Val  3.5  3.0

uds5501 · 2018-06-12T08:38:43Z

@jschendel Thank you for the example! I will start working on it right away 😃

jreback · 2018-06-15T17:30:32Z

Doesn't this duplicate #21393?

uds5501 · 2018-06-15T17:37:12Z

@jreback No sir. The aim of this PR is to add new test cases that cover the column/index/value form of pivot_table on a DataFrame.

jreback · 2018-06-19T00:11:00Z

@uds5501 ok pls rebase and fixup

jreback · 2018-09-25T16:55:29Z

closing as stale

Update test_pivot.py

ded4775

TST: add additional test cases for pivot_table with categorical data pandas-dev#21370

Update test_pivot.py

7dbaa95

Fixed PEP-8 issues

jreback requested changes Jun 8, 2018

uds5501 added 2 commits June 8, 2018 17:46

Update test_pivot.py

4dbb21a

Fixed linting

Update test_pivot.py

15c1ef0

jreback added the Testing (pandas testing functions or related to the test suite) and Categorical (Categorical Data Type) labels Jun 8, 2018

Update test_pivot.py

c6196aa

fixed linting

jschendel suggested changes Jun 9, 2018

WillAyd reviewed Jun 9, 2018

Update test_pivot.py

23be90a

Attempt on making the code readable

WillAyd requested changes Jun 10, 2018

Update test_pivot.py

ee6bf25

jschendel reviewed Jun 11, 2018

jreback closed this Sep 25, 2018
