Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST : Adding new test case for pivot_table() with Categorical data #21381

Closed
wants to merge 7 commits into from

Conversation

uds5501
Copy link
Contributor

@uds5501 uds5501 commented Jun 8, 2018

TST: add additional test cases for pivot_table with categorical data #21370

Added a new test case for checking index/column/value workability as per the patch offered by PR #21252

Note : the _civ means Column,Index and Value in the test name (wasn't being very creative here)

TST: add additional test cases for pivot_table with categorical data pandas-dev#21370
@pep8speaks
Copy link

pep8speaks commented Jun 8, 2018

Hello @uds5501! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on June 10, 2018 at 20:48 Hours UTC

Fixed PEP-8 issues
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will fail linting

@jreback jreback added Testing pandas testing functions or related to the test suite Categorical Categorical Data Type labels Jun 8, 2018
@WillAyd
Copy link
Member

WillAyd commented Jun 8, 2018

@uds5501 generally ensure that your code contributions follow the PEP8 coding standard before pushing to GitHub. You can find more info on how to do that in the contributing guide:

https://pandas.pydata.org/pandas-docs/stable/contributing.html#python-pep8

fixed linting
@uds5501
Copy link
Contributor Author

uds5501 commented Jun 8, 2018

@WillAyd @jreback Fixed PEP8 issues, and yes @WillAyd I will follow this advice from now on

Copy link
Member

@jschendel jschendel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you test for the columns/values case in addition to the index/columns/values case?

def test_pivot_with_non_observable_dropna_civ(self, dropna):
# gh-21370
arr = [np.nan, 'low', 'high', 'low', 'high', 'high', np.nan]
df = pd.DataFrame({"In": pd.Categorical(arr,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if this technically passes LINTing I find the indentation very difficult to read. Can you try putting each key of the dict at the start of the line to see if that improves readability?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright, I will look into it

Attempt on making the code readable
@uds5501
Copy link
Contributor Author

uds5501 commented Jun 10, 2018

@WillAyd @jreback , I have tried to improve readability in this commit. Please see if it need further improvement

@uds5501
Copy link
Contributor Author

uds5501 commented Jun 10, 2018

@jschendel Actually, I don't know how to recreate columns/values pivots. Help would be appreciated here

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better - just some more minor style nits

# gh-21370
arr = [np.nan, 'low', 'high', 'low', np.nan]
df = pd.DataFrame(
{"In": pd.Categorical(arr,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the angled bracket to the line above. Move the word arr to the next line and move the subsequent kwargs up to that line accordingly (so long as they fit)

result = df.pivot_table(index="In", columns="Col", values="Val",
dropna=dropna)
expected = pd.DataFrame(
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this up a line

'B': [2.0, np.nan],
'C': [np.nan, 3.0]},
index=pd.Index(
pd.Categorical.from_codes([0, 1],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try moving [0, 1] to the next line and moving the subsequent kwargs up

@uds5501
Copy link
Contributor Author

uds5501 commented Jun 10, 2018

@WillAyd I have tried to fix these according to your suggestions. look any good?

"In": pd.Categorical(arr,
categories=['low', 'high'],
ordered=True),
"Col": ["A", "B", "C", "A", "B"],
Copy link
Member

@jschendel jschendel Jun 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also be a Categorical with NaN values present in order to recreate the issue.

@jschendel
Copy link
Member

jschendel commented Jun 11, 2018

@jschendel Actually, I don't know how to recreate columns/values pivots. Help would be appreciated here

Here's a slightly modified version of your setup to address my last comment of 'Col' not being a Categorical with NaN values:

In [2]: pd.__version__
Out[2]: '0.24.0.dev0+85.g4807905'

In [3]: idx = [np.nan, 'low', 'high', 'low', np.nan]
   ...: col = [np.nan, 'A', 'B', np.nan, 'A']
   ...: df = pd.DataFrame({
   ...:     "In": pd.Categorical(idx, categories=['low', 'high'], ordered=True),
   ...:     "Col": pd.Categorical(col, categories=['A', 'B'], ordered=True),
   ...:     "Val": range(1, 6)})

In [4]: df
Out[4]:
     In  Col  Val
0   NaN  NaN    1
1   low    A    2
2  high    B    3
3   low  NaN    4
4   NaN    A    5

Then doing both variations of pivot_table produces the correct output on master:

In [5]: df.pivot_table(index="In", columns="Col", values="Val")
Out[5]:
Col     A    B
In
low   2.0  NaN
high  NaN  3.0

In [6]: df.pivot_table(columns="Col", values="Val")
Out[6]:
Col    A    B
Val  3.5  3.0

So you'll need to create expected to match the above. Note since the columns are categorical you'll need to explicitly pass something like columns=CategoricalIndex(...) to the DataFrame constructor, like what you did with the index parameter in your current code (or convert the columns to categorical after creation, either works).

Just to verify, the code above does indeed replicate the buggy behavior on 0.23.0:

In [2]: pd.__version__
Out[2]: '0.23.0'

In [3]: idx = [np.nan, 'low', 'high', 'low', np.nan]
   ...: col = [np.nan, 'A', 'B', np.nan, 'A']
   ...: df = pd.DataFrame({
   ...:     "In": pd.Categorical(idx, categories=['low', 'high'], ordered=True),
   ...:     "Col": pd.Categorical(col, categories=['A', 'B'], ordered=True),
   ...:     "Val": range(1, 6)})

In [4]: df.pivot_table(index="In", columns="Col", values="Val")
Out[4]:
Col  NaN    A
In
NaN  2.0  NaN
low  NaN  3.0

In [5]: df.pivot_table(columns="Col", values="Val")
Out[5]:
Col  NaN    A
Val  3.5  3.0

@uds5501
Copy link
Contributor Author

uds5501 commented Jun 12, 2018

@jschendel Thank you for the example! I will start working on it right away 😃

@jreback
Copy link
Contributor

jreback commented Jun 15, 2018

doesn't this duplicate #21393 ?

@uds5501
Copy link
Contributor Author

uds5501 commented Jun 15, 2018

@jreback No sir. the aim of this PR is to add new Test Cases which include column/index/value form of DataFrame

@jreback
Copy link
Contributor

jreback commented Jun 19, 2018

@uds5501 ok pls rebase and fixup

@jreback
Copy link
Contributor

jreback commented Sep 25, 2018

closing as stale

@jreback jreback closed this Sep 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TST: add additional test cases for pivot_table with categorical data
5 participants