Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix pivot index bug #37771

Merged
merged 11 commits into from
Nov 14, 2020
Merged

Conversation

Jacob-Stevens-Haas
Copy link
Contributor

@Jacob-Stevens-Haas Jacob-Stevens-Haas commented Nov 11, 2020

Previously, the variable "cols" was created solely to concatenate
the lists "index" and "columns" when the "values" parameter was
left as None. It did so in a way that it modified the variable
passed as "index" using list.extend(), affecting the caller's
namespace.

This change simply uses an anonymous list in place of the
variable "cols"

Had some trouble setting up the dev environment to build pandas from source,
so no tests checked or docs built. However, it's fairly minor change, so hoping
that it builds green regardless...

Previously, the variable "cols" was created solely to concatenate
the lists "index" and "columns" when the "values" parameter was
left as None.  It did so in a way that it modified the variable
passed as "index" using list.extend(), affecting the caller's
namespace.

This change simply uses an anonymous list in place of the
variable "cols"
Copy link
Member

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Jacob-Stevens-Haas for the PR!

Can you add tests? That's the first thing we look for when reviewing

@Jacob-Stevens-Haas
Copy link
Contributor Author

Thanks for the quick reply @arw2019!

Quick question: I'm essentially copying TestPivot.test_pivot(). The point of the bugfix is to prevent the pivot() method from modifying the passed argument index. Should I still include the assert statements from TestPivot.test_pivot() to verify that the pivot itself worked, or just include an assert statement focused on the bugfix?

@arw2019
Copy link
Member

arw2019 commented Nov 12, 2020

Thanks for the quick reply @arw2019!

Quick question: I'm essentially copying TestPivot.test_pivot(). The point of the bugfix is to prevent the pivot() method from modifying the passed argument index. Should I still include the assert statements from TestPivot.test_pivot() to verify that the pivot itself worked, or just include an assert statement focused on the bugfix?

It's easiest if you post the test then will look. As a general rule though we like DRY code yes

Jake Stevens-Haas added 2 commits November 11, 2020 19:26
This is a amended commit to solve pre-commit test:
"rst ``code`` is two backticks"
@@ -2153,3 +2153,20 @@ def test_pivot_index_none(self):

expected.columns.name = "columns"
tm.assert_frame_equal(result, expected)

def test_pivot_index_list_values_none(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong but I don't think this is testing the OP...?

I'd use the example from the issue. Do the pivot and test the result of the pivot operation, then test that the arguments have not changed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually definitely use that. This isn't quite the same use case

frame = DataFrame(data)
frame.pivot(index=index, columns=columns, values=None)

expected_ser = Series(["index"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you converting to Series? you want to check that index itself (the list) did not change, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was trying to use the assert functions in pandas._testing. Should I just use standard assert?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes use a standard assert here

@arw2019 arw2019 added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Nov 12, 2020
Jake Stevens-Haas added 2 commits November 11, 2020 20:48
* Use test case from the issue
* Use standard assert instead of making series and using pandas._testing
Now I think I've got the hang of when to use one backtick versus two
@pep8speaks
Copy link

pep8speaks commented Nov 12, 2020

Hello @Jacob-Stevens-Haas! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-11-13 18:14:02 UTC

Copy link
Member

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, some comments


def test_pivot_index_list_values_none(self):
# Tests when index is a list and values None. See GH#37635
frame = pd.DataFrame({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-pick but call it df (convention)

index = ["lev1", "lev2"]
columns = ["lev3"]
frame.pivot(index=index, columns=columns, values=None)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check here that you got the correct result from the pivot (might need to hard-code)

@@ -2153,3 +2153,18 @@ def test_pivot_index_none(self):

expected.columns.name = "columns"
tm.assert_frame_equal(result, expected)

def test_pivot_index_list_values_none(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call it test_pivot_index_list_values_none_modifies_args

@@ -2153,3 +2153,18 @@ def test_pivot_index_none(self):

expected.columns.name = "columns"
tm.assert_frame_equal(result, expected)

def test_pivot_index_list_values_none(self):
# Tests when index is a list and values None. See GH#37635
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just do

# GH37635

Jake Stevens-Haas added 2 commits November 11, 2020 21:41
* My previous change caused the append = index is None to
  never set append = True
* black pandas fixed my KR braces in test_pivot.py
* Also black pandas fixed my KR braces on test_pivot.py
* Naming the test
* Commenting the test
* `frame` is now named `df`
* Added check that the frame actually pivots correctly
* Also: solved inconsistent pandas namespace in tests (precommit CI)
Copy link
Member

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm pending CI turning anything up

"d": [1.0, float("nan"), 3.0, 5.0],
}
)
expected.index = MultiIndex.from_arrays(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put the index and column names into the data frame constructor instead of setting them like this (can pass them in as keyword args)


expected = DataFrame(
{
"a": [1.0, 3.0, 5.0, float("nan")],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do "np.nan" here?

@@ -553,6 +553,7 @@ Reshaping
- Bug in :meth:`DataFrame.agg` with ``func={'name':<FUNC>}`` incorrectly raising ``TypeError`` when ``DataFrame.columns==['Name']`` (:issue:`36212`)
- Bug in :meth:`Series.transform` would give incorrect results or raise when the argument ``func`` was dictionary (:issue:`35811`)
- Bug in :meth:`DataFrame.pivot` did not preserve :class:`MultiIndex` level names for columns when rows and columns both multiindexed (:issue:`36360`)
- Bug in :meth:`DataFrame.pivot` modified caller's ``index`` parameter when ``columns`` was passed but ``values`` was not (:issue:`37635`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... modified index argument

)
index = ["lev1", "lev2"]
columns = ["lev3"]
pivoted = df.pivot(index=index, columns=columns, values=None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a total nit but can you call this result

Copy link
Member

@ivanovmg ivanovmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jacob-Stevens-Haas, thank you for the bug fix!
Please see the comment below.

Comment on lines 2157 to 2158
def test_pivot_index_list_values_none_modifies_args(self):
# GH37635
Copy link
Member

@ivanovmg ivanovmg Nov 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is about making sure that the index list does not change.
Would it be more reasonable to test here only that, without comparing pivoted table with the expected one?
This is related to one of your prior questions.

Thanks for the quick reply @arw2019!

Quick question: I'm essentially copying TestPivot.test_pivot(). The point of the bugfix is to prevent the pivot() method from modifying the passed argument index. Should I still include the assert statements from TestPivot.test_pivot() to verify that the pivot itself worked, or just include an assert statement focused on the bugfix?

Copy link
Contributor Author

@Jacob-Stevens-Haas Jacob-Stevens-Haas Nov 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had that originally, but a previous review asked that I

check here that you got the correct result from the pivot (might need to hard-code)

Doesn't much matter to me - now that the code is written, it's easy to delete or leave in as desired.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly on this
But I'd say it's useful to check that the pivot is doing what it's supposed to before checking that the args aren't changed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked through the test module and noticed that there are no tests on pivot when index is a list and values are None.
Thus, there is no repetition of any kind and it is perfectly fine to have the test as it is.
Thank you!

Copy link
Member

@ivanovmg ivanovmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@ivanovmg
Copy link
Member

Only test_pivot_index_list_values_none_immutable_args may be a better name, since in fact the arguments must stay unchanged.

@Jacob-Stevens-Haas
Copy link
Contributor Author

Only test_pivot_index_list_values_none_immutable_args may be a better name, since in fact the arguments must stay unchanged.

That too was a code-review-requested name, so I'll wait for @arw2019 to comment on that suggestion

Separate issue that I may need guidance on - the test that's failing on the Azure pipeline is TestRegistration.test_pandas_plots_register, which asserts that a Series.plot() doesn't raise any warnings. Going through relevant code for that test and Series.plot() now, but it's odd that the only changes between that test passing and failing were to test_pivot.py and whatsnew... no actual code changes to the package.

@arw2019
Copy link
Member

arw2019 commented Nov 12, 2020

Only test_pivot_index_list_values_none_immutable_args may be a better name, since in fact the arguments must stay unchanged.

That too was a code-review-requested name, so I'll wait for @arw2019 to comment on that suggestion

Yeah that's a better name!

Separate issue that I may need guidance on - the test that's failing on the Azure pipeline is TestRegistration.test_pandas_plots_register, which asserts that a Series.plot() doesn't raise any warnings. Going through relevant code for that test and Series.plot() now, but it's odd that the only changes between that test passing and failing were to test_pivot.py and whatsnew... no actual code changes to the package.

Don't worry about it, it's unrelated (that test if flaky)

* renamed test variable to 'result'
* renamed test
@Jacob-Stevens-Haas
Copy link
Contributor Author

Jacob-Stevens-Haas commented Nov 12, 2020

🥳 . Now that it's passing tests, is there anything else that needs to be done? Also, should I also try to apply the changes to the 1.1.x branch too? (1.0.x branch is stale too... is that one at end of life?)

@jreback jreback added this to the 1.2 milestone Nov 13, 2020
@jreback jreback added the Bug label Nov 13, 2020

tm.assert_frame_equal(result, expected)

assert index == ["lev1", "lev2"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also assert that columns is the same as original, ping on green.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done but not green; new comment below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it's green, LGTM

Copy link
Contributor Author

@Jacob-Stevens-Haas Jacob-Stevens-Haas Nov 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, thanks! @jreback?

* Added test that columns variable is immutable
* Changed name of test per code review (missed in prev commit)
@Jacob-Stevens-Haas
Copy link
Contributor Author

Ok so same error to last time (TestRegistration.test_pandas_plots_register) on Windows37. Errors on Windows38 are all in listed as "new" tests, mostly with memory errors... Don't believe these are a consequence of my edits, but is it something I need to solve in this PR?

@jreback jreback merged commit eeaf828 into pandas-dev:master Nov 14, 2020
@jreback
Copy link
Contributor

jreback commented Nov 14, 2020

thanks @Jacob-Stevens-Haas

yeah CI has been having troubles with some windows lately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pivot() modifies arguments
5 participants