wide_to_long mishandles string arg for stubnames #22468

Closed
chrisjcameron opened this issue Aug 22, 2018 · 5 comments · Fixed by #22490
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Milestone
0.24.0

Comments

@chrisjcameron

wide_to_long should sanitize/format input args BEFORE sanity checking.

This bug report pertains to wide_to_long() in master/pandas/core/reshape/melt.py

Code Sample, a copy-pastable example if possible

# This check (~line 411) needs to move below input sanitization (~line 425).
if any(col in stubnames for col in df.columns):
    raise ValueError("stubname can't be identical to a column name")

Problem description

The intention of the sanity-checking code (line 411) is to raise an exception if any element of the list stubnames is also a column name. Unfortunately, this check runs before the input sanitization that ensures stubnames is a list, even though stubnames is also permitted to be a string. As a result, the check behaves differently depending on the argument type: when stubnames is a list, col in stubnames tests for an exact element match, but when stubnames is a string, it tests whether the column name is a substring of that string.
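The root cause is simply how Python's in operator works on the two accepted types. This plain-Python sketch (no pandas involved) shows the difference; the variable names are illustrative only:

```python
# The two argument types wide_to_long accepts for stubnames.
stub_str = "PA"
stub_list = ["PA"]

# On a str, `in` is a substring test.
substring_hit = "A" in stub_str      # True: "A" occurs inside "PA"
# On a list, `in` is an element-equality (membership) test.
membership_hit = "A" in stub_list    # False: no element equals "A"

print(substring_hit, membership_hit)  # True False
```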

Expected Output

na

Output of pd.show_versions()

Present in the master branch on GitHub as of 8/22/2018.

@WillAyd
Member

WillAyd commented Aug 22, 2018

Can you provide a minimal code sample to reproduce the issue from an end user perspective?

@WillAyd WillAyd added the Needs Info Clarification about behavior needed to assess issue label Aug 22, 2018
@chrisjcameron
Author

I will work on one that produces an error, but the error itself is not the issue: it is a logic error that arises from assuming that stubnames is always a list BEFORE the code that enforces that assumption has had a chance to run.

@chrisjcameron
Author

The following code sample throws an error when stubnames='PA' and works when stubnames=['PA']. This illustrates how an end user might be impacted by the bug.

Notice that when stubnames is a string, line 411 (above) is actually checking the following:
    any(substring in stubnames_str for substring in df.columns)
Because the column A is a substring of PA, it throws an error. The intention is to check if any column name matches the string PA, which it also fails to do.
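To pinpoint which column trips the check, the comprehension below re-evaluates it by hand in plain Python (not pandas' own code), using the column names of the example DataFrame given further down:

```python
# Column names from the example DataFrame (node_id, A, PA0, PA1, PA3).
columns = ["node_id", "A", "PA0", "PA1", "PA3"]

# What the check effectively tests when stubnames is the raw string "PA":
offending_str = [col for col in columns if col in "PA"]
# What it tests once stubnames has been normalized to ["PA"]:
offending_list = [col for col in columns if col in ["PA"]]

print(offending_str)   # ['A']  -> spurious ValueError
print(offending_list)  # []     -> no error
```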

Lines 415+ (below) need to run before the code on line 411:

# ~line 415
if not is_list_like(stubnames):
    stubnames = [stubnames]
else:
    stubnames = list(stubnames)
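A minimal sketch of the proposed reordering, written as a standalone function so it can run outside melt.py. check_stubnames is a hypothetical helper (not a pandas API), and it assumes pandas.api.types.is_list_like, the public counterpart of the is_list_like used in the snippet above:

```python
from pandas.api.types import is_list_like


def check_stubnames(stubnames, columns):
    """Hypothetical helper: normalize stubnames first, then sanity-check."""
    # 1. Sanitize: a bare string is not list-like, so wrap it in a list.
    if not is_list_like(stubnames):
        stubnames = [stubnames]
    else:
        stubnames = list(stubnames)

    # 2. Check: `in` is now always list membership, i.e. an exact-name match.
    if any(col in stubnames for col in columns):
        raise ValueError("stubname can't be identical to a column name")
    return stubnames


columns = ["node_id", "A", "PA0", "PA1", "PA3"]
print(check_stubnames("PA", columns))    # ['PA'], no spurious error
print(check_stubnames(["PA"], columns))  # ['PA']
try:
    check_stubnames("A", columns)        # "A" really is a column name
except ValueError as exc:
    print(exc)
```

With the order swapped, a string and a one-element list behave identically, and the check still catches a stub that genuinely equals a column name.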

Example of a spurious error:

import pandas as pd

foo = {
    'node_id': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4},
    'A': {0: 0.805236, 1: 0.0, 2: 0.250981, 3: 1.0, 4: 0.812131},
    'PA0': {0: 0.775511, 1: 0.566016, 2: 0.5676359999999999, 3: 0.9837879999999999, 4: 0.67783},
    'PA1': {0: 0.775511, 1: 0.64623, 2: 0.525783, 3: 0.9837879999999999, 4: 0.67783},
    'PA3': {0: 0.775511, 1: 0.703248, 2: 0.526626, 3: 0.985926, 4: 0.67783}
}

test_df = pd.DataFrame.from_dict(foo)

pd.wide_to_long(
    test_df,
    stubnames='PA',  # also try as a list: ['PA']
    i=['node_id', 'A'],
    j='time'
)
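On an affected version, a workaround is to pass the single stub as a list, which makes the sanity check an exact-match test. A sketch using the same data as foo above, written column-wise; the expected shape assumes the default numeric suffix pattern, so PA0, PA1 and PA3 become time values 0, 1 and 3:

```python
import pandas as pd

# Same values as the `foo` dict above, written column-wise.
test_df = pd.DataFrame({
    "node_id": [0, 1, 2, 3, 4],
    "A": [0.805236, 0.0, 0.250981, 1.0, 0.812131],
    "PA0": [0.775511, 0.566016, 0.5676359999999999, 0.9837879999999999, 0.67783],
    "PA1": [0.775511, 0.64623, 0.525783, 0.9837879999999999, 0.67783],
    "PA3": [0.775511, 0.703248, 0.526626, 0.985926, 0.67783],
})

# Workaround: wrap the stub in a list so `col in stubnames` is exact membership.
long_df = pd.wide_to_long(test_df, stubnames=["PA"], i=["node_id", "A"], j="time")

# 5 ids x 3 suffixes = 15 rows, indexed by (node_id, A, time), one "PA" column.
print(long_df.shape)  # (15, 1)
```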

@chrisjcameron
Author

PS: this is a simple fix involving swapping the order of existing code.

@csmcallister
Contributor

Hi all. I just submitted a PR (my first!) for this issue. #22490

I tried my best to follow the docs and use the commit conventions. Open to any feedback if I'm going through this process incorrectly.

Also, thank you for maintaining such an awesome package. I love pandas!

@gfyoung gfyoung added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode and removed Needs Info Clarification about behavior needed to assess issue labels Aug 24, 2018
@jreback jreback added this to the 0.24.0 milestone Sep 18, 2018

5 participants