wide_to_long mishandles string arg for `stubnames` #22468

Closed
chrisjcameron opened this issue Aug 22, 2018 · 5 comments

5 participants
@chrisjcameron
commented Aug 22, 2018

wide_to_long should sanitize/format input args BEFORE sanity checking.

This bug report pertains to wide_to_long() in master/pandas/core/reshape/melt.py

Code Sample, a copy-pastable example if possible

# This check (~line 411) needs to move below input sanitization (~line 425).
if any(col in stubnames for col in df.columns):
    raise ValueError("stubname can't be identical to a column name")

Problem description

The sanity check is meant to test whether any element of the list stubnames is also a column name and to raise an exception if so (line 411). Unfortunately, this check occurs before the input sanitization that ensures stubnames is a list; it is also permitted to be a string. As it stands, the check looks for column names as substrings of a string when stubnames is a string, and as elements of a list when stubnames is a list, so its behavior depends on the argument type.
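The difference comes down to how Python's `in` operator behaves on a string versus a list. A minimal sketch, using a hypothetical helper `collides` that mirrors the quoted check and column names chosen only for illustration:

```python
# Column names chosen for illustration only.
columns = ["node_id", "A", "PA0", "PA1"]


def collides(stubnames, columns):
    """Hypothetical mirror of the quoted check: is any column name `in` stubnames?"""
    return any(col in stubnames for col in columns)


# String argument: `in` does substring matching, so "A" in "PA" is True.
print(collides("PA", columns))    # True (spurious collision)

# List argument: `in` does element membership, so "A" in ["PA"] is False.
print(collides(["PA"], columns))  # False
```

The same expression therefore answers two different questions depending on the type of stubnames.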

Expected Output

na

Output of pd.show_versions()

It is in master branch on github as of 8/22/2018

@WillAyd

Member

commented Aug 22, 2018

Can you provide a minimal code sample to reproduce the issue from an end user perspective?

@WillAyd WillAyd added the Needs Info label Aug 22, 2018

@chrisjcameron

Author

commented Aug 22, 2018

I will work on one that produces an error, but the error itself is not the issue. It is a logic error that arises from assuming that stubnames is always a list BEFORE the code that enforces that assumption has a chance to run.

@chrisjcameron

Author

commented Aug 23, 2018

The following code sample throws an error when stubnames='PA' and works when stubnames=['PA']. This illustrates how an end user might be impacted by the bug.

Notice that when stubnames is a string, line 411 (above) is actually checking the following:
any( substring in stubnames_str for substring in df.columns)
Because the column A is a substring of PA, it throws an error. The intention is to check if any column name matches the string PA, which it also fails to do.

Lines 415+ (below) need to run before the code on line 411:

# ~line 415
if not is_list_like(stubnames):
    stubnames = [stubnames]
else:
    stubnames = list(stubnames)
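The proposed reordering can be sketched outside pandas as follows. This is not the library's code: `normalize_stubnames` and `check_stubnames` are hypothetical names, and an `isinstance(..., str)` test stands in for `is_list_like` on the assumption that a string is the only scalar input that matters here.

```python
def normalize_stubnames(stubnames):
    # Stand-in for the is_list_like branch quoted above (assumption: a plain
    # string is the only non-list input this sketch needs to handle).
    if isinstance(stubnames, str):
        return [stubnames]
    return list(stubnames)


def check_stubnames(stubnames, columns):
    # Normalize first, so the membership test below always runs against a list.
    stubnames = normalize_stubnames(stubnames)
    if any(col in stubnames for col in columns):
        raise ValueError("stubname can't be identical to a column name")
    return stubnames


columns = ["node_id", "A", "PA0", "PA1", "PA3"]
print(check_stubnames("PA", columns))    # ['PA']
print(check_stubnames(["PA"], columns))  # ['PA']
```

With the normalization done first, the string and list forms behave the same, and a stubname that really equals a column name (for example "A" with these columns) still raises ValueError.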

Example of a spurious error:

import pandas as pd

foo = {
   'node_id': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}, 
   'A': {0: 0.805236, 1: 0.0, 2: 0.250981, 3: 1.0, 4: 0.812131}, 
   'PA0': {0: 0.775511, 1: 0.566016, 2: 0.5676359999999999, 3: 0.9837879999999999, 4: 0.67783},
   'PA1': {0: 0.775511, 1: 0.64623, 2: 0.525783, 3: 0.9837879999999999, 4: 0.67783},
   'PA3': {0: 0.775511, 1: 0.703248, 2: 0.526626, 3: 0.985926, 4: 0.67783}
}

test_df = pd.DataFrame.from_dict(foo)

pd.wide_to_long(
    test_df,
    stubnames='PA',  # also try as a list: ['PA']
    i=['node_id', 'A'],
    j='time'
)

@chrisjcameron

Author

commented Aug 23, 2018

PS: this is a simple fix involving swapping the order of existing code.

@csmcallister

Contributor

commented Aug 23, 2018

Hi all. I just submitted a PR (my first!) for this issue. #22490

I tried my best to follow the docs and use the commit conventions. Open to any feedback if I'm going through this process incorrectly.

Also, thank you for maintaining such an awesome package. I love pandas!

@gfyoung gfyoung added Bug Reshaping and removed Needs Info labels Aug 24, 2018

@jreback jreback added this to the 0.24.0 milestone Sep 18, 2018
