wide_to_long mishandles string arg for `stubnames` #22468

chrisjcameron · 2018-08-22T15:51:38Z

wide_to_long should sanitize/format input args BEFORE sanity checking.

This bug report pertains to wide_to_long() in master/pandas/core/reshape/melt.py

Code Sample, a copy-pastable example if possible

# This check (~line 411) needs to move below input sanitization (~line 425).
if any(col in stubnames for col in df.columns):
    raise ValueError("stubname can't be identical to a column name")

Problem description

The sanity check (line 411) is meant to test whether any element of the list stubnames is also a column name, and to raise an exception if so. Unfortunately, this check occurs before the input sanitization that ensures stubnames is a list; a string is also permitted. As a result, the check performs a substring test against a string when stubnames is a string, and a membership test against a list when stubnames is a list.
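To make the difference concrete, here is a minimal sketch in plain Python (no pandas needed) of how the `in` operator behaves for the two argument types. The column names are borrowed from the reproduction later in this thread; the variable names are only illustrative.

```python
# Column names taken from the reproduction example in this thread.
columns = ["node_id", "A", "PA0", "PA1", "PA3"]

stubnames_str = "PA"     # stubnames passed as a string
stubnames_list = ["PA"]  # stubnames passed as a list

# With a string on the right-hand side, `in` is a substring test:
# "A" in "PA" is True, so the check fires on the unrelated column "A".
hit_str = any(col in stubnames_str for col in columns)

# With a list on the right-hand side, `in` tests element equality:
# no column is exactly "PA", so the check does not fire.
hit_list = any(col in stubnames_list for col in columns)

print(hit_str, hit_list)
```

The same expression therefore answers two different questions depending on the type of stubnames.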

Expected Output

na

Output of `pd.show_versions()`

This is in the master branch on GitHub as of 8/22/2018.


WillAyd · 2018-08-22T17:56:15Z

Can you provide a minimal code sample to reproduce the issue from an end user perspective?

chrisjcameron · 2018-08-22T17:59:01Z

I will work on one that produces an error, but the error is not the issue. It is a logic error that arises from assuming that stubnames is always a list BEFORE the code that enforces that assumption has a chance to run.

chrisjcameron · 2018-08-23T20:59:14Z

The following code sample throws an error when stubnames='PA' and works when stubnames=['PA']. This illustrates how an end user might be impacted by the bug.

Notice that when stubnames is a string, line 411 (above) is actually checking the following:
any(substring in stubnames_str for substring in df.columns)
Because the column name A is a substring of PA, the check raises an error. The intention is to check if any column name matches the string PA, which it also fails to do.

Lines 415+ (below) need to run before the code on line 411:

# ~line 415
    if not is_list_like(stubnames):
        stubnames = [stubnames]
    else:
        stubnames = list(stubnames)
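A minimal sketch of the proposed ordering, under stated assumptions: `normalize_stubnames` and `check_stubnames` are hypothetical helper names, not pandas functions, and the scalar test below is a plain-Python stand-in for `is_list_like` so the snippet runs without pandas. It is not the code from melt.py, only an illustration of normalizing first and checking second.

```python
def normalize_stubnames(stubnames):
    """Hypothetical helper: wrap a scalar (including a str) in a list."""
    # Stand-in for `not is_list_like(stubnames)`: strings count as scalars.
    if isinstance(stubnames, (str, bytes)) or not hasattr(stubnames, "__iter__"):
        return [stubnames]
    return list(stubnames)


def check_stubnames(columns, stubnames):
    """Hypothetical helper: normalize first, then run the sanity check."""
    stubnames = normalize_stubnames(stubnames)
    # stubnames is now always a list, so `in` is an exact-match test.
    if any(col in stubnames for col in columns):
        raise ValueError("stubname can't be identical to a column name")
    return stubnames


columns = ["node_id", "A", "PA0", "PA1", "PA3"]
print(check_stubnames(columns, "PA"))    # string input no longer trips on column "A"
print(check_stubnames(columns, ["PA"]))  # list input behaves the same way
```

With this ordering, a string and a one-element list are treated identically, and the error is raised only when a column name is exactly equal to a stubname.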

Example of a spurious error:

import pandas as pd

foo = {
   'node_id': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}, 
   'A': {0: 0.805236, 1: 0.0, 2: 0.250981, 3: 1.0, 4: 0.812131}, 
   'PA0': {0: 0.775511, 1: 0.566016, 2: 0.5676359999999999, 3: 0.9837879999999999, 4: 0.67783},
   'PA1': {0: 0.775511, 1: 0.64623, 2: 0.525783, 3: 0.9837879999999999, 4: 0.67783},
   'PA3': {0: 0.775511, 1: 0.703248, 2: 0.526626, 3: 0.985926, 4: 0.67783}
}

test_df = pd.DataFrame.from_dict(foo)

pd.wide_to_long(
    test_df,
    stubnames='PA',  # also try as a list: ['PA']
    i=['node_id', 'A'],
    j='time'
)

chrisjcameron · 2018-08-23T21:01:41Z

PS: this is a simple fix involving swapping the order of existing code.

csmcallister · 2018-08-23T22:23:48Z

Hi all. I just submitted a PR (my first!) for this issue. #22490

I tried my best to follow the docs and use the commit conventions. Open to any feedback if I'm going through this process incorrectly.

Also, thank you for maintaining such an awesome package. I love pandas!

WillAyd added the Needs Info Clarification about behavior needed to assess issue label Aug 22, 2018

csmcallister mentioned this issue Aug 23, 2018

BUG:reorder type check/conversion so wide_to_long handles str arg for… #22490

Merged

4 tasks

gfyoung added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode and removed Needs Info Clarification about behavior needed to assess issue labels Aug 24, 2018

jreback added this to the 0.24.0 milestone Sep 18, 2018

jreback closed this as completed in #22490 Sep 23, 2018
