BUG: unstack doesn't preserve categorical dtype #14018

Closed
pijucha opened this issue Aug 16, 2016 · 10 comments · Fixed by #41482
Labels: good first issue; Needs Tests (Unit test(s) needed to prevent regressions); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)


@pijucha
Contributor

pijucha commented Aug 16, 2016

Code Sample, a copy-pastable example if possible

import pandas as pd
idx = pd.MultiIndex.from_product([['A'], [0, 1]])
df = pd.DataFrame({'cat': pd.Categorical(['a', 'b'])}, index=idx)
df
Out[19]: 
    cat
A 0   a
  1   b

df.unstack()
Out[21]: 
  cat   
    0  1
A   a  b

Categorical dtype is lost:

df.unstack().dtypes
Out[22]: 
cat  0    object
     1    object
dtype: object

But it works ok for a Series:

df['cat'].unstack()
Out[24]: 
   0  1
A  a  b
df['cat'].unstack().dtypes
Out[25]: 
0    category
1    category
dtype: object

Expected Output

I believe df.unstack() should preserve categorical dtypes as df['cat'].unstack() does.
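
The expectation above can be phrased as a small check. This is only a sketch: the helper `all_columns_categorical` is a hypothetical name introduced here for illustration and is not part of pandas. It inspects the per-column dtypes rather than the dtype of the `.dtypes` Series itself.

```python
import pandas as pd


def all_columns_categorical(frame: pd.DataFrame) -> bool:
    """Hypothetical helper: True if every column has a categorical dtype."""
    return all(isinstance(dtype, pd.CategoricalDtype) for dtype in frame.dtypes)


idx = pd.MultiIndex.from_product([["A"], [0, 1]])
df = pd.DataFrame({"cat": pd.Categorical(["a", "b"])}, index=idx)

# Unstacking the single column as a Series (reported as working).
series_result = df["cat"].unstack()
# Unstacking the whole frame (reported as losing the dtype).
frame_result = df.unstack()

print("Series.unstack keeps category:", all_columns_categorical(series_result))
print("DataFrame.unstack keeps category:", all_columns_categorical(frame_result))
```

Whether the second line prints `True` depends on the installed pandas version; the report above describes a version where it does not.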

output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS
------------------
commit: 5d791cc7d955c0b074ad602eb03fa32bd3e17503
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.20-1
machine: x86_64
processor: Intel(R)_Core(TM)_i5-2520M_CPU_@_2.50GHz
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.18.1+368.g5d791cc
nose: 1.3.7
pip: 8.1.2
setuptools: 21.0.0
Cython: 0.24
numpy: 1.11.0
...

This is related to #13743 (and PR #13854), though a bit more difficult to fix.

@jreback changed the title from "unstack doesn't preserve categorical dtype" to "BUG: unstack doesn't preserve categorical dtype" on Aug 17, 2016
@jreback added the Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), Categorical (Categorical Data Type), and Difficulty Intermediate labels on Aug 17, 2016
@jreback added this to the Next Major Release milestone on Aug 17, 2016
@mroeschke
Member

Looks to be fixed on master. Could use a test

In [26]: df.unstack().dtypes
    ...:
Out[26]:
cat  0    category
     1    category
dtype: object

In [27]: pd.__version__
Out[27]: '1.1.0.dev0+1974.g0159cba6e'

@mroeschke added the good first issue and Needs Tests (Unit test(s) needed to prevent regressions) labels and removed the Bug, Categorical (Categorical Data Type), and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels on Jun 28, 2020
@Abhispy007

Can I take this one? I just need guidance on which test file to edit. First-time Git contributor.

@mroeschke
Member

Sure @Abhispy007. pandas/tests/frame/tests_reshape.py would be a good place for this test.

@Abhispy007

I'm afraid I don't understand the problem here. Doesn't the dtype remain the same?
[screenshot]

@mroeschke
Member

The `dtype: object` on the last line is the dtype of the Series returned by `.dtypes`, not the dtype of the columns.

Since that Series holds dtype objects as its values, its own dtype should be object.
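
To make the distinction concrete, here is a small sketch with a made-up two-column frame (the column names `cat` and `num` are illustrative only). The Series returned by `.dtypes` has dtype object, while each value inside it is the dtype of one column.

```python
import pandas as pd

df = pd.DataFrame({"cat": pd.Categorical(["a", "b"]), "num": [1, 2]})

dtypes = df.dtypes

# dtype of the container Series: it stores dtype objects, so it is object.
print(dtypes.dtype)

# dtype of an individual column: this is the value to look at.
print(dtypes["cat"])
print(isinstance(dtypes["cat"], pd.CategoricalDtype))
```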

@Abhispy007

I see, is this what it should look like?
[screenshot]
If not, is it possible to set up a 2 minute call? I genuinely want to contribute

@mroeschke
Member

Sorry, I don't have time for a call. But the output should be as noted in #14018 (comment)

@Abhispy007

Came up with this test

def test_unstack_preserves_categorical_values():
    # GH 14018
    idx = pd.MultiIndex.from_product(
        [['A'], [0, 1]]
    )
    df = DataFrame(
        {'cat': pd.Categorical(['a', 'b'])},
        index=idx
    )
    assert np.array(df.unstack()).dtype.name == 'category'

Appropriately fails with AssertionError: assert 'object' == 'category'.
Should I submit a pull request? Forgive me if I'm being pedantic, first time contributor issues
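
One caveat about the assertion in the test above: NumPy has no categorical dtype, so converting categorical data with `np.array` produces an object array, and `.dtype.name` cannot be `'category'` whether or not `unstack` preserves the dtype. A minimal sketch of that behaviour, using a standalone Series rather than the frame from the issue:

```python
import numpy as np
import pandas as pd

cat = pd.Series(pd.Categorical(["a", "b"]))

# pandas still knows the data is categorical.
print(cat.dtype)

# After conversion to a NumPy array the categorical information is gone.
as_array = np.array(cat)
print(as_array.dtype)
```

Checking `result.dtypes` (or comparing against an expected DataFrame) avoids the conversion.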

@mroeschke
Member

You can use tm.assert_frame_equal to create the expected dataframe and compare the result of df.unstack() to the expected dataframe.

No worries, yes please open a pull request. Discussing the test in a pull request will be a lot easier
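
A sketch of what such a test could look like, assuming a pandas version in which the fix is present. It uses the public `pd.testing.assert_frame_equal`; inside the pandas test suite the same function is usually reached through the internal `tm` alias. The test name is illustrative and this is not necessarily the test that was eventually merged.

```python
import pandas as pd


def test_unstack_preserves_categorical_dtype():
    # GH 14018
    idx = pd.MultiIndex.from_product([["A"], [0, 1]])
    df = pd.DataFrame({"cat": pd.Categorical(["a", "b"])}, index=idx)

    result = df.unstack()

    # Each unstacked column should stay categorical and keep both categories.
    expected = pd.DataFrame(
        {
            0: pd.Categorical(["a"], categories=["a", "b"]),
            1: pd.Categorical(["b"], categories=["a", "b"]),
        },
        index=["A"],
    )
    expected.columns = pd.MultiIndex.from_tuples([("cat", 0), ("cat", 1)])

    # assert_frame_equal checks dtypes by default, so an object-dtype result fails.
    pd.testing.assert_frame_equal(result, expected)
```

Building the expected frame explicitly makes the test fail with a readable dtype mismatch if the categorical dtype is lost again.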

@Abhispy007

It turns out my test doesn't work: it also fails on df['cat'].unstack().dtypes.
My biggest problem is that the terminal output differs from my PyCharm output, even though both use the same conda environment and Python version. I'm not sure what to do.
The IDE gives the correct output, as in the #14018 comments.
The terminal gives the incorrect output, as in #14018.

@jbrockmendel added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) label on Sep 20, 2020
@mroeschke modified the milestones: Contributions Welcome, 1.3 on May 15, 2021