DOC/BUG? pivot functionality confusing/inconsistent #8160

Closed
seth-p opened this issue Sep 2, 2014 · 3 comments
Labels
Docs good first issue Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@seth-p
Contributor

seth-p commented Sep 2, 2014

  1. The docstring of the (non-member) pivot() function, http://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot.html#pandas.pivot, says: "Produce ‘pivot’ table based on 3 columns of this DataFrame. Uses unique values from index / columns and fills with values." But the function takes no DataFrame argument, so there is no "this DataFrame". Is this an internal function that shouldn't be exposed, or is the docstring wrong?
  2. While the (non-member) pivot_table() accepts a list of columns for its columns argument, so that the resulting table has MultiIndex columns, DataFrame.pivot() does not. Is there a reason for that? I would have expected the two functions to behave similarly. Granted, the docstring for DataFrame.pivot() doesn't claim to support a list for columns, so this isn't a bug, but it does seem inconsistent (and restrictive) compared with pivot_table().
In [2]: from pandas import DataFrame, pivot_table

In [3]: df = DataFrame([['foo', 'ABC', 'A', 1],
   ...:                 ['foo', 'ABC', 'B', 2],
   ...:                 ['foo', 'XYZ', 'X', 3],
   ...:                 ['foo', 'XYZ', 'Y', 4],
   ...:                 ['bar', 'ABC', 'B', 5],
   ...:                 ['bar', 'XYZ', 'X', 6]],
   ...:                columns=['FooBar', 'TLA', 'Letter', 'Number'])

In [4]: df
Out[4]:
  FooBar  TLA Letter  Number
0    foo  ABC      A       1
1    foo  ABC      B       2
2    foo  XYZ      X       3
3    foo  XYZ      Y       4
4    bar  ABC      B       5
5    bar  XYZ      X       6

In [11]: pivot_table(df, index='FooBar', columns=['TLA', 'Letter'], values='Number')
Out[11]:
TLA     ABC     XYZ
Letter    A  B    X   Y
FooBar
bar     NaN  5    6 NaN
foo       1  2    3   4

In [13]: df.pivot(index='FooBar', columns=['TLA', 'Letter'], values='Number')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-8585f7e09b0c> in <module>()
----> 1 df.pivot(index='FooBar', columns=['TLA', 'Letter'], values='Number')

C:\Python34\lib\site-packages\pandas\core\frame.py in pivot(self, index, columns, values)
   3264         """
   3265         from pandas.core.reshape import pivot
-> 3266         return pivot(self, index=index, columns=columns, values=values)
   3267
   3268     def stack(self, level=-1, dropna=True):

C:\Python34\lib\site-packages\pandas\core\reshape.py in pivot(self, index, columns, values)
    357         indexed = Series(self[values].values,
    358                          index=MultiIndex.from_arrays([self[index],
--> 359                                                        self[columns]]))
    360         return indexed.unstack(columns)
    361

C:\Python34\lib\site-packages\pandas\core\index.py in from_arrays(cls, arrays, sortorder, names)
   2795             return Index(arrays[0], name=name)
   2796
-> 2797         cats = [Categorical.from_array(arr) for arr in arrays]
   2798         levels = [c.levels for c in cats]
   2799         labels = [c.labels for c in cats]

C:\Python34\lib\site-packages\pandas\core\index.py in <listcomp>(.0)
   2795             return Index(arrays[0], name=name)
   2796
-> 2797         cats = [Categorical.from_array(arr) for arr in arrays]
   2798         levels = [c.levels for c in cats]
   2799         labels = [c.labels for c in cats]

C:\Python34\lib\site-packages\pandas\core\categorical.py in from_array(cls, data)
    101             the unique values of `data`.
    102         """
--> 103         return Categorical(data)
    104
    105     _levels = None

C:\Python34\lib\site-packages\pandas\core\categorical.py in __init__(self, labels, levels, name)
     82                 name = getattr(labels, 'name', None)
     83             try:
---> 84                 labels, levels = factorize(labels, sort=True)
     85             except TypeError:
     86                 labels, levels = factorize(labels, sort=False)

C:\Python34\lib\site-packages\pandas\core\algorithms.py in factorize(values, sort, order, na_sentinel)
    128     table = hash_klass(len(vals))
    129     uniques = vec_klass()
--> 130     labels = table.get_labels(vals, uniques, 0, na_sentinel)
    131
    132     labels = com._ensure_platform_int(labels)

C:\Python34\lib\site-packages\pandas\hashtable.pyd in pandas.hashtable.PyObjectHashTable.get_labels (pandas\hashtable.c:13534)()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

This is with Pandas v0.14.1.
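
The traceback above suggests the failure comes from `self[columns]` being a 2-D DataFrame when `columns` is a list, which `MultiIndex.from_arrays` cannot factorize. A minimal workaround sketch, assuming the same `df` as above, is to build the MultiIndex explicitly with `set_index()` and then `unstack()` the levels that should become columns. The name `wide` is only illustrative, and this has not been checked against v0.14.1 specifically.

```python
import pandas as pd

# Same data as in the report above.
df = pd.DataFrame(
    [['foo', 'ABC', 'A', 1],
     ['foo', 'ABC', 'B', 2],
     ['foo', 'XYZ', 'X', 3],
     ['foo', 'XYZ', 'Y', 4],
     ['bar', 'ABC', 'B', 5],
     ['bar', 'XYZ', 'X', 6]],
    columns=['FooBar', 'TLA', 'Letter', 'Number'])

# Put all three key columns into the index, select the value column,
# then move 'TLA' and 'Letter' from the row index into the columns.
wide = (df.set_index(['FooBar', 'TLA', 'Letter'])['Number']
          .unstack(['TLA', 'Letter']))

print(wide)
```

Combinations that do not occur in the data (for example `bar` with `ABC`/`A`) come out as NaN, as in the pivot_table() output above.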

@seth-p
Contributor Author

seth-p commented Oct 16, 2014

Re #2: see also https://groups.google.com/forum/#!topic/pydata/hjv_KeeuKsA, where the issue is that DataFrame.pivot() doesn't support a multi-entry list for index (as opposed to the problem with columns mentioned above).
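
For illustration, here is a sketch of the list-valued index case using the `df` from the original report. It assumes a pandas version in which DataFrame.pivot() accepts a list for index (1.1.0 or later, per the pandas documentation); on the version discussed in this thread it would fail. The name `by_pair` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['foo', 'ABC', 'A', 1],
     ['foo', 'ABC', 'B', 2],
     ['foo', 'XYZ', 'X', 3],
     ['foo', 'XYZ', 'Y', 4],
     ['bar', 'ABC', 'B', 5],
     ['bar', 'XYZ', 'X', 6]],
    columns=['FooBar', 'TLA', 'Letter', 'Number'])

# Assumes pandas >= 1.1.0: a list for `index` yields a two-level row index,
# with one column per distinct 'Letter'.
by_pair = df.pivot(index=['FooBar', 'TLA'], columns='Letter', values='Number')

print(by_pair)
```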

@TomAugspurger TomAugspurger added this to the Next Major Release milestone Jul 8, 2017
@TomAugspurger
Contributor

For the first one, yes, the docstring should drop the reference to "this DataFrame".

@datapythonista datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018
@mroeschke mroeschke removed this from the Someday milestone Oct 13, 2022
@MarcoGorelli
Member

this works now

In [2]: pivot_table(df, index='FooBar', columns=['TLA', 'Letter'], values='Number')
Out[2]:
TLA     ABC       XYZ
Letter    A    B    X    Y
FooBar
bar     NaN  5.0  6.0  NaN
foo     1.0  2.0  3.0  4.0

In [3]: df.pivot(index='FooBar', columns=['TLA', 'Letter'], values='Number')
Out[3]:
TLA     ABC       XYZ
Letter    A    B    X    Y
FooBar
bar     NaN  5.0  6.0  NaN
foo     1.0  2.0  3.0  4.0

and the current docstring doesn't mention "this dataframe", so let's close
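
Even with list arguments accepted by both, the two functions still differ in one respect that is easy to trip over: pivot_table() aggregates duplicate index/column pairs (using the mean by default), whereas DataFrame.pivot() only reshapes and raises on duplicates. A small sketch with hypothetical data (the frame `dup` is not from this thread):

```python
import pandas as pd

# Hypothetical data: the pair ('foo', 'A') occurs twice.
dup = pd.DataFrame({'FooBar': ['foo', 'foo', 'bar'],
                    'Letter': ['A', 'A', 'B'],
                    'Number': [1, 3, 5]})

# pivot_table() aggregates the duplicates; the default aggfunc is the mean.
agg = pd.pivot_table(dup, index='FooBar', columns='Letter', values='Number')

# DataFrame.pivot() performs no aggregation, so duplicates raise ValueError.
try:
    dup.pivot(index='FooBar', columns='Letter', values='Number')
    pivot_error = None
except ValueError as exc:
    pivot_error = str(exc)

print(agg)
print(pivot_error)
```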
