DOC/BUG? pivot functionality confusing/inconsistent #8160

seth-p · 2014-09-02T14:21:44Z

1. The docstring of the (non-member) pivot() function, http://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot.html#pandas.pivot, says: "Produce ‘pivot’ table based on 3 columns of this DataFrame. Uses unique values from index / columns and fills with values." But there is no DataFrame argument, and so no "this DataFrame". Is this an internal function that shouldn't be exposed? Or is the docstring wrong?
2. While the (non-member) pivot_table() supports specifying multiple columns for columns, so that the resulting table has multi-index columns, DataFrame.pivot() does not. Any reason it doesn't? I would have expected the two functions to behave similarly. Granted, the docstring for DataFrame.pivot() doesn't claim that it supports multiple columns for columns, so this isn't a bug, but it does seem inconsistent (and restrictive) vs. pivot_table().

In [2]: from pandas import DataFrame, pivot_table

In [3]: df = DataFrame([['foo', 'ABC', 'A', 1],
   ...:                 ['foo', 'ABC', 'B', 2],
   ...:                 ['foo', 'XYZ', 'X', 3],
   ...:                 ['foo', 'XYZ', 'Y', 4],
   ...:                 ['bar', 'ABC', 'B', 5],
   ...:                 ['bar', 'XYZ', 'X', 6]],
   ...:                columns=['FooBar', 'TLA', 'Letter', 'Number'])

In [4]: df
Out[4]:
  FooBar  TLA Letter  Number
0    foo  ABC      A       1
1    foo  ABC      B       2
2    foo  XYZ      X       3
3    foo  XYZ      Y       4
4    bar  ABC      B       5
5    bar  XYZ      X       6

In [11]: pivot_table(df, index='FooBar', columns=['TLA', 'Letter'], values='Number')
Out[11]:
TLA     ABC     XYZ
Letter    A  B    X   Y
FooBar
bar     NaN  5    6 NaN
foo       1  2    3   4

In [13]: df.pivot(index='FooBar', columns=['TLA', 'Letter'], values='Number')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-8585f7e09b0c> in <module>()
----> 1 df.pivot(index='FooBar', columns=['TLA', 'Letter'], values='Number')

C:\Python34\lib\site-packages\pandas\core\frame.py in pivot(self, index, columns, values)
   3264         """
   3265         from pandas.core.reshape import pivot
-> 3266         return pivot(self, index=index, columns=columns, values=values)
   3267
   3268     def stack(self, level=-1, dropna=True):

C:\Python34\lib\site-packages\pandas\core\reshape.py in pivot(self, index, columns, values)
    357         indexed = Series(self[values].values,
    358                          index=MultiIndex.from_arrays([self[index],
--> 359                                                        self[columns]]))
    360         return indexed.unstack(columns)
    361

C:\Python34\lib\site-packages\pandas\core\index.py in from_arrays(cls, arrays, sortorder, names)
   2795             return Index(arrays[0], name=name)
   2796
-> 2797         cats = [Categorical.from_array(arr) for arr in arrays]
   2798         levels = [c.levels for c in cats]
   2799         labels = [c.labels for c in cats]

C:\Python34\lib\site-packages\pandas\core\index.py in <listcomp>(.0)
   2795             return Index(arrays[0], name=name)
   2796
-> 2797         cats = [Categorical.from_array(arr) for arr in arrays]
   2798         levels = [c.levels for c in cats]
   2799         labels = [c.labels for c in cats]

C:\Python34\lib\site-packages\pandas\core\categorical.py in from_array(cls, data)
    101             the unique values of `data`.
    102         """
--> 103         return Categorical(data)
    104
    105     _levels = None

C:\Python34\lib\site-packages\pandas\core\categorical.py in __init__(self, labels, levels, name)
     82                 name = getattr(labels, 'name', None)
     83             try:
---> 84                 labels, levels = factorize(labels, sort=True)
     85             except TypeError:
     86                 labels, levels = factorize(labels, sort=False)

C:\Python34\lib\site-packages\pandas\core\algorithms.py in factorize(values, sort, order, na_sentinel)
    128     table = hash_klass(len(vals))
    129     uniques = vec_klass()
--> 130     labels = table.get_labels(vals, uniques, 0, na_sentinel)
    131
    132     labels = com._ensure_platform_int(labels)

C:\Python34\lib\site-packages\pandas\hashtable.pyd in pandas.hashtable.PyObjectHashTable.get_labels (pandas\hashtable.c:13534)()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

This is with Pandas v0.14.1.
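
For anyone hitting the same ValueError on a version where DataFrame.pivot() rejects a list for columns, one possible workaround is to go through set_index() and unstack() directly. This is only a minimal sketch: it rebuilds the df from In [3], assumes every (FooBar, TLA, Letter) combination is unique (unstack() raises on duplicates), and has not been checked against 0.14.1 specifically.

```python
import pandas as pd

# Same data as the df defined in In [3] above.
df = pd.DataFrame(
    [["foo", "ABC", "A", 1],
     ["foo", "ABC", "B", 2],
     ["foo", "XYZ", "X", 3],
     ["foo", "XYZ", "Y", 4],
     ["bar", "ABC", "B", 5],
     ["bar", "XYZ", "X", 6]],
    columns=["FooBar", "TLA", "Letter", "Number"],
)

# Put all three key columns into the index, then move the two that
# should become column levels from the row index to the columns.
wide = (
    df.set_index(["FooBar", "TLA", "Letter"])["Number"]
      .unstack(["TLA", "Letter"])
)

print(wide)
```

Unlike pivot_table(), this does no aggregation, so it is closer in spirit to what pivot() would do if it accepted a list.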


seth-p · 2014-10-16T19:01:51Z

Re point 2: see also https://groups.google.com/forum/#!topic/pydata/hjv_KeeuKsA, where the issue is that DataFrame.pivot() doesn't support a multi-entry list for index (as opposed to the problem with columns described above).

TomAugspurger · 2018-07-06T22:42:05Z

For the first one, yes, the docstring should drop the reference to "this DataFrame".

MarcoGorelli · 2023-03-30T10:27:48Z

this works now

In [2]: pivot_table(df, index='FooBar', columns=['TLA', 'Letter'], values='Number')
Out[2]:
TLA     ABC       XYZ
Letter    A    B    X    Y
FooBar
bar     NaN  5.0  6.0  NaN
foo     1.0  2.0  3.0  4.0

In [3]: df.pivot(index='FooBar', columns=['TLA', 'Letter'], values='Number')
Out[3]:
TLA     ABC       XYZ
Letter    A    B    X    Y
FooBar
bar     NaN  5.0  6.0  NaN
foo     1.0  2.0  3.0  4.0

and the current docstring doesn't mention "this dataframe", so let's close
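
A self-contained way to re-check this, as a sketch that assumes a pandas version recent enough for DataFrame.pivot() to accept a list for columns (the variable names by_pivot, by_table and same are just for illustration):

```python
import numpy as np
import pandas as pd

# Same data as in the original report.
df = pd.DataFrame(
    [["foo", "ABC", "A", 1],
     ["foo", "ABC", "B", 2],
     ["foo", "XYZ", "X", 3],
     ["foo", "XYZ", "Y", 4],
     ["bar", "ABC", "B", 5],
     ["bar", "XYZ", "X", 6]],
    columns=["FooBar", "TLA", "Letter", "Number"],
)

by_pivot = df.pivot(index="FooBar", columns=["TLA", "Letter"], values="Number")
by_table = pd.pivot_table(df, index="FooBar", columns=["TLA", "Letter"], values="Number")

# Align row/column order before comparing, in case the two differ in ordering.
by_table = by_table.reindex(index=by_pivot.index, columns=by_pivot.columns)

# Compare cell values, treating NaN in the same position as equal.
same = np.allclose(
    by_pivot.to_numpy(dtype=float),
    by_table.to_numpy(dtype=float),
    equal_nan=True,
)
print(same)
```

The two only agree here because each (FooBar, TLA, Letter) combination appears once. With duplicate combinations, pivot_table() aggregates (mean by default), whereas pivot() raises a ValueError.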

jreback added the Reshaping label Sep 4, 2014

jorisvandenbossche added the Docs label Feb 4, 2015

jreback mentioned this issue Jun 2, 2017 in DOC: pivot function #16578 (open)

TomAugspurger added Difficulty Intermediate labels Jul 8, 2017

TomAugspurger added this to the Next Major Release milestone Jul 8, 2017

TomAugspurger added the good first issue label Jul 6, 2018

datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018

jbrockmendel removed Difficulty Intermediate labels Oct 21, 2019

mroeschke removed this from the Someday milestone Oct 13, 2022

MarcoGorelli closed this as completed Mar 30, 2023
