BUG: weird behaviour of pivot_table when aggfunc tries to join string sequence containing None #33849

@jkukul

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                          "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", None],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"]})
df.pivot_table(index='A', columns='C', values='B', aggfunc=lambda x: ' '.join(x))

Problem description

The None value in column B should either make the aggregation fail or be ignored by it. Instead, the output looks like this:

C    large  small
A
bar  A B C  A B C
foo  A B C  A B C

It looks as though the column names of the original DataFrame (A, B, C) were joined in every cell instead of the values of B, which is very surprising behaviour.
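A minimal sketch of where a string like "A B C" can come from. It assumes, without confirmation from this report, that the aggregation function ends up being called on a whole DataFrame after the per-column call fails. The names `frame`, `joined_labels` and `series_error` are made up for illustration.

```python
import pandas as pd

# A tiny frame with the same column names as in the report.
frame = pd.DataFrame({"A": ["bar", "bar"],
                      "B": ["one", None],
                      "C": ["large", "large"]})

# Iterating over a DataFrame yields its column labels, not its rows,
# so joining the frame itself concatenates the labels.
joined_labels = " ".join(frame)
print(joined_labels)  # A B C

# Joining the B column directly fails because of the missing value.
try:
    " ".join(frame["B"])
    series_error = None
except TypeError as exc:
    series_error = exc
print(type(series_error).__name__)  # TypeError
```

If that assumption holds, the TypeError from the column-level join is swallowed somewhere and the label-joining result is returned in its place.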

Expected Output

TypeError: sequence item X: expected str instance, NoneType found, just as ''.join([..., None]) would raise.

Alternatively, the None value could be ignored and then the output could look like this:

C      large        small
A
bar      one     one two
foo  one one  one two two
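Until the behaviour is settled, a possible workaround is to drop the missing values inside the aggregation function. This is only a sketch: `join_non_missing` is a hypothetical helper name, and it assumes that the function receives each group of B as a Series.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", None],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"]})


def join_non_missing(values):
    # Drop None/NaN entries first so that str.join only sees strings.
    return " ".join(values.dropna())


table = df.pivot_table(index="A", columns="C", values="B",
                       aggfunc=join_non_missing)
print(table)
```

This sidesteps the problem rather than fixing it: a None in the data is silently skipped instead of being reported.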

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None

pandas: 0.20.1
pytest: None
pip: 20.1
setuptools: 46.1.3
Cython: 0.28.5
numpy: 1.15.0
scipy: 1.1.0
xarray: None
IPython: 7.6.1
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.8.0
html5lib: None
sqlalchemy: 1.2.1
pymysql: None
psycopg2: 2.8.3 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
pandas_gbq: None
pandas_datareader: None


Labels

Bug, Reshaping
