Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", None],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"]})
df.pivot_table(index='A', columns='C', values='B', aggfunc=lambda x: ' '.join(x))
Problem description
The None value in the B column should make the aggregation fail or, alternatively, the aggregation should ignore the None value. Instead, the output looks like this:
C    large  small
A
bar  A B C  A B C
foo  A B C  A B C
It looks like the column names of the original DataFrame (A, B, C) were joined instead of the values in B, which is very surprising behaviour.
Expected Output
TypeError: sequence item X: expected str instance, NoneType found, just like ''.join([..., None]) would raise.
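For reference, a minimal sketch of the plain-Python behaviour I would expect the aggregation to mirror. It uses only str.join; the values list and the message variable are made up for illustration:

```python
# Hypothetical two-item list standing in for one group of column B.
values = ["one", None]

try:
    " ".join(values)
except TypeError as exc:
    # str.join refuses non-string items and reports the offending index.
    message = str(exc)

print(message)
```

On CPython 3 the message names the index of the first non-string item, which is the X in the error above.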
Alternatively, the None value could be ignored and then the output could look like this:
C    large    small
A
bar  one      one two
foo  one one  one two two
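To make the alternative output above concrete, here is a rough pure-Python sketch that groups the same data by (A, C) and joins the B values while skipping None. The helper name join_non_null is hypothetical and nothing here is pandas API; it only shows where the values in the table come from:

```python
from collections import defaultdict


def join_non_null(values):
    """Join string values with spaces, skipping None entries (hypothetical helper)."""
    return " ".join(v for v in values if v is not None)


# Same data as in the code sample, written as plain lists.
A = ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"]
B = ["one", "one", "one", "two", "two", "one", "one", "two", None]
C = ["small", "large", "large", "small", "small",
     "large", "small", "small", "large"]

# Collect the B values for every (A, C) pair, like index='A', columns='C'.
groups = defaultdict(list)
for a, b, c in zip(A, B, C):
    groups[(a, c)].append(b)

result = {key: join_non_null(vals) for key, vals in groups.items()}

for key in sorted(result):
    print(key, repr(result[key]))
```

A helper like this could presumably be passed as aggfunc as a workaround, but I have not verified that against the pandas version listed below.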
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
pandas: 0.20.1
pytest: None
pip: 20.1
setuptools: 46.1.3
Cython: 0.28.5
numpy: 1.15.0
scipy: 1.1.0
xarray: None
IPython: 7.6.1
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.8.0
html5lib: None
sqlalchemy: 1.2.1
pymysql: None
psycopg2: 2.8.3 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
pandas_gbq: None
pandas_datareader: None