Strange output from `DataFrame.apply` when applied func creates a dict #8735

Closed
onesandzeroes opened this Issue Nov 5, 2014 · 4 comments

onesandzeroes commented Nov 5, 2014

Just had something odd come up while trying to come up with something for this SO question.

If we use DataFrame.apply() to try to create dictionaries from the rows of a dataframe, each element of the result is the dict's bound values method rather than the dict itself.

import pandas as pd
df = pd.DataFrame({'k': ['a', 'b', 'c'], 'v': [1, 2, 3]})
df.apply(lambda row: {row['k']: row['v']}, axis=1)
Out[52]: 
0    <built-in method values of dict object at 0x07...
1    <built-in method values of dict object at 0x03...
2    <built-in method values of dict object at 0x07...
dtype: object

Looks like it's probably something to do with trying to grab the values attribute when the output of the applied function is a Series or something similar.
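To illustrate that guess, here is a minimal pure-Python sketch of how an unguarded "take `.values` if it exists" lookup would behave. The helper `naive_values` and the class `FakeSeries` are hypothetical stand-ins made up for this example; they are not the actual pandas or Cython code.

```python
# Hypothetical stand-in for the kind of attribute lookup described above;
# this is NOT the real pandas/Cython implementation.
def naive_values(obj):
    """Return obj.values if the attribute exists, otherwise obj itself."""
    return getattr(obj, "values", obj)


class FakeSeries:
    """Minimal Series-like object whose .values attribute holds plain data."""

    def __init__(self, data):
        self.values = list(data)


row_result = {"a": 1}

# For a Series-like object the lookup unwraps the data, as intended.
unwrapped = naive_values(FakeSeries([1, 2]))

# For a dict, .values is a method, so the lookup hands back the bound
# method object instead of the dict.
leaked = naive_values(row_result)

print(unwrapped)
print(callable(leaked), leaked is row_result)
```

If the real inference does something along these lines, that would explain the `<built-in method values of dict object ...>` entries in the output above.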

Library versions:

pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 30 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.15.0
nose: 1.3.4
Cython: 0.21
numpy: 1.9.0
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.3.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 1.5
pytz: 2014.7
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.2
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None

jreback commented Nov 7, 2014

This inference is done in cython. It is indeed trying to get the values attribute (e.g. _values_from_object). I suppose you could fix it.

Do you want something like this?

In [1]: df = pd.DataFrame({'k': ['a', 'b', 'c'], 'v': [1, 2, 3]})

In [2]: def f(row):
   ...:         return pd.Series({row['k']: row['v']})
   ...: 

In [3]: df.apply(f,axis=1)
Out[3]: 
    a   b   c
0   1 NaN NaN
1 NaN   2 NaN
2 NaN NaN   3

jreback added the API Design label Nov 7, 2014

ringw commented Jul 29, 2015

Hi all, I've run into the same issue myself. I actually need a list of dicts, one per row, as my output, and it would be nice to be able to do list(df.apply(lambda x: x.to_dict(), axis=1)). This would require df.apply to return a Series of dtype object, where the elements are dicts.

apply returning a Series of dicts would be consistent with the behavior when the passed function returns an object Pandas doesn't understand (apparently, any non-numeric without a values attribute). I guess I can add a check to _values_from_object to make sure the input is, say, a subclass of PandasObject.
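Until `apply` handles dict results, a list of per-row dicts can be built without going through `apply` at all. The sketch below assumes the same small example frame used earlier in this thread; `DataFrame.to_dict(orient='records')` gives one dict per row keyed by column name, and a plain comprehension reproduces the `{row['k']: row['v']}` shape from the original report.

```python
import pandas as pd

df = pd.DataFrame({'k': ['a', 'b', 'c'], 'v': [1, 2, 3]})

# One dict per row, keyed by column name.
records = df.to_dict(orient='records')

# One single-entry dict per row, mapping the 'k' value to the 'v' value,
# matching the shape the original report was trying to build.
pairs = [{k: v} for k, v in zip(df['k'], df['v'])]

print(records)
print(pairs)
```

Neither approach depends on how `apply` infers its return type, so both sidestep the behavior discussed here.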

jreback added this to the 0.17.0 milestone Aug 3, 2015

jreback added the Bug label Aug 3, 2015

jreback closed this in 59da781 Aug 28, 2015

Hey, I still have this issue with my function returning a dict as part of aggregation.

jreback commented Dec 13, 2015

Well, this was fixed in 0.17.0.
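A quick way to check whether an installed pandas includes the fix is to rerun the original reproduction and inspect the element types. This is only a sketch of such a check: it covers the `DataFrame.apply(..., axis=1)` case from the original report, not the aggregation case mentioned in the previous comment, which may go through a different code path.

```python
import pandas as pd

df = pd.DataFrame({'k': ['a', 'b', 'c'], 'v': [1, 2, 3]})

# Same call as in the original report.
result = df.apply(lambda row: {row['k']: row['v']}, axis=1)

# With the fix in place, every element should be a dict rather than a
# bound dict.values method.
all_dicts = all(isinstance(item, dict) for item in result)

print(pd.__version__, all_dicts)
```

If `all_dicts` comes out `False`, the installed version most likely predates the fix.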
