
Strange output from DataFrame.apply when applied func creates a dict #8735

Closed
onesandzeroes opened this issue Nov 5, 2014 · 4 comments

@onesandzeroes

Just had something odd come up while trying to answer this Stack Overflow question.

If we use DataFrame.apply() to try to create dictionaries from the rows of a DataFrame, it seems to return each dict's bound values method rather than the dict itself.

import pandas as pd
df = pd.DataFrame({'k': ['a', 'b', 'c'], 'v': [1, 2, 3]})
df.apply(lambda row: {row['k']: row['v']}, axis=1)
Out[52]: 
0    <built-in method values of dict object at 0x07...
1    <built-in method values of dict object at 0x03...
2    <built-in method values of dict object at 0x07...
dtype: object

It looks like apply tries to grab the values attribute of each result, on the assumption that the applied function returned a Series or something similar. A dict also has a values attribute (a method), so that method gets picked up instead of the dict.
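For comparison, here is the same reproduction as a standalone script. Assuming a pandas release that includes the fix (0.17.0 or later, per the last comment in this thread), the call should return an object-dtype Series whose elements are the dicts themselves rather than their bound values methods. The only addition to the original snippet is the import.

```python
import pandas as pd

df = pd.DataFrame({"k": ["a", "b", "c"], "v": [1, 2, 3]})

# Build a one-entry dict from each row; on a fixed pandas the dicts are
# kept as-is in an object-dtype Series.
result = df.apply(lambda row: {row["k"]: row["v"]}, axis=1)
print(result)
print(result.tolist())
```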

Library versions:

pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 30 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.15.0
nose: 1.3.4
Cython: 0.21
numpy: 1.9.0
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.3.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 1.5
pytz: 2014.7
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.2
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None
@jreback

jreback commented Nov 7, 2014

This inference is done in Cython. It is indeed trying to get the values attribute (e.g. via _values_from_object). I suppose you could fix it.
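A plain-Python sketch of why the output looks the way it does. `unwrap_values_naive` is a hypothetical helper that only mimics the "grab the values attribute if there is one" behaviour described in this comment; it is not the real `_values_from_object` or any other pandas internal.

```python
# HYPOTHETICAL stand-in for the unwrapping step described above; this is not
# pandas' actual Cython code, only the behaviour the comment describes.
def unwrap_values_naive(obj):
    # If the object has a `values` attribute, use it; otherwise keep the object.
    return getattr(obj, "values", obj)


row_result = {"a": 1}  # what the applied lambda returns for one row
unwrapped = unwrap_values_naive(row_result)

# For a dict, `values` is a bound method, so the method itself comes back,
# matching the "<built-in method values of dict object ...>" output.
print(unwrapped)
print(callable(unwrapped))  # True
```

Because the check is "does it have an attribute called values", any non-pandas object that happens to define one is silently replaced by that attribute.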

Do you want something like this?

In [1]: df = pd.DataFrame({'k': ['a', 'b', 'c'], 'v': [1, 2, 3]})

In [2]: def f(row):
   ...:         return pd.Series({row['k']: row['v']})
   ...: 

In [3]: df.apply(f,axis=1)
Out[3]: 
    a   b   c
0   1 NaN NaN
1 NaN   2 NaN
2 NaN NaN   3
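On newer pandas (0.23 and later, where DataFrame.apply gained the `result_type` parameter), the same wide layout can be sketched without wrapping each dict in a Series: return a plain dict per row and ask apply to expand it into columns. This is an alternative spelling, not the approach used in the session above.

```python
import pandas as pd

df = pd.DataFrame({"k": ["a", "b", "c"], "v": [1, 2, 3]})

# Return a plain dict per row; result_type="expand" turns the dict keys
# into columns, filling missing entries with NaN.
wide = df.apply(lambda row: {row["k"]: row["v"]}, axis=1, result_type="expand")
print(wide)
```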

@ringw

ringw commented Jul 29, 2015

Hi all, I've run into the same issue myself. I actually need to pass a list of dicts for each row as my output, and it would be nice to be able to do list(df.apply(lambda x: x.to_dict(), 0)). This would require df.apply to return a Series of dtype object, where the elements are dicts.
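A sketch of getting one dict per row on a pandas version where apply keeps dict results. `to_dict(orient="records")` builds the list directly; the apply-based version is shown with `axis=1`, since that is the axis that passes each row (with `axis=0` the function receives each column instead). The small DataFrame reuses the example from the top of this issue.

```python
import pandas as pd

df = pd.DataFrame({"k": ["a", "b", "c"], "v": [1, 2, 3]})

# One dict per row, built by pandas directly.
records = df.to_dict(orient="records")

# The apply-based spelling: axis=1 hands each row to the function as a Series.
via_apply = list(df.apply(lambda row: row.to_dict(), axis=1))

print(records)
print(records == via_apply)
```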

apply returning a Series of dicts would be consistent with the behavior when the passed function returns an object Pandas doesn't understand (apparently, any non-numeric without a values attribute). I guess I can add a check to _values_from_object to make sure the input is, say, a subclass of PandasObject.
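A minimal sketch of the kind of check suggested here, dispatching on type instead of on the presence of a values attribute. `unwrap_values_guarded` is a hypothetical helper, and the tuple of public classes is an assumption standing in for the PandasObject subclass check; the actual fix that landed in pandas may look different.

```python
import pandas as pd

# HYPOTHETICAL guarded unwrapping step. The suggestion is to check for a
# PandasObject subclass; this sketch approximates that with public classes.
PANDAS_CONTAINERS = (pd.Series, pd.DataFrame, pd.Index)


def unwrap_values_guarded(obj):
    # Only unwrap pandas containers; dicts and other objects pass through.
    if isinstance(obj, PANDAS_CONTAINERS):
        return obj.values
    return obj


kept = unwrap_values_guarded({"a": 1})           # the dict itself
arr = unwrap_values_guarded(pd.Series([1, 2]))   # the underlying array

print(kept)          # {'a': 1}
print(arr.tolist())  # [1, 2]
```

With this dispatch, a dict that merely happens to have a values method is left alone, so apply could collect it into an object Series.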

@Casyfill

Hey, I still have this issue with my function returning a dict as part of an aggregation.

@jreback

jreback commented Dec 13, 2015

Well, this was fixed in 0.17.0.
