Returning Series of dicts from DataFrame.apply #1738

Closed
wesm opened this issue Aug 7, 2012 · 2 comments

wesm commented Aug 7, 2012

From the mailing list (cc @lodagro):

That is indeed odd.
There also seems to be a dependency on the dtype. Below is an example for all ints and one for all floats.

In [121]: df
Out[121]: 
   A  B
a  7  0
b  7 -2

In [122]: df.apply(foo, 1)
Out[122]: 
    A   B
a NaN NaN
b NaN NaN

All NaN???
I would expect to see this:

In [123]: s = pandas.Series([foo(row[1]) for row in df.iterrows()], df.index)

In [124]: s
Out[124]: 
a     {'properties': {'A': 7, 'B': 0}}
b    {'properties': {'A': 7, 'B': -2}}

Seems to work fine for all floats.

In [125]: df = pandas.DataFrame(np.random.randn(2, 2), columns=list('AB'), index=list('ab'))

In [126]: df
Out[126]: 
          A         B
a -0.407883  0.018206
b -1.081038  0.492944

In [127]: df.apply(foo, 1)
Out[127]: 
a    {'properties': {'A': -0.407882576359619, 'B': 0.0
b    {'properties': {'A': -1.081038117264707, 'B': 0.4
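
For anyone trying to reproduce the comparison above: the definition of `foo` is not shown in this thread, so the sketch below uses a hypothetical stand-in that wraps each row in a dict, which is consistent with the output shown. It builds the all-int frame, computes the expected Series by hand with `iterrows`, and calls `apply` along axis 1 so the two can be compared.

```python
import pandas as pd


def foo(row):
    # Hypothetical stand-in: the real `foo` is not shown in the thread.
    # It wraps the row's values in a nested dict.
    return {"properties": row.to_dict()}


# All-int frame matching the one printed in the report.
df = pd.DataFrame({"A": [7, 7], "B": [0, -2]}, index=list("ab"))

# Expected result, built row by row without going through apply.
expected = pd.Series([foo(row) for _, row in df.iterrows()], index=df.index)

# The call under discussion: apply `foo` to each row.
result = df.apply(foo, axis=1)

print(expected)
print(result)
```

If the two printed objects differ, the installed pandas version still shows the behaviour reported here.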

lodagro commented Aug 16, 2012

Related to this, from the mailing list:
Returning a Series of arrays from DataFrame.apply.

import numpy
import pandas

df = pandas.DataFrame({'group1': ['a','a','a','b','b','b','a','a','a','b','b','b'],
                       'group2': ['c','c','d','d','d','e','c','c','d','d','d','e'],
                       'weight': [1.1,2,3,4,5,6,2,4,6,8,1,2],
                       'value': [7.1,8,9,10,11,12,8,7,6,5,4,3]
})
df = df.set_index(['group1', 'group2'])
df_grouped = df.groupby(level=['group1','group2'], sort=True)

def noddy(value, weight):
    out = numpy.array( value * weight ).repeat(3)
    return list(out)

no_toes = df_grouped.apply(lambda x: noddy(x.value, x.weight ))

no_toes

group1  group2
a       c         [7.8100000000000005, 7.8100000000000005, 7.810000
        d                      [27.0, 27.0, 27.0, 36.0, 36.0, 36.0]
b       d         [40.0, 40.0, 40.0, 55.0, 55.0, 55.0, 40.0, 40.0, 
        e                         [72.0, 72.0, 72.0, 6.0, 6.0, 6.0]


def noddy(value, weight):
    out = numpy.array( value * weight ).repeat(3)
    return out


no_toes = df_grouped.apply(lambda x: noddy(x.value, x.weight ))
---------------------------------------------------------------------------
...
ValueError: array dimensions must agree except for d_0

@ghost ghost self-assigned this Mar 23, 2013

ghost commented Mar 23, 2013

I can't reproduce the first case; it seems to be fixed.
The second is a case of type sniffing gone wrong: if an exception
is raised because the list contains arrays of unequal length, apply now falls
back and matches the behaviour for a list of unequal-length lists. 6626a7a
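
The fallback idea can be sketched roughly as follows. This is an illustration only, not the code from 6626a7a: `wrap_results` is a hypothetical helper, and using `numpy.vstack` as the "sniffing" step is an assumption. It tries to stack the per-group results into a 2-D block and, if the lengths disagree, keeps each result as a single list-valued cell.

```python
import numpy as np
import pandas as pd


def wrap_results(keys, results):
    """Hypothetical helper illustrating the fallback; not pandas internals."""
    try:
        # Optimistic path: equal-length arrays stack into a 2-D block.
        stacked = np.vstack(results)
    except ValueError:
        # Unequal lengths: fall back to one list per row, the same shape
        # of result a function returning plain lists would give.
        return pd.Series([np.asarray(r).tolist() for r in results], index=keys)
    return pd.DataFrame(stacked, index=keys)


ragged = wrap_results(["x", "y"], [np.array([1.0, 2.0]), np.array([1.0, 2.0, 3.0])])
square = wrap_results(["x", "y"], [np.array([1.0, 2.0]), np.array([3.0, 4.0])])
print(ragged)
print(square)
```

The design choice is to attempt the cheap stacking first and fall back only on the specific `ValueError`, rather than inspecting lengths up front.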

@ghost ghost closed this as completed Mar 23, 2013
@wesm wesm unassigned ghost Oct 12, 2016
This issue was closed.