Returning Series of dicts from DataFrame.apply #1738

Closed
wesm opened this Issue Aug 7, 2012 · 2 comments

3 participants

wesm commented Aug 7, 2012

From the mailing list (cc @lodagro):

That is indeed odd.
There also seems to be a dependency on the dtype. Below is an example for an all-int frame and an all-float frame.

In [121]: df
Out[121]: 
   A  B
a  7  0
b  7 -2

In [122]: df.apply(foo, 1)
Out[122]: 
    A   B
a NaN NaN
b NaN NaN

All NaN???
I would expect to see this:

In [123]: s = pandas.Series([foo(row[1]) for row in df.iterrows()], df.index)

In [124]: s
Out[124]: 
a     {'properties': {'A': 7, 'B': 0}}
b    {'properties': {'A': 7, 'B': -2}}
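The thread never shows `foo`. The sketch below assumes a hypothetical definition that wraps each row's values as `{'properties': {...}}`, which matches the printed output above. It builds the expected Series of dicts explicitly (as in `In [123]`) and compares it with a row-wise `apply`; the claim that `apply` keeps one dict per row holds for recent pandas releases, not necessarily for the 2012 version discussed here.

```python
import pandas as pd

def foo(row):
    # Hypothetical reconstruction of the thread's foo: wrap the row's
    # values in a nested dict, matching the printed output.
    return {"properties": row.to_dict()}

df = pd.DataFrame({"A": [7, 7], "B": [0, -2]}, index=["a", "b"])

# Explicit construction, as in the thread: one dict per row, indexed like df.
expected = pd.Series([foo(row) for _, row in df.iterrows()], index=df.index)

# Row-wise apply; in recent pandas a dict return value is kept as a single
# object per row instead of being broadcast into columns.
result = df.apply(foo, axis=1)
print(result)
```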

Seems to work fine for all floats.

In [125]: df = pandas.DataFrame(np.random.randn(2, 2), columns=list('AB'), index=list('ab'))

In [126]: df
Out[126]: 
          A         B
a -0.407883  0.018206
b -1.081038  0.492944

In [127]: df.apply(foo, 1)
Out[127]: 
a    {'properties': {'A': -0.407882576359619, 'B': 0.0
b    {'properties': {'A': -1.081038117264707, 'B': 0.4
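To check the dtype dependency directly, here is a minimal sketch (again assuming the hypothetical `foo` that returns `{'properties': row.to_dict()}`) that runs the same function over an all-int frame and an all-float frame. Fixed float values replace `np.random.randn` so the check is reproducible; on a recent pandas release both calls should return an object Series holding one dict per row.

```python
import pandas as pd

def foo(row):
    # Hypothetical: same nested-dict wrapper as the thread's foo appears to be.
    return {"properties": row.to_dict()}

ints = pd.DataFrame({"A": [7, 7], "B": [0, -2]}, index=["a", "b"])
# Fixed values instead of random ones so the result is deterministic.
floats = pd.DataFrame({"A": [-0.5, -1.25], "B": [0.25, 0.5]}, index=["a", "b"])

for frame in (ints, floats):
    out = frame.apply(foo, axis=1)
    # The return type should not depend on the column dtype:
    # a Series with one dict per row in both cases.
    assert isinstance(out, pd.Series)
    assert all(isinstance(v, dict) for v in out)
    print(out.dtype)  # object
```

If columns are wanted instead of dicts, `DataFrame.apply` also accepts `result_type="expand"`, which turns list-like results into columns.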

lodagro commented Aug 16, 2012

Related to this, from the mailing list:
"Returning Series of arrays from DataFrame.apply".

import numpy
import pandas

df = pandas.DataFrame({'group1': ['a','a','a','b','b','b','a','a','a','b','b','b'],
                       'group2': ['c','c','d','d','d','e','c','c','d','d','d','e'],
                       'weight': [1.1, 2, 3, 4, 5, 6, 2, 4, 6, 8, 1, 2],
                       'value': [7.1, 8, 9, 10, 11, 12, 8, 7, 6, 5, 4, 3]})
df = df.set_index(['group1', 'group2'])
df_grouped = df.groupby(level=['group1', 'group2'], sort=True)

def noddy(value, weight):
    # element-wise product within the group, each element repeated 3 times
    out = numpy.array(value * weight).repeat(3)
    return list(out)

no_toes = df_grouped.apply(lambda x: noddy(x.value, x.weight))

no_toes

group1  group2
a       c         [7.8100000000000005, 7.8100000000000005, 7.810000
        d                      [27.0, 27.0, 27.0, 36.0, 36.0, 36.0]
b       d         [40.0, 40.0, 40.0, 55.0, 55.0, 55.0, 40.0, 40.0, 
        e                         [72.0, 72.0, 72.0, 6.0, 6.0, 6.0]


def noddy(value, weight):
    # same as above, but return the ndarray itself instead of a list
    out = numpy.array(value * weight).repeat(3)
    return out

no_toes = df_grouped.apply(lambda x: noddy(x.value, x.weight))
---------------------------------------------------------------------------
...
ValueError: array dimensions must agree except for d_0
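For comparison, a sketch of both variants on a recent pandas release (an assumption: any version that includes the fallback described in the next comment). The two functions are renamed `noddy_list` and `noddy_array` here only so they can coexist; the group lengths are 12, 6, 12 and 6, which is exactly the unequal-length situation that made the array version fail.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group1": list("aaabbbaaabbb"),
    "group2": list("ccdddeccddde"),
    "weight": [1.1, 2, 3, 4, 5, 6, 2, 4, 6, 8, 1, 2],
    "value": [7.1, 8, 9, 10, 11, 12, 8, 7, 6, 5, 4, 3],
}).set_index(["group1", "group2"])
grouped = df.groupby(level=["group1", "group2"], sort=True)

def noddy_list(value, weight):
    # element-wise product within the group, each element repeated 3 times
    return list(np.asarray(value * weight).repeat(3))

def noddy_array(value, weight):
    # same computation, but return the ndarray itself
    return np.asarray(value * weight).repeat(3)

as_lists = grouped.apply(lambda x: noddy_list(x["value"], x["weight"]))
as_arrays = grouped.apply(lambda x: noddy_array(x["value"], x["weight"]))

# Both results are a Series holding one object per (group1, group2) key.
print(as_arrays.map(len))
```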

y-p was assigned Mar 23, 2013


y-p commented Mar 23, 2013

I can't reproduce the first one; it seems to be fixed.
The second is a case of type sniffing gone wrong: if an exception
is raised because the arrays in the list have unequal lengths, it now falls back
and matches the behaviour for a list of unequal-length lists. 6626a7a
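The fallback described above can be sketched in plain NumPy. `combine_results` is a hypothetical helper, not the code from 6626a7a: it first tries to stack the per-group arrays into a 2-D block, and when that raises `ValueError` because the lengths differ, it falls back to a 1-D object array with one element per result, the same shape a list of unequal lists would produce.

```python
import numpy as np

def combine_results(results):
    """Hypothetical type-sniffing helper (illustration only)."""
    try:
        # Equal-length 1-D arrays stack into a 2-D block.
        return np.vstack(results)
    except ValueError:
        # Unequal lengths: keep each result as a single object,
        # like a list of unequal-length lists.
        out = np.empty(len(results), dtype=object)
        for i, r in enumerate(results):
            out[i] = r
        return out
```

Filling a preallocated object array element by element avoids NumPy trying to build a 2-D array when some of the inputs happen to share a length.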

y-p closed this Mar 23, 2013

y-p was unassigned by wesm Oct 12, 2016
