AssertionError when using apply after GroupBy #2605

Closed
tlmaloney opened this Issue · 3 comments

@tlmaloney

The following code raises an AssertionError with pandas 0.10.0, but works fine in 0.9.1. The error is still present in the latest dev version. The code comes from Wes' book, pages 33-36. The data files are from https://github.com/pydata/pydata-book

import pandas as pd

years = range(1880, 2011)
pieces  = []
columns = ['name', 'sex', 'births']

for year in years:
    path = 'ch02/names/yob%d.txt' % year
    frame = pd.read_csv(path, names=columns)
    frame['year'] = year
    pieces.append(frame)

names = pd.concat(pieces, ignore_index=True)

def get_top1000(group):
    return group.sort_index(by='births', ascending=False)[:1000]

top1000 = names.groupby(['year', 'sex']).apply(get_top1000)

The last line results in the following error trace:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-8-569425503c6b> in <module>()
----> 1 top1000 = names.groupby(['year', 'sex']).apply(get_top1000)

/home/tlmaloney/vedev/ve-pydata-book/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in apply(self, func, *args, **kwargs)
    319         func = _intercept_function(func)
    320         f = lambda g: func(g, *args, **kwargs)
--> 321         return self._python_apply_general(f)
    322 
    323     def _python_apply_general(self, f):

/home/tlmaloney/vedev/ve-pydata-book/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _python_apply_general(self, f)
    322 
    323     def _python_apply_general(self, f):
--> 324         keys, values, mutated = self.grouper.apply(f, self.obj, self.axis)
    325 
    326         return self._wrap_applied_output(keys, values,

/home/tlmaloney/vedev/ve-pydata-book/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in apply(self, f, data, axis, keep_internal)
    583         if hasattr(splitter, 'fast_apply') and axis == 0:
    584             try:
--> 585                 values, mutated = splitter.fast_apply(f, group_keys)
    586                 return group_keys, values, mutated
    587             except lib.InvalidApply:

/home/tlmaloney/vedev/ve-pydata-book/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in fast_apply(self, f, names)
   2125 
   2126         sdata = self._get_sorted_data()
-> 2127         results, mutated = lib.apply_frame_axis0(sdata, f, names, starts, ends)
   2128 
   2129         return results, mutated

/home/tlmaloney/vedev/ve-pydata-book/local/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.apply_frame_axis0 (pandas/lib.c:24934)()

/home/tlmaloney/vedev/ve-pydata-book/local/lib/python2.7/site-packages/pandas/core/frame.pyc in __setattr__(self, name, value)
   2026                     super(DataFrame, self).__setattr__(name, value)
   2027                 elif name in self.columns:
-> 2028                     self[name] = value
   2029                 else:
   2030                     object.__setattr__(self, name, value)

/home/tlmaloney/vedev/ve-pydata-book/local/lib/python2.7/site-packages/pandas/core/frame.pyc in __setitem__(self, key, value)
   2041         else:
   2042             # set column
-> 2043             self._set_item(key, value)
   2044 
   2045     def _boolean_set(self, key, value):

/home/tlmaloney/vedev/ve-pydata-book/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _set_item(self, key, value)
   2076         ensure homogeneity.
   2077         """
-> 2078         value = self._sanitize_column(key, value)
   2079         NDFrame._set_item(self, key, value)
   2080 

/home/tlmaloney/vedev/ve-pydata-book/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _sanitize_column(self, key, value)
   2110             else:
   2111                 if len(value) != len(self.index):
-> 2112                     raise AssertionError('Length of values does not match '
   2113                                          'length of index')
   2114 

AssertionError: Length of values does not match length of index
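
Reading the trace bottom-up: the failure comes from DataFrame.__setattr__, which routes any attribute assignment whose name matches a column into a column assignment. The frame here has a column called 'name', so an attribute set under that name would be treated as new column data and length-checked against the index. Which attribute the compiled lib.apply_frame_axis0 actually sets is not visible in the trace, so the colliding attribute below is an assumption. The sketch uses a toy class (ToyFrame, hypothetical, not pandas code) that only mimics the branch shown in frame.py:

```python
class ToyFrame:
    """Toy stand-in (not pandas) mimicking the __setattr__ branch in the trace."""

    def __init__(self, data):
        # Bypass our own __setattr__ while setting up internal state.
        object.__setattr__(self, "_data", {k: list(v) for k, v in data.items()})
        object.__setattr__(self, "_nrows", len(next(iter(data.values()))))

    @property
    def columns(self):
        return list(self._data)

    def __setattr__(self, name, value):
        if name in self.columns:
            # Same routing as the `elif name in self.columns` branch in the trace:
            # the attribute assignment becomes a column assignment.
            self[name] = value
        else:
            object.__setattr__(self, name, value)

    def __setitem__(self, key, value):
        # Column data must have one value per row.
        if len(value) != self._nrows:
            raise AssertionError("Length of values does not match length of index")
        self._data[key] = list(value)


frame = ToyFrame({"name": ["Mary", "Anna", "Emma"], "births": [7065, 2604, 2003]})

# No column called 'label', so this is stored as a plain attribute.
frame.label = (1880, "F")

# 'name' is a column, so this is treated as column data of the wrong length.
try:
    frame.name = (1880, "F")
    error = None
except AssertionError as exc:
    error = str(exc)

print(error)
```

In the toy, calling object.__setattr__(frame, "name", ...) instead would sidestep the column routing; whether the actual fix does this is not stated in the thread.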
@wesm
Owner

Shoot. A reminder that I should add all the book examples as a smoke test to be run with the rest of the pandas tests. I will sort out a fix soon.

@wesm wesm closed this in 25de028
@wesm
Owner

Thanks, the dev build should work now. Bummed this made it into the release.

@tlmaloney

Thanks for the quick fix! Confirmed no error with the new dev build.
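
For anyone who wants to check a build without downloading the data files, here is a minimal sketch of the kind of smoke test mentioned above. The data is a tiny synthetic stand-in for the yob*.txt files (the rows are made up for illustration), get_top is a shortened variant of get_top1000, and sort_values replaces the sort_index(by=...) spelling from the report, which newer pandas versions no longer accept. It has not been run against 0.10.0, so treat it as a check for current versions rather than a confirmed reproduction:

```python
import pandas as pd

# Synthetic stand-in for the concatenated name files; includes a 'name' column.
names = pd.DataFrame({
    "name":   ["Mary", "Anna", "Emma", "John", "Will", "Mary", "Anna", "John"],
    "sex":    ["F", "F", "F", "M", "M", "F", "F", "M"],
    "births": [7065, 2604, 2003, 9655, 9532, 6919, 2698, 8769],
    "year":   [1880, 1880, 1880, 1880, 1880, 1881, 1881, 1881],
})


def get_top(group, n=2):
    # Keep the n rows with the most births in each group.
    return group.sort_values(by="births", ascending=False)[:n]


# Select the non-key columns explicitly so the call behaves the same
# regardless of how a given pandas version treats grouping columns in apply.
top = names.groupby(["year", "sex"])[["name", "births"]].apply(get_top)

print(top)
```

The point of the check is simply that apply completes on a frame with a column called 'name' and returns at most n rows per (year, sex) group.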
