Series.describe() fails for empty and None series. #1650

todddeluca · 2012-07-19T21:10:41Z

Running describe() raises an exception for Series([]) and Series([None]). See the following examples:

In [416]: Series([]).describe()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-416-162185f07510> in <module>()
----> 1 Series([]).describe()

/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.8.0rc2-py2.7-macosx-10.4-x86_64.egg/pandas/core/series.pyc in describe(self, percentile_width)
  1363                      'max']
  1364 
-> 1365             data = [self.count(), self.mean(), self.std(), self.min(),
  1366                     self.quantile(lb), self.median(), self.quantile(ub),
  1367                     self.max()]

/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.8.0rc2-py2.7-macosx-10.4-x86_64.egg/pandas/core/series.pyc in min(self, axis, out, skipna, level)
  1067         if level is not None:
  1068             return self._agg_by_level('min', level=level, skipna=skipna)
-> 1069         return nanops.nanmin(self.values, skipna=skipna)
  1070 
  1071     @Substitution(name='maximum', shortname='max',

/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.8.0rc2-py2.7-macosx-10.4-x86_64.egg/pandas/core/nanops.pyc in f(values, axis, skipna, **kwds)
    41                 result = alt(values, axis=axis, skipna=skipna, **kwds)
    42         except Exception:
---> 43             result = alt(values, axis=axis, skipna=skipna, **kwds)
    44 
    45         return result

/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.8.0rc2-py2.7-macosx-10.4-x86_64.egg/pandas/core/nanops.pyc in _nanmin(values, axis, skipna)
    152             result = __builtin__.min(values)
    153     else:
--> 154         result = values.min(axis)
    155 
    156     return _maybe_null_out(result, axis, mask)

ValueError: zero-size array to minimum.reduce without identity

In [388]: s = pandas.Series([None])

In [389]: s.describe()
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-389-97bd840d09f5> in <module>()
----> 1 s.describe()

/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.8.0rc2-py2.7-macosx-10.4-x86_64.egg/pandas/core/series.pyc in describe(self, percentile_width)
  1334 
  1335             objcounts = Counter(self.dropna().values)
-> 1336             top, freq = objcounts.most_common(1)[0]
  1337             data = [self.count(), len(objcounts), top, freq]
  1338 

IndexError: list index out of range

In [395]: s = pandas.Series([np.nan, np.nan])

In [396]: s.describe()
Out[396]: 
count     0
mean    NaN
std     NaN
min     NaN
25%     NaN
50%     NaN
75%     NaN
max     NaN

I have included Series([np.nan]) because its output is affected by the pull request.

After the pull request the output is as follows:

In [3]: Series([]).describe()
Out[3]: count    0

In [4]: Series([None]).describe()
Out[4]: 
count     0
unique    0

In [5]: Series([np.nan]).describe()
Out[5]: count    0

The pull request fixes the ValueError from the call to values.min(axis) and the IndexError from accessing most_common(1)[0] by only returning a small amount of count data when the Series object is empty or contains all NaN or None values. Is this the right approach? It changes the output for Series objects with all NaN values.
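
A minimal sketch of the guard described above, written against plain Python lists rather than pandas internals. The function names `describe_object_values` and `describe_numeric_values` and the helper `_is_missing` are hypothetical and are not the code in the pull request; the sketch only illustrates returning reduced count data when nothing is left after dropping None/NaN.

```python
import math
from collections import Counter


def _is_missing(x):
    # Treat None and float NaN as missing values (assumption for this sketch).
    return x is None or (isinstance(x, float) and math.isnan(x))


def describe_object_values(values):
    """Hypothetical stand-in for the object branch of describe()."""
    present = [v for v in values if not _is_missing(v)]
    counts = Counter(present)
    result = {"count": len(present), "unique": len(counts)}
    # most_common(1) returns [] for an empty Counter, so indexing [0]
    # unconditionally is what raises the IndexError. Guard before unpacking.
    most_common = counts.most_common(1)
    if most_common:
        result["top"], result["freq"] = most_common[0]
    return result


def describe_numeric_values(values):
    """Hypothetical stand-in for the numeric branch of describe()."""
    present = [v for v in values if not _is_missing(v)]
    if not present:
        # Nothing to reduce over: report only the count instead of calling min().
        return {"count": 0}
    return {"count": len(present), "min": min(present), "max": max(present)}
```

With this guard, an input of `[None]` yields only `count` and `unique`, and an empty or all-NaN numeric input yields only `count`, which mirrors the shape of the outputs shown above.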

Test coverage was written but I couldn't figure out how to run the tests. Any hints?

Regards,
Todd

wesm · 2012-07-20T00:09:08Z

Hey Todd-- thanks. Couple points

- For things like this, it's best to work in feature branches, e.g.: http://pandas.pydata.org/developers.html#getting-started-with-git. Otherwise you get merge commits like above in your master branch and it becomes harder to collaborate by pull request.
- For the test suite, use nosetests path/to/test_foo.py or python path/to/test_foo.py. Either way you need nose.

wesm · 2012-07-20T00:12:52Z

I'll cherry-pick your fixes and fix things up

wesm · 2012-07-20T00:29:24Z

I decided to make min/max and everything else return NaN when length 0
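
A rough illustration of that behavior on plain Python lists, not the actual nanops code: the hypothetical helper `nan_min` returns NaN when there is nothing to reduce over, instead of raising.

```python
import math


def nan_min(values):
    # Hypothetical helper: drop None and NaN, then fall back to NaN
    # when no values remain (length 0 or all missing).
    present = [v for v in values if v is not None and not math.isnan(v)]
    return min(present) if present else float("nan")
```

Under this rule, describe() on an empty or all-NaN numeric Series can keep its full set of rows, with NaN in each statistic, as in the Series([np.nan, np.nan]) output shown earlier.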

todddeluca · 2012-07-20T00:47:12Z

Hi Wes,

Thanks for the pointer to the "getting started" link. I'll use a feature
branch next time.

Regards,
Todd

Todd DeLuca
http://todddeluca.com
http://wall.hms.harvard.edu/

Version 0.8.1

* tag 'v0.8.1': (126 commits)
  RLS: Version 0.8.1
  DOC: tweak
  DOC: set_index/reset_index examples
  DOC: doc fixes and what's new in 0.8.1, vectorized string methods
  ENH: better string element access/slicing notation close pandas-dev#1656
  DOC: minor additions to release notes for 0.8.1
  BUG: handle Yahoo! finance returning duplicate dates for prev bus day, doc fixes
  BUG: fix windows/32-bit builds
  BUG: get pandas-dev#1620 fix working on python 3
  ENH: handling of UTF-8 strings in DataFrame columns, close pandas-dev#1620
  TST: span unit test pandas-dev#1635
  TST: skip another @network test if no internet connection
  ENH/BUG: handle tz-aware datetime.datetime in to_datetime, add utc=True option to allow conversion to utc, close pandas-dev#1581
  ENH: hack to not compress single group keys, accelerate single-key and Categorical groupby operations
  BUG: fix merge bug with left joins on length-0 DataFrame, close pandas-dev#1628
  BUG: Series.interpolate bug with method='values' and datetime64[ns], close pandas-dev#1646
  BUG: properly handle None values in dict input to concat, close pandas-dev#1649
  BUG: len-0 Series min/max/describe pandas-dev#1650
  Fix describe() failure for None and empty Series.
  BUG: string date aliases now work with tz-aware time series close pandas-dev#1647
  ...

Todd DeLuca added 3 commits July 13, 2012 09:36

Show idxmin behavior for multiple min values.

d4be1d8

Merge branch 'master' of git://github.com/pydata/pandas

c682c41

Fix describe() failure for None and empty Series.

1a43eff

wesm added a commit that referenced this pull request Jul 20, 2012

BUG: len-0 Series min/max/describe #1650

6a0863f

wesm closed this Jul 20, 2012
