
Series.describe() fails for empty and None series. #1650

Closed
wants to merge 3 commits into from

todddeluca

Running describe() raises an exception for Series([]) and Series([None]). See the following examples:

In [416]: Series([]).describe()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-416-162185f07510> in <module>()
----> 1 Series([]).describe()

/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.8.0rc2-py2.7-macosx-10.4-x86_64.egg/pandas/core/series.pyc in describe(self, percentile_width)
  1363                      'max']
  1364 
-> 1365             data = [self.count(), self.mean(), self.std(), self.min(),
  1366                     self.quantile(lb), self.median(), self.quantile(ub),
  1367                     self.max()]

/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.8.0rc2-py2.7-macosx-10.4-x86_64.egg/pandas/core/series.pyc in min(self, axis, out, skipna, level)
  1067         if level is not None:
  1068             return self._agg_by_level('min', level=level, skipna=skipna)
-> 1069         return nanops.nanmin(self.values, skipna=skipna)
  1070 
  1071     @Substitution(name='maximum', shortname='max',

/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.8.0rc2-py2.7-macosx-10.4-x86_64.egg/pandas/core/nanops.pyc in f(values, axis, skipna, **kwds)
    41                 result = alt(values, axis=axis, skipna=skipna, **kwds)
    42         except Exception:
---> 43             result = alt(values, axis=axis, skipna=skipna, **kwds)
    44 
    45         return result

/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.8.0rc2-py2.7-macosx-10.4-x86_64.egg/pandas/core/nanops.pyc in _nanmin(values, axis, skipna)
    152             result = __builtin__.min(values)
    153     else:
--> 154         result = values.min(axis)
    155 
    156     return _maybe_null_out(result, axis, mask)

ValueError: zero-size array to minimum.reduce without identity
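The first failure comes from reducing a zero-length array. The traceback shows `_nanmin` calling either `__builtin__.min(values)` or NumPy's `values.min(axis)`, and neither has an identity value to return when there is nothing to reduce. A minimal stdlib-only sketch of the same behavior, using Python 3's builtin `min` in place of NumPy (an assumption made to keep the example dependency-free; `try_min` is a hypothetical helper):

```python
def try_min(values):
    """Return (result, error) for min(values) instead of raising."""
    try:
        return min(values), None
    except ValueError as exc:
        # Like NumPy's minimum.reduce on a zero-size array, builtin min()
        # has no identity value, so an empty input cannot be reduced.
        return None, exc


print(try_min([3.0, 1.0, 2.0]))  # (1.0, None)
result, err = try_min([])
print(type(err).__name__)        # ValueError
```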

In [388]: s = pandas.Series([None])

In [389]: s.describe()
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-389-97bd840d09f5> in <module>()
----> 1 s.describe()

/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.8.0rc2-py2.7-macosx-10.4-x86_64.egg/pandas/core/series.pyc in describe(self, percentile_width)
  1334 
  1335             objcounts = Counter(self.dropna().values)
-> 1336             top, freq = objcounts.most_common(1)[0]
  1337             data = [self.count(), len(objcounts), top, freq]
  1338 

IndexError: list index out of range
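The second failure does not involve NumPy at all: once `dropna()` removes the lone `None`, the `Counter` is empty, `most_common(1)` returns an empty list, and indexing `[0]` raises `IndexError`. A guarded version might look like the sketch below. The helper names and the `(None, 0)` fallback are hypothetical choices for illustration, not what pandas does, and `is_missing` only approximates pandas' notion of a missing value.

```python
import math
from collections import Counter


def is_missing(value):
    """Treat None and float NaN as missing (a simplification of pandas' rules)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def top_and_freq(values):
    """Return the most common non-missing value and its count, guarding the empty case."""
    counts = Counter(v for v in values if not is_missing(v))
    most_common = counts.most_common(1)  # [] when every value was missing
    if not most_common:
        # Hypothetical fallback instead of most_common(1)[0], which raises IndexError.
        return None, 0
    return most_common[0]


print(top_and_freq(["a", "b", "a", None]))  # ('a', 2)
print(top_and_freq([None]))                 # (None, 0)
```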

In [395]: s = pandas.Series([np.nan, np.nan])

In [396]: s.describe()
Out[396]: 
count     0
mean    NaN
std     NaN
min     NaN
25%     NaN
50%     NaN
75%     NaN
max     NaN

I have included the all-NaN case (Series([np.nan, np.nan]) above, Series([np.nan]) below) because its output is also changed by the pull request.

After the pull request the output is as follows:

In [3]: Series([]).describe()
Out[3]: count    0

In [4]: Series([None]).describe()
Out[4]: 
count     0
unique    0

In [5]: Series([np.nan]).describe()
Out[5]: count    0

The pull request fixes the ValueError from the call to values.min(axis) and the IndexError from accessing most_common(1)[0] by returning only count data (count, plus unique for object Series) when the Series is empty or contains only NaN or None values. Is this the right approach? It changes the output for Series objects whose values are all NaN.
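The proposed approach can be sketched without pandas as a function that computes the count of valid values first and returns only count-type fields when nothing is left to summarize. This is a hypothetical `describe_sketch` over a plain list: a dict stands in for the returned `Series`, the `numeric` flag stands in for a dtype check, and std and the percentiles are omitted for brevity.

```python
import math
import statistics
from collections import Counter


def is_missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def describe_sketch(values, numeric=True):
    """Mimic the proposed behavior: count-only output when no valid values remain."""
    valid = [v for v in values if not is_missing(v)]
    if numeric:
        if not valid:
            # Skip mean/min/max, which would fail or be undefined on no data.
            return {"count": 0}
        return {
            "count": len(valid),
            "mean": statistics.mean(valid),
            "min": min(valid),
            "max": max(valid),
        }
    counts = Counter(valid)
    if not counts:
        return {"count": 0, "unique": 0}
    top, freq = counts.most_common(1)[0]
    return {"count": len(valid), "unique": len(counts), "top": top, "freq": freq}


print(describe_sketch([]))                     # {'count': 0}
print(describe_sketch([None], numeric=False))  # {'count': 0, 'unique': 0}
print(describe_sketch([float("nan")]))         # {'count': 0}
print(describe_sketch([1.0, 2.0, 3.0]))        # {'count': 3, 'mean': 2.0, 'min': 1.0, 'max': 3.0}
```

The trade-off is that the shape of the result now depends on the data: callers that expect a fixed set of rows (mean, std, min, ...) get a shorter result for empty or all-NaN input.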

I wrote tests for this, but I couldn't figure out how to run the test suite. Any hints?

Regards,
Todd

@wesm
Member

wesm commented Jul 20, 2012

Hey Todd, thanks. A couple of points:

  • For things like this, it's best to work in feature branches, e.g.: http://pandas.pydata.org/developers.html#getting-started-with-git. Otherwise you get merge commits like above in your master branch and it becomes harder to collaborate by pull request.
  • For the test suite, use nosetests path/to/test_foo.py or python path/to/test_foo.py. Either way, you need nose installed.
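As a hedged sketch of a test module that either command could run: nose collects `unittest.TestCase` subclasses, so a file written with the standard-library `unittest` works under `nosetests path/to/test_foo.py`, and the `__main__` guard lets `python path/to/test_foo.py` run it directly. The file name and the `count_valid` helper under test are hypothetical; they only illustrate the "count 0" cases from this pull request.

```python
import unittest


def count_valid(values):
    """Hypothetical helper: count entries that are neither None nor NaN."""
    # A float NaN is not equal to itself, so v == v filters it out.
    return sum(1 for v in values if v is not None and v == v)


class TestCountValid(unittest.TestCase):
    def test_empty(self):
        self.assertEqual(count_valid([]), 0)

    def test_none_only(self):
        self.assertEqual(count_valid([None]), 0)

    def test_nan_only(self):
        self.assertEqual(count_valid([float("nan"), float("nan")]), 0)

    def test_mixed(self):
        self.assertEqual(count_valid([1.0, None, float("nan"), 2.0]), 2)


if __name__ == "__main__":
    # argv is fixed and exit=False so stray command-line flags are ignored
    # and the interpreter is not terminated after the run.
    unittest.main(argv=["test_foo"], exit=False)
```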

@wesm
Member

wesm commented Jul 20, 2012

I'll cherry-pick your fixes and fix things up.

wesm added a commit that referenced this pull request Jul 20, 2012
wesm closed this Jul 20, 2012
@wesm
Member

wesm commented Jul 20, 2012

I decided to make min/max and everything else return NaN when the Series has length 0.
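The behavior described here, returning NaN instead of raising for a length-0 input, can be sketched as a small wrapper around any reduction. This is an illustrative stand-in written with the standard library, not the actual pandas `nanops` code; `nan_reduce` is a hypothetical name.

```python
import math
import statistics


def nan_reduce(func, values):
    """Apply func to the non-NaN values; return NaN if none remain."""
    valid = [v for v in values if not (isinstance(v, float) and math.isnan(v))]
    if not valid:
        # Length-0 (or all-NaN) input: return NaN rather than raising.
        return float("nan")
    return func(valid)


for name, func in [("min", min), ("max", max), ("mean", statistics.mean)]:
    print(name, nan_reduce(func, []), nan_reduce(func, [2.0, float("nan"), 4.0]))
# min nan 2.0
# max nan 4.0
# mean nan 3.0
```

Compared with the count-only output proposed in the pull request, this keeps the shape of describe() fixed: every statistic is present, and the undefined ones are simply NaN.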

@todddeluca
Author

Hi Wes,

Thanks for the pointer to the "getting started" link. I'll use a feature branch next time.

Regards,
Todd


Todd DeLuca
http://todddeluca.com
http://wall.hms.harvard.edu/

yarikoptic added a commit to neurodebian/pandas that referenced this pull request Sep 12, 2012
Version 0.8.1

* tag 'v0.8.1': (126 commits)
  RLS: Version 0.8.1
  DOC: tweak
  DOC: set_index/reset_index examples
  DOC: doc fixes and what's new in 0.8.1, vectorized string methods
  ENH: better string element access/slicing notation close pandas-dev#1656
  DOC: minor additions to release notes for 0.8.1
  BUG: handle Yahoo! finance returning duplicate dates for prev bus day, doc fixes
  BUG: fix windows/32-bit builds
  BUG: get pandas-dev#1620 fix working on python 3
  ENH: handling of UTF-8 strings in DataFrame columns, close pandas-dev#1620
  TST: span unit test pandas-dev#1635
  TST: skip another @network test if no internet connection
  ENH/BUG: handle tz-aware datetime.datetime in to_datetime, add utc=True option to allow conversion to utc, close pandas-dev#1581
  ENH: hack to not compress single group keys, accelerate single-key and Categorical groupby operations
  BUG: fix merge bug with left joins on length-0 DataFrame, close pandas-dev#1628
  BUG: Series.interpolate bug with method='values' and datetime64[ns], close pandas-dev#1646
  BUG: properly handle None values in dict input to concat, close pandas-dev#1649
  BUG: len-0 Series min/max/describe pandas-dev#1650
  Fix describe() failure for None and empty Series.
  BUG: string date aliases now work with tz-aware time series close pandas-dev#1647
  ...