Dropping non-finite entries #7314

amelio-vazquez-reina · 2014-06-02T15:58:40Z

I have been looking for a solution for this for a long time. I tried the ideas in the following threads (with the latest Pandas):

but none of them work. See thread 2 above, and the comments in its only answer to see why.

What is a good way to drop indices (either rows or columns) that meet a specific criterion, such as "they contain entries that are not finite"?


cpcloud · 2014-06-02T16:03:41Z

You can use the option mode.use_inf_as_null to do this:

In [14]: df = DataFrame({'a': randint(3,size=10)})

In [15]: df['b'] = tm.choice([2,3,nan,inf,-inf], size=len(df))

In [16]: df
Out[16]:
   a       b
0  1     inf
1  2    -inf
2  0  3.0000
3  1    -inf
4  2     NaN
5  1  3.0000
6  1     inf
7  0  2.0000
8  2    -inf
9  2     inf

In [17]: with pd.option_context('mode.use_inf_as_null', True):
   ....:     res = df.dropna()
   ....:

In [18]: res
Out[18]:
   a  b
2  0  3
5  1  3
7  0  2
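For an all-numeric frame, a rough equivalent that does not rely on the option is to build a boolean mask with `np.isfinite` and keep only the rows (or columns) where every entry is finite. This is a minimal sketch with made-up data; the names `finite`, `rows_kept`, and `cols_kept` are illustrative only.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (made-up values, not the frame shown above).
df = pd.DataFrame({
    "a": [1, 2, 0, 1],
    "b": [np.inf, 3.0, np.nan, -np.inf],
})

# np.isfinite is False for NaN, inf and -inf alike.
finite = np.isfinite(df)

# Keep rows in which every entry is finite.
rows_kept = df[finite.all(axis=1)]

# Keep columns in which every entry is finite.
cols_kept = df.loc[:, finite.all(axis=0)]
```

Note that this treats NaN as non-finite too, so it matches the `dropna` result under the option only in the sense that both NaN and inf rows go away.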

jreback · 2014-06-02T16:03:48Z

Well, using the example from thread 2:

In [81]: x = pandas.DataFrame([
   ....:     [1, 2, np.inf],
   ....:     [4, np.inf, 5],
   ....:     [6, 7, 8]
   ....: ])

In [82]: x
Out[82]: 
   0         1         2
0  1  2.000000       inf
1  4       inf  5.000000
2  6  7.000000  8.000000

In [84]: np.isinf(x)
Out[84]: 
       0      1      2
0  False  False   True
1  False   True  False
2  False  False  False

In [85]: x[np.isinf(x)] = np.nan

In [86]: x.dropna()
Out[86]: 
   0  1  2
2  6  7  8

In [87]: x
Out[87]: 
   0   1   2
0  1   2 NaN
1  4 NaN   5
2  6   7   8

isn't this what you want?

(it's only slightly trickier to NOT convert the existing NaNs if you have any)
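One way to leave existing NaNs alone, sketched here for a numeric frame, is to skip the conversion and filter on the `np.isinf` mask directly. The data below is a variation on the example above with one NaN added; `has_inf` and `res` are illustrative names.

```python
import numpy as np
import pandas as pd

# Variation on the frame above: row 0 has an inf, row 1 has a NaN.
x = pd.DataFrame([
    [1, 2, np.inf],
    [4, np.nan, 5],
    [6, 7, 8],
])

# True for rows containing +inf or -inf; NaN is not inf, so it is ignored.
has_inf = np.isinf(x).any(axis=1)

# Drop only the inf rows; x itself is not modified.
res = x[~has_inf]
```

Here the row holding the NaN survives, and no values in `x` are overwritten.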

jreback · 2014-06-02T16:04:28Z

ahhh yes...forgot about the use_inf_as_null option... +1 on that!

cpcloud · 2014-06-02T16:07:50Z

curious that inf makes the numbers in the Series do %.2f repr instead of a %.2g-style repr, is that intentional?

amelio-vazquez-reina · 2014-06-02T17:17:32Z

Thanks @cpcloud and @jreback . Any way to just drop Inf (and non-Inf) entries when working with dfs with mixed types?

cpcloud · 2014-06-02T17:25:14Z

what do you mean inf and non-inf? isn't that everything?

cpcloud · 2014-06-02T17:28:56Z

oh i see ... because isfinite doesn't work on object dtypes

cpcloud · 2014-06-02T17:32:25Z

seems like a bug, dropna doesn't work on inf when dtypes are mixed and mode.use_inf_as_null is True

amelio-vazquez-reina · 2014-06-02T17:33:50Z

Thanks @cpcloud. Yes, I have run into the problem you just mentioned before. Also, sometimes I just want to drop Inf and -Inf values (keeping NaNs untouched).

cpcloud · 2014-06-02T17:37:31Z

that's a bit of a strange use case. i would suggest something like replacing nan with a string like nan_str or something, then dropping inf/-inf with isnull (once I fix this), then replacing the nan_str back with nan

cpcloud · 2014-06-02T17:37:56Z

or you could use replace
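A sketch of the `replace` route on a frame with an object column, assuming it is acceptable for rows with NaN to be dropped along with the inf rows. The frame and column names are made up.

```python
import numpy as np
import pandas as pd

# Mixed dtypes: "c" is an object column that happens to hold an inf.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [np.inf, 2.0, 3.0],
    "c": ["x", "y", np.inf],
})

# Turn +inf/-inf into NaN everywhere, then drop any row that has a NaN.
res = df.replace([np.inf, -np.inf], np.nan).dropna()
```

Because the replacement happens on a copy, `df` keeps its original inf values.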

cpcloud · 2014-06-02T17:48:42Z

@ribonoous i put up the fix if you want to check it out

hayd · 2014-06-02T17:52:00Z

> or you could use replace

Is the answer to everything.

should isnull/dropna do infs by default?? It seems like they may sometimes have special meaning (different from NaN).

cpcloud · 2014-06-02T18:01:50Z

:) replace is that person who raises their hand to answer every question whether or not they know the answer. Judging by the way things are named I would guess that this used to be the default but for some reason was changed.

jreback · 2014-06-02T18:07:07Z

changed here: http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#v0-10-0-december-17-2012 (look down a bit). Not treating inf as nan makes sense as the default: it's pretty simple to convert them if needed, and treating them as nan is semantically wrong (they are actual values).

cpcloud · 2014-06-02T18:20:44Z

we could have an isinf that handles object dtype, but i'm not really sure how widely used inf is ... personally I almost never use it and when I do, it'll eventually be replaced by 0 or NaN or something else that's easy(ier) to work with.

hayd · 2014-06-02T18:23:39Z

You can do applymap(np.isinf) or df.where(df.applymap(np.isinf)...

If perf is the issue convert to float!
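A sketch of the elementwise idea for object columns. Calling `np.isinf` directly on a string raises a `TypeError`, so this uses a small guard function (`is_inf`, a hypothetical helper, not a pandas or NumPy API) and applies it column by column with `Series.map` instead of `applymap`; that substitution is my own choice for illustration.

```python
import numpy as np
import pandas as pd

def is_inf(value):
    """Hypothetical helper: True only for float +inf/-inf, False for anything else."""
    return isinstance(value, (float, np.floating)) and bool(np.isinf(value))

# "a" is an object column mixing strings and an inf; "b" is plain float.
df = pd.DataFrame({
    "a": ["x", np.inf, "z"],
    "b": [1.0, 2.0, -np.inf],
})

# Elementwise mask, built one column at a time.
mask = df.apply(lambda col: col.map(is_inf))

# Keep rows with no inf in any column.
res = df[~mask.any(axis=1)]
```

This runs Python-level code per element, so for large frames converting the relevant columns to float first is likely to be much faster.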

jreback · 2014-06-02T18:33:25Z

easy enough to use df._get_numeric_data()

fyi, maybe we should make a method (needs a better name maybe)

df.get_for_dtypes(list_of_dtypes), where list_of_dtypes could be actual dtypes and/or numeric/datetime
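A sketch of the "restrict to numeric columns first" idea, assuming `DataFrame.select_dtypes` is available in your pandas version (it stands in here for the private `_get_numeric_data`). The frame is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["u", "v", "w"],   # non-numeric column, left out of the check
    "x": [1.0, np.inf, 3.0],
    "n": [1, 2, 3],
})

# Only the numeric columns take part in the finiteness test.
numeric = df.select_dtypes(include=[np.number])

# Keep the full rows (all columns) whose numeric entries are all finite.
res = df[np.isfinite(numeric).all(axis=1)]
```

The non-numeric columns are carried through untouched, which sidesteps the object-dtype problem as long as the inf values live in numeric columns.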

TomAugspurger · 2014-06-02T19:04:37Z

@jreback I could use something like that in #7308 for [numeric, datetime]

cpcloud mentioned this issue Jun 2, 2014

BUG: isnull doesn't properly check for inf when requested #7315

Merged

cpcloud added Bug labels Jun 2, 2014

cpcloud added this to the 0.14.1 milestone Jun 2, 2014

cpcloud self-assigned this Jun 2, 2014

jreback mentioned this issue Jun 2, 2014

API: select_dtypes #7316

Closed

cpcloud closed this as completed in #7315 Jun 3, 2014

wesm unassigned cpcloud Oct 12, 2016
