Add boolean masking operation for DataFrame #2109

Closed
jreback opened this issue Oct 24, 2012 · 0 comments
jreback commented Oct 24, 2012

import pandas
import numpy as np

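# define the mask function

def mask(self, condition):
    # Work on a copy so the original frame is left untouched.
    new_self = self.copy()
    # Set every cell where the condition is False to NaN; cells where it is True are kept.
    new_self[~condition.values] = np.nan
    return new_self

# Attach the function as a DataFrame method.
pandas.DataFrame.mask = mask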

index = pandas.date_range('1/1/2000', periods=8)
columns = ['A','B','C','D']
df = pandas.DataFrame(np.random.randn(len(index),len(columns)),index=index,columns=columns)
print(df)
                   A         B         C         D
2000-01-01  0.752832  0.083465 -0.273210  1.128781
2000-01-02  0.895254  0.401056  1.473770  1.998924
2000-01-03  2.318820  0.384354 -1.056422 -1.280257
2000-01-04  0.981042  0.717762 -1.015285 -1.146636
2000-01-05 -0.979061 -1.765188  0.025436 -0.815622
2000-01-06 -0.166251  1.887524 -0.131171 -0.802795
2000-01-07  0.025936  0.122587  0.517295  0.589679
2000-01-08  0.691059  0.458683 -0.856201 -0.412374

# pandas supports boolean indexing for Series

s = pandas.Series(np.random.randn(len(index)),index=index)
print(s)
print(s[s<0])
2000-01-01   -0.182340
2000-01-02    0.031729
2000-01-03    0.616713
2000-01-04   -0.329961
2000-01-05   -1.220345
2000-01-06   -1.323948
2000-01-07    1.182522
2000-01-08   -0.622332
Freq: D

2000-01-01   -0.182340
2000-01-04   -0.329961
2000-01-05   -1.220345
2000-01-06   -1.323948
2000-01-08   -0.622332
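
A small reproducible version of the Series case, using made-up fixed values (an assumption, in place of the random data above). It is only a sketch of the point that boolean indexing on a Series drops the rows that fail the condition, so the result is shorter than the input:

```python
import pandas as pd

# Hypothetical fixed values so the result is reproducible.
idx = pd.date_range("2000-01-01", periods=4)
s = pd.Series([-1.0, 2.0, -3.0, 4.0], index=idx)

# Boolean indexing keeps only the rows where the condition is True.
negatives = s[s < 0]
print(negatives)
```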

# but not directly for DataFrame.
# The mask function above enables this as a convenient operation:
# df[df < 0] currently returns a numpy array, which is correct but not that useful.

print(df.mask(df<0))
                   A         B         C         D
2000-01-01       NaN -1.518799 -0.574630       NaN
2000-01-02 -1.023108       NaN       NaN -0.009226
2000-01-03       NaN -0.623582       NaN -1.801656
2000-01-04 -0.984583       NaN -1.082821       NaN
2000-01-05 -0.709460 -1.202316       NaN -0.484609
2000-01-06 -0.775715       NaN -0.415970       NaN
2000-01-07 -1.395435       NaN -0.293588       NaN
2000-01-08 -0.377900 -0.526218 -0.660083       NaN
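
For reference, the same idea as a standalone sketch on a tiny frame with made-up fixed values, so the result can be checked by eye. The helper name `keep_where` is hypothetical and is written as a plain function rather than patched onto `DataFrame`. Later pandas releases provide `DataFrame.where`, which keeps values where the condition is True, and a built-in `DataFrame.mask` with the opposite convention, so the comparison at the end assumes a release that has `where`:

```python
import numpy as np
import pandas as pd

def keep_where(frame, condition):
    """Return a copy of `frame` with cells set to NaN where `condition` is False."""
    result = frame.copy()
    result[~condition.values] = np.nan
    return result

# Hypothetical fixed values instead of random data.
df = pd.DataFrame({"A": [1.0, -2.0], "B": [-3.0, 4.0]})

# Negative cells are kept, everything else becomes NaN; the shape is unchanged.
masked = keep_where(df, df < 0)
print(masked)

# Compare against DataFrame.where, which follows the same keep-where-True convention.
same = masked.equals(df.where(df < 0))
print(same)
```

Unlike `s[s < 0]` on a Series, the result keeps the original index and columns, which is what makes it usable in further aligned arithmetic.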