Closed
Labels
Performance: Memory or execution speed performance
Description
import numpy as np
import pandas as pd
import pandas.testing as tm  # pandas.util.testing in older releases

data = np.random.randn(10000, 500)
df = pd.DataFrame(data)
df = df.where(df > 0)  # replace non-positive values with NaN
bools = df > 0
reg_mask = bools.mask(pd.isnull(df))  # walltime 10s
float_mask = bools.astype(float).mask(pd.isnull(df))  # walltime 200ms
tm.assert_frame_equal(reg_mask, float_mask)  # success
I found that when masking boolean DataFrames, converting to float first speeds up the operation considerably (roughly 10s versus 200ms in the example above). When at least one row is masked in each column, the outputs are equivalent.
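For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It assumes a smaller frame than the one above so it finishes quickly, and `time_mask` is a hypothetical helper written for this illustration, not part of pandas. Actual timings will depend on the pandas version and the machine, so the script only prints whatever it measures.

```python
import time

import numpy as np
import pandas as pd


def time_mask(frame, cond):
    """Hypothetical helper: run frame.mask(cond) and return (result, seconds)."""
    start = time.perf_counter()
    result = frame.mask(cond)
    return result, time.perf_counter() - start


# Smaller than the 10000 x 500 frame in the report (an assumption, to keep this quick).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((2000, 50)))
df = df.where(df > 0)  # replace non-positive values with NaN
nulls = df.isna()
bools = df > 0

# Mask the boolean frame directly, then mask a float copy of it.
reg_mask, t_bool = time_mask(bools, nulls)
float_mask, t_float = time_mask(bools.astype(float), nulls)

print(f"bool mask:  {t_bool:.4f}s")
print(f"float mask: {t_float:.4f}s")
```

Both results should have NaN in exactly the positions where `df` is null; only the dtype handling along the way differs.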
Note that the mask is slow even when only one column is being masked:
data = np.random.randn(10000, 500)
df = pd.DataFrame(data)
df.iloc[:, 0] = np.nan  # set the entire first column to NaN
bools = df > 0
reg_mask = bools.mask(pd.isnull(df)) # walltime 10s
# dtypes: bool(499), float64(1)