In this notebook, I show a simple way to speed up **diff** and **rolling** feature extraction on grouped data.

In [None]:
import numpy as np
import pandas as pd

df = pd.read_csv('../input/ventilator-pressure-prediction/train.csv')

# diff

In [None]:
%%time

# Normal code
lag = 1
df[f'normal_diff{lag}_u_in'] = df.groupby('breath_id')['u_in'].diff(lag)

In [None]:
%%time

# Speed up code
lag = 1
shift_u_in = df.groupby('breath_id')['u_in'].shift(lag)
df[f'speedup_diff{lag}_u_in'] = df['u_in'] - shift_u_in

In [None]:
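# Largest absolute difference between the two versions (a signed max would hide negative deviations)
(df[f'normal_diff{lag}_u_in'] - df[f'speedup_diff{lag}_u_in']).abs().max()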
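To see why the two versions agree, here is a minimal sketch on a made-up toy frame (the values and the names `toy`, `normal`, `speedup`, and `diff_matches` are only for illustration, not part of the competition data). It assumes rows are ordered in time within each `breath_id`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train.csv: two breaths with three time steps each (made-up values).
toy = pd.DataFrame({
    'breath_id': [1, 1, 1, 2, 2, 2],
    'u_in': [0.0, 1.5, 4.0, 10.0, 7.0, 7.5],
})

lag = 1
grouped = toy.groupby('breath_id')['u_in']

normal = grouped.diff(lag)                   # built-in grouped diff
speedup = toy['u_in'] - grouped.shift(lag)   # current value minus the grouped shift

# The first `lag` rows of each breath are NaN in both versions,
# so compare with equal_nan=True instead of relying on max().
diff_matches = np.allclose(normal, speedup, equal_nan=True)
print(diff_matches)
```

Both columns hold `x[t] - x[t - lag]` within each breath, so any timing gap comes from how pandas computes them rather than from a different result.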

# rolling

In [None]:
%%time

# Normal code
lag = 5
# Drop only the breath_id index level so the result aligns with df's original index
df[f'normal_windowmean{lag}_u_in'] = df.groupby('breath_id')['u_in'] \
                                .rolling(window=lag, min_periods=1).mean() \
                                .reset_index(level=0, drop=True)

In [None]:
%%time

# Speed up code
lag = 5
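grouped_u_in = df.groupby('breath_id')['u_in']  # build the groupby once and reuse it
tmp_df = pd.DataFrame()
for i in range(lag):
    # Column i holds u_in shifted back i steps within each breath (NaN before the breath starts)
    tmp_df[f'tmp_shift{i}'] = grouped_u_in.shift(i)

# mean(axis=1) skips NaN, which matches min_periods=1 in the rolling version
df[f'speedup_windowmean{lag}_u_in'] = tmp_df.mean(axis=1)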

In [None]:
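# Largest absolute difference between the two versions (expect 0 up to floating-point rounding)
(df[f'normal_windowmean{lag}_u_in'] - df[f'speedup_windowmean{lag}_u_in']).abs().max()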
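The same equivalence can be checked by hand on a small example. The sketch below uses a made-up toy frame and a window of 3 (the names `toy`, `window`, `shifted`, and `rolling_matches` are hypothetical). It assumes `u_in` has no missing values, so the only NaN entries are the ones the shifts introduce at the start of each breath.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train.csv: two breaths with four time steps each (made-up values).
toy = pd.DataFrame({
    'breath_id': [1, 1, 1, 1, 2, 2, 2, 2],
    'u_in': [0.0, 2.0, 4.0, 6.0, 10.0, 8.0, 6.0, 4.0],
})

window = 3
grouped = toy.groupby('breath_id')['u_in']

# Built-in grouped rolling mean; drop the breath_id level to get back the original index.
normal = (grouped.rolling(window=window, min_periods=1).mean()
          .reset_index(level=0, drop=True))

# Shift-based version: one column per offset 0..window-1, then a row-wise mean.
shifted = pd.concat([grouped.shift(i) for i in range(window)],
                    axis=1, keys=list(range(window)))
speedup = shifted.mean(axis=1)  # skips NaN, like min_periods=1

rolling_matches = np.allclose(normal.sort_index(), speedup)
print(rolling_matches)
```

Each row of `shifted` contains the current value and the previous `window - 1` values from the same breath, so its row-wise mean is the rolling mean. The cost grows with the window size because one shifted column is built per offset, so the speed advantage is likely to shrink for large windows.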

Thank you for taking a look at this notebook. If there is anything else that can be improved, please let me know in the comments.