## Sliding windows
This is the final step of feature engineering. It should be run after feature extraction and after one-hot encoding.

### Some libraries

In [1]:
import pandas as pd
import numpy as np
import time
from scipy import signal
from modules.feature_extraction import *
import warnings

### Importing DataFrame 'one-hot-enc'

In [2]:
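# Forward slashes keep the path portable across operating systems
df = pd.read_csv('datasets/one-hot-enc.csv')
# Keep the labels aside, then drop the label columns from the features
LIP = df.LIP
df = df.drop(columns=['LIP_SCORE', 'LIP'])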
df.head()

Unnamed: 0,PDB_ID,CHAIN_ID,RES_ID,REL_ASA,PHI,PSI,NH_O_1_relidx,NH_O_1_energy,O_NH_1_relidx,O_NH_1_energy,...,MC_SC,NO_EDGE_LOC,SC_MC,SC_SC,HBOND,IAC,NO_EDGE_TYPE,PIPISTACK,VDW,CHAIN_LEN
0,1cee,A,1,1.0,360.0,97.6,0.0,0.0,2.0,-0.3,...,0,1,0,0,0,0,1,0,0,179
1,1cee,A,2,0.348485,-91.3,147.8,48.0,-0.1,50.0,-1.7,...,0,0,0,2,2,0,0,0,1,179
2,1cee,A,3,0.387324,-142.6,136.7,-2.0,-0.3,50.0,-0.2,...,0,1,0,0,0,0,1,0,0,179
3,1cee,A,4,0.005917,-94.3,154.4,48.0,-1.9,50.0,-1.6,...,0,0,0,3,2,0,0,0,3,179
4,1cee,A,5,0.346341,-112.5,70.5,-2.0,-0.2,71.0,-1.6,...,0,0,1,0,1,0,0,0,1,179


### Apply sliding windows

Use an odd window size so that each window has a central residue; with an even size, extra NaN values may be generated.
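The implementation of `sliding_windows` is not shown in this notebook. As a rough illustration of why an odd size is convenient, here is a minimal sketch that assumes a centred, Gaussian-weighted mean (the `std` argument suggests this, but it is an assumption). The helpers `gaussian_weights` and `smooth_centered` are hypothetical names, and the reflection padding at the edges is a choice made only for this example.

```python
import numpy as np

def gaussian_weights(window, std):
    """Normalized Gaussian weights centred on the middle of the window."""
    offsets = np.arange(window) - (window - 1) / 2
    weights = np.exp(-0.5 * (offsets / std) ** 2)
    return weights / weights.sum()

def smooth_centered(values, window=7, std=1.0):
    """Gaussian-weighted mean around each position (hypothetical helper).

    The sequence is padded by reflection with window // 2 values per side.
    """
    half = window // 2
    padded = np.pad(np.asarray(values, dtype=float), half, mode="reflect")
    return np.convolve(padded, gaussian_weights(window, std), mode="valid")

x = np.arange(10.0)
# Odd window: one output per input. Even window: the output is misaligned.
print(len(smooth_centered(x, window=7)), len(smooth_centered(x, window=6)))  # 10 11
```

With `window=7` every position has three neighbours on each side, so the output lines up with the input one to one. With `window=6` there is no central position and the output no longer matches the input length; how the project's own function handles that case may differ.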

In [8]:
df_slided = sliding_windows(data=df, window=7, std=1, get_time=True, ignore_warnings=True)
df_slided.head()

24.9086856842041


Unnamed: 0,PDB_ID,CHAIN_ID,RES_ID,REL_ASA,PHI,PSI,NH_O_1_relidx,NH_O_1_energy,O_NH_1_relidx,O_NH_1_energy,...,MC_SC,NO_EDGE_LOC,SC_MC,SC_SC,HBOND,IAC,NO_EDGE_TYPE,PIPISTACK,VDW,CHAIN_LEN
0,1cee,A,1,0.189104,29.538351,45.893053,8.393152,-0.188069,11.042499,-0.341764,...,0.0,0.181524,0.0,0.441171,0.352937,0.0,0.181524,0.0,0.352937,64.080718
1,1cee,A,2,0.171384,1.743089,47.714779,8.533527,-0.114541,13.773902,-0.350905,...,0.0,0.174881,0.001587,0.401716,0.364636,0.0,0.174881,0.0,0.260446,64.080718
2,1cee,A,3,0.112079,-31.954699,49.379458,8.146105,-0.22589,17.377634,-0.358541,...,0.0,0.162191,0.019334,0.439584,0.372271,0.0,0.162191,0.0,0.372271,64.080718
3,1cee,A,4,0.07274,-38.616482,45.319353,8.477671,-0.353073,19.677972,-0.480324,...,0.0,0.088234,0.086647,0.486572,0.451283,0.0,0.088234,0.0,0.553886,64.080718
4,1cee,A,5,0.058043,-36.939893,38.438409,9.499458,-0.379061,21.324981,-0.6888,...,0.0,0.019334,0.142857,0.352937,0.513541,0.0,0.019334,0.0,0.494207,64.080718


### NaN values
If there are n NaN values before applying sliding windows, afterwards there will be at most:

    (NaN values) = n * window_size

This is because a NaN anywhere in a window sets the feature of the central residue to NaN as well, so each original NaN affects every window that contains it. The actual count is lower when the windows around nearby NaN values overlap.
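The propagation can be reproduced on a toy sequence. This is a sketch using a plain centred mean rather than the project's `sliding_windows`; the helper `windowed_mean` and the edge padding are assumptions made for the example.

```python
import numpy as np

def windowed_mean(values, window):
    """Plain centred mean (hypothetical helper); edges are padded with the border value."""
    half = window // 2
    padded = np.pad(np.asarray(values, dtype=float), half, mode="edge")
    kernel = np.full(window, 1.0 / window)
    return np.convolve(padded, kernel, mode="valid")

values = np.arange(20, dtype=float)
values[10] = np.nan  # a single missing value in the middle of the sequence

smoothed = windowed_mean(values, window=7)
nan_before = int(np.isnan(values).sum())
nan_after = int(np.isnan(smoothed).sum())
print(nan_before, nan_after)  # 1 7
```

One NaN becomes seven because positions 7 to 13 all have index 10 inside their window. Two NaN values closer than a window apart would share some of those positions, which is why the total stays below n * window_size in practice.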

In [11]:
# Count missing REL_ASA values before and after the sliding windows
NaN_before = df.REL_ASA.isna().sum()
NaN_after = df_slided.REL_ASA.isna().sum()
print('Numbers of NaN in REL_ASA before sliding windows: {}'.format(NaN_before))
print('Numbers of NaN in REL_ASA after sliding windows: {}'.format(NaN_after))

Numbers of NaN in REL_ASA before sliding windows: 46
Numbers of NaN in REL_ASA after sliding windows: 279
