## Feature Engineering for Time Series Data

### Time Series

A time series is a sequence of data points, typically measured at successive points in time. Mathematically, it is defined as a set of vectors *f(t)*, *t* = 0, 1, 2, ..., where *t* represents the time elapsed and *f(t)* is treated as a random variable. The measurements in a time series are arranged in chronological order.

   * A time series containing records of a single variable is termed **univariate**.
   * A time series containing records of more than one variable is referred to as **multivariate**.


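A time series can be discrete (observations recorded at distinct points in time, such as hourly meter readings) or continuous (observations recorded continuously over an interval).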
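In pandas, a univariate series maps naturally to a `Series` indexed by time stamps, and a multivariate series to a `DataFrame` with one column per variable. A minimal sketch with made-up daily readings (the names `temperature` and `humidity` are purely illustrative):

```python
import pandas as pd

# Daily time stamps used as the time index t = 0, 1, 2, ...
index = pd.date_range("2021-01-01", periods=4, freq="D")

# Univariate: one variable recorded over time
temperature = pd.Series([20.1, 20.4, 21.0, 21.3], index=index, name="temperature")

# Multivariate: several variables recorded at the same time stamps
weather = pd.DataFrame(
    {"temperature": [20.1, 20.4, 21.0, 21.3],
     "humidity": [0.55, 0.53, 0.50, 0.48]},
    index=index,
)

print(temperature.ndim)  # one dimension: a single variable
print(weather.shape)     # (time steps, variables)
```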

### Creating Features from Time

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import seaborn as sns
from pandas.tseries.holiday import USFederalHolidayCalendar as calendar

# Default figure size for all plots in this notebook
plt.rcParams['figure.figsize'] = (15.0, 8.0)

In [None]:
energy_data = pd.read_csv("../input/house-hold-energy-data/D202.csv")

In [None]:
energy_data["DATE_TIME"] = pd.to_datetime(energy_data.DATE + " " + energy_data["END TIME"])
energy_data = energy_data[["DATE_TIME","USAGE"]]

In [None]:
energy_data.head()

### Determine Type of Day

In [None]:
# dayofweek runs from Monday=0 to Sunday=6: 1 = weekend (Sat/Sun), 0 = weekday
energy_data["DAY_TYPE"] = (energy_data.DATE_TIME.dt.dayofweek >= 5).astype(int)

In [None]:
energy_data.head()
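Beyond the weekday/weekend flag, the `.dt` accessor exposes other calendar components that are often useful as features. A minimal sketch on a synthetic week of readings; `DATE_TIME` and `DAY_TYPE` mirror this notebook's columns, while `HOUR`, `DAY_OF_WEEK` and `MONTH` are illustrative additions. Since `dayofweek` runs from Monday = 0 to Sunday = 6, the weekend is `dayofweek >= 5`:

```python
import pandas as pd

# One reading per day at 18:15 for a week starting Monday 2021-03-01
df = pd.DataFrame(
    {"DATE_TIME": pd.date_range("2021-03-01 18:15", periods=7, freq="D")}
)

ts = df["DATE_TIME"].dt
df["HOUR"] = ts.hour              # 0-23
df["DAY_OF_WEEK"] = ts.dayofweek  # Monday=0 ... Sunday=6
df["MONTH"] = ts.month            # 1-12
# 1 for Saturday/Sunday, 0 for Monday-Friday
df["DAY_TYPE"] = (ts.dayofweek >= 5).astype(int)

print(df)
```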

### Impact of Holidays

In [None]:
cal = calendar()
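# Normalize the bounds so a holiday on the first day is not cut off
holidays = cal.holidays(start=energy_data.DATE_TIME.min().normalize(),
                        end=energy_data.DATE_TIME.max())
# holidays() returns midnight time stamps, so compare on the calendar day
energy_data["IS_HOLIDAY"] = energy_data.DATE_TIME.dt.normalize().isin(holidays)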

In [None]:
energy_data.head()
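`USFederalHolidayCalendar.holidays()` returns midnight time stamps, so a reading taken at, say, 08:15 on a holiday only matches once its time stamp is truncated to midnight with `.dt.normalize()`. A minimal sketch on three made-up readings (`EXACT_MATCH` is an illustrative column added only for comparison):

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

readings = pd.DataFrame({"DATE_TIME": pd.to_datetime(
    ["2019-12-24 08:15", "2019-12-25 08:15", "2019-12-25 00:00"])})

# Federal holidays in December 2019 (only Christmas Day, 2019-12-25)
holidays = USFederalHolidayCalendar().holidays(start="2019-12-01", end="2019-12-31")

# Comparing full time stamps only matches readings taken exactly at midnight
readings["EXACT_MATCH"] = readings["DATE_TIME"].isin(holidays)
# Truncating to the calendar day matches every reading on the holiday
readings["IS_HOLIDAY"] = readings["DATE_TIME"].dt.normalize().isin(holidays)

print(readings)
```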

### Exercise

Create a feature that represents the part of the day: Morning (M), Noon (N), Evening (E) and Night (NT).


### Target from Time

Sometimes we have to create target variables from time itself, for example the time remaining until a machine fails. This is required in use cases such as **Survival Models/Predictive Maintenance**.
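As a toy illustration of a target derived from time, the sketch below computes a time-to-failure for each observation as the time remaining until the next recorded failure, using `pd.merge_asof` with `direction="forward"`. The column names `time` and `fault_time` and all numbers are made up; `merge_asof` requires both frames to be sorted by their keys.

```python
import pandas as pd

# Sensor observations (time in seconds) and the times at which failures occurred
obs = pd.DataFrame({"time": [0, 10, 20, 30, 40]})
faults = pd.DataFrame({"fault_time": [25, 45]})

# For each observation, find the first failure at or after its time stamp
labelled = pd.merge_asof(obs.sort_values("time"),
                         faults.sort_values("fault_time"),
                         left_on="time", right_on="fault_time",
                         direction="forward")

# Target: remaining time until that failure
labelled["time_to_failure"] = labelled["fault_time"] - labelled["time"]
print(labelled)
```

Observations recorded after the last failure have no forward match and receive `NaN`, so they usually need to be dropped or treated as censored in a survival model.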

In [None]:
phm_raw = pd.read_csv("../input/phm-2018/05_M02_DC_train.csv")
phm_raw.head(2)

In [None]:
phm_raw.shape

In [None]:
phm_tgt = pd.read_csv("../input/phm-2018/05_M02_train_fault_data.csv")
phm_tgt.head(2)

In [None]:
phm_tgt.shape

In [None]:
phm_joined = pd.merge(phm_raw,
                      phm_tgt,
                      how='left',
                      on=['time', 'Tool'])

In [None]:
phm_joined.head(5)

In [None]:
phm_joined['time_stamp'] = pd.to_datetime(phm_joined.time,unit='s')

In [None]:
from IPython.display import Image, display

In [None]:
display(Image("../input/pic-nb/1_EXwJYgnAok6XLu1x1l3V_g.png"))


### Exercise

Create a time-to-failure target based on the failure time.

Tip!

For each observation in the uptime duration, subtract its time stamp from the time of the next failure!


# Let's Nuke the Plan!!!

Data Source - https://www.kaggle.com/imeintanis/collision-detection-ai-using-vibration-data 

In [None]:
train_data = pd.read_csv("/kaggle/input/collision-detection-ai-using-vibration-data/train_features.csv")
train_target = pd.read_csv("/kaggle/input/collision-detection-ai-using-vibration-data/train_target.csv")

In [None]:
import numpy as np
import sklearn as sl
import scipy as sp
from tqdm import tqdm

In [None]:
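def plot_data(acceleration_df: pd.DataFrame, features: list, title: str) -> None:
    """ Plot the acceleration data
        :params acceleration_df: acceleration data for one id
        :params features: up to four column names, one subplot each
        :params title: figure title, also used as a prefix for subplot titles
    """

    fig = plt.figure(figsize=(10, 6))
    fig.suptitle(title)

    for idx, feature in enumerate(features):
        # 2 x 2 grid, so at most four features fit
        ax = fig.add_subplot(2, 2, idx + 1)
        acceleration_df[feature].plot(kind='line',
                                      title=title + " " + feature,
                                      ax=ax)

    # Lay out the subplots after they exist so the titles do not overlap
    fig.tight_layout()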

In [None]:
feats_to_plot = ["S1","S2","S3", "S4"]
plot_data(train_data[train_data.id == 0], feats_to_plot, "Acceleration Params")

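## What Is the Challenge Here?

One record != one sample: each `id` is spread over many rows (one per time step), so the rows belonging to an `id` have to be turned into a single sample before modelling.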

In [None]:
train_data[train_data.id == 1]
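One way to make one sample correspond to one `id` is to reshape the rows. A minimal sketch, assuming every `id` has the same number of time steps, stacks the four sensor columns into an array of shape `(n_ids, n_steps, n_sensors)`, the layout typically fed to sequence models. The toy frame and its `step` column are made up for illustration:

```python
import numpy as np
import pandas as pd

sensors = ["S1", "S2", "S3", "S4"]
n_ids, n_steps = 3, 5
rng = np.random.default_rng(0)

# Toy long-format data: one row per (id, time step), like train_data
toy = pd.DataFrame({
    "id": np.repeat(np.arange(n_ids), n_steps),
    "step": np.tile(np.arange(n_steps), n_ids),
})
for s in sensors:
    toy[s] = rng.normal(size=len(toy))

# Several rows per id: this is why one record != one sample
print(toy.groupby("id").size())

# Sort so each id's rows are contiguous and in time order, then reshape
ordered = toy.sort_values(["id", "step"])
samples = ordered[sensors].to_numpy().reshape(n_ids, n_steps, len(sensors))
print(samples.shape)
```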

## How to approach Feature Engineering

### Fourier Transform

One of the prominent ways to approach signal data is to apply a Fourier transform to it, which converts each signal from the time domain to the frequency domain. The magnitudes of the Fourier coefficients can then be used as features for training a model.
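To see what the Fourier magnitudes capture, the sketch below transforms a synthetic 5 Hz sine wave sampled at 100 Hz (values chosen for illustration, unrelated to the collision data). For a real-valued signal the spectrum is symmetric, so only the first N // 2 + 1 coefficients carry distinct information; `np.fft.rfft` returns exactly that half, and `np.fft.rfftfreq` gives the frequency of each bin:

```python
import numpy as np

fs = 100                             # sampling frequency in Hz (illustrative)
n_samples = 200                      # 2 seconds of data
t = np.arange(n_samples) / fs        # sample times in seconds
signal = np.sin(2 * np.pi * 5 * t)   # pure 5 Hz sine

# One-sided spectrum of a real signal: n_samples // 2 + 1 coefficients
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(n_samples, d=1 / fs)  # frequency (Hz) of each bin

# The largest magnitude sits at the bin of the sine's frequency
peak_hz = freqs[np.argmax(spectrum)]
print(len(spectrum), peak_hz)
```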

In [None]:
fs = 5                   # sampling frequency used in the reference code
dt = 1 / fs              # sampling interval; scales the FFT output
n = 75                   # reference window length
n_bins = int(n / 2 + 1)  # number of low-frequency bins kept per sensor (38)

def fft_features(data_set: pd.DataFrame) -> np.ndarray:
    """ Convert each id's sensor signals into Fourier magnitude features
        :params data_set: original collider params data (many rows per id)
        :returns ft_data: array of shape (n_ids, 4 * n_bins)
        #Reference - https://dacon.io/competitions/official/235614/codeshare/1174
    """
    features = ["S1", "S2", "S3", "S4"]
    ft_data = list()

    # groupby visits each id once instead of filtering the whole frame per id;
    # sort=False keeps ids in order of first appearance, like data_set.id.unique()
    for _, group in tqdm(data_set.groupby("id", sort=False)):
        # Magnitudes of the first n_bins FFT coefficients of each sensor
        spectra = [np.abs(np.fft.fft(group[f].values) * dt)[:n_bins]
                   for f in features]
        ft_data.append(np.concatenate(spectra))

    return np.array(ft_data)

In [None]:
train_fft = fft_features(train_data)

### Alternative Feature Engineering

An alternative approach is to aggregate each sensor signal per `id` and compute key statistics such as the maximum, minimum, mean, median, standard deviation and skew.

In [None]:
def generate_agg_feats(data_set: pd.DataFrame) -> pd.DataFrame:
    """ Create aggregate features from the data
        :param data_set: Base data as DataFrame
        :returns agg_data: Aggregated DataFrame with one row per id
    """
    # Aggregate only the sensor columns; the time column is not a feature
    grouped = data_set.groupby('id')[["S1", "S2", "S3", "S4"]]

    stats = ['max', 'min', 'mean', 'std', 'median', 'skew']
    # One block of columns per statistic: S1_max ... S4_max, S1_min, ...
    agg_data = pd.concat([grouped.agg(stat).add_suffix('_' + stat)
                          for stat in stats],
                         axis=1)

    return agg_data

In [None]:
agg_train = generate_agg_feats(train_data)
agg_train.shape

## Challenge

Build a regression model for collider parameter detection!

## References

PHM 2018 Data - https://www.phmsociety.org/events/conference/phm/18/data-challenge

Hydrogen Collider Data - https://www.kaggle.com/jaganadhg/atomicai-starter 

Electricity Usage Data - https://www.kaggle.com/jaganadhg/house-hold-energy-data 
