## **Feature extraction with tsfresh**
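- tsfresh is a library that automatically computes features from time series data.
    - The features it extracts are listed in the official documentation:<br>
    https://tsfresh.readthedocs.io/en/latest/text/list_of_features.html
        - For feature calculators that accept parameters, the feature is computed once for each configured parameter set.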
    
- The recommended workflow is described in the official documentation:<br>
https://tsfresh.readthedocs.io/en/latest/text/feature_filtering.html
    1. **Feature extraction**
        - First, a comprehensive set of features is generated automatically with the calculators in `tsfresh.feature_extraction.feature_calculators`.<br>
          This converts the raw time series into aggregated features.
    2. **Feature significance testing**
        - Next, the tests in `tsfresh.feature_selection.significance_tests` check each feature independently for statistical relevance to the target.<br>
        This step also requires the target labels.

`tsfresh.convenience.relevant_extraction.extract_relevant_features` runs steps 1 and 2 in a single call.

In [1]:
# Libraries
from tsfresh import extract_features, select_features, extract_relevant_features
from tsfresh.feature_extraction import settings
from tsfresh.utilities.dataframe_functions import impute

import pandas as pd

import warnings
warnings.filterwarnings('ignore')

### **Dataset**
- We use the robot execution failures dataset that ships with tsfresh.
    - It has 8 columns:
        - id, time, and 6 time series (F_x, F_y, F_z, T_x, T_y, T_z)

In [2]:
from tsfresh.examples.robot_execution_failures import download_robot_execution_failures, load_robot_execution_failures

download_robot_execution_failures()
df, y = load_robot_execution_failures()

print("shape:", df.shape)
display(df.head())

shape: (1320, 8)


,id,time,F_x,F_y,F_z,T_x,T_y,T_z
0,1,0,-1,-1,63,-3,-1,0
1,1,1,0,0,62,-3,-1,0
2,1,2,-1,-1,61,-3,0,0
3,1,3,-1,-1,63,-2,-1,0
4,1,4,-1,-1,63,-3,-1,0


### **Feature extraction with tsfresh**
#### 1. Feature extraction
- In this run, 763 new features were generated per series.
- The generated feature names follow this pattern:

    `{time_series_name}__{feature_name}__{parameter name 1}_{parameter value 1}__[..]`<br>
    -->`{original column name}__{feature calculator function name}__{parameter name}_{parameter value}__[..]`
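
The naming pattern above can be taken apart with plain string handling. The following is a minimal sketch, not part of tsfresh: `parse_feature_name` is a hypothetical helper, and it assumes that the series name itself contains no double underscore and that each parameter value follows the last single underscore of its part.

```python
import ast


def parse_feature_name(name):
    """Split a tsfresh-style column name into (series, calculator, params).

    Hypothetical helper for illustration only. Assumes the series name
    contains no "__" and every parameter part looks like "<name>_<value>".
    """
    series, calculator, *param_parts = name.split("__")
    params = {}
    for part in param_parts:
        # The value is whatever follows the last single underscore.
        key, raw_value = part.rsplit("_", 1)
        try:
            # Turns '40' into 40, '"mean"' into 'mean', 'True' into True.
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            value = raw_value  # keep the raw text if it is not a literal
        params[key] = value
    return series, calculator, params


print(parse_feature_name('F_x__agg_autocorrelation__f_agg_"mean"__maxlag_40'))
print(parse_feature_name("T_z__variance"))
```

Features without parameters, such as `T_z__variance`, simply yield an empty parameter dictionary. tsfresh also provides its own parser for column names (`tsfresh.feature_extraction.settings.from_columns`), which is the safer choice in real code.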

In [3]:
# Compute the full default feature set for every id, ordering the samples by time
X_extracted = extract_features(df, column_id='id', column_sort='time')
X_extracted.head(10)

Feature Extraction: 100%|██████████| 10/10 [00:11<00:00,  1.18s/it]


id,F_x__abs_energy,F_x__absolute_sum_of_changes,"F_x__agg_autocorrelation__f_agg_""mean""__maxlag_40","F_x__agg_autocorrelation__f_agg_""median""__maxlag_40","F_x__agg_autocorrelation__f_agg_""var""__maxlag_40","F_x__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""max""","F_x__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""mean""","F_x__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""min""","F_x__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""var""","F_x__agg_linear_trend__attr_""intercept""__chunk_len_50__f_agg_""max""",...,T_z__symmetry_looking__r_0.9500000000000001,T_z__time_reversal_asymmetry_statistic__lag_1,T_z__time_reversal_asymmetry_statistic__lag_2,T_z__time_reversal_asymmetry_statistic__lag_3,T_z__value_count__value_-1,T_z__value_count__value_0,T_z__value_count__value_1,T_z__variance,T_z__variance_larger_than_standard_deviation,T_z__variation_coefficient
1,14.0,2.0,-0.106351,-0.07206633,0.016879,0.0,-0.9,-1.0,0.09,,...,0.0,0.0,0.0,0.0,0.0,15.0,0.0,0.0,0.0,
2,25.0,14.0,-0.039098,-0.04935275,0.08879,0.0,-0.7,-3.0,0.81,,...,1.0,0.0,0.0,0.0,4.0,11.0,0.0,0.195556,0.0,-1.658312
3,12.0,10.0,-0.029815,2.6020850000000003e-17,0.105435,1.0,-0.5,-1.0,0.45,,...,1.0,0.0,-0.090909,0.0,4.0,11.0,0.0,0.195556,0.0,-1.658312
4,16.0,17.0,-0.049773,-0.06417112,0.14358,1.0,-0.4,-2.0,1.24,,...,1.0,0.0,-0.181818,0.0,6.0,8.0,1.0,0.355556,0.0,-1.788854
5,17.0,13.0,-0.061467,-0.05172414,0.052642,2.0,-0.5,-2.0,1.05,,...,1.0,-0.076923,-0.090909,-0.222222,4.0,9.0,2.0,0.382222,0.0,-4.636809
6,39.0,24.0,-0.05776,0.0,0.132493,1.0,-1.1,-3.0,1.89,,...,1.0,-0.153846,0.0,0.111111,4.0,10.0,1.0,0.293333,0.0,-2.708013
7,21.0,13.0,-0.213085,-0.08561644,0.77513,0.0,-1.1,-3.0,0.69,,...,1.0,0.153846,0.0,0.222222,2.0,12.0,1.0,0.195556,0.0,-6.63325
8,26.0,16.0,-0.002893,-0.158046,0.299821,2.0,-0.7,-2.0,1.41,,...,1.0,-0.461538,-1.090909,-1.444444,5.0,7.0,2.0,1.066667,1.0,
9,24.0,20.0,-0.037626,0.07211538,0.098712,2.0,-0.7,-3.0,1.61,,...,1.0,-0.153846,-0.181818,0.888889,4.0,6.0,4.0,1.093333,1.0,5.228129
10,14.0,2.0,-0.042665,-0.07206633,0.00593,-1.0,-1.0,-1.0,0.0,,...,0.0,0.0,0.0,0.0,0.0,15.0,0.0,0.0,0.0,


#### 2. Feature significance testing
- tsfresh runs a significance test on each feature individually.
- The test depends on the types of the target and the feature, giving four cases:
    - 1. Target and feature are both binary
        - Fisher's exact test
    - 2. Target is binary and feature real
        - Mann-Whitney U test or Kolmogorov-Smirnov test
    - 3. Target is real and the feature is binary
        - Kolmogorov-Smirnov test
    - 4. Target and feature are both real
        - Kendall’s tau
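
The four cases above amount to a lookup on the (target type, feature type) pair. The sketch below is only an illustration of that dispatch and does not call tsfresh: `value_kind` and `pick_test` are hypothetical helpers, and treating a column as binary when it has exactly two distinct values is an assumption made for this example.

```python
def value_kind(values):
    """Return 'binary' for exactly two distinct values, otherwise 'real'.

    The two-distinct-values rule is an assumption made for illustration.
    """
    return "binary" if len(set(values)) == 2 else "real"


# Keys are (target kind, feature kind), following the four cases listed above.
TESTS = {
    ("binary", "binary"): "Fisher's exact test",
    ("binary", "real"): "Mann-Whitney U test or Kolmogorov-Smirnov test",
    ("real", "binary"): "Kolmogorov-Smirnov test",
    ("real", "real"): "Kendall's tau",
}


def pick_test(target, feature):
    """Look up which test applies to one target/feature pair."""
    return TESTS[(value_kind(target), value_kind(feature))]


target = [0, 1, 0, 1, 1]               # binary labels
feature = [0.3, 2.1, 0.4, 1.9, 2.5]    # real-valued feature
print(pick_test(target, feature))
```

With a binary target and a real-valued feature, the lookup lands on the second case.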

In [4]:
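# Replace NaN and infinite values so that the significance tests can run
X_extracted = impute(X_extracted)
# Keep only the features that are statistically relevant to the target y
X_selected = select_features(X_extracted, y)
X_selected.head(10)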

id,F_x__value_count__value_-1,F_x__abs_energy,F_x__range_count__max_1__min_-1,F_y__abs_energy,T_y__standard_deviation,T_y__variance,"F_x__fft_coefficient__attr_""abs""__coeff_1","T_y__fft_coefficient__attr_""abs""__coeff_1",T_y__abs_energy,F_z__standard_deviation,...,"T_x__agg_linear_trend__attr_""intercept""__chunk_len_5__f_agg_""min""",T_x__number_peaks__n_1,T_y__number_cwt_peaks__n_1,T_y__count_below__t_0,"T_x__change_quantiles__f_agg_""var""__isabs_True__qh_0.2__ql_0.0","F_z__change_quantiles__f_agg_""mean""__isabs_True__qh_1.0__ql_0.8",T_x__quantile__q_0.1,F_y__has_duplicate_max,"F_y__cwt_coefficients__coeff_14__w_5__widths_(2, 5, 10, 20)","F_y__cwt_coefficients__coeff_13__w_2__widths_(2, 5, 10, 20)"
1,14.0,14.0,15.0,13.0,0.471405,0.222222,1.0,1.165352,10.0,1.203698,...,-3.0,1.0,4.0,1.0,0.0,0.0,-3.0,1.0,-0.751682,-0.310265
2,7.0,25.0,13.0,76.0,2.054805,4.222222,0.624118,6.020261,90.0,4.333846,...,-4.166667,4.0,4.0,0.933333,0.0,1.0,-9.2,1.0,0.057818,-0.202951
3,11.0,12.0,14.0,40.0,1.768867,3.128889,2.203858,8.235442,103.0,4.616877,...,-5.833333,6.0,3.0,0.866667,0.0,3.0,-6.6,0.0,0.912474,0.539121
4,5.0,16.0,10.0,60.0,2.669998,7.128889,0.844394,12.067855,124.0,3.833188,...,-9.333333,5.0,5.0,0.733333,0.0,0.0,-9.0,0.0,-0.609735,-2.64139
5,9.0,17.0,13.0,46.0,2.039608,4.16,2.730599,6.44533,180.0,4.841487,...,-11.833333,5.0,5.0,0.933333,0.0,0.0,-9.6,0.0,0.072771,0.591927
6,6.0,39.0,7.0,88.0,2.080598,4.328889,2.00182,2.82744,225.0,3.047768,...,-11.5,3.0,3.0,1.0,0.0,0.0,-12.0,1.0,0.475583,-0.600927
7,8.0,21.0,13.0,27.0,1.892676,3.582222,1.133819,12.822865,234.0,5.243409,...,-10.0,5.0,4.0,1.0,0.0,3.0,-10.0,0.0,-1.862009,-1.582648
8,8.0,26.0,9.0,24.0,2.445858,5.982222,2.09052,11.28589,213.0,4.364503,...,-12.333333,4.0,3.0,0.866667,0.0,0.0,-12.0,0.0,-1.32128,-2.080054
9,7.0,24.0,12.0,60.0,1.557776,2.426667,0.866097,4.619776,253.0,4.027682,...,-15.166667,3.0,3.0,1.0,2.25,6.333333,-14.2,0.0,-1.141088,-0.926589
10,14.0,14.0,15.0,14.0,0.596285,0.355556,1.0,1.864141,12.0,0.679869,...,-3.0,2.0,3.0,1.0,0.0,0.0,-3.0,0.0,-1.038164,-0.965069


#### **Running steps 1 and 2 together**

In [5]:
# Run feature extraction and significance-based selection in one call
X_extracted = extract_relevant_features(df, y, column_id='id', column_sort='time')
X_extracted.head(10)

Feature Extraction: 100%|██████████| 10/10 [00:13<00:00,  1.30s/it]


id,F_x__value_count__value_-1,F_x__abs_energy,F_x__range_count__max_1__min_-1,F_y__abs_energy,T_y__standard_deviation,T_y__variance,"F_x__fft_coefficient__attr_""abs""__coeff_1","T_y__fft_coefficient__attr_""abs""__coeff_1",T_y__abs_energy,F_z__standard_deviation,...,"T_x__agg_linear_trend__attr_""intercept""__chunk_len_5__f_agg_""min""",T_x__number_peaks__n_1,T_y__number_cwt_peaks__n_1,T_y__count_below__t_0,"T_x__change_quantiles__f_agg_""var""__isabs_True__qh_0.2__ql_0.0","F_z__change_quantiles__f_agg_""mean""__isabs_True__qh_1.0__ql_0.8",T_x__quantile__q_0.1,F_y__has_duplicate_max,"F_y__cwt_coefficients__coeff_14__w_5__widths_(2, 5, 10, 20)","F_y__cwt_coefficients__coeff_13__w_2__widths_(2, 5, 10, 20)"
1,14.0,14.0,15.0,13.0,0.471405,0.222222,1.0,1.165352,10.0,1.203698,...,-3.0,1.0,4.0,1.0,0.0,0.0,-3.0,1.0,-0.751682,-0.310265
2,7.0,25.0,13.0,76.0,2.054805,4.222222,0.624118,6.020261,90.0,4.333846,...,-4.166667,4.0,4.0,0.933333,0.0,1.0,-9.2,1.0,0.057818,-0.202951
3,11.0,12.0,14.0,40.0,1.768867,3.128889,2.203858,8.235442,103.0,4.616877,...,-5.833333,6.0,3.0,0.866667,0.0,3.0,-6.6,0.0,0.912474,0.539121
4,5.0,16.0,10.0,60.0,2.669998,7.128889,0.844394,12.067855,124.0,3.833188,...,-9.333333,5.0,5.0,0.733333,0.0,0.0,-9.0,0.0,-0.609735,-2.64139
5,9.0,17.0,13.0,46.0,2.039608,4.16,2.730599,6.44533,180.0,4.841487,...,-11.833333,5.0,5.0,0.933333,0.0,0.0,-9.6,0.0,0.072771,0.591927
6,6.0,39.0,7.0,88.0,2.080598,4.328889,2.00182,2.82744,225.0,3.047768,...,-11.5,3.0,3.0,1.0,0.0,0.0,-12.0,1.0,0.475583,-0.600927
7,8.0,21.0,13.0,27.0,1.892676,3.582222,1.133819,12.822865,234.0,5.243409,...,-10.0,5.0,4.0,1.0,0.0,3.0,-10.0,0.0,-1.862009,-1.582648
8,8.0,26.0,9.0,24.0,2.445858,5.982222,2.09052,11.28589,213.0,4.364503,...,-12.333333,4.0,3.0,0.866667,0.0,0.0,-12.0,0.0,-1.32128,-2.080054
9,7.0,24.0,12.0,60.0,1.557776,2.426667,0.866097,4.619776,253.0,4.027682,...,-15.166667,3.0,3.0,1.0,2.25,6.333333,-14.2,0.0,-1.141088,-0.926589
10,14.0,14.0,15.0,14.0,0.596285,0.355556,1.0,1.864141,12.0,0.679869,...,-3.0,2.0,3.0,1.0,0.0,0.0,-3.0,0.0,-1.038164,-0.965069


### Customizing the feature calculation parameters
- By default tsfresh computes each feature with several preset parameter sets. This section shows how to supply your own when you want to try specific combinations.<br>
https://tsfresh.readthedocs.io/en/latest/text/feature_extraction_settings.html

- Prepare a dictionary that holds the parameter settings.
- For the available feature calculator functions, see:<br>
https://tsfresh.readthedocs.io/en/latest/text/list_of_features.html

- This example customizes "abs_energy" and "agg_autocorrelation".
    - "abs_energy" takes no parameters, so its value is None.
    - "agg_autocorrelation" takes two parameters.
        -  {"f_agg": x, "maxlag": n}
            - "f_agg": name of the aggregation function (e.g. "mean", "var", "std", "median")
            - "maxlag": a number, the maximum lag to consider
        - To compute several parameter sets, pass a list of dictionaries.
            - [{"f_agg": "mean", "maxlag": 10}, {"f_agg": "var", "maxlag": 10}]

In [6]:
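# Named fc_parameters so that it does not shadow the imported `settings` module
fc_parameters = {
    "abs_energy": None,  # takes no parameters
    "agg_autocorrelation": [
        {"f_agg": "mean", "maxlag": 10},
        {"f_agg": "var", "maxlag": 10},
    ],
}

# Pass the dictionary above via default_fc_parameters.
# 6 original series * 3 features = 18 new columns are generated.
extract_features(df, column_id='id', column_sort='time', default_fc_parameters=fc_parameters)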

Feature Extraction: 100%|██████████| 10/10 [00:00<00:00, 95.45it/s]


id,F_x__abs_energy,"F_x__agg_autocorrelation__f_agg_""mean""__maxlag_10","F_x__agg_autocorrelation__f_agg_""var""__maxlag_10",F_y__abs_energy,"F_y__agg_autocorrelation__f_agg_""mean""__maxlag_10","F_y__agg_autocorrelation__f_agg_""var""__maxlag_10",F_z__abs_energy,"F_z__agg_autocorrelation__f_agg_""mean""__maxlag_10","F_z__agg_autocorrelation__f_agg_""var""__maxlag_10",T_x__abs_energy,"T_x__agg_autocorrelation__f_agg_""mean""__maxlag_10","T_x__agg_autocorrelation__f_agg_""var""__maxlag_10",T_y__abs_energy,"T_y__agg_autocorrelation__f_agg_""mean""__maxlag_10","T_y__agg_autocorrelation__f_agg_""var""__maxlag_10",T_z__abs_energy,"T_z__agg_autocorrelation__f_agg_""mean""__maxlag_10","T_z__agg_autocorrelation__f_agg_""var""__maxlag_10"
1,14.0,-0.061392,0.001609,13.0,-0.123987,0.007357,58678.0,-0.099196,0.089351,125.0,-0.142478,0.005302,10.0,-0.109317,0.214066,0.0,0.000000,0.000000
2,25.0,-0.043087,0.123896,76.0,-0.064303,0.071579,58190.0,-0.069212,0.064502,363.0,-0.069323,0.062164,90.0,-0.048264,0.019147,4.0,-0.045854,0.140164
3,12.0,-0.065179,0.131788,40.0,-0.074372,0.104138,56379.0,-0.049129,0.053616,344.0,-0.075855,0.075284,103.0,-0.102358,0.055735,4.0,-0.071814,0.122658
4,16.0,-0.042844,0.154795,60.0,-0.116249,0.089762,58253.0,-0.107814,0.092737,763.0,-0.126903,0.120203,124.0,-0.094291,0.112590,7.0,-0.102502,0.152445
5,17.0,-0.030882,0.065966,46.0,-0.016415,0.097687,55437.0,-0.137316,0.040002,849.0,-0.015876,0.099820,180.0,-0.045629,0.077288,6.0,-0.045927,0.139143
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
84,96833.0,-0.185616,0.534576,42780.0,-0.183280,0.474837,8870205.0,-0.164312,0.460855,1825597.0,-0.141968,0.392032,171261.0,-0.131743,0.133421,4988.0,-0.215347,0.473148
85,1683.0,-0.073497,0.339056,1523.0,-0.047347,0.149732,15083.0,-0.051402,0.304978,18023.0,-0.100098,0.432792,503.0,-0.069182,0.108132,250.0,-0.089653,0.388861
86,83497.0,-0.080953,0.371258,21064.0,-0.068996,0.348931,548520.0,-0.084549,0.376077,67981.0,-0.048548,0.336085,118013.0,-0.081471,0.355132,885.0,-0.129599,0.275626
87,1405437.0,-0.101384,0.267173,308658.0,-0.103966,0.191905,13953821.0,-0.120072,0.209294,247081.0,-0.033131,0.159993,2430295.0,-0.104072,0.377788,16513.0,-0.074082,0.380337
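
The count of 18 columns can be checked without running tsfresh by expanding the parameter dictionary by hand. This is a minimal sketch: `expected_columns` is a hypothetical helper, and the formatting rules it uses (parameters joined in sorted key order, string values wrapped in double quotes) are assumptions inferred from the column names in the output above.

```python
def expected_columns(series_names, fc_parameters):
    """List the column names a parameter dictionary should produce.

    Hypothetical helper. The formatting rules are inferred from the
    observed output, not taken from the tsfresh source.
    """
    columns = []
    for series in series_names:
        for calculator, param_sets in fc_parameters.items():
            if param_sets is None:
                # Calculators without parameters yield exactly one column.
                columns.append(f"{series}__{calculator}")
                continue
            for params in param_sets:
                parts = []
                for key in sorted(params):
                    value = params[key]
                    text = f'"{value}"' if isinstance(value, str) else str(value)
                    parts.append(f"{key}_{text}")
                columns.append(f"{series}__{calculator}__" + "__".join(parts))
    return columns


series_names = ["F_x", "F_y", "F_z", "T_x", "T_y", "T_z"]
fc_parameters = {
    "abs_energy": None,
    "agg_autocorrelation": [
        {"f_agg": "mean", "maxlag": 10},
        {"f_agg": "var", "maxlag": 10},
    ],
}

columns = expected_columns(series_names, fc_parameters)
print(len(columns))
print(columns[:3])
```

Each series contributes one `abs_energy` column and two `agg_autocorrelation` columns, so the six series give 6 * 3 = 18 names in the same order as the header of the table above.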
