In this notebook, we apply logistic regression to the following files:  
- xtrain_features.csv: features extracted from the original time series (xtrain.csv)
- xtrain_features_diff.csv: features extracted from the differenced time series

In [26]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score, make_scorer 
from sklearn.model_selection import GridSearchCV

In [27]:
ytrain = pd.read_csv("ytrain.csv")
ytrain

Unnamed: 0,ID,TARGET
0,0,0
1,1,0
2,2,0
3,3,0
4,4,1
...,...,...
16630,16630,0
16631,16631,0
16632,16632,1
16633,16633,0


**On the features of the time series**

In [28]:
df = pd.read_csv("xtrain_features.csv")
xtrain_features = df.set_index("Unnamed: 0")
xtrain_features

Unnamed: 0_level_0,timeseries__sample_entropy,timeseries__approximate_entropy__m_2__r_0.1,"timeseries__change_quantiles__f_agg_""mean""__isabs_True__qh_1.0__ql_0.2","timeseries__change_quantiles__f_agg_""mean""__isabs_False__qh_1.0__ql_0.2","timeseries__change_quantiles__f_agg_""var""__isabs_False__qh_1.0__ql_0.8","timeseries__change_quantiles__f_agg_""var""__isabs_True__qh_1.0__ql_0.8","timeseries__agg_linear_trend__attr_""slope""__chunk_len_10__f_agg_""max""","timeseries__agg_linear_trend__attr_""slope""__chunk_len_5__f_agg_""max""","timeseries__change_quantiles__f_agg_""mean""__isabs_False__qh_1.0__ql_0.0","timeseries__change_quantiles__f_agg_""mean""__isabs_True__qh_1.0__ql_0.0",...,timeseries__number_cwt_peaks__n_1,timeseries__autocorrelation__lag_5,"timeseries__fft_coefficient__attr_""angle""__coeff_3","timeseries__agg_linear_trend__attr_""intercept""__chunk_len_5__f_agg_""min""",timeseries__ratio_beyond_r_sigma__r_1.5,"timeseries__agg_linear_trend__attr_""stderr""__chunk_len_5__f_agg_""mean""","timeseries__fft_coefficient__attr_""angle""__coeff_13",timeseries__energy_ratio_by_chunks__num_segments_10__segment_focus_2,timeseries__symmetry_looking__r_0.30000000000000004,timeseries__energy_ratio_by_chunks__num_segments_10__segment_focus_5
Unnamed: 0,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0.0,0.193750,0.226090,0.324750,0.324750,0.129581,0.129581,3.540994,1.838641,0.344697,0.344697,...,9.0,0.754340,110.750247,-0.651201,0.10,0.137178,132.245979,0.020893,1.0,0.077261
1.0,0.160582,0.190049,0.484414,0.484414,3.774767,3.774767,4.420701,2.124722,0.409601,0.409601,...,6.0,0.661429,100.708984,-1.861140,0.16,0.276853,115.310995,0.004118,1.0,0.038834
2.0,0.203912,0.275311,0.677695,0.677695,0.559479,0.559479,6.620397,3.245355,0.668579,0.668579,...,7.0,0.750215,94.359401,0.224009,0.16,0.121779,138.233171,0.018934,1.0,0.073612
3.0,0.160343,0.130018,0.281847,0.281847,0.092363,0.092363,2.902878,1.462427,0.349461,0.349461,...,7.0,0.689245,99.799916,1.999359,0.16,0.108171,141.479963,0.045261,1.0,0.108472
4.0,0.157629,0.157558,0.467135,0.467135,0.581337,0.581337,4.448250,1.968794,0.389578,0.389578,...,8.0,0.741157,87.816823,-1.581613,0.10,0.145201,145.515177,0.003474,1.0,0.059801
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16630.0,0.149940,0.151933,0.822118,0.822118,0.338657,0.338657,8.393611,4.220648,0.710134,0.710134,...,9.0,0.844717,99.744456,-3.691673,0.04,0.241070,121.426973,0.003067,1.0,0.096094
16631.0,0.219054,0.233640,0.454416,0.454416,0.119214,0.119214,4.976133,2.292109,0.431743,0.431743,...,6.0,0.779466,94.971796,-0.974122,0.10,0.151793,134.218808,0.014791,1.0,0.057002
16632.0,0.194156,0.222595,0.303025,0.303025,0.005432,0.005432,2.790752,1.449439,0.305692,0.305692,...,8.0,0.757474,102.944270,0.760490,0.06,0.107374,142.346451,0.019070,1.0,0.097794
16633.0,0.108854,0.180572,0.455705,0.455705,2.750650,2.750650,4.230573,1.630265,0.500315,0.500315,...,5.0,0.583246,77.482048,5.344467,0.06,0.125700,138.968044,0.051779,1.0,0.095051


We standardize the data

In [29]:
sc = StandardScaler()
xtrain_features_std = sc.fit_transform(xtrain_features)
xtrain_features_std

array([[ 0.09876991,  0.05750557, -1.30703775, ..., -0.05442882,
         0.01733959, -0.5350033 ],
       [-0.59696466, -0.66249312, -0.42305783, ..., -1.21495815,
         0.01733959, -2.16987914],
       [ 0.31192311,  1.04084171,  0.64703754, ..., -0.18994327,
         0.01733959, -0.69025778],
       ...,
       [ 0.10728103, -0.01231134, -1.42731834, ..., -0.18055961,
         0.01733959,  0.33856764],
       [-1.68197541, -0.85182785, -0.5820098 , ...,  2.08239714,
         0.01733959,  0.22187424],
       [-0.01646946, -1.21125232,  0.5716369 , ..., -0.93930891,
         0.01733959,  1.01655609]])

Cross-validated grid search to select the hyperparameters

In [30]:
params = {
    "penalty" : ["l2", "l1", "elasticnet", "none"],
    "solver" : ["newton-cg", "lbfgs", "liblinear", "sag", "saga"]
}

model = LogisticRegression()
grid = GridSearchCV(model, param_grid=params, cv=5, scoring = make_scorer(cohen_kappa_score))
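
Not every `solver` supports every `penalty`, so a Cartesian grid such as `params` contains combinations that fail at fit time. Below is a minimal sketch of one way to keep only compatible pairs, expressed as a list of dictionaries (a form that `GridSearchCV` accepts for `param_grid`). The table `SUPPORTED` and the helper `build_param_grid` are hypothetical names introduced for illustration. The table assumes the scikit-learn version used in this notebook, where the absence of a penalty is spelled `"none"`, and the `l1_ratio` value is arbitrary.

```python
# Penalties each solver accepts (assumption: the scikit-learn version used in
# this notebook, where "none" is still passed as a string).
SUPPORTED = {
    "newton-cg": ["l2", "none"],
    "lbfgs": ["l2", "none"],
    "liblinear": ["l1", "l2"],
    "sag": ["l2", "none"],
    "saga": ["l1", "l2", "elasticnet", "none"],
}


def build_param_grid(supported, l1_ratio=(0.5,)):
    """Return a list of dicts, one per compatible (solver, penalty) pair."""
    grid = []
    for solver, penalties in supported.items():
        for penalty in penalties:
            entry = {"solver": [solver], "penalty": [penalty]}
            if penalty == "elasticnet":
                # elasticnet additionally needs a mixing ratio in [0, 1]
                entry["l1_ratio"] = list(l1_ratio)
            grid.append(entry)
    return grid


params_valid = build_param_grid(SUPPORTED)
print(len(params_valid))  # 12 compatible pairs instead of 20
```

Such a list could be passed as `param_grid=params_valid`. Raising `max_iter` on `LogisticRegression` is the remedy that the convergence warning itself suggests.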

In [31]:
grid.fit(xtrain_features_std,ytrain['TARGET'])

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Traceback (most recent call last):
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 1306, in fit
    solver = _check_solver(self.solver, self.penalty, self.dual)
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 443, in _check_solver
    raise ValueError("Solver %s supports only 'l2' or 'none' penalties, "
ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.

Traceback (most recent call last):
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 1306, in fit
    solver = _check_solver(self.solver, self.penalty, self.dual)
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 443, in _check_solver
    raise ValueError("Solver %s supports only 'l2' or 'none' penalties, "
ValueError: Solver sag supports only 'l2' or 'none' penalties, got l1 penalty.

Traceback (most recent call last):
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 1306, in fit
    solver = _check_solver(self.solver, self.penalty, self.dual)
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 443, in _check_solver
    raise ValueError("Solver %s supports only 'l2' or 'none' penalties, "
ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got elasticnet penalty.

Traceback (most recent call last):
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 1306, in fit
    solver = _check_solver(self.solver, self.penalty, self.dual)
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 443, in _check_solver
    raise ValueError("Solver %s supports only 'l2' or 'none' penalties, "
ValueError: Solver sag supports only 'l2' or 'none' penalties, got elasticnet penalty.



GridSearchCV(cv=5, estimator=LogisticRegression(),
             param_grid={'penalty': ['l2', 'l1', 'elasticnet', 'none'],
                         'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag',
                                    'saga']},
             scoring=make_scorer(cohen_kappa_score))

In [32]:
grid.best_params_

{'penalty': 'none', 'solver': 'newton-cg'}

In [33]:
grid.best_estimator_.coef_  # GridSearchCV has no coef_ attribute; the coefficients belong to the refit best estimator

**Results:** {'penalty': 'none', 'solver': 'newton-cg'}

In [8]:
grid.best_score_ # 0.15568513757882108

0.15568513757882108
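
The score above is Cohen's kappa (the `cohen_kappa_score` wrapped by `make_scorer`), defined as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the label frequencies. The following pure-Python sketch illustrates why it is a stricter yardstick than accuracy on an imbalanced target. The function name `cohen_kappa` and the toy labels are illustrative assumptions, not data from this notebook.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa for two label sequences of equal length."""
    n = len(y_true)
    # observed agreement: share of positions where both labelings match
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # chance agreement: product of marginal counts, summed over labels
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    p_e = sum(true_counts[label] * pred_counts[label] for label in labels) / (n * n)
    if p_e == 1.0:
        # kappa is undefined when chance agreement is already total
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


y_true = [0] * 8 + [1] * 2                   # toy target: 80% zeros
y_all_zero = [0] * 10                        # always predicts the majority class
y_some = [0] * 7 + [1] + [1, 0]              # finds one of the two positives

print(cohen_kappa(y_true, y_all_zero))       # 0.0, although accuracy is 0.8
print(round(cohen_kappa(y_true, y_some), 3)) # 0.375, with the same accuracy of 0.8
```

Both toy predictors are right on 8 labels out of 10, yet only the second one earns a positive kappa, because the first never departs from the majority class.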

Prediction on the xtest_features dataset

In [9]:
df = pd.read_csv("xtest_features.csv")
xtest_features = df.set_index("Unnamed: 0")
xtest_features

Unnamed: 0_level_0,timeseries__sample_entropy,timeseries__approximate_entropy__m_2__r_0.1,"timeseries__change_quantiles__f_agg_""mean""__isabs_True__qh_1.0__ql_0.2","timeseries__change_quantiles__f_agg_""mean""__isabs_False__qh_1.0__ql_0.2","timeseries__change_quantiles__f_agg_""var""__isabs_False__qh_1.0__ql_0.8","timeseries__change_quantiles__f_agg_""var""__isabs_True__qh_1.0__ql_0.8","timeseries__agg_linear_trend__attr_""slope""__chunk_len_10__f_agg_""max""","timeseries__agg_linear_trend__attr_""slope""__chunk_len_5__f_agg_""max""","timeseries__change_quantiles__f_agg_""mean""__isabs_False__qh_1.0__ql_0.0","timeseries__change_quantiles__f_agg_""mean""__isabs_True__qh_1.0__ql_0.0",...,timeseries__number_cwt_peaks__n_1,timeseries__autocorrelation__lag_5,"timeseries__fft_coefficient__attr_""angle""__coeff_3","timeseries__agg_linear_trend__attr_""intercept""__chunk_len_5__f_agg_""min""",timeseries__ratio_beyond_r_sigma__r_1.5,"timeseries__agg_linear_trend__attr_""stderr""__chunk_len_5__f_agg_""mean""","timeseries__fft_coefficient__attr_""angle""__coeff_13",timeseries__energy_ratio_by_chunks__num_segments_10__segment_focus_2,timeseries__symmetry_looking__r_0.30000000000000004,timeseries__energy_ratio_by_chunks__num_segments_10__segment_focus_5
Unnamed: 0,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
16635.0,0.121607,0.123529,0.753274,0.753274,7.288189,7.288189,6.340764,2.610259,0.698793,0.698793,...,6.0,0.442966,80.493517,-0.534596,0.10,0.494944,144.399144,0.020565,1.0,0.027834
16636.0,0.223144,0.292117,0.350732,0.350732,0.349094,0.349094,3.355698,1.619677,0.339123,0.339123,...,6.0,0.733578,87.009202,0.491096,0.10,0.056686,144.374336,0.021258,1.0,0.085178
16637.0,0.149812,0.229125,0.503895,0.503895,0.201876,0.201876,5.333019,2.432423,0.482505,0.482505,...,8.0,0.765101,101.299117,-0.590231,0.12,0.180203,139.423089,0.015585,1.0,0.055981
16638.0,0.162056,0.236372,0.396272,0.396272,0.035306,0.035306,4.271655,2.124296,0.406558,0.406558,...,8.0,0.808433,102.486897,0.466630,0.06,0.109715,133.964808,0.020946,1.0,0.117782
16639.0,0.154151,0.306863,0.444043,0.444043,0.088617,0.088617,4.335870,2.266393,0.484373,0.484373,...,7.0,0.776346,98.181019,8.477647,0.12,0.269566,157.400097,0.049494,1.0,0.134362
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28599.0,0.158224,0.248356,0.459338,0.459338,0.024556,0.024556,4.724064,2.388546,0.419595,0.419595,...,7.0,0.842942,97.440805,0.773260,0.04,0.220562,136.272243,0.014415,1.0,0.137223
28600.0,0.164472,0.194955,0.546620,0.546620,0.050300,0.050300,5.951148,2.990079,0.480130,0.480130,...,7.0,0.827476,104.023770,-2.919042,0.00,0.264601,155.559212,0.002979,1.0,0.093136
28601.0,0.153122,0.151278,0.432969,0.432969,0.091960,0.091960,4.836023,2.523213,0.504249,0.504249,...,7.0,0.789417,95.191904,1.126249,0.10,0.141956,137.851779,0.030784,1.0,0.086302
28602.0,0.207014,0.241513,0.993444,0.993444,0.463323,0.463323,9.649226,4.744064,0.836458,0.836458,...,9.0,0.801192,99.682852,-3.537845,0.10,0.234130,142.249431,0.005540,1.0,0.067937


We standardize the test data (note that the scaler is fitted again here, on the test set itself)

In [10]:
xtest_std_features = sc.fit_transform(xtest_features)
xtest_std_features

array([[-1.59736193, -2.32638984,  1.26625645, ..., -0.05993097,
         0.00914091, -2.89532435],
       [ 0.3529038 ,  1.10504612, -1.0491533 , ..., -0.00641539,
         0.00914091, -0.24626924],
       [-1.05561021, -0.1770868 , -0.16816508, ..., -0.44415428,
         0.00914091, -1.59505586],
       ...,
       [-0.99203085, -1.76157378, -0.57612596, ...,  0.72863751,
         0.00914091, -0.19436278],
       [ 0.04309791,  0.07506964,  2.64770396, ..., -1.21923378,
         0.00914091, -1.04274669],
       [ 0.3529038 , -0.27475852, -0.72190654, ..., -0.85858231,
         0.00914091, -0.28110222]])
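
Standardizing a test set with its own statistics, as `fit_transform` does, gives different values from reusing the mean and standard deviation learned on the training set, which is what `sc.transform(...)` would do. Below is a minimal pure-Python sketch of the difference for a single feature. The helper names and the toy numbers are illustrative assumptions. The population standard deviation is used because that is what `StandardScaler` computes.

```python
from statistics import fmean, pstdev


def fit_scaler(values):
    """Learn the mean and population standard deviation of one feature."""
    return fmean(values), pstdev(values)


def transform(values, mean, std):
    """Apply z = (x - mean) / std with previously learned statistics."""
    return [(x - mean) / std for x in values]


train = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy training feature
test = [4.0, 5.0, 6.0]             # toy test feature, shifted upward

mean, std = fit_scaler(train)
# analogous to sc.transform(test): reuse the training statistics
test_with_train_stats = transform(test, mean, std)
# analogous to sc.fit_transform(test): statistics recomputed on the test set
test_with_own_stats = transform(test, *fit_scaler(test))

print(test_with_train_stats)
print(test_with_own_stats)
```

With the training statistics, the upward shift of the test feature remains visible to the model (all z-scores are positive). With the test set's own statistics, the values are re-centered around zero and the shift disappears.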

Prediction of ytest

In [11]:
y_test_pred_features = grid.predict(xtest_std_features)
y_test_pred_features

array([0, 0, 0, ..., 0, 0, 0], dtype=int64)

In [12]:
ytest_features = pd.DataFrame({"TARGET": y_test_pred_features}, index=xtest_features.index)
ytest_features

Unnamed: 0_level_0,TARGET
Unnamed: 0,Unnamed: 1_level_1
16635.0,0
16636.0,0
16637.0,0
16638.0,0
16639.0,0
...,...
28599.0,0
28600.0,0
28601.0,0
28602.0,0


In [13]:
ytest_features.to_csv("ytest_RegLogistique_features.csv") # 0.1830914069408286

**On the features of the differenced time series**

In [14]:
df = pd.read_csv("xtrain_features_diff.csv")
xtrain_features_diff = df.set_index("Unnamed: 0")
xtrain_features_diff

Unnamed: 0_level_0,"timeseries__change_quantiles__f_agg_""var""__isabs_True__qh_0.8__ql_0.0","timeseries__change_quantiles__f_agg_""var""__isabs_True__qh_0.8__ql_0.2","timeseries__change_quantiles__f_agg_""var""__isabs_False__qh_0.8__ql_0.2","timeseries__change_quantiles__f_agg_""var""__isabs_False__qh_0.8__ql_0.0","timeseries__change_quantiles__f_agg_""mean""__isabs_True__qh_0.8__ql_0.2",timeseries__number_crossing_m__m_1,timeseries__quantile__q_0.8,"timeseries__change_quantiles__f_agg_""mean""__isabs_True__qh_0.8__ql_0.0","timeseries__change_quantiles__f_agg_""mean""__isabs_True__qh_0.8__ql_0.4","timeseries__change_quantiles__f_agg_""var""__isabs_False__qh_0.8__ql_0.4",...,"timeseries__cwt_coefficients__coeff_5__w_10__widths_(2, 5, 10, 20)",timeseries__fourier_entropy__bins_10,timeseries__fourier_entropy__bins_3,"timeseries__cwt_coefficients__coeff_12__w_5__widths_(2, 5, 10, 20)",timeseries__spkt_welch_density__coeff_2,"timeseries__change_quantiles__f_agg_""mean""__isabs_False__qh_0.4__ql_0.0","timeseries__change_quantiles__f_agg_""var""__isabs_True__qh_0.4__ql_0.2","timeseries__fft_coefficient__attr_""angle""__coeff_9","timeseries__cwt_coefficients__coeff_2__w_2__widths_(2, 5, 10, 20)","timeseries__cwt_coefficients__coeff_11__w_5__widths_(2, 5, 10, 20)"
Unnamed: 0,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0.0,0.011913,0.009518,0.016854,0.021102,0.085683,12.0,0.409775,0.096660,0.152824,0.032779,...,0.622376,2.002422,0.962268,-0.575716,0.427483,0.004635,0.000007,-107.124646,-0.595981,-0.153257
1.0,0.010699,0.004804,0.014141,0.027789,0.103989,4.0,0.406532,0.131954,0.101032,0.013038,...,0.172836,1.561577,0.717435,0.431913,1.169228,-0.012734,0.000044,167.456367,0.197195,0.300966
2.0,0.110421,0.099061,0.262434,0.246618,0.419598,22.0,1.291991,0.373620,0.420704,0.276465,...,1.089434,1.765863,0.784168,-0.027041,0.685523,-0.029013,0.000008,-165.251476,0.873942,-0.000048
3.0,0.017951,0.012952,0.061671,0.066428,0.222470,2.0,0.606453,0.220386,0.154025,0.028123,...,1.346111,1.822475,0.784168,0.167462,0.009206,-0.024982,0.000000,156.557516,0.390120,0.455975
4.0,0.010034,0.009394,0.024372,0.023386,0.127646,7.0,0.478790,0.117257,0.087549,0.010716,...,-0.162635,2.247928,1.097032,0.254621,0.065976,0.002656,0.000000,63.063402,0.038476,0.225085
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16630.0,0.104233,0.081901,0.171924,0.226722,0.300127,18.0,1.113933,0.350125,0.252433,0.065014,...,-0.774115,1.975954,0.917357,-0.534530,3.059265,0.022851,0.001003,113.225896,0.202075,-0.637368
16631.0,0.019013,0.010181,0.038993,0.059355,0.170019,12.0,0.595735,0.201133,0.148558,0.026933,...,0.292279,2.144362,1.020961,0.047114,0.699950,-0.013463,0.001746,32.218127,0.492781,0.227888
16632.0,0.004585,0.003518,0.014813,0.013330,0.107714,10.0,0.403754,0.093978,0.126781,0.019288,...,0.789544,1.519717,0.600483,1.123892,1.540523,0.000266,0.000042,-123.262751,0.677987,0.876644
16633.0,0.016188,0.016046,0.029296,0.025343,0.115497,4.0,0.523598,0.097584,0.106815,0.013178,...,0.826462,1.451389,0.443307,-0.838888,1.367637,-0.005065,0.000000,55.740901,1.642314,-0.893893


We standardize the data

In [15]:
sc = StandardScaler()
xtrain_features_diff_std = sc.fit_transform(xtrain_features_diff)
xtrain_features_diff_std

array([[-0.91093019, -0.86195538, -0.95792964, ..., -1.0229813 ,
        -1.21178181, -0.3403365 ],
       [-0.93930054, -0.98189416, -0.98567033, ...,  1.63424984,
        -0.27943306,  0.13098208],
       [ 1.39126126,  1.4158722 ,  1.55351706, ..., -1.58549807,
         0.51605683, -0.181361  ],
       ...,
       [-1.08217686, -1.01459024, -0.97879774, ..., -1.17915625,
         0.28571992,  0.72832801],
       [-0.8110073 , -0.69590073, -0.83068242, ...,  0.55313424,
         1.41924965, -1.10884784],
       [-0.21801576, -0.05527985, -0.06125305, ...,  0.5890929 ,
        -0.42367994,  2.00412651]])

Cross-validated grid search to select the hyperparameters

In [16]:
params = {
    "penalty" : ["l2", "l1", "elasticnet", "none"],
    "solver" : ["newton-cg", "lbfgs", "liblinear", "sag", "saga"]
}

model_diff = LogisticRegression()
grid_diff = GridSearchCV(model_diff, param_grid=params, cv=5, scoring = make_scorer(cohen_kappa_score))

In [17]:
grid_diff.fit(xtrain_features_diff_std,ytrain['TARGET'])

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Traceback (most recent call last):
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 1306, in fit
    solver = _check_solver(self.solver, self.penalty, self.dual)
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 443, in _check_solver
    raise ValueError("Solver %s supports only 'l2' or 'none' penalties, "
ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.

Traceback (most recent call last):
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 1306, in fit
    solver = _check_solver(self.solver, self.penalty, self.dual)
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 443, in _check_solver
    raise ValueError("Solver %s supports only 'l2' or 'none' penalties, "
ValueError: Solver newton-cg supports only 'l2' or 'none' penalties, got elasticnet penalty.

Traceback (most recent call last):
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 1306, in fit
    solver = _check_solver(self.solver, self.penalty, self.dual)
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 443, in _check_solver
    raise ValueError("Solver %s supports only 'l2' or 'none' penalties, "
ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got elasticnet penalty.

Traceback (most recent call last):
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 1306, in fit
    solver = _check_solver(self.solver, self.penalty, self.dual)
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 443, in _check_solver
    raise ValueError("Solver %s supports only 'l2' or 'none' penalties, "
ValueError: Solver sag supports only 'l2' or 'none' penalties, got elasticnet penalty.

Traceback (most recent call last):
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 1306, in fit
    solver = _check_solver(self.solver, self.penalty, self.dual)
  File "C:\Users\fanny\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 454, in _check_solver
    raise ValueError(
ValueError: penalty='none' is not supported for the liblinear solver


GridSearchCV(cv=5, estimator=LogisticRegression(),
             param_grid={'penalty': ['l2', 'l1', 'elasticnet', 'none'],
                         'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag',
                                    'saga']},
             scoring=make_scorer(cohen_kappa_score))

In [18]:
grid_diff.best_params_

{'penalty': 'none', 'solver': 'newton-cg'}

**Results:** {'penalty': 'none', 'solver': 'newton-cg'}

In [19]:
grid_diff.best_score_ # 0.1904532935305034

0.1904532935305034

Prediction on the xtest_features_diff dataset

In [20]:
df = pd.read_csv("xtest_features_diff.csv")
xtest_features_diff = df.set_index("Unnamed: 0")
xtest_features_diff

Unnamed: 0_level_0,"timeseries__change_quantiles__f_agg_""var""__isabs_True__qh_0.8__ql_0.0","timeseries__change_quantiles__f_agg_""var""__isabs_True__qh_0.8__ql_0.2","timeseries__change_quantiles__f_agg_""var""__isabs_False__qh_0.8__ql_0.2","timeseries__change_quantiles__f_agg_""var""__isabs_False__qh_0.8__ql_0.0","timeseries__change_quantiles__f_agg_""mean""__isabs_True__qh_0.8__ql_0.2",timeseries__number_crossing_m__m_1,timeseries__quantile__q_0.8,"timeseries__change_quantiles__f_agg_""mean""__isabs_True__qh_0.8__ql_0.0","timeseries__change_quantiles__f_agg_""mean""__isabs_True__qh_0.8__ql_0.4","timeseries__change_quantiles__f_agg_""var""__isabs_False__qh_0.8__ql_0.4",...,"timeseries__cwt_coefficients__coeff_5__w_10__widths_(2, 5, 10, 20)",timeseries__fourier_entropy__bins_10,timeseries__fourier_entropy__bins_3,"timeseries__cwt_coefficients__coeff_12__w_5__widths_(2, 5, 10, 20)",timeseries__spkt_welch_density__coeff_2,"timeseries__change_quantiles__f_agg_""mean""__isabs_False__qh_0.4__ql_0.0","timeseries__change_quantiles__f_agg_""var""__isabs_True__qh_0.4__ql_0.2","timeseries__fft_coefficient__attr_""angle""__coeff_9","timeseries__cwt_coefficients__coeff_2__w_2__widths_(2, 5, 10, 20)","timeseries__cwt_coefficients__coeff_11__w_5__widths_(2, 5, 10, 20)"
Unnamed: 0,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
16635.0,0.008752,0.012270,0.024944,0.016634,0.117567,8.0,0.736378,0.089920,0.160402,0.041839,...,1.115053,0.606025,0.334221,0.128814,0.067887,-0.005516,0.000000e+00,165.113017,-0.140541,0.505175
16636.0,0.026058,0.022927,0.061538,0.065117,0.206561,6.0,0.569950,0.197725,0.227904,0.067881,...,0.683538,1.746093,0.703843,0.631488,0.297350,-0.014695,3.381593e-04,73.908894,0.281635,0.551133
16637.0,0.081459,0.056519,0.310188,0.196602,0.504789,11.0,0.915699,0.341397,0.448457,0.243881,...,0.492381,1.070484,0.443307,-0.237142,0.522301,-0.009791,0.000000e+00,69.659195,0.804684,-0.161567
16638.0,0.022529,0.016620,0.048234,0.063532,0.183571,10.0,0.603463,0.203350,0.102586,0.009669,...,0.475667,1.703073,1.076819,-0.172636,1.072110,-0.044528,1.200986e-04,11.933781,0.525607,0.002843
16639.0,0.022949,0.018739,0.030336,0.030826,0.113333,11.0,0.761016,0.097214,0.155947,0.052343,...,1.377251,2.022192,0.900724,0.442644,1.302701,-0.002340,6.097713e-07,-46.031721,0.747671,0.683450
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28599.0,0.021468,0.015385,0.064163,0.044485,0.221287,12.0,0.538475,0.151748,0.174548,0.031572,...,-0.030624,1.987670,0.997874,0.575027,2.276828,0.002748,0.000000e+00,-158.019629,0.131963,0.665400
28600.0,0.012762,0.008779,0.020907,0.022470,0.110131,10.0,0.576433,0.099767,0.157847,0.034739,...,-0.868285,2.120663,1.074092,-1.356811,7.986362,-0.006617,5.185479e-05,-7.915066,-0.234166,-1.098350
28601.0,0.045381,0.035189,0.101397,0.109663,0.270537,9.0,0.875044,0.253679,0.196123,0.047471,...,1.105994,1.512379,0.443307,-0.345402,2.958113,0.026155,1.102795e-03,-65.586543,0.081615,0.225278
28602.0,0.087013,0.096726,0.260845,0.227008,0.407345,20.0,1.367181,0.375643,0.452461,0.242793,...,-0.169060,2.043122,0.973253,0.583145,0.355878,0.107948,0.000000e+00,-7.829579,0.229572,0.241095


We standardize the test data (note that the scaler is fitted again here, on the test set itself)

In [21]:
xtest_std_features_diff = sc.fit_transform(xtest_features_diff)
xtest_std_features_diff

array([[-1.02506204, -0.84410514, -0.92473679, ...,  1.58514438,
        -0.72996799,  0.40827598],
       [-0.63484075, -0.58271917, -0.57018624, ...,  0.71503968,
        -0.18740784,  0.46148086],
       [ 0.61434394,  0.24113421,  1.8389124 , ...,  0.67449675,
         0.4847905 , -0.36360094],
       ...,
       [-0.19915082, -0.28200734, -0.18399808, ..., -0.61577334,
        -0.4444633 ,  0.08424342],
       [ 0.73958501,  1.22722665,  1.3608404 , ..., -0.06476091,
        -0.2543169 ,  0.10255433],
       [-0.68775498, -0.31492364, -0.44427663, ...,  1.46262157,
        -0.53165301, -0.33760514]])

Prediction of ytest

In [22]:
y_test_pred_features_diff = grid_diff.predict(xtest_std_features_diff)
y_test_pred_features_diff

array([0, 0, 0, ..., 1, 0, 0], dtype=int64)

In [23]:
ytest_features_diff = pd.DataFrame({"TARGET": y_test_pred_features_diff}, index=xtest_features_diff.index)
ytest_features_diff

Unnamed: 0_level_0,TARGET
Unnamed: 0,Unnamed: 1_level_1
16635.0,0
16636.0,0
16637.0,0
16638.0,1
16639.0,0
...,...
28599.0,0
28600.0,0
28601.0,1
28602.0,0


In [25]:
ytest_features_diff.value_counts()

TARGET
0         10125
1          1844
dtype: int64
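
From the counts above, the share of test series predicted as class 1 can be worked out directly. The snippet below only redoes that arithmetic with the two printed counts; the variable names are illustrative.

```python
counts = {0: 10125, 1: 1844}  # predicted class counts printed above
total = sum(counts.values())
positive_share = counts[1] / total
print(total, round(positive_share, 4))  # 11969 0.1541
```

About 15% of the test series are therefore assigned to class 1 by this model.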

In [24]:
ytest_features_diff.to_csv("ytest_RegLogistique_features_diff.csv") # 0.2622228632005762