# Tsfresh (Time Series Feature Extractor)

This notebook shows a quick way to obtain strong baselines with tsfresh, a library that automatically extracts features from time series.


From time-series data like:
<img src="images/introduction_ts_exa.png" width="400" height="200">

it extracts features like:
<img src="images/introduction_ts_exa_features.png" width="400" height="200">

- Relatively simple to use
- Extracts 700+ features with a few lines of code
- Feature pruning (if needed) with the Benjamini-Yekutieli procedure.
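
To make the pruning step less of a black box, here is a minimal sketch of the Benjamini-Yekutieli procedure applied to a list of p-values. It is written from the textbook definition and does not call tsfresh. The function name `benjamini_yekutieli`, the argument `fdr_level`, and the p-values are assumptions made for illustration only.

```python
import numpy as np

def benjamini_yekutieli(p_values, fdr_level=0.05):
    """Return a boolean mask of the hypotheses rejected at the given FDR level."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # Harmonic correction c(m); this is what separates BY from Benjamini-Hochberg
    c_m = np.sum(1.0 / np.arange(1, m + 1))
    # Threshold for the k-th smallest p-value: k / (m * c(m)) * fdr_level
    thresholds = np.arange(1, m + 1) / (m * c_m) * fdr_level
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank that passes its threshold
        rejected[order[: k + 1]] = True
    return rejected

# Made-up p-values, one per candidate feature
p_values = [0.001, 0.04, 0.03, 0.5, 0.0005]
mask = benjamini_yekutieli(p_values, fdr_level=0.05)
print(mask)
```

Features whose mask entry is `True` would be kept as relevant. In practice each p-value would come from a univariate test between one feature and the target.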

In [2]:
import pandas as pd
import numpy as np
import tsfresh
from tsfresh.feature_extraction import extract_features
from tsfresh.feature_extraction import MinimalFCParameters, ComprehensiveFCParameters, EfficientFCParameters
from tsfresh.utilities.dataframe_functions import make_forecasting_frame, roll_time_series

In [3]:
df = pd.read_csv("./datasets/tutorial_sleep_training_data.csv.gz")
# Drop every row that contains a NaN value (e.g. missing HR readings)
df = df.dropna()

print("Original Dataframe has %d rows" % df.shape[0])

Original Dataframe has 205361 rows


In [3]:
df_small = df[df["pid"].isin([1, 16])].dropna()
df_small

Unnamed: 0,time,act,sleep_phase,hr,pid
29,29,0.0,0.0,73.0,1
59,59,0.0,0.0,75.0,1
89,89,0.0,0.0,76.0,1
119,119,0.0,0.0,75.0,1
149,149,85.0,0.0,80.0,1
...,...,...,...,...,...
65999,27629,0.0,2.0,66.0,16
66029,27659,0.0,2.0,70.0,16
66059,27689,19.0,0.0,96.0,16
66089,27719,43.0,0.0,89.0,16


In [4]:
df_extracted_features = tsfresh.extract_features(df_small[["time", "pid", "act"]],
                                                 column_id="pid", 
                                                 column_sort="time")

Feature Extraction: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.68it/s]


In [5]:
df_extracted_features

Unnamed: 0,act__variance_larger_than_standard_deviation,act__has_duplicate_max,act__has_duplicate_min,act__has_duplicate,act__sum_values,act__abs_energy,act__mean_abs_change,act__mean_change,act__mean_second_derivative_central,act__median,...,act__permutation_entropy__dimension_6__tau_1,act__permutation_entropy__dimension_7__tau_1,act__query_similarity_count__query_None__threshold_0.0,"act__matrix_profile__feature_""min""__threshold_0.98","act__matrix_profile__feature_""max""__threshold_0.98","act__matrix_profile__feature_""mean""__threshold_0.98","act__matrix_profile__feature_""median""__threshold_0.98","act__matrix_profile__feature_""25""__threshold_0.98","act__matrix_profile__feature_""75""__threshold_0.98",act__mean_n_absolute_max__number_of_maxima_7
1,1.0,0.0,1.0,1.0,15845.0,2334611.0,15.110329,0.000782,0.000392,0.0,...,2.692801,3.131848,,1.924333,11.661904,5.445419,5.625046,4.562515,6.524602,312.0
16,1.0,0.0,1.0,1.0,3688.0,708876.0,6.247835,0.156926,0.055255,0.0,...,0.801398,0.966575,,3.728131,13.168912,9.26142,9.63258,7.906907,10.772115,257.857143


In [6]:
df_rolled = roll_time_series(df_small[["time", "hr", "act", "pid", "sleep_phase"]], 
                             column_id="pid", 
                             min_timeshift=1, 
                             max_timeshift=3,
                             n_jobs=3)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["sort"] = range(df.shape[0])
Rolling: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:01<00:00, 14.64it/s]


In [7]:
df_rolled.head(15)

Unnamed: 0,time,hr,act,pid,sleep_phase,sort,id
0,29,73.0,0.0,1,0.0,0,"(1, 1)"
1,59,75.0,0.0,1,0.0,1,"(1, 1)"
4,29,73.0,0.0,1,0.0,0,"(1, 2)"
5,59,75.0,0.0,1,0.0,1,"(1, 2)"
6,89,76.0,0.0,1,0.0,2,"(1, 2)"
10,29,73.0,0.0,1,0.0,0,"(1, 3)"
11,59,75.0,0.0,1,0.0,1,"(1, 3)"
12,89,76.0,0.0,1,0.0,2,"(1, 3)"
13,119,75.0,0.0,1,0.0,3,"(1, 3)"
18,59,75.0,0.0,1,0.0,1,"(1, 4)"


In [9]:
df_rolled = roll_time_series(df_small[["time", "hr", "act", "pid", "sleep_phase"]], 
                             column_id="pid", 
                             min_timeshift=0, 
                             max_timeshift=12,
                             n_jobs=3)

Rolling: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 15.49it/s]


In [10]:
df_rolled.head(15)

Unnamed: 0,time,hr,act,pid,sleep_phase,sort,id
4472,29,73.0,0.0,1,0.0,0,"(1, 0)"
4474,29,73.0,0.0,1,0.0,0,"(1, 1)"
4475,59,75.0,0.0,1,0.0,1,"(1, 1)"
4478,29,73.0,0.0,1,0.0,0,"(1, 2)"
4479,59,75.0,0.0,1,0.0,1,"(1, 2)"
4480,89,76.0,0.0,1,0.0,2,"(1, 2)"
4484,29,73.0,0.0,1,0.0,0,"(1, 3)"
4485,59,75.0,0.0,1,0.0,1,"(1, 3)"
4486,89,76.0,0.0,1,0.0,2,"(1, 3)"
4487,119,75.0,0.0,1,0.0,3,"(1, 3)"
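
The two rolled tables above suggest how `min_timeshift` and `max_timeshift` shape the windows: each window ends at one row of a subject, reaches back at most `max_timeshift` rows, and is kept only if it has at least `min_timeshift + 1` rows. The sketch below re-creates that behaviour in plain pandas on toy data. `roll_windows` is a hypothetical helper inferred from the outputs shown here, not the tsfresh implementation.

```python
import pandas as pd

def roll_windows(df, column_id, min_timeshift, max_timeshift):
    """Illustrative re-creation of forward rolling; not the tsfresh implementation."""
    pieces = []
    for key, group in df.groupby(column_id, sort=False):
        group = group.reset_index(drop=True)
        for end in range(len(group)):
            start = max(0, end - max_timeshift)
            window = group.iloc[start:end + 1].copy()
            # Keep only windows with at least min_timeshift + 1 rows
            if len(window) < min_timeshift + 1:
                continue
            # Window id: (subject, 0-based position of the window's last row)
            window["id"] = pd.Series([(key, end)] * len(window), index=window.index)
            pieces.append(window)
    return pd.concat(pieces, ignore_index=True)

# Toy data: one subject, four heart-rate readings
toy = pd.DataFrame({"pid": [1, 1, 1, 1], "hr": [73, 75, 76, 75]})
rolled = roll_windows(toy, column_id="pid", min_timeshift=0, max_timeshift=2)
print(rolled)
```

With `min_timeshift=0` and `max_timeshift=2` the windows hold 1, 2, 3 and 3 rows, so the rolled frame has 9 rows in total. This is why rolling the full dataset multiplies the number of rows.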


In [11]:
df_extracted_features = tsfresh.extract_features(df_rolled[["id", "time", "act", "hr"]], 
                                                 column_id="id",
                                                 column_sort="time",
                                                 default_fc_parameters=MinimalFCParameters(),
                                                 n_jobs=3)

Feature Extraction: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 19.78it/s]


In [12]:
df_features = df_extracted_features.reset_index().rename(columns={"level_0":"pid", "level_1": "time"})
df_features

Unnamed: 0,pid,time,act__sum_values,act__median,act__mean,act__length,act__standard_deviation,act__variance,act__root_mean_square,act__maximum,...,hr__sum_values,hr__median,hr__mean,hr__length,hr__standard_deviation,hr__variance,hr__root_mean_square,hr__maximum,hr__absolute_maximum,hr__minimum
0,1,0,0.0,0.0,0.000000,1.0,0.000000,0.000000,0.000000,0.0,...,73.0,73.0,73.000000,1.0,0.000000,0.000000,73.000000,73.0,73.0,73.0
1,1,1,0.0,0.0,0.000000,2.0,0.000000,0.000000,0.000000,0.0,...,148.0,74.0,74.000000,2.0,1.000000,1.000000,74.006756,75.0,75.0,73.0
2,1,2,0.0,0.0,0.000000,3.0,0.000000,0.000000,0.000000,0.0,...,224.0,75.0,74.666667,3.0,1.247219,1.555556,74.677083,76.0,76.0,73.0
3,1,3,0.0,0.0,0.000000,4.0,0.000000,0.000000,0.000000,0.0,...,299.0,75.0,74.750000,4.0,1.089725,1.187500,74.757943,76.0,76.0,73.0
4,1,4,85.0,0.0,17.000000,5.0,34.000000,1156.000000,38.013156,85.0,...,379.0,75.0,75.800000,5.0,2.315167,5.360000,75.835348,80.0,80.0,73.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2199,16,920,0.0,0.0,0.000000,13.0,0.000000,0.000000,0.000000,0.0,...,837.0,65.0,64.384615,13.0,1.688808,2.852071,64.406760,67.0,67.0,60.0
2200,16,921,0.0,0.0,0.000000,13.0,0.000000,0.000000,0.000000,0.0,...,843.0,65.0,64.846154,13.0,2.247944,5.053254,64.885106,70.0,70.0,60.0
2201,16,922,19.0,0.0,1.461538,13.0,5.062918,25.633136,5.269652,19.0,...,879.0,65.0,67.615385,13.0,8.380733,70.236686,68.132789,96.0,96.0,63.0
2202,16,923,62.0,0.0,4.769231,13.0,12.134844,147.254438,13.038405,43.0,...,904.0,65.0,69.538462,13.0,10.035440,100.710059,70.258862,96.0,96.0,63.0


In [13]:
df_features[df_features["pid"] == 1]

Unnamed: 0,pid,time,act__sum_values,act__median,act__mean,act__length,act__standard_deviation,act__variance,act__root_mean_square,act__maximum,...,hr__sum_values,hr__median,hr__mean,hr__length,hr__standard_deviation,hr__variance,hr__root_mean_square,hr__maximum,hr__absolute_maximum,hr__minimum
0,1,0,0.0,0.0,0.000000,1.0,0.000000,0.000000,0.000000,0.0,...,73.0,73.0,73.000000,1.0,0.000000,0.000000,73.000000,73.0,73.0,73.0
1,1,1,0.0,0.0,0.000000,2.0,0.000000,0.000000,0.000000,0.0,...,148.0,74.0,74.000000,2.0,1.000000,1.000000,74.006756,75.0,75.0,73.0
2,1,2,0.0,0.0,0.000000,3.0,0.000000,0.000000,0.000000,0.0,...,224.0,75.0,74.666667,3.0,1.247219,1.555556,74.677083,76.0,76.0,73.0
3,1,3,0.0,0.0,0.000000,4.0,0.000000,0.000000,0.000000,0.0,...,299.0,75.0,74.750000,4.0,1.089725,1.187500,74.757943,76.0,76.0,73.0
4,1,4,85.0,0.0,17.000000,5.0,34.000000,1156.000000,38.013156,85.0,...,379.0,75.0,75.800000,5.0,2.315167,5.360000,75.835348,80.0,80.0,73.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1274,1,1274,303.0,0.0,23.307692,13.0,53.261885,2836.828402,58.138429,194.0,...,875.0,67.0,67.307692,13.0,8.533956,72.828402,67.846546,87.0,87.0,44.0
1275,1,1275,362.0,0.0,27.846154,13.0,53.595129,2872.437870,60.397402,194.0,...,876.0,67.0,67.384615,13.0,8.553348,73.159763,67.925298,87.0,87.0,44.0
1276,1,1276,368.0,0.0,28.307692,13.0,53.378741,2849.289941,60.420323,194.0,...,879.0,67.0,67.615385,13.0,8.580286,73.621302,68.157623,87.0,87.0,44.0
1277,1,1277,368.0,0.0,28.307692,13.0,53.378741,2849.289941,60.420323,194.0,...,886.0,68.0,68.153846,13.0,8.742875,76.437870,68.712332,87.0,87.0,44.0
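
As a sanity check, the `hr__*` values of window `(1, 2)` in the table above can be reproduced by hand from its three heart-rate readings (73, 75, 76). The sketch below uses only NumPy. The dictionary keys mirror the column names in the table, and the formulas are inferred from the printed numbers, which match the population (`ddof=0`) standard deviation and variance.

```python
import numpy as np

# hr values of window (1, 2), taken from the rolled table above
hr_window = np.array([73.0, 75.0, 76.0])

features = {
    "hr__sum_values": hr_window.sum(),
    "hr__median": np.median(hr_window),
    "hr__mean": hr_window.mean(),
    "hr__length": len(hr_window),
    # NumPy defaults to ddof=0, i.e. the population estimates
    "hr__standard_deviation": hr_window.std(),
    "hr__variance": hr_window.var(),
    "hr__root_mean_square": np.sqrt(np.mean(hr_window ** 2)),
    "hr__maximum": hr_window.max(),
    "hr__minimum": hr_window.min(),
}

for name, value in features.items():
    print(f"{name}: {value:.6f}")
```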


In [None]:
# Do we have the same number of rows?
df_small[df_small["pid"] == 1][["sleep_phase"]]

In [4]:
# So we apply this transformation/feature extraction to the whole dataset:
df_rolled = roll_time_series(df[["time", "hr", "act", "pid", "sleep_phase"]].dropna().copy(), 
                             column_id="pid", 
                             min_timeshift=0, 
                             max_timeshift=12,
                             n_jobs=3)


Rolling: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:49<00:00,  3.33s/it]


In [5]:
print("Number of rows: ", df_rolled.shape[0])
df_rolled.head()

Number of rows:  2654093


Unnamed: 0,time,hr,act,pid,sleep_phase,sort,id
0,29,73.0,0.0,1,0.0,0,"(1, 0)"
200,29,73.0,0.0,1,0.0,0,"(1, 1)"
201,59,75.0,0.0,1,0.0,1,"(1, 1)"
600,29,73.0,0.0,1,0.0,0,"(1, 2)"
601,59,75.0,0.0,1,0.0,1,"(1, 2)"


# First feature set -- Windows of raw signals

In [53]:
def make_cols(l, size=13):
    """Left-pad the window `l` with zeros so that it has exactly `size` values."""
    l = np.array(l)

    # To pad with 0s at the end instead:
    # l = np.pad(l, (0, size - len(l)), "constant")

    # Pad with 0s at the beginning
    l = np.pad(l, (size - len(l), 0), "constant")
    return l

def get_raw_win_features(df, winsize=13):
    
    df_acts = df[["pid", "id", "act"]].groupby('id')["act"].apply(lambda x: make_cols(x, size=winsize))
    df_hrs = df[["pid", "id", "hr"]].groupby('id')["hr"].apply(lambda x: make_cols(x, size=winsize))
    
    df_acts = df_acts.apply(pd.Series).rename(columns=dict([(i, "act_%d" % i) for i in range(winsize)]))
    df_hrs = df_hrs.apply(pd.Series).rename(columns=dict([(i, "hr_%d" % i) for i in range(winsize)]))
    
    df_new = pd.concat((df_acts, df_hrs), axis=1).reset_index()
    
    # Get the sorted unique ids for this dataframe
    df_uniqueids_sorted = df[["pid", "id", "time", "sort", "sleep_phase"]].groupby(["pid", "time"]).first().reset_index()
    
    return pd.merge(df_uniqueids_sorted, df_new, on="id")


In [54]:
df_features_raw_win = get_raw_win_features(df_rolled)
df_features_raw_win

Unnamed: 0,pid,time,id,sort,sleep_phase,act_0,act_1,act_2,act_3,act_4,...,hr_3,hr_4,hr_5,hr_6,hr_7,hr_8,hr_9,hr_10,hr_11,hr_12
0,1,29,"(1, 0)",0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,73.0
1,1,59,"(1, 1)",1,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,73.0,75.0
2,1,89,"(1, 2)",2,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,73.0,75.0,76.0
3,1,119,"(1, 3)",3,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,73.0,75.0,76.0,75.0
4,1,149,"(1, 4)",4,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,73.0,75.0,76.0,75.0,80.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
205356,1647,31199,"(1647, 1039)",205356,0.0,0.0,0.0,0.0,0.0,16.0,...,58.0,60.0,58.0,56.0,61.0,60.0,56.0,66.0,60.0,57.0
205357,1647,31229,"(1647, 1040)",205357,0.0,0.0,0.0,0.0,16.0,0.0,...,60.0,58.0,56.0,61.0,60.0,56.0,66.0,60.0,57.0,60.0
205358,1647,31259,"(1647, 1041)",205358,0.0,0.0,0.0,16.0,0.0,0.0,...,58.0,56.0,61.0,60.0,56.0,66.0,60.0,57.0,60.0,61.0
205359,1647,31289,"(1647, 1042)",205359,0.0,0.0,16.0,0.0,0.0,0.0,...,56.0,61.0,60.0,56.0,66.0,60.0,57.0,60.0,61.0,66.0
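
The zero-padded windows built by `make_cols` can also be produced without a Python-level loop. The following is a minimal sketch for a single subject, assuming NumPy 1.20 or later for `sliding_window_view`. `left_padded_windows` is a hypothetical helper, not part of the notebook or of tsfresh.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def left_padded_windows(values, size=13):
    """Return one row per time step with the last `size` values, zero-padded on the left."""
    values = np.asarray(values, dtype=float)
    # Prepend size - 1 zeros so that the first window ends at the first reading
    padded = np.concatenate([np.zeros(size - 1), values])
    return sliding_window_view(padded, size)

# First five hr readings of pid 1, with a short window for readability
hr = [73.0, 75.0, 76.0, 75.0, 80.0]
windows = left_padded_windows(hr, size=4)
print(windows)
```

Row `i` of the result corresponds to the window that ends at reading `i`, with the most recent value in the last column, which matches the layout of the `hr_0 ... hr_12` columns. For several subjects the helper would have to be applied per `pid`, so that windows never span two subjects.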


In [65]:
df_features_raw_win["act_12"].head(20)

0      0.0
1      0.0
2      0.0
3      0.0
4     85.0
5      0.0
6      0.0
7      0.0
8      0.0
9      0.0
10     0.0
11     0.0
12     0.0
13    66.0
14     0.0
15     0.0
16     2.0
17     0.0
18     5.0
19    35.0
Name: act_12, dtype: float64

In [66]:
df.head(20)

Unnamed: 0,time,act,sleep_phase,hr,pid
29,29,0.0,0.0,73.0,1
59,59,0.0,0.0,75.0,1
89,89,0.0,0.0,76.0,1
119,119,0.0,0.0,75.0,1
149,149,85.0,0.0,80.0,1
179,179,0.0,0.0,77.0,1
209,209,0.0,0.0,77.0,1
239,239,0.0,0.0,77.0,1
269,269,0.0,0.0,77.0,1
299,299,0.0,0.0,77.0,1


In [61]:
df_features_raw_win.to_csv("./datasets/df_raw_features.tar.gz", index=False)

In [4]:
# So we apply this transformation/feature extraction to the whole dataset:

# (small version for only 10 subjects)
# df_small = df[df["pid"].isin(df["pid"].unique()[:10])].dropna()
# df_rolled = roll_time_series(df_small[["time", "hr", "act", "pid", "sleep_phase"]].copy(), 
#                              column_id="pid", 
#                              min_timeshift=0,
#                              max_timeshift=12,
#                              n_jobs=3)


Rolling: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:04<00:00,  3.64it/s]


# Second feature set -- Features extracted from each window with TSFresh

In [4]:
# WARNING: This cell took more than an hour to run (1:10:32 for the feature extraction alone)
settings = EfficientFCParameters()
del settings["friedrich_coefficients"]
del settings["max_langevin_fixed_point"]

df_extracted_features = tsfresh.extract_features(df_rolled[["id", "sort", "act", "hr"]], 
                                                 column_id="id",
                                                 column_sort="sort",
                                                 default_fc_parameters=settings,
                                                 n_jobs=3)

df_features = df_extracted_features.reset_index().rename(columns={"level_0":"pid", "level_1": "time"})
df_features.to_csv("datasets/df_tsfresh_features.tar.gz", index=False)

Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [1:10:32<00:00, 282.18s/it]


In [11]:
# Merge all data

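# Rebuild a per-subject row counter to use as the merge key.
# The window ids in df_features start at 0 within each pid, so the counter must start at 0 too.
dfnan = df.dropna().copy()
dfnan["time"] = dfnan.groupby("pid").cumcount()


df_merged = pd.merge(dfnan, df_features, on=["pid", "time"])
# Note: this overwrites the features-only file written in the previous cell
df_merged.to_csv("datasets/df_tsfresh_features.tar.gz", index=False)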
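
The merge key deserves a second look: the window ids in the feature table start at 0 for each subject, so a per-subject counter used as `time` has to start at 0 as well. The sketch below compares a running sum of ones (starts at 1) with `cumcount` (starts at 0). The frames `toy` and `toy_features` are made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({"pid": [1, 1, 1, 16, 16],
                    "hr": [73.0, 75.0, 76.0, 66.0, 70.0]})

# Running sum of ones: the counter starts at 1 within each pid
toy["one"] = 1
toy["time_cumsum"] = toy.groupby("pid")["one"].cumsum()

# cumcount numbers the rows of each pid starting at 0
toy["time_cumcount"] = toy.groupby("pid").cumcount()

# Toy feature table keyed like the rolled ids: (pid, 0), (pid, 1), ...
toy_features = pd.DataFrame({"pid": [1, 1, 1, 16, 16],
                             "time": [0, 1, 2, 0, 1],
                             "hr__maximum": [73.0, 75.0, 76.0, 66.0, 70.0]})

merged_cumsum = (toy.rename(columns={"time_cumsum": "time"})
                    .merge(toy_features, on=["pid", "time"]))
merged_cumcount = (toy.rename(columns={"time_cumcount": "time"})
                      .merge(toy_features, on=["pid", "time"]))

print(len(merged_cumsum), len(merged_cumcount))
print(merged_cumsum[["pid", "time", "hr", "hr__maximum"]])
```

With the 1-based counter every row is paired with the features of the following window and one row per subject is lost. With the 0-based counter all rows survive and each row meets the window that ends at it.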

# Pycaret

In [366]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.metrics import f1_score, matthews_corrcoef

In [363]:
model = LogisticRegression()
model.fit(X.values[:1000], Y["sleep"].values[:1000])

pred = model.predict(X.values[5000:10000])
f1_score(Y["sleep"].values[5000:10000], pred)

0.8585648148148148

---
# Open Parenthesis
- Is F1 a good metric to use here?
- Is F1 score a good metric in general?

See https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-6413-7

In [368]:
f1_score?

In [364]:
# Swap the positive class by negating labels and predictions (`~` assumes boolean arrays)
f1_score(~Y["sleep"].values[5000:10000], ~pred)

0.10147058823529412

In [367]:
matthews_corrcoef(~Y["sleep"].values[5000:10000], ~pred), matthews_corrcoef(Y["sleep"].values[5000:10000], pred)

(0.20050367614978312, 0.20050367614978312)
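
The asymmetry seen above follows directly from the definitions: F1 ignores true negatives, so it changes when the positive class is swapped, while the Matthews correlation coefficient uses all four cells of the confusion matrix and stays the same. The sketch below works this out on a made-up, imbalanced confusion matrix. The counts are hypothetical and unrelated to the sleep data.

```python
from math import sqrt

def f1_and_mcc(tp, fp, fn, tn):
    """Compute F1 and MCC from the four cells of a binary confusion matrix."""
    f1 = 2 * tp / (2 * tp + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc

# Hypothetical counts: the positive class dominates
tp, fp, fn, tn = 90, 4, 5, 1

f1, mcc = f1_and_mcc(tp, fp, fn, tn)
# Swapping the positive class exchanges tp with tn and fp with fn
f1_swapped, mcc_swapped = f1_and_mcc(tp=tn, fp=fn, fn=fp, tn=tp)

print(f"F1  original: {f1:.3f}   swapped: {f1_swapped:.3f}")
print(f"MCC original: {mcc:.3f}   swapped: {mcc_swapped:.3f}")
```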

# Close parenthesis
---

In [351]:
ngrps = 5
pid_grp = {}

i = 0
for pid in X["pid"].unique():
    pid_grp[pid] = i
    i = (i+1) % ngrps

pid_grp

{1: 0,
 16: 1,
 21: 2,
 28: 3,
 33: 4,
 36: 0,
 46: 1,
 50: 2,
 52: 3,
 74: 4,
 107: 0,
 111: 1,
 120: 2,
 121: 3,
 125: 4,
 133: 0,
 138: 1,
 144: 2,
 152: 3,
 155: 4,
 159: 0,
 167: 1,
 171: 2,
 193: 3,
 197: 4,
 220: 0,
 251: 1,
 271: 2,
 275: 3,
 282: 4,
 286: 0,
 292: 1,
 295: 2,
 299: 3,
 301: 4,
 306: 0,
 318: 1,
 323: 2,
 332: 3,
 339: 4,
 374: 0,
 380: 1,
 382: 2,
 386: 3,
 392: 4,
 393: 0,
 402: 1,
 423: 2,
 427: 3,
 435: 4,
 443: 0,
 445: 1,
 459: 2,
 470: 3,
 474: 4,
 476: 0,
 495: 1,
 499: 2,
 501: 3,
 509: 4,
 518: 0,
 522: 1,
 526: 2,
 528: 3,
 529: 4,
 534: 0,
 545: 1,
 550: 2,
 554: 3,
 555: 4,
 558: 0,
 589: 1,
 604: 2,
 612: 3,
 626: 4,
 632: 0,
 640: 1,
 657: 2,
 664: 3,
 677: 4,
 686: 0,
 688: 1,
 694: 2,
 702: 3,
 711: 4,
 712: 0,
 715: 1,
 716: 2,
 727: 3,
 728: 4,
 762: 0,
 768: 1,
 782: 2,
 784: 3,
 791: 4,
 796: 0,
 801: 1,
 804: 2,
 807: 3,
 811: 4,
 812: 0,
 813: 1,
 852: 2,
 860: 3,
 864: 4,
 884: 0,
 889: 1,
 892: 2,
 893: 3,
 899: 4,
 908: 0,
 912: 1,
 91

In [356]:
X["grp"] = X["pid"].apply(lambda x: pid_grp[x])
X

Unnamed: 0,pid,time,mean,median,std,var,skew,kurt,max,min,count,sum,grp
0,1,29,14.166667,0.0,34.701105,1204.166667,2.449490,6.000000,85.0,0.0,6.0,85.0,0
1,1,59,12.142857,0.0,32.126980,1032.142857,2.645751,7.000000,85.0,0.0,7.0,85.0,0
2,1,89,10.625000,0.0,30.052038,903.125000,2.828427,8.000000,85.0,0.0,8.0,85.0,0
3,1,119,9.444444,0.0,28.333333,802.777778,3.000000,9.000000,85.0,0.0,9.0,85.0,0
4,1,149,8.500000,0.0,26.879360,722.500000,3.162278,10.000000,85.0,0.0,10.0,85.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
205356,1647,6231234,3.500000,0.0,6.819091,46.500000,2.262781,5.278992,21.0,0.0,10.0,35.0,4
205357,1647,6231264,3.888889,0.0,7.114149,50.611111,2.113952,4.568488,21.0,0.0,9.0,35.0,4
205358,1647,6231294,4.375000,0.0,7.443837,55.410714,1.952574,3.855072,21.0,0.0,8.0,35.0,4
205359,1647,6231324,5.000000,0.0,7.810250,61.000000,1.774885,3.142972,21.0,0.0,7.0,35.0,4


In [361]:
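# NOTE: `groups` is only used by group-aware splitters (e.g. GroupKFold passed via `cv`).
# With the default `cv` it is ignored, so rows of one pid can land in both train and test.
# X.values also still contains the pid, time and grp columns as features.
cross_val_score(LogisticRegression(), X.values, Y["sleep"].values, groups=X["grp"], scoring="f1")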

array([0.84624756, 0.86708327, 0.86441916, 0.85149739, 0.84328028])

In [373]:
cross_val_score(LogisticRegression(), X.values, Y["sleep"].values, groups=X["grp"], scoring="matthews_corrcoef")

array([0.43722301, 0.46052623, 0.43611178, 0.35579605, 0.28850433])
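
In scikit-learn the `groups` argument only has an effect when `cv` is a group-aware splitter. A minimal sketch of subject-wise cross-validation with `GroupKFold` follows. It runs on synthetic data, and the names `X_toy`, `y_toy` and `groups` are placeholders rather than the notebook's `X` and `Y`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n_subjects, rows_per_subject = 10, 40

# One group label per row, standing in for the pid column
groups = np.repeat(np.arange(n_subjects), rows_per_subject)
X_toy = rng.normal(size=(groups.size, 3))
y_toy = (X_toy[:, 0] + rng.normal(scale=0.5, size=groups.size) > 0).astype(int)

cv = GroupKFold(n_splits=5)

# Check that no subject appears on both sides of any split
no_overlap = all(
    set(groups[train_idx]).isdisjoint(groups[test_idx])
    for train_idx, test_idx in cv.split(X_toy, y_toy, groups=groups)
)
print("Subjects kept apart:", no_overlap)

scores = cross_val_score(LogisticRegression(), X_toy, y_toy,
                         groups=groups, cv=cv, scoring="f1")
print(scores)
```

With this setup the subject id itself can be passed as `groups`, so a separate round-robin group column is not strictly needed. Identifier columns such as `pid` would normally be dropped from the feature matrix before fitting.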

# Ideas to improve??

In [375]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


pipe = make_pipeline(StandardScaler(), LogisticRegression())
cross_val_score(pipe, X.values, Y["sleep"].values, groups=X["grp"], scoring="matthews_corrcoef")

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

array([0.4847928 , 0.53811454, 0.58032939, 0.53635802, 0.49203124])