# Modelling the most important features

## Model Selection

In [1]:
from data import load_data

train_data, test_data = load_data()

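# Keep only the training rows for patients that also appear in the test set.
unique_patients = test_data['p_num'].unique()
train_data = train_data[train_data['p_num'].isin(unique_patients)]
# Has no effect: every test row's patient is in unique_patients by construction.
test_data = test_data[test_data['p_num'].isin(unique_patients)]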

train_data.head()

| id | p_num | time | bg-5:55 | bg-5:50 | bg-5:45 | bg-5:40 | bg-5:35 | bg-5:30 | bg-5:25 | bg-5:20 | ... | activity-0:40 | activity-0:35 | activity-0:30 | activity-0:25 | activity-0:20 | activity-0:15 | activity-0:10 | activity-0:05 | activity-0:00 | bg+1:00 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| p01_0 | p01 | 06:10:00 | NaN | NaN | 9.6 | NaN | NaN | 9.7 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 13.4 |
| p01_1 | p01 | 06:25:00 | NaN | NaN | 9.7 | NaN | NaN | 9.2 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 12.8 |
| p01_2 | p01 | 06:40:00 | NaN | NaN | 9.2 | NaN | NaN | 8.7 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 15.5 |
| p01_3 | p01 | 06:55:00 | NaN | NaN | 8.7 | NaN | NaN | 8.4 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 14.8 |
| p01_4 | p01 | 07:10:00 | NaN | NaN | 8.4 | NaN | NaN | 8.1 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 12.7 |


## Loop through unique patients

In [2]:
from sklearn.model_selection import train_test_split
from notebooks.helpers.LazyPredict import get_lazy_regressor
from pipelines import pipeline

lazy_predict_results = {}

for patient in unique_patients:
    # Fit and apply the preprocessing pipeline on this patient's rows only.
    patient_train_data = train_data[train_data['p_num'] == patient]
    patient_train_data_transformed = pipeline.fit_transform(patient_train_data)

    # Split off the target column, bg+1:00.
    X = patient_train_data_transformed.drop(columns=['bg+1:00'])
    y = patient_train_data_transformed['bg+1:00']

    # Shuffled 80/20 hold-out split with a fixed seed.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, shuffle=True, stratify=None
    )

    # Run every candidate regressor and store what fit() returns for this patient.
    reg = get_lazy_regressor()
    models = reg.fit(X_train, X_test, y_train, y_test)
    lazy_predict_results[patient] = models

Each run completed 39/39 fits. Per-run wall time and LightGBM log values, in loop order:

| Run | Total time (mm:ss) | Multi-threading | Total bins | Train rows | Features used | Start score |
|---|---|---|---|---|---|---|
| 1 | 01:21 | row-wise | 28241 | 13228 | 148 | 8.842341 |
| 2 | 03:07 | row-wise | 30335 | 27121 | 148 | 9.414307 |
| 3 | 03:12 | row-wise | 29553 | 27035 | 148 | 7.804179 |
| 4 | 01:26 | row-wise | 27270 | 13974 | 148 | 8.325391 |
| 5 | 01:16 | row-wise | 27979 | 12911 | 148 | 9.148095 |
| 6 | 03:18 | row-wise | 28751 | 25469 | 148 | 6.545308 |
| 7 | 03:24 | row-wise | 29872 | 25909 | 148 | 9.230620 |
| 8 | 04:57 | row-wise | 30184 | 28339 | 148 | 8.070095 |
| 9 | 00:37 | row-wise | 20496 | 8156 | 148 | 8.158765 |
| 10 | 00:41 | col-wise | 19622 | 7124 | 148 | 8.328961 |
| 11 | 00:32 | row-wise | 22205 | 6540 | 148 | 10.089537 |
| 12 | 00:37 | row-wise | 26926 | 6913 | 148 | 8.449511 |
| 13 | 00:31 | col-wise | 26565 | 6249 | 148 | 10.985737 |
| 14 | 00:27 | col-wise | 23841 | 5587 | 148 | 8.017398 |
| 15 | 00:43 | col-wise | 27461 | 7459 | 148 | 7.935782 |
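
The loop splits each patient's rows with `shuffle=True`. In the `head()` output, consecutive rows are overlapping windows 15 minutes apart, so a shuffled split can put near-identical windows on both sides of the hold-out and may make the scores look optimistic. One common alternative is a chronological hold-out that keeps the last share of rows for testing. The helper below is a hypothetical sketch, not part of the original notebook, and it assumes the rows are already in time order.

```python
def chronological_split_index(n_rows, test_size=0.2):
    """Return (train_slice, test_slice), holding out the last `test_size` share of rows.

    Hypothetical helper for illustration; assumes rows are sorted by time.
    """
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    # Hold out at least one row, rounding to the nearest whole row.
    n_test = max(1, round(n_rows * test_size))
    cut = n_rows - n_test
    return slice(0, cut), slice(cut, n_rows)


# Toy usage with a plain list standing in for a patient's rows.
rows = list(range(10))
train_idx, test_idx = chronological_split_index(len(rows), test_size=0.2)
print(rows[train_idx], rows[test_idx])
```

With pandas objects the same slices would be applied as `X.iloc[train_idx]` and `y.iloc[train_idx]`. Passing `shuffle=False` to `train_test_split` gives an equivalent ordered split.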


In [1]:
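from IPython.display import display

# Assuming get_lazy_regressor() returns a LazyPredict LazyRegressor, each stored
# value is the tuple returned by fit(); its first element is the model scores table.
for p_num, models in lazy_predict_results.items():
    print(f'Patient: {p_num}')
    display(models[0].head(15))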


NameError: name 'lazy_predict_results' is not defined
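
The `NameError` is consistent with this cell running in a fresh kernel (its counter reads `In [1]`) before the loop cell had filled `lazy_predict_results`. Re-running the loop takes several minutes per patient, so one option is to cache the dictionary on disk after the loop and reload it in later sessions. The sketch below uses the standard-library `pickle` module. The file name and both helper functions are assumptions made for illustration and do not exist in the original project.

```python
import pickle
from pathlib import Path

# Hypothetical cache location; adjust to the project's layout.
CACHE_PATH = Path("lazy_predict_results.pkl")


def save_results(results, path=CACHE_PATH):
    """Write the per-patient results dictionary to disk."""
    with open(path, "wb") as f:
        pickle.dump(results, f)


def load_results(path=CACHE_PATH):
    """Return the cached results dictionary, or None if no cache file exists."""
    path = Path(path)
    if not path.exists():
        return None
    with open(path, "rb") as f:
        return pickle.load(f)
```

Calling `save_results(lazy_predict_results)` at the end of the loop cell and `lazy_predict_results = load_results()` at the top of this cell would let the display step run without repeating the fits, provided the stored objects can be pickled. Pickle files should only be loaded from trusted sources.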