# Predict Incident Volume

The support analyst received an IT incident report and they have to assign resolution priority for this incident but sometimes it is difficult to make this assignment and often it is very subjective. 

Resolution priority of IT Incident will drive further reaction and resolution time SLA from operational teams who will be involved as they have to prioritize and queue many incidents for resolution order. 

For reference we have 5 priorities for Incidents: 

- Global Crisis resolution time 4 Hours 

- Major Incident resolution time 24 hours 

- Minor application or critical individual user problem — resolution time 72 hours 

- Non critical user problem — resolution time 5 business days 

- User request/how to — resolution time 10 business days 

## Imports

In [1]:
import os
os.system("pip install katonic[ml]")

In [2]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from katonic.log.client import load_model
from katonic.ml.regression import Regressor

## Data Prepration

In [4]:
df = pd.read_csv("datasets/incident.csv")

In [5]:
print('input data size: ', df.shape[0])

input data size:  46606


In [6]:
df.head()

Unnamed: 0.1,Unnamed: 0,CI Name (aff),CI Type (aff),CI Subtype (aff),Service Component WBS (aff),Incident ID,Status,Impact,Urgency,Priority,...,Closure Code,# Related Interactions,Related Interaction,# Related Incidents,# Related Changes,Related Change,CI Name (CBy),CI Type (CBy),CI Subtype (CBy),ServiceComp WBS (CBy)
0,0,SUB000508,subapplication,Web Based Application,WBS000162,IM0000004,Closed,4.0,4,4.0,...,Other,1.0,SD0000007,2.0,,,SUB000508,subapplication,Web Based Application,WBS000162
1,1,WBA000124,application,Web Based Application,WBS000088,IM0000005,Closed,3.0,3,3.0,...,Software,1.0,SD0000011,1.0,,,WBA000124,application,Web Based Application,WBS000088
2,2,DTA000024,application,Desktop Application,WBS000092,IM0000006,Closed,3.0,3,3.0,...,No error - works as designed,1.0,SD0000017,,,,DTA000024,application,Desktop Application,WBS000092
3,3,WBA000124,application,Web Based Application,WBS000088,IM0000011,Closed,4.0,4,4.0,...,Operator error,1.0,SD0000025,,,,WBA000124,application,Web Based Application,WBS000088
4,4,WBA000124,application,Web Based Application,WBS000088,IM0000012,Closed,4.0,4,4.0,...,Other,1.0,SD0000029,,,,SUB000508,subapplication,Web Based Application,WBS000162


In [7]:
df.describe()

Unnamed: 0.1,Unnamed: 0,Impact,Priority,# Reassignments,# Related Interactions,# Related Incidents,# Related Changes
count,46606.0,46606.0,46606.0,46605.0,46492.0,1222.0,560.0
mean,23302.5,4.187894,4.179805,1.131831,1.149897,1.669394,1.058929
std,13454.13766,0.724776,0.725007,2.269774,2.556338,3.339687,0.403596
min,0.0,1.0,1.0,0.0,1.0,1.0,1.0
25%,11651.25,4.0,4.0,0.0,1.0,1.0,1.0
50%,23302.5,4.0,4.0,0.0,1.0,1.0,1.0
75%,34953.75,5.0,5.0,2.0,1.0,1.0,1.0
max,46605.0,5.0,5.0,46.0,370.0,63.0,9.0


#### Rename Columns

In [8]:
df.columns= df.columns.str.lower()

In [9]:
df.columns = df.columns.str.replace('[#,(,)]', '')
df.columns = df.columns.str.replace(' ', '_')
df.columns

Index(['unnamed:_0', 'ci_name_aff', 'ci_type_aff', 'ci_subtype_aff',
       'service_component_wbs_aff', 'incident_id', 'status', 'impact',
       'urgency', 'priority', 'category', 'km_number', 'alert_status',
       '_reassignments', 'open_time', 'reopen_time', 'resolved_time',
       'close_time', 'handle_time_hours', 'closure_code',
       '_related_interactions', 'related_interaction', '_related_incidents',
       '_related_changes', 'related_change', 'ci_name_cby', 'ci_type_cby',
       'ci_subtype_cby', 'servicecomp_wbs_cby'],
      dtype='object')

#### Time period of the dataset

In [10]:
df['open_time'] = pd.to_datetime(df['open_time'])

In [11]:
print(f"Min date from incident_forecast_data set: {df['open_time'].min().date()}")
print(f"Max date from incident_forecast_data set: {df['open_time'].max().date()}")

Min date from incident_forecast_data set: 2012-01-10
Max date from incident_forecast_data set: 2014-12-03


In [12]:
df['ci_subtype_aff'].unique()

array(['Web Based Application', 'Desktop Application',
       'Server Based Application', 'SAP', 'Client Based Application',
       'Citrix', 'Standard Application', 'Windows Server', 'Laptop',
       'Linux Server', 'no subtype', 'Monitor', 'Automation Software',
       'SAN', 'Banking Device', 'Desktop', 'Database', 'Oracle Server',
       'Keyboard', 'Printer', 'Exchange', 'System Software', 'VDI',
       'Encryption', 'Omgeving', 'MigratieDummy', 'Scanner', 'Controller',
       'DataCenterEquipment', 'KVM Switches', 'Switch',
       'Database Software', 'Network Component', 'Unix Server', 'Lines',
       'ESX Cluster', 'zOS Server', 'SharePoint Farm', 'NonStop Server',
       'Application Server', 'Security Software', 'Thin Client',
       'zOS Cluster', 'Router', 'VMWare', 'Net Device', 'Neoview Server',
       'MQ Queue Manager', 'UPS', 'Number', 'Iptelephony',
       'Windows Server in extern beheer', 'Modem', 'X86 Server',
       'ESX Server', 'Virtual Tape Server', 'IPtelephon

In [13]:
df['week_number'] = df['open_time'].dt.week

In [14]:
df.columns

Index(['unnamed:_0', 'ci_name_aff', 'ci_type_aff', 'ci_subtype_aff',
       'service_component_wbs_aff', 'incident_id', 'status', 'impact',
       'urgency', 'priority', 'category', 'km_number', 'alert_status',
       '_reassignments', 'open_time', 'reopen_time', 'resolved_time',
       'close_time', 'handle_time_hours', 'closure_code',
       '_related_interactions', 'related_interaction', '_related_incidents',
       '_related_changes', 'related_change', 'ci_name_cby', 'ci_type_cby',
       'ci_subtype_cby', 'servicecomp_wbs_cby', 'week_number'],
      dtype='object')

In [15]:
df_new = pd.DataFrame(df.groupby(['ci_subtype_aff', 'priority', 'week_number'])['incident_id'].count()).reset_index()

In [16]:
df_new

Unnamed: 0,ci_subtype_aff,priority,week_number,incident_id
0,Application Server,4.0,43,1
1,Automation Software,2.0,13,1
2,Automation Software,3.0,8,1
3,Automation Software,3.0,11,1
4,Automation Software,3.0,12,1
...,...,...,...,...
2246,zOS Cluster,5.0,50,1
2247,zOS Server,3.0,3,1
2248,zOS Server,3.0,42,3
2249,zOS Systeem,3.0,5,1


## Encoding

In [17]:
df_new['ci_subtype_aff'] = LabelEncoder().fit_transform(df_new['ci_subtype_aff'])

In [18]:
df_new['priority'] = df['priority'].astype('int')

In [19]:
df_new

Unnamed: 0,ci_subtype_aff,priority,week_number,incident_id
0,0,4,43,1
1,1,3,13,1
2,1,3,8,1
3,1,4,11,1
4,1,4,12,1
...,...,...,...,...
2246,62,5,50,1
2247,63,3,3,1
2248,63,3,42,3
2249,64,4,5,1


In [20]:
df = df_new

In [21]:
df.head()

Unnamed: 0,ci_subtype_aff,priority,week_number,incident_id
0,0,4,43,1
1,1,3,13,1
2,1,3,8,1
3,1,4,11,1
4,1,4,12,1


## Train/validation split
- As we know the test set in on the future, so we should try to simulate the same distribution on our train/validation split.
- Our train set will be the first 3-28 blocks, validation will be last 5 blocks (29-32) and test will be block 33.
- I'm leaving the first 3 months out because we use a 3 month window to generate features, so these first 3 month won't have really windowed useful features.

In [22]:
train_set = df.query('week_number >= 1 and week_number < 47').copy()
validation_set = df.query('week_number >= 47 and week_number < 52').copy()
test_set = df.query('week_number == 52').copy()

print('Train set records:', train_set.shape[0])
print('Validation set records:', validation_set.shape[0])
print('Test set records:', test_set.shape[0])
print('-------'*15)
print('Train set records: %s (%.f%% of complete data)' % (train_set.shape[0], ((train_set.shape[0]/df.shape[0])*100)))
print('Validation set records: %s (%.f%% of complete data)' % (validation_set.shape[0], ((validation_set.shape[0]/df.shape[0])*100)))

Train set records: 1907
Validation set records: 300
Test set records: 44
---------------------------------------------------------------------------------------------------------
Train set records: 1907 (85% of complete data)
Validation set records: 300 (13% of complete data)


In [23]:
# Create train and validation sets and labels. 
X_train = train_set.drop(['incident_id'], axis=1)
Y_train = train_set['incident_id'].astype(int)

X_validation = validation_set.drop(['incident_id'], axis=1)
Y_validation = validation_set['incident_id'].astype(int)

X_test = test_set.drop(['incident_id'], axis=1)

## Modelling

### Regression Model

In [24]:
exp_name = "incident-forecasting"

In [25]:
features = list(X_train.columns)
reg = Regressor(X_train,X_validation,Y_train,Y_validation, exp_name, source_name='incident-forecasting.ipynb', features=features)

2023/10/24 03:09:08 INFO mlflow.tracking.fluent: Experiment with name 'incident-forecasting' does not exist. Creating a new experiment.


#### Get registered experiment details

In [26]:
exp_id = reg.id
print("experiment name : ", reg.name)
print("experiment location : ", reg.location)
print("experiment id : ", reg.id)
print("experiment status : ", reg.stage)

experiment name :  incident-forecasting
experiment location :  s3://models/15
experiment id :  15
experiment status :  active


### RandomForest

- Parameters on the model that needs to be tuned.

In [27]:
params = {
'n_estimators': {
    'low': 80,
    'high': 120,
    'step': 10,
    'type': 'int'
    },
'criterion':{
    'values': ['mse', 'mae'],
    'type': 'categorical'
    },
'min_samples_split': {
    'low': 2,
    'high': 5,
    'type': 'int'
    },
'min_samples_leaf':{
    'low': 1,
    'high': 5,
    'type': 'int'
    }
}

In [28]:
reg.RandomForestRegressor(is_tune=True, params=params)

[I 2023-10-24 03:09:21,337] A new study created in memory with name: no-name-444253df-45d6-45af-890f-58df8efdb0f3
[I 2023-10-24 03:09:23,797] Trial 0 finished with value: 0.4027104623813258 and parameters: {'n_estimators': 100, 'criterion': 'mse', 'min_samples_split': 4, 'min_samples_leaf': 4}. Best is trial 0 with value: 0.4027104623813258.
[I 2023-10-24 03:09:27,614] Trial 1 finished with value: 0.387319817736254 and parameters: {'n_estimators': 110, 'criterion': 'mae', 'min_samples_split': 4, 'min_samples_leaf': 1}. Best is trial 0 with value: 0.4027104623813258.
[I 2023-10-24 03:09:31,176] Trial 2 finished with value: 0.4183035664360347 and parameters: {'n_estimators': 100, 'criterion': 'mae', 'min_samples_split': 3, 'min_samples_leaf': 2}. Best is trial 2 with value: 0.4183035664360347.
[I 2023-10-24 03:09:33,224] Trial 3 finished with value: 0.3985594760460286 and parameters: {'n_estimators': 80, 'criterion': 'mse', 'min_samples_split': 4, 'min_samples_leaf': 4}. Best is trial 2 

Number of finished trials:  5
Best trial:
  R2:  0.4183035664360347
  Params: 
    n_estimators: 100
    criterion: mae
    min_samples_split: 3
    min_samples_leaf: 2


### Ridge Regression

- Parameters on the model that needs to be tuned.

In [29]:
params={
'alpha':{
    'low':0.6,
    'high':1.2,
    'type':'float'
    }
}

In [30]:
reg.RidgeRegression(is_tune=True, params=params)

[I 2023-10-24 03:09:52,451] A new study created in memory with name: no-name-a887c80d-0350-45d7-b479-72d5ccacd144
[I 2023-10-24 03:09:54,171] Trial 0 finished with value: 0.027944377066504877 and parameters: {'alpha': 0.6}. Best is trial 0 with value: 0.027944377066504877.
[I 2023-10-24 03:09:55,872] Trial 1 finished with value: 0.027950160365008192 and parameters: {'alpha': 1.2}. Best is trial 1 with value: 0.027950160365008192.
[I 2023-10-24 03:09:57,666] Trial 2 finished with value: 0.02794919704513732 and parameters: {'alpha': 1.1}. Best is trial 1 with value: 0.027950160365008192.
[I 2023-10-24 03:09:59,424] Trial 3 finished with value: 0.027944377066504877 and parameters: {'alpha': 0.6}. Best is trial 1 with value: 0.027950160365008192.
[I 2023-10-24 03:10:01,165] Trial 4 finished with value: 0.027948233500085018 and parameters: {'alpha': 1.0}. Best is trial 1 with value: 0.027950160365008192.


Number of finished trials:  5
Best trial:
  R2:  0.027950160365008192
  Params: 
    alpha: 1.2


### Lasso Regression

- Parameters on the model that needs to be tuned.

In [31]:
params={
'alpha':{
    'low':0.6,
    'high':1.2,
    'type':'float'
    },
'tol':{
    'low':1e-5,
    'high':1e-4,
    'type':'uniform'
    }
}

In [32]:
reg.LassoRegression(is_tune=True, params=params)

[I 2023-10-24 03:11:04,602] A new study created in memory with name: no-name-93a040f3-6f54-42ce-be9f-67a31eacb767
[I 2023-10-24 03:11:06,340] Trial 0 finished with value: 0.03328183570288845 and parameters: {'alpha': 1.1, 'tol': 7.877230120114978e-05}. Best is trial 0 with value: 0.03328183570288845.
[I 2023-10-24 03:11:08,071] Trial 1 finished with value: 0.030977042925843024 and parameters: {'alpha': 0.6, 'tol': 8.90214578732174e-05}. Best is trial 0 with value: 0.03328183570288845.
[I 2023-10-24 03:11:09,826] Trial 2 finished with value: 0.03328184439717674 and parameters: {'alpha': 1.1, 'tol': 3.210307334493947e-05}. Best is trial 2 with value: 0.03328184439717674.
[I 2023-10-24 03:11:11,552] Trial 3 finished with value: 0.03371800345859943 and parameters: {'alpha': 1.2, 'tol': 8.343067917188055e-05}. Best is trial 3 with value: 0.03371800345859943.
[I 2023-10-24 03:11:13,311] Trial 4 finished with value: 0.03283740434727844 and parameters: {'alpha': 1.0, 'tol': 4.948271713623521e-

Number of finished trials:  5
Best trial:
  R2:  0.03371800345859943
  Params: 
    alpha: 1.2
    tol: 8.343067917188055e-05


### ElasticNet

- Parameters on the model that needs to be tuned.

In [33]:
params={
'alpha':{
    'low':0.6,
    'high':1.2,
    'type':'float'
    },
'tol':{
    'low':1e-5,
    'high':1e-4,
    'type':'uniform'
    },
'l1_ratio':{
    'low':0.3,
    'high':0.6,
    'type':'float'
    }
}

In [34]:
reg.ElasticNet(is_tune=True, params=params)

[I 2023-10-24 03:11:15,434] A new study created in memory with name: no-name-da0738a0-ed39-4615-8d3f-0b3d80ff7995
[I 2023-10-24 03:11:17,273] Trial 0 finished with value: 0.03242022346234097 and parameters: {'alpha': 0.7, 'tol': 1.2751838714603785e-05, 'l1_ratio': 0.6}. Best is trial 0 with value: 0.03242022346234097.
[I 2023-10-24 03:11:18,983] Trial 1 finished with value: 0.03298183442405378 and parameters: {'alpha': 0.8, 'tol': 7.403549714227818e-05, 'l1_ratio': 0.4}. Best is trial 1 with value: 0.03298183442405378.
[I 2023-10-24 03:11:20,744] Trial 2 finished with value: 0.032420220994643145 and parameters: {'alpha': 0.7, 'tol': 4.262539722234466e-05, 'l1_ratio': 0.6}. Best is trial 1 with value: 0.03298183442405378.
[I 2023-10-24 03:11:22,494] Trial 3 finished with value: 0.0326584479577241 and parameters: {'alpha': 0.7, 'tol': 9.740460752680227e-05, 'l1_ratio': 0.4}. Best is trial 1 with value: 0.03298183442405378.
[I 2023-10-24 03:11:24,245] Trial 4 finished with value: 0.033947

Number of finished trials:  5
Best trial:
  R2:  0.0339472218349014
  Params: 
    alpha: 1.2
    tol: 4.282230857026484e-05
    l1_ratio: 0.4


### Support Vector Regression

- Parameters on the model that needs to be tuned.

In [35]:
params={
    'C':{
        'low': 0.5,
        'high':1.0,
        'type': 'float'
    },
    'kernel':{
        'values': ['linear', 'rbf', 'poly'],
        'type':'categorical'
    },
    'degree':{
        'low':2,
        'high': 4,
        'type': 'int'
    }
}

In [36]:
reg.SupportVectorRegressor(is_tune=True, params=params)

[I 2023-10-24 03:11:26,123] A new study created in memory with name: no-name-acf43d68-ea9b-4a26-848c-3ca8f9cd3d0e
[I 2023-10-24 03:11:28,067] Trial 0 finished with value: -0.10513598418936732 and parameters: {'C': 1.0, 'kernel': 'poly', 'degree': 4}. Best is trial 0 with value: -0.10513598418936732.
[I 2023-10-24 03:11:30,361] Trial 1 finished with value: -0.11052091748328752 and parameters: {'C': 0.6, 'kernel': 'linear', 'degree': 2}. Best is trial 0 with value: -0.10513598418936732.
[I 2023-10-24 03:11:32,342] Trial 2 finished with value: -0.10511406299293191 and parameters: {'C': 0.9, 'kernel': 'poly', 'degree': 4}. Best is trial 2 with value: -0.10511406299293191.
[I 2023-10-24 03:11:34,212] Trial 3 finished with value: -0.10240814580584212 and parameters: {'C': 1.0, 'kernel': 'rbf', 'degree': 4}. Best is trial 3 with value: -0.10240814580584212.
[I 2023-10-24 03:11:36,566] Trial 4 finished with value: -0.1103753120070039 and parameters: {'C': 0.9, 'kernel': 'linear', 'degree': 3}.

Number of finished trials:  5
Best trial:
  R2:  -0.10240814580584212
  Params: 
    C: 1.0
    kernel: rbf
    degree: 4


### KNN Regression

- Parameters on the model that needs to be tuned.

In [37]:
params={
    'n_neighbors':{
        'low':5,
        'high':10,
        'type':'int'
    },
    'algorithm':{
        'values':['ball_tree', 'kd_tree', 'brute'],
        'type':'categorical'
    },
    'leaf_size':{
        'low': 20,
        'high': 30,
        'type': 'int'
    }
}

In [38]:
reg.KNNRegressor(is_tune=True, params=params)

[I 2023-10-24 03:11:38,509] A new study created in memory with name: no-name-dadcb2d0-50bf-4d1d-8ab9-e5a91c186329
[I 2023-10-24 03:11:40,275] Trial 0 finished with value: 0.3213201752249384 and parameters: {'n_neighbors': 6, 'algorithm': 'kd_tree', 'leaf_size': 25}. Best is trial 0 with value: 0.3213201752249384.
[I 2023-10-24 03:11:42,093] Trial 1 finished with value: 0.259802649962134 and parameters: {'n_neighbors': 5, 'algorithm': 'kd_tree', 'leaf_size': 22}. Best is trial 0 with value: 0.3213201752249384.
[I 2023-10-24 03:11:43,868] Trial 2 finished with value: 0.24611135621953417 and parameters: {'n_neighbors': 8, 'algorithm': 'ball_tree', 'leaf_size': 24}. Best is trial 0 with value: 0.3213201752249384.
[I 2023-10-24 03:11:45,639] Trial 3 finished with value: 0.23481808751219768 and parameters: {'n_neighbors': 10, 'algorithm': 'brute', 'leaf_size': 22}. Best is trial 0 with value: 0.3213201752249384.
[I 2023-10-24 03:11:47,370] Trial 4 finished with value: 0.2432788528872779 and 

Number of finished trials:  5
Best trial:
  R2:  0.3213201752249384
  Params: 
    n_neighbors: 6
    algorithm: kd_tree
    leaf_size: 25


### XGB Regressor

- Parameters on the model that needs to be tuned.

In [39]:
params={
    'n_estimators':{
        'low': 10,
        'high': 40,
        'step':10,
        'type': 'int'
    },
    'max_depth':{
        'low':1,
        'high':5,
        'type':'int'
    },
    'learning_rate':{
        'low':0.2,
        'high':0.5,
        'type':'float'
    },
    'objective':{
        'values': ['reg:squarederror'],
        'type':'categorical'
    }
}

In [40]:
reg.XGBRegressor(is_tune=True, params=params)

[I 2023-10-24 03:11:49,189] A new study created in memory with name: no-name-1e056f97-c2a1-4de8-8b9c-ff043a780d46
[I 2023-10-24 03:11:51,321] Trial 0 finished with value: 0.44070202755604126 and parameters: {'n_estimators': 40, 'max_depth': 3, 'learning_rate': 0.5, 'objective': 'reg:squarederror'}. Best is trial 0 with value: 0.44070202755604126.
[I 2023-10-24 03:11:53,101] Trial 1 finished with value: 0.18387025964090264 and parameters: {'n_estimators': 30, 'max_depth': 1, 'learning_rate': 0.2, 'objective': 'reg:squarederror'}. Best is trial 0 with value: 0.44070202755604126.
[I 2023-10-24 03:11:54,872] Trial 2 finished with value: 0.21745928953157756 and parameters: {'n_estimators': 30, 'max_depth': 1, 'learning_rate': 0.30000000000000004, 'objective': 'reg:squarederror'}. Best is trial 0 with value: 0.44070202755604126.
[I 2023-10-24 03:11:56,660] Trial 3 finished with value: 0.3927476460571484 and parameters: {'n_estimators': 40, 'max_depth': 2, 'learning_rate': 0.4, 'objective': '

Number of finished trials:  5
Best trial:
  R2:  0.44070202755604126
  Params: 
    n_estimators: 40
    max_depth: 3
    learning_rate: 0.5
    objective: reg:squarederror


### CatBoost Regressor

- Parameters on the model that needs to be tuned.

In [41]:
params={
    'iterations':{
        'low':20,
        'high': 30,
        'type': 'int'
    },
    'learning_rate':{
        'low': 0.3,
        'high': 0.6,
        'type': 'float'
    },
    'depth':{
        'low': 5,
        'high': 10,
        'type': 'int'
    }
}

In [42]:
reg.CatBoostRegressor(is_tune=True, params=params)

Learning rate set to 0.045339
0:	learn: 55.2831784	total: 46.4ms	remaining: 46.4s
1:	learn: 54.5806531	total: 47ms	remaining: 23.4s
2:	learn: 53.9097977	total: 47.5ms	remaining: 15.8s
3:	learn: 53.4174069	total: 48ms	remaining: 12s
4:	learn: 52.8370323	total: 48.5ms	remaining: 9.65s
5:	learn: 52.2836875	total: 48.9ms	remaining: 8.11s
6:	learn: 51.7517156	total: 49.4ms	remaining: 7.01s
7:	learn: 51.2147231	total: 49.9ms	remaining: 6.19s
8:	learn: 50.8052855	total: 50.5ms	remaining: 5.56s
9:	learn: 50.4886819	total: 51ms	remaining: 5.05s
10:	learn: 50.1007761	total: 51.5ms	remaining: 4.63s
11:	learn: 49.6344441	total: 52ms	remaining: 4.28s
12:	learn: 49.2989897	total: 52.5ms	remaining: 3.99s
13:	learn: 49.0919230	total: 52.9ms	remaining: 3.73s
14:	learn: 48.7738804	total: 53.4ms	remaining: 3.51s
15:	learn: 48.5451412	total: 53.9ms	remaining: 3.32s
16:	learn: 48.2717077	total: 54.4ms	remaining: 3.15s
17:	learn: 48.0946269	total: 54.9ms	remaining: 3s
18:	learn: 47.8006022	total: 55.4ms	rem

[I 2023-10-24 03:12:00,825] A new study created in memory with name: no-name-17d1ad5d-82a8-4ffa-8140-1a183d79d0cc


0:	learn: 46.6618682	total: 1.48ms	remaining: 31.1ms
1:	learn: 44.6532274	total: 3.36ms	remaining: 33.6ms
2:	learn: 43.4375820	total: 5.26ms	remaining: 33.3ms
3:	learn: 42.2994651	total: 6.83ms	remaining: 30.8ms
4:	learn: 41.5359766	total: 8.23ms	remaining: 28ms
5:	learn: 40.5695932	total: 9.84ms	remaining: 26.2ms
6:	learn: 40.1110236	total: 11.2ms	remaining: 23.9ms
7:	learn: 39.8443671	total: 12.7ms	remaining: 22.2ms
8:	learn: 39.7678975	total: 13.9ms	remaining: 20.1ms
9:	learn: 39.5693823	total: 14.6ms	remaining: 17.6ms
10:	learn: 39.2568103	total: 15.9ms	remaining: 15.9ms
11:	learn: 39.1448218	total: 17.7ms	remaining: 14.8ms
12:	learn: 38.3245305	total: 19.1ms	remaining: 13.2ms
13:	learn: 38.2264334	total: 20.5ms	remaining: 11.7ms
14:	learn: 38.2244684	total: 20.8ms	remaining: 9.73ms
15:	learn: 37.8933321	total: 22.1ms	remaining: 8.3ms
16:	learn: 37.1917541	total: 23.4ms	remaining: 6.88ms
17:	learn: 36.8901720	total: 24.8ms	remaining: 5.5ms
18:	learn: 36.6627736	total: 26.2ms	remain

[I 2023-10-24 03:12:02,364] Trial 0 finished with value: 0.3389440354954768 and parameters: {'iterations': 22, 'learning_rate': 0.6, 'depth': 10}. Best is trial 0 with value: 0.3389440354954768.


0:	learn: 49.2932208	total: 1.13ms	remaining: 27.2ms
1:	learn: 47.0143411	total: 1.78ms	remaining: 20.5ms
2:	learn: 45.9929459	total: 2.32ms	remaining: 17ms
3:	learn: 44.8643525	total: 2.83ms	remaining: 14.9ms
4:	learn: 44.6675214	total: 3.37ms	remaining: 13.5ms
5:	learn: 44.4141295	total: 3.88ms	remaining: 12.3ms
6:	learn: 44.2472317	total: 4.37ms	remaining: 11.2ms
7:	learn: 43.9197520	total: 4.84ms	remaining: 10.3ms
8:	learn: 43.6460970	total: 5.27ms	remaining: 9.36ms
9:	learn: 43.1455290	total: 5.78ms	remaining: 8.67ms
10:	learn: 42.6628677	total: 6.31ms	remaining: 8.03ms
11:	learn: 42.5431109	total: 6.7ms	remaining: 7.26ms
12:	learn: 42.2633188	total: 7.24ms	remaining: 6.68ms
13:	learn: 42.0262975	total: 7.77ms	remaining: 6.11ms
14:	learn: 41.9562296	total: 8.29ms	remaining: 5.53ms
15:	learn: 41.8916830	total: 8.8ms	remaining: 4.95ms
16:	learn: 41.8186156	total: 9.35ms	remaining: 4.4ms
17:	learn: 41.7314629	total: 9.89ms	remaining: 3.84ms
18:	learn: 41.4847483	total: 10.4ms	remaini

[I 2023-10-24 03:12:03,843] Trial 1 finished with value: 0.3958345819970469 and parameters: {'iterations': 25, 'learning_rate': 0.4, 'depth': 6}. Best is trial 1 with value: 0.3958345819970469.


0:	learn: 49.2296183	total: 1.29ms	remaining: 37.4ms
1:	learn: 46.3714886	total: 3.09ms	remaining: 43.3ms
2:	learn: 44.2513484	total: 4.98ms	remaining: 44.8ms
3:	learn: 43.4701996	total: 6.39ms	remaining: 41.5ms
4:	learn: 43.0238615	total: 7.85ms	remaining: 39.3ms
5:	learn: 41.8233006	total: 9.51ms	remaining: 38ms
6:	learn: 41.6724337	total: 10ms	remaining: 33ms
7:	learn: 41.5311613	total: 11.3ms	remaining: 31.2ms
8:	learn: 41.1454566	total: 11.9ms	remaining: 27.7ms
9:	learn: 41.1322486	total: 12.2ms	remaining: 24.3ms
10:	learn: 40.6168146	total: 13.5ms	remaining: 23.4ms
11:	learn: 40.5198435	total: 14.1ms	remaining: 21.2ms
12:	learn: 40.2397391	total: 15.5ms	remaining: 20.3ms
13:	learn: 39.7005435	total: 16.8ms	remaining: 19.2ms
14:	learn: 39.2034642	total: 18.2ms	remaining: 18.2ms
15:	learn: 39.0846792	total: 19.6ms	remaining: 17.1ms
16:	learn: 38.8991816	total: 20.8ms	remaining: 15.9ms
17:	learn: 38.8213231	total: 22.3ms	remaining: 14.9ms
18:	learn: 38.3510302	total: 23.7ms	remainin

[I 2023-10-24 03:12:05,347] Trial 2 finished with value: 0.34677943391584454 and parameters: {'iterations': 30, 'learning_rate': 0.4, 'depth': 10}. Best is trial 1 with value: 0.3958345819970469.


0:	learn: 50.0785021	total: 524us	remaining: 11.5ms
1:	learn: 47.3313166	total: 1.13ms	remaining: 11.8ms
2:	learn: 46.6840385	total: 1.65ms	remaining: 11ms
3:	learn: 45.2096382	total: 2.12ms	remaining: 10.1ms
4:	learn: 45.0396180	total: 2.58ms	remaining: 9.29ms
5:	learn: 44.5672662	total: 2.99ms	remaining: 8.47ms
6:	learn: 44.3030034	total: 3.4ms	remaining: 7.78ms
7:	learn: 44.1435968	total: 3.82ms	remaining: 7.16ms
8:	learn: 43.4866831	total: 4.26ms	remaining: 6.63ms
9:	learn: 43.2572788	total: 4.68ms	remaining: 6.08ms
10:	learn: 43.2006447	total: 5.09ms	remaining: 5.55ms
11:	learn: 42.9183181	total: 5.49ms	remaining: 5.03ms
12:	learn: 42.3877632	total: 5.96ms	remaining: 4.58ms
13:	learn: 42.3158966	total: 6.33ms	remaining: 4.07ms
14:	learn: 42.1907345	total: 6.76ms	remaining: 3.61ms
15:	learn: 41.8938598	total: 7.19ms	remaining: 3.15ms
16:	learn: 41.7092144	total: 7.58ms	remaining: 2.68ms
17:	learn: 41.5876570	total: 7.96ms	remaining: 2.21ms
18:	learn: 41.3663735	total: 8.43ms	remain

[I 2023-10-24 03:12:06,822] Trial 3 finished with value: 0.4044833038888116 and parameters: {'iterations': 23, 'learning_rate': 0.5, 'depth': 5}. Best is trial 3 with value: 0.4044833038888116.


0:	learn: 47.9682797	total: 1.2ms	remaining: 26.4ms
1:	learn: 45.8444126	total: 1.89ms	remaining: 19.8ms
2:	learn: 44.9494923	total: 2.47ms	remaining: 16.4ms
3:	learn: 44.1261512	total: 2.99ms	remaining: 14.2ms
4:	learn: 43.9748378	total: 3.5ms	remaining: 12.6ms
5:	learn: 43.9202880	total: 3.97ms	remaining: 11.3ms
6:	learn: 43.7287283	total: 4.47ms	remaining: 10.2ms
7:	learn: 43.5034126	total: 5.01ms	remaining: 9.39ms
8:	learn: 43.1922860	total: 5.54ms	remaining: 8.63ms
9:	learn: 42.9346837	total: 6.07ms	remaining: 7.88ms
10:	learn: 42.6795325	total: 6.62ms	remaining: 7.22ms
11:	learn: 42.5736698	total: 7.1ms	remaining: 6.51ms
12:	learn: 42.3641596	total: 7.63ms	remaining: 5.87ms
13:	learn: 42.2162551	total: 8.15ms	remaining: 5.24ms
14:	learn: 41.7177355	total: 8.71ms	remaining: 4.65ms
15:	learn: 41.6484175	total: 9.15ms	remaining: 4ms
16:	learn: 41.6085945	total: 9.67ms	remaining: 3.41ms
17:	learn: 41.1733208	total: 10.2ms	remaining: 2.84ms
18:	learn: 40.8671012	total: 10.7ms	remainin

[I 2023-10-24 03:12:08,309] Trial 4 finished with value: 0.4041548784581418 and parameters: {'iterations': 23, 'learning_rate': 0.5, 'depth': 6}. Best is trial 3 with value: 0.4044833038888116.


Number of finished trials:  5
Best trial:
  R2:  0.4044833038888116
  Params: 
    iterations: 23
    learning_rate: 0.5
    depth: 5


### LGBM Regressor

- Parameters on the model that needs to be tuned.

In [43]:
params={
    'num_leaves':{
        'low':25,
        'high':35,
        'type':'int'
    },
    'learning_rate':{
        'low':0.1,
        'high':0.5,
        'type':'float'
    },
    'n_estimators':{
        'low':80,
        'high':120,
        'step':10,
        'type':'int'
    },
    'min_child_samples':{
        'low': 10,
        'high':20,
        'type': 'int'
    }
}

In [44]:
reg.LGBMRegressor(is_tune=True, params=params)

[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000061 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 100
[LightGBM] [Info] Number of data points in the train set: 1907, number of used features: 3
[LightGBM] [Info] Start training from score 20.054536


[I 2023-10-24 03:12:10,376] A new study created in memory with name: no-name-77a65ee6-50b8-4df3-8315-db1979a85e5b


[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000053 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 100
[LightGBM] [Info] Number of data points in the train set: 1907, number of used features: 3
[LightGBM] [Info] Start training from score 20.054536


[I 2023-10-24 03:12:12,315] Trial 0 finished with value: 0.4402741595472641 and parameters: {'num_leaves': 33, 'learning_rate': 0.1, 'n_estimators': 100, 'min_child_samples': 20}. Best is trial 0 with value: 0.4402741595472641.


[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000058 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 100
[LightGBM] [Info] Number of data points in the train set: 1907, number of used features: 3
[LightGBM] [Info] Start training from score 20.054536


[I 2023-10-24 03:12:14,192] Trial 1 finished with value: 0.40914464173607 and parameters: {'num_leaves': 26, 'learning_rate': 0.2, 'n_estimators': 80, 'min_child_samples': 11}. Best is trial 0 with value: 0.4402741595472641.


[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000052 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 100
[LightGBM] [Info] Number of data points in the train set: 1907, number of used features: 3
[LightGBM] [Info] Start training from score 20.054536


[I 2023-10-24 03:12:16,125] Trial 2 finished with value: 0.3996771259236719 and parameters: {'num_leaves': 32, 'learning_rate': 0.4, 'n_estimators': 110, 'min_child_samples': 18}. Best is trial 0 with value: 0.4402741595472641.


[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000022 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 100
[LightGBM] [Info] Number of data points in the train set: 1907, number of used features: 3
[LightGBM] [Info] Start training from score 20.054536


[I 2023-10-24 03:12:18,085] Trial 3 finished with value: 0.4184060856963028 and parameters: {'num_leaves': 25, 'learning_rate': 0.30000000000000004, 'n_estimators': 90, 'min_child_samples': 15}. Best is trial 0 with value: 0.4402741595472641.


[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000026 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 100
[LightGBM] [Info] Number of data points in the train set: 1907, number of used features: 3
[LightGBM] [Info] Start training from score 20.054536


[I 2023-10-24 03:12:20,209] Trial 4 finished with value: 0.41849234610222286 and parameters: {'num_leaves': 29, 'learning_rate': 0.2, 'n_estimators': 110, 'min_child_samples': 15}. Best is trial 0 with value: 0.4402741595472641.


Number of finished trials:  5
Best trial:
  R2:  0.4402741595472641
  Params: 
    num_leaves: 33
    learning_rate: 0.1
    n_estimators: 100
    min_child_samples: 20


### Gradient Boosting Regressor

- Parameters on the model that needs to be tuned.

In [45]:
params = {
'n_estimators': {
    'low': 80,
    'high': 120,
    'step': 10,
    'type': 'int'
    },
'learning_rate':{
    'low': 0.6,
    'high':1.0,
    'type': 'float'
    },
'min_samples_split': {
    'low': 2,
    'high': 5,
    'type': 'int'
    },
'min_samples_leaf':{
    'low': 1,
    'high': 5,
    'type': 'int'
    },
'max_depth': {
    'low': 2,
    'high': 4,
    'type': 'int'
    }
}

In [46]:
reg.GradientBoostingRegressor(is_tune=True, params=params)

[I 2023-10-24 03:12:22,124] A new study created in memory with name: no-name-d83ce797-be5d-4152-acba-e4d60fb00bc1
[I 2023-10-24 03:12:24,064] Trial 0 finished with value: 0.42183195495115544 and parameters: {'n_estimators': 110, 'learning_rate': 0.8, 'min_samples_split': 5, 'min_samples_leaf': 3, 'max_depth': 3}. Best is trial 0 with value: 0.42183195495115544.
[I 2023-10-24 03:12:25,903] Trial 1 finished with value: 0.4249985497479234 and parameters: {'n_estimators': 80, 'learning_rate': 1.0, 'min_samples_split': 2, 'min_samples_leaf': 2, 'max_depth': 2}. Best is trial 1 with value: 0.4249985497479234.
[I 2023-10-24 03:12:27,837] Trial 2 finished with value: 0.2878256724459112 and parameters: {'n_estimators': 110, 'learning_rate': 1.0, 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_depth': 4}. Best is trial 1 with value: 0.4249985497479234.
[I 2023-10-24 03:12:29,728] Trial 3 finished with value: 0.3011359400980148 and parameters: {'n_estimators': 80, 'learning_rate': 1.0, 'min_s

Number of finished trials:  5
Best trial:
  R2:  0.4249985497479234
  Params: 
    n_estimators: 80
    learning_rate: 1.0
    min_samples_split: 2
    min_samples_leaf: 2
    max_depth: 2


### AdaBoost Regressor

- Parameters on the model that needs to be tuned.

In [47]:
params={
    'n_estimators':{
        'low':30,
        'high':60,
        'step':10,
        'type':'int'
    },
    'learning_rate':{
        'low':0.7,
        'high':1.0,
        'type':'float',
    },
    'loss':{
        'values':['linear', 'square'],
        'type': 'categorical'
    }
}

In [48]:
reg.AdaBoostRegressor(is_tune=True, params=params)

[I 2023-10-24 03:12:33,532] A new study created in memory with name: no-name-c0ccfdd6-9f76-4f19-86d7-3d80b79cce67
[I 2023-10-24 03:12:35,513] Trial 0 finished with value: 0.3503490163939573 and parameters: {'n_estimators': 30, 'learning_rate': 0.7999999999999999, 'loss': 'square'}. Best is trial 0 with value: 0.3503490163939573.
[I 2023-10-24 03:12:37,333] Trial 1 finished with value: 0.37178368274812257 and parameters: {'n_estimators': 60, 'learning_rate': 0.7999999999999999, 'loss': 'square'}. Best is trial 1 with value: 0.37178368274812257.
[I 2023-10-24 03:12:39,091] Trial 2 finished with value: 0.3989439509053283 and parameters: {'n_estimators': 60, 'learning_rate': 0.7999999999999999, 'loss': 'linear'}. Best is trial 2 with value: 0.3989439509053283.
[I 2023-10-24 03:12:41,012] Trial 3 finished with value: 0.3716045001327678 and parameters: {'n_estimators': 30, 'learning_rate': 0.7, 'loss': 'square'}. Best is trial 2 with value: 0.3989439509053283.
[I 2023-10-24 03:12:42,913] Tri

Number of finished trials:  5
Best trial:
  R2:  0.4096425555831199
  Params: 
    n_estimators: 30
    learning_rate: 0.7
    loss: square


### Decision Tree Regressor

- Parameters on the model that needs to be tuned.

In [49]:
params = {
'max_depth': {
    'low': 8,
    'high': 15,
    'type': 'int'
    },
'criterion':{
    'values': ['mse', 'mae'],
    'type': 'categorical'
    },
'splitter':{
    'values': ['best', 'random'],
    'type': 'categorical'
    },
'min_samples_split': {
    'low': 2,
    'high': 5,
    'type': 'int'
    },
'min_samples_leaf':{
    'low': 1,
    'high': 5,
    'type': 'int'
    },
'max_features':{
    'values': ['auto', 'sqrt', 'log2'],
    'type': 'categorical'
    }
}

In [50]:
reg.DecisionTreeRegressor(is_tune=True, params=params)

[I 2023-10-24 03:12:44,712] A new study created in memory with name: no-name-c4ec68a2-3683-40a1-af5e-50cbfeb6c67d
[I 2023-10-24 03:12:46,528] Trial 0 finished with value: 0.4294594503368575 and parameters: {'max_depth': 15, 'criterion': 'mse', 'splitter': 'best', 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_features': 'auto'}. Best is trial 0 with value: 0.4294594503368575.
[I 2023-10-24 03:12:48,316] Trial 1 finished with value: 0.0493934603109899 and parameters: {'max_depth': 9, 'criterion': 'mse', 'splitter': 'random', 'min_samples_split': 3, 'min_samples_leaf': 3, 'max_features': 'log2'}. Best is trial 0 with value: 0.4294594503368575.
[I 2023-10-24 03:12:50,221] Trial 2 finished with value: 0.11390167708609189 and parameters: {'max_depth': 12, 'criterion': 'mae', 'splitter': 'best', 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_features': 'log2'}. Best is trial 0 with value: 0.4294594503368575.
[I 2023-10-24 03:12:52,012] Trial 3 finished with value: 0.123880622859577

Number of finished trials:  5
Best trial:
  R2:  0.4294594503368575
  Params: 
    max_depth: 15
    criterion: mse
    splitter: best
    min_samples_split: 4
    min_samples_leaf: 3
    max_features: auto


### ExtraTree Regressor

- Parameters on the model that needs to be tuned.

In [51]:
params = {
'max_depth': {
    'low': 8,
    'high': 15,
    'type': 'int'
    },
'criterion':{
    'values': ['mse', 'mae'],
    'type': 'categorical'
    },
'n_estimators':{
    'low': 80,
    'high':120,
    'step':10,
    'type': 'int'
    },
'min_samples_split': {
    'low': 2,
    'high': 5,
    'type': 'int'
    },
'min_samples_leaf':{
    'low': 1,
    'high': 5,
    'type': 'int'
    },
'max_features':{
    'values': ['auto', 'sqrt', 'log2'],
    'type': 'categorical'
    }
}

In [52]:
reg.ExtraTreeRegressor(is_tune=True, params=params)

[I 2023-10-24 03:12:56,014] A new study created in memory with name: no-name-ed9053f6-e7fc-4206-896b-b076c0d7a295
[I 2023-10-24 03:12:58,048] Trial 0 finished with value: 0.20617844263151797 and parameters: {'max_depth': 14, 'criterion': 'mse', 'n_estimators': 90, 'min_samples_split': 2, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 0 with value: 0.20617844263151797.
[I 2023-10-24 03:13:00,601] Trial 1 finished with value: -0.053139932783343946 and parameters: {'max_depth': 15, 'criterion': 'mae', 'n_estimators': 80, 'min_samples_split': 4, 'min_samples_leaf': 5, 'max_features': 'sqrt'}. Best is trial 0 with value: 0.20617844263151797.
[I 2023-10-24 03:13:03,162] Trial 2 finished with value: -0.05219600131358937 and parameters: {'max_depth': 14, 'criterion': 'mae', 'n_estimators': 90, 'min_samples_split': 5, 'min_samples_leaf': 5, 'max_features': 'log2'}. Best is trial 0 with value: 0.20617844263151797.
[I 2023-10-24 03:13:06,773] Trial 3 finished with value: 0.03463065

Number of finished trials:  5
Best trial:
  R2:  0.33849503785938695
  Params: 
    max_depth: 8
    criterion: mse
    n_estimators: 110
    min_samples_split: 4
    min_samples_leaf: 3
    max_features: auto


In [53]:
pd.set_option('display.max_columns', None)

## Select the run of the experiment

In [54]:
df_runs = reg.search_runs(exp_id)
print("Number of runs done : ", len(df_runs))
df_runs

Number of runs done :  78


Unnamed: 0,artifact_uri,end_time,experiment_id,metrics.MAE,metrics.MSE,metrics.R2,metrics.RMSE,metrics.RMSLE,params.C,params.algorithm,params.alpha,params.criterion,params.degree,params.depth,params.iterations,params.kernel,params.l1_ratio,params.leaf_size,params.learning_rate,params.loss,params.max_depth,params.max_features,params.min_child_samples,params.min_samples_leaf,params.min_samples_split,params.n_estimators,params.n_neighbors,params.num_leaves,params.objective,params.splitter,params.tol,run_id,run_name,start_time,status,tags.data_path,tags.experiment_id,tags.experiment_name,tags.features,tags.mlflow.log-model.history,tags.mlflow.parentRunId,tags.run_id,tags.version.mlflow
0,s3://models/15/b7d4489220dd4a8a9b7466dff578e16...,2023-10-24 03:13:08.688000+00:00,15,21.823123,3077.836021,0.338495,55.478248,4.015991,,,,mse,,,,,,,,,8,auto,,3,4,110,,,,,,b7d4489220dd4a8a9b7466dff578e167,incident-forecasting_15_ExtraTree_regression_t...,2023-10-24 03:13:06.775000+00:00,FINISHED,-,15,incident-forecasting,"['ci_subtype_aff', 'priority', 'week_number']","[{""run_id"": ""b7d4489220dd4a8a9b7466dff578e167""...",ba7fcc13217f4740bb5baf2362ae1647,b7d4489220dd4a8a9b7466dff578e167,2.0.1
1,s3://models/15/3b8e7915f33540cabdc53f6ca6c0c29...,2023-10-24 03:13:06.753000+00:00,15,22.081030,4491.649649,0.034631,67.019771,4.204988,,,,mae,,,,,,,,,8,auto,,2,3,110,,,,,,3b8e7915f33540cabdc53f6ca6c0c291,incident-forecasting_15_ExtraTree_regression_t...,2023-10-24 03:13:03.164000+00:00,FINISHED,-,15,incident-forecasting,"['ci_subtype_aff', 'priority', 'week_number']","[{""run_id"": ""3b8e7915f33540cabdc53f6ca6c0c291""...",ba7fcc13217f4740bb5baf2362ae1647,3b8e7915f33540cabdc53f6ca6c0c291,2.0.1
2,s3://models/15/1c1921a7722547c3a4cf568b64db7fb...,2023-10-24 03:13:03.143000+00:00,15,23.044444,4895.634863,-0.052196,69.968813,4.248050,,,,mae,,,,,,,,,14,log2,,5,5,90,,,,,,1c1921a7722547c3a4cf568b64db7fb2,incident-forecasting_15_ExtraTree_regression_t...,2023-10-24 03:13:00.604000+00:00,FINISHED,-,15,incident-forecasting,"['ci_subtype_aff', 'priority', 'week_number']","[{""run_id"": ""1c1921a7722547c3a4cf568b64db7fb2""...",ba7fcc13217f4740bb5baf2362ae1647,1c1921a7722547c3a4cf568b64db7fb2,2.0.1
3,s3://models/15/2f277a72ee6c48b3acafc17955d355d...,2023-10-24 03:13:00.582000+00:00,15,23.094979,4900.026767,-0.053140,70.000191,4.248498,,,,mae,,,,,,,,,15,sqrt,,5,4,80,,,,,,2f277a72ee6c48b3acafc17955d355dc,incident-forecasting_15_ExtraTree_regression_t...,2023-10-24 03:12:58.050000+00:00,FINISHED,-,15,incident-forecasting,"['ci_subtype_aff', 'priority', 'week_number']","[{""run_id"": ""2f277a72ee6c48b3acafc17955d355dc""...",ba7fcc13217f4740bb5baf2362ae1647,2f277a72ee6c48b3acafc17955d355dc,2.0.1
4,s3://models/15/f44ddb2f8e3b407b8ec66108ab02fe5...,2023-10-24 03:12:58.021000+00:00,15,29.997993,3693.475822,0.206178,60.773973,4.107162,,,,mse,,,,,,,,,14,sqrt,,2,2,90,,,,,,f44ddb2f8e3b407b8ec66108ab02fe51,incident-forecasting_15_ExtraTree_regression_t...,2023-10-24 03:12:56.016000+00:00,FINISHED,-,15,incident-forecasting,"['ci_subtype_aff', 'priority', 'week_number']","[{""run_id"": ""f44ddb2f8e3b407b8ec66108ab02fe51""...",ba7fcc13217f4740bb5baf2362ae1647,f44ddb2f8e3b407b8ec66108ab02fe51,2.0.1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
73,s3://models/15/23781e36214b468da5490d07889379f...,2023-10-24 03:09:33.205000+00:00,15,19.785070,2798.369499,0.398559,52.899617,3.968396,,,,mse,,,,,,,,,,,,4,4,80,,,,,,23781e36214b468da5490d07889379fe,incident-forecasting_15_RandomForest_regressio...,2023-10-24 03:09:31.178000+00:00,FINISHED,-,15,incident-forecasting,"['ci_subtype_aff', 'priority', 'week_number']","[{""run_id"": ""23781e36214b468da5490d07889379fe""...",83828510e89e44ab8056e798bec70c0d,23781e36214b468da5490d07889379fe,2.0.1
74,s3://models/15/d76a94bb614549f09e1254a8be7ede7...,2023-10-24 03:09:31.154000+00:00,15,20.338700,2706.504621,0.418304,52.024077,3.951707,,,,mae,,,,,,,,,,,,2,3,100,,,,,,d76a94bb614549f09e1254a8be7ede75,incident-forecasting_15_RandomForest_regressio...,2023-10-24 03:09:27.616000+00:00,FINISHED,-,15,incident-forecasting,"['ci_subtype_aff', 'priority', 'week_number']","[{""run_id"": ""d76a94bb614549f09e1254a8be7ede75""...",83828510e89e44ab8056e798bec70c0d,d76a94bb614549f09e1254a8be7ede75,2.0.1
75,s3://models/15/84cdc546a0ed494baafa74f325ce3be...,2023-10-24 03:09:27.594000+00:00,15,21.006242,2850.665139,0.387320,53.391620,3.977654,,,,mae,,,,,,,,,,,,1,4,110,,,,,,84cdc546a0ed494baafa74f325ce3be5,incident-forecasting_15_RandomForest_regressio...,2023-10-24 03:09:23.799000+00:00,FINISHED,-,15,incident-forecasting,"['ci_subtype_aff', 'priority', 'week_number']","[{""run_id"": ""84cdc546a0ed494baafa74f325ce3be5""...",83828510e89e44ab8056e798bec70c0d,84cdc546a0ed494baafa74f325ce3be5,2.0.1
76,s3://models/15/a402482ea50d49499a24cfdcbe7a863...,2023-10-24 03:09:23.777000+00:00,15,19.750240,2779.055879,0.402710,52.716751,3.964933,,,,mse,,,,,,,,,,,,4,4,100,,,,,,a402482ea50d49499a24cfdcbe7a8631,incident-forecasting_15_RandomForest_regressio...,2023-10-24 03:09:21.339000+00:00,FINISHED,-,15,incident-forecasting,"['ci_subtype_aff', 'priority', 'week_number']","[{""run_id"": ""a402482ea50d49499a24cfdcbe7a8631""...",83828510e89e44ab8056e798bec70c0d,a402482ea50d49499a24cfdcbe7a8631,2.0.1


## Evaluating Models

In [55]:
top_runs = df_runs.sort_values(['metrics.RMSE'],ascending=False)
top_runs.head()

Unnamed: 0,artifact_uri,end_time,experiment_id,metrics.MAE,metrics.MSE,metrics.R2,metrics.RMSE,metrics.RMSLE,params.C,params.algorithm,params.alpha,params.criterion,params.degree,params.depth,params.iterations,params.kernel,params.l1_ratio,params.leaf_size,params.learning_rate,params.loss,params.max_depth,params.max_features,params.min_child_samples,params.min_samples_leaf,params.min_samples_split,params.n_estimators,params.n_neighbors,params.num_leaves,params.objective,params.splitter,params.tol,run_id,run_name,start_time,status,tags.data_path,tags.experiment_id,tags.experiment_name,tags.features,tags.mlflow.log-model.history,tags.mlflow.parentRunId,tags.run_id,tags.version.mlflow
51,s3://models/15/3f283e3c6e7844159c81df60b202a9f...,2023-10-24 03:11:30.341000+00:00,15,23.485317,5167.007775,-0.110521,71.881902,4.275025,0.6,,,,2.0,,,linear,,,,,,,,,,,,,,,,3f283e3c6e7844159c81df60b202a9fb,incident-forecasting_15_SupportVector_regressi...,2023-10-24 03:11:28.070000+00:00,FINISHED,-,15,incident-forecasting,"['ci_subtype_aff', 'priority', 'week_number']","[{""run_id"": ""3f283e3c6e7844159c81df60b202a9fb""...",4e31db38a67744c58106de9673188dad,3f283e3c6e7844159c81df60b202a9fb,2.0.1
48,s3://models/15/9b543026b7294754a6880776e6115b8...,2023-10-24 03:11:36.548000+00:00,15,23.484236,5166.330305,-0.110375,71.877189,4.274959,0.9,,,,3.0,,,linear,,,,,,,,,,,,,,,,9b543026b7294754a6880776e6115b8c,incident-forecasting_15_SupportVector_regressi...,2023-10-24 03:11:34.215000+00:00,FINISHED,-,15,incident-forecasting,"['ci_subtype_aff', 'priority', 'week_number']","[{""run_id"": ""9b543026b7294754a6880776e6115b8c""...",4e31db38a67744c58106de9673188dad,9b543026b7294754a6880776e6115b8c,2.0.1
52,s3://models/15/93350baeaa364589ae450470470853d...,2023-10-24 03:11:28.050000+00:00,15,23.466088,5141.952873,-0.105136,71.707412,4.272594,1.0,,,,4.0,,,poly,,,,,,,,,,,,,,,,93350baeaa364589ae450470470853d1,incident-forecasting_15_SupportVector_regressi...,2023-10-24 03:11:26.126000+00:00,FINISHED,-,15,incident-forecasting,"['ci_subtype_aff', 'priority', 'week_number']","[{""run_id"": ""93350baeaa364589ae450470470853d1""...",4e31db38a67744c58106de9673188dad,93350baeaa364589ae450470470853d1,2.0.1
50,s3://models/15/2ba8df379b13410e9a4faf94328a5b8...,2023-10-24 03:11:32.323000+00:00,15,23.466001,5141.850879,-0.105114,71.7067,4.272584,0.9,,,,4.0,,,poly,,,,,,,,,,,,,,,,2ba8df379b13410e9a4faf94328a5b87,incident-forecasting_15_SupportVector_regressi...,2023-10-24 03:11:30.363000+00:00,FINISHED,-,15,incident-forecasting,"['ci_subtype_aff', 'priority', 'week_number']","[{""run_id"": ""2ba8df379b13410e9a4faf94328a5b87""...",4e31db38a67744c58106de9673188dad,2ba8df379b13410e9a4faf94328a5b87,2.0.1
53,s3://models/15/4e31db38a67744c58106de9673188da...,2023-10-24 03:11:36.600000+00:00,15,23.429327,5129.260846,-0.102408,71.618858,4.271358,,,,,,,,,,,,,,,,,,,,,,,,4e31db38a67744c58106de9673188dad,incident-forecasting_15_SupportVector_regression,2023-10-24 03:11:24.304000+00:00,FINISHED,-,15,incident-forecasting,"['ci_subtype_aff', 'priority', 'week_number']","[{""run_id"": ""4e31db38a67744c58106de9673188dad""...",,4e31db38a67744c58106de9673188dad,2.0.1


## Selecting best model

In [56]:
artifacts = top_runs.iloc[0]["artifact_uri"]
run_id = top_runs.iloc[0]["run_id"]
model_name = top_runs.iloc[0]["run_name"] 


print('Best model_artifacts :',artifacts)
print("=" * 100)
print('Best model run_id :',run_id)
print("=" * 100)
print('Best model :',model_name)
print("=" * 100)
print("Best model experiment id :",exp_id)

Best model_artifacts : s3://models/15/3f283e3c6e7844159c81df60b202a9fb/artifacts
Best model run_id : 3f283e3c6e7844159c81df60b202a9fb
Best model : incident-forecasting_15_SupportVector_regression_tuned
Best model experiment id : 15


## Fetching Model

In [57]:
location = f"{artifacts}/{model_name}"

In [58]:
model = load_model(location)

## Predict

In [59]:
y_pred = model.predict(X_test)

In [60]:
# Prepare variable as DataFrame in pandas
df = pd.DataFrame(X_test)

# Add the target variable to df
df["y_pred"] = y_pred

In [61]:
df

Unnamed: 0,ci_subtype_aff,priority,week_number,y_pred
78,2,4,52,2.870413
118,2,4,52,2.870413
157,2,4,52,2.870413
260,3,4,52,2.85812
345,4,3,52,3.237375
410,5,4,52,2.833534
453,6,5,52,2.429694
516,7,4,52,2.808949
572,9,4,52,2.784363
612,9,4,52,2.784363
