Your model will be evaluated using two metrics: profit @ top-20 and AUC. The reason for this is to stay in line with a more realistic setting. E.g. one can imagine data scientists in a team arguing to use AUC and optimize for that. However, as seen in the course, for this scenario we also imagine management arguing that there is not enough budget (in terms of time and money) to contact a lot of people (or hand out a lot of promotions). Hence, they have come up with the following: take the top-k would-be churners as predicted by your model and, for each of them, add some proxy of "retained profitability" if the customer was indeed a churner, or zero otherwise.

As a proxy of profitability, the feature average cost min was deemed to be a good value. Based on the size of the test set, k=20 was deemed to be a good choice. Hence, management cares about optimizing this metric.
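
The metric described above can be sketched in a few lines. This is a minimal illustration and not the official scoring script: the function name `profit_at_top_k` and its argument names are hypothetical, and `profit` stands for each customer's value of the profitability proxy.

```python
import numpy as np

def profit_at_top_k(y_true, y_score, profit, k=20):
    """Sum the profit proxy over the k highest-scored customers who actually churned.

    Hypothetical helper: y_true holds 0/1 churn labels, y_score the predicted
    churn scores, and profit the per-customer profitability proxy.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    profit = np.asarray(profit, dtype=float)
    # Indices of the k customers with the highest predicted churn score
    top_k = np.argsort(-y_score, kind="stable")[:k]
    # Non-churners in the top k contribute zero
    return float(np.sum(profit[top_k] * (y_true[top_k] == 1)))

# Toy usage: customers 0 and 1 are contacted (k=2), but only customer 0 churned
toy_profit = profit_at_top_k(
    y_true=[1, 0, 1, 0],
    y_score=[0.9, 0.8, 0.1, 0.7],
    profit=[10, 20, 30, 40],
    k=2,
)
print(toy_profit)
```

Ties in the scores are broken by row order here, which is an arbitrary choice of this sketch.
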
Note that only about half of the test set is used for the "public" leaderboard. That means that the score you will see on the leaderboard is computed using this part of the test set only (you don't know which half). Later in the semester, submissions are frozen and the results on the "hidden" part will be revealed.

Also, whilst you can definitely try, the goal is not to "win", but to help you reflect on your model's results, see how others are doing, etc.

Objectives:

Some groups prefer to write their final report using Jupyter Notebook, which is fine too, as long as it is readable top-to-bottom.

You can use any predictive technique/approach you want, though focus on the whole process: general setup, critical thinking, and the ability to get and validate an outcome.

You're free to use unsupervised techniques for your data exploration part, too. When you decide to build a black box model, including some interpretability techniques to explain it is a plus.

Any other assumptions, insights or thoughts can be included as well: the idea is to take what we've seen in class, get your hands dirty and try out what we've seen.

Perform a critical review of the evaluation metric chosen by management. How in line is it with AUC? What would you have picked instead? Were there particular issues with this chosen metric, in your view?
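
As a starting point for that review, the toy example below shows that AUC and profit @ top-k can disagree. It is only a sketch on made-up numbers: the labels, scores and profit values are invented for illustration, and `profit_at_top_k` is a hypothetical helper, not the official scoring code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def profit_at_top_k(y_true, y_score, profit, k):
    # Hypothetical helper: profit summed over true churners among the k highest scores
    top_k = np.argsort(-np.asarray(y_score, dtype=float), kind="stable")[:k]
    return float(np.sum(np.asarray(profit, dtype=float)[top_k] * (np.asarray(y_true)[top_k] == 1)))

# Invented data: three churners (one very profitable) and three non-churners
y_true = np.array([1, 1, 1, 0, 0, 0])
profit = np.array([100.0, 5.0, 5.0, 50.0, 50.0, 50.0])

# Model A ranks every churner above every non-churner, low-profit churners first
scores_a = np.array([0.7, 0.9, 0.8, 0.3, 0.2, 0.1])
# Model B ranks poorly overall but puts the most profitable churner on top
scores_b = np.array([0.9, 0.2, 0.1, 0.8, 0.3, 0.25])

auc_a, auc_b = roc_auc_score(y_true, scores_a), roc_auc_score(y_true, scores_b)
profit_a = profit_at_top_k(y_true, scores_a, profit, k=2)
profit_b = profit_at_top_k(y_true, scores_b, profit, k=2)
print(f"A: AUC={auc_a:.3f}, profit@2={profit_a}")
print(f"B: AUC={auc_b:.3f}, profit@2={profit_b}")
```

Model A has the higher AUC while model B has the higher profit @ top-2, because the profit metric only looks at the head of the ranking and weighs each hit by its value.
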

In [None]:
import pandas as pd
import os
import plotly.graph_objects as go
import numpy as np

pd.options.display.max_columns = 100

In [None]:
# Initialising
TRAIN_SET_FRAC = 0.8
SEED = 42
TARGET_VAR = "target"
DROP_VARS = ['Connect_Date', 'id'] # TBC
KFOLD = 5

**Loading Data**

In [None]:
# GitHub URLs to fetch data from
url_train = 'https://raw.githubusercontent.com/hello-bob/AA_P1/main/data/train.csv'
url_test = 'https://raw.githubusercontent.com/hello-bob/AA_P1/main/data/test.csv'

# Read train and test data
train_data = pd.read_csv(url_train, sep = ',', skipinitialspace = True, engine = 'python')
train_data = train_data.drop(columns=DROP_VARS)
test_data  = pd.read_csv(url_test, sep = ',', skipinitialspace = True, engine = 'python')

**Data exploration**

In [None]:
train_data.head()

In [None]:
# Check data types
train_data.info()
test_data.info()

In [None]:
# Basic descriptives
train_data.describe(include='all')

In [None]:
# Impute missing data before modelling. The extent can be quantified in the report (4 of ~5k samples)
# Apply on the test set. Train set is ok.
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer

train_data.isnull().any().sort_values(ascending=False) # Columns with missing values: Dropped_calls_ratio, Usage_Band, call_cost_per_min.
train_data[train_data.isnull().any(axis=1)] # 4 cases, 2 churners

imputer_compiled = ColumnTransformer(
    [("numeric_imputer", SimpleImputer(strategy="median",), ["Dropped_calls_ratio", "call_cost_per_min"]),
     ("cat_imputer", SimpleImputer(strategy="most_frequent"), ["Usage_Band"])]
)

# Impute the median for numeric variables and the most frequent value for the categorical one; "most_frequent" alone would apply to both numeric and categorical data
train_data[["Dropped_calls_ratio", "call_cost_per_min", "Usage_Band"]] = imputer_compiled.fit_transform(train_data)
test_data[["Dropped_calls_ratio", "call_cost_per_min", "Usage_Band"]] = imputer_compiled.transform(test_data)

# Correcting dtype
train_data[["Dropped_calls_ratio", "call_cost_per_min"]] = train_data[["Dropped_calls_ratio", "call_cost_per_min"]].astype(float)
test_data[["Dropped_calls_ratio", "call_cost_per_min"]] = test_data[["Dropped_calls_ratio", "call_cost_per_min"]].astype(float)
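
To quantify the missing values for the report, a small summary table can be built before imputing. This is a sketch on a toy frame: the helper name `missing_summary` is hypothetical and the toy column names are invented.

```python
import pandas as pd

def missing_summary(df):
    """Count and percentage of missing values, for columns that have any."""
    n_missing = df.isnull().sum()
    summary = pd.DataFrame({
        "n_missing": n_missing,
        "pct_missing": 100 * n_missing / len(df),
    })
    return summary[summary["n_missing"] > 0].sort_values("n_missing", ascending=False)

# Toy frame standing in for train_data / test_data
toy = pd.DataFrame({
    "a": [1.0, None, 3.0, None],
    "b": [1, 2, 3, 4],
    "c": ["x", None, "y", "z"],
})
toy_summary = missing_summary(toy)
print(toy_summary)
```

In the notebook this would have to run before the imputer is applied, e.g. `missing_summary(train_data)` and `missing_summary(test_data)`.
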


In [None]:
# [For report] Pie chart about class imbalance (train set) + Percentage churn in categorical variable
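
A possible way to compute the numbers behind these two report items is sketched below. It assumes the target is coded 0/1; the helper name `churn_rate_by_category` is hypothetical and the toy category values are invented.

```python
import pandas as pd

def churn_rate_by_category(df, cat_col, target_col="target"):
    """Number of customers and share of churners per level of a categorical column."""
    return (df.groupby(cat_col)[target_col]
              .agg(n_customers="size", churn_rate="mean")
              .sort_values("churn_rate", ascending=False))

# Toy data standing in for train_data
toy = pd.DataFrame({
    "Usage_Band": ["High", "High", "Low", "Low", "Low", "Med"],
    "target": [1, 0, 0, 0, 1, 1],
})

# Class shares: these are the values a pie chart would display
class_share = toy["target"].value_counts(normalize=True)
rates = churn_rate_by_category(toy, "Usage_Band")
print(class_share)
print(rates)
```

The class shares could then be passed to `go.Pie(labels=class_share.index, values=class_share.values)` to draw the chart with the plotly import already used in this notebook.
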


In [None]:
# [For report] correlation plot
corr = train_data.corr(numeric_only=True)

fig = go.Figure()
fig.add_trace(
    go.Heatmap(
        x = corr.columns,
        y = corr.index,
        z = np.array(corr),
        text=corr.values,
        texttemplate='%{text:.2f}'
    )
)
fig.update_layout(
    autosize=False,
    width=800,
    height=800,
)
fig.show()

In [None]:
# [For report] Correlation between categorical variables
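
One common choice for association between two categorical variables is Cramér's V. The sketch below computes it from a contingency table with plain NumPy. It is the uncorrected version (no bias correction) and `cramers_v` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def cramers_v(x, y):
    """Uncorrected Cramér's V between two categorical series (0 = independent, 1 = perfect)."""
    table = pd.crosstab(x, y).to_numpy(dtype=float)
    n = table.sum()
    # Expected counts under independence
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    min_dim = min(table.shape) - 1
    if min_dim == 0:
        return float("nan")  # undefined when one variable has a single level
    return float(np.sqrt(chi2 / (n * min_dim)))

# Toy checks: identical grouping vs. a balanced, unrelated grouping
x = pd.Series(["a", "a", "b", "b"])
v_perfect = cramers_v(x, pd.Series(["u", "u", "v", "v"]))
v_none = cramers_v(x, pd.Series(["u", "v", "u", "v"]))
print(v_perfect, v_none)
```

Applied pairwise over `CAT_VARS`, the resulting matrix could be plotted with the same heatmap code as the numeric correlations.
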



**Data preprocessing**

In [None]:
# Imputing missing values
# outliers
from sklearn.ensemble import IsolationForest

outlier_df = (train_data.select_dtypes(include='number')
              .drop(columns=TARGET_VAR)
              .dropna()
              .copy())

iso_forest = IsolationForest(random_state=SEED, n_jobs=-1).fit(outlier_df)
pred = iso_forest.predict(outlier_df)
outlier_df['is_outlier'] = (pred == -1).astype(int)

In [None]:
# Finding what drives outliers: TBC
corr = outlier_df.corr(numeric_only=True)

fig = go.Figure()
fig.add_trace(
    go.Heatmap(
        x = corr.columns,
        y = corr.index,
        z = np.array(corr),
        text=corr.values,
        texttemplate='%{text:.2f}'
    )
)
fig.update_layout(
    autosize=False,
    width=800,
    height=800,
)
fig.show()

In [None]:
# !pip install shap

In [None]:
# https://stats.stackexchange.com/questions/404017/how-to-get-top-features-that-contribute-to-anomalies-in-isolation-forest
import shap

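# Create SHAP values and plot them. Drop the 'is_outlier' flag first: the forest was fitted without it
shap_features = outlier_df.drop(columns='is_outlier')
shap_values = shap.TreeExplainer(iso_forest).shap_values(shap_features)
shap.summary_plot(shap_values, shap_features)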

In [None]:
# Decide whether to keep, drop, or use the outlier flag as a feature. To be researched in the churn context

**Modelling**

In [None]:
X = train_data.drop(columns=TARGET_VAR)
y = train_data[TARGET_VAR] 

NUM_VARS = train_data.select_dtypes(include='number').drop(columns=TARGET_VAR).columns
CAT_VARS = train_data.select_dtypes(include='object').columns

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, GridSearchCV

# Define preprocessors for numerical and categorical features
numerical_preprocessor = Pipeline([
    ("scaler", StandardScaler())
])

categorical_preprocessor = Pipeline([
    ("onehot", OneHotEncoder(drop="if_binary"))
])

In [None]:
# Combine preprocessors and model
model = Pipeline([
    ("preprocessor", ColumnTransformer([
        ("numerical", numerical_preprocessor, NUM_VARS),
        ("categorical", categorical_preprocessor, CAT_VARS)
    ])),
    ("model", SVC(probability=True, random_state=SEED))
])


In [None]:
# For SVM
parameters = {'model__kernel':['linear', 'rbf'], 
              'model__C':[1]} # remember the "<step name>__" prefix so that GridSearchCV can reach parameters inside the pipeline
svc_gs_est = GridSearchCV(estimator=model, param_grid=parameters,cv=KFOLD,
                      scoring="roc_auc",n_jobs=-1, refit=True)
svc_gs_est.fit(X, y)
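
The grid search above selects on AUC only. If one also wants to select on management's metric, `GridSearchCV` accepts a callable `scorer(estimator, X, y)`. The sketch below runs on synthetic data with a logistic regression as a stand-in model; `make_profit_scorer`, the column names and the data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def make_profit_scorer(profit_col, k=20):
    """Build a scorer returning profit @ top-k; X must be a DataFrame containing profit_col."""
    def scorer(estimator, X, y):
        churn_proba = estimator.predict_proba(X)[:, 1]  # assumes classes are [0, 1]
        top_k = np.argsort(-churn_proba, kind="stable")[:k]
        profit = X[profit_col].to_numpy(dtype=float)
        return float(np.sum(profit[top_k] * (np.asarray(y)[top_k] == 1)))
    return scorer

# Synthetic stand-in data (not the churn data set)
rng = np.random.default_rng(0)
X_toy = pd.DataFrame({
    "usage": rng.normal(size=300),
    "avg_cost": rng.uniform(0.05, 0.5, size=300),
})
y_toy = pd.Series((X_toy["usage"] + rng.normal(scale=0.5, size=300) > 0.8).astype(int))

toy_model = Pipeline([("scaler", StandardScaler()), ("model", LogisticRegression())])
search = GridSearchCV(toy_model, {"model__C": [0.1, 1.0]},
                      scoring=make_profit_scorer("avg_cost", k=20), cv=3)
search.fit(X_toy, y_toy)
print(search.best_params_, search.best_score_)
```

Note that profit @ top-k depends on the size of the scored set, so values computed on a validation fold are not directly comparable with the leaderboard score.
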

In [68]:
svc_gs_results = pd.DataFrame(data=svc_gs_est.cv_results_)
svc_gs_results.sort_values(by='rank_test_score', ascending = True)

index,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_model__C,param_model__kernel,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
1,3.127157,0.602093,0.216811,0.060668,1,rbf,"{'model__C': 1, 'model__kernel': 'rbf'}",0.936874,0.903418,0.909888,0.940346,0.905075,0.91912,0.016092,1
0,4.448209,1.201936,0.043547,0.007839,1,linear,"{'model__C': 1, 'model__kernel': 'linear'}",0.905346,0.908717,0.902435,0.911292,0.91074,0.907706,0.00336,2


In [None]:
# Interpretability

**Prediction**

In [None]:
# Retrain best model: Set up the params accordingly
numerical_preprocessor = Pipeline([
    ("scaler", StandardScaler())
])

categorical_preprocessor = Pipeline([
    ("onehot", OneHotEncoder(drop="if_binary"))
])

best_model = Pipeline([
    ("preprocessor", ColumnTransformer([
        ("numerical", numerical_preprocessor, NUM_VARS),
        ("categorical", categorical_preprocessor, CAT_VARS)
    ])),
    ("model", SVC(probability=True, random_state=SEED, C=1, kernel="rbf"))
])


best_model.fit(X, y)
pred = pd.DataFrame(best_model.predict_proba(test_data), 
                    columns=["0", "1"])

In [None]:
# For submission
test_data_sub = pd.DataFrame(data={'ID':test_data['id'], 
                                   'PRED':pred["1"]})
test_data_sub

**XGBoost**

In [None]:
# Basic preprocessing
numeric_transformer = Pipeline(
    steps = [
        ("imputer", SimpleImputer(strategy="median"))
    ]
)

categorical_transformer = Pipeline(
    steps = [
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("encoder", OneHotEncoder())
    ]
)

preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, NUM_VARS),
        ("cat", categorical_transformer, CAT_VARS),
    ]
)

X = preprocessor.fit_transform(X)
# Alternatively split train-test before, do preprocessing on training data (fit_transform) then transform test data
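
The alternative mentioned in the comment above can be sketched as follows: split first, then put the preprocessing inside a `Pipeline` so that it is fitted on the training part only. The data here is synthetic and a logistic regression is swapped in for XGBoost to keep the example dependency-light; all column names are invented.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data with a few missing numeric values
rng = np.random.default_rng(0)
n = 200
raw = pd.DataFrame({
    "minutes": rng.normal(100, 20, size=n),
    "band": rng.choice(["Low", "Med", "High"], size=n),
})
target = (raw["minutes"] + rng.normal(0, 10, size=n) > 110).astype(int)
raw.loc[raw.index[::25], "minutes"] = np.nan

# 1) Split the raw data first
X_tr, X_te, y_tr, y_te = train_test_split(
    raw, target, test_size=0.2, stratify=target, random_state=42)

# 2) Preprocessing lives inside the pipeline, so medians and categories
#    are learned from the training part only
preprocess = ColumnTransformer([
    ("num", Pipeline([("imputer", SimpleImputer(strategy="median")),
                      ("scaler", StandardScaler())]), ["minutes"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["band"]),
])
clf = Pipeline([("preprocess", preprocess), ("model", LogisticRegression(max_iter=1000))])
clf.fit(X_tr, y_tr)

holdout_auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"hold-out AUC: {holdout_auc:.3f}")
```

The same pipeline object can be handed to `GridSearchCV`, in which case the preprocessing is refitted inside every fold.
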


In [None]:
X

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test= train_test_split(X, y,
test_size=0.2, random_state=420)

In [None]:
import xgboost as xgb
xg_cl = xgb.XGBClassifier(objective='binary:logistic', n_estimators=10, random_state=420)

In [None]:
xg_cl.fit(X_train, y_train)

In [None]:
preds = xg_cl.predict(X_test)

In [None]:
accuracy = float(np.sum(preds==y_test))/y_test.shape[0]
print("accuracy: %f" % (accuracy))
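
Accuracy can look good on imbalanced churn data even when the model ranks nobody. The toy below, which assumes a 10% churn rate purely for illustration, scores a model that never predicts churn.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Toy hold-out set: 90 stayers, 10 churners (assumed rate, not taken from the data)
y_holdout = np.array([0] * 90 + [1] * 10)

# A "model" that always predicts "no churn" with the same score for everyone
always_stay = np.zeros_like(y_holdout)
constant_score = np.full(y_holdout.shape, 0.1)

acc = accuracy_score(y_holdout, always_stay)
auc = roc_auc_score(y_holdout, constant_score)
print(f"accuracy={acc:.2f}, AUC={auc:.2f}")
```

For the classifier in this notebook, the hold-out AUC could be obtained with `roc_auc_score(y_test, xg_cl.predict_proba(X_test)[:, 1])`, which uses the predicted probabilities rather than the hard labels.
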

In [None]:
churn_dmatrix = xgb.DMatrix(data=X,label=y)

In [None]:
params={"objective":"binary:logistic","max_depth":4}

In [None]:
cv_results = xgb.cv(dtrain=churn_dmatrix, params=params, nfold=4,
num_boost_round=10, metrics="auc", as_pandas=True, seed = 420)
print(cv_results)

In [None]:
print("AUC: %f" %((cv_results["test-auc-mean"]).iloc[-1]))

In [None]:
# Hyperparameter tuning

gbm_param_grid = {'learning_rate': [0.01,0.1,0.5,0.9],
                  'n_estimators': [50],
                  'subsample': [0.3, 0.5, 0.9],
                  'max_depth': [3, 4, 5, 10]}

gbm = xgb.XGBClassifier()
grid_auc = GridSearchCV(estimator=gbm,param_grid=gbm_param_grid,
scoring='roc_auc', cv=4, verbose=1)

In [None]:
grid_auc.fit(X, y)
print("Best parameters found: ",grid_auc.best_params_)
print("Best AUC found: ", grid_auc.best_score_)  # best_score_ is already the mean cross-validated AUC; no sqrt needed

Tuning individual parameters, using xgboost's built-in cv

In [41]:
# Tune boosting rounds: num_rounds
params={"objective":"binary:logistic","max_depth":4}
num_rounds = [5, 10, 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
final_auc_per_round = []
for curr_num_rounds in num_rounds:
    cv_results = xgb.cv(  dtrain=churn_dmatrix
                        , params=params
                        , nfold=5
                        , num_boost_round=curr_num_rounds
                        , metrics="auc"
                        , as_pandas=True
                        , seed=420
                        )
    final_auc_per_round.append(cv_results["test-auc-mean"].iloc[-1])

num_rounds_aucs = list(zip(num_rounds, final_auc_per_round))
print(pd.DataFrame(num_rounds_aucs, columns=["num_boosting_rounds", "auc"]))

    num_boosting_rounds       auc
0                     5  0.887040
1                    10  0.926403
2                    15  0.931121
3                    20  0.931976
4                    21  0.931619
5                    22  0.932006
6                    23  0.932105
7                    24  0.931427
8                    25  0.931152
9                    26  0.931289
10                   27  0.931365
11                   28  0.931454
12                   29  0.930831
13                   30  0.931385


In [42]:
# Automated boosting round selection using early stopping
cv_results = xgb.cv(  dtrain=churn_dmatrix
                    , params=params
                    , nfold=5
                    , num_boost_round=50
                    , early_stopping_rounds=5
                    , metrics="auc"
                    , as_pandas=True
                    , seed=420
                    )

print(cv_results)

    train-auc-mean  train-auc-std  test-auc-mean  test-auc-std
0         0.843986       0.012999       0.834137      0.026721
1         0.871512       0.004073       0.863295      0.013572
2         0.875975       0.004784       0.866624      0.015697
3         0.881229       0.006182       0.869440      0.013840
4         0.899352       0.023111       0.887040      0.007989
5         0.930135       0.004538       0.909397      0.014975
6         0.937508       0.004778       0.918990      0.014786
7         0.942165       0.004286       0.922538      0.016709
8         0.947413       0.004621       0.923927      0.011666
9         0.951834       0.004212       0.926403      0.012128
10        0.954418       0.003076       0.928033      0.011876
11        0.958006       0.004205       0.928746      0.013141
12        0.960966       0.002897       0.929687      0.013613
13        0.963332       0.003045       0.930444      0.014328
14        0.965062       0.003191       0.931121      0
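
Since `xgb.cv` returns one row per boosting round, a single result table is enough to read off the best round. The sketch below uses only the `test-auc-mean` values visible in the printout above; the variable names are illustrative.

```python
import pandas as pd

# test-auc-mean for rounds 1..15, copied from the printed cv_results
test_auc_mean = pd.Series(
    [0.834137, 0.863295, 0.866624, 0.869440, 0.887040,
     0.909397, 0.918990, 0.922538, 0.923927, 0.926403,
     0.928033, 0.928746, 0.929687, 0.930444, 0.931121],
    name="test-auc-mean",
)

# Row i holds the score after i + 1 boosting rounds
best_row = int(test_auc_mean.idxmax())
best_num_rounds = best_row + 1
print(best_num_rounds, test_auc_mean.iloc[best_row])

# Rows 4, 9 and 14 correspond to 5, 10 and 15 rounds
print(test_auc_mean.iloc[[4, 9, 14]].tolist())
```

The values at 5, 10 and 15 rounds match the ones produced by the manual loop over `num_rounds`, which suggests that one longer `xgb.cv` run with the same seed could replace that loop.
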

In [43]:
# Tuning eta (learning rate)
params={"objective":"binary:logistic","max_depth":4}
eta_vals = [0.001, 0.01, 0.1]
best_auc = []

for curr_val in eta_vals:
    params["eta"] = curr_val
    cv_results = xgb.cv(  dtrain=churn_dmatrix
                    , params=params
                    , nfold=5
                    , num_boost_round=50
                    , early_stopping_rounds=5
                    , metrics="auc"
                    , as_pandas=True
                    , seed=420
                    )
    best_auc.append(cv_results["test-auc-mean"].iloc[-1])

print(pd.DataFrame(list(zip(eta_vals, best_auc)), columns=["eta", "best_auc"]))
    

     eta  best_auc
0  0.001  0.834137
1  0.010  0.862392
2  0.100  0.932728


Grid search

In [75]:
gbm_param_grid = {  'n_estimators': [10, 20, 30, 50]
                  #, 'early_stopping_rounds': [5]
                  , 'max_depth': range(2, 9)
                  , 'learning_rate': [0.001, 0.01, 0.1]
                  , 'subsample': [0.8, 1]
                  , 'colsample_bytree': [0.2, 0.5, 0.8, 1]
                  , 'reg_lambda': [0, 1, 5, 10]
                  }

gbm = xgb.XGBClassifier(random_state=420)
grid_auc = GridSearchCV(estimator=gbm, param_grid=gbm_param_grid, scoring='roc_auc', cv=5, n_jobs=-1, verbose=1)

In [77]:
grid_auc.fit(X, y)    # Fits on all of X and y (no separate hold-out); validation happens inside the 5-fold cv
print("Best parameters found: ",grid_auc.best_params_)
print("Best AUC found: ", grid_auc.best_score_)

Fitting 5 folds for each of 2688 candidates, totalling 13440 fits
Best parameters found:  {'colsample_bytree': 1, 'learning_rate': 0.1, 'max_depth': 8, 'n_estimators': 50, 'reg_lambda': 0, 'subsample': 1}
Best AUC found:  0.9430718366280477
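
Before launching an exhaustive search, the number of fits can be checked up front. The sketch below counts the candidates of the grid used above with scikit-learn's `ParameterGrid`; the `range` is written out as a list only for explicitness.

```python
from sklearn.model_selection import ParameterGrid

gbm_param_grid = {
    "n_estimators": [10, 20, 30, 50],
    "max_depth": list(range(2, 9)),
    "learning_rate": [0.001, 0.01, 0.1],
    "subsample": [0.8, 1],
    "colsample_bytree": [0.2, 0.5, 0.8, 1],
    "reg_lambda": [0, 1, 5, 10],
}

n_candidates = len(ParameterGrid(gbm_param_grid))  # 4 * 7 * 3 * 2 * 4 * 4
n_fits = n_candidates * 5                          # cv=5 folds per candidate
print(n_candidates, n_fits)
```

This reproduces the 2688 candidates and 13440 fits reported in the log above.
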


In [78]:
# The previous code ran for 6.5+ minutes. Let's do RandomizedSearchCV, and get 1000 random models from the hyperparameter space
from sklearn.model_selection import RandomizedSearchCV 
gbm_param_grid = {  'n_estimators': np.arange(5, 101, 1)
                  , 'max_depth': range(2, 13)
                  , 'learning_rate': np.arange(0.001, 5, 0.01)
                  , 'subsample': [0.2, 0.4, 0.6, 0.8, 1]
                  , 'colsample_bytree': [0.2, 0.5, 0.8, 1]
                  , 'reg_lambda': [0, 1, 5, 10, 100]
                  }

gbm = xgb.XGBClassifier(random_state=420)
randomized_auc = RandomizedSearchCV(estimator=gbm, param_distributions=gbm_param_grid, n_iter=1000, scoring='roc_auc', cv=5, n_jobs=-1, verbose=1)


In [79]:
randomized_auc.fit(X, y)    # Fits on all of X and y (no separate hold-out); validation happens inside the 5-fold cv
print("Best parameters found: ",randomized_auc.best_params_)
print("Best AUC found: ", randomized_auc.best_score_)

Fitting 5 folds for each of 1000 candidates, totalling 5000 fits
Best parameters found:  {'subsample': 0.8, 'reg_lambda': 0, 'n_estimators': 34, 'max_depth': 12, 'learning_rate': 0.08099999999999999, 'colsample_bytree': 1}
Best AUC found:  0.9436782764340442
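
A randomized search can also draw from continuous distributions instead of dense grids. The sketch below is one possible setup, not the configuration used above: it samples the learning rate log-uniformly between 0.001 and 1 (an assumed range) and fixes `random_state` so the draw is reproducible.

```python
from scipy.stats import loguniform, randint, uniform
from sklearn.model_selection import ParameterSampler

param_distributions = {
    "n_estimators": randint(5, 101),         # integers 5..100
    "max_depth": randint(2, 13),             # integers 2..12
    "learning_rate": loguniform(1e-3, 1.0),  # assumed range, log scale
    "subsample": uniform(0.2, 0.8),          # uniform on [0.2, 1.0]
    "colsample_bytree": [0.2, 0.5, 0.8, 1],
    "reg_lambda": [0, 1, 5, 10, 100],
}

# Preview a few draws; random_state makes the sequence repeatable
samples = list(ParameterSampler(param_distributions, n_iter=5, random_state=420))
for params in samples:
    print(params)
```

The same dictionary can be passed as `param_distributions` to `RandomizedSearchCV`, together with `random_state`, so that a rerun evaluates the same candidates.
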


All in one - pipeline, xgboost with gridsearch

In [84]:
gbm_param_grid = {  'n_estimators': [10, 20]
                  , 'max_depth': [10, 20]
                  , 'learning_rate': [0.01, 0.1]
                  , 'subsample': [0.8, 1]
                  , 'colsample_bytree': [0.8, 1]
                  , 'reg_lambda': [0, 1]
                  }

gbm = xgb.XGBClassifier(random_state=420)
grid_auc = GridSearchCV(estimator=gbm, param_grid=gbm_param_grid, scoring='roc_auc', cv=5, n_jobs=-1, verbose=1)

In [85]:
grid_auc.fit(X, y)    # Fits on all of X and y (no separate hold-out); validation happens inside the 5-fold cv
print("Best parameters found: ",grid_auc.best_params_)
print("Best AUC found: ", grid_auc.best_score_)

Fitting 5 folds for each of 64 candidates, totalling 320 fits
Best parameters found:  {'colsample_bytree': 1, 'learning_rate': 0.1, 'max_depth': 10, 'n_estimators': 10, 'reg_lambda': 0, 'subsample': 0.8}
Best AUC found:  0.9441100727830973


In [86]:
grid_auc_results = pd.DataFrame(data=grid_auc.cv_results_)
grid_auc_results.sort_values(by='rank_test_score', ascending = True)

index,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_colsample_bytree,param_learning_rate,param_max_depth,param_n_estimators,param_reg_lambda,param_subsample,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
48,0.432357,0.055280,0.011198,0.010851,1,0.1,10,10,0,0.8,"{'colsample_bytree': 1, 'learning_rate': 0.1, ...",0.945158,0.941825,0.937982,0.957172,0.938414,0.944110,0.007028,1
52,0.701782,0.037729,0.007999,0.000004,1,0.1,10,20,0,0.8,"{'colsample_bytree': 1, 'learning_rate': 0.1, ...",0.946496,0.933487,0.939336,0.954277,0.940633,0.942846,0.007054,2
32,0.367997,0.019595,0.004799,0.003918,1,0.01,10,10,0,0.8,"{'colsample_bytree': 1, 'learning_rate': 0.01,...",0.948580,0.928324,0.936078,0.960504,0.939965,0.942690,0.011041,3
36,0.636467,0.018652,0.006398,0.003199,1,0.01,10,20,0,0.8,"{'colsample_bytree': 1, 'learning_rate': 0.01,...",0.947733,0.926046,0.935262,0.961269,0.940652,0.942192,0.011880,4
20,0.607997,0.011310,0.006395,0.003197,0.8,0.1,10,20,0,0.8,"{'colsample_bytree': 0.8, 'learning_rate': 0.1...",0.947979,0.930467,0.936351,0.953473,0.941125,0.941879,0.008161,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
37,0.703184,0.016702,0.008000,0.000005,1,0.01,10,20,0,1,"{'colsample_bytree': 1, 'learning_rate': 0.01,...",0.927162,0.918710,0.921301,0.937759,0.920596,0.925106,0.006929,60
33,0.384969,0.011756,0.004801,0.003920,1,0.01,10,10,0,1,"{'colsample_bytree': 1, 'learning_rate': 0.01,...",0.922604,0.917949,0.921625,0.939231,0.913295,0.922941,0.008777,61
43,0.356656,0.030918,0.004800,0.003919,1,0.01,20,10,1,1,"{'colsample_bytree': 1, 'learning_rate': 0.01,...",0.919845,0.911476,0.912763,0.938349,0.921537,0.920794,0.009601,62
45,2.544092,0.342451,0.006406,0.003203,1,0.01,20,20,0,1,"{'colsample_bytree': 1, 'learning_rate': 0.01,...",0.937346,0.897511,0.921601,0.927821,0.912201,0.919296,0.013629,63
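
The differences between the top candidates are smaller than the fold-to-fold spread. One rough heuristic, used here as an assumption and not as a formal test, is to treat every candidate whose mean score lies within one standard deviation of the best as comparable. The sketch uses only the rows shown in the table above.

```python
import pandas as pd

# mean_test_score and std_test_score of the rows displayed above
shown = pd.DataFrame({
    "candidate": [48, 52, 32, 36, 20, 37, 33, 43, 45],
    "mean_test_score": [0.944110, 0.942846, 0.942690, 0.942192, 0.941879,
                        0.925106, 0.922941, 0.920794, 0.919296],
    "std_test_score": [0.007028, 0.007054, 0.011041, 0.011880, 0.008161,
                       0.006929, 0.008777, 0.009601, 0.013629],
})

best = shown.loc[shown["mean_test_score"].idxmax()]
threshold = best["mean_test_score"] - best["std_test_score"]

# Candidates that the heuristic cannot separate from the best one
comparable = shown[shown["mean_test_score"] >= threshold]
print(f"threshold: {threshold:.6f}")
print(comparable["candidate"].tolist())
```

Among comparable candidates, preferring the simpler or faster configuration is a defensible tie-breaker.
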


In [87]:
gbm_param_grid = {  'n_estimators': [10, 20]
                  , 'max_depth': [10, 20]
                  , 'learning_rate': [0.01, 0.1]
                  , 'subsample': [0.8, 1]
                  , 'colsample_bytree': [0.8, 1]
                  , 'reg_lambda': [0, 1]
                  }

gbm = xgb.XGBClassifier(random_state=420)
randomized_auc = RandomizedSearchCV(estimator=gbm, param_distributions=gbm_param_grid, n_iter=10, scoring='roc_auc', cv=5, n_jobs=-1, verbose=1)

In [90]:
randomized_auc.fit(X, y)    # Fits on all of X and y (no separate hold-out); validation happens inside the 5-fold cv
print("Best parameters found: ",randomized_auc.best_params_)
print("Best AUC found: ", randomized_auc.best_score_)

Fitting 5 folds for each of 10 candidates, totalling 50 fits
Best parameters found:  {'subsample': 0.8, 'reg_lambda': 0, 'n_estimators': 20, 'max_depth': 10, 'learning_rate': 0.01, 'colsample_bytree': 1}
Best AUC found:  0.9421923788972808


In [91]:
randomized_auc_results = pd.DataFrame(data=randomized_auc.cv_results_)
randomized_auc_results.sort_values(by='rank_test_score', ascending = True)

index,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_subsample,param_reg_lambda,param_n_estimators,param_max_depth,param_learning_rate,param_colsample_bytree,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
6,0.681047,0.027432,0.006397,0.003198,0.8,0,20,10,0.01,1.0,"{'subsample': 0.8, 'reg_lambda': 0, 'n_estimat...",0.947733,0.926046,0.935262,0.961269,0.940652,0.942192,0.01188,1
7,1.056358,0.080919,0.008,6e-06,1.0,1,20,20,0.1,0.8,"{'subsample': 1, 'reg_lambda': 1, 'n_estimator...",0.951323,0.937877,0.933483,0.947924,0.929745,0.94007,0.008284,2
0,0.199039,0.011889,0.007999,1e-06,0.8,1,10,10,0.01,0.8,"{'subsample': 0.8, 'reg_lambda': 1, 'n_estimat...",0.95288,0.926853,0.940795,0.949848,0.923901,0.938855,0.011738,3
3,0.632894,0.027085,0.007995,7e-06,0.8,1,10,20,0.1,1.0,"{'subsample': 0.8, 'reg_lambda': 1, 'n_estimat...",0.948369,0.934985,0.930568,0.949727,0.925069,0.937744,0.00976,4
4,2.374222,0.097859,0.006397,0.003199,0.8,0,20,20,0.01,1.0,"{'subsample': 0.8, 'reg_lambda': 0, 'n_estimat...",0.945942,0.924403,0.936546,0.947487,0.931808,0.937237,0.008666,5
2,0.440511,0.021383,0.008106,0.000202,1.0,0,10,10,0.01,0.8,"{'subsample': 1, 'reg_lambda': 0, 'n_estimator...",0.949887,0.927228,0.93527,0.952845,0.920616,0.937169,0.012521,6
9,0.796079,0.065245,0.00846,0.000923,1.0,1,20,20,0.01,0.8,"{'subsample': 1, 'reg_lambda': 1, 'n_estimator...",0.952907,0.931731,0.928925,0.943507,0.924991,0.936412,0.010304,7
8,2.319031,0.159208,0.006926,0.003609,1.0,0,20,20,0.1,0.8,"{'subsample': 1, 'reg_lambda': 0, 'n_estimator...",0.949259,0.928976,0.926885,0.945969,0.928311,0.93588,0.009661,8
5,1.441069,0.033816,0.004801,0.00392,0.8,0,10,20,0.01,1.0,"{'subsample': 0.8, 'reg_lambda': 0, 'n_estimat...",0.940483,0.931899,0.93511,0.947487,0.922713,0.935538,0.008308,9
1,1.773034,0.080382,0.007998,0.005059,1.0,0,10,20,0.1,0.8,"{'subsample': 1, 'reg_lambda': 0, 'n_estimator...",0.943726,0.923365,0.92713,0.933631,0.911623,0.927895,0.010672,10
