# 4.0 Modeling 

Contents

4.1 [Introduction](#4.1-Introduction)

  * [4.1.1 Problem Recap](#4.1.1-Problem-Recap)
  * [4.1.2 Notebook Goals](#4.1.2-Notebook-Goals)
 
4.2 [Load the data](#4.2-Load-the-data)

  * [4.2.1 Imports](#4.2.1-Imports)
  * [4.2.2 Load dataframe](#4.2.2-Load-dataframe)

4.3 [Vectorize and split data](#4.3-Vectorize-and-split-data)

4.4 [Sampling Techniques for Imbalance](#4.4-Sampling-Techniques-for-Imbalance)

  * [4.4.1 Sampling overview](#4.4.1-sampling-overview)
  * [4.4.2 Timing Comparisons](#4.4.2-timing-comparison-of-samplers)

4.5 [Bayesian Hyperparameter Optimization with Optuna](#4.5-bayesian-hyperparameter-optimization-with-optuna)

  * [4.5.1 About Optuna](#4.5.1-about-optuna:)
  * [4.5.2 Custom Optimization Function](#4.5.2-custom-optimization-function)
  * [4.5.3 Run 1: Small training size](#4.5.3-run-1)
  * [4.5.4 Run 2 Medium training size](#4.5.4-run-2)
  * [4.5.5 Run 3 Large training size, no sampling](#4.5.5-run-3)

4.6 [Modeling Summary](#4.6-modeling-summary)

  * [4.6.1 Model Comparisons](#461-model-comparisons)
  * [4.6.2 Conclusions](#462-conclusion)

## 4.1 Introduction

### 4.1.1 Problem Recap

Using customer text data about amazon products, we will build, evaluate and compare models to estimate the probability that a given text review can be classified as “positive” or “negative”.

Our goal is to build a text classifier using Amazon product review data which can be used to analyze customer sentiment which does not have accompanying numeric data. The metric we will be primarily interested in will be Recall on the positive class. This is the proportion of the positive class (negative reviews coded as "1" in the data) we correctly predict.

### 4.1.2 Notebook Goals

1. In our previous notebook our best results came from Term-Frequency Inverse-Document Frequency vectorization and a Logistic Regression Model.

2. We had slightly worse results from a Naive Bayes and Random Forest model. The Naive Bayes model incorrectly predicted a higher proportion of the negative class and the Random Forest model appeared to strongly overfit the training data with a very poor Recall on the test set.

3. Try over-sampling the minority class that we are trying to predict (encoded as "1"s) and/or under-sampling the majority class.

4. Test some other models such as gradient boosted trees (LightGBM/XGBoost) 

5. Examine how well our models will generalize with random sub-samples of the data.

6. Tune hyperparameters with bayesian search optimization using [Optuna](https://optuna.org/).

## 4.2 Load the data

### 4.2.1 Imports

In [1]:
from random import seed

#reading/processing data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pyarrow.parquet as pq
import plotly.express as px

#splitting the dataset
from sklearn.model_selection import train_test_split

#scaling/vectorization
from sklearn.feature_extraction.text import TfidfVectorizer

# models
from sklearn.linear_model import LogisticRegression
import xgboost as xgb
from imblearn.pipeline import Pipeline
import lightgbm as lgb

#metrics
from sklearn.metrics import recall_score

#dealing with class imbalance
from imblearn.over_sampling import RandomOverSampler, SMOTE, ADASYN
from imblearn.under_sampling import RandomUnderSampler

#hyperparameter tuning
import optuna


  from pandas import MultiIndex, Int64Index


### 4.2.2 Load dataframe

In [2]:
data = pq.read_table("../data/edited/fashion.parquet")
fashion = data.to_pandas()
fashion

Unnamed: 0,review,neg_sentiment,stars,review_length
0,exactly needed,0,5,4
1,agree review opening small bent hook expensiv...,1,2,49
2,love going order pack work including losing ea...,0,4,50
3,tiny opening,1,2,4
4,okay,1,3,1
...,...,...,...,...
883631,absolutely love dress sexy comfortable split ...,0,5,51
883632,lbs tall wear large ordered large comfortable...,0,5,39
883633,big chest area,1,3,6
883634,clear needs lining,1,3,7


## 4.3 Vectorize and split data

In [3]:
#Vectorizing and splitting the data into train and test sets

tfidf = TfidfVectorizer(ngram_range=(1,2), min_df = 5, max_df=0.95)

X_train, X_test, y_train, y_test = train_test_split(fashion["review"].values, fashion["neg_sentiment"], test_size = .1, random_state=1)

#convert to 1d arrays
y_train, y_test = np.ravel(y_train), np.ravel(y_test)

#ensure our 1/0 values are integers for the pipeline model we will use
y_train, y_test = y_train.astype(int), y_test.astype(int)

#fit on ONLY the training data
tfidf.fit(X_train)

#transform both train and test data
X_train = tfidf.transform(X_train)
X_test = tfidf.transform(X_test)

## 4.4 Sampling Techniques for Imbalance

### 4.4.1 Sampling overview

In [4]:
# 30 % of our reviews are negative (what we're trying to predict)
fashion.neg_sentiment.value_counts(normalize = True)

0    0.695061
1    0.304939
Name: neg_sentiment, dtype: float64



In data sets with an imbalanced split between the classes we are trying to predict, there are few possible approaches to try to improve the target metric our model (classifier) is optimizing for.

1. Over-sampling - if we train with a higher proportion of the class we are trying to predict using resampling, we may be able to improve the result for our classifier.

2. Under-sampling - by the same logic, we can under-sample the majority classes we are NOT trying to predict.

3. Synthesize data: ADASYN and SMOTE both use Nearest Neigbhors algorithms to generate artificial points that are located "close" in the n-dimensional feature space of the target class to the actual data points. Conceptually, we can think of it as if we gathered MORE data, and are assuming it looks similar to the current data we have. It will be unlikely to have strong outliers due to the nature of the algorithm and will be more "clumped" together than if we gathered more "real" data.


### 4.4.2 Timing comparison of samplers

In [5]:
# Initialize samplers

In [6]:
smote = SMOTE() #default 5 neighbors

ada = ADASYN() #default 5 neighbors

ros = RandomOverSampler()

rus = RandomUnderSampler()

Using a loop with increasing counts for the rows used, we can combine with the timeit cell magic to get a sense of how long each of our sampling methods takes.

In [7]:
samplers = {"smote":smote, "ada":ada, "ros":ros, "rus":rus}

timing_dict = {sampler:{k:None for k in range(1,25)} for sampler in samplers.keys()}

for sampler in samplers.keys():
    for i in range(1,25):
    
        n_rows = i*5000
    
        time_var = %timeit -n1 -o samplers[sampler].fit_resample(X_train[0:n_rows,:], y_train[0:n_rows])

        timing_dict[sampler][i] = np.mean(time_var.all_runs)

dfs = []

for sampler_type in samplers.keys():

    dfs.append(pd.DataFrame([(k,v) for k,v in timing_dict[sampler_type].items()], columns=["iter", "time_in_seconds"]))

for sampler_type, df in zip(samplers.keys(), dfs):
    df["sampler_type"] = sampler_type
combined_df = pd.concat(dfs, axis=0)

33.7 ms ± 3.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
143 ms ± 1.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
317 ms ± 1.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
574 ms ± 3.26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
886 ms ± 2.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.29 s ± 2.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.73 s ± 5.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.13 s ± 2.37 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.71 s ± 23.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.51 s ± 4.91 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
4.1 s ± 5.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
4.85 s ± 5.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
5.87 s ± 5.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
6.83 s ± 14.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
7.79 s ± 10.6 ms per loop (mean ± std. dev. of 7 

In [8]:
px.line(data_frame=combined_df, x="iter", y="time_in_seconds", color='sampler_type', title="Time Complexity of Sampling Methods for Class Imbalance")

As shown above, the synthetic data generation algorithms (SMOTE and ADASYN) which rely on K-Nearest Neighbors algorithms to generate the new data, run slower and slower on larger proportions of our data.

The KNN worst case time complexity is O(N * kD) (where N = number of data points/row, K is the number of neighbors used, and D is the number of features/columns). Because we have vectorized the reviews with TFIDF using unigrams and bigrams, the width of the data is roughly 2 million columns.

## 4.5 Bayesian Hyperparameter Optimization with Optuna

### 4.5.1 About Optuna:

A python library that allows you to write custom functions for Bayesian Optimization. When searching over a range of models and hyperparameters, the time complexity of grid search is exponential due to the brute-force nature of the algorithm. This quickly becomes untenable with more than a few different parameters.

### 4.5.2 Custom Optimization Function

#### Funtion optimization params:

##### 1. Sampling:
  * Random over-sampling
  * Random under-sampling
  * Synthetic Minority Over-sampling ('smote')
  * Adaptive Synthetic Over-sampling ('adasyn')
  * None

##### 2. Models:
  * Logistic Regression ('logreg')
    - C (regularization parameter)
  * XGBoost ('xgboost')
    - learning rate
    - max depth
    - number of estimators
  * LightGBM ('lgbm')
    - learning rate
    - max depth
    - number of estimators


In [9]:
def objective(trial, sub_sample_prop, model_choices):
    """
    params:
        trial: functions like a self parameter for the optimization function)
        sub_sample_prop: amount of the entire train set to sample from for current trial
        model_choices: allow dynamic selection between three model types   
    """

    #sampler

    
    sampler_type = trial.suggest_categorical('sampler', [None, 'ros', 'rus', 'smote', 'ada'])

    if sampler_type == 'ros':
        sampler = RandomOverSampler(random_state=0)
    
    elif sampler_type == 'smote':
        k_neighbors = trial.suggest_int('k_neighbors', 2,10)
        sampler = SMOTE(random_state=0, k_neighbors=k_neighbors)
    
    elif sampler_type == 'rus':
        sampler = RandomUnderSampler(random_state=0)
    
    elif sampler_type == 'ada':
        n_neighbors = trial.suggest_int('n_neighbors', 2,10)
        sampler = ADASYN(n_neighbors=n_neighbors)
    
    else:
        sampler = None


    model_type = trial.suggest_categorical('classifier', model_choices)

    if model_type == 'logreg':
        #optimize params
        C = trial.suggest_categorical('C', [1.0, 0.1, 0.01]) #note: models with larger values for C failed to converge
        
        #model
        model = LogisticRegression(solver = "lbfgs", n_jobs=-1, max_iter=1000, C=C)

    elif model_type == 'xgboost':
        #optimize params
        learning_rate = trial.suggest_categorical('learning_rate', [0.2, 0.1, 0.01, .001, .0001])
        max_depth = trial.suggest_int('max_depth', 3, 20)
        n_estimators = trial.suggest_categorical('n_estimators', [200,500,1000, 2000, 4000])

        #model
        model = xgb.XGBClassifier(n_estimators=n_estimators, max_depth=max_depth, learning_rate=learning_rate, n_jobs=-1, verbosity=0, use_label_encoder=False)

    elif model_type == "lgbm":
        #optimize params
        learning_rate = trial.suggest_categorical('learning_rate', [0.2, 0.1, 0.01, .001, .0001])
        max_depth = trial.suggest_int('max_depth', 3, 20)
        n_estimators = trial.suggest_categorical('n_estimators', [200,500,1000, 2000,4000])

        #model
        model = lgb.LGBMClassifier(max_depth = max_depth, n_estimators=n_estimators)
    
    pipeline = Pipeline([('sampler', sampler), ('model',model)])
    
    X_train_sample, _, y_train_sample, _ = train_test_split(X_train, y_train, train_size=sub_sample_prop, random_state=trial.number)

    print("N_rows: ", X_train_sample.shape[0])

    pipeline.fit(X_train_sample, y_train_sample)
    
    #using the original X_test from the top train_test_split above
    y_preds = pipeline.predict(X_test)

    return recall_score(y_preds, y_test)

### 4.5.3 Run 1

We'll test out a loop with a very small proportion of our data to train on, to see if the different sampling methods give a better recall score. Because they run slowly (smote and adasyn especially), it would be nice to skip them to save computation time if they are not proving useful. With a small subset of the data to train on (fewer members of the positive class), it should be an ideal test case to see their effectiveness.

In [10]:
# we have to use a lambda function as a wrapper in order to pass in parameters to the optimize function

func = lambda trial: objective(trial, .002, ["xgboost", "lgbm", "logreg"])

study = optuna.create_study(direction='maximize')

study.optimize(func, n_trials=30)

[32m[I 2022-08-09 00:21:11,343][0m A new study created in memory with name: no-name-aa814fad-e831-4393-b1c9-729565b956e3[0m


N_rows:  1572


[32m[I 2022-08-09 00:21:11,937][0m Trial 0 finished with value: 0.5460531432274789 and parameters: {'sampler': 'rus', 'classifier': 'lgbm', 'learning_rate': 0.001, 'max_depth': 4, 'n_estimators': 500}. Best is trial 0 with value: 0.5460531432274789.[0m


N_rows:  1572


[32m[I 2022-08-09 00:21:12,890][0m Trial 1 finished with value: 0.6867019019327958 and parameters: {'sampler': 'ros', 'classifier': 'lgbm', 'learning_rate': 0.001, 'max_depth': 4, 'n_estimators': 1000}. Best is trial 1 with value: 0.6867019019327958.[0m


N_rows:  1572


[32m[I 2022-08-09 00:21:13,996][0m Trial 2 finished with value: 0.875 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 2 with value: 0.875.[0m


N_rows:  1572



Recall is ill-defined and being set to 0.0 due to no true samples. Use `zero_division` parameter to control this behavior.

[32m[I 2022-08-09 00:21:14,501][0m Trial 3 finished with value: 0.0 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 2 with value: 0.875.[0m


N_rows:  1572


[32m[I 2022-08-09 00:23:17,989][0m Trial 4 finished with value: 0.6556570067487099 and parameters: {'sampler': 'ada', 'n_neighbors': 5, 'classifier': 'xgboost', 'learning_rate': 0.0001, 'max_depth': 15, 'n_estimators': 4000}. Best is trial 2 with value: 0.875.[0m


N_rows:  1572


[32m[I 2022-08-09 00:23:52,327][0m Trial 5 finished with value: 0.5685018879477498 and parameters: {'sampler': 'rus', 'classifier': 'xgboost', 'learning_rate': 0.01, 'max_depth': 6, 'n_estimators': 4000}. Best is trial 2 with value: 0.875.[0m


N_rows:  1572


[32m[I 2022-08-09 00:23:57,112][0m Trial 6 finished with value: 0.5589719206487571 and parameters: {'sampler': 'ros', 'classifier': 'xgboost', 'learning_rate': 0.01, 'max_depth': 12, 'n_estimators': 200}. Best is trial 2 with value: 0.875.[0m


N_rows:  1572


[32m[I 2022-08-09 00:23:57,710][0m Trial 7 finished with value: 0.8883253588516746 and parameters: {'sampler': 'ada', 'n_neighbors': 2, 'classifier': 'logreg', 'C': 0.1}. Best is trial 7 with value: 0.8883253588516746.[0m


N_rows:  1572


[32m[I 2022-08-09 00:23:59,001][0m Trial 8 finished with value: 0.6726569185548983 and parameters: {'sampler': 'ada', 'n_neighbors': 10, 'classifier': 'lgbm', 'learning_rate': 0.001, 'max_depth': 18, 'n_estimators': 500}. Best is trial 7 with value: 0.8883253588516746.[0m


N_rows:  1572


[32m[I 2022-08-09 00:23:59,592][0m Trial 9 finished with value: 0.6955716270618832 and parameters: {'sampler': 'ros', 'classifier': 'logreg', 'C': 0.1}. Best is trial 7 with value: 0.8883253588516746.[0m


N_rows:  1572


[32m[I 2022-08-09 00:24:00,370][0m Trial 10 finished with value: 0.7691902104661892 and parameters: {'sampler': 'smote', 'k_neighbors': 5, 'classifier': 'logreg', 'C': 1.0}. Best is trial 7 with value: 0.8883253588516746.[0m


N_rows:  1572


[32m[I 2022-08-09 00:24:01,004][0m Trial 11 finished with value: 0.875 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 7 with value: 0.8883253588516746.[0m


N_rows:  1572


[32m[I 2022-08-09 00:24:01,649][0m Trial 12 finished with value: 0.9349593495934959 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 12 with value: 0.9349593495934959.[0m


N_rows:  1572


[32m[I 2022-08-09 00:24:02,304][0m Trial 13 finished with value: 0.9133574007220217 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 12 with value: 0.9349593495934959.[0m


N_rows:  1572



Recall is ill-defined and being set to 0.0 due to no true samples. Use `zero_division` parameter to control this behavior.

[32m[I 2022-08-09 00:24:02,926][0m Trial 14 finished with value: 0.0 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 12 with value: 0.9349593495934959.[0m


N_rows:  1572


[32m[I 2022-08-09 00:24:03,865][0m Trial 15 finished with value: 0.8921847487471217 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 1.0}. Best is trial 12 with value: 0.9349593495934959.[0m


N_rows:  1572



Recall is ill-defined and being set to 0.0 due to no true samples. Use `zero_division` parameter to control this behavior.

[32m[I 2022-08-09 00:24:04,329][0m Trial 16 finished with value: 0.0 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 12 with value: 0.9349593495934959.[0m


N_rows:  1572


[32m[I 2022-08-09 00:24:04,962][0m Trial 17 finished with value: 0.9139784946236559 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 12 with value: 0.9349593495934959.[0m


N_rows:  1572


[32m[I 2022-08-09 00:24:33,154][0m Trial 18 finished with value: 0.7236810290633009 and parameters: {'sampler': None, 'classifier': 'xgboost', 'learning_rate': 0.2, 'max_depth': 20, 'n_estimators': 2000}. Best is trial 12 with value: 0.9349593495934959.[0m


N_rows:  1572


[32m[I 2022-08-09 00:24:34,695][0m Trial 19 finished with value: 0.6758751411517987 and parameters: {'sampler': None, 'classifier': 'lgbm', 'learning_rate': 0.1, 'max_depth': 9, 'n_estimators': 1000}. Best is trial 12 with value: 0.9349593495934959.[0m


N_rows:  1572


[32m[I 2022-08-09 00:24:35,403][0m Trial 20 finished with value: 0.9130434782608695 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 12 with value: 0.9349593495934959.[0m


N_rows:  1572


[32m[I 2022-08-09 00:24:36,042][0m Trial 21 finished with value: 0.9166666666666666 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 12 with value: 0.9349593495934959.[0m


N_rows:  1572


[32m[I 2022-08-09 00:24:36,698][0m Trial 22 finished with value: 0.9136690647482014 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 12 with value: 0.9349593495934959.[0m


N_rows:  1572


[32m[I 2022-08-09 00:24:37,339][0m Trial 23 finished with value: 1.0 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 23 with value: 1.0.[0m


N_rows:  1572


[32m[I 2022-08-09 00:24:37,823][0m Trial 24 finished with value: 0.875 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 23 with value: 1.0.[0m


N_rows:  1572



Recall is ill-defined and being set to 0.0 due to no true samples. Use `zero_division` parameter to control this behavior.

[32m[I 2022-08-09 00:24:38,142][0m Trial 25 finished with value: 0.0 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 23 with value: 1.0.[0m


N_rows:  1572


[32m[I 2022-08-09 00:24:38,952][0m Trial 26 finished with value: 0.8993517570794951 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 1.0}. Best is trial 23 with value: 1.0.[0m


N_rows:  1572


[32m[I 2022-08-09 00:24:39,462][0m Trial 27 finished with value: 0.7028662279733251 and parameters: {'sampler': None, 'classifier': 'lgbm', 'learning_rate': 0.1, 'max_depth': 10, 'n_estimators': 200}. Best is trial 23 with value: 1.0.[0m


N_rows:  1572


[32m[I 2022-08-09 00:25:10,434][0m Trial 28 finished with value: 0.7176679895990147 and parameters: {'sampler': None, 'classifier': 'xgboost', 'learning_rate': 0.2, 'max_depth': 16, 'n_estimators': 2000}. Best is trial 23 with value: 1.0.[0m


N_rows:  1572


[32m[I 2022-08-09 00:25:10,944][0m Trial 29 finished with value: 0.875 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 23 with value: 1.0.[0m


#### Run 1 Results:

Training on a small portion of the data (~1500 rows), our best model was consistently the Logistic Regression and none of the sampling methods seemed to improve the recall of our model.

### 4.5.4 Run 2
With a more reasonable training size, we'll run a longer study and compare how well the different models and sampling methods work.

In [11]:
# we have to use a lambda function as a wrapper in order to pass in parameters to the optimize function

func = lambda trial: objective(trial, .1, ["xgboost", "lgbm", "logreg"])

study = optuna.create_study(direction='maximize')

study.optimize(func, n_trials=200)

[32m[I 2022-08-09 00:25:10,986][0m A new study created in memory with name: no-name-d5ca5c8c-f9bf-4f72-8136-586d0d14ddd0[0m


N_rows:  78601


[32m[I 2022-08-09 00:26:22,830][0m Trial 0 finished with value: 0.8257125772373929 and parameters: {'sampler': None, 'classifier': 'lgbm', 'learning_rate': 0.1, 'max_depth': 12, 'n_estimators': 4000}. Best is trial 0 with value: 0.8257125772373929.[0m


N_rows:  78601


[32m[I 2022-08-09 00:26:32,640][0m Trial 1 finished with value: 0.7119376087496893 and parameters: {'sampler': 'smote', 'k_neighbors': 8, 'classifier': 'logreg', 'C': 0.1}. Best is trial 0 with value: 0.8257125772373929.[0m


N_rows:  78601


[32m[I 2022-08-09 00:27:47,783][0m Trial 2 finished with value: 0.7704555659171761 and parameters: {'sampler': 'ada', 'n_neighbors': 4, 'classifier': 'lgbm', 'learning_rate': 0.0001, 'max_depth': 5, 'n_estimators': 2000}. Best is trial 0 with value: 0.8257125772373929.[0m


N_rows:  78601


[32m[I 2022-08-09 00:29:41,737][0m Trial 3 finished with value: 0.7930411039775662 and parameters: {'sampler': 'smote', 'k_neighbors': 5, 'classifier': 'lgbm', 'learning_rate': 0.2, 'max_depth': 12, 'n_estimators': 4000}. Best is trial 0 with value: 0.8257125772373929.[0m


N_rows:  78601


[32m[I 2022-08-09 00:30:26,476][0m Trial 4 finished with value: 0.7317670707959043 and parameters: {'sampler': 'rus', 'classifier': 'xgboost', 'learning_rate': 0.2, 'max_depth': 10, 'n_estimators': 1000}. Best is trial 0 with value: 0.8257125772373929.[0m


N_rows:  78601


[32m[I 2022-08-09 00:30:27,651][0m Trial 5 finished with value: 0.6586107971833435 and parameters: {'sampler': 'rus', 'classifier': 'logreg', 'C': 0.01}. Best is trial 0 with value: 0.8257125772373929.[0m


N_rows:  78601


[32m[I 2022-08-09 00:30:37,159][0m Trial 6 finished with value: 0.7102276251317993 and parameters: {'sampler': 'smote', 'k_neighbors': 10, 'classifier': 'logreg', 'C': 0.1}. Best is trial 0 with value: 0.8257125772373929.[0m


N_rows:  78601


[32m[I 2022-08-09 00:31:11,731][0m Trial 7 finished with value: 0.5404836521820479 and parameters: {'sampler': 'rus', 'classifier': 'xgboost', 'learning_rate': 0.0001, 'max_depth': 18, 'n_estimators': 200}. Best is trial 0 with value: 0.8257125772373929.[0m


N_rows:  78601


[32m[I 2022-08-09 00:31:48,874][0m Trial 8 finished with value: 0.7451646628332462 and parameters: {'sampler': 'ada', 'n_neighbors': 3, 'classifier': 'logreg', 'C': 1.0}. Best is trial 0 with value: 0.8257125772373929.[0m


N_rows:  78601


[32m[I 2022-08-09 00:31:49,971][0m Trial 9 finished with value: 0.9406657018813314 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 9 with value: 0.9406657018813314.[0m


N_rows:  78601


[32m[I 2022-08-09 00:31:51,045][0m Trial 10 finished with value: 0.9415292353823088 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 10 with value: 0.9415292353823088.[0m


N_rows:  78601


[32m[I 2022-08-09 00:31:52,248][0m Trial 11 finished with value: 0.6774732819869843 and parameters: {'sampler': 'ros', 'classifier': 'logreg', 'C': 0.01}. Best is trial 10 with value: 0.9415292353823088.[0m


N_rows:  78601


[32m[I 2022-08-09 00:31:53,313][0m Trial 12 finished with value: 0.9408881199538639 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 10 with value: 0.9415292353823088.[0m


N_rows:  78601


[32m[I 2022-08-09 00:31:54,344][0m Trial 13 finished with value: 0.9443162146566647 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:31:55,426][0m Trial 14 finished with value: 0.9437519002736394 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:33:37,126][0m Trial 15 finished with value: 0.8196337601542063 and parameters: {'sampler': None, 'classifier': 'xgboost', 'learning_rate': 0.01, 'max_depth': 20, 'n_estimators': 500}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:33:40,025][0m Trial 16 finished with value: 0.853753932488734 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 1.0}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:33:41,076][0m Trial 17 finished with value: 0.9417052574230454 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:33:52,748][0m Trial 18 finished with value: 0.8350316006707081 and parameters: {'sampler': None, 'classifier': 'lgbm', 'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 1000}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:37:10,147][0m Trial 19 finished with value: 0.8331457060147958 and parameters: {'sampler': None, 'classifier': 'xgboost', 'learning_rate': 0.1, 'max_depth': 15, 'n_estimators': 2000}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:37:11,723][0m Trial 20 finished with value: 0.9407535231521427 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:37:13,041][0m Trial 21 finished with value: 0.9415375621163402 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:37:14,099][0m Trial 22 finished with value: 0.9420908259676928 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:37:15,161][0m Trial 23 finished with value: 0.9406923950056754 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:37:16,207][0m Trial 24 finished with value: 0.942916296953564 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:37:19,367][0m Trial 25 finished with value: 0.8540926771520378 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 1.0}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:37:21,066][0m Trial 26 finished with value: 0.8672033265806589 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:37:26,084][0m Trial 27 finished with value: 0.8248511704937111 and parameters: {'sampler': None, 'classifier': 'lgbm', 'learning_rate': 0.01, 'max_depth': 7, 'n_estimators': 200}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:37:59,879][0m Trial 28 finished with value: 0.7248169164362966 and parameters: {'sampler': None, 'classifier': 'xgboost', 'learning_rate': 0.001, 'max_depth': 8, 'n_estimators': 500}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:38:00,970][0m Trial 29 finished with value: 0.94344092389695 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:39:18,252][0m Trial 30 finished with value: 0.8225968292193457 and parameters: {'sampler': None, 'classifier': 'lgbm', 'learning_rate': 0.0001, 'max_depth': 16, 'n_estimators': 4000}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:39:19,319][0m Trial 31 finished with value: 0.9402943626770341 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:39:20,390][0m Trial 32 finished with value: 0.9405131096701438 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:39:21,494][0m Trial 33 finished with value: 0.9408497639544571 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:39:22,537][0m Trial 34 finished with value: 0.9415024630541872 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:39:23,572][0m Trial 35 finished with value: 0.9436319317489336 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:39:30,225][0m Trial 36 finished with value: 0.8267213040933039 and parameters: {'sampler': None, 'classifier': 'lgbm', 'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 500}. Best is trial 13 with value: 0.9443162146566647.[0m


N_rows:  78601


[32m[I 2022-08-09 00:39:31,394][0m Trial 37 finished with value: 0.9454374412041392 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:39:32,690][0m Trial 38 finished with value: 0.8672359836630775 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:42:35,270][0m Trial 39 finished with value: 0.8357137060780654 and parameters: {'sampler': None, 'classifier': 'xgboost', 'learning_rate': 0.1, 'max_depth': 14, 'n_estimators': 2000}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:42:36,723][0m Trial 40 finished with value: 0.9452054794520548 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:42:37,983][0m Trial 41 finished with value: 0.9420330439652759 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:42:39,300][0m Trial 42 finished with value: 0.9424985405720957 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:42:40,622][0m Trial 43 finished with value: 0.9434017595307918 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:42:41,911][0m Trial 44 finished with value: 0.9418364334219073 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:42:45,521][0m Trial 45 finished with value: 0.8540531335149864 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 1.0}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:42:47,198][0m Trial 46 finished with value: 0.8652998353428587 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:42:48,476][0m Trial 47 finished with value: 0.9413624604771486 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:42:54,784][0m Trial 48 finished with value: 0.8337718748694817 and parameters: {'sampler': None, 'classifier': 'lgbm', 'learning_rate': 0.2, 'max_depth': 20, 'n_estimators': 200}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:05,082][0m Trial 49 finished with value: 0.820997154983443 and parameters: {'sampler': None, 'classifier': 'xgboost', 'learning_rate': 0.01, 'max_depth': 9, 'n_estimators': 1000}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:06,185][0m Trial 50 finished with value: 0.9403069926094372 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:07,260][0m Trial 51 finished with value: 0.9422795194843246 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:08,325][0m Trial 52 finished with value: 0.9413118527042578 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:09,364][0m Trial 53 finished with value: 0.9424985405720957 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:10,513][0m Trial 54 finished with value: 0.9408181026979983 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:11,560][0m Trial 55 finished with value: 0.9415527769700495 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:15,537][0m Trial 56 finished with value: 0.8552990287953655 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 1.0}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:16,817][0m Trial 57 finished with value: 0.8649472080684982 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:18,104][0m Trial 58 finished with value: 0.9416058394160584 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:40,452][0m Trial 59 finished with value: 0.7032568223594288 and parameters: {'sampler': None, 'classifier': 'xgboost', 'learning_rate': 0.001, 'max_depth': 6, 'n_estimators': 500}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:41,774][0m Trial 60 finished with value: 0.943266646760227 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:43,085][0m Trial 61 finished with value: 0.9448961156278229 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:44,367][0m Trial 62 finished with value: 0.9432054713053821 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:45,729][0m Trial 63 finished with value: 0.9411764705882353 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:47,116][0m Trial 64 finished with value: 0.9422483880011214 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:48,398][0m Trial 65 finished with value: 0.9403953968722337 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:54,745][0m Trial 66 finished with value: 0.8323093755260057 and parameters: {'sampler': None, 'classifier': 'lgbm', 'learning_rate': 0.01, 'max_depth': 17, 'n_estimators': 200}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:56,060][0m Trial 67 finished with value: 0.9399031614924523 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:44:57,358][0m Trial 68 finished with value: 0.9421440726035167 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:45:00,683][0m Trial 69 finished with value: 0.8538066324147594 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 1.0}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:45:01,946][0m Trial 70 finished with value: 0.9419460343417825 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:45:03,240][0m Trial 71 finished with value: 0.9408969408969409 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:45:04,544][0m Trial 72 finished with value: 0.9421860885275519 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:45:05,810][0m Trial 73 finished with value: 0.9413333333333334 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:45:07,076][0m Trial 74 finished with value: 0.9398907103825137 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:45:08,383][0m Trial 75 finished with value: 0.9394796380090498 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:46:39,169][0m Trial 76 finished with value: 0.8296168088874577 and parameters: {'sampler': None, 'classifier': 'xgboost', 'learning_rate': 0.2, 'max_depth': 14, 'n_estimators': 1000}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:47:14,187][0m Trial 77 finished with value: 0.8358872761088424 and parameters: {'sampler': None, 'classifier': 'lgbm', 'learning_rate': 0.1, 'max_depth': 11, 'n_estimators': 2000}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:47:15,590][0m Trial 78 finished with value: 0.9445887445887445 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:47:16,877][0m Trial 79 finished with value: 0.9442896935933147 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:47:18,634][0m Trial 80 finished with value: 0.8692530275425999 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:47:20,031][0m Trial 81 finished with value: 0.9415919587068732 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:47:21,275][0m Trial 82 finished with value: 0.9408647140864714 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:47:22,571][0m Trial 83 finished with value: 0.9416101445001475 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:47:23,898][0m Trial 84 finished with value: 0.9425525614450696 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:47:25,257][0m Trial 85 finished with value: 0.672664851426542 and parameters: {'sampler': 'ros', 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:47:26,599][0m Trial 86 finished with value: 0.9409158050221565 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:47:27,831][0m Trial 87 finished with value: 0.9424460431654677 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 00:47:31,892][0m Trial 88 finished with value: 0.8512874630645842 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 1.0}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:02:40,144][0m Trial 89 finished with value: 0.7698493931861383 and parameters: {'sampler': None, 'classifier': 'xgboost', 'learning_rate': 0.0001, 'max_depth': 18, 'n_estimators': 4000}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:02:41,740][0m Trial 90 finished with value: 0.9409616555082166 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:02:42,789][0m Trial 91 finished with value: 0.9440344403444034 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:02:43,866][0m Trial 92 finished with value: 0.9426789426789427 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:02:45,024][0m Trial 93 finished with value: 0.9405236198064884 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:02:46,106][0m Trial 94 finished with value: 0.9419410745233969 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:02:47,167][0m Trial 95 finished with value: 0.9413823272090989 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:02:54,629][0m Trial 96 finished with value: 0.8265705100356243 and parameters: {'sampler': None, 'classifier': 'lgbm', 'learning_rate': 0.2, 'max_depth': 4, 'n_estimators': 500}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:02:55,726][0m Trial 97 finished with value: 0.9443793911007026 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:02:57,218][0m Trial 98 finished with value: 0.8666809904925605 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:02:58,258][0m Trial 99 finished with value: 0.9427338129496403 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:02:59,290][0m Trial 100 finished with value: 0.942652329749104 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:03:00,359][0m Trial 101 finished with value: 0.941571720712825 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:03:01,426][0m Trial 102 finished with value: 0.9387524696584815 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:03:02,579][0m Trial 103 finished with value: 0.9418934240362812 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:03:03,624][0m Trial 104 finished with value: 0.9397767332549941 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:03:04,700][0m Trial 105 finished with value: 0.9417818288738606 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:03:05,785][0m Trial 106 finished with value: 0.9404379562043795 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:03:07,072][0m Trial 107 finished with value: 0.9412800939518496 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:03:10,399][0m Trial 108 finished with value: 0.8518784297171802 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 1.0}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:09,594][0m Trial 109 finished with value: 0.7555544632324027 and parameters: {'sampler': None, 'classifier': 'xgboost', 'learning_rate': 0.0001, 'max_depth': 13, 'n_estimators': 4000}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:11,243][0m Trial 110 finished with value: 0.9415049970605526 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:12,344][0m Trial 111 finished with value: 0.9448484848484848 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:13,485][0m Trial 112 finished with value: 0.942158273381295 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:14,536][0m Trial 113 finished with value: 0.9442253521126761 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:15,656][0m Trial 114 finished with value: 0.9424803591470258 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:16,701][0m Trial 115 finished with value: 0.9399408284023668 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:17,771][0m Trial 116 finished with value: 0.9414298018949182 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:23,013][0m Trial 117 finished with value: 0.8266619952702111 and parameters: {'sampler': None, 'classifier': 'lgbm', 'learning_rate': 0.01, 'max_depth': 8, 'n_estimators': 200}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:24,138][0m Trial 118 finished with value: 0.9431181844974948 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:25,634][0m Trial 119 finished with value: 0.8683184706573397 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:26,700][0m Trial 120 finished with value: 0.940809968847352 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:27,752][0m Trial 121 finished with value: 0.9399373754625676 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:28,820][0m Trial 122 finished with value: 0.944074969770254 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:29,876][0m Trial 123 finished with value: 0.9378792256573245 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:30,971][0m Trial 124 finished with value: 0.9414985590778098 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:32,064][0m Trial 125 finished with value: 0.9411420204978038 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:33,111][0m Trial 126 finished with value: 0.9433526011560693 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:34,423][0m Trial 127 finished with value: 0.9424144986845951 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:35,717][0m Trial 128 finished with value: 0.9434860202260559 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:37,018][0m Trial 129 finished with value: 0.942684766214178 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:38,319][0m Trial 130 finished with value: 0.9420883299210295 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:39,638][0m Trial 131 finished with value: 0.9424269264836138 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:40,952][0m Trial 132 finished with value: 0.9429347826086957 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:42,265][0m Trial 133 finished with value: 0.9422066549912435 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:43,570][0m Trial 134 finished with value: 0.9411764705882353 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:47,339][0m Trial 135 finished with value: 0.8525074992606363 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 1.0}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:13:48,640][0m Trial 136 finished with value: 0.9419668381932533 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:09,763][0m Trial 137 finished with value: 0.8421791167973587 and parameters: {'sampler': None, 'classifier': 'xgboost', 'learning_rate': 0.1, 'max_depth': 6, 'n_estimators': 2000}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:11,090][0m Trial 138 finished with value: 0.9401660463784712 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:34,538][0m Trial 139 finished with value: 0.83836784409257 and parameters: {'sampler': None, 'classifier': 'lgbm', 'learning_rate': 0.001, 'max_depth': 19, 'n_estimators': 1000}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:35,850][0m Trial 140 finished with value: 0.9448336252189142 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:37,170][0m Trial 141 finished with value: 0.9402985074626866 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:38,473][0m Trial 142 finished with value: 0.9431719338938823 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:39,766][0m Trial 143 finished with value: 0.9389644200173561 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:41,042][0m Trial 144 finished with value: 0.940709219858156 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:42,411][0m Trial 145 finished with value: 0.9422799422799423 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:43,708][0m Trial 146 finished with value: 0.9412591343451377 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:45,353][0m Trial 147 finished with value: 0.8660056206585715 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:46,638][0m Trial 148 finished with value: 0.9450582263362197 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:47,919][0m Trial 149 finished with value: 0.9443099273607748 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:49,226][0m Trial 150 finished with value: 0.9432773109243697 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:50,616][0m Trial 151 finished with value: 0.9432084309133489 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:51,991][0m Trial 152 finished with value: 0.9427012278308322 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:53,379][0m Trial 153 finished with value: 0.9399425287356322 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:54,682][0m Trial 154 finished with value: 0.9425253126861227 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:55,951][0m Trial 155 finished with value: 0.9403747870528109 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:57,238][0m Trial 156 finished with value: 0.9447674418604651 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:58,528][0m Trial 157 finished with value: 0.9421536636013053 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:15:59,794][0m Trial 158 finished with value: 0.9418604651162791 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:16:01,082][0m Trial 159 finished with value: 0.9403159742539496 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:16:04,605][0m Trial 160 finished with value: 0.8560792989024079 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 1.0}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:16:05,898][0m Trial 161 finished with value: 0.9432709716354858 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:16:07,224][0m Trial 162 finished with value: 0.94359410430839 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:16:08,508][0m Trial 163 finished with value: 0.941312518068806 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:16:09,791][0m Trial 164 finished with value: 0.9407254497198466 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:16:24,273][0m Trial 165 finished with value: 0.8320770336804496 and parameters: {'sampler': None, 'classifier': 'xgboost', 'learning_rate': 0.1, 'max_depth': 10, 'n_estimators': 200}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:16:25,608][0m Trial 166 finished with value: 0.940957294353683 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:16:26,913][0m Trial 167 finished with value: 0.9427008350129571 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:16:28,309][0m Trial 168 finished with value: 0.9440559440559441 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:16:29,629][0m Trial 169 finished with value: 0.9410218978102189 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:16:30,931][0m Trial 170 finished with value: 0.9418298743057586 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:16:32,171][0m Trial 171 finished with value: 0.9426111908177905 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:16:33,558][0m Trial 172 finished with value: 0.942174913693901 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:16:34,865][0m Trial 173 finished with value: 0.9421369235176167 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:17:46,323][0m Trial 174 finished with value: 0.8216494845360824 and parameters: {'sampler': None, 'classifier': 'lgbm', 'learning_rate': 0.01, 'max_depth': 12, 'n_estimators': 4000}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:17:47,640][0m Trial 175 finished with value: 0.942226255293406 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:17:48,920][0m Trial 176 finished with value: 0.9427023945267959 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:17:50,210][0m Trial 177 finished with value: 0.9413489736070382 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:17:51,771][0m Trial 178 finished with value: 0.8636603654028937 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.1}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:17:53,030][0m Trial 179 finished with value: 0.9439977024698449 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:17:54,269][0m Trial 180 finished with value: 0.9402781565312245 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:17:55,556][0m Trial 181 finished with value: 0.942765833817548 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:17:56,894][0m Trial 182 finished with value: 0.9414831981460023 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:17:58,192][0m Trial 183 finished with value: 0.9429824561403509 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:17:59,485][0m Trial 184 finished with value: 0.942080378250591 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:18:00,799][0m Trial 185 finished with value: 0.9437324163801187 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:18:02,070][0m Trial 186 finished with value: 0.9425964085958198 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:18:03,491][0m Trial 187 finished with value: 0.9404891304347827 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:18:04,806][0m Trial 188 finished with value: 0.9411592505854801 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:18:06,111][0m Trial 189 finished with value: 0.9392746696654484 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:18:07,375][0m Trial 190 finished with value: 0.9435011709601874 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:18:08,630][0m Trial 191 finished with value: 0.9389889599070308 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:18:09,979][0m Trial 192 finished with value: 0.9430097951914514 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:18:11,265][0m Trial 193 finished with value: 0.940247055443838 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:18:12,570][0m Trial 194 finished with value: 0.9411595038938564 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:18:13,874][0m Trial 195 finished with value: 0.9415121255349501 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:18:17,169][0m Trial 196 finished with value: 0.8521915550107781 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 1.0}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:19:47,932][0m Trial 197 finished with value: 0.764406286379511 and parameters: {'sampler': None, 'classifier': 'xgboost', 'learning_rate': 0.001, 'max_depth': 16, 'n_estimators': 500}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:19:49,367][0m Trial 198 finished with value: 0.9444926279271466 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


N_rows:  78601


[32m[I 2022-08-09 01:19:50,635][0m Trial 199 finished with value: 0.9417249417249417 and parameters: {'sampler': None, 'classifier': 'logreg', 'C': 0.01}. Best is trial 37 with value: 0.9454374412041392.[0m


### 4.5.5 Run 3

For the final run, we'll use a larger training size, but will skip the sampling methods completely.

In [12]:
def objective(trial, sub_sample_prop, model_choices):
    """
    params:
        trial: functions like a self parameter for the optimization function)
        sub_sample_prop: amount of the entire train set to sample from for current trial
        model_choices: allow dynamic selection between three model types   
    """

    #sampler

    sampler = None


    model_type = trial.suggest_categorical('classifier', model_choices)

    if model_type == 'logreg':
        #optimize params
        C = trial.suggest_categorical('C', [1.0, 0.1, 0.01]) #note: models with larger values for C failed to converge
        
        #model
        model = LogisticRegression(solver = "lbfgs", n_jobs=-1, max_iter=1000, C=C)

    elif model_type == 'xgboost':
        #optimize params
        learning_rate = trial.suggest_categorical('learning_rate', [0.2, 0.1, 0.01, .001, .0001])
        max_depth = trial.suggest_int('max_depth', 3, 20)
        n_estimators = trial.suggest_categorical('n_estimators', [200,500,1000, 2000, 4000])

        #model
        model = xgb.XGBClassifier(n_estimators=n_estimators, max_depth=max_depth, learning_rate=learning_rate, n_jobs=-1, verbosity=0, use_label_encoder=False)

    elif model_type == "lgbm":
        #optimize params
        learning_rate = trial.suggest_categorical('learning_rate', [0.2, 0.1, 0.01, .001, .0001])
        max_depth = trial.suggest_int('max_depth', 3, 20)
        n_estimators = trial.suggest_categorical('n_estimators', [200,500,1000, 2000,4000])

        #model
        model = lgb.LGBMClassifier(max_depth = max_depth, n_estimators=n_estimators)
    
    pipeline = Pipeline([('sampler', sampler), ('model',model)])
    
    X_train_sample, _, y_train_sample, _ = train_test_split(X_train, y_train, train_size=sub_sample_prop, random_state=trial.number)

    print("N_rows: ", X_train_sample.shape[0])

    pipeline.fit(X_train_sample, y_train_sample)
    
    #using the original X_test from the top train_test_split above
    y_preds = pipeline.predict(X_test)

    return recall_score(y_preds, y_test)

In [13]:
# We've seen above that the under/over sampling is not effective, even on small training sets. Here we will skip it, but use a larger proportion of our training data to tune hyperparameters

func = lambda trial: objective(trial, .2, ["xgboost", "lgbm", "logreg"])

study = optuna.create_study(direction='maximize')

study.optimize(func, n_trials=200)

[32m[I 2022-08-09 01:19:50,708][0m A new study created in memory with name: no-name-ed45df43-167f-474e-aa48-9ede165c2302[0m


N_rows:  157203


[32m[I 2022-08-09 01:19:54,369][0m Trial 0 finished with value: 0.8552049298978452 and parameters: {'classifier': 'logreg', 'C': 1.0}. Best is trial 0 with value: 0.8552049298978452.[0m


N_rows:  157203


[32m[I 2022-08-09 01:22:41,388][0m Trial 1 finished with value: 0.7556986866602928 and parameters: {'classifier': 'xgboost', 'learning_rate': 0.001, 'max_depth': 10, 'n_estimators': 1000}. Best is trial 0 with value: 0.8552049298978452.[0m


N_rows:  157203


[32m[I 2022-08-09 01:23:10,447][0m Trial 2 finished with value: 0.7724155320221886 and parameters: {'classifier': 'xgboost', 'learning_rate': 0.01, 'max_depth': 9, 'n_estimators': 200}. Best is trial 0 with value: 0.8552049298978452.[0m


N_rows:  157203


[32m[I 2022-08-09 01:24:17,148][0m Trial 3 finished with value: 0.8443885825494648 and parameters: {'classifier': 'lgbm', 'learning_rate': 0.01, 'max_depth': 7, 'n_estimators': 2000}. Best is trial 0 with value: 0.8552049298978452.[0m


N_rows:  157203


[32m[I 2022-08-09 01:27:14,984][0m Trial 4 finished with value: 0.8354888773926539 and parameters: {'classifier': 'lgbm', 'learning_rate': 0.01, 'max_depth': 20, 'n_estimators': 4000}. Best is trial 0 with value: 0.8552049298978452.[0m


N_rows:  157203


[32m[I 2022-08-09 01:27:16,970][0m Trial 5 finished with value: 0.9134615384615384 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 5 with value: 0.9134615384615384.[0m


N_rows:  157203


[32m[I 2022-08-09 01:28:07,297][0m Trial 6 finished with value: 0.8443835395590665 and parameters: {'classifier': 'lgbm', 'learning_rate': 0.0001, 'max_depth': 18, 'n_estimators': 1000}. Best is trial 5 with value: 0.9134615384615384.[0m


N_rows:  157203


[32m[I 2022-08-09 01:31:26,477][0m Trial 7 finished with value: 0.8409457714654616 and parameters: {'classifier': 'xgboost', 'learning_rate': 0.2, 'max_depth': 16, 'n_estimators': 1000}. Best is trial 5 with value: 0.9134615384615384.[0m


N_rows:  157203


[32m[I 2022-08-09 01:34:30,162][0m Trial 8 finished with value: 0.8311363636363637 and parameters: {'classifier': 'xgboost', 'learning_rate': 0.01, 'max_depth': 12, 'n_estimators': 1000}. Best is trial 5 with value: 0.9134615384615384.[0m


N_rows:  157203


[32m[I 2022-08-09 01:34:35,356][0m Trial 9 finished with value: 0.85627253512371 and parameters: {'classifier': 'logreg', 'C': 1.0}. Best is trial 5 with value: 0.9134615384615384.[0m


N_rows:  157203


[32m[I 2022-08-09 01:34:36,823][0m Trial 10 finished with value: 0.9143839541547278 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 10 with value: 0.9143839541547278.[0m


N_rows:  157203


[32m[I 2022-08-09 01:34:38,328][0m Trial 11 finished with value: 0.9137241458119072 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 10 with value: 0.9143839541547278.[0m


N_rows:  157203


[32m[I 2022-08-09 01:34:39,814][0m Trial 12 finished with value: 0.9135420249971333 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 10 with value: 0.9143839541547278.[0m


N_rows:  157203


[32m[I 2022-08-09 01:34:41,290][0m Trial 13 finished with value: 0.9143693034769889 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 10 with value: 0.9143839541547278.[0m


N_rows:  157203


[32m[I 2022-08-09 01:34:43,177][0m Trial 14 finished with value: 0.8611310292078971 and parameters: {'classifier': 'logreg', 'C': 0.1}. Best is trial 10 with value: 0.9143839541547278.[0m


N_rows:  157203


[32m[I 2022-08-09 01:34:44,629][0m Trial 15 finished with value: 0.9129550079131811 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 10 with value: 0.9143839541547278.[0m


N_rows:  157203


[32m[I 2022-08-09 01:34:46,085][0m Trial 16 finished with value: 0.9135159610595427 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 10 with value: 0.9143839541547278.[0m


N_rows:  157203


[32m[I 2022-08-09 01:34:48,721][0m Trial 17 finished with value: 0.8603357268533929 and parameters: {'classifier': 'logreg', 'C': 0.1}. Best is trial 10 with value: 0.9143839541547278.[0m


N_rows:  157203


[32m[I 2022-08-09 01:34:50,221][0m Trial 18 finished with value: 0.9148573377287712 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 18 with value: 0.9148573377287712.[0m


N_rows:  157203


[32m[I 2022-08-09 01:35:09,258][0m Trial 19 finished with value: 0.8328956582633054 and parameters: {'classifier': 'lgbm', 'learning_rate': 0.1, 'max_depth': 4, 'n_estimators': 500}. Best is trial 18 with value: 0.9148573377287712.[0m


N_rows:  157203


[32m[I 2022-08-09 01:35:10,771][0m Trial 20 finished with value: 0.9124575311438279 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 18 with value: 0.9148573377287712.[0m


N_rows:  157203


[32m[I 2022-08-09 01:35:12,283][0m Trial 21 finished with value: 0.9122550123901779 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 18 with value: 0.9148573377287712.[0m


N_rows:  157203


[32m[I 2022-08-09 01:35:13,763][0m Trial 22 finished with value: 0.914189892448248 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 18 with value: 0.9148573377287712.[0m


N_rows:  157203


[32m[I 2022-08-09 01:35:15,286][0m Trial 23 finished with value: 0.9120617269942131 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 18 with value: 0.9148573377287712.[0m


N_rows:  157203


[32m[I 2022-08-09 01:35:16,805][0m Trial 24 finished with value: 0.9141471119133574 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 18 with value: 0.9148573377287712.[0m


N_rows:  157203


[32m[I 2022-08-09 01:35:19,263][0m Trial 25 finished with value: 0.8589453860640301 and parameters: {'classifier': 'logreg', 'C': 0.1}. Best is trial 18 with value: 0.9148573377287712.[0m


N_rows:  157203


[32m[I 2022-08-09 01:35:24,936][0m Trial 26 finished with value: 0.8544154717137071 and parameters: {'classifier': 'logreg', 'C': 1.0}. Best is trial 18 with value: 0.9148573377287712.[0m


N_rows:  157203


[32m[I 2022-08-09 01:36:53,409][0m Trial 27 finished with value: 0.8402908259129156 and parameters: {'classifier': 'xgboost', 'learning_rate': 0.2, 'max_depth': 14, 'n_estimators': 500}. Best is trial 18 with value: 0.9148573377287712.[0m


N_rows:  157203


[32m[I 2022-08-09 01:37:03,742][0m Trial 28 finished with value: 0.8071917157514725 and parameters: {'classifier': 'lgbm', 'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 200}. Best is trial 18 with value: 0.9148573377287712.[0m


N_rows:  157203


[32m[I 2022-08-09 01:37:05,477][0m Trial 29 finished with value: 0.9136059136059136 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 18 with value: 0.9148573377287712.[0m


N_rows:  157203


[32m[I 2022-08-09 01:37:07,179][0m Trial 30 finished with value: 0.9152992835682922 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 30 with value: 0.9152992835682922.[0m


N_rows:  157203


[32m[I 2022-08-09 01:37:08,892][0m Trial 31 finished with value: 0.9060724233983287 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 30 with value: 0.9152992835682922.[0m


N_rows:  157203


[32m[I 2022-08-09 01:37:10,571][0m Trial 32 finished with value: 0.9123380281690141 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 30 with value: 0.9152992835682922.[0m


N_rows:  157203


[32m[I 2022-08-09 01:37:12,265][0m Trial 33 finished with value: 0.9046264902110904 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 30 with value: 0.9152992835682922.[0m


N_rows:  157203


[32m[I 2022-08-09 01:40:03,146][0m Trial 34 finished with value: 0.7497616538712429 and parameters: {'classifier': 'xgboost', 'learning_rate': 0.001, 'max_depth': 6, 'n_estimators': 2000}. Best is trial 30 with value: 0.9152992835682922.[0m


N_rows:  157203


[32m[I 2022-08-09 01:40:04,929][0m Trial 35 finished with value: 0.9164345403899722 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 01:40:06,629][0m Trial 36 finished with value: 0.9139489644124041 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 01:42:41,598][0m Trial 37 finished with value: 0.8387212931103465 and parameters: {'classifier': 'lgbm', 'learning_rate': 0.0001, 'max_depth': 13, 'n_estimators': 4000}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 01:42:47,454][0m Trial 38 finished with value: 0.8549552152557064 and parameters: {'classifier': 'logreg', 'C': 1.0}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 01:43:52,655][0m Trial 39 finished with value: 0.7665506933895232 and parameters: {'classifier': 'xgboost', 'learning_rate': 0.001, 'max_depth': 16, 'n_estimators': 200}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 01:44:16,932][0m Trial 40 finished with value: 0.840518693162681 and parameters: {'classifier': 'lgbm', 'learning_rate': 0.1, 'max_depth': 8, 'n_estimators': 500}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 01:44:18,460][0m Trial 41 finished with value: 0.9060877251132472 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 01:44:19,920][0m Trial 42 finished with value: 0.9136332805787927 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 01:44:21,417][0m Trial 43 finished with value: 0.9050273102218259 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 01:44:23,098][0m Trial 44 finished with value: 0.9122905656134038 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 01:44:24,768][0m Trial 45 finished with value: 0.9142661179698217 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 01:44:27,200][0m Trial 46 finished with value: 0.8598689831956707 and parameters: {'classifier': 'logreg', 'C': 0.1}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:12:25,132][0m Trial 47 finished with value: 0.7799398985171684 and parameters: {'classifier': 'xgboost', 'learning_rate': 0.0001, 'max_depth': 19, 'n_estimators': 4000}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:12:31,089][0m Trial 48 finished with value: 0.8541683845963552 and parameters: {'classifier': 'logreg', 'C': 1.0}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:12:32,593][0m Trial 49 finished with value: 0.9152465348784368 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:12:34,051][0m Trial 50 finished with value: 0.9142824575401801 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:12:35,580][0m Trial 51 finished with value: 0.9119159400849541 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:12:37,063][0m Trial 52 finished with value: 0.9132369299221357 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:12:38,525][0m Trial 53 finished with value: 0.9133529145955717 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:12:40,034][0m Trial 54 finished with value: 0.912272829257348 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:12:41,510][0m Trial 55 finished with value: 0.913235794487921 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:14:00,487][0m Trial 56 finished with value: 0.8443162099995964 and parameters: {'classifier': 'lgbm', 'learning_rate': 0.2, 'max_depth': 11, 'n_estimators': 2000}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:14:04,023][0m Trial 57 finished with value: 0.8522941804153329 and parameters: {'classifier': 'logreg', 'C': 1.0}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:14:05,469][0m Trial 58 finished with value: 0.9136952078128548 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:14:07,693][0m Trial 59 finished with value: 0.8602369668246446 and parameters: {'classifier': 'logreg', 'C': 0.1}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:15:11,277][0m Trial 60 finished with value: 0.7713191785589976 and parameters: {'classifier': 'xgboost', 'learning_rate': 0.001, 'max_depth': 16, 'n_estimators': 200}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:15:12,834][0m Trial 61 finished with value: 0.912125340599455 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:15:14,337][0m Trial 62 finished with value: 0.9149600833044081 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:15:15,805][0m Trial 63 finished with value: 0.9150326797385621 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:15:17,354][0m Trial 64 finished with value: 0.9149710128452881 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:15:18,797][0m Trial 65 finished with value: 0.9149207084153258 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:15:20,490][0m Trial 66 finished with value: 0.9124054633705836 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:15:22,168][0m Trial 67 finished with value: 0.9103826359364511 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:15:23,918][0m Trial 68 finished with value: 0.9125197027696464 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:15:25,597][0m Trial 69 finished with value: 0.9132399457749661 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:16:18,045][0m Trial 70 finished with value: 0.8431801303737932 and parameters: {'classifier': 'lgbm', 'learning_rate': 0.2, 'max_depth': 4, 'n_estimators': 2000}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:16:19,814][0m Trial 71 finished with value: 0.9141621129326047 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:16:21,498][0m Trial 72 finished with value: 0.9148177951488677 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:16:23,216][0m Trial 73 finished with value: 0.9145041705282669 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:16:24,929][0m Trial 74 finished with value: 0.9135830324909747 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:16:26,626][0m Trial 75 finished with value: 0.9116336082014709 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:16:29,467][0m Trial 76 finished with value: 0.8591763142911368 and parameters: {'classifier': 'logreg', 'C': 0.1}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:16:31,162][0m Trial 77 finished with value: 0.911850685855633 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:16:32,909][0m Trial 78 finished with value: 0.9140018273184103 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:34:49,376][0m Trial 79 finished with value: 0.7673950256771724 and parameters: {'classifier': 'xgboost', 'learning_rate': 0.0001, 'max_depth': 14, 'n_estimators': 4000}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:34:51,591][0m Trial 80 finished with value: 0.9136772577155221 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:34:53,065][0m Trial 81 finished with value: 0.9125454345192202 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:34:54,547][0m Trial 82 finished with value: 0.91260390923388 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:34:56,029][0m Trial 83 finished with value: 0.9143476277372263 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:34:57,506][0m Trial 84 finished with value: 0.9134865813611143 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:35:02,703][0m Trial 85 finished with value: 0.8543376923394249 and parameters: {'classifier': 'logreg', 'C': 1.0}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:35:04,166][0m Trial 86 finished with value: 0.9145406704038439 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:35:26,871][0m Trial 87 finished with value: 0.8391877998629198 and parameters: {'classifier': 'lgbm', 'learning_rate': 0.1, 'max_depth': 6, 'n_estimators': 500}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:35:28,387][0m Trial 88 finished with value: 0.9125374127036914 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:35:29,851][0m Trial 89 finished with value: 0.9133912248628885 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:35:31,324][0m Trial 90 finished with value: 0.914671006742618 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:35:32,803][0m Trial 91 finished with value: 0.9153525046382189 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:35:34,272][0m Trial 92 finished with value: 0.9133627019089574 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:35:35,754][0m Trial 93 finished with value: 0.9108788491795909 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:35:37,237][0m Trial 94 finished with value: 0.910958904109589 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:35:38,690][0m Trial 95 finished with value: 0.9118108471537427 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:35:41,225][0m Trial 96 finished with value: 0.8615882380929724 and parameters: {'classifier': 'logreg', 'C': 0.1}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:37:34,537][0m Trial 97 finished with value: 0.841938536224133 and parameters: {'classifier': 'xgboost', 'learning_rate': 0.1, 'max_depth': 17, 'n_estimators': 500}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:37:36,286][0m Trial 98 finished with value: 0.9149060347846387 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:37:41,449][0m Trial 99 finished with value: 0.8532423208191127 and parameters: {'classifier': 'logreg', 'C': 1.0}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:37:43,175][0m Trial 100 finished with value: 0.914967162115451 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:37:44,875][0m Trial 101 finished with value: 0.9120333822036766 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:37:46,552][0m Trial 102 finished with value: 0.9114984914515588 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:37:48,286][0m Trial 103 finished with value: 0.9065380028190394 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:37:49,961][0m Trial 104 finished with value: 0.9135717560751949 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:37:51,644][0m Trial 105 finished with value: 0.9125284738041002 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:38:43,209][0m Trial 106 finished with value: 0.8445268119737319 and parameters: {'classifier': 'lgbm', 'learning_rate': 0.01, 'max_depth': 20, 'n_estimators': 1000}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:38:44,972][0m Trial 107 finished with value: 0.9139489644124041 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:38:46,672][0m Trial 108 finished with value: 0.9137852593266605 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:38:48,351][0m Trial 109 finished with value: 0.9133484162895927 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:38:50,067][0m Trial 110 finished with value: 0.9137637580846477 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:38:51,759][0m Trial 111 finished with value: 0.9134582231927365 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:38:53,505][0m Trial 112 finished with value: 0.912819337839386 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:38:55,221][0m Trial 113 finished with value: 0.9144475920679886 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:38:56,971][0m Trial 114 finished with value: 0.9123201438848921 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:38:58,685][0m Trial 115 finished with value: 0.9119765871229176 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:39:00,380][0m Trial 116 finished with value: 0.9123595505617977 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:39:02,922][0m Trial 117 finished with value: 0.8603966973521875 and parameters: {'classifier': 'logreg', 'C': 0.1}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:47:49,003][0m Trial 118 finished with value: 0.8348554637865311 and parameters: {'classifier': 'xgboost', 'learning_rate': 0.2, 'max_depth': 11, 'n_estimators': 4000}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:47:51,254][0m Trial 119 finished with value: 0.9152327058551796 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:47:52,778][0m Trial 120 finished with value: 0.9139857771757535 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:47:54,259][0m Trial 121 finished with value: 0.9134213793881928 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:47:55,735][0m Trial 122 finished with value: 0.9144299151181464 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:47:57,211][0m Trial 123 finished with value: 0.9123987681076765 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:47:58,718][0m Trial 124 finished with value: 0.913556654676259 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:48:04,302][0m Trial 125 finished with value: 0.8557892112043887 and parameters: {'classifier': 'logreg', 'C': 1.0}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:48:05,780][0m Trial 126 finished with value: 0.913449634214969 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:49:17,461][0m Trial 127 finished with value: 0.8428988326848249 and parameters: {'classifier': 'lgbm', 'learning_rate': 0.0001, 'max_depth': 9, 'n_estimators': 2000}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:49:18,948][0m Trial 128 finished with value: 0.9137834509013548 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:49:20,374][0m Trial 129 finished with value: 0.9142207053469852 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:49:21,902][0m Trial 130 finished with value: 0.9125552909152773 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:49:23,439][0m Trial 131 finished with value: 0.9135153661601737 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:49:24,918][0m Trial 132 finished with value: 0.9135985409780006 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:49:26,416][0m Trial 133 finished with value: 0.9128616636528029 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:49:27,897][0m Trial 134 finished with value: 0.9123104774835936 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:49:29,428][0m Trial 135 finished with value: 0.9145142857142857 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:49:31,132][0m Trial 136 finished with value: 0.9153184165232358 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:49:32,824][0m Trial 137 finished with value: 0.9116178806444293 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:49:34,497][0m Trial 138 finished with value: 0.9117249154453213 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:50:25,129][0m Trial 139 finished with value: 0.7565402065836471 and parameters: {'classifier': 'xgboost', 'learning_rate': 0.001, 'max_depth': 14, 'n_estimators': 200}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:50:26,895][0m Trial 140 finished with value: 0.9137542277339347 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:50:28,590][0m Trial 141 finished with value: 0.9132259528130672 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:50:30,284][0m Trial 142 finished with value: 0.9140881786240508 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:50:31,986][0m Trial 143 finished with value: 0.9128471438247912 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:50:33,669][0m Trial 144 finished with value: 0.9126475548060708 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:50:36,371][0m Trial 145 finished with value: 0.859382377939371 and parameters: {'classifier': 'logreg', 'C': 0.1}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:50:38,070][0m Trial 146 finished with value: 0.9123605736398817 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:50:39,775][0m Trial 147 finished with value: 0.9121545720062907 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:50:44,896][0m Trial 148 finished with value: 0.8550604604936226 and parameters: {'classifier': 'logreg', 'C': 1.0}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:50:46,639][0m Trial 149 finished with value: 0.9132425317614742 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:50:59,101][0m Trial 150 finished with value: 0.8203236008617133 and parameters: {'classifier': 'lgbm', 'learning_rate': 0.2, 'max_depth': 5, 'n_estimators': 200}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:51:00,849][0m Trial 151 finished with value: 0.9139698260969711 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:51:02,551][0m Trial 152 finished with value: 0.9137290235386868 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:51:04,238][0m Trial 153 finished with value: 0.9133130424980272 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:51:05,940][0m Trial 154 finished with value: 0.9120480562167064 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:51:07,635][0m Trial 155 finished with value: 0.9109298167188199 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:51:09,313][0m Trial 156 finished with value: 0.9065420560747663 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:51:10,997][0m Trial 157 finished with value: 0.9135436837908646 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:51:12,707][0m Trial 158 finished with value: 0.9135511395531605 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:51:14,393][0m Trial 159 finished with value: 0.9120583286595625 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:51:16,061][0m Trial 160 finished with value: 0.9132176234979973 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:51:17,750][0m Trial 161 finished with value: 0.9159430357763112 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:51:19,420][0m Trial 162 finished with value: 0.911984809561041 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:51:21,128][0m Trial 163 finished with value: 0.912959818902094 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:51:22,818][0m Trial 164 finished with value: 0.9132676443629697 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:51:24,503][0m Trial 165 finished with value: 0.9147286821705426 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:51:26,168][0m Trial 166 finished with value: 0.9132269099201824 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:51:27,872][0m Trial 167 finished with value: 0.9144195612431444 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:10,756][0m Trial 168 finished with value: 0.7537324391795976 and parameters: {'classifier': 'xgboost', 'learning_rate': 0.001, 'max_depth': 10, 'n_estimators': 1000}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:12,459][0m Trial 169 finished with value: 0.914180816888481 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:14,225][0m Trial 170 finished with value: 0.8614120106423413 and parameters: {'classifier': 'logreg', 'C': 0.1}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:15,894][0m Trial 171 finished with value: 0.9121367521367522 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:17,585][0m Trial 172 finished with value: 0.9130580357142857 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:19,245][0m Trial 173 finished with value: 0.9133303511518676 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:20,909][0m Trial 174 finished with value: 0.9132497458488648 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:22,615][0m Trial 175 finished with value: 0.9124659400544959 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:24,278][0m Trial 176 finished with value: 0.9124831913939937 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:25,981][0m Trial 177 finished with value: 0.9137989778534923 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:27,755][0m Trial 178 finished with value: 0.9123023438376135 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:32,287][0m Trial 179 finished with value: 0.8558521220159151 and parameters: {'classifier': 'logreg', 'C': 1.0}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:49,411][0m Trial 180 finished with value: 0.8257150494520181 and parameters: {'classifier': 'lgbm', 'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 500}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:51,111][0m Trial 181 finished with value: 0.9124698864288172 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:52,769][0m Trial 182 finished with value: 0.9148022598870057 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:54,467][0m Trial 183 finished with value: 0.9132311186825667 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:56,144][0m Trial 184 finished with value: 0.9134615384615384 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:57,855][0m Trial 185 finished with value: 0.9154684601113172 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:54:59,526][0m Trial 186 finished with value: 0.9129358830146231 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:55:01,200][0m Trial 187 finished with value: 0.9104624467038374 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:55:02,896][0m Trial 188 finished with value: 0.9067223099481164 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:55:04,603][0m Trial 189 finished with value: 0.9132404575829652 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:55:06,279][0m Trial 190 finished with value: 0.9128280211735555 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:55:07,933][0m Trial 191 finished with value: 0.9125143513203214 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:55:09,647][0m Trial 192 finished with value: 0.9132466940264478 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:55:11,355][0m Trial 193 finished with value: 0.9111994698475812 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:55:13,104][0m Trial 194 finished with value: 0.9120401337792642 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:55:14,785][0m Trial 195 finished with value: 0.914074074074074 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:55:16,474][0m Trial 196 finished with value: 0.9122451278585108 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:55:18,225][0m Trial 197 finished with value: 0.914272527979693 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:55:20,051][0m Trial 198 finished with value: 0.9124719101123595 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


N_rows:  157203


[32m[I 2022-08-09 02:55:21,722][0m Trial 199 finished with value: 0.9126366813211588 and parameters: {'classifier': 'logreg', 'C': 0.01}. Best is trial 35 with value: 0.9164345403899722.[0m


In [14]:
study_df = study.trials_dataframe()

## 4.6 Modeling Summary

1. Train size of 18% of the total data

2. Validate on 10% of the total dataset

3. Using Bayesian optimization with optuna

### 4.6.1 Model Comparisons

### XGBoost results:

1. The best recall for any xgboost models was around 0.85

2. The XGBoost was the slowest to train and predict of the 3 models, with an average time of 5 minutes and 28 seconds.

3. The top XGBoost models all had similar max depths of around 10-20 and varying estimators between 500 and 4000.

In [15]:
best_xg_model = study_df[study_df["params_classifier"] == "xgboost"].sort_values("value", ascending=False).head(1)
print(best_xg_model)
best_xg_model.to_csv("best_models.csv", index=False, mode="w")

    number     value             datetime_start          datetime_complete  \
97      97  0.841939 2022-08-09 02:35:41.225839 2022-08-09 02:37:34.537309   

                 duration  params_C params_classifier  params_learning_rate  \
97 0 days 00:01:53.311470       NaN           xgboost                   0.1   

    params_max_depth  params_n_estimators     state  
97              17.0                500.0  COMPLETE  


In [16]:
study_df[study_df["params_classifier"] == "xgboost"].describe()

Unnamed: 0,number,value,duration,params_C,params_learning_rate,params_max_depth,params_n_estimators
count,14.0,14.0,14,0.0,14.0,14.0,14.0
mean,59.0,0.79018,0 days 00:05:28.169958,,0.051871,13.142857,1414.285714
std,53.864502,0.037802,0 days 00:07:58.468925269,,0.08437,3.613163,1484.720716
min,1.0,0.749762,0 days 00:00:29.058107,,0.0001,6.0,200.0
25%,12.75,0.759043,0 days 00:01:11.018197,,0.001,10.25,275.0
50%,43.0,0.771867,0 days 00:02:44.950635500,,0.001,14.0,1000.0
75%,92.5,0.833926,0 days 00:03:15.305097,,0.0775,16.0,1750.0
max,168.0,0.841939,0 days 00:27:57.931114,,0.2,19.0,4000.0


In [17]:
study_df[study_df["params_classifier"] == "xgboost"].sort_values("value", ascending=False).head(5)

Unnamed: 0,number,value,datetime_start,datetime_complete,duration,params_C,params_classifier,params_learning_rate,params_max_depth,params_n_estimators,state
97,97,0.841939,2022-08-09 02:35:41.225839,2022-08-09 02:37:34.537309,0 days 00:01:53.311470,,xgboost,0.1,17.0,500.0,COMPLETE
7,7,0.840946,2022-08-09 01:28:07.298355,2022-08-09 01:31:26.477149,0 days 00:03:19.178794,,xgboost,0.2,16.0,1000.0,COMPLETE
27,27,0.840291,2022-08-09 01:35:24.936693,2022-08-09 01:36:53.408932,0 days 00:01:28.472239,,xgboost,0.2,14.0,500.0,COMPLETE
118,118,0.834855,2022-08-09 02:39:02.922890,2022-08-09 02:47:49.002882,0 days 00:08:46.079992,,xgboost,0.2,11.0,4000.0,COMPLETE
8,8,0.831136,2022-08-09 01:31:26.477985,2022-08-09 01:34:30.161991,0 days 00:03:03.684006,,xgboost,0.01,12.0,1000.0,COMPLETE


### LightGBM Reults:

1. With the size of the data (roughly 800,000 reviews/rows, 2 million features/columns), the LightGB model only took an average of 58 seconds to train/predict.
This was much faster than the XGBoost, and it is often used for large datasets instead.

2. The best LightGBM models all had similar recall scores to the best XGboost model, around 0.85 as well.

3. LightGBM models did well with lower learning rates and higher depth or a higher learning rate and relatively lower depth.

In [18]:
best_lgb_model = study_df[study_df["params_classifier"] == "lgbm"].sort_values("value", ascending=False).head(1)
print(best_lgb_model)
best_lgb_model.to_csv("best_models.csv", index=False, mode="a")

     number     value             datetime_start          datetime_complete  \
106     106  0.844527 2022-08-09 02:37:51.645079 2022-08-09 02:38:43.209772   

                  duration  params_C params_classifier  params_learning_rate  \
106 0 days 00:00:51.564693       NaN              lgbm                  0.01   

     params_max_depth  params_n_estimators     state  
106              20.0               1000.0  COMPLETE  


In [19]:
study_df[study_df["params_classifier"] == "lgbm"].describe()

Unnamed: 0,number,value,duration,params_C,params_learning_rate,params_max_depth,params_n_estimators
count,14.0,14.0,14,0.0,14.0,14.0,14.0
mean,65.214286,0.835981,0 days 00:00:57.888010928,,0.074307,9.357143,1457.142857
std,57.180118,0.011132,0 days 00:00:51.526948337,,0.079951,6.15951,1281.654151
min,3.0,0.807192,0 days 00:00:10.332197,,0.0001,3.0,200.0
25%,21.25,0.833544,0 days 00:00:19.953306250,,0.01,4.25,500.0
50%,48.0,0.839853,0 days 00:00:50.945442,,0.055,7.5,1000.0
75%,101.25,0.844032,0 days 00:01:10.435751,,0.1,12.5,2000.0
max,180.0,0.844527,0 days 00:02:57.835429,,0.2,20.0,4000.0


In [20]:
study_df[study_df["params_classifier"] == "lgbm"].sort_values("value", ascending=False).head(5)

Unnamed: 0,number,value,datetime_start,datetime_complete,duration,params_C,params_classifier,params_learning_rate,params_max_depth,params_n_estimators,state
106,106,0.844527,2022-08-09 02:37:51.645079,2022-08-09 02:38:43.209772,0 days 00:00:51.564693,,lgbm,0.01,20.0,1000.0,COMPLETE
3,3,0.844389,2022-08-09 01:23:10.448339,2022-08-09 01:24:17.147875,0 days 00:01:06.699536,,lgbm,0.01,7.0,2000.0,COMPLETE
6,6,0.844384,2022-08-09 01:27:16.971263,2022-08-09 01:28:07.297454,0 days 00:00:50.326191,,lgbm,0.0001,18.0,1000.0,COMPLETE
56,56,0.844316,2022-08-09 02:12:41.510831,2022-08-09 02:14:00.486834,0 days 00:01:18.976003,,lgbm,0.2,11.0,2000.0,COMPLETE
70,70,0.84318,2022-08-09 02:15:25.598517,2022-08-09 02:16:18.045150,0 days 00:00:52.446633,,lgbm,0.2,4.0,2000.0,COMPLETE


### Logistic Regression Results:

1. Train/predict was the fastest of the 3 models. It is a simpler model, using gradient descent, which computes very quickly. On average, it only took 2 seconds.

2. The best linear regression models had a C value (determining gradient descent speed) of 0.01 and a recall near 0.91.

3. Overall, despite being the simplest model, the Logistic Regression model gave the best results, as well as training/predicting far, far quicker.

In [21]:
best_lr_model = study_df[study_df["params_classifier"] == "logreg"].sort_values("value", ascending=False).head(1)
print(best_lr_model)
best_lr_model.to_csv("best_models.csv", index=False, mode="a")

    number     value             datetime_start          datetime_complete  \
35      35  0.916435 2022-08-09 01:40:03.147489 2022-08-09 01:40:04.929709   

                 duration  params_C params_classifier  params_learning_rate  \
35 0 days 00:00:01.782220      0.01            logreg                   NaN   

    params_max_depth  params_n_estimators     state  
35               NaN                  NaN  COMPLETE  


In [22]:
study_df[study_df["params_classifier"] == "logreg"].describe()

Unnamed: 0,number,value,duration,params_C,params_learning_rate,params_max_depth,params_n_estimators
count,172.0,172.0,172,172.0,0.0,0.0,0.0
mean,105.587209,0.906174,0 days 00:00:01.895718837,0.078547,,,
std,56.228112,0.018383,0 days 00:00:00.879439925,0.242477,,,
min,0.0,0.852294,0 days 00:00:01.425250,0.01,,,
25%,58.75,0.912021,0 days 00:00:01.511956750,0.01,,,
50%,107.5,0.913222,0 days 00:00:01.685848500,0.01,,,
75%,154.25,0.913836,0 days 00:00:01.716180500,0.01,,,
max,199.0,0.916435,0 days 00:00:05.955641,1.0,,,


In [23]:
study_df[study_df["params_classifier"] == "logreg"].sort_values("value", ascending=False).head(5)

Unnamed: 0,number,value,datetime_start,datetime_complete,duration,params_C,params_classifier,params_learning_rate,params_max_depth,params_n_estimators,state
35,35,0.916435,2022-08-09 01:40:03.147489,2022-08-09 01:40:04.929709,0 days 00:00:01.782220,0.01,logreg,,,,COMPLETE
161,161,0.915943,2022-08-09 02:51:16.061808,2022-08-09 02:51:17.750527,0 days 00:00:01.688719,0.01,logreg,,,,COMPLETE
185,185,0.915468,2022-08-09 02:54:56.144753,2022-08-09 02:54:57.855027,0 days 00:00:01.710274,0.01,logreg,,,,COMPLETE
91,91,0.915353,2022-08-09 02:35:31.325249,2022-08-09 02:35:32.803748,0 days 00:00:01.478499,0.01,logreg,,,,COMPLETE
136,136,0.915318,2022-08-09 02:49:29.429322,2022-08-09 02:49:31.132673,0 days 00:00:01.703351,0.01,logreg,,,,COMPLETE


### 4.6.2 Conclusion

1. Though quite long to process, we examined the effects of different sampling methods to help with the class imbalance in our notebook.

2. We found no benefit using resampling/synthetic data generation, even on very small training sets, with few instances of the positive (predicted) class.

3. Our best model, both in terms of recall and prediction time, was the Logistic Regression model.

4. In the final notebook, we will look at the interpretability of the three models and compare any similarities and differences.