<a href="https://colab.research.google.com/github/hsaripalli/Pump-It-Up/blob/main/model/Pump_it_up_Optuna_Tuned.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Notes**

**Hyperparameters tuned with the Optuna library. Best submission accuracy: 82.4%.**

Load Data:

- Loaded and combined train and test csv files
- Dropped columns that obviously did not have any significance (mostly 0s or same value all across)
- Parsed date_recorded and created two new columns: year and month

Numeric Columns:

- List all numerical columns
- Impute gps height, lat/long using grouped means
- Impute construction year and population using grouped mean
- Created a new column, age = year recorded - construction year. Set negative ages to 1
- Created a new column, season, from the month column
- Used DBSCAN to create clusters for lat/long; it did not improve accuracy. (The code below adds KMeans distance-to-centroid features.)

Categorical Columns

- Converted all strings to lower case 
- Split into columns that have too many unique values vs not too many unique values
- Replaced 0s and 'none's with most frequent values
- Cleaned up values that are essentially the same but contain typos or were entered in different forms. For example: community vs commu.
- Dropped some columns that are mostly similar to others

Split Train and Test

- Separated the train and test sets after cleaning
- Held out only 1% of the training data (test_size = 0.01) as a sanity check, to maximize the training data; used 3-fold cross-validation for tuning instead
- Label-encoded the target

Pipeline:

- MyCategoryCoalescer: custom transformer (Uncle Steve's) that keeps the top 25 levels per column and replaces the rest with "__OTHER__"
- Ordinal encoder for all categorical columns
- Scaler for population, gps_height, latitude and longitude. The scaler didn't noticeably boost accuracy, IMO.

Models: 

- Trained random forest, XGBoost, AdaBoost, bagging (with decision trees as the base estimator), extra trees, LightGBM, and CatBoost
- All models had similar accuracies except AdaBoost, which was a few points lower
- Stacking all five models gave the best accuracy
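The notes say stacking gave the best accuracy, but the stacking code itself is not part of this section. A minimal sketch of scikit-learn's `StackingClassifier` on synthetic data, assuming two tree ensembles as base models and logistic regression as the meta-model (illustrative choices, not the author's configuration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic 3-class stand-in for the pump data (not the real features)
X_demo, y_demo = make_classification(n_samples=600, n_features=10, n_informative=6,
                                     n_classes=3, random_state=0)

base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
]

# The meta-model is trained on the base models' out-of-fold predicted
# probabilities (cv=3), so it learns how much to trust each base model
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=3)

scores = cross_val_score(stack, X_demo, y_demo, cv=3)
print("Stacked CV accuracy: {:.4f}".format(scores.mean()))
```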

# **Load Data**

In [None]:
# Imports (train and test are merged below for preprocessing)

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
values = pd.read_csv("https://raw.githubusercontent.com/hsaripalli/Pump-It-Up/main/Training_set_values.csv")
labels = pd.read_csv("https://raw.githubusercontent.com/hsaripalli/Pump-It-Up/main/Training_set_labels.csv")
test = pd.read_csv("https://raw.githubusercontent.com/hsaripalli/Pump-It-Up/main/Test_set_values.csv")

In [None]:
# Merge train and test

values['train'] = True
test['test'] = True
data = pd.concat([values, test], ignore_index = True)
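Tagging each frame with its own flag (`train`, `test`) leaves NaN in the other frame's flag after concatenation. A sketch of an alternative, assuming a single boolean column (`is_train`, a hypothetical name) so the later split is a plain boolean mask:

```python
import pandas as pd

# Toy stand-ins for the train and test feature tables (hypothetical values)
train_df = pd.DataFrame({"id": [1, 2, 3], "funder": ["a", "b", "a"]})
test_df = pd.DataFrame({"id": [4, 5], "funder": ["b", "c"]})

# One boolean flag records where every row came from, with no NaNs
combined = pd.concat([train_df.assign(is_train=True),
                      test_df.assign(is_train=False)], ignore_index=True)

# ... shared cleaning would run on `combined` here ...

# Split back with the flag, then drop it
train_clean = combined[combined["is_train"]].drop(columns="is_train")
test_clean = combined[~combined["is_train"]].drop(columns="is_train")
print(len(train_clean), len(test_clean))  # 3 2
```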

In [None]:
# Drop columns with no useful signal (mostly zeros or the same value throughout)

columns_to_drop = ['num_private', 'recorded_by']
data = data.drop(columns_to_drop, axis = 1)

In [None]:
# Parse dates into year and month columns

dates = pd.to_datetime(data['date_recorded'])
data['date_recorded_year'] = dates.dt.year
data['date_recorded_month'] = dates.dt.month
data = data.drop('date_recorded', axis = 1)

# **Data Cleaning - Numerical Features**

In [None]:
numeric_columns = data.select_dtypes(exclude = 'object').columns.tolist()

In [None]:
# Near-zero latitudes (above -0.5) are placeholders; set them to 0 so the next cell treats them as missing
data.loc[data['latitude'] > -0.5, 'latitude'] = 0

In [None]:
# gps_height, longitude and latitude: treat 0 as missing, then impute with the
# mean of progressively coarser groups (subvillage -> ward -> lga -> region -> basin)
col1 = ['gps_height', 'longitude', 'latitude']
data[col1] = data[col1].replace(0, np.nan)
for i in col1:
    for level in ['subvillage', 'ward', 'lga', 'region', 'basin']:
        data[i] = data[i].fillna(data.groupby(level)[i].transform('mean'))
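The loop above is a hierarchical fallback: each coarser group fills only what the finer groups left missing, and each step's means are computed on the column as already partially filled. A small helper that reproduces that behavior on toy data (`impute_hierarchical` is a hypothetical name):

```python
import numpy as np
import pandas as pd

def impute_hierarchical(df, col, levels):
    """Fill NaNs in df[col] with group means, trying the finest grouping first."""
    out = df[col].copy()
    for level in levels:
        # Means are computed on the column as filled so far, like the loop above
        group_means = out.groupby(df[level]).transform("mean")
        out = out.fillna(group_means)
    return out

toy = pd.DataFrame({
    "ward": ["w1", "w1", "w2", "w3"],
    "region": ["r1", "r1", "r1", "r1"],
    "gps_height": [100.0, np.nan, np.nan, 300.0],
})
# Row 1 gets its ward mean (100); row 2's ward has no values, so it falls
# back to the region mean of 100, 100 and 300
toy["gps_height"] = impute_hierarchical(toy, "gps_height", ["ward", "region"])
print(toy["gps_height"].round(2).tolist())  # [100.0, 100.0, 166.67, 300.0]
```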

In [None]:
# construction year and population: treat 0 as missing, then impute with rounded
# grouped means (subvillage -> ward -> lga -> region -> basin)
col2 = ['construction_year', 'population']
data[col2] = data[col2].replace(0, np.nan)
for i in col2:
    for level in ['subvillage', 'ward', 'lga', 'region', 'basin']:
        data[i] = data[i].fillna(data.groupby(level)[i].transform('mean')).round()

In [None]:
# Add age = year recorded - construction year
# Set negative ages (construction year after the record date) to 1
data['age'] = data['date_recorded_year'] - data['construction_year']
data.loc[data['age'] < 0, 'age'] = 1

In [None]:
# Jan and Feb: short dry season
# Long rains last roughly March, April and May
# Long dry season lasts June through October
# November and December: another rainy season, the 'short rains'

data.loc[(data['date_recorded_month'] >= 1) & (data['date_recorded_month'] <= 2), 'season'] = 1
data.loc[(data['date_recorded_month'] >= 3) & (data['date_recorded_month'] <= 5), 'season'] = 2
data.loc[(data['date_recorded_month'] >= 6) & (data['date_recorded_month'] <= 10), 'season'] = 3
data.loc[(data['date_recorded_month'] >= 11) & (data['date_recorded_month'] <= 12), 'season'] = 4

data['season']

0        2.0
1        2.0
2        1.0
3        1.0
4        3.0
        ... 
74245    1.0
74246    2.0
74247    2.0
74248    1.0
74249    1.0
Name: season, Length: 74250, dtype: float64
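The four `.loc` assignments can also be written as a single lookup. A sketch using `Series.map` with a month-to-season dictionary that follows the season comments in the cell above; note it yields integer codes, whereas the `.loc` version produces floats:

```python
import pandas as pd

# Month -> season code: 1 = short dry (Jan-Feb), 2 = long rains (Mar-May),
# 3 = long dry (Jun-Oct), 4 = short rains (Nov-Dec)
MONTH_TO_SEASON = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2,
                   6: 3, 7: 3, 8: 3, 9: 3, 10: 3, 11: 4, 12: 4}

months = pd.Series([1, 4, 7, 11, 12], name="date_recorded_month")
season = months.map(MONTH_TO_SEASON)
print(season.tolist())  # [1, 2, 3, 4, 4]
```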

In [None]:
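from sklearn.cluster import KMeans

# Distance from each waterpoint to each of the 15 lat/long cluster centroids.
# fit_transform fits once and returns the distances (calling fit() and then
# fit_transform() would fit the model twice).
clusters = 15
coords = data[['latitude', 'longitude']].values
kmeans = KMeans(n_clusters=clusters, random_state=0)
kmean_feats = pd.DataFrame(kmeans.fit_transform(coords), columns=['gspatial_' + str(i) for i in range(clusters)])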


In [None]:
data = pd.concat([data, kmean_feats], axis = 1)
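`KMeans.transform` (and `fit_transform`) returns each point's Euclidean distance to every cluster centroid, so the `gspatial_*` columns are distance features rather than a single cluster label. A self-contained sketch on random coordinates (hypothetical data) that shows the shape of the result:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical (latitude, longitude) pairs
coords = np.column_stack([rng.uniform(-11, -1, 200), rng.uniform(30, 40, 200)])

n_clusters = 5
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
# fit_transform fits once and returns one distance column per centroid
dists = kmeans.fit_transform(coords)
feats = pd.DataFrame(dists, columns=[f"gspatial_{i}" for i in range(n_clusters)])
print(feats.shape)  # (200, 5)
```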

# **Data Cleaning - Categorical Features**

In [None]:
categorical_columns = data.select_dtypes(include = 'object').columns.tolist()

*Dealing with columns that contain too many unique values*




In [None]:
# TOO MANY UNIQUE VALUES
#funder                    2140
#installer                 2410
#wpt_name                 45684
#subvillage               21425
#lga                        125
#ward                      2098
#scheme_name

In [None]:
# Convert text columns to lowercase
col3 = ['funder', 'installer', 'wpt_name', 'basin', 'subvillage', 'region',
        'lga', 'ward', 'scheme_management', 'extraction_type', 'extraction_type_group',
        'extraction_type_class', 'management', 'management_group', 'payment', 'payment_type',
        'water_quality', 'quality_group', 'quantity', 'quantity_group', 'source', 'source_type',
        'source_class', 'waterpoint_type', 'waterpoint_type_group', 'scheme_name']
for i in col3:
    data[i] = data[i].str.lower()

In [None]:
# Treat '0' and 'none' as missing; these are filled with the most frequent value further down
col4 = ['funder', 'installer', 'wpt_name', 'subvillage', 'lga', 'ward', 'scheme_name']
data[col4] = data[col4].replace(to_replace = ['0', 'none'], value = np.nan)

In [None]:
# Normalize abbreviated installer names
data['installer'] = data['installer'].replace(to_replace = 'gover', value = 'government')
data['installer'] = data['installer'].replace(to_replace = 'commu', value = 'community')

In [None]:
for i in col4:
    data[i] = data[i].fillna(data[i].mode()[0])
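The two steps above (turn sentinel strings into NaN, then fill with the mode) can be wrapped in one helper. A sketch on a toy column; `fill_sentinels_with_mode` is a hypothetical name:

```python
import numpy as np
import pandas as pd

def fill_sentinels_with_mode(df, cols, sentinels=("0", "none")):
    """Treat sentinel strings as missing, then fill each column with its most frequent value."""
    out = df.copy()
    out[cols] = out[cols].replace(list(sentinels), np.nan)
    for col in cols:
        # mode() ignores missing values, so the fill value is a real level
        out[col] = out[col].fillna(out[col].mode()[0])
    return out

toy = pd.DataFrame({"funder": ["gov", "0", "gov", None, "none", "dwe"]})
filled = fill_sentinels_with_mode(toy, ["funder"])
print(filled["funder"].tolist())
# ['gov', 'gov', 'gov', 'gov', 'gov', 'dwe']
```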

Dealing with columns containing **not** too many unique values




In [None]:
# Not too many unique values
#basin                        9
#region                      21
#region_code                 27
#district_code               20
#public_meeting               2
#scheme_management           12
#permit                       2
#construction_year           55
#extraction_type             18
#extraction_type_group       13
#extraction_type_class        7
#management                  12
#management_group             5
#payment                      7
#payment_type                 7
#water_quality                8
#quality_group                6
#quantity                     5
#quantity_group               5
#source                      10
#source_type                  7
#source_class                 3
#waterpoint_type              7
#waterpoint_type_group        6
#train                        1
#test                         1
#date_recorded_year           6
#date_recorded_month         12

In [None]:
#public_meeting               2
#scheme_management           12
#permit                       2

In [None]:
col5 = ['public_meeting', 'permit']
for i in col5:
    data[i] = data[i].fillna(data[i].mode()[0])
    data[i] = data[i].astype(str)

In [None]:
# scheme_management: replace missing values and 'none' with 'other'

data['scheme_management'] = data['scheme_management'].replace(to_replace = [np.nan, 'none'], value = 'other')

In [None]:
#extraction_type             18
#extraction_type_group       13
#extraction_type_class        7

In [None]:
# Merge near-duplicate levels in extraction_type into broader groups

data = data.replace({'extraction_type': 
                     {'cemo': 'other motorpump',
                      'climax': 'other motorpump',
                      'india mark ii': 'india mark',
                      'india mark iii': 'india mark',
                      'other - mkulima/shinyanga': 'other handpump',
                      'other - play pump': 'other handpump',
                      'other - rope pump': 'rope pump',
                      'other - swn 81': 'swn',
                      'swn 80': 'swn'
                      }})


In [None]:
# describe columns (run one at a time)

#data[['extraction_type', 'extraction_type_group', 'extraction_type_class']].groupby('extraction_type_group').describe()
#data[['payment', 'payment_type']].groupby('payment').describe()
#data[['water_quality', 'quality_group']].groupby('water_quality').describe()
#data[['quantity', 'quantity_group']].groupby('quantity').describe()
#data[['source', 'source_type', 'source_class']].groupby('source').describe()
#data[['waterpoint_type', 'waterpoint_type_group']].groupby('waterpoint_type').describe()

In [None]:
# Drop columns that largely duplicate a kept column (see the describe() checks above)
col6 = ['extraction_type_group', 'payment_type', 'quality_group', 'quantity_group', 'source_type', 'waterpoint_type_group']
data = data.drop(col6, axis = 1)
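Whether a column is safe to drop can be checked directly: if every level of the kept column maps to exactly one level of the dropped one, the dropped column adds no information. A sketch of that check on toy values (`is_function_of` is a hypothetical helper; run it on the real columns before relying on the result):

```python
import pandas as pd

def is_function_of(df, kept, dropped):
    """True if every level of `kept` maps to exactly one level of `dropped`."""
    return bool((df.groupby(kept)[dropped].nunique() <= 1).all())

toy = pd.DataFrame({
    "payment": ["pay monthly", "never pay", "pay annually", "never pay"],
    "payment_type": ["monthly", "never pay", "annually", "never pay"],
})
print(is_function_of(toy, "payment", "payment_type"))  # True
```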

In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 74250 entries, 0 to 74249
Data columns (total 52 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   id                     74250 non-null  int64  
 1   amount_tsh             74250 non-null  float64
 2   funder                 74250 non-null  object 
 3   gps_height             74250 non-null  float64
 4   installer              74250 non-null  object 
 5   longitude              74250 non-null  float64
 6   latitude               74250 non-null  float64
 7   wpt_name               74250 non-null  object 
 8   basin                  74250 non-null  object 
 9   subvillage             74250 non-null  object 
 10  region                 74250 non-null  object 
 11  region_code            74250 non-null  int64  
 12  district_code          74250 non-null  int64  
 13  lga                    74250 non-null  object 
 14  ward                   74250 non-null  object 
 15  po

# **Split Train and Test**

In [None]:
# Reverse split merged and clean data into train and test
train_values = data[data['train'] == True]
test = data[data['test'] == True]
train_values = train_values.drop(['train', 'test'], axis = 1)
test = test.drop(['train', 'test'], axis = 1)

In [None]:
test_set = test.drop('id', axis = 1)
x = train_values.drop('id', axis = 1)

In [None]:
X = x.copy()
y = pd.DataFrame(labels['status_group'])

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y.values.ravel())

In [None]:
le_name_mapping = dict(zip(le.classes_, le.transform(le.classes_)))
print(le_name_mapping)

{'functional': 0, 'functional needs repair': 1, 'non functional': 2}
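Model predictions come out as these integer codes. A sketch of mapping them back to the original strings with `inverse_transform` and pairing them with ids for a submission file; the predictions and ids are hypothetical, and the `id`/`status_group` column names are assumed to match the training labels file:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

le_demo = LabelEncoder()
le_demo.fit(["functional", "functional needs repair", "non functional"])

# Hypothetical integer predictions and test-set ids
pred_codes = np.array([0, 2, 1, 0])
test_ids = [101, 102, 103, 104]

submission = pd.DataFrame({"id": test_ids,
                           "status_group": le_demo.inverse_transform(pred_codes)})
print(submission["status_group"].tolist())
# ['functional', 'non functional', 'functional needs repair', 'functional']
```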


In [None]:
# Hold out only 1% of the training rows as a quick sanity check (tuning uses 3-fold CV on all of X, y)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.01, random_state = 123)
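With `test_size = 0.01` the hold-out has only about 594 of the 59,400 training rows, so its class mix can drift from the full data. A sketch of passing `stratify` to `train_test_split`, on synthetic imbalanced labels (the class proportions are made up):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(123)
# Synthetic labels with a rare middle class (hypothetical proportions)
y_demo = rng.choice([0, 1, 2], size=5000, p=[0.55, 0.07, 0.38])
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.01, random_state=123, stratify=y_demo)

# The tiny hold-out keeps class proportions close to the full data
print(np.bincount(y_demo, minlength=3) / len(y_demo))
print(np.bincount(y_te, minlength=3) / len(y_te))
```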

In [None]:
X_test.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14850 entries, 36801 to 56271
Data columns (total 49 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   amount_tsh             14850 non-null  float64
 1   funder                 14850 non-null  object 
 2   gps_height             14850 non-null  float64
 3   installer              14850 non-null  object 
 4   longitude              14850 non-null  float64
 5   latitude               14850 non-null  float64
 6   wpt_name               14850 non-null  object 
 7   basin                  14850 non-null  object 
 8   subvillage             14850 non-null  object 
 9   region                 14850 non-null  object 
 10  region_code            14850 non-null  int64  
 11  district_code          14850 non-null  int64  
 12  lga                    14850 non-null  object 
 13  ward                   14850 non-null  object 
 14  population             14850 non-null  float64
 15

# **Pipeline**

In [None]:
#Uncle Steve's Custom Transformer for Category Coalescing

from sklearn.base import BaseEstimator, TransformerMixin

class MyCategoryCoalescer(BaseEstimator, TransformerMixin):
    # Coalesces (smushes/condenses) rare levels of a categorical 
    # feature into "__OTHER__".
    #
    # Will leave the `keep_top` most frequent levels unchanged; the rest
    # will be changed to `"__OTHER__"`.
    #
    # Note that there was a design choice: either have the user
    # pass in the names of the columns to operate on (which I've done here), 
    # or just operate on all the columns (and have the user be responsible for
    # passing in a subset of the dataframe). There are pros and cons to each,
    # and there's not a single best answer.
    
    def __init__(self, cat_cols=[], keep_top=25):
        # scikit-learn convention: __init__ only stores the parameters;
        # learned state is created in fit()
        self.cat_cols = cat_cols
        self.keep_top = keep_top
            
    def get_top_n_values(self, X, col, n=25):
        # A helper function to do the actual work.

        # Get the sorted value counts for the column
        vc = X[col].value_counts(sort=True, ascending=False)

        # Get the actual values
        vals = list(vc.index)
        if len(vals) > n:
            top_values = vals[0:n]
        else:
            top_values = vals

        # Debug printing.
        #print("Top n={} values for column={}:".format(n, col))
        #print(top_values)
        return top_values
    
    def fit(self, X, y=None):
        # For each categorical column, this dict holds a list of the
        # most-frequent levels; it is rebuilt on every call to fit()
        self.top_n_values = {}
        for col in self.cat_cols:
            self.top_n_values[col] = self.get_top_n_values(X, col, n=self.keep_top)
        return self
    
    def transform(self, X, y=None):
        _X = X.copy()
        _X[self.cat_cols] = _X[self.cat_cols].astype('category')
        for c in self.cat_cols:
            _X[c] = _X[c].cat.add_categories('__OTHER__')
            _X.loc[~_X[c].isin(self.top_n_values[c]), c] = "__OTHER__"
        return _X
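The core idea of `MyCategoryCoalescer`, reduced to a function: learn the top levels from training data only, then map every other level (including ones never seen during fitting) to `__OTHER__`. A toy sketch with `keep_top=2` (`coalesce_rare` is a hypothetical name):

```python
import pandas as pd

def coalesce_rare(train_col, other_col, keep_top=2, other="__OTHER__"):
    """Keep the keep_top most frequent training levels; map everything else to `other`."""
    top = train_col.value_counts().index[:keep_top]
    return other_col.where(other_col.isin(top), other)

train_funders = pd.Series(["gov", "gov", "gov", "dwe", "dwe", "kkkt", "unicef"])
test_funders = pd.Series(["gov", "unicef", "world bank", "dwe"])
result = coalesce_rare(train_funders, test_funders).tolist()
print(result)  # ['gov', '__OTHER__', '__OTHER__', 'dwe']
```

Because unseen levels also collapse into `__OTHER__`, the downstream encoder never meets a new category in the coalesced columns.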

In [None]:
# Preprocessing pipeline: coalesce rare levels, ordinal-encode categoricals, scale a few numeric columns

from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder, LabelEncoder, OneHotEncoder
from sklearn.compose import make_column_selector
from sklearn.model_selection import cross_val_score, GridSearchCV, RandomizedSearchCV
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, f1_score

categorical_cols = ['funder', 'installer', 'wpt_name', 'basin', 'subvillage', 'region', 'lga', 'ward', 'scheme_management',
                    'extraction_type','extraction_type_class', 'management', 'management_group', 'payment', 'water_quality',
                    'quantity','source','source_class','waterpoint_type', 'permit', 'public_meeting', 'scheme_name']

columns_to_coal = ['funder','installer', 'subvillage', 'lga', 'ward', 'wpt_name', 'scheme_name']

columns_to_scale = ['population', 'gps_height', 'latitude', 'longitude']

coalescer = MyCategoryCoalescer(cat_cols=columns_to_coal, keep_top=25)
encoder = OrdinalEncoder()
scaler = StandardScaler()


cat_transformer = Pipeline([
                            ('coalescer', coalescer),                      
                            ('encoder', encoder)
                           ])

preprocessor = Pipeline(steps = [
                                 ('ct', ColumnTransformer(
                                     transformers=[
                                                   ('categorical', cat_transformer, categorical_cols),
                                                   ('scale', scaler, columns_to_scale)
                                                   ], 
                                                   remainder = 'passthrough', 
                                                   sparse_threshold =0)),
                                 ])
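The categorical columns that are not coalesced go straight into `OrdinalEncoder()`, whose default `handle_unknown='error'` raises if a category appears only in a validation fold or in the test set. A sketch of the encoder option that maps unseen categories to a sentinel instead (requires scikit-learn 0.24 or newer; toy data):

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

train_part = pd.DataFrame({"basin": ["pangani", "rufiji", "pangani"]})
test_part = pd.DataFrame({"basin": ["rufiji", "lake nyasa"]})  # 'lake nyasa' is unseen

# Unseen categories become -1 instead of raising an error
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
enc.fit(train_part)
print(enc.transform(test_part).ravel().tolist())  # [1.0, -1.0]
```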

# *Random Forest (optuna tuned)*

In [None]:
# Random Forest Tuned
# Random Forest Tuned Hyper Parameters
# {'rf__max_depth': 20, 'rf__min_samples_split': 5, 'rf__n_estimators': 1000}

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(criterion = 'gini',
                            n_estimators = 536,
                            min_samples_split = 8,
                            max_depth = 20,
                            random_state = 42)

rf_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('rf', rf)])
rf_pipeline.fit(X_train,y_train)
y_pred_rf_pipeline = rf_pipeline.predict(X_test)

In [None]:
print("Accuracy of RF = {:.4f}".format(accuracy_score(y_test, y_pred_rf_pipeline)))

Accuracy of RF = 0.8418


In [None]:
from sklearn.metrics import classification_report, confusion_matrix

print(classification_report(y_test, y_pred_rf_pipeline))
pd.DataFrame(confusion_matrix(y_test, y_pred_rf_pipeline))
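In the confusion matrix above, rows are true classes and columns are predicted classes (0 = functional, 1 = functional needs repair, 2 = non functional, per the label mapping). A sketch showing that the diagonal divided by the row sums gives the per-class recall reported by `classification_report`, on toy labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_hat = np.array([0, 2, 0, 1, 2, 2, 0])

cm = confusion_matrix(y_true, y_hat)
# Correct predictions per class divided by the number of true samples per class
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(per_class_recall)
print(recall_score(y_true, y_hat, average=None))
```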

In [None]:
import optuna
def objective(trial):
    

    param = {
        "criterion": trial.suggest_categorical("criterion", ['gini', 'entropy']),
        "min_samples_split": trial.suggest_int("min_samples_split", 2,10),
        "n_estimators": trial.suggest_int("n_estimators", 200,1500),
        "max_depth": trial.suggest_int("max_depth", 5, 50)
    }

    rf = RandomForestClassifier(**param)
    rf_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('rf', rf)])
    
    return cross_val_score(rf_pipeline, X, y, cv = 3).mean()
    
if __name__ == "__main__":
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=10)

    print("Number of finished trials: {}".format(len(study.trials)))

    print("Best trial:")
    trial = study.best_trial

    print("  Value: {}".format(trial.value))

    print("  Params: ")
    for key, value in trial.params.items():
        print("    {}: {}".format(key, value))

[I 2021-11-16 10:40:02,033] A new study created in memory with name: no-name-4e962812-b08c-43ac-916e-97406778cbe3
[I 2021-11-16 10:42:50,490] Trial 0 finished with value: 0.7937205387205388 and parameters: {'criterion': 'gini', 'min_samples_split': 10, 'n_estimators': 470, 'max_depth': 14}. Best is trial 0 with value: 0.7937205387205388.
[I 2021-11-16 11:01:18,854] Trial 1 finished with value: 0.806969696969697 and parameters: {'criterion': 'entropy', 'min_samples_split': 4, 'n_estimators': 1430, 'max_depth': 49}. Best is trial 1 with value: 0.806969696969697.
[I 2021-11-16 11:07:59,682] Trial 2 finished with value: 0.8093771043771044 and parameters: {'criterion': 'entropy', 'min_samples_split': 7, 'n_estimators': 516, 'max_depth': 26}. Best is trial 2 with value: 0.8093771043771044.
[I 2021-11-16 11:13:46,526] Trial 3 finished with value: 0.8093602693602694 and parameters: {'criterion': 'gini', 'min_samples_split': 9, 'n_est

Number of finished trials: 10
Best trial:
  Value: 0.8093771043771044
  Params: 
    criterion: entropy
    min_samples_split: 7
    n_estimators: 516
    max_depth: 26


In [None]:
#Number of finished trials: 10
#Best trial:
#  Value: 0.8093771043771044
#  Params: 
#    criterion: entropy
#    min_samples_split: 7
#    n_estimators: 516
#    max_depth: 26
        
from sklearn.ensemble import RandomForestClassifier

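# Note: the best trial above used criterion='entropy'; this cell was run with
# 'gini', which produced the accuracy reported below.
rf = RandomForestClassifier(criterion = 'gini',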
                            n_estimators = 516,
                            min_samples_split = 7,
                            max_depth = 26,
                            random_state = 42)

rf_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('rf', rf)])
rf_pipeline.fit(X_train,y_train)
y_pred_rf_pipeline = rf_pipeline.predict(X_test)

In [None]:
print("Accuracy of RF = {:.4f}".format(accuracy_score(y_test, y_pred_rf_pipeline)))

Accuracy of RF = 0.8468
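The cell above copies the tuned values by hand, and its `criterion` differs from the best trial's. A sketch of passing a parameter dict straight into a pipeline step with `set_params` (in the notebook the dict could come from Optuna's `study.best_params`; the pipeline here is a stand-in):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Best parameters as printed by the Optuna study above
best_params = {"criterion": "entropy", "min_samples_split": 7,
               "n_estimators": 516, "max_depth": 26}

# Stand-in pipeline; in the notebook the first step would be `preprocessor`
pipe = Pipeline(steps=[("scale", StandardScaler()),
                       ("rf", RandomForestClassifier(random_state=42))])

# The "rf__" prefix routes each parameter to the step named "rf"
pipe.set_params(**{f"rf__{name}": value for name, value in best_params.items()})
print(pipe.named_steps["rf"].criterion)  # entropy
```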


# *LGBM (optuna tuned)*

In [None]:
from lightgbm import LGBMClassifier

lgbm = LGBMClassifier(boosting_type = 'gbdt',
                      objective = 'multiclass',
                      num_class = 3,
                      metric = 'multi_error',
                      num_iterations = 200,
                      lambda_l1 =  2.2899315163770417e-06,
                      lambda_l2 =  2.6273452242794607e-06,
                      num_leaves = 239,
                      feature_fraction = 0.5633644014015632,
                      learning_rate = 0.06012805964180289,
                      bagging_fraction = 0.6953776886469089,
                      bagging_freq = 6,
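                      # min_child_samples and min_data_in_leaf are aliases in LightGBM;
                      # when both are set, LightGBM warns and uses only one of them
                      min_child_samples = 47,
                      min_data_in_leaf = 17,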
                      max_depth = 46
                      )

lgbm_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('lgbm', lgbm)])
lgbm_pipeline.fit(X_train,y_train)
y_pred_lgbm_pipeline = lgbm_pipeline.predict(X_test)





In [None]:
print("Accuracy of LGBM   = {:.4f}".format(accuracy_score(y_test, y_pred_lgbm_pipeline)))

Accuracy of LGBM   = 0.8333


In [None]:
from sklearn.metrics import classification_report, confusion_matrix

print(classification_report(y_test, y_pred_lgbm_pipeline))
pd.DataFrame(confusion_matrix(y_test, y_pred_lgbm_pipeline))

# *Catboost (hyperparameter tuned)*

In [None]:
pip install catboost

In [None]:
from catboost import CatBoostClassifier

cat = CatBoostClassifier(depth = 10,
                         iterations = 500,
                         learning_rate = 0.05,
                         random_state = 42)

cat_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('catboost', cat)])
cat_pipeline.fit(X_train,y_train)
y_pred_cat_pipeline = cat_pipeline.predict(X_test)

In [None]:
print("Accuracy of Catboost   = {:.4f}".format(accuracy_score(y_test, y_pred_cat_pipeline)))

In [None]:
import optuna
from catboost import CatBoostClassifier

def objective(trial):
    

    param = {
        "colsample_bylevel": trial.suggest_float("colsample_bylevel", 0.01, 0.1),
        "depth": trial.suggest_int("depth", 1, 12),
        "boosting_type": trial.suggest_categorical("boosting_type", ["Ordered", "Plain"]),
        "bootstrap_type": trial.suggest_categorical(
            "bootstrap_type", ["Bayesian", "Bernoulli", "MVS"]
        ),
        
        "used_ram_limit": "2gb",
    }

    if param["bootstrap_type"] == "Bayesian":
        param["bagging_temperature"] = trial.suggest_float("bagging_temperature", 0, 10)
    elif param["bootstrap_type"] == "Bernoulli":
        param["subsample"] = trial.suggest_float("subsample", 0.1, 1)

    cat = CatBoostClassifier(**param)
    cat_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('catboost', cat)])
    
    return cross_val_score(cat_pipeline, X, y, cv = 3).mean()
    
if __name__ == "__main__":
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=10)

    print("Number of finished trials: {}".format(len(study.trials)))

    print("Best trial:")
    trial = study.best_trial

    print("  Value: {}".format(trial.value))

    print("  Params: ")
    for key, value in trial.params.items():
        print("    {}: {}".format(key, value))

In [None]:
#Number of finished trials: 10
#Best trial:
#  Value: 0.8049831649831649
#  Params: 
#    colsample_bylevel: 0.07369920952387737
#    depth: 12
#    boosting_type: Plain
#    bootstrap_type: MVS

from catboost import CatBoostClassifier

cat = CatBoostClassifier(colsample_bylevel = 0.073699209523,
                         depth = 12,
                         boosting_type = 'Plain',
                         bootstrap_type = 'MVS',
                         random_state = 42)

cat_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('catboost', cat)])
cat_pipeline.fit(X_train,y_train)
y_pred_cat_pipeline = cat_pipeline.predict(X_test)

Learning rate set to 0.097401
0:	learn: 1.0659572	total: 7.63ms	remaining: 7.63s
1:	learn: 1.0033309	total: 153ms	remaining: 1m 16s
2:	learn: 0.9597864	total: 324ms	remaining: 1m 47s
3:	learn: 0.9241089	total: 572ms	remaining: 2m 22s
4:	learn: 0.9074601	total: 577ms	remaining: 1m 54s
5:	learn: 0.8780877	total: 590ms	remaining: 1m 37s
6:	learn: 0.8447890	total: 722ms	remaining: 1m 42s
7:	learn: 0.8213336	total: 737ms	remaining: 1m 31s
8:	learn: 0.7970821	total: 905ms	remaining: 1m 39s
9:	learn: 0.7741211	total: 1.12s	remaining: 1m 51s
10:	learn: 0.7564468	total: 1.29s	remaining: 1m 55s
11:	learn: 0.7415618	total: 1.49s	remaining: 2m 2s
12:	learn: 0.7359933	total: 1.5s	remaining: 1m 53s
13:	learn: 0.7230250	total: 1.66s	remaining: 1m 57s
14:	learn: 0.7118325	total: 1.82s	remaining: 1m 59s
15:	learn: 0.6965485	total: 1.93s	remaining: 1m 58s
16:	learn: 0.6843580	total: 2.1s	remaining: 2m 1s
17:	learn: 0.6729083	total: 2.22s	remaining: 2m 1s
18:	learn: 0.6638933	total: 2.3s	remaining: 1m 58

158:	learn: 0.4361522	total: 19.4s	remaining: 1m 42s
159:	learn: 0.4355991	total: 19.5s	remaining: 1m 42s
160:	learn: 0.4347033	total: 19.6s	remaining: 1m 42s
161:	learn: 0.4338943	total: 19.8s	remaining: 1m 42s
162:	learn: 0.4335211	total: 19.9s	remaining: 1m 42s
163:	learn: 0.4331401	total: 20s	remaining: 1m 42s
164:	learn: 0.4326484	total: 20.2s	remaining: 1m 42s
165:	learn: 0.4318537	total: 20.4s	remaining: 1m 42s
166:	learn: 0.4317413	total: 20.4s	remaining: 1m 41s
167:	learn: 0.4311746	total: 20.6s	remaining: 1m 41s
168:	learn: 0.4310572	total: 20.6s	remaining: 1m 41s
169:	learn: 0.4306093	total: 20.7s	remaining: 1m 41s
170:	learn: 0.4306027	total: 20.7s	remaining: 1m 40s
171:	learn: 0.4300504	total: 20.9s	remaining: 1m 40s
172:	learn: 0.4295617	total: 21s	remaining: 1m 40s
173:	learn: 0.4287719	total: 21.2s	remaining: 1m 40s
174:	learn: 0.4279940	total: 21.3s	remaining: 1m 40s
175:	learn: 0.4273593	total: 21.5s	remaining: 1m 40s
176:	learn: 0.4265683	total: 21.6s	remaining: 1m 4

315:	learn: 0.3745021	total: 40s	remaining: 1m 26s
316:	learn: 0.3744830	total: 40s	remaining: 1m 26s
317:	learn: 0.3740722	total: 40.3s	remaining: 1m 26s
318:	learn: 0.3737578	total: 40.4s	remaining: 1m 26s
319:	learn: 0.3730967	total: 40.6s	remaining: 1m 26s
320:	learn: 0.3728869	total: 40.8s	remaining: 1m 26s
321:	learn: 0.3725073	total: 41s	remaining: 1m 26s
322:	learn: 0.3719352	total: 41.1s	remaining: 1m 26s
323:	learn: 0.3715894	total: 41.3s	remaining: 1m 26s
324:	learn: 0.3712990	total: 41.4s	remaining: 1m 25s
325:	learn: 0.3711153	total: 41.5s	remaining: 1m 25s
326:	learn: 0.3706712	total: 41.7s	remaining: 1m 25s
327:	learn: 0.3699760	total: 41.9s	remaining: 1m 25s
328:	learn: 0.3694991	total: 42s	remaining: 1m 25s
329:	learn: 0.3694624	total: 42s	remaining: 1m 25s
330:	learn: 0.3689546	total: 42.1s	remaining: 1m 25s
331:	learn: 0.3687552	total: 42.3s	remaining: 1m 25s
332:	learn: 0.3682874	total: 42.5s	remaining: 1m 25s
333:	learn: 0.3681988	total: 42.5s	remaining: 1m 24s
334

475:	learn: 0.3377698	total: 59.8s	remaining: 1m 5s
476:	learn: 0.3373804	total: 60s	remaining: 1m 5s
477:	learn: 0.3373804	total: 60s	remaining: 1m 5s
478:	learn: 0.3372191	total: 1m	remaining: 1m 5s
479:	learn: 0.3369333	total: 1m	remaining: 1m 5s
480:	learn: 0.3365176	total: 1m	remaining: 1m 5s
481:	learn: 0.3362943	total: 1m	remaining: 1m 5s
482:	learn: 0.3360688	total: 1m	remaining: 1m 5s
483:	learn: 0.3359504	total: 1m 1s	remaining: 1m 5s
484:	learn: 0.3357524	total: 1m 1s	remaining: 1m 5s
485:	learn: 0.3355154	total: 1m 1s	remaining: 1m 5s
486:	learn: 0.3352612	total: 1m 1s	remaining: 1m 4s
487:	learn: 0.3351319	total: 1m 1s	remaining: 1m 4s
488:	learn: 0.3350832	total: 1m 1s	remaining: 1m 4s
489:	learn: 0.3350797	total: 1m 1s	remaining: 1m 4s
490:	learn: 0.3347238	total: 1m 1s	remaining: 1m 4s
491:	learn: 0.3345301	total: 1m 1s	remaining: 1m 3s
492:	learn: 0.3343560	total: 1m 2s	remaining: 1m 3s
493:	learn: 0.3340100	total: 1m 2s	remaining: 1m 3s
494:	learn: 0.3336602	total: 1m

635:	learn: 0.3086240	total: 1m 21s	remaining: 46.9s
636:	learn: 0.3083143	total: 1m 22s	remaining: 46.8s
637:	learn: 0.3081494	total: 1m 22s	remaining: 46.7s
638:	learn: 0.3079909	total: 1m 22s	remaining: 46.5s
639:	learn: 0.3077881	total: 1m 22s	remaining: 46.4s
640:	learn: 0.3076371	total: 1m 22s	remaining: 46.3s
641:	learn: 0.3074550	total: 1m 22s	remaining: 46.1s
642:	learn: 0.3072524	total: 1m 22s	remaining: 46.1s
643:	learn: 0.3069968	total: 1m 23s	remaining: 45.9s
644:	learn: 0.3068263	total: 1m 23s	remaining: 45.8s
645:	learn: 0.3065964	total: 1m 23s	remaining: 45.7s
646:	learn: 0.3062717	total: 1m 23s	remaining: 45.6s
647:	learn: 0.3062713	total: 1m 23s	remaining: 45.4s
648:	learn: 0.3061328	total: 1m 23s	remaining: 45.3s
649:	learn: 0.3060495	total: 1m 23s	remaining: 45.1s
650:	learn: 0.3059272	total: 1m 23s	remaining: 44.9s
651:	learn: 0.3056561	total: 1m 24s	remaining: 44.8s
652:	learn: 0.3054762	total: 1m 24s	remaining: 44.7s
653:	learn: 0.3052092	total: 1m 24s	remaining:

791:	learn: 0.2864272	total: 1m 43s	remaining: 27.3s
792:	learn: 0.2863535	total: 1m 44s	remaining: 27.2s
793:	learn: 0.2863535	total: 1m 44s	remaining: 27s
794:	learn: 0.2862454	total: 1m 44s	remaining: 26.9s
795:	learn: 0.2861336	total: 1m 44s	remaining: 26.8s
796:	learn: 0.2860468	total: 1m 44s	remaining: 26.6s
797:	learn: 0.2857496	total: 1m 44s	remaining: 26.5s
798:	learn: 0.2856604	total: 1m 44s	remaining: 26.4s
799:	learn: 0.2855096	total: 1m 45s	remaining: 26.3s
800:	learn: 0.2853861	total: 1m 45s	remaining: 26.1s
801:	learn: 0.2853846	total: 1m 45s	remaining: 26s
802:	learn: 0.2852232	total: 1m 45s	remaining: 25.8s
803:	learn: 0.2850029	total: 1m 45s	remaining: 25.7s
804:	learn: 0.2848136	total: 1m 45s	remaining: 25.6s
805:	learn: 0.2845375	total: 1m 45s	remaining: 25.5s
806:	learn: 0.2843160	total: 1m 46s	remaining: 25.4s
807:	learn: 0.2841930	total: 1m 46s	remaining: 25.2s
808:	learn: 0.2839572	total: 1m 46s	remaining: 25.1s
809:	learn: 0.2838937	total: 1m 46s	remaining: 25s

948:	learn: 0.2682551	total: 2m 7s	remaining: 6.87s
949:	learn: 0.2681506	total: 2m 8s	remaining: 6.74s
950:	learn: 0.2680342	total: 2m 8s	remaining: 6.61s
951:	learn: 0.2680341	total: 2m 8s	remaining: 6.46s
952:	learn: 0.2678979	total: 2m 8s	remaining: 6.33s
953:	learn: 0.2678865	total: 2m 8s	remaining: 6.19s
954:	learn: 0.2677332	total: 2m 8s	remaining: 6.06s
955:	learn: 0.2676560	total: 2m 8s	remaining: 5.93s
956:	learn: 0.2675550	total: 2m 9s	remaining: 5.8s
957:	learn: 0.2675527	total: 2m 9s	remaining: 5.66s
958:	learn: 0.2673989	total: 2m 9s	remaining: 5.53s
959:	learn: 0.2673139	total: 2m 9s	remaining: 5.4s
960:	learn: 0.2672403	total: 2m 9s	remaining: 5.26s
961:	learn: 0.2672081	total: 2m 9s	remaining: 5.13s
962:	learn: 0.2671400	total: 2m 9s	remaining: 4.99s
963:	learn: 0.2670283	total: 2m 10s	remaining: 4.86s
964:	learn: 0.2670224	total: 2m 10s	remaining: 4.72s
965:	learn: 0.2670146	total: 2m 10s	remaining: 4.58s
966:	learn: 0.2670126	total: 2m 10s	remaining: 4.44s
967:	learn

In [None]:
print("Accuracy of Catboost   = {:.4f}".format(accuracy_score(y_test, y_pred_cat_pipeline)))

Accuracy of Catboost   = 0.8316


In [None]:
from sklearn.metrics import classification_report, confusion_matrix

print(classification_report(y_test, y_pred_cat_pipeline))
pd.DataFrame(confusion_matrix(y_test, y_pred_cat_pipeline))
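
The confusion matrix above is printed without row or column labels. In scikit-learn's `confusion_matrix`, rows are true classes and columns are predicted classes, both in sorted label order. Per-class recall is therefore the diagonal divided by the row totals, and precision is the diagonal divided by the column totals. The numpy sketch below shows that arithmetic. `per_class_metrics` is a hypothetical helper, and the matrix is made up, not a result from this notebook.

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class recall and precision, plus overall accuracy, from a square
    confusion matrix laid out as in scikit-learn: rows = true class,
    columns = predicted class."""
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)                   # correctly classified count per class
    recall = correct / cm.sum(axis=1)       # divide by row totals (true counts)
    precision = correct / cm.sum(axis=0)    # divide by column totals (predicted counts)
    accuracy = correct.sum() / cm.sum()
    return recall, precision, accuracy

# Made-up 3-class matrix for illustration; not results from this notebook.
cm = [[50, 2, 8],
      [10, 5, 5],
      [6, 1, 33]]
recall, precision, accuracy = per_class_metrics(cm)
print("recall:", recall.round(3))
print("precision:", precision.round(3))
print("accuracy:", round(accuracy, 3))
```

Calling `per_class_metrics(confusion_matrix(y_test, y_pred_cat_pipeline))` should reproduce the precision and recall columns of the `classification_report` printed by the same cell.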

# *XGBoost*

In [None]:
from xgboost import XGBClassifier

#Number of finished trials: 30
#Best trial:
#  Value: 0.8013131313131314
#  Params: 
#    booster: dart
#    lambda: 4.572637572518502e-07
#    alpha: 6.037662427475617e-05
#    subsample: 0.7162353406216146
#    colsample_bytree: 0.8486248682584188
#    max_depth: 7
#    min_child_weight: 9
#    eta: 0.3563123559925298
#    gamma: 5.017895421049517e-05
#    grow_policy: depthwise
#    sample_type: uniform
#    normalize_type: forest
#    rate_drop: 0.012104590680294654
#    skip_drop: 0.00036189755567904127

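# Tuned values from the best trial above. `lambda` is a Python keyword, so the
# tuned L2 weight is passed as reg_lambda. multi:softprob (rather than
# multi:softmax) keeps predict_proba available; recent XGBoost releases raise an
# error for predict_proba under multi:softmax, and the soft-voting ensemble below needs it.
xg = XGBClassifier(booster = 'dart',
                   reg_lambda = 4.572637572518502e-07,
                   alpha = 6.037662427475617e-05,
                   subsample = 0.7162353406216146,
                   colsample_bytree = 0.8486248682584188,
                   max_depth = 7,
                   min_child_weight = 9,
                   eta = 0.3563123559925298,
                   gamma = 5.017895421049517e-05,
                   grow_policy = 'depthwise',
                   sample_type = 'uniform',
                   normalize_type = 'forest',
                   rate_drop = 0.012104590680294654,
                   skip_drop = 0.00036189755567904127,
                   objective = 'multi:softprob',
                   use_label_encoder = False)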

xg_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('xgboost', xg)])
xg_pipeline.fit(X_train,y_train)
y_pred_xg_pipeline = xg_pipeline.predict(X_test)

In [None]:
print("Accuracy of XGB   = {:.4f}".format(accuracy_score(y_test, y_pred_xg_pipeline)))

In [None]:
import optuna

def objective(trial):
    param = {
        "verbosity": 0,
        # the target has three classes, so use a multiclass objective
        # (binary:logistic does not describe this problem).
        "objective": "multi:softprob",
        # exact greedy split finding, as the dataset is small.
        "tree_method": "exact",
        # booster type; gblinear fits linear functions instead of trees.
        "booster": trial.suggest_categorical("booster", ["gbtree", "gblinear", "dart"]),
        # L2 regularization weight.
        "lambda": trial.suggest_float("lambda", 1e-8, 1.0, log=True),
        # L1 regularization weight.
        "alpha": trial.suggest_float("alpha", 1e-8, 1.0, log=True),
        # fraction of training rows sampled for each tree.
        "subsample": trial.suggest_float("subsample", 0.2, 1.0),
        # fraction of columns sampled for each tree.
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.2, 1.0),
    }

    if param["booster"] in ["gbtree", "dart"]:
        # maximum tree depth; deeper trees are more complex.
        param["max_depth"] = trial.suggest_int("max_depth", 3, 9, step=2)
        # minimum sum of instance weight in a child; larger values make the tree more conservative.
        param["min_child_weight"] = trial.suggest_int("min_child_weight", 2, 10)
        # learning rate (step-size shrinkage).
        param["eta"] = trial.suggest_float("eta", 1e-8, 1.0, log=True)
        # minimum loss reduction required to make a further split.
        param["gamma"] = trial.suggest_float("gamma", 1e-8, 1.0, log=True)
        # XGBoost documents grow_policy as supported with tree_method 'hist'/'approx';
        # under 'exact' this choice may have no effect.
        param["grow_policy"] = trial.suggest_categorical("grow_policy", ["depthwise", "lossguide"])

    if param["booster"] == "dart":
        param["sample_type"] = trial.suggest_categorical("sample_type", ["uniform", "weighted"])
        param["normalize_type"] = trial.suggest_categorical("normalize_type", ["tree", "forest"])
        param["rate_drop"] = trial.suggest_float("rate_drop", 1e-8, 1.0, log=True)
        param["skip_drop"] = trial.suggest_float("skip_drop", 1e-8, 1.0, log=True)

    xg = XGBClassifier(**param)
    xg_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('xgboost', xg)])

    # mean accuracy over 3-fold cross-validation on the full training set
    return cross_val_score(xg_pipeline, X, y, cv = 3).mean()


if __name__ == "__main__":
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=30)

    print("Number of finished trials: {}".format(len(study.trials)))

    print("Best trial:")
    trial = study.best_trial

    print("  Value: {}".format(trial.value))

    print("  Params: ")
    for key, value in trial.params.items():
        print("    {}: {}".format(key, value))
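
The tuned XGBoost parameters are copied into the model by hand, and the `lambda` value from the best trial is easy to lose that way: `lambda` is a reserved word in Python, so it cannot be written as a keyword argument, and the scikit-learn wrapper calls it `reg_lambda`. Below is a small sketch that renames a trial's parameters so they can be unpacked straight into `XGBClassifier`. `optuna_to_xgb_sklearn` is a hypothetical helper, not part of Optuna or XGBoost.

```python
def optuna_to_xgb_sklearn(best_params):
    """Rename native XGBoost parameter names (as suggested in the Optuna
    objective) to the keyword names used by XGBClassifier.

    `lambda` is reserved in Python, so it cannot be written as lambda=...;
    the scikit-learn wrapper names it reg_lambda. `alpha` and `eta` are mapped
    to reg_alpha and learning_rate for consistency. All other names
    (booster, max_depth, rate_drop, ...) pass through unchanged.
    """
    rename = {"lambda": "reg_lambda", "alpha": "reg_alpha", "eta": "learning_rate"}
    return {rename.get(name, name): value for name, value in best_params.items()}

# A few of the best-trial values recorded in the comments of the XGBoost cell
best = {"booster": "dart",
        "lambda": 4.572637572518502e-07,
        "alpha": 6.037662427475617e-05,
        "eta": 0.3563123559925298,
        "max_depth": 7}
print(optuna_to_xgb_sklearn(best))
```

After `study.optimize(...)`, something like `XGBClassifier(objective='multi:softprob', **optuna_to_xgb_sklearn(study.best_params))` would rebuild the best model without retyping any values.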

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

print(classification_report(y_test, y_pred_xg_pipeline))
pd.DataFrame(confusion_matrix(y_test, y_pred_xg_pipeline))

# *Extra Trees*

In [None]:
from sklearn.ensemble import ExtraTreesClassifier

xt = ExtraTreesClassifier(n_estimators=200,
                          random_state=42)

xt_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('extra trees', xt)])
xt_pipeline.fit(X_train,y_train)
y_pred_xt_pipeline = xt_pipeline.predict(X_test)

In [None]:
print("Accuracy of EXTRA TREES   = {:.4f}".format(accuracy_score(y_test, y_pred_xt_pipeline)))

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

print(classification_report(y_test, y_pred_xt_pipeline))
pd.DataFrame(confusion_matrix(y_test, y_pred_xt_pipeline))
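
Extra Trees differs from a random forest in two ways. By default, scikit-learn's `ExtraTreesClassifier` fits each tree on the full training set rather than on a bootstrap sample. At each node, it also draws split thresholds at random instead of searching for the best one. The sketch below shows that randomized split rule for a single node in numpy, assuming Gini impurity (the scikit-learn default criterion). It is a simplification; `gini` and `extra_trees_split` are illustrative names, not scikit-learn functions.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def extra_trees_split(X, y, n_candidates, rng):
    """Choose one node split the Extra-Trees way.

    For each randomly chosen candidate feature, draw a single threshold
    uniformly between that feature's min and max in the node, then keep the
    candidate with the lowest weighted Gini impurity. Returns
    (feature, threshold, impurity), or None if every candidate is constant.
    """
    n, d = X.shape
    best = None
    for f in rng.choice(d, size=min(n_candidates, d), replace=False):
        lo, hi = X[:, f].min(), X[:, f].max()
        if lo == hi:
            continue  # constant feature in this node: no valid split
        t = rng.uniform(lo, hi)  # one random threshold, no search
        left = X[:, f] <= t
        score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n
        if best is None or score < best[2]:
            best = (int(f), float(t), float(score))
    return best

# Toy node: feature 0 separates the classes, feature 1 is constant
X = np.array([[0.0, 5.0], [1.0, 5.0], [10.0, 5.0], [11.0, 5.0]])
y = np.array([0, 0, 1, 1])
print(extra_trees_split(X, y, n_candidates=2, rng=np.random.default_rng(0)))
```

Because only one threshold per feature is tried, individual splits are weaker but cheaper to compute, and averaging many such trees reduces variance.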

# *Bagging*

In [None]:
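from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

# No base estimator is passed: BaggingClassifier falls back to its default,
# a DecisionTreeClassifier. max_features = 0.5 gives each tree a random half of the columns.
bag = BaggingClassifier(n_estimators=100,
                        max_features = 0.5,
                        random_state=42)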

bag_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('bagging', bag)])
bag_pipeline.fit(X_train,y_train)
y_pred_bag_pipeline = bag_pipeline.predict(X_test)

In [None]:
print("Accuracy of BAGGING   = {:.4f}".format(accuracy_score(y_test, y_pred_bag_pipeline)))

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

print(classification_report(y_test, y_pred_bag_pipeline))
pd.DataFrame(confusion_matrix(y_test, y_pred_bag_pipeline))
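
With `n_estimators=100` and `max_features=0.5`, each of the bagging ensemble's 100 trees sees two random draws. The first is a bootstrap sample of the rows, drawn with replacement, so on average roughly 63% of distinct rows appear in it. The second is a random half of the columns, drawn without replacement (the `bootstrap_features=False` default). The sketch below reproduces only those index draws. `bagging_draws` is a hypothetical helper, and its random generator will not match the draws scikit-learn makes with `random_state=42`.

```python
import numpy as np

def bagging_draws(n_samples, n_features, n_estimators, max_features=0.5, seed=42):
    """Row/column index draws for each member of a bagging ensemble.

    Rows: bootstrap sample (with replacement), as with bootstrap=True.
    Columns: int(max_features * n_features) distinct columns, as with
    bootstrap_features=False.
    """
    rng = np.random.default_rng(seed)
    k = max(1, int(max_features * n_features))  # columns per estimator
    draws = []
    for _ in range(n_estimators):
        rows = rng.integers(0, n_samples, size=n_samples)     # with replacement
        cols = rng.choice(n_features, size=k, replace=False)  # without replacement
        draws.append((rows, np.sort(cols)))
    return draws

for rows, cols in bagging_draws(n_samples=8, n_features=6, n_estimators=3):
    # each tree would be fit on X[rows][:, cols]
    print("rows:", rows, "cols:", cols)
```

Each tree is then fit on its own `X[rows][:, cols]`. `BaggingClassifier` predicts the class with the highest `predict_proba` averaged across trees.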

# *Voting Classifier*

In [None]:
from sklearn.ensemble import VotingClassifier

est_list = [('rf', rf), ('xgboost', xg), ('lgbm', lgbm)]


vclf = VotingClassifier(estimators = est_list, voting='soft')


vote_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('voting', vclf)])

vote_pipeline.fit(X_train,y_train)
y_pred_vote_pipeline = vote_pipeline.predict(X_test)

In [None]:
print("Accuracy of VOTING = {:.4f}".format(accuracy_score(y_test, y_pred_vote_pipeline)))
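
`VotingClassifier(voting='soft')` averages the `predict_proba` outputs of its estimators (optionally weighted) and predicts the class with the highest average. By contrast, `voting='hard'` takes a majority over each estimator's predicted label. Soft voting therefore needs `predict_proba` from every estimator in `est_list`, and it lets one confident model outvote two uncertain ones. The numpy sketch below shows both rules. `soft_vote` and `hard_vote` are illustrative helpers, not the scikit-learn implementation.

```python
import numpy as np

def soft_vote(probas, weights=None):
    """Average the (n_samples, n_classes) probability matrices, then argmax."""
    avg = np.average(np.stack(probas), axis=0, weights=weights)
    return avg.argmax(axis=1)

def hard_vote(probas):
    """Majority vote over each model's own predicted class (ties -> lowest index)."""
    preds = np.stack([p.argmax(axis=1) for p in probas])   # (n_models, n_samples)
    n_classes = probas[0].shape[1]
    counts = np.apply_along_axis(np.bincount, 0, preds, minlength=n_classes)
    return counts.argmax(axis=0)

# One sample, three models: two lean weakly to class 0, one is confident in class 1
probas = [np.array([[0.50, 0.45, 0.05]]),
          np.array([[0.50, 0.45, 0.05]]),
          np.array([[0.05, 0.90, 0.05]])]
print("soft:", soft_vote(probas), "hard:", hard_vote(probas))
```

Here hard voting picks class 0 (two votes to one), while soft voting picks class 1 because its average probability (0.60) exceeds class 0's (0.35).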

In [None]:
from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier


rf = RandomForestClassifier(criterion = 'gini',
                            n_estimators = 536,
                            min_samples_split = 8,
                            max_depth = 20,
                            random_state = 42)

# reg_lambda carries the tuned L2 weight (`lambda` is a Python keyword);
# multi:softprob keeps predict_proba available, which soft voting requires.
xg = XGBClassifier(booster = 'dart',
                   reg_lambda = 4.572637572518502e-07,
                   alpha = 6.037662427475617e-05,
                   subsample = 0.7162353406216146,
                   colsample_bytree = 0.8486248682584188,
                   max_depth = 7,
                   min_child_weight = 9,
                   eta = 0.3563123559925298,
                   gamma = 5.017895421049517e-05,
                   grow_policy = 'depthwise',
                   sample_type = 'uniform',
                   normalize_type = 'forest',
                   rate_drop = 0.012104590680294654,
                   skip_drop = 0.00036189755567904127,
                   objective = 'multi:softprob',
                   use_label_encoder = False)

xt = ExtraTreesClassifier(n_estimators=200,
                          random_state=42)


bag = BaggingClassifier(n_estimators=100,
                        max_features = 0.5,
                        random_state=42)

cat = CatBoostClassifier(colsample_bylevel = 0.073699209523,
                         depth = 12,
                         boosting_type = 'Plain',
                         bootstrap_type = 'MVS',
                         random_state = 42,
                         verbose = False)  # suppress the per-iteration training log

lgbm = LGBMClassifier(boosting_type = 'gbdt',
                      objective = 'multiclass',
                      num_class = 3,
                      metric = 'multi_error',
                      num_iterations = 200,
                      lambda_l1 =  2.2899315163770417e-06,
                      lambda_l2 =  2.6273452242794607e-06,
                      num_leaves = 239,
                      feature_fraction = 0.5633644014015632,
                      learning_rate = 0.06012805964180289,
                      bagging_fraction = 0.6953776886469089,
                      bagging_freq = 6,
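                      # min_child_samples and min_data_in_leaf are aliases of the same
                      # LightGBM parameter, so only one of these two values takes effect
                      min_child_samples = 47,
                      min_data_in_leaf = 17,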
                      max_depth = 46
                      )

rf_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('rf', rf)])
xg_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('xgboost', xg)])
xt_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('extra trees', xt)])
bag_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('bagging', bag)])
cat_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('catboost', cat)])
lgbm_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('lgbm', lgbm)])


est_list = [('rf', rf), ('xgboost', xg), ('extra trees', xt), ('bagging', bag), ('catboost', cat), ('lgbm', lgbm)]
vclf = VotingClassifier(estimators = est_list, voting='soft')


vote_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('voting', vclf)])

vote_pipeline.fit(X,y)

In [None]:
accuracy = cross_val_score(vote_pipeline, X, y, cv = 5).mean()
accuracy

299:	learn: 0.3724805	total: 39.2s	remaining: 1m 31s
300:	learn: 0.3722021	total: 39.3s	remaining: 1m 31s
301:	learn: 0.3721628	total: 39.4s	remaining: 1m 30s
302:	learn: 0.3721582	total: 39.4s	remaining: 1m 30s
303:	learn: 0.3715389	total: 39.6s	remaining: 1m 30s
304:	learn: 0.3712072	total: 39.7s	remaining: 1m 30s
305:	learn: 0.3710042	total: 39.9s	remaining: 1m 30s
306:	learn: 0.3709593	total: 39.9s	remaining: 1m 30s
307:	learn: 0.3705006	total: 40.1s	remaining: 1m 30s
308:	learn: 0.3702615	total: 40.2s	remaining: 1m 29s
309:	learn: 0.3698634	total: 40.4s	remaining: 1m 29s
310:	learn: 0.3695413	total: 40.6s	remaining: 1m 29s
311:	learn: 0.3693967	total: 40.6s	remaining: 1m 29s
312:	learn: 0.3690906	total: 40.7s	remaining: 1m 29s

451:	learn: 0.3341572	total: 58.7s	remaining: 1m 11s
452:	learn: 0.3340199	total: 58.8s	remaining: 1m 11s
453:	learn: 0.3339978	total: 58.9s	remaining: 1m 10s
454:	learn: 0.3336943	total: 59.1s	remaining: 1m 10s
455:	learn: 0.3336912	total: 59.1s	remaining: 1m 10s
456:	learn: 0.3334656	total: 59.2s	remaining: 1m 10s
457:	learn: 0.3334205	total: 59.3s	remaining: 1m 10s
458:	learn: 0.3333261	total: 59.4s	remaining: 1m 10s
459:	learn: 0.3328160	total: 59.6s	remaining: 1m 9s
460:	learn: 0.3327030	total: 59.8s	remaining: 1m 9s
461:	learn: 0.3324239	total: 60s	remaining: 1m 9s
462:	learn: 0.3322457	total: 1m	remaining: 1m 9s
463:	learn: 0.3317032	total: 1m	remaining: 1m 9s
464:	learn: 0.3312836	total: 1m	remaining: 1m 9s
465:	learn: 0.3309754	total: 1m	remaining: 1m 9s
466:	learn: 0.3305846	total: 1m	remaining: 1m 9s
467:	learn: 0.3302714	total: 1m	remaining: 1m 9s
468:	learn: 0.3299438	total: 1m	remaining: 1m 9s
469:	learn: 0.3297724	total: 1m 1s	remaining: 1m 8s
470:	learn: 0.3294753	total

611:	learn: 0.3022550	total: 1m 20s	remaining: 50.9s
612:	learn: 0.3022405	total: 1m 20s	remaining: 50.7s
613:	learn: 0.3020158	total: 1m 20s	remaining: 50.6s
614:	learn: 0.3018102	total: 1m 20s	remaining: 50.5s
615:	learn: 0.3015824	total: 1m 20s	remaining: 50.4s
616:	learn: 0.3014017	total: 1m 20s	remaining: 50.2s
617:	learn: 0.3012994	total: 1m 21s	remaining: 50.1s
618:	learn: 0.3012994	total: 1m 21s	remaining: 49.9s
619:	learn: 0.3011781	total: 1m 21s	remaining: 49.8s
620:	learn: 0.3009798	total: 1m 21s	remaining: 49.6s
621:	learn: 0.3008244	total: 1m 21s	remaining: 49.5s
622:	learn: 0.3008243	total: 1m 21s	remaining: 49.3s
623:	learn: 0.3006543	total: 1m 21s	remaining: 49.2s
624:	learn: 0.3006538	total: 1m 21s	remaining: 49s
625:	learn: 0.3004150	total: 1m 21s	remaining: 48.9s
626:	learn: 0.3001092	total: 1m 22s	remaining: 48.8s
627:	learn: 0.3001086	total: 1m 22s	remaining: 48.6s
628:	learn: 0.2998297	total: 1m 22s	remaining: 48.5s
629:	learn: 0.2995553	total: 1m 22s	remaining: 4

767:	learn: 0.2790721	total: 1m 42s	remaining: 31.1s
768:	learn: 0.2790721	total: 1m 42s	remaining: 30.9s
769:	learn: 0.2788957	total: 1m 43s	remaining: 30.8s
770:	learn: 0.2787620	total: 1m 43s	remaining: 30.7s
771:	learn: 0.2787600	total: 1m 43s	remaining: 30.5s
772:	learn: 0.2785719	total: 1m 43s	remaining: 30.4s
773:	learn: 0.2784850	total: 1m 43s	remaining: 30.2s
774:	learn: 0.2782855	total: 1m 43s	remaining: 30.1s
775:	learn: 0.2781574	total: 1m 43s	remaining: 30s
776:	learn: 0.2780428	total: 1m 43s	remaining: 29.8s
777:	learn: 0.2780059	total: 1m 44s	remaining: 29.7s
778:	learn: 0.2778792	total: 1m 44s	remaining: 29.6s
779:	learn: 0.2777450	total: 1m 44s	remaining: 29.4s
780:	learn: 0.2773937	total: 1m 44s	remaining: 29.3s
781:	learn: 0.2772894	total: 1m 44s	remaining: 29.2s
782:	learn: 0.2771488	total: 1m 44s	remaining: 29.1s
783:	learn: 0.2771458	total: 1m 44s	remaining: 28.9s
784:	learn: 0.2769935	total: 1m 45s	remaining: 28.8s
785:	learn: 0.2769875	total: 1m 45s	remaining: 2

925:	learn: 0.2591586	total: 2m 5s	remaining: 10s
926:	learn: 0.2589379	total: 2m 5s	remaining: 9.91s
927:	learn: 0.2588200	total: 2m 6s	remaining: 9.78s
928:	learn: 0.2586759	total: 2m 6s	remaining: 9.65s
929:	learn: 0.2585492	total: 2m 6s	remaining: 9.52s
930:	learn: 0.2584531	total: 2m 6s	remaining: 9.39s
931:	learn: 0.2583518	total: 2m 6s	remaining: 9.25s
932:	learn: 0.2582836	total: 2m 7s	remaining: 9.12s
933:	learn: 0.2581466	total: 2m 7s	remaining: 8.99s
934:	learn: 0.2580514	total: 2m 7s	remaining: 8.86s
935:	learn: 0.2579349	total: 2m 7s	remaining: 8.73s
936:	learn: 0.2576597	total: 2m 7s	remaining: 8.59s
937:	learn: 0.2575945	total: 2m 8s	remaining: 8.46s
938:	learn: 0.2574394	total: 2m 8s	remaining: 8.33s
939:	learn: 0.2573247	total: 2m 8s	remaining: 8.2s
940:	learn: 0.2571950	total: 2m 8s	remaining: 8.06s
941:	learn: 0.2570568	total: 2m 8s	remaining: 7.92s
942:	learn: 0.2568561	total: 2m 8s	remaining: 7.79s
943:	learn: 0.2567291	total: 2m 9s	remaining: 7.66s
944:	learn: 0.2



Learning rate set to 0.096348
0:	learn: 1.0662629	total: 6.24ms	remaining: 6.23s
1:	learn: 1.0053069	total: 165ms	remaining: 1m 22s
2:	learn: 0.9638780	total: 346ms	remaining: 1m 55s
3:	learn: 0.9280433	total: 585ms	remaining: 2m 25s
4:	learn: 0.9113051	total: 590ms	remaining: 1m 57s
5:	learn: 0.8815810	total: 598ms	remaining: 1m 39s
6:	learn: 0.8487538	total: 735ms	remaining: 1m 44s
7:	learn: 0.8247986	total: 751ms	remaining: 1m 33s
8:	learn: 0.8003525	total: 929ms	remaining: 1m 42s
9:	learn: 0.7774032	total: 1.12s	remaining: 1m 50s
10:	learn: 0.7603855	total: 1.25s	remaining: 1m 52s
11:	learn: 0.7452717	total: 1.42s	remaining: 1m 56s
12:	learn: 0.7395526	total: 1.43s	remaining: 1m 48s
13:	learn: 0.7283487	total: 1.68s	remaining: 1m 58s
14:	learn: 0.7175164	total: 1.94s	remaining: 2m 7s
15:	learn: 0.7029750	total: 2.1s	remaining: 2m 9s
16:	learn: 0.6914709	total: 2.34s	remaining: 2m 15s
17:	learn: 0.6798209	total: 2.49s	remaining: 2m 15s
18:	learn: 0.6709894	total: 2.6s	remaining: 2m 

136:	learn: 0.4425481	total: 20.5s	remaining: 2m 9s
137:	learn: 0.4424962	total: 20.5s	remaining: 2m 8s
138:	learn: 0.4424806	total: 20.5s	remaining: 2m 7s
139:	learn: 0.4413740	total: 20.7s	remaining: 2m 6s
140:	learn: 0.4404879	total: 20.8s	remaining: 2m 6s
141:	learn: 0.4403642	total: 20.8s	remaining: 2m 5s
142:	learn: 0.4395023	total: 21s	remaining: 2m 5s
143:	learn: 0.4388653	total: 21.2s	remaining: 2m 6s
144:	learn: 0.4380257	total: 21.4s	remaining: 2m 6s
145:	learn: 0.4368572	total: 21.6s	remaining: 2m 6s
146:	learn: 0.4362410	total: 21.7s	remaining: 2m 6s
147:	learn: 0.4362318	total: 21.7s	remaining: 2m 5s
148:	learn: 0.4360330	total: 21.8s	remaining: 2m 4s
149:	learn: 0.4355514	total: 21.8s	remaining: 2m 3s
150:	learn: 0.4344214	total: 21.9s	remaining: 2m 3s
151:	learn: 0.4337629	total: 22.1s	remaining: 2m 3s
152:	learn: 0.4336603	total: 22.1s	remaining: 2m 2s
153:	learn: 0.4328119	total: 22.3s	remaining: 2m 2s
154:	learn: 0.4327830	total: 22.4s	remaining: 2m 1s
155:	learn: 0.

292:	learn: 0.3722568	total: 42.1s	remaining: 1m 41s
293:	learn: 0.3718806	total: 42.3s	remaining: 1m 41s
294:	learn: 0.3715839	total: 42.4s	remaining: 1m 41s
295:	learn: 0.3711939	total: 42.6s	remaining: 1m 41s
296:	learn: 0.3710402	total: 42.7s	remaining: 1m 40s
297:	learn: 0.3710401	total: 42.7s	remaining: 1m 40s
298:	learn: 0.3705037	total: 42.9s	remaining: 1m 40s
299:	learn: 0.3704962	total: 42.9s	remaining: 1m 40s
300:	learn: 0.3701476	total: 43s	remaining: 1m 39s
301:	learn: 0.3700902	total: 43s	remaining: 1m 39s
302:	learn: 0.3700856	total: 43s	remaining: 1m 38s
303:	learn: 0.3696055	total: 43.2s	remaining: 1m 38s
304:	learn: 0.3693555	total: 43.4s	remaining: 1m 38s
305:	learn: 0.3690840	total: 43.6s	remaining: 1m 38s
306:	learn: 0.3690299	total: 43.6s	remaining: 1m 38s
307:	learn: 0.3687395	total: 43.7s	remaining: 1m 38s
308:	learn: 0.3684844	total: 43.9s	remaining: 1m 38s
309:	learn: 0.3680201	total: 44.1s	remaining: 1m 38s
310:	learn: 0.3675933	total: 44.2s	remaining: 1m 37s

448:	learn: 0.3330375	total: 1m 3s	remaining: 1m 17s
449:	learn: 0.3328795	total: 1m 3s	remaining: 1m 17s
450:	learn: 0.3324835	total: 1m 3s	remaining: 1m 17s
451:	learn: 0.3324779	total: 1m 3s	remaining: 1m 17s
452:	learn: 0.3322554	total: 1m 4s	remaining: 1m 17s
453:	learn: 0.3322410	total: 1m 4s	remaining: 1m 17s
454:	learn: 0.3319669	total: 1m 4s	remaining: 1m 17s
455:	learn: 0.3319644	total: 1m 4s	remaining: 1m 16s
456:	learn: 0.3316683	total: 1m 4s	remaining: 1m 16s
457:	learn: 0.3316259	total: 1m 4s	remaining: 1m 16s
458:	learn: 0.3312970	total: 1m 4s	remaining: 1m 16s
459:	learn: 0.3308962	total: 1m 5s	remaining: 1m 16s
460:	learn: 0.3304566	total: 1m 5s	remaining: 1m 16s
461:	learn: 0.3301146	total: 1m 5s	remaining: 1m 16s
462:	learn: 0.3298889	total: 1m 5s	remaining: 1m 16s
463:	learn: 0.3294549	total: 1m 5s	remaining: 1m 16s
464:	learn: 0.3291574	total: 1m 6s	remaining: 1m 15s
465:	learn: 0.3289501	total: 1m 6s	remaining: 1m 15s
466:	learn: 0.3285711	total: 1m 6s	remaining: 

603:	learn: 0.3021744	total: 1m 30s	remaining: 59.1s
604:	learn: 0.3020528	total: 1m 30s	remaining: 59s
605:	learn: 0.3018920	total: 1m 30s	remaining: 58.9s
606:	learn: 0.3018844	total: 1m 30s	remaining: 58.7s
607:	learn: 0.3018746	total: 1m 30s	remaining: 58.4s
608:	learn: 0.3017093	total: 1m 30s	remaining: 58.3s
609:	learn: 0.3017080	total: 1m 30s	remaining: 58.1s
610:	learn: 0.3014605	total: 1m 31s	remaining: 58s
611:	learn: 0.3012505	total: 1m 31s	remaining: 57.9s
612:	learn: 0.3012393	total: 1m 31s	remaining: 57.7s
613:	learn: 0.3011149	total: 1m 31s	remaining: 57.5s
614:	learn: 0.3009194	total: 1m 31s	remaining: 57.4s
615:	learn: 0.3007454	total: 1m 31s	remaining: 57.2s
616:	learn: 0.3005146	total: 1m 31s	remaining: 57.1s
617:	learn: 0.3002950	total: 1m 32s	remaining: 57s
618:	learn: 0.3002950	total: 1m 32s	remaining: 56.7s
619:	learn: 0.3001336	total: 1m 32s	remaining: 56.6s
620:	learn: 0.2999683	total: 1m 32s	remaining: 56.4s
621:	learn: 0.2997469	total: 1m 32s	remaining: 56.3s

759:	learn: 0.2787109	total: 1m 56s	remaining: 36.7s
760:	learn: 0.2785444	total: 1m 56s	remaining: 36.6s
761:	learn: 0.2784310	total: 1m 56s	remaining: 36.4s
762:	learn: 0.2784305	total: 1m 56s	remaining: 36.2s
763:	learn: 0.2782554	total: 1m 56s	remaining: 36.1s
764:	learn: 0.2782362	total: 1m 56s	remaining: 35.9s
765:	learn: 0.2781876	total: 1m 56s	remaining: 35.7s
766:	learn: 0.2781200	total: 1m 56s	remaining: 35.5s
767:	learn: 0.2778475	total: 1m 57s	remaining: 35.4s
768:	learn: 0.2778475	total: 1m 57s	remaining: 35.2s
769:	learn: 0.2777184	total: 1m 57s	remaining: 35s
770:	learn: 0.2775571	total: 1m 57s	remaining: 34.9s
771:	learn: 0.2775535	total: 1m 57s	remaining: 34.7s
772:	learn: 0.2773874	total: 1m 57s	remaining: 34.5s
773:	learn: 0.2772848	total: 1m 57s	remaining: 34.4s
774:	learn: 0.2771421	total: 1m 57s	remaining: 34.3s
775:	learn: 0.2770198	total: 1m 58s	remaining: 34.1s
776:	learn: 0.2768767	total: 1m 58s	remaining: 33.9s
777:	learn: 0.2768043	total: 1m 58s	remaining: 3

916:	learn: 0.2602770	total: 2m 20s	remaining: 12.7s
917:	learn: 0.2602263	total: 2m 20s	remaining: 12.6s
918:	learn: 0.2601734	total: 2m 20s	remaining: 12.4s
919:	learn: 0.2600753	total: 2m 21s	remaining: 12.3s
920:	learn: 0.2599109	total: 2m 21s	remaining: 12.1s
921:	learn: 0.2597873	total: 2m 21s	remaining: 12s
922:	learn: 0.2597154	total: 2m 21s	remaining: 11.8s
923:	learn: 0.2595929	total: 2m 22s	remaining: 11.7s
924:	learn: 0.2595346	total: 2m 22s	remaining: 11.5s
925:	learn: 0.2594267	total: 2m 22s	remaining: 11.4s
926:	learn: 0.2592607	total: 2m 22s	remaining: 11.2s
927:	learn: 0.2591211	total: 2m 23s	remaining: 11.1s
928:	learn: 0.2590370	total: 2m 23s	remaining: 10.9s
929:	learn: 0.2589822	total: 2m 23s	remaining: 10.8s
930:	learn: 0.2588863	total: 2m 23s	remaining: 10.7s
931:	learn: 0.2587594	total: 2m 23s	remaining: 10.5s
932:	learn: 0.2586723	total: 2m 24s	remaining: 10.4s
933:	learn: 0.2585761	total: 2m 24s	remaining: 10.2s
934:	learn: 0.2584800	total: 2m 24s	remaining: 1



Learning rate set to 0.096348
0:	learn: 1.0663001	total: 6.08ms	remaining: 6.07s
1:	learn: 1.0055817	total: 147ms	remaining: 1m 13s
2:	learn: 0.9642062	total: 308ms	remaining: 1m 42s
3:	learn: 0.9283761	total: 512ms	remaining: 2m 7s
4:	learn: 0.9116350	total: 517ms	remaining: 1m 42s
5:	learn: 0.8821244	total: 530ms	remaining: 1m 27s
6:	learn: 0.8495188	total: 655ms	remaining: 1m 32s
7:	learn: 0.8257838	total: 670ms	remaining: 1m 23s
8:	learn: 0.8025724	total: 745ms	remaining: 1m 22s
9:	learn: 0.7803001	total: 921ms	remaining: 1m 31s
10:	learn: 0.7608301	total: 1.09s	remaining: 1m 38s
11:	learn: 0.7456018	total: 1.24s	remaining: 1m 42s
12:	learn: 0.7290866	total: 1.35s	remaining: 1m 42s
13:	learn: 0.7154185	total: 1.5s	remaining: 1m 45s
14:	learn: 0.7014376	total: 1.73s	remaining: 1m 53s
15:	learn: 0.6978841	total: 1.74s	remaining: 1m 47s
16:	learn: 0.6861193	total: 1.97s	remaining: 1m 53s
17:	learn: 0.6739063	total: 2.12s	remaining: 1m 55s
18:	learn: 0.6650172	total: 2.26s	remaining: 1

136:	learn: 0.4429629	total: 15.5s	remaining: 1m 37s
137:	learn: 0.4426179	total: 15.5s	remaining: 1m 37s
138:	learn: 0.4419876	total: 15.7s	remaining: 1m 37s
139:	learn: 0.4410069	total: 15.8s	remaining: 1m 37s
140:	learn: 0.4405120	total: 16s	remaining: 1m 37s
141:	learn: 0.4398129	total: 16.1s	remaining: 1m 37s
142:	learn: 0.4390658	total: 16.2s	remaining: 1m 37s
143:	learn: 0.4384389	total: 16.4s	remaining: 1m 37s
144:	learn: 0.4378772	total: 16.5s	remaining: 1m 37s
145:	learn: 0.4368874	total: 16.7s	remaining: 1m 37s
146:	learn: 0.4362560	total: 16.9s	remaining: 1m 37s
147:	learn: 0.4362343	total: 16.9s	remaining: 1m 37s
148:	learn: 0.4356539	total: 16.9s	remaining: 1m 36s
149:	learn: 0.4351167	total: 17s	remaining: 1m 36s
150:	learn: 0.4343126	total: 17.2s	remaining: 1m 36s
151:	learn: 0.4341770	total: 17.2s	remaining: 1m 35s
152:	learn: 0.4337432	total: 17.3s	remaining: 1m 35s
153:	learn: 0.4337166	total: 17.3s	remaining: 1m 35s
154:	learn: 0.4324684	total: 17.5s	remaining: 1m 3

293:	learn: 0.3699685	total: 35.4s	remaining: 1m 25s
294:	learn: 0.3695512	total: 35.6s	remaining: 1m 25s
295:	learn: 0.3694140	total: 35.6s	remaining: 1m 24s
296:	learn: 0.3694139	total: 35.7s	remaining: 1m 24s
297:	learn: 0.3691134	total: 35.9s	remaining: 1m 24s
298:	learn: 0.3691030	total: 35.9s	remaining: 1m 24s
299:	learn: 0.3688130	total: 36.1s	remaining: 1m 24s
300:	learn: 0.3687679	total: 36.1s	remaining: 1m 23s
301:	learn: 0.3687651	total: 36.1s	remaining: 1m 23s
302:	learn: 0.3683689	total: 36.4s	remaining: 1m 23s
303:	learn: 0.3680734	total: 36.6s	remaining: 1m 23s
304:	learn: 0.3677793	total: 36.8s	remaining: 1m 23s
305:	learn: 0.3677475	total: 36.8s	remaining: 1m 23s
306:	learn: 0.3675810	total: 37s	remaining: 1m 23s
307:	learn: 0.3672317	total: 37.2s	remaining: 1m 23s
308:	learn: 0.3667256	total: 37.4s	remaining: 1m 23s
309:	learn: 0.3664162	total: 37.5s	remaining: 1m 23s
310:	learn: 0.3663109	total: 37.6s	remaining: 1m 23s
311:	learn: 0.3659992	total: 37.7s	remaining: 1m

449:	learn: 0.3323489	total: 53.9s	remaining: 1m 5s
450:	learn: 0.3323440	total: 53.9s	remaining: 1m 5s
451:	learn: 0.3321149	total: 54s	remaining: 1m 5s
452:	learn: 0.3320990	total: 54.1s	remaining: 1m 5s
453:	learn: 0.3317842	total: 54.3s	remaining: 1m 5s
454:	learn: 0.3317809	total: 54.3s	remaining: 1m 5s
455:	learn: 0.3315732	total: 54.4s	remaining: 1m 4s
456:	learn: 0.3315416	total: 54.4s	remaining: 1m 4s
457:	learn: 0.3313212	total: 54.6s	remaining: 1m 4s
458:	learn: 0.3309754	total: 54.8s	remaining: 1m 4s
459:	learn: 0.3305702	total: 54.9s	remaining: 1m 4s
460:	learn: 0.3303695	total: 55.1s	remaining: 1m 4s
461:	learn: 0.3301739	total: 55.2s	remaining: 1m 4s
462:	learn: 0.3298719	total: 55.3s	remaining: 1m 4s
463:	learn: 0.3296402	total: 55.5s	remaining: 1m 4s
464:	learn: 0.3293640	total: 55.6s	remaining: 1m 3s
465:	learn: 0.3291455	total: 55.8s	remaining: 1m 3s
466:	learn: 0.3287813	total: 55.9s	remaining: 1m 3s
467:	learn: 0.3285631	total: 56s	remaining: 1m 3s
468:	learn: 0.32

609:	learn: 0.3010804	total: 1m 13s	remaining: 47.3s
610:	learn: 0.3007827	total: 1m 14s	remaining: 47.2s
611:	learn: 0.3007755	total: 1m 14s	remaining: 47s
612:	learn: 0.3006318	total: 1m 14s	remaining: 46.9s
613:	learn: 0.3003111	total: 1m 14s	remaining: 46.7s
614:	learn: 0.2999312	total: 1m 14s	remaining: 46.7s
615:	learn: 0.2997477	total: 1m 14s	remaining: 46.5s
616:	learn: 0.2995232	total: 1m 14s	remaining: 46.4s
617:	learn: 0.2995231	total: 1m 14s	remaining: 46.2s
618:	learn: 0.2993575	total: 1m 14s	remaining: 46.1s
619:	learn: 0.2991052	total: 1m 15s	remaining: 46s
620:	learn: 0.2989839	total: 1m 15s	remaining: 45.8s
621:	learn: 0.2989839	total: 1m 15s	remaining: 45.7s
622:	learn: 0.2988114	total: 1m 15s	remaining: 45.6s
623:	learn: 0.2988107	total: 1m 15s	remaining: 45.4s
624:	learn: 0.2986029	total: 1m 15s	remaining: 45.3s
625:	learn: 0.2985124	total: 1m 15s	remaining: 45.2s
626:	learn: 0.2985119	total: 1m 15s	remaining: 45s
627:	learn: 0.2983244	total: 1m 15s	remaining: 44.9s

766:	learn: 0.2766456	total: 1m 34s	remaining: 28.8s
767:	learn: 0.2766456	total: 1m 34s	remaining: 28.6s
768:	learn: 0.2765275	total: 1m 34s	remaining: 28.5s
769:	learn: 0.2763961	total: 1m 34s	remaining: 28.4s
770:	learn: 0.2763903	total: 1m 35s	remaining: 28.2s
771:	learn: 0.2762707	total: 1m 35s	remaining: 28.1s
772:	learn: 0.2761704	total: 1m 35s	remaining: 28s
773:	learn: 0.2760923	total: 1m 35s	remaining: 27.9s
774:	learn: 0.2760483	total: 1m 35s	remaining: 27.7s
775:	learn: 0.2759176	total: 1m 35s	remaining: 27.6s
776:	learn: 0.2758549	total: 1m 35s	remaining: 27.5s
777:	learn: 0.2757022	total: 1m 35s	remaining: 27.3s
778:	learn: 0.2755953	total: 1m 35s	remaining: 27.2s
779:	learn: 0.2754764	total: 1m 36s	remaining: 27.1s
780:	learn: 0.2752099	total: 1m 36s	remaining: 27s
781:	learn: 0.2750841	total: 1m 36s	remaining: 26.9s
782:	learn: 0.2750828	total: 1m 36s	remaining: 26.7s
783:	learn: 0.2749019	total: 1m 36s	remaining: 26.6s
784:	learn: 0.2748969	total: 1m 36s	remaining: 26.

923:	learn: 0.2587206	total: 1m 54s	remaining: 9.42s
924:	learn: 0.2586718	total: 1m 54s	remaining: 9.3s
925:	learn: 0.2585531	total: 1m 54s	remaining: 9.18s
926:	learn: 0.2583531	total: 1m 55s	remaining: 9.06s
927:	learn: 0.2582359	total: 1m 55s	remaining: 8.94s
928:	learn: 0.2581534	total: 1m 55s	remaining: 8.81s
929:	learn: 0.2580092	total: 1m 55s	remaining: 8.7s
930:	learn: 0.2579288	total: 1m 55s	remaining: 8.57s
931:	learn: 0.2578830	total: 1m 55s	remaining: 8.45s
932:	learn: 0.2578141	total: 1m 55s	remaining: 8.33s
933:	learn: 0.2577178	total: 1m 56s	remaining: 8.2s
934:	learn: 0.2576538	total: 1m 56s	remaining: 8.08s
935:	learn: 0.2575119	total: 1m 56s	remaining: 7.96s
936:	learn: 0.2573300	total: 1m 56s	remaining: 7.83s
937:	learn: 0.2572299	total: 1m 56s	remaining: 7.71s
938:	learn: 0.2570701	total: 1m 56s	remaining: 7.59s
939:	learn: 0.2569746	total: 1m 57s	remaining: 7.47s
940:	learn: 0.2568895	total: 1m 57s	remaining: 7.35s
941:	learn: 0.2568295	total: 1m 57s	remaining: 7.





0.8176430976430977

In [None]:
accuracy

0.8176430976430977

In [None]:
print("Accuracy of VOTING = {:.4f}".format(accuracy_score(y_test, y_pred_vote_pipeline)))
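The voting pipeline is defined outside this excerpt, so whether it uses hard or soft voting is not shown here. As a hedged illustration of the simpler mode, hard voting assigns each row the label predicted by the most base models. The pure-Python sketch below uses a hypothetical `hard_vote` helper and made-up predictions; ties go to the smallest label, which matches the ascending-order tie rule scikit-learn documents for `VotingClassifier(voting='hard')`.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Majority vote across models.

    predictions_per_model: list of equal-length label lists, one per model.
    On a tie, the smallest label wins.
    """
    n_rows = len(predictions_per_model[0])
    voted = []
    for i in range(n_rows):
        counts = Counter(model_preds[i] for model_preds in predictions_per_model)
        top = max(counts.values())
        # Among the labels with the highest count, keep the smallest one
        voted.append(min(label for label, c in counts.items() if c == top))
    return voted


# Three hypothetical models predicting the encoded classes 0, 1, 2 for four pumps
rf_preds = [0, 2, 1, 0]
xgb_preds = [0, 2, 2, 1]
et_preds = [2, 0, 1, 2]
print(hard_vote([rf_preds, xgb_preds, et_preds]))  # [0, 2, 1, 0]
```

The last row is a three-way tie, so it falls back to label 0. Soft voting instead averages predicted probabilities, which lets a confident model outweigh two unsure ones.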

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

print(classification_report(y_test, y_pred_vote_pipeline))
pd.DataFrame(confusion_matrix(y_test, y_pred_vote_pipeline))
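`confusion_matrix(y_test, y_pred)` puts true classes on the rows and predicted classes on the columns. So overall accuracy is the diagonal sum divided by the total, and per-class recall is each diagonal cell divided by its row sum. The pure-Python sketch below uses made-up labels in the notebook's 0/1/2 encoding to show how to read the DataFrame printed above. The helper names are hypothetical.

```python
def confusion_matrix_py(y_true, y_pred, n_classes):
    """Rows = true class, columns = predicted class (scikit-learn's layout)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy_from_matrix(matrix):
    # Correct predictions sit on the diagonal
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall_per_class(matrix):
    # Recall of class i = diagonal cell / row total (rows are the true labels)
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Made-up labels: 0 = functional, 1 = functional needs repair, 2 = non functional
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 2, 0, 1, 2, 2, 0]
cm = confusion_matrix_py(y_true, y_pred, 3)
print(cm)                        # [[2, 0, 1], [1, 1, 0], [1, 0, 2]]
print(accuracy_from_matrix(cm))  # 0.625
print(recall_per_class(cm))
```

Reading recall per row is useful here because "functional needs repair" is a small class. A model can score well on accuracy while its recall on that class stays low.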

# **Stacking**

In [None]:
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression

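# Variant 1: stack three base learners (the six-model list is left commented out)
#est_list = [('rf', rf), ('xgboost', xg), ('extra trees', xt), ('bagging', bag), ('catboost', cat), ('lgbm', lgbm)]
est_list = [('rf', rf), ('xgboost', xg), ('lgbm', lgbm)]

# The base learners' cross-validated predictions become the input features
# of a logistic-regression meta-learner
sclf = StackingClassifier(estimators=est_list,
                          final_estimator=LogisticRegression())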

stacking_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('stacking', sclf)])

stacking_pipeline.fit(X_train,y_train)
y_pred_stacking_pipeline = stacking_pipeline.predict(X_test)
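`StackingClassifier` does not train the final `LogisticRegression` on the base models' in-sample predictions. It uses cross-validated (out-of-fold) predictions obtained with `cross_val_predict`, so the meta-learner only sees predictions made on rows a base model did not train on. The base models are then refit on the full training set for use at predict time. With `cv` left at its default, as here, scikit-learn uses 5-fold cross-validation (stratified for classifiers), and `stack_method='auto'` picks `predict_proba` when available. The sketch below deliberately simplifies all of that. It uses contiguous unshuffled folds, hard labels instead of probabilities, and a toy one-nearest-neighbour base learner on a single feature. All names are hypothetical.

```python
def kfold_indices(n_rows, k):
    """Yield k contiguous, unshuffled folds of row indices (a simplification:
    scikit-learn's default for classifiers is StratifiedKFold)."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        yield list(range(start, start + size))
        start += size


def one_nn_predict(train_x, train_y, x):
    """Toy base learner: label of the nearest training point on one numeric feature."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def out_of_fold_predictions(xs, ys, k):
    """Predict every row with a model that never saw that row during training."""
    oof = [None] * len(xs)
    for held_out in kfold_indices(len(xs), k):
        held_out_set = set(held_out)
        train_idx = [i for i in range(len(xs)) if i not in held_out_set]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for i in held_out:
            oof[i] = one_nn_predict(train_x, train_y, xs[i])
    return oof


# One made-up numeric feature and encoded labels for six pumps
xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]
meta_feature = out_of_fold_predictions(xs, ys, k=3)
print(meta_feature)  # [0, 0, 0, 1, 1, 1]
# In real stacking, one such column per base model is the training input
# for the final estimator (LogisticRegression in the cell above).
```

In-sample, a 1-nearest-neighbour model would reproduce its training labels perfectly. That overly optimistic signal is exactly what out-of-fold prediction keeps away from the meta-learner. For example, `out_of_fold_predictions([1, 100], [0, 1], 2)` returns `[1, 0]`, not `[0, 1]`.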

In [None]:
print("Accuracy of STACKING = {:.4f}".format(accuracy_score(y_test, y_pred_stacking_pipeline)))

In [None]:
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression

# Variant 2: stack all six base learners
est_list = [('rf', rf), ('xgboost', xg), ('extra trees', xt), ('bagging', bag), ('catboost', cat), ('lgbm', lgbm)]
#est_list = [('rf', rf), ('xgboost', xg), ('lgbm', lgbm)]

sclf = StackingClassifier(estimators=est_list,
                          final_estimator=LogisticRegression())

stacking_pipeline = Pipeline(steps = [('preprocess', preprocessor), ('stacking', sclf)])

stacking_pipeline.fit(X_train,y_train)
y_pred_stacking_pipeline = stacking_pipeline.predict(X_test)
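With six base learners and three status classes, the `LogisticRegression` in this variant is fit on 18 meta-features per row. This assumes every base learner is stacked through `predict_proba` (what `stack_method='auto'` picks when the method exists, as it does for these tree ensembles) and that `passthrough` stays at its default `False`. scikit-learn's implementation drops the first probability column only for binary problems, because the two columns are collinear, so with three classes every column is kept. The pure-Python sketch below mirrors that concatenation. The helper name and probabilities are hypothetical.

```python
def stack_meta_features(proba_per_model, n_classes):
    """Concatenate each base model's class probabilities into one meta-feature
    row per sample; the first column is dropped only for binary problems."""
    drop_first = n_classes == 2
    n_rows = len(proba_per_model[0])
    meta = []
    for i in range(n_rows):
        row = []
        for model_proba in proba_per_model:
            probs = model_proba[i]
            row.extend(probs[1:] if drop_first else probs)
        meta.append(row)
    return meta


# Six hypothetical base models (the length of est_list above), each giving
# the same made-up 3-class probabilities for two pumps
one_model = [[0.7, 0.1, 0.2], [0.2, 0.1, 0.7]]
meta = stack_meta_features([one_model] * 6, n_classes=3)
print(len(meta), len(meta[0]))  # 2 18
```

Going from three to six base learners doubles the meta-learner's input width from 9 to 18 columns, and many of those columns are highly correlated. That correlation is one reason a simple, regularised final estimator is the usual choice.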

In [None]:
print("Accuracy of STACKING = {:.4f}".format(accuracy_score(y_test, y_pred_stacking_pipeline)))

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

print(classification_report(y_test, y_pred_stacking_pipeline))
pd.DataFrame(confusion_matrix(y_test, y_pred_stacking_pipeline))

# **Predictions to CSV**

In [None]:
# Predictions on the competition test set
# Uncomment the line for the model whose predictions should be submitted (voting is active below)

#RF
#y_pred_test = rf_pipeline.predict(test_set)

#XGBoost
#y_pred_test = xg_pipeline.predict(test_set)

#Extra Trees
#y_pred_test = xt_pipeline.predict(test_set)

#Stacking
#y_pred_test = stacking_pipeline.predict(test_set)

#Voting
y_pred_test = vote_pipeline.predict(test_set)

# Label encoding of status_group: {'functional': 0, 'functional needs repair': 1, 'non functional': 2}

In [None]:
predictions = pd.DataFrame({'id': test.id,
                            'status_group': y_pred_test})
predictions

| | id | status_group |
|---|---|---|
| 59400 | 50785 | 2 |
| 59401 | 51630 | 0 |
| 59402 | 17168 | 0 |
| 59403 | 45559 | 2 |
| 59404 | 49871 | 0 |
| ... | ... | ... |
| 74245 | 39307 | 2 |
| 74246 | 18990 | 0 |
| 74247 | 28749 | 0 |
| 74248 | 33492 | 0 |


In [None]:
# Decode the integer predictions back to the original class names
label_names = {0: 'functional', 1: 'functional needs repair', 2: 'non functional'}
predictions['status_group'] = predictions['status_group'].map(label_names)
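The decoding step here retypes the encoding recorded in the comment after the prediction cell. A slightly safer pattern keeps that encoding as a single dict and inverts it once, so decoding cannot drift out of sync with encoding. The pure-Python sketch below follows that pattern. `label_to_code`, `code_to_label` and `decode` are hypothetical names, and the sample codes are the first five predictions shown above.

```python
# Encoding recorded in the notebook's comment (class name -> integer code)
label_to_code = {'functional': 0, 'functional needs repair': 1, 'non functional': 2}

# Invert once so decoding always matches the encoding
code_to_label = {code: label for label, code in label_to_code.items()}


def decode(codes):
    # A KeyError here flags a code the encoder never produced
    return [code_to_label[c] for c in codes]


print(decode([2, 0, 0, 2, 0]))
# ['non functional', 'functional', 'functional', 'non functional', 'functional']
```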

In [None]:
predictions

| | id | status_group |
|---|---|---|
| 59400 | 50785 | non functional |
| 59401 | 51630 | functional |
| 59402 | 17168 | functional |
| 59403 | 45559 | non functional |
| 59404 | 49871 | functional |
| ... | ... | ... |
| 74245 | 39307 | non functional |
| 74246 | 18990 | functional |
| 74247 | 28749 | functional |
| 74248 | 33492 | functional |


In [None]:
# Save the submission: header row, no index column
predictions.to_csv('my_submission.csv', header=True, index=False)

#from google.colab import files
#files.download('my_submission.csv')
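`to_csv(..., header=True, index=False)` writes a header row followed by only the `id` and `status_group` columns, with no DataFrame index. The stdlib sketch below writes that same two-column layout to an in-memory buffer, so the format can be checked before a file is saved. The helper name `submission_csv` is hypothetical, the rows are the first three decoded predictions shown above, and the `'\n'` line ending is an explicit choice made here.

```python
import csv
import io


def submission_csv(rows):
    """Render (id, status_group) pairs as CSV text with a header and no index column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(['id', 'status_group'])
    writer.writerows(rows)
    return buffer.getvalue()


sample = [(50785, 'non functional'), (51630, 'functional'), (17168, 'functional')]
print(submission_csv(sample))
```

Values that contain spaces, such as `non functional`, are not quoted, because the csv module's default minimal quoting only quotes fields containing the delimiter, a quote character, or a line break.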