**In this notebook** we try to squeeze the maximum score out of a random forest. Kaggle discussions indicate that a forest performs reasonably well on this task.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from tqdm import tqdm_notebook
from sklearn.ensemble import RandomForestClassifier
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import train_test_split, KFold
from sklearn.metrics import f1_score
import gc

### 0. Scripts for loading data, training, cross-validation, and saving predictions to a file

Loading the data

In [2]:
from feature_engineering import reduce_mem_usage, add_rolling_features
from feature_engineering import exponential_smoothing, signal_shifts
from feature_engineering import batch_stats2, add_minus_signal
from feature_engineering import delete_objects_after_rolling
from feature_engineering import add_quantiles, add_target_encoding

In [3]:
def prepare_df(df, shifts):
    df = reduce_mem_usage(df)
    df = signal_shifts(df, shifts)
    df = reduce_mem_usage(df)
    
    if 'open_channels' in df.columns:
        y = df['open_channels']
        df = df.drop(columns=['time'])
        return df, y
    else:
        df = df.drop(columns=['time'])
        return df
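
The `signal_shifts` helper is imported from `feature_engineering` and its source is not shown here. Judging by the column names in the feature importances (`shift_-20` ... `shift_20`), it probably adds lagged and lead copies of `signal`. A minimal sketch under that assumption follows; the function name `signal_shifts_sketch`, the sign convention, and the `fill_value` are hypothetical and may differ from the real helper.

```python
import pandas as pd


def signal_shifts_sketch(df, shifts, fill_value=0.0):
    """Hypothetical stand-in for feature_engineering.signal_shifts.

    Adds one column per entry in `shifts`, named 'shift_<s>'.
    """
    df = df.copy()
    for s in shifts:
        # pandas convention: a positive period moves values down (each row
        # sees an earlier sample), a negative period moves values up
        # (each row sees a later sample).
        df[f'shift_{s}'] = df['signal'].shift(s, fill_value=fill_value)
    return df


demo = pd.DataFrame({'signal': [1.0, 2.0, 3.0, 4.0]})
demo = signal_shifts_sketch(demo, shifts=[-1, 1])
print(demo)
```

Note that a plain `shift` over the whole frame would also mix values across the boundaries of the recording batches; whether the real helper handles that is not visible from this notebook.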

In [4]:
train = pd.read_csv('data/train_clean.csv')
test = pd.read_csv('data/test_clean.csv')

shifts = list(np.arange(-20, 0)) + list(np.arange(1, 21))

train["category"] = 0
test["category"] = 0
    
# train segments with more than 9 open-channel classes
train.loc[2_000_000:2_500_000-1, 'category'] = 1
train.loc[4_500_000:5_000_000-1, 'category'] = 1
    
# test segments with more than 9 open-channel classes (potentially)
test.loc[500_000:600_000-1, "category"] = 1
test.loc[700_000:800_000-1, "category"] = 1

X_train, y_train = prepare_df(train, shifts)
X_test = prepare_df(test, shifts)

y_train = np.array(y_train)

# add_quantiles(X_train, X_test, [3, 7, 15])
# add_target_encoding(X_train, X_test, [3, 7, 15])

X_train = reduce_mem_usage(X_train)
X_test = reduce_mem_usage(X_test)

X_train = X_train.drop(columns=['open_channels'])

Mem. usage decreased to 28.61 Mb (81.2% reduction)
Mem. usage decreased to 410.08 Mb (14.0% reduction)
Mem. usage decreased to  9.54 Mb (79.2% reduction)
Mem. usage decreased to 162.12 Mb (7.6% reduction)
Mem. usage decreased to 400.54 Mb (0.0% reduction)
Mem. usage decreased to 158.31 Mb (0.0% reduction)
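
The "Mem. usage decreased" lines come from `reduce_mem_usage`, whose source is also not part of this notebook. Helpers of this kind usually downcast numeric columns to the smallest dtype that still holds the values. The sketch below illustrates the idea with `pd.to_numeric`; the name `reduce_mem_usage_sketch` is hypothetical, and the real helper may pick different dtypes (for example `float16`).

```python
import numpy as np
import pandas as pd


def reduce_mem_usage_sketch(df):
    """Hypothetical illustration: downcast numeric columns in a copy of df."""
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_integer_dtype(df[col]):
            # e.g. int64 -> int8 when all values fit
            df[col] = pd.to_numeric(df[col], downcast='integer')
        elif pd.api.types.is_float_dtype(df[col]):
            # float64 -> float32 (loses some precision)
            df[col] = pd.to_numeric(df[col], downcast='float')
    return df


demo = pd.DataFrame({
    'signal': np.linspace(-3, 3, 1000),
    'open_channels': np.arange(1000) % 11,
})
before = demo.memory_usage(deep=True).sum()
small = reduce_mem_usage_sketch(demo)
after = small.memory_usage(deep=True).sum()
print(f'{before} bytes -> {after} bytes')
```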


Training the forest

In [5]:
def fit_model(X_train, y_train, params):
    gc.collect()
    
    print('splitting...')
    X_train_, X_valid, y_train_, y_valid = train_test_split(X_train, y_train, 
                                                            test_size=0.3,
                                                            random_state=17)
    print('fit...')
    model = RandomForestClassifier(**params)
    model.fit(X_train_, y_train_)
    
    print('predict...')
    prediction = model.predict(X_valid)
    score = f1_score(y_valid, prediction, average = 'macro')
    
    print(f'score = {score}')
    
    return model


def fit_model_with_save(X_train, y_train, X_test, params, modelname):
    gc.collect()
    
    print('fit...')
    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)
    
    print('predict...')
    prediction = model.predict(X_test)
    np.save(modelname + '_test_preds.npy', prediction)
    
    print('saving predictions...')
    sample_df = pd.read_csv("data/sample_submission.csv", dtype={'time':str})
    sample_df['open_channels'] = prediction
    sample_df.to_csv(modelname + '.csv', index=False, float_format='%.4f')
    
    print('probability...')
    probs = model.predict_proba(X_test)
    np.save(modelname + '_test_probs.npy', probs)
    
    return model

Cross-validation

In [6]:
def cv_loop(X_train, y_train, X_test, params, modelname):
    n_fold = 5
    folds = KFold(n_splits=n_fold, shuffle=True, random_state=17)
    
    oof = np.zeros(X_train.shape[0])
    oof_probs = np.zeros((X_train.shape[0], 11))
    
    prediction = np.zeros(X_test.shape[0])
    scores = []
    
    for training_index, validation_index in tqdm_notebook(folds.split(X_train), total=n_fold):
        gc.collect()
        
        # split into train and validation parts
        X_train_ = X_train.iloc[training_index]
        y_train_ = y_train[training_index]
        X_valid = X_train.iloc[validation_index]
        y_valid = y_train[validation_index]
        
        # fit the model
        model = RandomForestClassifier(**params)
        model.fit(X_train_, y_train_)

        # validation score
        preds = model.predict(X_valid)
        oof[validation_index] = preds
        score = f1_score(y_valid, preds, average = 'macro')
        scores.append(score)
        
        # validation probabilities
        probs = model.predict_proba(X_valid)
        oof_probs[validation_index] = probs
        
        # test prediction
        preds = model.predict(X_test)
        prediction += preds
        
        print(f'score: {score}')
        
    prediction /= n_fold
    prediction = np.round(np.clip(prediction, 0, 10)).astype(int)
                               
    np.save(modelname + '_cv_test_preds.npy', prediction)
    np.save(modelname + '_oof_preds.npy', oof)
    np.save(modelname + '_oof_probs.npy', oof_probs)
    
    sample_df = pd.read_csv("data/sample_submission.csv", dtype={'time':str})
    sample_df['open_channels'] = prediction
    sample_df.to_csv(modelname + '_cv.csv', index=False, float_format='%.4f')
    
    return scores, oof, prediction
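
`cv_loop` combines the folds by averaging the predicted class labels and rounding the result. A common alternative, shown here only as a sketch and not as the method used in this notebook, is to average `predict_proba` across folds and take the most likely class. The toy data from `make_classification` and all variable names are assumptions for illustration. Like `oof_probs` in `cv_loop`, the sketch assumes every training fold contains every class, so the probability columns line up.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

# Toy 3-class problem standing in for the real features.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=17)
X_tr, y_tr, X_te = X[:240], y[:240], X[240:]

classes = np.unique(y_tr)
n_fold = 3
folds = KFold(n_splits=n_fold, shuffle=True, random_state=17)

test_probs = np.zeros((X_te.shape[0], len(classes)))
for tr_idx, _ in folds.split(X_tr):
    model = RandomForestClassifier(n_estimators=20, random_state=17)
    model.fit(X_tr[tr_idx], y_tr[tr_idx])
    # Columns of predict_proba follow model.classes_, which equals
    # `classes` as long as the fold saw every class.
    test_probs += model.predict_proba(X_te) / n_fold

# Most likely class under the fold-averaged probabilities.
test_pred = classes[test_probs.argmax(axis=1)]
print(test_pred[:10])
```

Averaging probabilities avoids producing an intermediate label such as 4.4 that no single model predicted, at the cost of keeping a probability matrix for the test set in memory.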

### 1. Baseline

We take the RF parameters and the feature set from this notebook: https://www.kaggle.com/sggpls/shifted-rfc-pipeline

In [7]:
%%time

params = {
    'n_estimators': 200,
    'max_depth': 19,
    'max_features': 10,
    'random_state': 17,
    'n_jobs': -1,
    'verbose': 2
}

scores, oof, prediction = cv_loop(X_train, y_train, X_test, params, 'rf1')

[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:  7.6min
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed: 36.7min
[Parallel(n_jobs=-1)]: Done 200 out of 200 | elapsed: 48.9min finished
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    1.8s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:    9.6s
[Parallel(n_jobs=8)]: Done 200 out of 200 | elapsed:   13.0s finished
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    1.7s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:    9.7s
[Parallel(n_jobs=8)]: Done 200 out of 200 | elapsed:   13.3s finished
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    3.5s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:   17.8s
[Parallel(n_jobs=8)]: Done 200 out of 200 | elapsed:   24.2s finished


score: 0.9375721496312507


[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:  7.6min
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed: 36.2min
[Parallel(n_jobs=-1)]: Done 200 out of 200 | elapsed: 48.0min finished
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    1.9s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:    9.9s
[Parallel(n_jobs=8)]: Done 200 out of 200 | elapsed:   13.5s finished
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    1.9s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:    9.9s
[Parallel(n_jobs=8)]: Done 200 out of 200 | elapsed:   13.4s finished
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    4.6s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:   18.9s
[Parallel(n_jobs=8)]: Done 200 out of 200 | elapsed:   25.3s finished


score: 0.937447504525803


[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:  7.6min
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed: 36.2min
[Parallel(n_jobs=-1)]: Done 200 out of 200 | elapsed: 47.9min finished
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    1.7s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:    9.6s
[Parallel(n_jobs=8)]: Done 200 out of 200 | elapsed:   13.1s finished
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    1.9s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:   10.0s
[Parallel(n_jobs=8)]: Done 200 out of 200 | elapsed:   13.5s finished
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    3.7s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:   18.1s
[Parallel(n_jobs=8)]: Done 200 out of 200 | elapsed:   24.3s finished


score: 0.9370197056398623


[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:  7.5min
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed: 35.9min
[Parallel(n_jobs=-1)]: Done 200 out of 200 | elapsed: 47.8min finished
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    1.7s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:    9.6s
[Parallel(n_jobs=8)]: Done 200 out of 200 | elapsed:   13.0s finished
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    1.8s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:   10.0s
[Parallel(n_jobs=8)]: Done 200 out of 200 | elapsed:   13.6s finished
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    4.7s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:   19.0s
[Parallel(n_jobs=8)]: Done 200 out of 200 | elapsed:   25.4s finished


score: 0.9389520974463337


[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:  7.5min
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed: 35.9min
[Parallel(n_jobs=-1)]: Done 200 out of 200 | elapsed: 47.9min finished
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    1.7s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:    9.7s
[Parallel(n_jobs=8)]: Done 200 out of 200 | elapsed:   13.2s finished
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    1.8s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:    9.8s
[Parallel(n_jobs=8)]: Done 200 out of 200 | elapsed:   13.4s finished
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    3.3s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:   17.6s
[Parallel(n_jobs=8)]: Done 200 out of 200 | elapsed:   23.9s finished


score: 0.9378814800226223

Wall time: 4h 5min 23s


In [8]:
print(np.mean(scores))

0.9377745874531744
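
The number above is the mean of the five per-fold macro F1 scores. Macro F1 is not a simple average over samples, so the score computed once over all pooled out-of-fold predictions (the `oof` array returned by `cv_loop`) can differ slightly from this mean. The sketch below shows the two calculations side by side on synthetic labels; the data and the roughly 90% accuracy are made up purely for illustration.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold

rng = np.random.default_rng(17)
n = 600

# Synthetic ground truth and "out-of-fold" predictions that agree
# with the truth about 90% of the time.
y_true = rng.integers(0, 3, size=n)
noise = rng.integers(0, 3, size=n)
oof_demo = np.where(rng.random(n) < 0.9, y_true, noise)

folds = KFold(n_splits=5, shuffle=True, random_state=17)
fold_scores = [
    f1_score(y_true[val_idx], oof_demo[val_idx], average='macro')
    for _, val_idx in folds.split(y_true.reshape(-1, 1))
]

mean_of_folds = np.mean(fold_scores)
pooled = f1_score(y_true, oof_demo, average='macro')
print(f'mean of fold scores: {mean_of_folds:.4f}, pooled OOF score: {pooled:.4f}')
```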


In [7]:
%%time
params = {
    'n_estimators': 200,
    'max_depth': 19,
    'max_features': 10,
    'random_state': 17,
    'n_jobs': -1,
    'verbose': 2
}
forest = fit_model_with_save(X_train, y_train, X_test, 
                             params, 'rf1')

fit...


[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed: 11.2min
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed: 51.6min
[Parallel(n_jobs=-1)]: Done 200 out of 200 | elapsed: 69.0min finished


predict...


[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    3.9s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:   19.3s
[Parallel(n_jobs=8)]: Done 200 out of 200 | elapsed:   25.7s finished


saving predictions...
probapility...


[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    3.2s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:   17.6s
[Parallel(n_jobs=8)]: Done 200 out of 200 | elapsed:   24.2s finished


Wall time: 1h 9min 59s


**Result:** 0.939 on the public LB.

### 2. Add target encoding and tune the parameters

In [11]:
%%time

params = {
    'n_estimators': 500,
    'max_depth': 25,
    'max_features': 15,
    'random_state': 17,
    'n_jobs': -1,
    'verbose': 2
}

forest = fit_model(X_train, y_train, params)

splitting...
fit...


[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed: 12.1min
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed: 75.5min
[Parallel(n_jobs=-1)]: Done 349 tasks      | elapsed: 162.9min
[Parallel(n_jobs=-1)]: Done 500 out of 500 | elapsed: 224.1min finished


predict...


[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    4.5s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:   23.7s
[Parallel(n_jobs=8)]: Done 349 tasks      | elapsed:   59.6s
[Parallel(n_jobs=8)]: Done 500 out of 500 | elapsed:  1.4min finished


score = 0.9375128407643563
Wall time: 3h 45min 46s


In [12]:
for feature, imp in zip(X_train.columns, forest.feature_importances_):
    print(feature, imp)

signal 0.2810031591309477
category 0.10557577662222818
shift_-20 0.004632864496387041
shift_-19 0.003406895845358958
shift_-18 0.0027048016027770012
shift_-17 0.0020497408946804035
shift_-16 0.001785484209258236
shift_-15 0.0016448296427000225
shift_-14 0.0014370097477635424
shift_-13 0.0013014553553040565
shift_-12 0.0013215275808369784
shift_-11 0.0012317645641277744
shift_-10 0.0011798043295531532
shift_-9 0.0012400490906433485
shift_-8 0.0012042642453490972
shift_-7 0.0013310274181537538
shift_-6 0.0015851816879912362
shift_-5 0.001878256879192865
shift_-4 0.0024917497499253078
shift_-3 0.004939598532460965
shift_-2 0.01272673147324323
shift_-1 0.04282155384271084
shift_1 0.030177157010225886
shift_2 0.007917132601380212
shift_3 0.004828813324772249
shift_4 0.0022441743662006013
shift_5 0.0020569064093582576
shift_6 0.0017555993633171147
shift_7 0.0019812042905531916
shift_8 0.001295086443712815
shift_9 0.0012752507007509001
shift_10 0.0012460415743923575
shift_11 0.001308760799017