# TPS MAY 2021 - LightAutoML

## Libraries

We import pandas to read the CSV data files, sklearn.metrics.log_loss because the competition's evaluation metric is multi-class logarithmic loss, and LightAutoML to build the model.

In [1]:
import pandas as pd
import lightautoml
from lightautoml.automl.presets.tabular_presets import TabularAutoML, TabularUtilizedAutoML
from lightautoml.tasks import Task
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
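
Since the metric drives everything that follows, here is a minimal sketch of what multi-class logarithmic loss computes. The labels and probabilities are made-up toy values, not competition data; the manual formula is the mean negative log of the probability assigned to the true class, and it should agree with sklearn's log_loss.

```python
import numpy as np
from sklearn.metrics import log_loss

# Toy data (hypothetical): 3 samples, 3 classes.
labels = ["Class_1", "Class_2", "Class_3"]
y_true = ["Class_1", "Class_2", "Class_3"]
y_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
])

# Manual computation: mean negative log of the probability of the true class.
true_idx = np.array([labels.index(y) for y in y_true])
manual = -np.mean(np.log(y_prob[np.arange(len(y_true)), true_idx]))

# The same quantity from scikit-learn.
sk = log_loss(y_true, y_prob, labels=labels)

print(manual, sk)
```

For reference, a uniform prediction over 9 classes scores ln(9) ≈ 2.197, which is a useful baseline when reading the training logs.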

In [2]:
train_set = pd.read_csv("train.csv")
test_set = pd.read_csv("test.csv")
submission = pd.read_csv("sample_submission.csv")

train = train_set.copy()
test = test_set.copy()

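train.drop("id", axis=1, inplace=True)
test.drop("id", axis=1, inplace=True)

# Row-wise sum over the feature columns only; "target" is the label, not a feature.
feature_cols = test.columns.tolist()
train["sum"] = train[feature_cols].sum(axis=1)
test["sum"] = test[feature_cols].sum(axis=1)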
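
The "sum" column is a row-wise total of the features. A minimal sketch on a hypothetical toy frame (the column names feature_0 and feature_1 and all values are assumptions for illustration), selecting the feature columns explicitly so the label column never enters the sum:

```python
import pandas as pd

# Hypothetical toy frame standing in for the training data after "id" is dropped.
toy = pd.DataFrame({
    "feature_0": [0, 2, 1],
    "feature_1": [3, 0, 1],
    "target": ["Class_2", "Class_6", "Class_8"],
})

# Sum only the feature columns, leaving the label out.
feature_cols = [c for c in toy.columns if c != "target"]
toy["sum"] = toy[feature_cols].sum(axis=1)

print(toy)
```

Selecting the columns explicitly avoids relying on how a given pandas version treats non-numeric columns in a row-wise sum.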

In [6]:
task = Task('multiclass')

roles = {
    'target': "target",
    'drop': ['id'],
}

automl = TabularUtilizedAutoML(task = task, 
                               timeout = 3600*3,
                               cpu_limit = 8,
                               general_params = {'use_algos': [['lgb_tuned', 'cb_tuned'], ['lgb_tuned', 'cb_tuned']]},
                               tuning_params = {'max_tuning_time': 1200},
                               reader_params = {'n_jobs': 8}
                               )

oof_pred = automl.fit_predict(train, roles = roles)

Current random state: {'reader_params': {'random_state': 42}, 'general_params': {'return_all_predictions': False}}
Found reader_params in kwargs, need to combine
Merged variant for reader_params = {'n_jobs': 8, 'random_state': 42}
Found general_params in kwargs, need to combine
Merged variant for general_params = {'use_algos': [['lgb_tuned', 'cb_tuned'], ['lgb_tuned', 'cb_tuned']], 'return_all_predictions': False}
Start automl preset with listed constraints:
- time: 10799.971042633057 seconds
- cpus: 8 cores
- memory: 16 gb

Train data shape: (200000, 77)
Feats was rejected during automatic roles guess: []


Layer 1 ...
Train process start. Time left 10736.713728189468 secs
Optuna may run 1250.9616421107887 secs


Copying TaskTimer may affect the parent PipelineTimer, so copy will create new unlimited TaskTimer


Start fitting Lvl_0_Pipe_0_Mod_0_LightGBM ...

===== Start working with fold 0 for Lvl_0_Pipe_0_Mod_0_LightGBM =====

Training until validation scores don't improve for 100 rounds
[100]	valid's multi_logloss: 1.76112
[200]	valid's multi_logloss: 1.75926
Early stopping, best iteration is:
[145]	valid's multi_logloss: 1.7583
Lvl_0_Pipe_0_Mod_0_LightGBM fitting and predicting completed
Start fitting Lvl_0_Pipe_0_Mod_0_LightGBM ...

===== Start working with fold 0 for Lvl_0_Pipe_0_Mod_0_LightGBM =====

Training until validation scores don't improve for 100 rounds
[100]	valid's multi_logloss: 1.75813
[200]	valid's multi_logloss: 1.75472
Early stopping, best iteration is:
[187]	valid's multi_logloss: 1.75448
Lvl_0_Pipe_0_Mod_0_LightGBM fitting and predicting completed
Start fitting Lvl_0_Pipe_0_Mod_0_LightGBM ...

===== Start working with fold 0 for Lvl_0_Pipe_0_Mod_0_LightGBM =====

Training until validation scores don't improve for 100 rounds
[100]	valid's multi_logloss: 1.76104
[200]	vali

Time limit exceeded after calculating fold 3


Lvl_0_Pipe_0_Mod_0_LightGBM fitting and predicting completed
Optuna may run 2501.346375040574 secs
Start fitting Lvl_0_Pipe_0_Mod_1_CatBoost ...

===== Start working with fold 0 for Lvl_0_Pipe_0_Mod_1_CatBoost =====

0:	learn: 2.1722042	test: 2.1724066	best: 2.1724066 (0)	total: 1.16s	remaining: 1h 17m 28s
100:	learn: 1.7667982	test: 1.7728217	best: 1.7728217 (100)	total: 46.7s	remaining: 30m
200:	learn: 1.7514369	test: 1.7602103	best: 1.7602103 (200)	total: 1m 28s	remaining: 27m 58s
300:	learn: 1.7456542	test: 1.7565636	best: 1.7565636 (300)	total: 2m 7s	remaining: 26m 12s
400:	learn: 1.7415509	test: 1.7544801	best: 1.7544801 (400)	total: 2m 48s	remaining: 25m 8s
500:	learn: 1.7372897	test: 1.7524982	best: 1.7524982 (500)	total: 3m 29s	remaining: 24m 25s
600:	learn: 1.7341532	test: 1.7516895	best: 1.7516895 (600)	total: 4m 6s	remaining: 23m 14s
700:	learn: 1.7315558	test: 1.7511515	best: 1.7511515 (700)	total: 4m 43s	remaining: 22m 14s
800:	learn: 1.7292146	test: 1.7507666	best: 1.750

200:	learn: 1.7525554	test: 1.7552921	best: 1.7552921 (200)	total: 1m 29s	remaining: 20m 41s
300:	learn: 1.7468305	test: 1.7512371	best: 1.7512371 (300)	total: 2m 9s	remaining: 19m 16s
400:	learn: 1.7428945	test: 1.7490671	best: 1.7490671 (400)	total: 2m 47s	remaining: 18m 6s
500:	learn: 1.7389049	test: 1.7472074	best: 1.7472074 (500)	total: 3m 27s	remaining: 17m 16s
600:	learn: 1.7356384	test: 1.7461096	best: 1.7461096 (600)	total: 4m 5s	remaining: 16m 19s
700:	learn: 1.7329930	test: 1.7455811	best: 1.7455811 (700)	total: 4m 39s	remaining: 15m 16s
800:	learn: 1.7306166	test: 1.7453562	best: 1.7453562 (800)	total: 5m 13s	remaining: 14m 21s
900:	learn: 1.7284882	test: 1.7451371	best: 1.7451371 (900)	total: 5m 47s	remaining: 13m 29s
1000:	learn: 1.7264281	test: 1.7450333	best: 1.7450307 (983)	total: 6m 21s	remaining: 12m 40s
1100:	learn: 1.7245115	test: 1.7448934	best: 1.7448934 (1100)	total: 6m 54s	remaining: 11m 54s
1200:	learn: 1.7226189	test: 1.7448255	best: 1.7448255 (1200)	total: 7

Time limit exceeded in one of the tasks. AutoML will blend level 1 models.                                         
Try to set higher time limits or use Profiler to find bottleneck and optimize Pipelines settings


Blending: Optimization starts with equal weights and score -1.7464614035516977
Blending, iter 0: score = -1.745814659547303, weights = [0.1762068  0.82379323]
Blending, iter 1: score = -1.745814659547303, weights = [0.1762068  0.82379323]
No score update. Terminated

Automl preset training completed in 5735.74 seconds.



In [7]:
print('OOF score: {}'.format(log_loss(train["target"].values, oof_pred.data)))

OOF score: 1.745814659547303
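
The OOF score equals the final blending score in the log, where the two level-1 models receive weights of roughly 0.18 and 0.82. As an illustration of the idea only, the sketch below blends two synthetic probability matrices with a single weight and picks the weight with the lowest log loss. This is a plain grid search on made-up data, not LightAutoML's own blending optimizer, and the helper names noisy_probs and blend are hypothetical.

```python
import numpy as np
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
n_samples, n_classes = 200, 3

# Synthetic labels and two synthetic "model" outputs (not real predictions).
y = rng.integers(0, n_classes, size=n_samples)

def noisy_probs(noise):
    """Softmax of one-hot labels plus Gaussian noise; more noise means a weaker model."""
    logits = np.eye(n_classes)[y] + rng.normal(scale=noise, size=(n_samples, n_classes))
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

p_a = noisy_probs(1.5)  # weaker model
p_b = noisy_probs(0.8)  # stronger model

def blend(w):
    """Convex combination of the two probability matrices."""
    return w * p_a + (1.0 - w) * p_b

# Evaluate a grid of weights and keep the best one.
weights = np.linspace(0.0, 1.0, 21)
scores = [log_loss(y, blend(w), labels=list(range(n_classes))) for w in weights]
best_w = weights[int(np.argmin(scores))]

print(best_w, min(scores))
```

Because the weights are non-negative and sum to 1, every blended row is still a valid probability distribution, and the best blend can never score worse than the better of the two individual models on the data used for the search.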


In [8]:
test_pred = automl.predict(test)

In [9]:
submission.iloc[:, 1:] = test_pred.data
submission.head()

|   | id | Class_1 | Class_2 | Class_3 | Class_4 | Class_5 | Class_6 | Class_7 | Class_8 | Class_9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 200000 | 0.066149 | 0.385762 | 0.153165 | 0.026798 | 0.012281 | 0.166385 | 0.022123 | 0.048477 | 0.118861 |
| 1 | 200001 | 0.037953 | 0.073033 | 0.056556 | 0.016847 | 0.012096 | 0.271066 | 0.080407 | 0.323016 | 0.129028 |
| 2 | 200002 | 0.021615 | 0.02941 | 0.023965 | 0.01087 | 0.007111 | 0.670531 | 0.03427 | 0.140684 | 0.061544 |
| 3 | 200003 | 0.047784 | 0.114457 | 0.08276 | 0.037325 | 0.016448 | 0.20528 | 0.080597 | 0.24749 | 0.16786 |
| 4 | 200004 | 0.040343 | 0.10742 | 0.079365 | 0.02417 | 0.012745 | 0.307115 | 0.063141 | 0.227226 | 0.138474 |


In [10]:
sub = submission.set_index("id")
sub.head()

| id | Class_1 | Class_2 | Class_3 | Class_4 | Class_5 | Class_6 | Class_7 | Class_8 | Class_9 |
|---|---|---|---|---|---|---|---|---|---|
| 200000 | 0.066149 | 0.385762 | 0.153165 | 0.026798 | 0.012281 | 0.166385 | 0.022123 | 0.048477 | 0.118861 |
| 200001 | 0.037953 | 0.073033 | 0.056556 | 0.016847 | 0.012096 | 0.271066 | 0.080407 | 0.323016 | 0.129028 |
| 200002 | 0.021615 | 0.02941 | 0.023965 | 0.01087 | 0.007111 | 0.670531 | 0.03427 | 0.140684 | 0.061544 |
| 200003 | 0.047784 | 0.114457 | 0.08276 | 0.037325 | 0.016448 | 0.20528 | 0.080597 | 0.24749 | 0.16786 |
| 200004 | 0.040343 | 0.10742 | 0.079365 | 0.02417 | 0.012745 | 0.307115 | 0.063141 | 0.227226 | 0.138474 |


In [11]:
sub.to_csv("sub2.csv")
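
Before uploading, it can be worth checking that the file has the expected columns and that each row is a valid probability distribution. The sketch below runs such a check on a hypothetical two-row frame with the same column layout as the submission; check_submission is an illustrative helper, not part of pandas or LightAutoML, and the 1e-3 tolerance is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row submission with uniform probabilities over 9 classes.
class_cols = [f"Class_{i}" for i in range(1, 10)]
demo = pd.DataFrame(
    np.full((2, 9), 1.0 / 9.0),
    columns=class_cols,
    index=pd.Index([200000, 200001], name="id"),
)

def check_submission(df, expected_cols):
    """Return a list of problems found; an empty list means the frame looks sane."""
    problems = []
    if list(df.columns) != list(expected_cols):
        problems.append("unexpected columns")
    if df.isna().any().any():
        problems.append("missing values")
    if ((df < 0) | (df > 1)).any().any():
        problems.append("probabilities outside [0, 1]")
    if not np.allclose(df.sum(axis=1), 1.0, atol=1e-3):
        problems.append("rows do not sum to 1")
    return problems

print(check_submission(demo, class_cols))
```

The same function could be applied to the indexed submission frame right before writing it to disk.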