## Problem statement

Given data about passengers who used the metro twice in one day, and information about
their first entry into the metro, predict **at which station** and
**after what time interval** each passenger will use the metro again.

The data contains only people who made exactly two trips in a day;
validation statistics were collected over several days.

- `train.csv`: validation data for training;
- `test.csv`: data to predict on;
- `subway.csv`: auxiliary file with information about all
possible ways to get from station "A" to station "B".

### Dataset description:
- `id`: unique row identifier;
- `ticket_id`: unique ticket identifier; each ticket is assumed to have a single owner
- `ticket_type_nm`: ticket type
- `entrance_id`: unique id of the station entrance
- `entrance_nm`: entrance name
- `station_id`: unique id of the entry station
- `station_nm`: name of the entry station
- `line_id`: unique id of the line the station is on
- `line_nm`: name of the line the station is on
- `pass_dttm`: date and time of the validation
- `time_to_under` (target column): time elapsed between the first and the second validation
- `label` (target column): id of the station where the second validation took place
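
To make the two target columns concrete, here is a minimal sketch of how they could be derived from a raw validation log. The toy data, the `validations` and `targets` names, and the assumption that `time_to_under` is measured in minutes are illustrative only; the source does not state the unit or how the targets were actually built.

```python
import pandas as pd

# Toy validation log: exactly two validations per ticket (made-up values).
validations = pd.DataFrame({
    "ticket_id": ["A", "A", "B", "B"],
    "station_id": [2006, 8001, 11007, 2006],
    "pass_dttm": pd.to_datetime([
        "2022-09-12 05:00:00", "2022-09-12 08:30:00",
        "2022-09-12 06:15:00", "2022-09-12 18:45:00",
    ]),
})

# Order each ticket's validations by time, then take the first and second one.
validations = validations.sort_values(["ticket_id", "pass_dttm"])
grouped = validations.groupby("ticket_id")
first = grouped.first()
second = grouped.last()

targets = pd.DataFrame({
    "station_id": first["station_id"],   # station of the first entry (a feature)
    "label": second["station_id"],       # station of the second entry (target)
    # Assumption: the interval is expressed in minutes.
    "time_to_under": (second["pass_dttm"] - first["pass_dttm"]).dt.total_seconds() / 60,
})
print(targets)
```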

## Data preparation for the classification task

In [1]:
import pandas as pd, numpy as np, time
import xgboost as xgb
import catboost as cb
import matplotlib.pyplot as plt
from catboost import CatBoostRegressor, cv, Pool, CatBoostClassifier
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import silhouette_score

In [2]:
df = pd.read_csv('train_dataset_train.csv', sep=',')
# Keep only the first 10,000 rows so the experiments run quickly
df = df.iloc[:10000]

In [3]:
print(df.shape)

(10000, 12)


In [4]:
# Drop identifiers, name columns, the timestamp and the regression target
df.drop(columns=['id', 'ticket_id', 'station_nm', 'line_id', 'pass_dttm', 'time_to_under', 'entrance_nm'],
        inplace=True)

In [5]:
# Replace each categorical column with integer codes starting from 1
cols = ['ticket_type_nm', 'entrance_id', 'station_id', 'line_nm']
for item in cols:
    df[item] = df[item].astype("category").cat.codes + 1
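
The encoding step above can be illustrated on a toy column. This is a sketch with made-up values; the `train_values` and `test_values` names are hypothetical. It shows two properties of `cat.codes + 1`: missing values end up as `0`, and the codes depend on which categories are present, so a test set should reuse the categories learned from the training data.

```python
import pandas as pd

# Codes follow the sorted category order; a missing value gets -1, i.e. 0 after +1.
toy = pd.DataFrame({"line_nm": ["B", "A", None, "B"]})
codes = (toy["line_nm"].astype("category").cat.codes + 1).tolist()
print(codes)

# Reuse the training categories so that test codes mean the same thing.
train_values = ["B", "A", "B"]
test_values = ["A", "C"]  # "C" never appears in the training data
train_cats = pd.Categorical(train_values).categories
test_codes = (pd.Categorical(test_values, categories=train_cats).codes + 1).tolist()
print(test_codes)  # the unseen category falls back to 0
```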

In [6]:
df.head()

Unnamed: 0,ticket_type_nm,entrance_id,station_id,line_nm,label
0,28,385,198,2,8001
1,38,53,30,4,9011
2,21,53,30,4,7022
3,2,53,30,4,2022
4,27,53,30,4,2017


## Classification with CatBoost

In [7]:
train, test, y_train, y_test = train_test_split(df.drop(["label"], axis=1), df["label"],
                                                random_state=10, test_size=0.2)

In [8]:
#cat_features_index = [0,1,2,3,4,5,6]

In [9]:
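params = {'depth': [4, 7, 10],
          'learning_rate': [0.03],
          'l2_leaf_reg': [3],
          'iterations': [10]}
#'verbose': False
# Use a separate name so the `cb` module alias is not overwritten
cb_clf = cb.CatBoostClassifier()
# Note: scikit-learn's "roc_auc" scorer does not accept a multiclass target,
# so scoring fails on every fold and the search reports NaN scores
cb_model = GridSearchCV(cb_clf, params, scoring="roc_auc", cv=3)
cb_model.fit(train, y_train)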



0:	learn: 5.5713420	total: 530ms	remaining: 4.77s
1:	learn: 5.5440303	total: 928ms	remaining: 3.71s
2:	learn: 5.5182055	total: 1.23s	remaining: 2.88s
3:	learn: 5.4944294	total: 1.64s	remaining: 2.45s
4:	learn: 5.4721407	total: 1.91s	remaining: 1.91s
5:	learn: 5.4522270	total: 2.19s	remaining: 1.46s
6:	learn: 5.4300917	total: 2.54s	remaining: 1.09s
7:	learn: 5.4089648	total: 2.86s	remaining: 714ms
8:	learn: 5.3892194	total: 3.18s	remaining: 353ms
9:	learn: 5.3700426	total: 3.48s	remaining: 0us


Traceback (most recent call last):
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_scorer.py", line 220, in __call__
    return self._score(
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported



0:	learn: 5.5670966	total: 356ms	remaining: 3.2s
1:	learn: 5.5399091	total: 717ms	remaining: 2.87s
2:	learn: 5.5139078	total: 1s	remaining: 2.33s
3:	learn: 5.4906840	total: 1.33s	remaining: 2s
4:	learn: 5.4665904	total: 1.68s	remaining: 1.68s
5:	learn: 5.4455299	total: 1.98s	remaining: 1.32s
6:	learn: 5.4259217	total: 2.27s	remaining: 975ms
7:	learn: 5.4044883	total: 2.61s	remaining: 652ms
8:	learn: 5.3853018	total: 2.88s	remaining: 320ms
9:	learn: 5.3669764	total: 3.19s	remaining: 0us


Traceback (most recent call last):
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_scorer.py", line 220, in __call__
    return self._score(
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported



0:	learn: 5.5737334	total: 380ms	remaining: 3.42s
1:	learn: 5.5460075	total: 780ms	remaining: 3.12s
2:	learn: 5.5193292	total: 1.1s	remaining: 2.56s
3:	learn: 5.4957777	total: 1.44s	remaining: 2.17s
4:	learn: 5.4727274	total: 1.79s	remaining: 1.79s
5:	learn: 5.4505157	total: 2.09s	remaining: 1.39s
6:	learn: 5.4276298	total: 2.46s	remaining: 1.05s
7:	learn: 5.4054386	total: 2.86s	remaining: 715ms
8:	learn: 5.3857787	total: 3.23s	remaining: 359ms
9:	learn: 5.3673168	total: 3.56s	remaining: 0us


Traceback (most recent call last):
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_scorer.py", line 220, in __call__
    return self._score(
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported



0:	learn: 5.5693397	total: 1.03s	remaining: 9.28s
1:	learn: 5.5390992	total: 1.99s	remaining: 7.96s
2:	learn: 5.5089251	total: 2.91s	remaining: 6.78s
3:	learn: 5.4820387	total: 3.83s	remaining: 5.75s
4:	learn: 5.4593092	total: 4.79s	remaining: 4.79s
5:	learn: 5.4349747	total: 5.72s	remaining: 3.81s
6:	learn: 5.4107485	total: 6.63s	remaining: 2.84s
7:	learn: 5.3873460	total: 7.6s	remaining: 1.9s
8:	learn: 5.3658875	total: 8.51s	remaining: 945ms
9:	learn: 5.3434854	total: 9.38s	remaining: 0us


Traceback (most recent call last):
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_scorer.py", line 220, in __call__
    return self._score(
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported



0:	learn: 5.5634607	total: 905ms	remaining: 8.14s
1:	learn: 5.5331895	total: 1.77s	remaining: 7.08s
2:	learn: 5.5043329	total: 2.59s	remaining: 6.05s
3:	learn: 5.4790313	total: 3.38s	remaining: 5.08s
4:	learn: 5.4585001	total: 3.67s	remaining: 3.67s
5:	learn: 5.4327344	total: 4.54s	remaining: 3.02s
6:	learn: 5.4077565	total: 5.36s	remaining: 2.3s
7:	learn: 5.3844938	total: 6.23s	remaining: 1.56s
8:	learn: 5.3629971	total: 7.07s	remaining: 785ms
9:	learn: 5.3414455	total: 7.96s	remaining: 0us


Traceback (most recent call last):
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_scorer.py", line 220, in __call__
    return self._score(
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported



0:	learn: 5.5700879	total: 947ms	remaining: 8.52s
1:	learn: 5.5390000	total: 1.78s	remaining: 7.12s
2:	learn: 5.5096103	total: 2.78s	remaining: 6.48s
3:	learn: 5.4843124	total: 3.96s	remaining: 5.94s
4:	learn: 5.4607947	total: 4.91s	remaining: 4.91s
5:	learn: 5.4353040	total: 5.88s	remaining: 3.92s
6:	learn: 5.4112490	total: 6.79s	remaining: 2.91s
7:	learn: 5.3873552	total: 7.64s	remaining: 1.91s
8:	learn: 5.3647461	total: 8.51s	remaining: 945ms
9:	learn: 5.3421261	total: 9.39s	remaining: 0us


Traceback (most recent call last):
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_scorer.py", line 220, in __call__
    return self._score(
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported



0:	learn: 5.5688479	total: 5.73s	remaining: 51.6s
1:	learn: 5.5388776	total: 11.2s	remaining: 44.8s
2:	learn: 5.5118543	total: 16.7s	remaining: 38.9s
3:	learn: 5.4829460	total: 22.9s	remaining: 34.4s
4:	learn: 5.4566541	total: 30.4s	remaining: 30.4s
5:	learn: 5.4292868	total: 36.4s	remaining: 24.3s
6:	learn: 5.4045231	total: 42.8s	remaining: 18.3s
7:	learn: 5.3796718	total: 48s	remaining: 12s
8:	learn: 5.3557558	total: 53.9s	remaining: 5.99s
9:	learn: 5.3335323	total: 1m	remaining: 0us


Traceback (most recent call last):
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_scorer.py", line 220, in __call__
    return self._score(
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported



0:	learn: 5.5646823	total: 6.07s	remaining: 54.6s
1:	learn: 5.5342073	total: 12.2s	remaining: 48.6s
2:	learn: 5.5073681	total: 17.5s	remaining: 40.9s
3:	learn: 5.4805319	total: 23s	remaining: 34.4s
4:	learn: 5.4529718	total: 28.9s	remaining: 28.9s
5:	learn: 5.4261387	total: 35.2s	remaining: 23.5s
6:	learn: 5.3996349	total: 40.4s	remaining: 17.3s
7:	learn: 5.3753511	total: 46.7s	remaining: 11.7s
8:	learn: 5.3505898	total: 52.6s	remaining: 5.84s
9:	learn: 5.3295706	total: 58.6s	remaining: 0us


Traceback (most recent call last):
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_scorer.py", line 220, in __call__
    return self._score(
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported



0:	learn: 5.5704436	total: 6.24s	remaining: 56.1s
1:	learn: 5.5407312	total: 11.5s	remaining: 45.8s
2:	learn: 5.5150425	total: 16.2s	remaining: 37.7s
3:	learn: 5.4882218	total: 21.4s	remaining: 32.1s
4:	learn: 5.4599115	total: 26.7s	remaining: 26.7s
5:	learn: 5.4321056	total: 31.8s	remaining: 21.2s
6:	learn: 5.4066432	total: 36.7s	remaining: 15.7s
7:	learn: 5.3814123	total: 41.5s	remaining: 10.4s
8:	learn: 5.3560823	total: 46.3s	remaining: 5.15s
9:	learn: 5.3360320	total: 50.3s	remaining: 0us


Traceback (most recent call last):
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_scorer.py", line 220, in __call__
    return self._score(
  File "C:\Users\blach\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\metrics\_scorer.py", line 353, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported



0:	learn: 5.5725419	total: 344ms	remaining: 3.1s
1:	learn: 5.5468785	total: 779ms	remaining: 3.12s
2:	learn: 5.5146047	total: 1.2s	remaining: 2.8s
3:	learn: 5.4900099	total: 1.6s	remaining: 2.4s
4:	learn: 5.4649069	total: 2.05s	remaining: 2.05s
5:	learn: 5.4418292	total: 2.43s	remaining: 1.62s
6:	learn: 5.4177269	total: 2.9s	remaining: 1.24s
7:	learn: 5.3953764	total: 3.4s	remaining: 850ms
8:	learn: 5.3750754	total: 3.84s	remaining: 426ms
9:	learn: 5.3536626	total: 4.29s	remaining: 0us


In [10]:
predict=cb_model.predict(test)

In [11]:
print(" Results from Grid Search " )
print("\n The best estimator across ALL searched params:\n",cb_model.best_estimator_)
print("\n The best score across ALL searched params:\n",cb_model.best_score_)
print("\n The best parameters across ALL searched params:\n",cb_model.best_params_)

 Results from Grid Search 

 The best estimator across ALL searched params:
 <catboost.core.CatBoostClassifier object at 0x0000026ABE81AC50>

 The best score across ALL searched params:
 nan

 The best parameters across ALL searched params:
 {'depth': 4, 'iterations': 10, 'l2_leaf_reg': 3, 'learning_rate': 0.03}
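
The best score is `nan` because the `"roc_auc"` scorer raised `ValueError: multiclass format is not supported` on every fold, so the reported parameters should not be read as the outcome of a real comparison. Below is a minimal sketch of the same kind of search with a scorer that accepts multiclass labels. The synthetic data and the `DecisionTreeClassifier` stand-in are assumptions made so the sketch runs without CatBoost; the same `scoring="accuracy"` argument can be passed when the estimator is a `CatBoostClassifier`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class problem standing in for the station labels.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           random_state=10)

# "accuracy" works for multiclass targets, unlike the plain "roc_auc" scorer.
search = GridSearchCV(DecisionTreeClassifier(random_state=10),
                      {"max_depth": [2, 4]},
                      scoring="accuracy", cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Accuracy is also consistent with the micro-averaged recall used later for evaluation, since the two coincide for single-label multiclass predictions.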


In [12]:
clf = CatBoostClassifier(depth=4, iterations=2000, l2_leaf_reg=3, learning_rate=0.03)
clf.fit(train, y_train)

0:	learn: 5.5725419	total: 488ms	remaining: 16m 15s
1:	learn: 5.5468785	total: 950ms	remaining: 15m 48s
2:	learn: 5.5146047	total: 1.38s	remaining: 15m 15s
3:	learn: 5.4900099	total: 1.76s	remaining: 14m 40s
4:	learn: 5.4649069	total: 2.13s	remaining: 14m 8s
5:	learn: 5.4418292	total: 2.47s	remaining: 13m 40s
6:	learn: 5.4177269	total: 2.89s	remaining: 13m 41s
7:	learn: 5.3953764	total: 3.24s	remaining: 13m 27s
8:	learn: 5.3750754	total: 3.62s	remaining: 13m 21s
9:	learn: 5.3536626	total: 4.13s	remaining: 13m 43s
10:	learn: 5.3343949	total: 4.67s	remaining: 14m 5s
11:	learn: 5.3158792	total: 5.09s	remaining: 14m 2s
12:	learn: 5.2986260	total: 5.51s	remaining: 14m 1s
13:	learn: 5.2840991	total: 5.9s	remaining: 13m 56s
14:	learn: 5.2699606	total: 6.3s	remaining: 13m 54s
15:	learn: 5.2563182	total: 6.88s	remaining: 14m 12s
16:	learn: 5.2416297	total: 7.39s	remaining: 14m 21s
17:	learn: 5.2251750	total: 7.9s	remaining: 14m 30s
18:	learn: 5.2110288	total: 8.43s	remaining: 14m 38s
19:	learn:

155:	learn: 4.5785749	total: 1m 10s	remaining: 13m 59s
156:	learn: 4.5758628	total: 1m 11s	remaining: 13m 58s
157:	learn: 4.5749145	total: 1m 11s	remaining: 13m 57s
158:	learn: 4.5725927	total: 1m 12s	remaining: 13m 56s
159:	learn: 4.5700815	total: 1m 12s	remaining: 13m 56s
160:	learn: 4.5676719	total: 1m 13s	remaining: 13m 55s
161:	learn: 4.5650307	total: 1m 13s	remaining: 13m 55s
162:	learn: 4.5638332	total: 1m 13s	remaining: 13m 53s
163:	learn: 4.5624714	total: 1m 14s	remaining: 13m 51s
164:	learn: 4.5605715	total: 1m 14s	remaining: 13m 50s
165:	learn: 4.5586587	total: 1m 15s	remaining: 13m 49s
166:	learn: 4.5568599	total: 1m 15s	remaining: 13m 47s
167:	learn: 4.5548137	total: 1m 15s	remaining: 13m 47s
168:	learn: 4.5531515	total: 1m 16s	remaining: 13m 46s
169:	learn: 4.5511398	total: 1m 16s	remaining: 13m 45s
170:	learn: 4.5488652	total: 1m 17s	remaining: 13m 45s
171:	learn: 4.5471694	total: 1m 17s	remaining: 13m 45s
172:	learn: 4.5447362	total: 1m 18s	remaining: 13m 44s
173:	learn

305:	learn: 4.3654606	total: 2m 14s	remaining: 12m 26s
306:	learn: 4.3645835	total: 2m 15s	remaining: 12m 26s
307:	learn: 4.3635398	total: 2m 15s	remaining: 12m 26s
308:	learn: 4.3620455	total: 2m 16s	remaining: 12m 25s
309:	learn: 4.3605671	total: 2m 16s	remaining: 12m 24s
310:	learn: 4.3591232	total: 2m 17s	remaining: 12m 24s
311:	learn: 4.3582152	total: 2m 17s	remaining: 12m 23s
312:	learn: 4.3576309	total: 2m 17s	remaining: 12m 22s
313:	learn: 4.3566210	total: 2m 18s	remaining: 12m 22s
314:	learn: 4.3554735	total: 2m 18s	remaining: 12m 22s
315:	learn: 4.3544376	total: 2m 19s	remaining: 12m 21s
316:	learn: 4.3528944	total: 2m 19s	remaining: 12m 21s
317:	learn: 4.3518997	total: 2m 20s	remaining: 12m 20s
318:	learn: 4.3504100	total: 2m 20s	remaining: 12m 20s
319:	learn: 4.3497045	total: 2m 20s	remaining: 12m 20s
320:	learn: 4.3489129	total: 2m 21s	remaining: 12m 19s
321:	learn: 4.3479901	total: 2m 21s	remaining: 12m 18s
322:	learn: 4.3472177	total: 2m 22s	remaining: 12m 17s
323:	learn

456:	learn: 4.2577604	total: 3m 10s	remaining: 10m 44s
457:	learn: 4.2572298	total: 3m 11s	remaining: 10m 44s
458:	learn: 4.2566773	total: 3m 11s	remaining: 10m 43s
459:	learn: 4.2561001	total: 3m 12s	remaining: 10m 42s
460:	learn: 4.2553326	total: 3m 12s	remaining: 10m 42s
461:	learn: 4.2547724	total: 3m 12s	remaining: 10m 41s
462:	learn: 4.2543573	total: 3m 13s	remaining: 10m 41s
463:	learn: 4.2538572	total: 3m 13s	remaining: 10m 40s
464:	learn: 4.2534378	total: 3m 13s	remaining: 10m 40s
465:	learn: 4.2525265	total: 3m 14s	remaining: 10m 39s
466:	learn: 4.2520830	total: 3m 14s	remaining: 10m 38s
467:	learn: 4.2516785	total: 3m 14s	remaining: 10m 38s
468:	learn: 4.2510331	total: 3m 15s	remaining: 10m 37s
469:	learn: 4.2503854	total: 3m 15s	remaining: 10m 36s
470:	learn: 4.2499651	total: 3m 15s	remaining: 10m 36s
471:	learn: 4.2497315	total: 3m 16s	remaining: 10m 35s
472:	learn: 4.2492381	total: 3m 16s	remaining: 10m 34s
473:	learn: 4.2487861	total: 3m 16s	remaining: 10m 34s
474:	learn

608:	learn: 4.1870183	total: 4m 7s	remaining: 9m 24s
609:	learn: 4.1865355	total: 4m 7s	remaining: 9m 23s
610:	learn: 4.1857888	total: 4m 7s	remaining: 9m 23s
611:	learn: 4.1854219	total: 4m 8s	remaining: 9m 22s
612:	learn: 4.1852907	total: 4m 8s	remaining: 9m 21s
613:	learn: 4.1846465	total: 4m 8s	remaining: 9m 21s
614:	learn: 4.1843093	total: 4m 9s	remaining: 9m 21s
615:	learn: 4.1839946	total: 4m 9s	remaining: 9m 20s
616:	learn: 4.1834441	total: 4m 10s	remaining: 9m 20s
617:	learn: 4.1832229	total: 4m 10s	remaining: 9m 19s
618:	learn: 4.1829918	total: 4m 10s	remaining: 9m 19s
619:	learn: 4.1823604	total: 4m 11s	remaining: 9m 18s
620:	learn: 4.1819906	total: 4m 11s	remaining: 9m 18s
621:	learn: 4.1816810	total: 4m 11s	remaining: 9m 17s
622:	learn: 4.1807802	total: 4m 12s	remaining: 9m 17s
623:	learn: 4.1801593	total: 4m 12s	remaining: 9m 16s
624:	learn: 4.1798800	total: 4m 12s	remaining: 9m 16s
625:	learn: 4.1794661	total: 4m 13s	remaining: 9m 15s
626:	learn: 4.1793293	total: 4m 13s	

761:	learn: 4.1207700	total: 5m 3s	remaining: 8m 12s
762:	learn: 4.1204626	total: 5m 3s	remaining: 8m 12s
763:	learn: 4.1200749	total: 5m 3s	remaining: 8m 11s
764:	learn: 4.1198879	total: 5m 4s	remaining: 8m 10s
765:	learn: 4.1195242	total: 5m 4s	remaining: 8m 10s
766:	learn: 4.1191841	total: 5m 4s	remaining: 8m 10s
767:	learn: 4.1186091	total: 5m 5s	remaining: 8m 9s
768:	learn: 4.1182554	total: 5m 5s	remaining: 8m 9s
769:	learn: 4.1175966	total: 5m 5s	remaining: 8m 8s
770:	learn: 4.1172520	total: 5m 6s	remaining: 8m 8s
771:	learn: 4.1169415	total: 5m 6s	remaining: 8m 7s
772:	learn: 4.1166467	total: 5m 6s	remaining: 8m 7s
773:	learn: 4.1164218	total: 5m 7s	remaining: 8m 6s
774:	learn: 4.1162170	total: 5m 7s	remaining: 8m 6s
775:	learn: 4.1156433	total: 5m 7s	remaining: 8m 5s
776:	learn: 4.1155465	total: 5m 8s	remaining: 8m 5s
777:	learn: 4.1153460	total: 5m 8s	remaining: 8m 4s
778:	learn: 4.1151533	total: 5m 8s	remaining: 8m 4s
779:	learn: 4.1148302	total: 5m 9s	remaining: 8m 3s
780:	l

914:	learn: 4.0627327	total: 5m 58s	remaining: 7m 5s
915:	learn: 4.0621910	total: 5m 59s	remaining: 7m 5s
916:	learn: 4.0617544	total: 5m 59s	remaining: 7m 4s
917:	learn: 4.0612006	total: 6m	remaining: 7m 4s
918:	learn: 4.0607224	total: 6m	remaining: 7m 3s
919:	learn: 4.0605818	total: 6m	remaining: 7m 3s
920:	learn: 4.0604608	total: 6m 1s	remaining: 7m 3s
921:	learn: 4.0596994	total: 6m 1s	remaining: 7m 2s
922:	learn: 4.0593998	total: 6m 1s	remaining: 7m 2s
923:	learn: 4.0588794	total: 6m 2s	remaining: 7m 1s
924:	learn: 4.0584638	total: 6m 2s	remaining: 7m 1s
925:	learn: 4.0582558	total: 6m 2s	remaining: 7m
926:	learn: 4.0578373	total: 6m 3s	remaining: 7m
927:	learn: 4.0575497	total: 6m 3s	remaining: 6m 59s
928:	learn: 4.0568854	total: 6m 3s	remaining: 6m 59s
929:	learn: 4.0564552	total: 6m 4s	remaining: 6m 59s
930:	learn: 4.0560662	total: 6m 4s	remaining: 6m 58s
931:	learn: 4.0557645	total: 6m 4s	remaining: 6m 58s
932:	learn: 4.0554416	total: 6m 5s	remaining: 6m 57s
933:	learn: 4.0552

1066:	learn: 4.0080505	total: 6m 55s	remaining: 6m 3s
1067:	learn: 4.0076520	total: 6m 55s	remaining: 6m 2s
1068:	learn: 4.0074816	total: 6m 55s	remaining: 6m 2s
1069:	learn: 4.0071909	total: 6m 56s	remaining: 6m 1s
1070:	learn: 4.0069357	total: 6m 56s	remaining: 6m 1s
1071:	learn: 4.0062909	total: 6m 57s	remaining: 6m 1s
1072:	learn: 4.0057247	total: 6m 57s	remaining: 6m
1073:	learn: 4.0053469	total: 6m 57s	remaining: 6m
1074:	learn: 4.0049008	total: 6m 58s	remaining: 5m 59s
1075:	learn: 4.0047249	total: 6m 58s	remaining: 5m 59s
1076:	learn: 4.0044924	total: 6m 58s	remaining: 5m 58s
1077:	learn: 4.0040945	total: 6m 59s	remaining: 5m 58s
1078:	learn: 4.0039045	total: 6m 59s	remaining: 5m 58s
1079:	learn: 4.0037053	total: 6m 59s	remaining: 5m 57s
1080:	learn: 4.0035367	total: 7m	remaining: 5m 57s
1081:	learn: 4.0033464	total: 7m	remaining: 5m 56s
1082:	learn: 4.0032272	total: 7m	remaining: 5m 56s
1083:	learn: 4.0029772	total: 7m 1s	remaining: 5m 55s
1084:	learn: 4.0021350	total: 7m 1s	r

1217:	learn: 3.9583576	total: 7m 52s	remaining: 5m 3s
1218:	learn: 3.9582863	total: 7m 53s	remaining: 5m 3s
1219:	learn: 3.9581314	total: 7m 53s	remaining: 5m 2s
1220:	learn: 3.9575153	total: 7m 53s	remaining: 5m 2s
1221:	learn: 3.9569947	total: 7m 54s	remaining: 5m 1s
1222:	learn: 3.9566092	total: 7m 54s	remaining: 5m 1s
1223:	learn: 3.9563317	total: 7m 55s	remaining: 5m 1s
1224:	learn: 3.9560345	total: 7m 55s	remaining: 5m
1225:	learn: 3.9555380	total: 7m 55s	remaining: 5m
1226:	learn: 3.9554028	total: 7m 56s	remaining: 5m
1227:	learn: 3.9546348	total: 7m 56s	remaining: 4m 59s
1228:	learn: 3.9542878	total: 7m 57s	remaining: 4m 59s
1229:	learn: 3.9538470	total: 7m 57s	remaining: 4m 59s
1230:	learn: 3.9533919	total: 7m 58s	remaining: 4m 58s
1231:	learn: 3.9531226	total: 7m 58s	remaining: 4m 58s
1232:	learn: 3.9527232	total: 7m 58s	remaining: 4m 57s
1233:	learn: 3.9525705	total: 7m 59s	remaining: 4m 57s
1234:	learn: 3.9522824	total: 7m 59s	remaining: 4m 57s
1235:	learn: 3.9518245	total:

1368:	learn: 3.9116977	total: 8m 55s	remaining: 4m 6s
1369:	learn: 3.9114087	total: 8m 56s	remaining: 4m 6s
1370:	learn: 3.9113263	total: 8m 56s	remaining: 4m 6s
1371:	learn: 3.9112417	total: 8m 57s	remaining: 4m 5s
1372:	learn: 3.9106271	total: 8m 57s	remaining: 4m 5s
1373:	learn: 3.9104079	total: 8m 58s	remaining: 4m 5s
1374:	learn: 3.9096796	total: 8m 58s	remaining: 4m 4s
1375:	learn: 3.9095358	total: 8m 59s	remaining: 4m 4s
1376:	learn: 3.9089327	total: 8m 59s	remaining: 4m 4s
1377:	learn: 3.9088354	total: 8m 59s	remaining: 4m 3s
1378:	learn: 3.9082704	total: 9m	remaining: 4m 3s
1379:	learn: 3.9079429	total: 9m 1s	remaining: 4m 3s
1380:	learn: 3.9074112	total: 9m 1s	remaining: 4m 2s
1381:	learn: 3.9073216	total: 9m 2s	remaining: 4m 2s
1382:	learn: 3.9071082	total: 9m 2s	remaining: 4m 2s
1383:	learn: 3.9065672	total: 9m 3s	remaining: 4m 1s
1384:	learn: 3.9063715	total: 9m 3s	remaining: 4m 1s
1385:	learn: 3.9058600	total: 9m 4s	remaining: 4m 1s
1386:	learn: 3.9056985	total: 9m 4s	rem

1518:	learn: 3.8679008	total: 10m 2s	remaining: 3m 10s
1519:	learn: 3.8677381	total: 10m 2s	remaining: 3m 10s
1520:	learn: 3.8676144	total: 10m 2s	remaining: 3m 9s
1521:	learn: 3.8675397	total: 10m 3s	remaining: 3m 9s
1522:	learn: 3.8673751	total: 10m 3s	remaining: 3m 9s
1523:	learn: 3.8672411	total: 10m 4s	remaining: 3m 8s
1524:	learn: 3.8670631	total: 10m 4s	remaining: 3m 8s
1525:	learn: 3.8669698	total: 10m 4s	remaining: 3m 7s
1526:	learn: 3.8665813	total: 10m 5s	remaining: 3m 7s
1527:	learn: 3.8660710	total: 10m 5s	remaining: 3m 7s
1528:	learn: 3.8654457	total: 10m 6s	remaining: 3m 6s
1529:	learn: 3.8652593	total: 10m 6s	remaining: 3m 6s
1530:	learn: 3.8648593	total: 10m 7s	remaining: 3m 5s
1531:	learn: 3.8647179	total: 10m 7s	remaining: 3m 5s
1532:	learn: 3.8645253	total: 10m 7s	remaining: 3m 5s
1533:	learn: 3.8643662	total: 10m 8s	remaining: 3m 4s
1534:	learn: 3.8642896	total: 10m 8s	remaining: 3m 4s
1535:	learn: 3.8641009	total: 10m 9s	remaining: 3m 4s
1536:	learn: 3.8637308	tot

1666:	learn: 3.8293600	total: 11m 8s	remaining: 2m 13s
1667:	learn: 3.8292778	total: 11m 8s	remaining: 2m 13s
1668:	learn: 3.8289047	total: 11m 9s	remaining: 2m 12s
1669:	learn: 3.8287629	total: 11m 9s	remaining: 2m 12s
1670:	learn: 3.8286171	total: 11m 9s	remaining: 2m 11s
1671:	learn: 3.8282241	total: 11m 10s	remaining: 2m 11s
1672:	learn: 3.8280922	total: 11m 10s	remaining: 2m 11s
1673:	learn: 3.8279423	total: 11m 11s	remaining: 2m 10s
1674:	learn: 3.8275135	total: 11m 11s	remaining: 2m 10s
1675:	learn: 3.8273464	total: 11m 12s	remaining: 2m 9s
1676:	learn: 3.8269718	total: 11m 12s	remaining: 2m 9s
1677:	learn: 3.8265209	total: 11m 13s	remaining: 2m 9s
1678:	learn: 3.8260005	total: 11m 13s	remaining: 2m 8s
1679:	learn: 3.8255115	total: 11m 13s	remaining: 2m 8s
1680:	learn: 3.8253928	total: 11m 14s	remaining: 2m 7s
1681:	learn: 3.8248190	total: 11m 14s	remaining: 2m 7s
1682:	learn: 3.8245827	total: 11m 15s	remaining: 2m 7s
1683:	learn: 3.8244717	total: 11m 15s	remaining: 2m 6s
1684:	

1814:	learn: 3.7924064	total: 12m 15s	remaining: 1m 14s
1815:	learn: 3.7917307	total: 12m 16s	remaining: 1m 14s
1816:	learn: 3.7913318	total: 12m 16s	remaining: 1m 14s
1817:	learn: 3.7910577	total: 12m 16s	remaining: 1m 13s
1818:	learn: 3.7907173	total: 12m 17s	remaining: 1m 13s
1819:	learn: 3.7904823	total: 12m 17s	remaining: 1m 12s
1820:	learn: 3.7902893	total: 12m 18s	remaining: 1m 12s
1821:	learn: 3.7900445	total: 12m 18s	remaining: 1m 12s
1822:	learn: 3.7895700	total: 12m 19s	remaining: 1m 11s
1823:	learn: 3.7894739	total: 12m 19s	remaining: 1m 11s
1824:	learn: 3.7889795	total: 12m 19s	remaining: 1m 10s
1825:	learn: 3.7887236	total: 12m 20s	remaining: 1m 10s
1826:	learn: 3.7886518	total: 12m 20s	remaining: 1m 10s
1827:	learn: 3.7884346	total: 12m 21s	remaining: 1m 9s
1828:	learn: 3.7882954	total: 12m 21s	remaining: 1m 9s
1829:	learn: 3.7881120	total: 12m 22s	remaining: 1m 8s
1830:	learn: 3.7877206	total: 12m 22s	remaining: 1m 8s
1831:	learn: 3.7875963	total: 12m 22s	remaining: 1m 

1964:	learn: 3.7540934	total: 13m 19s	remaining: 14.2s
1965:	learn: 3.7540033	total: 13m 19s	remaining: 13.8s
1966:	learn: 3.7536966	total: 13m 20s	remaining: 13.4s
1967:	learn: 3.7534944	total: 13m 20s	remaining: 13s
1968:	learn: 3.7533529	total: 13m 21s	remaining: 12.6s
1969:	learn: 3.7532690	total: 13m 21s	remaining: 12.2s
1970:	learn: 3.7531942	total: 13m 22s	remaining: 11.8s
1971:	learn: 3.7529737	total: 13m 22s	remaining: 11.4s
1972:	learn: 3.7527348	total: 13m 22s	remaining: 11s
1973:	learn: 3.7526919	total: 13m 23s	remaining: 10.6s
1974:	learn: 3.7524415	total: 13m 23s	remaining: 10.2s
1975:	learn: 3.7523305	total: 13m 24s	remaining: 9.77s
1976:	learn: 3.7520569	total: 13m 24s	remaining: 9.36s
1977:	learn: 3.7519290	total: 13m 25s	remaining: 8.95s
1978:	learn: 3.7517475	total: 13m 25s	remaining: 8.55s
1979:	learn: 3.7516069	total: 13m 25s	remaining: 8.14s
1980:	learn: 3.7514050	total: 13m 26s	remaining: 7.73s
1981:	learn: 3.7509079	total: 13m 26s	remaining: 7.33s
1982:	learn: 3

<catboost.core.CatBoostClassifier at 0x26abe81bb80>

In [13]:
predict=clf.predict(test)

In [14]:
from sklearn.metrics import r2_score, recall_score, mean_absolute_percentage_error
# Micro-averaged recall pools all classes together before dividing
print('recall: ', recall_score(y_test, predict, average='micro'))

recall:  0.1245
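
For single-label multiclass predictions, micro-averaged recall is the share of correctly predicted rows, that is, plain accuracy. The sketch below checks this by hand on made-up labels; the `micro_recall` and `accuracy` helpers are illustrative and are not part of the notebook.

```python
from collections import Counter

def micro_recall(y_true, y_pred):
    """Sum true positives and false negatives over all classes, then divide."""
    tp, fn = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1  # the true class t was missed
    total_tp, total_fn = sum(tp.values()), sum(fn.values())
    return total_tp / (total_tp + total_fn)

def accuracy(y_true, y_pred):
    """Share of rows where the prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [8001, 9011, 7022, 2022, 2017]
y_pred = [8001, 2017, 7022, 2022, 9011]
print(micro_recall(y_true, y_pred), accuracy(y_true, y_pred))
```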


## Regression with CatBoost

In [15]:
df1 = pd.read_csv('train_dataset_train.csv')
df1.head()

Unnamed: 0,id,ticket_id,ticket_type_nm,entrance_id,entrance_nm,station_id,station_nm,line_id,line_nm,pass_dttm,time_to_under,label
0,1,40BD89EC85646EFB69E283F39C298E60,Пропуск FacePay,2402,Лефортово БКЛ,11007,Лефортово,11,Большая кольцевая,2022-09-12 05:00:13,216.316667,8001
1,2,126727A96489CC976A8C08E5CEB00542,СК учащегося 30 дней,110,Войковская ( Южный ),2006,Войковская,2,Замоскворецкая,2022-09-12 05:00:54,648.183333,9011
2,3,D28CE6A9E0E5B6D213470A97CFF32485,БСК дружинника г.Москвы,110,Войковская ( Южный ),2006,Войковская,2,Замоскворецкая,2022-09-12 05:00:55,865.333333,7022
3,4,015DA44B523C062B5BFEFF3FB0E64B9E,30 дней,110,Войковская ( Южный ),2006,Войковская,2,Замоскворецкая,2022-09-12 05:01:13,1048.233333,2022
4,5,95B19C6F3A504727AC3EA56EB7E3E80F,КОШЕЛЕК,110,Войковская ( Южный ),2006,Войковская,2,Замоскворецкая,2022-09-12 05:02:55,965.6,2017


In [16]:
df1.pass_dttm = pd.to_datetime(df1.pass_dttm)
df1['day'] = df1.pass_dttm.dt.dayofweek  # day of the week (0 = Monday)
df1['hour'] = df1.pass_dttm.dt.hour  # hour of the day, 24-hour format
df1['workday'] = df1['day'].apply(lambda x: 0 if x == 5 or x == 6 else 1)  # 0 on Saturday/Sunday, 1 otherwise
df1 = df1.drop(columns=['pass_dttm'])
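
The time features can be checked on two hand-picked timestamps. This is a small sketch on toy data; the vectorised comparison `toy["day"] < 5` is an alternative spelling of the `workday` lambda, not the notebook's own code.

```python
import pandas as pd

toy = pd.DataFrame({"pass_dttm": ["2022-09-12 05:00:13",   # a Monday
                                  "2022-09-17 23:10:00"]}) # a Saturday
toy["pass_dttm"] = pd.to_datetime(toy["pass_dttm"])

toy["day"] = toy["pass_dttm"].dt.dayofweek      # 0 = Monday ... 6 = Sunday
toy["hour"] = toy["pass_dttm"].dt.hour          # 0..23
toy["workday"] = (toy["day"] < 5).astype(int)   # 1 for Monday-Friday, else 0
print(toy[["day", "hour", "workday"]])
```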

In [17]:
# Keep only the first word of the ticket type name
df1['ticket_type_nm'] = df1.ticket_type_nm.apply(lambda x: x.split(' ')[0])

In [18]:
df1 = df1.drop(['ticket_id','id','station_id', 'line_id','entrance_id','entrance_nm','label'], axis=1)

In [19]:
df1.head()

Unnamed: 0,ticket_type_nm,station_nm,line_nm,time_to_under,day,hour,workday
0,Пропуск,Лефортово,Большая кольцевая,216.316667,0,5,1
1,СК,Войковская,Замоскворецкая,648.183333,0,5,1
2,БСК,Войковская,Замоскворецкая,865.333333,0,5,1
3,30,Войковская,Замоскворецкая,1048.233333,0,5,1
4,КОШЕЛЕК,Войковская,Замоскворецкая,965.6,0,5,1


In [20]:
from sklearn.model_selection import train_test_split
train1, test1, y_train1, y_test1 = train_test_split(df1.drop(['time_to_under'], axis=1),
                                                    df1['time_to_under'],
                                                    test_size=0.4, random_state=42)

In [21]:
cat_features=['ticket_type_nm', 'station_nm', 'line_nm','workday']

In [28]:
model = CatBoostRegressor(iterations=100,
                          learning_rate=0.7,
                          cat_features=cat_features,
                          l2_leaf_reg=0.01,
                          random_strength=0.3,
                          depth=16)

In [29]:
model.fit(train1, y_train1)

0:	learn: 173.6210306	total: 662ms	remaining: 1m 5s
1:	learn: 164.0985769	total: 1.33s	remaining: 1m 5s
2:	learn: 162.2669224	total: 2.05s	remaining: 1m 6s
3:	learn: 161.5033264	total: 2.76s	remaining: 1m 6s
4:	learn: 160.9304360	total: 3.43s	remaining: 1m 5s
5:	learn: 160.8062707	total: 4.16s	remaining: 1m 5s
6:	learn: 160.7405279	total: 4.87s	remaining: 1m 4s
7:	learn: 160.0796024	total: 5.59s	remaining: 1m 4s
8:	learn: 159.6977277	total: 6.28s	remaining: 1m 3s
9:	learn: 159.4568933	total: 6.99s	remaining: 1m 2s
10:	learn: 159.2759096	total: 7.71s	remaining: 1m 2s
11:	learn: 159.1372187	total: 8.4s	remaining: 1m 1s
12:	learn: 158.8044680	total: 9.09s	remaining: 1m
13:	learn: 158.7326627	total: 9.72s	remaining: 59.7s
14:	learn: 158.5968657	total: 10.4s	remaining: 58.7s
15:	learn: 158.4561651	total: 11s	remaining: 57.9s
16:	learn: 158.3510418	total: 11.7s	remaining: 56.9s
17:	learn: 158.2143493	total: 12.3s	remaining: 56s
18:	learn: 158.1439755	total: 13s	remaining: 55.3s
19:	learn: 15

<catboost.core.CatBoostRegressor at 0x26abe81b6d0>

In [30]:
pred = model.predict(test1)
from sklearn.metrics import r2_score, recall_score, mean_absolute_percentage_error
print('R2: ', r2_score(y_test1, pred))

R2:  0.48590502986376616


In [31]:
# Combined result: mean of the regression R2 and the classification recall
R2 = r2_score(y_test1, pred)
recall = recall_score(y_test, predict, average='micro')
print('result: ', (recall + R2) / 2)

result:  0.3052025149318831
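
The combined result is the arithmetic mean of the two metrics. The sketch below spells out both pieces in plain Python; the `r2` and `combined_score` helpers are illustrative names, and `r2` uses the standard definition $R^2 = 1 - SS_{res}/SS_{tot}$.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def combined_score(recall, r2_value):
    """Mean of the classification recall and the regression R2."""
    return (recall + r2_value) / 2

# Plugging in the two values reported in this notebook.
score = combined_score(0.1245, 0.48590502986376616)
print(score)
```

Predicting the mean of the target for every row gives `r2` equal to 0, and a perfect prediction gives 1, so the two halves of the combined score are on comparable scales.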
