<center>
<img src="../../img/ods_stickers.jpg">
## Open Machine Learning Course. Session 2
Author: Yury Kashnitsky, research programmer at Mail.ru Group and senior lecturer at the Faculty of Computer Science, HSE. This material is distributed under the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. It may be used for any purpose (editing, correcting, building upon) except commercial ones, provided the author is credited.

# <center>Topic 10. Boosting
## <center>Part 8. Evaluating Xgboost results

## Loading libraries

In [1]:
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

## Loading and preparing the data

Let's use customer churn data from a telecom company as an example.

In [2]:
# df = pd.read_csv("../../data/telecom_churn.csv")
df = pd.read_csv('https://github.com/Yorko/mlcourse.ai/raw/main/data/telecom_churn.csv')

In [3]:
df.head()

| | State | Account length | Area code | International plan | Voice mail plan | Number vmail messages | Total day minutes | Total day calls | Total day charge | Total eve minutes | Total eve calls | Total eve charge | Total night minutes | Total night calls | Total night charge | Total intl minutes | Total intl calls | Total intl charge | Customer service calls | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KS | 128 | 415 | No | Yes | 25 | 265.1 | 110 | 45.07 | 197.4 | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.7 | 1 | False |
| 1 | OH | 107 | 415 | No | Yes | 26 | 161.6 | 123 | 27.47 | 195.5 | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.7 | 1 | False |
| 2 | NJ | 137 | 415 | No | No | 0 | 243.4 | 114 | 41.38 | 121.2 | 110 | 10.3 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | False |
| 3 | OH | 84 | 408 | Yes | No | 0 | 299.4 | 71 | 50.9 | 61.9 | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | False |
| 4 | OK | 75 | 415 | Yes | No | 0 | 166.7 | 113 | 28.34 | 148.3 | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | False |


**We simply enumerate the states, and make the International plan (international roaming enabled) and Voice mail plan (voicemail enabled) features and the Churn target binary.**

In [4]:
state_enc = LabelEncoder()
df["State"] = state_enc.fit_transform(df["State"])
df["International plan"] = (df["International plan"] == "Yes").astype("int")
df["Voice mail plan"] = (df["Voice mail plan"] == "Yes").astype("int")
df["Churn"] = (df["Churn"]).astype("int")
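
To make this encoding step concrete, here is a minimal pure-Python sketch of the same idea on a few toy values. The helper `label_encode` and the sample lists are hypothetical and only for illustration; assigning codes in sorted order is meant to mirror how `LabelEncoder` orders its classes.

```python
def label_encode(values):
    """Map each distinct value to an integer code, assigned in sorted order."""
    classes = sorted(set(values))
    code_of = {value: code for code, value in enumerate(classes)}
    return [code_of[value] for value in values], classes


# Toy data (hypothetical), shaped like the State and International plan columns
states = ["KS", "OH", "NJ", "OH", "OK"]
intl_plan = ["No", "No", "No", "Yes", "Yes"]

state_codes, state_classes = label_encode(states)
# Yes/No strings become 1/0, as in the cell above
intl_plan_binary = [int(value == "Yes") for value in intl_plan]

print(state_codes)       # [0, 2, 1, 2, 3]
print(state_classes)     # ['KS', 'NJ', 'OH', 'OK']
print(intl_plan_binary)  # [0, 0, 0, 1, 1]
```

Note that integer codes impose an arbitrary order on the states. Tree-based models usually tolerate this, but it is a simplification.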

**Split the data into training and test sets in a 7:3 ratio, and create the corresponding DMatrix objects.**

In [5]:
X_train, X_test, y_train, y_test = train_test_split(
    df.drop("Churn", axis=1), df["Churn"], test_size=0.3, random_state=42
)
dtrain = xgb.DMatrix(X_train, y_train)
dtest = xgb.DMatrix(X_test, y_test)

**Set the Xgboost parameters.**

In [6]:
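# Note: recent Xgboost versions ignore `silent` (hence the "not used" warning
# in the outputs); `verbosity` is its replacement.
params = {"objective": "binary:logistic", "max_depth": 3, "silent": 1, "eta": 0.5}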

num_rounds = 10

**We will track model quality on both the training set and the held-out (test) set.**

In [7]:
watchlist = [(dtest, "test"), (dtrain, "train")]

## Using built-in metrics
Xgboost implements most of the popular metrics for classification, regression, and ranking:

- `rmse` - [root mean square error](https://www.wikiwand.com/en/Root-mean-square_deviation)
- `mae` - [mean absolute error](https://en.wikipedia.org/wiki/Mean_absolute_error?oldformat=true)
- `logloss` - [negative log-likelihood](https://en.wikipedia.org/wiki/Likelihood_function?oldformat=true)
- `error` - binary classification error rate (the default for `binary:logistic` before Xgboost 1.3; newer versions default to `logloss`, as the output below shows)
- `merror` - multiclass classification error rate
- `auc` - [area under curve](https://en.wikipedia.org/wiki/Receiver_operating_characteristic?oldformat=true)
- `ndcg` - [normalized discounted cumulative gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain?oldformat=true)
- `map` - [mean average precision](https://en.wikipedia.org/wiki/Information_retrieval?oldformat=true)
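
To show what three of these metrics measure, here is a small pure-Python sketch that computes `error`, `logloss`, and `auc` by hand on four toy predictions. The function names and the data are hypothetical. It illustrates the definitions only and is not how Xgboost implements them internally.

```python
import math


def error_rate(labels, probs, threshold=0.5):
    """Share of objects whose thresholded prediction differs from the label."""
    wrong = sum((p > threshold) != bool(y) for y, p in zip(labels, probs))
    return wrong / len(labels)


def log_loss(labels, probs):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(labels, probs):
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


def roc_auc(labels, probs):
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    score = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                score += 1.0
            elif p_pos == p_neg:
                score += 0.5
    return score / (len(pos) * len(neg))


# Toy data (hypothetical): true labels and predicted probabilities of class 1
labels = [0, 0, 1, 1]
probs = [0.1, 0.6, 0.4, 0.9]

print(error_rate(labels, probs))  # 0.5
print(roc_auc(labels, probs))     # 0.75
print(log_loss(labels, probs))    # about 0.5108
```

The pairwise AUC loop is quadratic in the number of objects, so it is only suitable for small illustrations like this one.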

In [8]:
xgb_model = xgb.train(params, dtrain, num_rounds, watchlist)

Parameters: { "silent" } are not used.

[0]	test-logloss:0.31086	train-logloss:0.30749
[1]	test-logloss:0.25938	train-logloss:0.25928
[2]	test-logloss:0.22727	train-logloss:0.22436
[3]	test-logloss:0.20703	train-logloss:0.20584
[4]	test-logloss:0.19787	train-logloss:0.19319
[5]	test-logloss:0.18753	train-logloss:0.18212
[6]	test-logloss:0.17837	train-logloss:0.17186
[7]	test-logloss:0.17641	train-logloss:0.16887
[8]	test-logloss:0.17729	train-logloss:0.16287
[9]	test-logloss:0.17546	train-logloss:0.15867


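**To track logloss explicitly, just add it to the params dictionary. This Xgboost version already uses `logloss` by default for `binary:logistic`, so the output is unchanged.**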

In [9]:
params["eval_metric"] = "logloss"
xgb_model = xgb.train(params, dtrain, num_rounds, watchlist)

Parameters: { "silent" } are not used.



[0]	test-logloss:0.31086	train-logloss:0.30749
[1]	test-logloss:0.25938	train-logloss:0.25928
[2]	test-logloss:0.22727	train-logloss:0.22436
[3]	test-logloss:0.20703	train-logloss:0.20584
[4]	test-logloss:0.19787	train-logloss:0.19319
[5]	test-logloss:0.18753	train-logloss:0.18212
[6]	test-logloss:0.17837	train-logloss:0.17186
[7]	test-logloss:0.17641	train-logloss:0.16887
[8]	test-logloss:0.17729	train-logloss:0.16287
[9]	test-logloss:0.17546	train-logloss:0.15867


**Several metrics can be tracked at once.**

In [10]:
params["eval_metric"] = ["logloss", "auc"]
xgb_model = xgb.train(params, dtrain, num_rounds, watchlist)

[0]	test-logloss:0.31086	test-auc:0.83317	train-logloss:0.30749	train-auc:0.83325
[1]	test-logloss:0.25938	test-auc:0.90017	train-logloss:0.25928	train-auc:0.88749
[2]	test-logloss:0.22727	test-auc:0.90885	train-logloss:0.22436	train-auc:0.90003
[3]	test-logloss:0.20703	test-auc:0.91907	train-logloss:0.20584	train-auc:0.90734
[4]	test-logloss:0.19787	test-auc:0.92078	train-logloss:0.19319	train-auc:0.91043
[5]	test-logloss:0.18753	test-auc:0.92824	train-logloss:0.18212	train-auc:0.91802
[6]	test-logloss:0.17837	test-auc:0.93473	train-logloss:0.17186	train-auc:0.93259
[7]	test-logloss:0.17641	test-auc:0.93444	train-logloss:0.16887	train-auc:0.93352
[8]	test-logloss:0.17729	test-auc:0.92988	train-logloss:0.16287	train-auc:0.94616
[9]	test-logloss:0.17546	test-auc:0.93438	train-logloss:0.15867	train-auc:0.95481


Parameters: { "silent" } are not used.



## Creating a custom evaluation metric

**To create a custom metric, it is enough to define a function that takes 2 arguments, the vector of predicted probabilities and a `DMatrix` object with the true labels, and returns a (name, value) pair.  
In this example the function simply returns the number of objects the classifier got wrong when assigning class 1 whenever the predicted probability of class 1 exceeds the 0.5 threshold.
We then pass this function to xgb.train (the `feval` parameter; recent Xgboost versions deprecate it in favor of `custom_metric`). If lower values of the metric are better, additionally specify `maximize=False`.**


In [11]:
# custom evaluation metric
def misclassified(pred_probs, dmatrix):
    labels = dmatrix.get_label()  # obtain true labels
    preds = pred_probs > 0.5  # obtain predicted values
    return "misclassified", np.sum(labels != preds)
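
The two-argument signature can be exercised without training a model. The sketch below is pure Python. `FakeDMatrix` is a hypothetical stand-in that exposes only `get_label()`, and `make_error_rate` is an illustrative factory (not part of Xgboost) that builds a metric returning an error rate at a chosen threshold instead of a raw count.

```python
class FakeDMatrix:
    """Hypothetical stand-in for xgb.DMatrix that exposes only get_label()."""

    def __init__(self, labels):
        self._labels = list(labels)

    def get_label(self):
        return self._labels


def make_error_rate(threshold):
    """Build a metric with the (predictions, dmatrix) -> (name, value) signature."""

    def error_rate(pred_probs, dmatrix):
        labels = dmatrix.get_label()
        wrong = sum((p > threshold) != bool(y) for p, y in zip(pred_probs, labels))
        return "error_rate", wrong / len(labels)

    return error_rate


# Toy check (hypothetical data): with threshold 0.3, objects 2 and 4 are misclassified
metric = make_error_rate(0.3)
name, value = metric([0.2, 0.35, 0.8, 0.1], FakeDMatrix([0, 0, 1, 1]))
print(name, value)  # error_rate 0.5
```

Returning a rate rather than a count keeps the value comparable between the training and test sets, which differ in size.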

In [12]:
xgb_model = xgb.train(
    params, dtrain, num_rounds, watchlist, feval=misclassified, maximize=False
)

[0]	test-logloss:0.31086	test-auc:0.83317	test-misclassified:114.00000	train-logloss:0.30749	train-auc:0.83325	train-misclassified:277.00000
[1]	test-logloss:0.25938	test-auc:0.90017	test-misclassified:96.00000	train-logloss:0.25928	train-auc:0.88749	train-misclassified:222.00000
[2]	test-logloss:0.22727	test-auc:0.90885	test-misclassified:79.00000	train-logloss:0.22436	train-auc:0.90003	train-misclassified:165.00000
[3]	test-logloss:0.20703	test-auc:0.91907	test-misclassified:83.00000	train-logloss:0.20584	train-auc:0.90734	train-misclassified:163.00000
[4]	test-logloss:0.19787	test-auc:0.92078	test-misclassified:72.00000	train-logloss:0.19319	train-auc:0.91043	train-misclassified:127.00000
[5]	test-logloss:0.18753	test-auc:0.92824	test-misclassified:68.00000	train-logloss:0.18212	train-auc:0.91802	train-misclassified:120.00000
[6]	test-logloss:0.17837	test-auc:0.93473	test-misclassified:64.00000	train-logloss:0.17186	train-auc:0.93259	train-misclassified:110.00000
[7]	test-logloss:0.

Parameters: { "silent" } are not used.



**The evals_result parameter lets you store the metric values for every iteration.**

In [13]:
evals_result = {}
xgb_model = xgb.train(
    params,
    dtrain,
    num_rounds,
    watchlist,
    feval=misclassified,
    maximize=False,
    evals_result=evals_result,
)

[0]	test-logloss:0.31086	test-auc:0.83317	test-misclassified:114.00000	train-logloss:0.30749	train-auc:0.83325	train-misclassified:277.00000
[1]	test-logloss:0.25938	test-auc:0.90017	test-misclassified:96.00000	train-logloss:0.25928	train-auc:0.88749	train-misclassified:222.00000
[2]	test-logloss:0.22727	test-auc:0.90885	test-misclassified:79.00000	train-logloss:0.22436	train-auc:0.90003	train-misclassified:165.00000
[3]	test-logloss:0.20703	test-auc:0.91907	test-misclassified:83.00000	train-logloss:0.20584	train-auc:0.90734	train-misclassified:163.00000
[4]	test-logloss:0.19787	test-auc:0.92078	test-misclassified:72.00000	train-logloss:0.19319	train-auc:0.91043	train-misclassified:127.00000
[5]	test-logloss:0.18753	test-auc:0.92824	test-misclassified:68.00000	train-logloss:0.18212	train-auc:0.91802	train-misclassified:120.00000
[6]	test-logloss:0.17837	test-auc:0.93473	test-misclassified:64.00000	train-logloss:0.17186	train-auc:0.93259	train-misclassified:110.00000
[7]	test-logloss:0.

Parameters: { "silent" } are not used.



In [14]:
evals_result

{'test': OrderedDict([('logloss',
               [0.3108644165545702,
                0.2593764360100031,
                0.22726648916304112,
                0.20702744737640022,
                0.19786546132713556,
                0.1875348666496575,
                0.17836646699532865,
                0.17640563402697443,
                0.17728795894607902,
                0.17546417650673538]),
              ('auc',
               [0.8331674160145572,
                0.9001721732176808,
                0.9088461130468132,
                0.9190663478878182,
                0.920784000130558,
                0.9282380396732789,
                0.9347332947099575,
                0.9344436193911106,
                0.9298822531027898,
                0.9343783404460184]),
              ('misclassified',
               [114.0,
                96.0,
                79.0,
                83.0,
                72.0,
                68.0,
                64.0,
                66.0,
     

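## Early stopping
**Early stopping halts training if the error has not decreased for several iterations. Note that xgb.train monitors the last dataset in the watchlist, which here is `train`, so the stopping decision and the best score reported below refer to the training error.**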

In [15]:
params["eval_metric"] = "error"
num_rounds = 1500

xgb_model = xgb.train(params, dtrain, num_rounds, watchlist, early_stopping_rounds=10)

[0]	test-error:0.09700	train-error:0.09344
[1]	test-error:0.09300	train-error:0.09173
[2]	test-error:0.06800	train-error:0.06087
[3]	test-error:0.06100	train-error:0.04929
[4]	test-error:0.06100	train-error:0.04758
[5]	test-error:0.05900	train-error:0.04629
[6]	test-error:0.05200	train-error:0.03901
[7]	test-error:0.05100	train-error:0.03815
[8]	test-error:0.05300	train-error:0.03772
[9]	test-error:0.05500	train-error:0.03729
[10]	test-error:0.05000	train-error:0.03686
[11]	test-error:0.05100	train-error:0.03558
[12]	test-error:0.05000	train-error:0.03429
[13]	test-error:0.05100	train-error:0.03343
[14]	test-error:0.05300	train-error:0.03386
[15]	test-error:0.05100	train-error:0.03343
[16]	test-error:0.05100	train-error:0.03215
[17]	test-error:0.05100	train-error:0.03129
[18]	test-error:0.04900	train-error:0.03129
[19]	test-error:0.04900	train-error:0.03172
[20]	test-error:0.05000	train-error:0.03043
[21]	test-error:0.05000	train-error:0.03000
[22]	test-error:0.04700	train-error:0.0295

Parameters: { "silent" } are not used.



[29]	test-error:0.04400	train-error:0.02915
[30]	test-error:0.04900	train-error:0.02786
[31]	test-error:0.04300	train-error:0.02743
[32]	test-error:0.04300	train-error:0.02615
[33]	test-error:0.04000	train-error:0.02615
[34]	test-error:0.04100	train-error:0.02615
[35]	test-error:0.04100	train-error:0.02486
[36]	test-error:0.04100	train-error:0.02443
[37]	test-error:0.04000	train-error:0.02486
[38]	test-error:0.04300	train-error:0.02443
[39]	test-error:0.04100	train-error:0.02400
[40]	test-error:0.04100	train-error:0.02315
[41]	test-error:0.04400	train-error:0.02229
[42]	test-error:0.04600	train-error:0.02272
[43]	test-error:0.04600	train-error:0.02272
[44]	test-error:0.04500	train-error:0.02143
[45]	test-error:0.04300	train-error:0.02057
[46]	test-error:0.04300	train-error:0.02057
[47]	test-error:0.04100	train-error:0.01972
[48]	test-error:0.04300	train-error:0.01886
[49]	test-error:0.04500	train-error:0.01843
[50]	test-error:0.04700	train-error:0.01929
[51]	test-error:0.04700	train-er

In [16]:
print("Booster best train score: {}".format(xgb_model.best_score))
print("Booster best iteration: {}".format(xgb_model.best_iteration))

Booster best train score: 0.00085726532361766
Booster best iteration: 124
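
The stopping rule itself is easy to express in a few lines. The sketch below is a simplified pure-Python illustration, not Xgboost's actual implementation. The function `early_stopping` and its `patience` argument are hypothetical names, and the scores are the first ten `test-error` values from the log above.

```python
def early_stopping(scores, patience, maximize=False):
    """Return (stop_iteration, best_iteration, best_score).

    Stops once `patience` consecutive iterations fail to strictly improve
    on the best score seen so far.
    """
    best_score, best_iter = None, -1
    for i, score in enumerate(scores):
        if best_score is None:
            improved = True
        elif maximize:
            improved = score > best_score
        else:
            improved = score < best_score
        if improved:
            best_score, best_iter = score, i
        elif i - best_iter >= patience:
            return i, best_iter, best_score
    # The patience was never exhausted: training ran to the last iteration
    return len(scores) - 1, best_iter, best_score


test_errors = [0.097, 0.093, 0.068, 0.061, 0.061, 0.059, 0.052, 0.051, 0.053, 0.055]
result = early_stopping(test_errors, patience=2)
print(result)  # (9, 7, 0.051)
```

With a patience of 2, the loop stops at iteration 9 because iterations 8 and 9 both fail to beat the error reached at iteration 7.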


## Cross-validation with Xgboost
**Let's demonstrate the xgboost.cv function.**

In [17]:
num_rounds = 10
hist = xgb.cv(params, dtrain, num_rounds, nfold=10, metrics={"error"}, seed=42)
hist

Parameters: { "silent" } are not used.



| | train-error-mean | train-error-std | test-error-mean | test-error-std |
|---|---|---|---|---|
| 0 | 0.101537 | 0.011926 | 0.109743 | 0.022838 |
| 1 | 0.085299 | 0.007342 | 0.093436 | 0.019202 |
| 2 | 0.066771 | 0.007422 | 0.080168 | 0.018137 |
| 3 | 0.053579 | 0.004082 | 0.068583 | 0.018388 |
| 4 | 0.05015 | 0.002998 | 0.066014 | 0.015371 |
| 5 | 0.046531 | 0.003738 | 0.063439 | 0.015666 |
| 6 | 0.044483 | 0.003712 | 0.060867 | 0.015442 |
| 7 | 0.041482 | 0.002837 | 0.060443 | 0.015049 |
| 8 | 0.039482 | 0.002925 | 0.056152 | 0.013194 |
| 9 | 0.038815 | 0.002614 | 0.057437 | 0.013835 |


Notes:

- by default the output is a DataFrame (controlled by the `as_pandas` parameter),
- metrics are passed as a parameter (several can be given),
- custom metrics can be used as well (the `feval` and `maximize` parameters),
- early stopping is also available (`early_stopping_rounds`).
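
The `-mean` and `-std` columns in the table above summarize the per-fold metric values at each boosting round. Here is a minimal pure-Python sketch of that aggregation. The fold errors are made up, the helper `aggregate_folds` is hypothetical, and using the population standard deviation is an assumption rather than something this notebook confirms.

```python
import statistics


def aggregate_folds(fold_scores):
    """fold_scores[k][i] is the metric of fold k at boosting round i.

    Returns one (mean, std) pair per boosting round.
    """
    rows = []
    for round_scores in zip(*fold_scores):
        mean = statistics.mean(round_scores)
        std = statistics.pstdev(round_scores)  # assumption: population std (ddof=0)
        rows.append((mean, std))
    return rows


# Made-up test errors for 3 folds over 3 boosting rounds
fold_errors = [
    [0.10, 0.08, 0.06],
    [0.12, 0.10, 0.06],
    [0.11, 0.09, 0.09],
]

rows = aggregate_folds(fold_errors)
for i, (mean, std) in enumerate(rows):
    print(i, round(mean, 4), round(std, 4))
```

A large standard deviation relative to the mean suggests that the estimate depends heavily on how the folds were drawn.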