# Predicting the Winning UFC Fighter
## Modeling
### Intergalactic Hackathon 2022
Team "Meldonium" <br/>
Members: Vyacheslav Barkov, Pavel Mamaev, Sergey Glukhovsky, Alexey Nedolivko, Andrey Rem, Ivan Ershov


In [None]:
!pip install autogluon==0.4.1b20220516 &> /dev/null

In [None]:
import pandas as pd
import numpy as np
import pickle
from autogluon.tabular import TabularDataset, TabularPredictor

In [None]:
path = 'events_df.bin'
with open(path, 'rb') as f:
    data = pickle.load(f)

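Sort the loaded data by event date and drop what we do not need: the event date and odds columns, plus every bout where `f1_n_fights` or `f2_n_fights` equals 1. The last 900 rows are also set aside with their odds columns intact.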

In [None]:
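data = data.sort_values(by=['eventDate.date'])
# Filter rows before slicing so that test_with_odds matches the test split row for row
mask = (data['f1_n_fights'] != 1) & (data['f2_n_fights'] != 1)
data = data[mask]
# Keep the last 900 bouts together with their odds for later work with the predictions
test_with_odds = data.iloc[-900:].reset_index(drop=True)
# Drop the date and the odds so they are not used as features
data = data.drop(columns=['eventDate.date', 'f1_odds', 'f2_odds'])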

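Split the data chronologically: the most recent 900 bouts form the test set, the 900 before them form the validation set, and everything earlier is used for training.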

In [None]:
train = data.iloc[:-1800].reset_index(drop=True)
val = data.iloc[-1800:-900].reset_index(drop=True)
test = data.iloc[-900:].reset_index(drop=True)
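The same time-ordered split can be sketched as a reusable helper. This is a minimal illustration on a toy frame, not the notebook's data; the function name `chronological_split` and the column name `event_date` are hypothetical.

```python
import pandas as pd


def chronological_split(df, date_col, n_val, n_test):
    """Sort by date, then cut the newest rows off as validation and test sets.

    Assumes n_val > 0 and n_test > 0 (a negative-zero slice bound would
    otherwise return the wrong rows).
    """
    df = df.sort_values(by=date_col).reset_index(drop=True)
    train = df.iloc[:-(n_val + n_test)].reset_index(drop=True)
    val = df.iloc[-(n_val + n_test):-n_test].reset_index(drop=True)
    test = df.iloc[-n_test:].reset_index(drop=True)
    return train, val, test


# Toy data: ten daily "events", deliberately stored in reverse order
toy = pd.DataFrame({
    "event_date": pd.date_range("2020-01-01", periods=10, freq="D")[::-1],
    "winner": [True, False] * 5,
})
train, val, test = chronological_split(toy, "event_date", n_val=2, n_test=3)
```

Because the newest rows always end up in `val` and `test`, the model is never evaluated on bouts that happened before the ones it was trained on.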

We build the model with the AutoGluon framework. It automates model selection, hyperparameter choice, and ensembling, which should give us an accurate model without manual tuning.


In [None]:
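# Optimize for F1 on the boolean 'winner' label; models are saved under 'models/'
predictor = TabularPredictor(label='winner', path='models', eval_metric='f1')
presets = ['best_quality']
# use_bag_holdout=True lets the bagged models be scored on the separate tuning_data
predictor.fit(train, tuning_data=val, presets=presets, use_bag_holdout=True)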

Presets specified: ['best_quality']
Beginning AutoGluon training ...
AutoGluon will save models to "models/"
AutoGluon Version:  0.4.1b20220516
Python Version:     3.7.13
Operating System:   Linux
Train Data Rows:    4372
Train Data Columns: 42
Tuning Data Rows:    900
Tuning Data Columns: 42
Label Column: winner
Preprocessing data ...
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [False, True]
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Selected class <--> label mapping:  class 1 = True, class 0 = False
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    12144.96 MB
	Train Data (Original)  Memory Usage: 1.77 MB (0.0% of available memory)
	Inferring data type of each featur

<autogluon.tabular.predictor.predictor.TabularPredictor at 0x7f84e7e31450>
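The predictor above is configured with `eval_metric='f1'`, and the log shows that `True` is treated as class 1. As a reminder of what that metric measures, here is a minimal pure-Python sketch of F1 for boolean labels. It is an illustration only, not AutoGluon's own implementation, and the toy labels are made up.

```python
def f1_score(y_true, y_pred):
    """F1 for boolean labels, treating True as the positive class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))          # true positives
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))    # false positives
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))    # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)


# Toy example: 2 true positives, 1 false positive, 1 false negative
y_true = [True, True, True, False, False, False]
y_pred = [True, True, False, True, False, False]
score = f1_score(y_true, y_pred)
```

Since F1 ignores true negatives, the score depends on which fighter is encoded as the positive class.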

Let's evaluate the trained models on the held-out test set.

In [None]:
predictor.leaderboard(test, silent=True)

| | model | score_test | score_val | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LightGBMLarge_BAG_L1 | 0.68549 | 0.729904 | 0.159986 | 0.096905 | 55.483599 | 0.159986 | 0.096905 | 55.483599 | 1 | True | 13 |
| 1 | RandomForestEntr_BAG_L1 | 0.680031 | 0.738654 | 0.250252 | 0.352688 | 5.470192 | 0.250252 | 0.352688 | 5.470192 | 1 | True | 6 |
| 2 | RandomForestGini_BAG_L1 | 0.675969 | 0.732812 | 0.245542 | 0.401797 | 3.837518 | 0.245542 | 0.401797 | 3.837518 | 1 | True | 5 |
| 3 | WeightedEnsemble_L2 | 0.675139 | 0.751819 | 1.315067 | 1.46153 | 236.62106 | 0.005858 | 0.002931 | 2.260274 | 2 | True | 14 |
| 4 | ExtraTreesGini_BAG_L1 | 0.675112 | 0.720827 | 0.272422 | 0.379658 | 2.24666 | 0.272422 | 0.379658 | 2.24666 | 1 | True | 8 |
| 5 | LightGBM_BAG_L1 | 0.674437 | 0.736089 | 0.06947 | 0.052317 | 28.566517 | 0.06947 | 0.052317 | 28.566517 | 1 | True | 4 |
| 6 | ExtraTreesEntr_BAG_L1 | 0.673637 | 0.716925 | 0.270358 | 0.385585 | 1.855065 | 0.270358 | 0.385585 | 1.855065 | 1 | True | 9 |
| 7 | CatBoost_BAG_L1 | 0.673108 | 0.744072 | 0.039966 | 0.054156 | 51.915914 | 0.039966 | 0.054156 | 51.915914 | 1 | True | 7 |
| 8 | XGBoost_BAG_L1 | 0.672172 | 0.72989 | 0.159771 | 0.136088 | 60.032282 | 0.159771 | 0.136088 | 60.032282 | 1 | True | 11 |
| 9 | LightGBMXT_BAG_L1 | 0.668301 | 0.737553 | 0.190041 | 0.120842 | 32.370029 | 0.190041 | 0.120842 | 32.370029 | 1 | True | 13 |
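Every model in the leaderboard above scores lower on the test set than on the validation set. One way to make that gap explicit is sketched below; the scores are copied from the leaderboard, and the variable names are only illustrative.

```python
# (model, score_test, score_val) copied from the leaderboard above
leaderboard = [
    ("LightGBMLarge_BAG_L1", 0.68549, 0.729904),
    ("RandomForestEntr_BAG_L1", 0.680031, 0.738654),
    ("RandomForestGini_BAG_L1", 0.675969, 0.732812),
    ("WeightedEnsemble_L2", 0.675139, 0.751819),
    ("ExtraTreesGini_BAG_L1", 0.675112, 0.720827),
    ("LightGBM_BAG_L1", 0.674437, 0.736089),
    ("ExtraTreesEntr_BAG_L1", 0.673637, 0.716925),
    ("CatBoost_BAG_L1", 0.673108, 0.744072),
    ("XGBoost_BAG_L1", 0.672172, 0.72989),
    ("LightGBMXT_BAG_L1", 0.668301, 0.737553),
]

# Validation-minus-test F1 for each model
gaps = {
    model: round(score_val - score_test, 6)
    for model, score_test, score_val in leaderboard
}
largest_gap_model = max(gaps, key=gaps.get)

# Print the models from the largest gap to the smallest
for model, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:26s} {gap:.6f}")
```

A large gap suggests that the validation score is optimistic for that model, which is worth keeping in mind when choosing a model by its validation score alone.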


Next, let's check which features have the greatest effect on the model's test score.

In [None]:
predictor.feature_importance(test)

Computing feature importance via permutation shuffling for 42 features using 900 rows with 5 shuffle sets...
	777.1s	= Expected runtime (155.42s per shuffle set)
	340.62s	= Actual runtime (Completed 5 of 5 shuffle sets)


| feature | importance | stddev | p_value | n | p99_high | p99_low |
|---|---|---|---|---|---|---|
| f2_avg_fight_time | 0.040554 | 0.013611 | 0.001318 | 5 | 0.068579 | 0.012529 |
| f2_n_fights | 0.036265 | 0.004321 | 2.4e-05 | 5 | 0.045161 | 0.027368 |
| f1_n_fights | 0.033744 | 0.003915 | 2.1e-05 | 5 | 0.041805 | 0.025683 |
| f2_age | 0.011427 | 0.001503 | 3.5e-05 | 5 | 0.014523 | 0.008332 |
| f1_avg_fight_time | 0.011347 | 0.002809 | 0.000416 | 5 | 0.017131 | 0.005564 |
| f2_weight | 0.006649 | 0.002698 | 0.002646 | 5 | 0.012204 | 0.001094 |
| f2_win_ko_per_fight | 0.006125 | 0.002262 | 0.001877 | 5 | 0.010782 | 0.001468 |
| f1_height | 0.005922 | 0.004568 | 0.022089 | 5 | 0.015328 | -0.003484 |
| f1_win_ko_per_fight | 0.005563 | 0.002486 | 0.003735 | 5 | 0.010681 | 0.000444 |
| f2_submission_attempts_per_min | 0.005443 | 0.00332 | 0.010731 | 5 | 0.012279 | -0.001393 |
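The log above says the importances come from permutation shuffling with 5 shuffle sets. The idea can be sketched in pure Python as follows. This is a simplified illustration, not AutoGluon's implementation: it uses accuracy instead of F1, and the toy model, feature names (`experience_gap`, `noise`), and data are all hypothetical.

```python
import random
from statistics import mean


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, labels, feature, n_shuffles=5, seed=0):
    """Average drop in score after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(labels, [predict(r) for r in rows])
    drops = []
    for _ in range(n_shuffles):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        # Replace only the chosen feature; every other column stays intact
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(labels, [predict(r) for r in permuted]))
    return mean(drops)


def toy_predict(row):
    # Hypothetical "model": it looks at a single feature and ignores the rest
    return row["experience_gap"] > 0


rows = [
    {"experience_gap": g, "noise": n}
    for g, n in zip([-3, -2, -1, 1, 2, 3, -4, 4], [5, 1, 4, 2, 8, 3, 7, 6])
]
labels = [toy_predict(r) for r in rows]  # labels follow the rule exactly

imp_signal = permutation_importance(toy_predict, rows, labels, "experience_gap")
imp_noise = permutation_importance(toy_predict, rows, labels, "noise")
```

Shuffling a feature the model relies on lowers the score, while shuffling an ignored feature leaves it unchanged, which is why importances near zero (or with a negative `p99_low`) in the table above should be read with caution.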


Save the predictions of the model with the best test score, `LightGBMLarge_BAG_L1`, for further work with them.

In [None]:
test_with_odds['prediction'] = predictor.predict(test, model='LightGBMLarge_BAG_L1')

In [None]:
path = 'predictions.bin'
with open(path, 'wb') as f:
    pickle.dump(test_with_odds, f)