The Ulitmate Fighter Championship (UFC) is an America Mixed Martial Arts promotion at the frontier of the sport. Since 1993, The UFC has hosted over 500 events, each of which has up to 20 fights. The fights are made and announced by matchemakers and the fighters who are all members of the UFC roster. In the roster there are 8 weight classes for men and 4 weight classes for women, each weight class consisting of approximately 50 fighters. Fighters signed to the UFC typically have made a name for themselves in other combat sports or are elite prospects from smaller or regional MMA promotions, so the UFC has been able to boast the best of the best during its existance.

On a given fight night, the fights are made of about 5 fights on the undercard and 5 fights on the main card. The undercard typically consists of new signees and low ranked fights while the main card will include 3-4 mid-ranked fighters with 1-2 high profile bouts, also known as the main event. There is no restriction on which weight class must have a fight on card. Each of the fights consist of three 5 minute rounds (execpt the main event, which has five rounds) where the fight will continue until time runs out and the fight is determined by a decsion by judges or the ref determines that one fighter has ceased to inteligently defend themselves. The ref's decsion can come from several different situations incuding a submission, knockout, or technical knockout.



In [1]:
# Necessary libraries and imports to complete this tutorial
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import f
import seaborn as sns
from sklearn import model_selection
from sklearn import linear_model
from sklearn import preprocessing
from sklearn.preprocessing import LabelEncoder
import statsmodels.formula.api as smf
import warnings
warnings.filterwarnings('ignore')

READING DATA

In [2]:
data = pd.read_csv("data.csv")

data.head()

Unnamed: 0,R_fighter,B_fighter,Referee,date,location,Winner,title_bout,weight_class,no_of_rounds,B_current_lose_streak,...,R_win_by_KO/TKO,R_win_by_Submission,R_win_by_TKO_Doctor_Stoppage,R_wins,R_Stance,R_Height_cms,R_Reach_cms,R_Weight_lbs,B_age,R_age
0,Henry Cejudo,Marlon Moraes,Marc Goddard,2019-06-08,"Chicago, Illinois, USA",Red,True,Bantamweight,5,0.0,...,2.0,0.0,0.0,8.0,Orthodox,162.56,162.56,135.0,31.0,32.0
1,Valentina Shevchenko,Jessica Eye,Robert Madrigal,2019-06-08,"Chicago, Illinois, USA",Red,True,Women's Flyweight,5,0.0,...,0.0,2.0,0.0,5.0,Southpaw,165.1,167.64,125.0,32.0,31.0
2,Tony Ferguson,Donald Cerrone,Dan Miragliotta,2019-06-08,"Chicago, Illinois, USA",Red,False,Lightweight,3,0.0,...,3.0,6.0,1.0,14.0,Orthodox,180.34,193.04,155.0,36.0,35.0
3,Jimmie Rivera,Petr Yan,Kevin MacDonald,2019-06-08,"Chicago, Illinois, USA",Blue,False,Bantamweight,3,0.0,...,1.0,0.0,0.0,6.0,Orthodox,162.56,172.72,135.0,26.0,29.0
4,Tai Tuivasa,Blagoy Ivanov,Dan Miragliotta,2019-06-08,"Chicago, Illinois, USA",Blue,False,Heavyweight,3,0.0,...,2.0,0.0,0.0,3.0,Southpaw,187.96,190.5,264.0,32.0,26.0


Fighters are represented by red and blue, by the color of their corners. In general, the red cornor is assigned to the side more notable and thus more favored to win. This is also observed in how the blue cornor, by custom, always walks out first. In title bouts, the current champion is always red and walks out second. Each row is a compilation of both fighter stats up to the the current fight. The columns contain complied average stats of all the fights previously. 
*** ADD INFO ON COLUMN LABEL DESCRIPTION ***
The data consists of logistics about the fight and data specific to each fighter. The data pertaining to the red fighter is prefixed with 'R_' and for the blue fighter 'B_'. The 'opp_' tag indicates data done by the opponent on the fighter. The target variable is 'Winner' which is the only column that tells you the outcome of the fight.

Since there are well over 100 columns of data, we will select a sample of the columns to explore. 

TIDYING AND SELECTING DATA

# PICK COLUMNS, MAKE SMALLER DATA FRAMES FOR EDA

Immediatly, we can see that depending on the cricumstances of the fight, red fighters have been favored over the years. Though this isn't a completely reliable metric in predicting fights, there is a variance in the favoring when adjusting the fight logistics. There are three types of fights: Title fights (always 5 rounds), 5 round non-title fights, and 3 round fights (always non-title). There are 5144 fights in the data set and the the five round fights make up 502 of them. The variance in red favoring is distinct in each of these three fight categories:

In [30]:
# Count fights by nunmber of rounds and title fight indicator.
title_df = data.loc[data['title_bout'] == True]
five_df = data.loc[(data['no_of_rounds'] == 5) & (data['title_bout'] == False)]
three_df = data.loc[data['no_of_rounds'] == 3]

print('The number of title fights: ', title_df.shape[0])
print('The number of five round non-title fights: ', five_df.shape[0])
print('The number three round fights: ', three_df.shape[0])

print('Probability red fighter wins in title fight: ', title_df[title_df['Winner'] == 'Red'].shape[0]/title_df.shape[0])
print('Probability red fighter wins in 5 round non-title fight: ', five_df[five_df['Winner'] == 'Red'].shape[0]/five_df.shape[0])
print('Probability red fighter wins in a 3 round fight: ', three_df[three_df['Winner'] == 'Red'].shape[0]/three_df.shape[0])

The number of title fights:  335
The number of five round non-title fights:  177
The number three round fights:  4523
Probability red fighter wins in title fight:  0.8029850746268656
Probability red fighter wins in 5 round non-title fight:  0.5536723163841808
Probability red fighter wins in a 3 round fight:  0.6615078487729383


Since the data is averaged based on each fight, it is important to include the statstics of the opponent ('opp') historical data since not all fights are the same length. The data includes many columns specifying types of attacks landed and attempted and is seperated in to grappling and striking sections. The striking data has averages Body, Clinch, Distance, Ground, Head, Leg, Knockdowns, and Significant Strikes and grappling data for Pass, Reverasals, Submissions, and Takedowns. A visual representation of some of these can be found here: https://www.theguardian.com/sport/ng-interactive/2016/jul/09/mixed-martial-arts-fighting-techniques-guide-ufc#:~:text=Boxing%2C%20Kickboxing%20and%20Muay%20Thai,and%20legs%20to%20throw%20strikes.

To arrange the data, we will seperate the specific types of attacks from the totals. For matters of a different perspective, we will also create a dataframe that includes non-fight related metrics such as age, height, reach, stance, etc. Lastly, we will create two more dataframes storing the specifcs of the striking and grappling data.

In [45]:
metrics_df = data[['R_fighter','B_fighter','Winner','B_Stance','B_Height_cms','B_Reach_cms','B_age','B_wins','B_losses','R_Stance','R_Height_cms','R_Reach_cms','R_age','R_wins','R_losses']]
metrics_df.head()   

Unnamed: 0,R_fighter,B_fighter,Winner,B_Stance,B_Height_cms,B_Reach_cms,B_age,B_wins,B_losses,R_Stance,R_Height_cms,R_Reach_cms,R_age,R_wins,R_losses
0,Henry Cejudo,Marlon Moraes,Red,Orthodox,167.64,170.18,31.0,4.0,1.0,Orthodox,162.56,162.56,32.0,8.0,2.0
1,Valentina Shevchenko,Jessica Eye,Red,Orthodox,167.64,167.64,32.0,4.0,6.0,Southpaw,165.1,167.64,31.0,5.0,2.0
2,Tony Ferguson,Donald Cerrone,Red,Orthodox,185.42,185.42,36.0,23.0,8.0,Orthodox,180.34,193.04,35.0,14.0,1.0
3,Jimmie Rivera,Petr Yan,Blue,Switch,170.18,170.18,26.0,4.0,0.0,Orthodox,162.56,172.72,29.0,6.0,2.0
4,Tai Tuivasa,Blagoy Ivanov,Blue,Southpaw,180.34,185.42,32.0,1.0,1.0,Southpaw,187.96,190.5,26.0,3.0,1.0


In [44]:
totals_df = data[['R_fighter','B_fighter','weight_class','Winner','B_avg_TOTAL_STR_att','B_avg_TOTAL_STR_landed','B_avg_opp_TOTAL_STR_att','B_avg_opp_TOTAL_STR_landed','B_avg_TOTAL_STR_att','R_avg_TOTAL_STR_landed','R_avg_opp_TOTAL_STR_att','R_avg_opp_TOTAL_STR_landed']]
totals_df.head()

Unnamed: 0,R_fighter,B_fighter,weight_class,Winner,B_avg_TOTAL_STR_att,B_avg_TOTAL_STR_landed,B_avg_opp_TOTAL_STR_att,B_avg_opp_TOTAL_STR_landed,B_avg_TOTAL_STR_att.1,R_avg_TOTAL_STR_landed,R_avg_opp_TOTAL_STR_att,R_avg_opp_TOTAL_STR_landed
0,Henry Cejudo,Marlon Moraes,Bantamweight,Red,66.4,23.6,53.8,19.2,66.4,69.1,110.5,43.3
1,Valentina Shevchenko,Jessica Eye,Women's Flyweight,Red,158.7,69.6,151.5,75.4,158.7,102.857143,158.142857,82.285714
2,Tony Ferguson,Donald Cerrone,Lightweight,Red,103.709677,52.548387,100.387097,49.774194,103.709677,63.4,102.133333,38.6
3,Jimmie Rivera,Petr Yan,Bantamweight,Blue,154.75,86.75,104.75,34.25,154.75,50.75,115.125,48.875
4,Tai Tuivasa,Blagoy Ivanov,Heavyweight,Blue,204.0,62.0,205.5,90.0,204.0,32.75,60.5,27.75


In [42]:
grappling_df = data[['R_fighter','B_fighter','weight_class','Winner','R_avg_PASS','R_avg_REV','R_avg_SUB_ATT','R_avg_TD_att','R_avg_TD_landed','R_avg_TD_pct','B_avg_PASS','B_avg_REV','B_avg_SUB_ATT','B_avg_TD_att','B_avg_TD_landed','B_avg_TD_pct']]
grappling_df.head()

Unnamed: 0,R_fighter,B_fighter,weight_class,Winner,R_avg_PASS,R_avg_REV,R_avg_SUB_ATT,R_avg_TD_att,R_avg_TD_landed,R_avg_TD_pct,B_avg_PASS,B_avg_REV,B_avg_SUB_ATT,B_avg_TD_att,B_avg_TD_landed,B_avg_TD_pct
0,Henry Cejudo,Marlon Moraes,Bantamweight,Red,1.2,0.0,0.1,5.3,1.9,0.458,0.4,0.0,0.4,0.8,0.2,0.1
1,Valentina Shevchenko,Jessica Eye,Women's Flyweight,Red,1.714286,0.142857,0.428571,5.142857,2.428571,0.601429,0.8,0.0,0.7,1.0,0.5,0.225
2,Tony Ferguson,Donald Cerrone,Lightweight,Red,0.333333,0.133333,1.0,0.933333,0.4,0.277333,0.935484,0.096774,0.354839,2.16129,0.677419,0.295484
3,Jimmie Rivera,Petr Yan,Bantamweight,Blue,0.125,0.0,0.0,2.25,0.625,0.10375,0.5,0.25,0.25,2.5,1.25,0.2875
4,Tai Tuivasa,Blagoy Ivanov,Heavyweight,Blue,0.25,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [43]:
striking_df = data[['R_fighter','B_fighter','weight_class','Winner','R_avg_BODY_att','R_avg_BODY_landed','R_avg_CLINCH_att','R_avg_CLINCH_landed','R_avg_DISTANCE_att','R_avg_DISTANCE_landed','R_avg_GROUND_att','R_avg_GROUND_landed','R_avg_HEAD_att','R_avg_HEAD_landed','R_avg_LEG_att','R_avg_LEG_landed','R_avg_KD','B_avg_BODY_att','B_avg_BODY_landed','B_avg_CLINCH_att','B_avg_CLINCH_landed','B_avg_DISTANCE_att','B_avg_DISTANCE_landed','B_avg_GROUND_att','B_avg_GROUND_landed','B_avg_HEAD_att','B_avg_HEAD_landed','B_avg_LEG_att','B_avg_LEG_landed','B_avg_KD']]
striking_df.head()

Unnamed: 0,R_fighter,B_fighter,weight_class,Winner,R_avg_BODY_att,R_avg_BODY_landed,R_avg_CLINCH_att,R_avg_CLINCH_landed,R_avg_DISTANCE_att,R_avg_DISTANCE_landed,...,B_avg_CLINCH_landed,B_avg_DISTANCE_att,B_avg_DISTANCE_landed,B_avg_GROUND_att,B_avg_GROUND_landed,B_avg_HEAD_att,B_avg_HEAD_landed,B_avg_KD,B_avg_LEG_att,B_avg_LEG_landed
0,Henry Cejudo,Marlon Moraes,Bantamweight,Red,21.9,16.4,17.0,11.0,75.0,26.5,...,0.0,62.6,20.6,2.6,2.0,48.6,11.2,0.8,7.6,5.4
1,Valentina Shevchenko,Jessica Eye,Women's Flyweight,Red,12.0,7.714286,9.285714,6.857143,88.142857,36.142857,...,7.3,124.7,42.1,2.4,1.9,112.0,32.0,0.0,12.3,10.2
2,Tony Ferguson,Donald Cerrone,Lightweight,Red,13.866667,8.666667,2.866667,1.733333,116.133333,49.466667,...,4.387097,84.741935,38.580645,5.516129,3.806452,67.645161,23.258065,0.645161,14.0,12.193548
3,Jimmie Rivera,Petr Yan,Bantamweight,Blue,18.25,10.25,5.875,4.125,104.875,41.0,...,11.0,109.5,48.75,13.0,10.5,116.25,53.75,0.5,3.0,2.5
4,Tai Tuivasa,Blagoy Ivanov,Heavyweight,Blue,7.75,6.75,11.0,7.25,50.75,24.75,...,2.0,201.0,59.5,0.0,0.0,184.5,45.0,0.0,2.0,2.0


EDA