<h1>A UFC Fight Predictor</h1>
<p>By Jawad Amin</p>

<p>In this notebook, I will be using a dataset containing information about UFC events from 21/3/2010 to 02/10/2021.</p>

<p>Before applying any machine learning methods to the dataset, we will first explore and clean the data.</p>
 
   <h2>Limitations</h2>
   <ul><li>Fight results depend on many factors that cannot be measured or obtained, such as private circumstances
            affecting a particular fighter on the day of a fight.</li>
        <li>Controversial decisions, and fights where many would argue an early stoppage occurred, cannot be identified from this dataset alone and are therefore not considered in my research.</li>
        </ul>

We first import the pandas library so that the .csv file can be loaded into a DataFrame.

In [1]:
import pandas as pd

<h2>Cleaning the dataset</h2>
<p>In the following cells, I remove columns that I believe are irrelevant to predicting the outcome of a fight, such as whether the fight was a title bout, and I also remove fights before 8/9/2018. I wanted relatively modern fights in my dataset because the MMA ruleset is constantly evolving.</p>

<p>I mapped the Winner column from strings to numerical values (Red = 1, Blue = 0) so that it can be used as the target of my model.</p>
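<p>To make the encoding step concrete, here is a minimal sketch on a made-up frame (the rows are illustrative, not taken from ufc-master.csv). It compares a binary mapping, which is what the Winner cell further down does, with one-hot encoding via pd.get_dummies.</p>

```python
import pandas as pd

# Made-up rows for illustration only; not taken from ufc-master.csv.
toy = pd.DataFrame({"Winner": ["Red", "Blue", "Red"]})

# Binary mapping: a single integer column, suitable as a classification target.
toy["Winner_binary"] = toy["Winner"].map({"Red": 1, "Blue": 0})

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(toy["Winner"], prefix="Winner", dtype=int)

print(toy)
print(one_hot)
```

<p>For a two-class target, the single mapped column carries the same information as the pair of indicator columns.</p>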

In [2]:
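df = pd.read_csv('ufc-master.csv')
# Drop columns judged irrelevant to predicting the winner.
df = df.drop(['date', 'location', 'country', 'title_bout',
              'R_ev', 'B_ev', 'B_odds', 'total_title_bout_dif'], axis=1)
# Replace missing values with 0.
df = df.fillna(0)
# Drop the remaining unused columns by position. Each slice refers to the
# column layout left by the previous drop, so the order matters.
df = df.drop(columns=df.columns[72:100])
df = df.drop(columns=df.columns[69:])
df = df.drop(columns=df.columns[4:58])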

In [3]:
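# Keep the first 1500 rows. The rows appear to be ordered newest first,
# so this keeps the most recent fights.
df = df.iloc[:1500]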
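<p>The positional slice above only removes older fights if the file is ordered newest first. As an alternative, the sketch below filters on the date explicitly. It uses a made-up frame, and the day-first date format is an assumption about the CSV. In this notebook the filter would have to run before the date column is dropped.</p>

```python
import pandas as pd

# Made-up rows; the day-first date format is an assumption about the CSV.
toy = pd.DataFrame({
    "date": ["02/10/2021", "08/09/2018", "01/09/2018", "21/03/2010"],
    "R_fighter": ["A", "B", "C", "D"],
})

# Parse the strings into timestamps so they compare chronologically.
toy["date"] = pd.to_datetime(toy["date"], format="%d/%m/%Y")
cutoff = pd.Timestamp(year=2018, month=9, day=8)

# Keep fights on or after the cutoff, regardless of row order.
recent = toy[toy["date"] >= cutoff].reset_index(drop=True)
print(recent)
```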

In [4]:
pd.set_option('display.max_columns', None)

In [5]:
winner_map = {'Red': 1, 'Blue': 0}
df['Winner'] = df['Winner'].map(winner_map)

In [6]:
df

Unnamed: 0,R_fighter,B_fighter,R_odds,Winner,win_dif,loss_dif,total_round_dif,ko_dif,sub_dif,height_dif,reach_dif,age_dif,sig_str_dif,avg_sub_att_dif,avg_td_dif
0,Thiago Santos,Johnny Walker,-150.0,1,-8,-6,-32,-7,0,10.16,15.24,-8,-0.530000,0.600000,-0.370000
1,Alex Oliveira,Niko Price,170.0,0,-5,-3,-20,0,-1,2.54,0.00,-1,2.190000,0.300000,-1.480000
2,Misha Cirkunov,Krzysztof Jotko,110.0,0,3,1,25,0,-5,-5.08,0.00,-2,-0.850000,-1.600000,-3.330000
3,Alexander Hernandez,Mike Breeden,-675.0,1,-4,-2,-12,-2,0,2.54,-5.08,3,0.250000,0.000000,-1.570000
4,Joe Solecki,Jared Gordon,-135.0,0,1,3,11,1,-2,0.00,-5.08,5,2.580000,-0.600000,-0.310000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1495,Aljamain Sterling,Cody Stamann,-170.0,1,-4,3,-18,-1,-2,-2.54,-17.78,1,35.700000,-0.700000,2.100000
1496,Geoff Neal,Frank Camacho,-210.0,1,0,-1,5,0,-1,-2.54,-5.08,-1,116.000000,-1.000000,2.000000
1497,Charles Byrd,Darren Stewart,-170.0,0,0,-4,8,1,-1,5.08,2.54,7,11.200000,-1.000000,0.200000
1498,Diego Sanchez,Craig White,175.0,1,-16,10,-69,-5,0,10.16,10.16,8,-27.296296,-0.666667,-1.037037


In [7]:
df.dtypes


R_fighter           object
B_fighter           object
R_odds             float64
Winner               int64
win_dif              int64
loss_dif             int64
total_round_dif      int64
ko_dif               int64
sub_dif              int64
height_dif         float64
reach_dif          float64
age_dif              int64
sig_str_dif        float64
avg_sub_att_dif    float64
avg_td_dif         float64
dtype: object

In [22]:
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=50, min_samples_split=10,
                            random_state=1)
# Train on the later rows and test on the earlier rows.
# Note: these slices leave row 749 in neither set.
train = df.iloc[750:1500]
test = df.iloc[:749]
predictors = ['R_odds', 'win_dif', 'loss_dif', 'total_round_dif',
              'ko_dif', 'sub_dif', 'height_dif', 'reach_dif', 'age_dif',
              'sig_str_dif', 'avg_sub_att_dif', 'avg_td_dif']
rf.fit(train[predictors], train["Winner"])
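<p>The slices iloc[750:1500] and iloc[:749] in the cell above leave row 749 in neither set. A minimal sketch of a positional split with no gap is shown below. The helper name split_by_position is hypothetical, and the toy frame is made up. Changing the split in the notebook would also change the results reported afterwards.</p>

```python
import pandas as pd

def split_by_position(frame, boundary):
    """Hypothetical helper: rows before `boundary` become the test set,
    rows from `boundary` onward become the training set."""
    test = frame.iloc[:boundary]
    train = frame.iloc[boundary:]
    return train, test

# Made-up frame standing in for the fight data.
toy = pd.DataFrame({"x": range(10)})
train_toy, test_toy = split_by_position(toy, 5)

# Every row lands in exactly one split.
print(len(train_toy), len(test_toy))
```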



In [23]:
from sklearn.metrics import accuracy_score

# Predict on the held-out rows and compare against the actual winners.
preds = rf.predict(test[predictors])
acc = accuracy_score(test["Winner"], preds)

In [24]:
acc

0.6288384512683578

In [25]:
combined = pd.DataFrame(dict(actual=test["Winner"],prediction=preds))

In [26]:
pd.crosstab(index=combined["actual"],columns=combined["prediction"])

actual \ prediction,0,1
0,133,174
1,104,338


In [27]:
from sklearn.metrics import precision_score
precision_score(test["Winner"],preds)

0.66015625
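<p>The accuracy and precision reported above can be checked by hand from the crosstab counts. The sketch below does that arithmetic and also computes two figures the notebook does not report: recall for Red wins, and the accuracy of always predicting Red, as a simple point of comparison.</p>

```python
# Counts copied from the crosstab above (rows = actual, columns = predicted).
tn, fp = 133, 174   # actual Blue (0): predicted Blue, predicted Red
fn, tp = 104, 338   # actual Red (1): predicted Blue, predicted Red

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
# Accuracy of always predicting Red, the more frequent class in the test set.
always_red = (tp + fn) / total

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} always_red={always_red:.4f}")
```

<p>On these counts the model's accuracy (about 0.629) is roughly four percentage points above the always-Red figure (about 0.590).</p>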
