# Predicting UFC Fights With Supervised Learning
Adam Pfister - Oregon, USA - November 7, 2019  

This project focuses on UFC fight prediction using supervised learning models. The data comes from Kaggle (https://www.kaggle.com/rajeevw/ufcdata). A big thank you to the originator of this data, Rajeev Warrier. It is detailed and well put-together with zero missing data.  

Below in quotes is info about the two original datasets directly from its Kaggle page:  
 
" This is a list of every UFC fight in the history of the organisation. Every row contains information about both fighters, fight details and the winner. The data was scraped from ufcstats website. After fightmetric ceased to exist, this came into picture. I saw that there was a lot of information on the website about every fight and every event and there were no existing ways of capturing all this. I used beautifulsoup to scrape the data and pandas to process it. It was a long and arduous process, please forgive any mistakes. I have provided the raw files incase anybody wants to process it differently. This is my first time creating a dataset, any suggestions and corrections are welcome! Incase anyone wants to check out the work, I have all uploaded all the code files, including the scraping module here.  

Each row is a compilation of both fighter stats. Fighters are represented by 'red' and 'blue' (for red and blue corner). So for instance, red fighter has the complied average stats of all the fights except the current one. The stats include damage done by the red fighter on the opponent and the damage done by the opponent on the fighter (represented by 'opp' in the columns) in all the fights this particular red fighter has had, except this one as it has not occured yet (in the data). Same information exists for blue fighter. The target variable is 'Winner' which is the only column that tells you what happened. Here are some column definitions. "


### Overview  
1. __Explore Original Datasets__  
    > 1. Size and shape
    > 2. Sample view
    > 3. Missing data  
2. __Create New Variables and Clean Data__  
    > 1. Combine and create new variables
    > 2. Parse date/time
    > 3. Create dummy binary columns for 'Winner' category
    > 4. (Optional) trim dataset to include only 2011-2019 and four men's weight classes: featherweight, lightweight, welterweight, middleweight 
    > 5. Create subset dataframe of key variables  
3. __Exploratory Data Analysis__  
    > 1. Basic statistics
    > 2. Bar plot  
        - total wins (red vs blue)
    > 3. Count plot  
        - weight classes
    > 4. Distribution plots  
        - total fights (red vs blue)
        - wins (red vs blue)
        - age (red vs blue)  
    > 5. Pair plots    
        - offense and defense (red vs blue) compared to red wins  
        - win % and finish % (red vs blue) compared to red wins  
    > 6. Correlation matrix of key variables
4. __Supervised Learning__  
    > 1. Define and preprocess data
    > 2. Support vector machine
    > 3. Naive Bayes
    > 4. Logistic regression
    > 5. Decision tree/random forest  
5. __Summary and Conclusion__
6. __Acknowledgments__

In [1]:
# import libraries
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

import warnings
warnings.filterwarnings('ignore')

In [2]:
# import original kaggle datasets
df_clean = pd.read_csv(r'C:\Users\AP\Desktop\ufc-fight-stats-clean.csv')
df_raw = pd.read_csv(r'C:\Users\AP\Desktop\ufc-fight-stats.csv')

# change all columns to lower case for ease and consistency of typing
df_clean.columns = map(str.lower, df_clean.columns)
df_raw.columns = map(str.lower, df_raw.columns)

------------------

### Explore Original Datasets

#### Pre-processed Dataset  

1. Size and shape
2. Sample view
3. Missing data

In [None]:
# basic size and shape of preprocessed dataset
df_clean.info()

#### Observations
- The dataset contains 160 columns and approximately 3600 rows.

In [None]:
# sample view of dataset
df_clean.head()

In [None]:
# quantify missing data
total_missing = df_clean.isnull().sum().sort_values(ascending=False)
percent_missing = (df_clean.isnull().sum()/df_clean.isnull().count()).sort_values(ascending=False)
missing_data = pd.concat([total_missing, percent_missing], axis=1, keys=['Count', 'Percent'])

missing_data.head()

#### Raw Dataset   

1. Size and shape
2. Sample view
3. Missing data

In [None]:
# basic size and shape of dataset
df_raw.info()

#### Observations
- The raw dataset has 145 columns and approximately 5100 rows.

In [None]:
# sample view of dataset
df_raw.head()

In [None]:
# quantify missing data
total_missing = df_raw.isnull().sum().sort_values(ascending=True)
percent_missing = (df_raw.isnull().sum()/df_raw.isnull().count()).sort_values(ascending=True)
missing_data = pd.concat([total_missing, percent_missing], axis=1, keys=['Count', 'Percent'])

missing_data.head()

#### Observations
- There are several differences between the two datasets. The raw set contains variables not found in the preprocessed version. This includes each fighter's name, who refereed the bout, and the date and location of the fight. The preprocessed version drops these variables and adds some more detailed fight metrics.  
  
- We need to combine some categories from each dataset. First, we will parse the date/time column in the raw set and add it to the preprocessed set.  

- No missing data! Thank you to the originator of this data, Rajeev Warrier.  

- Let's clean the data and create/combine new variables based on my intuitions from years of training and watching mixed martial arts. 

--------------------

# Create New Variables and Clean Data  

1. Combine and create new variables
2. Parse date/time
3. Create dummy binary columns for 'Winner' category
4. (Optional) trim dataset to include only 2011-2019 and four weight classes: featherweight, lightweight, welterweight, middleweight
5. Create subset dataframe of key variables

### Creat Key Variables    
- __Winner:__ winner of fight (red or blue corner)
- __Win red:__ binary, 1 for red win, 0 for red loss
- __Experience score:__ interaction between total fights and total rounds fought
- __Streak score:__ interaction between current and longest win streak
- __Win %:__ total wins divided by total fights
- __Finish %:__ percentage of fights that end in KO/TKO, submission, or doctor's stoppage
- __Decision %:__ percentage of fights that end in judges' decision
- __Offense score:__ interaction between % significant strikes landed, submission attempts, takedowns landed, and knockdowns
- __Defense score:__ interaction between % significant strikes absorbed, submission attempts against, and opponent takedowns landed

In [3]:
# create new variables
# r = red corner
# b = blue corner

# win %
df_clean['r_win_pct'] = df_clean.r_wins  / (df_clean.r_wins + df_clean.r_losses + df_clean.r_draw)
df_clean['b_win_pct'] = df_clean.b_wins / (df_clean.b_wins + df_clean.b_losses + df_clean.b_draw) 

# total fights
df_clean['r_total_fights'] = df_clean.r_wins + df_clean.r_losses + df_clean.r_draw
df_clean['b_total_fights'] = df_clean.b_wins + df_clean.b_losses + df_clean.b_draw 

# % fights finished by ko/tko, submission, or doctor stoppage
df_clean['r_finish_pct'] = (df_clean['r_win_by_ko/tko'] + df_clean.r_win_by_submission +
                            df_clean.r_win_by_tko_doctor_stoppage) / df_clean.r_total_fights
df_clean['b_finish_pct'] = (df_clean['b_win_by_ko/tko'] + df_clean.b_win_by_submission +
                            df_clean.b_win_by_tko_doctor_stoppage) / df_clean.b_total_fights

# % fights ended in decision
df_clean['r_decision_pct'] = (df_clean.r_win_by_decision_majority + df_clean.r_win_by_decision_split +
                              df_clean.r_win_by_decision_unanimous) / df_clean.r_total_fights
df_clean['b_decision_pct'] = (df_clean.b_win_by_decision_majority + df_clean.b_win_by_decision_split +
                              df_clean.b_win_by_decision_unanimous) / df_clean.b_total_fights

# % total strikes landed 
df_clean['r_total_str_pct'] = df_clean.r_avg_total_str_landed / df_clean.r_avg_total_str_att
df_clean['b_total_str_pct'] = df_clean.b_avg_total_str_landed / df_clean.b_avg_total_str_att

# average % total strikes absorbed
df_clean['r_opp_total_str_pct'] = df_clean.r_avg_opp_total_str_landed / df_clean.r_avg_opp_total_str_att
df_clean['b_opp_total_str_pct'] = df_clean.b_avg_opp_total_str_landed / df_clean.b_avg_opp_total_str_att

# overall streak score
# interaction between current and longest win streak
df_clean['r_streak'] = df_clean.r_current_win_streak * df_clean.r_longest_win_streak
df_clean['b_streak'] = df_clean.b_current_win_streak * df_clean.b_longest_win_streak

# overall offense score
# interaction between significant strikes landed,
# average knowckdowns, submission attempts, average takedowns landed
# divide by 100
df_clean['r_offense'] = df_clean.r_avg_sig_str_pct * df_clean.r_avg_kd * df_clean.r_avg_sub_att * df_clean.r_avg_td_pct
df_clean['b_offense'] = df_clean.b_avg_sig_str_pct * df_clean.r_avg_kd * df_clean.b_avg_sub_att * df_clean.b_avg_td_pct

# overall defense score
# interaction between % significant strikes absorbed, 
# average submission attempts against, and % opponent takedown landed
df_clean['r_defense'] = df_clean.r_avg_opp_sig_str_pct * df_clean.r_avg_opp_sub_att * df_clean.r_avg_opp_td_pct
df_clean['b_defense'] = df_clean.b_avg_opp_sig_str_pct * df_clean.b_avg_opp_sub_att * df_clean.b_avg_opp_td_pct

# overall experience score
# interaction between total fights and total rounds fought
df_clean['r_experience'] = df_clean.r_total_fights * df_clean.r_total_rounds_fought
df_clean['b_experience'] = df_clean.b_total_fights * df_clean.b_total_rounds_fought

In [4]:
# parse date/time into separate columns
df_clean['date'] = pd.to_datetime(df_raw['date'])

df_clean['day'] = df_clean.date.dt.day
df_clean['month'] = df_clean.date.dt.month
df_clean['year'] = df_clean.date.dt.year

In [5]:
# create binary winner columns 
df_dum_win = pd.concat([df_clean, pd.get_dummies(df_clean['winner'], prefix='win', dummy_na=True)], axis=1)

# combine dummy columns to raw dataset
df_clean = pd.concat([df_dum_win, df_raw], axis=1)

# convert columns to lowercase
df_clean.columns = map(str.lower, df_clean.columns)

In [6]:
# drop duplicate columns
df_clean = df_clean.loc[:,~df_clean.columns.duplicated()]

# drop nulls
df_clean.dropna(axis=0, inplace=True)

# ----- OPTIONAL ----- comment or un-comment the code to turn and turn off and run the cell again
# drop all rows before 2011 for lack of detailed stats
df_clean = df_clean[(df_clean['year'] > 2011) & (df_clean['year'] < 2020)]

# ----- OPTIONAL ----- comment or un-comment the code to turn and turn off and run the cell again
#drop all weight classes except featherweight(145 lb), lightweight(155 lb),
# welterweight(170 lb), and middleweight(185 lb)
#df_clean = df_clean.loc[df_clean.weight_class.isin(['Featherweight', 'Lightweight', 'Welterweight', 'Middleweight'])]

In [7]:
# create new dataframe of key variables and rearrange by similarity groups
df_keys = df_clean[['winner',
                    'win_red',
                    'r_experience',  
                    'r_streak',
                    'r_win_pct',
                    'r_finish_pct',
                    'r_decision_pct',
                    'r_offense',
                    'r_defense',
                    'b_experience',
                    'b_streak',
                    'b_win_pct',
                    'b_finish_pct',
                    'b_decision_pct',
                    'b_offense',
                    'b_defense',
                    ]]

In [None]:
# basic size and shape of newly created clean dataframe
df_clean.info()

#### Observations
- The new clean dataset contains approximately 200 columns and 3100 rows

In [None]:
# sample view of newly created clean dataframe 
df_clean.head()

In [None]:
# sample view of newly created subset of key variables dataframe
df_keys.info()

#### Observations
- The dataset of key variables for modeling has 16 columns and approximately 3300 rows
- All feature variables are continuous floats
- Target variable option #1: 'winner' as categorical (red or blue)
- Target variable option #2: 'win_red' as numerical (1 for red win, 0 for red loss)

In [None]:
# sample view of newly created subset of key variables
df_keys.head()

-----------------------

# Exploratory Data Analysis  

1. Basic stats
2. Bar plot  
    > - wins (red vs blue)
3. Count plot  
    > - weight classes
4. Distribution plots  
    > - total fights (red vs blue)
    > - total wins (red vs blue)
    > - age (red vs blue)
5. Pair plots   
    > - offense and defense (red vs blue) compared to red wins  
    > - win % and finish % (red vs blue) compared to red wins  
6. Correlation matrix

In [None]:
# basic statistics
df_keys.describe()

#### Observations 
- Except for the 'experience' and 'streak' variables, all standard deviations are small. Outliers should be checked for in these two variables.
- All of the variables besides 'experience' seem to contain zeros as their minimum. Something does not seem right here. Again, outliers should be investigated.

In [None]:
# bar chart red vs blue total wins
plt.figure(figsize=(8,4))
sns.countplot(df_clean.winner)
plt.title('Total Win Count')
plt.xlabel('Winner')
plt.ylabel('Count')

plt.show()

# total win count
count = df_clean.winner.value_counts()
print('Total Win Count')
print('')
print(count)
print('')
print('')

# win %
print('Win %')
print('')
print(count / (count[0] + count[1]))

#### Observations  

- Out of approximately 3100 total fights, the red corner has won just under 2000 of them, or 64%.
- The red corner is historically reserved for the favored, more experienced of the two fighters, so this makes sense.
- The above chart is simple but important. Remember our goal is to predict the outcome of a fight. Also remember that the red corner is typically the favored, more experienced fighter. This means that if your only strategy for predicting fights was always choosing the red corner, you would be correct 64% of the time. This number is now our baseline score to beat. If any of the machine learning models score better than 64% accuracy, it could be considered a success. Anything below 64% and the models are worthless because we could always fall back on choosing red every time. 

In [None]:
# countplot of weight classes
plt.figure(figsize=(8,4))
sns.countplot(df_clean.weight_class, order=df_clean.weight_class.value_counts().index)
plt.title('Total Fight Count by Weight Class')
plt.xlabel('Weight Class')
plt.xticks(rotation='vertical')
plt.ylabel('Fight Count')

plt.show()

# print totals
print(df_clean.weight_class.value_counts())

#### Observations  
- Lightweight (155 lbs) and welterweight (170 lbs) are the most common weight classes and are almost equal in count at approximately 560 each out of 3100 total fights, occuring 36% of the time.
- Featherweight (145 lbs) and middleweight (185 lbs) are the next two runnerups and also almost equal each other in count at approximately 375 fights each out of 3100 total fights, occuring 24% of the time.
- The featherweight, lightweight, welterweight, and middleweight divisions account for approximately 60% of all fights.

In [None]:
# distributions comparison

# total fights distribution
fig, ax = plt.subplots(1, figsize=(8, 4))
sns.distplot(df_clean.b_total_fights)
sns.distplot(df_clean.r_total_fights)
plt.title('Total Fights Distribution')
plt.xlabel('# Fights')
plt.legend(labels=['Blue','Red'], loc="upper right")

# wins distribution
fig, ax = plt.subplots(1, figsize=(8, 4))
sns.distplot(df_clean.b_wins)
sns.distplot(df_clean.r_wins)
plt.title('Wins Distribution')
plt.xlabel('# Wins')
plt.legend(labels=['Blue','Red'], loc="upper right")

# age distribution
fig, ax = plt.subplots(1, figsize=(8, 4))
sns.distplot(df_clean.b_age)
sns.distplot(df_clean.r_age)
plt.title('Age Distribution')
plt.xlabel('Age')
plt.legend(labels=['Blue','Red'], loc="upper right")

plt.show()

# calculate red and blue mean and mode ages
r_mean_age = df_clean.r_age.mean()
r_mode_age = df_clean.r_age.mode()
b_mean_age = df_clean.b_age.mean()
b_mode_age = df_clean.b_age.mode()

# print red and blue mean ages
print('Mean Fighter Age')
print('')
print('Red: ', (r_mean_age))
print('Blue: ', (b_mean_age))

#### Observations  
- The red and blue corner distributions have similar shapes to each other in their respective graphs.
- There are more blue fighters with < 5 wins than red fighters, and there are more red fighters with > 5 wins than blue fighters. This makes sense, as historically the red corner has been reserved for the favored, more experienced fighter.
- The mean age of red and blue are essentially equal at 30 years old. This is surprising. I would have expected the red corner to have a slightly higher mean age since the red corner is typically reserved for the favored, more experienced fighter.  

In [None]:
# pairplot red vs blue offense and defense
sns.pairplot(df_keys[['winner',
                      'b_offense',
                      'r_offense',
                      'b_defense',
                      'r_defense',
                      ]], hue='winner')

plt.show()

In [None]:
# pairplot red vs blue win % and finish % compared to red wins
sns.pairplot(df_keys[['winner',
                      'r_win_pct',
                      'b_win_pct',
                      'r_finish_pct',
                      'b_finish_pct',
                      ]], hue='winner')

plt.show()

In [None]:
# key variables correlation
corr = df_keys.corr()

# generate mask for upper triangle
mask = np.zeros_like(corr, dtype=np.bool)
mask[np.triu_indices_from(mask)] = True

# plot heatmap correlation
plt.figure(figsize=(25,10))
sns.heatmap(corr, mask=mask, annot=True, cbar_kws={"shrink": .75}, center=0)

plt.show()

#### Observations 
- Surprisingly, none of the variables seem to be linearly correlated with the target variable. This does not mean we can rule out non-linear correlation at the moment.
- Some variables are correlated with each other. Most notably, 'win %' and 'finish %'. This makes sense since if a fighter has a higher 'finish %' it almost guarantees a relatively high 'win %'. It is probably not common to see a fighter with a high 'win %' and a very low 'finish %'. The UFC greatly values the entertainment factor when putting on shows, not just the caliber of fighters. A fighter with a high win % but always goes to decision typically gets cut from the promotion. It is not enough to win fights; one is also required to be entertaining as well.  

--------------------------

# Supervised Learning  
1. Define and preprocess data
2. Support vector machines
3. Naive Bayes
4. Logistic regression
5. Decision tree/random forest

In [8]:
# import libraries
import scipy
import sklearn
import statsmodels.api as sm
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn import linear_model
from sklearn import tree
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import mean_absolute_error
from statsmodels.tools.eval_measures import mse, rmse
from statsmodels.tsa.stattools import acf

In [13]:
# define and preprocess data before modeling

# target variable and feature set
Y = df_keys.win_red
X = df_keys[['r_experience',  
             'r_win_pct',
             'r_finish_pct',
             'r_offense',
             'r_defense',
             'b_experience',
             'b_win_pct',
             'b_finish_pct',
             'b_offense',
             'b_defense'
             ]]

# train/test split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=123)

# define standard scaler
sc = StandardScaler()

# fit standard scaler
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

### Why Support Vector Machines  
- Common algorithm for predicting a categorical outcome, which is our goal
- Good at finding solutions if variables are non-linearly separable, which is possible with our data

In [14]:
# support vector machines

# fit model
model = svm.SVC()
results = model.fit(X_train, y_train)

# predict
y_preds = results.predict(X_test)

# print results
print('Train Set Observations: {}'.format(X_train.shape[0]))
print('Test Set Observations:  {}'.format(X_test.shape[0]))
print('')
print('')
print('Support Vector Machine Accuracy Score')
print('')
print('Train Set: ', accuracy_score(y_train, model.predict(X_train)))
print('Test Set: ', accuracy_score(y_test, model.predict(X_test)))

Train Set Observations: 2484
Test Set Observations:  622


Support Vector Machine Accuracy Score

Train Set:  0.643719806763285
Test Set:  0.6784565916398714


#### Observations
- Train and test set are similar at approximately 64% and 68%, which indicates the model is not overfitting.
- Not a particularly high accuracy score, but so far it performs better than the baseline strategy of always choosing the red corner to win (64% accuracy).

### Why Naive Bayes
- Common  classification algorithm for predicting a categorical outcome, which is our goal
- Assumes independent variables, probably not the case with this dataset
- Curiosity without high hopes

In [None]:
# naive bayes

# fit to model
model = GaussianNB()
model.fit(X_train, y_train)

# print results
print('Naive Bayes Accuracy Score')
print('')
print('Train Set: ', accuracy_score(y_train, model.predict(X_train)))
print('Test Set: ', accuracy_score(y_test, model.predict(X_test)))

#### Observations
- Train and test set accuracy scores are similar, but 43% is a terrible score. You could achieve far better results by simply choosing the red corner to win every fight (64% accuracy).
- Naive Bayes may not be the best option here

### Why Logistic Regression  
- Common algorithm for predicting a categorical outcome, which is our goal
- Good at predicting the probability of binary outcomes, which is our goal

In [None]:
# logistic regression

# fit model
model = LogisticRegression()
model.fit(X_train, y_train)

print('Logistic Regression Accuracy Score')
print('')
print('Train Set: ', accuracy_score(y_train, model.predict(X_train)))
print('Test Set: ', accuracy_score(y_test, model.predict(X_test)))

#### Observations
- Test set accuracy increased by 5% over the train set. 
- Logistic regression and support vector machine have performed the best so far at 68%, beating our baseline score of 64% accuracy.

### Why Decision Tree and Random Forest
- Common algorithm for predicting a categorical outcome, which is our goal
- Good at learning non-linear relationships, which our dataset could potentially possess

In [None]:
# decision tree
tree_model = DecisionTreeClassifier()
rf_model = RandomForestClassifier()

# fit models
tree_model.fit(X_train, y_train)
rf_model.fit(X_train, y_train)

# print results
print('Decision Tree Accuracy Score')
print('')
print('Train Set: ', accuracy_score(y_train, tree_model.predict(X_train)))
print('Test Set: ', accuracy_score(y_test, tree_model.predict(X_test)))
print('')
print('')
print('Random Forest Accuracy Score')
print('')
print('Train Set: ', accuracy_score(y_train, rf_model.predict(X_train)))
print('Test Set: ', accuracy_score(y_test, rf_model.predict(X_test)))

#### Observations
- Accuracy for both decision tree and random forest train set were very high at 99% and 98%, respectively. This suggests the model could be overfitting. It performs well on the known training data, but severely underperforms on the new test set.
- Accuracy for both test sets fell dramatically to 57% and 56%. 
- The train and test sets could possibly have different distributions.

# Summary and Conclusion  

After loading the two original datasets, we discovered that there were some distinct variables in each, and we needed some from both. After joining the datasets, duplicate variables were dropped, which left a clean new set to work with. New variables were then created and combined. Finally, a subset dataframe of key variables was created for modeling. 

Next came exploratory data analysis. We found out that the red corner wins on average 64% of the time. We chose this as our baseline prediction score to beat. Some other interesting facts arose throughout this phase of the process:  
- Total fight count is dominated by just four weight classes: featherweight (145 lbs), lightweight (155 lbs), welterweight (170 lbs), and middleweight (185 lbs), and account for 60% of all fights.
- Mean fighter age is 30 years old, which was a bit surprising to learn. Most people think of fighting as a young man's game. This result appears to refute that statement.
- No single variable was found to be highly linearly correlated with the target variable. This was very surprising to find out. Professional fighting is a volatile sport. If red consistently wins greater than 50% there should presumably be some combination of features that puts them at a 64% win rate.

Our goal of this project was to predict the outcome of UFC fights using supervised learning. Four models were used: support vector machines, naive Bayes, logistic regression, and decision tree/random forest. Both naive Bayes and decision tree/random forest scored terribly and far below the baseline-to-beat of 64% accuracy. Support vector machines and logistic regression scored roughly equal to 64% on their train sets but scored on the test set with 68% accuracy. 

A score of 68% beats our initial baseline accuracy score of 64%. A small success but a success nonetheless. I believe this score could be improved by implementing the following strategy:
1. Address and correct outliers
2. Further refining or combining of features with a focus on win/finish %, height/reach advantage, and fighting style (striker, wrestler)
3. Identifying the "typical" fighter profile in more detail. So far we know it is a male approximately 30 years old who fights in one of the four main weight classes. 
4. Deeper exploratory data analysis to discover not-so-obvious correlations and connections between variables
5. Further model parameter tuning and experimenting with new models 

Professional fighting is an extremely volatile sport. Even a champion on a winning streak can lose from a split second minor mistake. Fighters commonly perform injured, severely impairing their potential while highlighting their opponent who may not warrant it. Even with unlimited amounts of data, it is entirely possible that predicting fights is a fool's errand. 

# Acknowledgments
- Rajeev Warrier and his Kaggle dataset (https://www.kaggle.com/rajeevw/ufcdata)
- Shubhabrata Roy (Thinkful mentor)
- Any of you who let me know about an error or typo in any of the above (for real, it would be appreciated)