In [2]:
import pandas as pd
import plotly.express as px

# Let's Start By Importing Our Nice, Clean Data

In [3]:
fighters_df = pd.read_csv('fighters_cleaned.csv')

# drop the leftover index column (the first column in the CSV)
fighters_df = fighters_df.iloc[:, 1:]
fighters_df.head()

Unnamed: 0,First,Last,Nickname,Ht.,Wt.,Reach,Stance,W,L,D,weight_class
0,Tom,Aaron,,70,155.0,71,Unknown,5.0,3.0,0.0,welterweight
1,Danny,Abbadi,The Assassin,71,155.0,71,Orthodox,4.0,6.0,0.0,welterweight
2,Nariman,Abbasov,Bayraktar,68,155.0,66,Orthodox,28.0,4.0,0.0,welterweight
3,David,Abbott,Tank,72,265.0,71,Switch,10.0,15.0,0.0,super heavyweight
4,Hamdy,Abdelwahab,The Hammer,74,264.0,72,Southpaw,5.0,0.0,0.0,heavyweight


I'll start by creating a scatter matrix of the numeric fighter stats to see whether there are any trends I might be able to use when training a predictive model.

In [34]:
# create a scatter matrix
fig = px.scatter_matrix(fighters_df, dimensions=["Ht.", "Reach", "W", "L", "D"])
fig.show()

Hmm... The only really obvious trend is that Reach increases with Height. Let's see if we can find any trends based on Stance:
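To put a number on that Height/Reach relationship, a correlation coefficient is a quick check. Here's a minimal sketch using only the five rows printed by `fighters_df.head()` above. The `sample_df` name is mine, and five fighters is far too few to generalize from; on the real data you'd call the same method on `fighters_df`.

```python
import pandas as pd

# the five fighters shown in the head() output above (not the full dataset)
sample_df = pd.DataFrame({
    "Ht.": [70, 71, 68, 72, 74],
    "Reach": [71, 71, 66, 71, 72],
})

# Pearson correlation between height and reach
height_reach_corr = sample_df["Ht."].corr(sample_df["Reach"])
print(round(height_reach_corr, 2))
```

A value close to 1 would back up what the scatter matrix suggests by eye.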

In [69]:
# first we'll group the dataframe by stance and sum the wins, losses, and draws for each stance
stance_grouped_df = fighters_df.groupby('Stance').agg({'W': 'sum', 'L': 'sum', 'D': 'sum'}).reset_index()
melted_df = pd.melt(stance_grouped_df, id_vars='Stance', var_name='Result', value_name='Count')

# next we will create a bar graph for the data and separate each bar by win, loss, or draw
fig2 = px.bar(melted_df, x='Stance', y='Count', color='Result', barmode='stack')
fig2.show()

Just by eyeballing it, the fighters' stance seems to make little difference, and the Open and Sideways stances are so rare that they barely register on our graph.

I'll break this down into concrete numbers for a closer look instead:

In [39]:
print(stance_grouped_df)

        Stance       W       L      D
0  Open Stance    13.0     6.0    0.0
1     Orthodox  4697.0  2134.0  128.0
2     Sideways     2.0     3.0    0.0
3     Southpaw  1441.0   702.0   75.0
4       Switch   383.0   156.0    6.0
5      Unknown   922.0   544.0   17.0


Now we're getting somewhere! Or so it seems. Orthodox is clearly the most common stance, but that doesn't tell us much about predicting a fight outcome.

Let's examine the win percentage for each stance:

In [40]:
# we'll create a for-loop that pulls the sum of wins, losses, and draws for each stance, then appends a list with the percentages of each
data = []

for stance in stance_grouped_df['Stance'].unique():
    total_wins = stance_grouped_df[stance_grouped_df['Stance'] == stance]['W'].sum()
    total_losses = stance_grouped_df[stance_grouped_df['Stance'] == stance]['L'].sum()
    total_draws = stance_grouped_df[stance_grouped_df['Stance'] == stance]['D'].sum()
    total_fights = total_wins + total_losses + total_draws
    win_percentage = round((total_wins / total_fights) * 100, 2)
    loss_percentage = round((total_losses / total_fights) * 100, 2)
    draw_percentage = round((total_draws / total_fights) * 100, 2)
    
    data.append({'Stance': stance, 'Win Percentage': win_percentage, 'Loss Percentage': loss_percentage, 'Draw Percentage': draw_percentage})

# next, we can create a dataframe to easily view the data
stance_data = pd.DataFrame(data)
stance_data

Unnamed: 0,Stance,Win Percentage,Loss Percentage,Draw Percentage
0,Open Stance,68.42,31.58,0.0
1,Orthodox,67.5,30.67,1.84
2,Sideways,40.0,60.0,0.0
3,Southpaw,64.97,31.65,3.38
4,Switch,70.28,28.62,1.1
5,Unknown,62.17,36.68,1.15
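As an aside, the same percentages can be computed without a loop. This is just a sketch of a vectorized alternative, not a replacement for the cell above; `stance_totals` and `stance_pct` are names I made up, and the numbers are copied from the printed `stance_grouped_df` output.

```python
import pandas as pd

# totals copied from the printed stance_grouped_df output above
stance_totals = pd.DataFrame({
    "Stance": ["Open Stance", "Orthodox", "Sideways", "Southpaw", "Switch", "Unknown"],
    "W": [13.0, 4697.0, 2.0, 1441.0, 383.0, 922.0],
    "L": [6.0, 2134.0, 3.0, 702.0, 156.0, 544.0],
    "D": [0.0, 128.0, 0.0, 75.0, 6.0, 17.0],
}).set_index("Stance")

# divide every column by the row total, then convert to percentages
total_fights = stance_totals.sum(axis=1)
stance_pct = stance_totals.div(total_fights, axis=0).mul(100).round(2)
stance_pct.columns = ["Win Percentage", "Loss Percentage", "Draw Percentage"]
print(stance_pct)
```

Dividing row-wise with `div(..., axis=0)` keeps the calculation in one place, so adding another result column wouldn't need another line of arithmetic.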


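Now we have a clearer idea of how win percentage varies with stance. Keep in mind that the Sideways and Open Stance figures rest on only 5 and 19 recorded fights, so they're too thin to read much into.

Let's take this one step further and see if the win percentage for each stance changes by weight class: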

In [70]:
# we'll create a new grouped df to account for weight class and aggregate that data according to wins, losses, and draws
weight_grouped_df = fighters_df.groupby(['weight_class', 'Stance']).agg({'W': 'sum', 'L': 'sum', 'D': 'sum'}).reset_index()

# then, we calculate the total fights and percentages
weight_grouped_df['Total Fights'] = weight_grouped_df['W'] + weight_grouped_df['L'] + weight_grouped_df['D']
weight_grouped_df['Win Percentage'] = round(weight_grouped_df['W'] / weight_grouped_df['Total Fights'] * 100, 2)

# and we'll create a new df with the relevant columns, sort by weight class and stance
result_cols = ['Win Percentage']
result_df = weight_grouped_df[['weight_class', 'Stance'] + result_cols].sort_values(['weight_class', 'Stance'])

result_df.head()

Unnamed: 0,weight_class,Stance,Win Percentage
0,bantamweight,Orthodox,71.9
1,bantamweight,Southpaw,71.25
2,bantamweight,Switch,70.0
3,bantamweight,Unknown,75.0
4,featherweight,Orthodox,69.17


Finally, we'll create a heatmap of win percentage by weight class and stance, and see if we can infer anything:

In [71]:
# we'll need plotly's graph_objs module to build the heatmap
import plotly.graph_objs as go

# create a pivoted df so our stances become our columns:
pivoted_df = weight_grouped_df.pivot(index='weight_class', columns='Stance', values='Win Percentage')

# create a heatmap!
heat = go.Figure(data=go.Heatmap(z=pivoted_df.values, x=pivoted_df.columns, y=pivoted_df.index, colorscale='Viridis'))
heat.update_layout(title='Win Percentage by Stance and Weight Class', xaxis_title='Stance', yaxis_title='Weight Class')

heat.show()

Voila! We have a much more detailed breakdown of how win percentage varies by stance within each weight class.

Right off the bat, we can see that flyweights with a switch stance have won about 88% of their fights, whereas the light heavyweight class really favors the open stance. And if you're a bantamweight fighter, stance seems to have a pretty negligible effect on whether you'll win.
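One caveat when reading the heatmap: a cell's percentage says nothing about how many fights sit behind it. Here's a rough sketch of how to check that, pivoting on total fights instead of win percentage. It uses only the five fighters printed by `fighters_df.head()` earlier, not the full dataset, and `sample_df`, `grouped`, and `fight_counts` are names I made up for illustration.

```python
import pandas as pd

# the five fighters shown in the head() output earlier (not the full dataset)
sample_df = pd.DataFrame({
    "Stance": ["Unknown", "Orthodox", "Orthodox", "Switch", "Southpaw"],
    "W": [5.0, 4.0, 28.0, 10.0, 5.0],
    "L": [3.0, 6.0, 4.0, 15.0, 0.0],
    "D": [0.0, 0.0, 0.0, 0.0, 0.0],
    "weight_class": ["welterweight", "welterweight", "welterweight",
                     "super heavyweight", "heavyweight"],
})

# same grouping and percentage calculation as the weight class cell above
grouped = sample_df.groupby(["weight_class", "Stance"])[["W", "L", "D"]].sum().reset_index()
grouped["Total Fights"] = grouped[["W", "L", "D"]].sum(axis=1)
grouped["Win Percentage"] = (grouped["W"] / grouped["Total Fights"] * 100).round(2)

# same pivot as the heatmap, but showing how many fights back each cell
fight_counts = grouped.pivot(index="weight_class", columns="Stance", values="Total Fights")
print(grouped[["weight_class", "Stance", "Total Fights", "Win Percentage"]])
print(fight_counts)
```

In this tiny sample the heavyweight/Southpaw cell comes out at a perfect 100%, but it rests on a single fighter's five fights. Running the same pivot on `weight_grouped_df` would show which heatmap cells deserve that kind of skepticism.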

# Conclusion:
It was interesting to find that something like Stance, which I assumed would be inconsequential, actually showed pretty clear trends for certain weight classes.

My initial goal was to create a predictive model to assess winners of future fights. While it's clear that I need more data to create an accurate prediction, this analysis gives me a really good starting point as I continue building my model in the future.



*Note: while some fighters on the official UFCSTATS website are listed as 'super heavyweight', the UFC has not officially established a Super Heavyweight class.*