# Introduction

This notebook uses MMA data from a Kaggle dataset. The aim was to practise my data cleaning and preparation skills whilst also answering some rudimentary questions I had about the UFC and MMA in general.

## Set up

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
pd.options.display.max_columns = 200

In [None]:
# Import datasets
events = pd.read_csv('data/ufc_event_data.csv')
fights = pd.read_csv('data/ufc_fight_data.csv')
fight_stats = pd.read_csv('data/ufc_fight_stat_data.csv')
fighters = pd.read_csv('data/ufc_fighter_data.csv')

# Data Cleaning and Preparation

I approached preparing the data by first taking a quick look through all the tables to get an idea of the relevant columns, the data types to set, and any missing values. During this process I made a mental note of the joins/merges I would likely need.
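
The quick look described here can also be condensed into one summary per table. The following is a minimal sketch, not the method used in this notebook: `summarise_table` is a hypothetical helper, and the `sample` frame is made-up data standing in for one of the real tables.

```python
import pandas as pd


def summarise_table(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype and missing-value counts."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
    })
    # Columns with the most missing values come first
    return summary.sort_values("n_missing", ascending=False)


# Tiny made-up frame standing in for one of the real tables
sample = pd.DataFrame({
    "event_id": [1, 2, 3, 4],
    "event_state": ["Nevada", None, "Texas", None],
})
sample_summary = summarise_table(sample)
print(sample_summary)
```

Calling the same helper on `events`, `fights`, `fight_stats` and `fighters` would give a comparable view of each table before deciding what to drop or convert.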

### Table 1: `events`
- Convert `event_date` to datetime
- Drop unnecessary columns

In [None]:
print(events.shape)
print(events.info())
display(events.head())

(665, 7)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 665 entries, 0 to 664
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   event_id       665 non-null    int64 
 1   event_name     665 non-null    object
 2   event_date     665 non-null    object
 3   event_city     665 non-null    object
 4   event_state    616 non-null    object
 5   event_country  665 non-null    object
 6   event_url      665 non-null    object
dtypes: int64(1), object(6)
memory usage: 36.5+ KB
None


Unnamed: 0,event_id,event_name,event_date,event_city,event_state,event_country,event_url
0,665,UFC Fight Night: Dawson vs. Green,2023-10-07,Las Vegas,Nevada,USA,http://ufcstats.com/event-details/c8a49ff2acb6...
1,664,UFC Fight Night: Fiziev vs. Gamrot,2023-09-23,Las Vegas,Nevada,USA,http://ufcstats.com/event-details/c945adc22c2b...
2,663,UFC Fight Night: Grasso vs. Shevchenko 2,2023-09-16,Las Vegas,Nevada,USA,http://ufcstats.com/event-details/8fa2b0657236...
3,662,UFC 293: Adesanya vs. Strickland,2023-09-09,Sydney,New South Wales,Australia,http://ufcstats.com/event-details/ece280745f87...
4,661,UFC Fight Night: Gane vs. Spivac,2023-09-02,Paris,Ile-de-France,France,http://ufcstats.com/event-details/ef61d9f5176b...


In [None]:
# Convert event_date to datetime
events['event_date'] = pd.to_datetime(events['event_date'])

# Keep only the needed columns (event_name, event_city, event_state,
# event_country and event_url are dropped)
events = events[['event_id', 'event_date']].copy()

print(events.shape)
print(events.info())
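
Once `event_date` is a datetime column, a year can be pulled out with the `.dt` accessor if it is needed for later grouping. This is a small sketch on two rows copied from the `events` preview; `event_year` is a hypothetical column name, not part of the dataset.

```python
import pandas as pd

# Two rows copied from the events preview
events_sample = pd.DataFrame({
    "event_id": [665, 662],
    "event_date": ["2023-10-07", "2023-09-09"],
})
events_sample["event_date"] = pd.to_datetime(events_sample["event_date"])

# event_year is a hypothetical column name, not part of the dataset
events_sample["event_year"] = events_sample["event_date"].dt.year
print(events_sample.dtypes)
```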

### Table 2: `fights`
- Drop unnecessary columns
- Deal with missing values
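
One possible way to deal with missing values in the fighter ID columns is sketched below. It rests on two assumptions that are not confirmed by the dataset: that a fight is only unusable when `f_1` or `f_2` is missing, and that a missing `winner` may be a legitimate outcome (such as a draw or no contest) and should be kept. The rows are made up to mimic the `fights` schema.

```python
import pandas as pd

# Made-up rows mimicking the fights schema; IDs are illustrative only
fights_sample = pd.DataFrame({
    "fight_id": [1, 2, 3],
    "f_1": [2976.0, None, 981.0],
    "f_2": [2884.0, 2464.0, 179.0],
    "winner": [2884.0, 2464.0, None],
})

# Assumption: a fight is unusable only if either fighter ID is missing
cleaned = fights_sample.dropna(subset=["f_1", "f_2"]).copy()

# Nullable Int64 keeps IDs as integers while still allowing <NA> in winner
id_cols = ["f_1", "f_2", "winner"]
cleaned = cleaned.astype({col: "Int64" for col in id_cols})
print(cleaned)
```

The nullable `Int64` dtype is used because plain `int64` cannot hold missing values, which is why pandas reads these ID columns as `float64` in the first place.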

In [None]:
print(fights.shape)
print(fights.info())
display(fights.head())

(7218, 15)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7218 entries, 0 to 7217
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   fight_id        7218 non-null   int64  
 1   event_id        7218 non-null   int64  
 2   referee         7186 non-null   object 
 3   f_1             7199 non-null   float64
 4   f_2             7205 non-null   float64
 5   winner          7203 non-null   float64
 6   num_rounds      7218 non-null   object 
 7   title_fight     7218 non-null   object 
 8   weight_class    7205 non-null   object 
 9   gender          7218 non-null   object 
 10  result          7218 non-null   object 
 11  result_details  7201 non-null   object 
 12  finish_round    7218 non-null   int64  
 13  finish_time     7218 non-null   object 
 14  fight_url       7218 non-null   object 
dtypes: float64(3), int64(3), object(9)
memory usage: 846.0+ KB
None


Unnamed: 0,fight_id,event_id,referee,f_1,f_2,winner,num_rounds,title_fight,weight_class,gender,result,result_details,finish_round,finish_time,fight_url
0,7218,664,Herb Dean,2976.0,2884.0,2884.0,5,F,Lightweight,M,KO/TKO,to \n Leg Injury,2,2:03,http://ufcstats.com/fight-details/23a604f46028...
1,7217,664,Mark Smith,1662.0,2464.0,1662.0,3,F,Featherweight,M,Decision,Unanimous,3,5:00,http://ufcstats.com/fight-details/da1b37edb8cc...
2,7216,664,Kerry Hatley,981.0,179.0,981.0,3,F,Women's Strawweight,F,KO/TKO,Punches to Head From Mount,2,2:42,http://ufcstats.com/fight-details/d8335b728604...
3,7215,664,Dan Miragliotta,3831.0,2974.0,3831.0,3,F,Welterweight,M,Submission,Rear Naked Choke,2,4:32,http://ufcstats.com/fight-details/bf647be41de3...
4,7214,664,Herb Dean,1108.0,2320.0,2320.0,3,F,Featherweight,M,Submission,Guillotine Choke From Bottom Guard,1,3:12,http://ufcstats.com/fight-details/6e1bf1b163b3...
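
The first row of the preview shows a stray newline inside `result_details` (`to \n Leg Injury`). A minimal sketch of one way to normalise such values follows, assuming the extra whitespace carries no meaning; the series reuses values from the preview plus one missing entry.

```python
import pandas as pd

# Values copied from the result_details column in the preview, plus a missing one
details = pd.Series(["to \n Leg Injury", "Unanimous", None])

# Collapse any run of whitespace (including newlines) into a single space
details_clean = details.str.replace(r"\s+", " ", regex=True).str.strip()
print(details_clean.tolist())
```

Missing entries pass through the string methods untouched, so they can still be handled separately.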


# Questions