# 🔹UFC Fight Predictor Feature Engineering

<div style="text-align: center;">
  🔹 <img src="../img/ufc_logo.png" width="50" /> 🔹
</div>

## 1. Import Libraries and Setup Environment

In [27]:
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Get the current working directory
current_dir = os.getcwd()

# Navigate to the project root
project_root = os.path.abspath(os.path.join(current_dir, '..'))

# Import from /src
sys.path.append(os.path.join(project_root, 'src'))
from utils.helpers import *

<div style="text-align: center;">
  🔹 <img src="../img/ufc_logo.png" width="50" /> 🔹
</div>

## 2. Load Data

In [2]:
# Define the path to the CSV file
file_path = os.path.join(project_root, 'data', 'processed', 'ufc_etl.csv')

# Load the CSV into a DataFrame
try:
    ufc_data = pd.read_csv(file_path)
    print_header(f"Data successfully loaded: {ufc_data.shape[0]} rows, {ufc_data.shape[1]} columns.", color='bright_green')
except Exception as e:
    print_header(f"Error loading training data: {e}", color='bright_red')

╔════════════════════════════════════════════════════╗
║  Data successfully loaded: 6541 rows, 69 columns.  ║
╚════════════════════════════════════════════════════╝


<div style="text-align: center;">
  🔹 <img src="../img/ufc_logo.png" width="50" /> 🔹
</div>

## 3. Feature Engineering

- **Feature Vector Construction**: The feature vector for each fight is represented as:
  $$x = \text{features-fighter-blue} - \text{features-fighter-red}$$
- The task is framed as a binary classification problem, where the model predicts either **0** (Fighter Red wins) or **1** (Fighter Blue wins).
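The difference representation above can be sketched on a toy frame. This is a minimal illustration, assuming per-fighter columns that follow the `Blue*` / `Red*` naming of the dataset; the helper `difference_features` and the values are hypothetical, not part of the project code.

```python
import pandas as pd

# Toy per-fighter statistics; values are made up for illustration.
fights = pd.DataFrame({
    "BlueWins": [10, 4],
    "RedWins": [7, 9],
    "BlueReachCms": [180.0, 175.0],
    "RedReachCms": [185.0, 170.0],
})


def difference_features(df, stats):
    """Return Blue-minus-Red differences for each shared statistic name."""
    return pd.DataFrame(
        {f"{stat}Dif": df[f"Blue{stat}"] - df[f"Red{stat}"] for stat in stats},
        index=df.index,
    )


x = difference_features(fights, ["Wins", "ReachCms"])
print(x)
```

A positive entry favours Fighter Blue on that statistic and a negative one favours Fighter Red, which matches the label convention (1 = Blue wins).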

#### Create the target value: **0** (Fighter Red wins) or **1** (Fighter Blue wins)

In [3]:
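# Encode the target: 1 if Blue wins, 0 otherwise (Red wins)
ufc_data['label'] = ufc_data['Winner'].apply(lambda x: 1 if x == 'Blue' else 0)
ufc_data = ufc_data.drop('Winner', axis=1)
# One-hot encode TitleBout and Gender, dropping the first level of each
ufc_data = pd.get_dummies(ufc_data, columns=['TitleBout', 'Gender'], drop_first=True)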

### Categorical Data

### Stance
- **Orthodox Stance:** A fighter in orthodox stance leads with their left foot and left hand, making it the natural stance for right-handed individuals. 
- **Southpaw Stance:** A fighter in southpaw stance leads with their right foot and right hand, making it the natural stance for left-handed individuals. 
- **Open Stance Matchup:** When one fighter is orthodox and the other is southpaw, it creates an "open stance" matchup. This differs from a "closed stance" matchup, where both fighters use the same stance (e.g., both orthodox or both southpaw).
- **Switch:** A fighter who switches between orthodox and southpaw during a fight can disrupt their opponent's rhythm, causing them to miss or react incorrectly to strikes.
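The open versus closed matchup described above could be derived from the two stance columns. The following is only a sketch: the column `OpenStanceMatchup` is a hypothetical name that the notebook does not create, and treating `Switch` fighters as neither open nor closed is an assumption made for the example.

```python
import pandas as pd

# Toy stance pairs; values are illustrative.
stances = pd.DataFrame({
    "BlueStance": ["Southpaw", "Orthodox", "Switch", "Southpaw"],
    "RedStance": ["Orthodox", "Orthodox", "Southpaw", "Southpaw"],
})

# Only fighters with a fixed stance can form an open/closed matchup here (assumption).
fixed_stances = {"Orthodox", "Southpaw"}
both_fixed = (
    stances["BlueStance"].isin(fixed_stances)
    & stances["RedStance"].isin(fixed_stances)
)
differ = stances["BlueStance"] != stances["RedStance"]

# 1 when one fighter is orthodox and the other southpaw, 0 otherwise.
stances["OpenStanceMatchup"] = (both_fixed & differ).astype(int)
print(stances)
```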

In [4]:
ufc_data['BlueStance'].unique()

array(['Southpaw', 'Orthodox', 'Switch', 'Open Stance'], dtype=object)

In [5]:
# One-hot encode BlueStance
ufc_data['BlueStance_Orthodox'] = (ufc_data['BlueStance'] == 'Orthodox').astype(int)
ufc_data['BlueStance_Southpaw'] = (ufc_data['BlueStance'] == 'Southpaw').astype(int)
ufc_data['BlueStance_Switch'] = (ufc_data['BlueStance'] == 'Switch').astype(int)
ufc_data['BlueStance_OpenStance'] = (ufc_data['BlueStance'] == 'Open Stance').astype(int)

# One-hot encode RedStance (same approach)
ufc_data['RedStance_Orthodox'] = (ufc_data['RedStance'] == 'Orthodox').astype(int)
ufc_data['RedStance_Southpaw'] = (ufc_data['RedStance'] == 'Southpaw').astype(int)
ufc_data['RedStance_Switch'] = (ufc_data['RedStance'] == 'Switch').astype(int)
ufc_data['RedStance_OpenStance'] = (ufc_data['RedStance'] == 'Open Stance').astype(int)

# Drop the original stance columns
ufc_data = ufc_data.drop(['BlueStance', 'RedStance'], axis=1)

# The indicator columns are collected in `categorical_columns` in a later cell.

### Better Rank

In [6]:
display(ufc_data['BetterRank'].unique())

array(['Red', 'neither', 'Blue'], dtype=object)

In [7]:
# One-hot encode BetterRank
ufc_data['BetterRank_Red'] = (ufc_data['BetterRank'] == 'Red').astype(int)
ufc_data['BetterRank_Blue'] = (ufc_data['BetterRank'] == 'Blue').astype(int)
ufc_data['BetterRank_neither'] = (ufc_data['BetterRank'] == 'neither').astype(int)
# Drop the original column
ufc_data = ufc_data.drop(['BetterRank'], axis=1)

In [8]:
# Cast the boolean dummy to an integer (True -> 1, False -> 0)
ufc_data['Gender_MALE'] = ufc_data['Gender_MALE'].astype(int)

In [9]:
# Indicator columns that should not be standardized
categorical_columns = ['BlueStance_Orthodox', 'RedStance_Orthodox',
                       'BlueStance_Southpaw', 'RedStance_Southpaw',
                       'BlueStance_Switch', 'RedStance_Switch',
                       'BlueStance_OpenStance', 'RedStance_OpenStance',
                       'BetterRank_Red', 'BetterRank_Blue',
                       'BetterRank_neither', 'Gender_MALE']
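The manual indicator columns built in the cells above correspond to what `pd.get_dummies` produces. The sketch below compares both approaches on a toy `BetterRank` column; the data is made up, and the comparison is an illustration rather than a claim about how the notebook should be written.

```python
import pandas as pd

toy = pd.DataFrame({"BetterRank": ["Red", "neither", "Blue", "Red"]})

# Manual encoding, mirroring the comparison-and-cast pattern used in the notebook.
manual = pd.DataFrame({
    f"BetterRank_{value}": (toy["BetterRank"] == value).astype(int)
    for value in ["Red", "Blue", "neither"]
})

# Equivalent encoding with pandas; dtype=int yields 0/1 instead of booleans.
auto = pd.get_dummies(toy["BetterRank"], prefix="BetterRank", dtype=int)

# get_dummies sorts the categories, so align the column order before comparing.
same = bool((manual.values == auto[manual.columns].values).all())
print(same)
```

One practical difference: for the stance columns, `get_dummies` would name the column `BlueStance_Open Stance` (with a space), whereas the manual approach names it `BlueStance_OpenStance`.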

### Feature Creation

In [10]:
# Finish rate (Red and Blue): share of wins by KO, submission, or doctor stoppage.
# Zero-win fighters get a denominator of 1, so their rate is 0 instead of NaN.
RedFinishRate = (ufc_data['RedWinsByKO'] + ufc_data['RedWinsBySubmission'] + ufc_data['RedWinsByTKODoctorStoppage']) / ufc_data['RedWins'].replace(0, 1)
BlueFinishRate = (ufc_data['BlueWinsByKO'] + ufc_data['BlueWinsBySubmission'] + ufc_data['BlueWinsByTKODoctorStoppage']) / ufc_data['BlueWins'].replace(0, 1)
ufc_data['FinishRate'] = BlueFinishRate - RedFinishRate
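The `replace(0, 1)` guard used for the rate features can be examined in isolation. This is a small sketch with made-up counts. The `NaN` variant is an alternative added here for comparison and is not what the notebook does.

```python
import pandas as pd

wins = pd.Series([10, 0, 4])
finishes = pd.Series([6, 0, 1])

# Notebook approach: a fighter with no wins gets a denominator of 1, so the rate is 0.
rate = finishes / wins.replace(0, 1)

# Alternative (assumption): mark the rate as missing when there are no wins.
rate_nan = finishes / wins.where(wins != 0)

print(rate.tolist())
print(rate_nan.tolist())
```

The first variant treats a debuting fighter like one who never finishes a fight. The second keeps the two cases distinct but requires handling missing values later.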

In [11]:
# Win ratio (Red and Blue): wins over total decided fights
RedWinRatio = ufc_data['RedWins'] / (ufc_data['RedWins'] + ufc_data['RedLosses']).replace(0, 1)
BlueWinRatio = ufc_data['BlueWins'] / (ufc_data['BlueWins'] + ufc_data['BlueLosses']).replace(0, 1)
ufc_data['WinRatio'] = BlueWinRatio - RedWinRatio

In [12]:
# Age vs. experience: total rounds fought per year of age
RedExpPerYear = ufc_data['RedTotalRoundsFought'] / ufc_data['RedAge']
BlueExpPerYear = ufc_data['BlueTotalRoundsFought'] / ufc_data['BlueAge']
ufc_data['ExpPerYear'] = BlueExpPerYear - RedExpPerYear

In [13]:
# Reach advantage ratio (note: Red over Blue, unlike the Blue-minus-Red differences)
ufc_data['ReachAdvantageRatio'] = ufc_data['RedReachCms'] / ufc_data['BlueReachCms']

In [14]:
# Height/reach ratio (computed for each fighter)
RedHeightReachRatio = ufc_data['RedHeightCms'] / ufc_data['RedReachCms']
BlueHeightReachRatio = ufc_data['BlueHeightCms'] / ufc_data['BlueReachCms']
ufc_data['HeightReachRatio'] = BlueHeightReachRatio - RedHeightReachRatio

In [15]:
# Total decision wins per corner (majority + split + unanimous)
BlueWinsByDecision = ufc_data[['BlueWinsByDecisionMajority', 'BlueWinsByDecisionSplit', 'BlueWinsByDecisionUnanimous']].sum(axis=1)
RedWinsByDecision = ufc_data[['RedWinsByDecisionMajority', 'RedWinsByDecisionSplit', 'RedWinsByDecisionUnanimous']].sum(axis=1)
ufc_data['WinsByDecision'] = BlueWinsByDecision - RedWinsByDecision

In [16]:
# Decision rate: share of wins that came by decision
BlueDecisionRate = BlueWinsByDecision / ufc_data['BlueWins'].replace(0, 1)  # Avoid division by zero
RedDecisionRate = RedWinsByDecision / ufc_data['RedWins'].replace(0, 1)  # Avoid division by zero
ufc_data['DecisionRate'] = BlueDecisionRate - RedDecisionRate

### Feature Selection

The dataset already provides the following difference features:
- LoseStreakDif: BlueCurrentLoseStreak - RedCurrentLoseStreak
- WinStreakDif: BlueCurrentWinStreak - RedCurrentWinStreak
- LongestWinStreakDif: BlueLongestWinStreak - RedLongestWinStreak
- WinDif: BlueWins - RedWins
- LossDif: BlueLosses - RedLosses
- TotalRoundDif: BlueTotalRoundsFought - RedTotalRoundsFought
- TotalTitleBoutDif: BlueTotalTitleBouts - RedTotalTitleBouts
- KODif: BlueWinsByKO - RedWinsByKO
- SubDif: BlueWinsBySubmission - RedWinsBySubmission
- HeightDif: BlueHeightCms - RedHeightCms
- ReachDif: BlueReachCms - RedReachCms
- AgeDif: BlueAge - RedAge

The per-fighter columns behind these differences may be redundant, so they are dropped.
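Before dropping the per-fighter columns, it may be worth confirming that each `*Dif` column really equals Blue minus Red. The check below is a sketch on toy data: the `pairs` mapping and the values are illustrative, and it assumes the difference columns follow the definitions listed above.

```python
import pandas as pd

toy = pd.DataFrame({
    "BlueWins": [10, 4, 7],
    "RedWins": [7, 9, 7],
    "WinDif": [3, -5, 0],
})

# Map each difference column to the (Blue, Red) columns it should be derived from.
pairs = {"WinDif": ("BlueWins", "RedWins")}

checks = {}
for dif_col, (blue_col, red_col) in pairs.items():
    # True only if every row satisfies Dif == Blue - Red.
    checks[dif_col] = bool((toy[blue_col] - toy[red_col]).eq(toy[dif_col]).all())

print(checks)
```

If any entry comes back `False`, the raw columns carry information that the difference column does not, and dropping them would lose it.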

In [17]:
# Drop the per-fighter columns already summarized by the *Dif features
ufc_data = ufc_data.drop(['BlueCurrentLoseStreak', 'RedCurrentLoseStreak', 'BlueCurrentWinStreak',
                          'RedCurrentWinStreak', 'BlueLongestWinStreak', 'RedLongestWinStreak', 'BlueWins',
                          'RedWins', 'BlueLosses', 'RedLosses', 'BlueTotalRoundsFought', 'RedTotalRoundsFought',
                          'BlueTotalTitleBouts', 'RedTotalTitleBouts', 'BlueWinsByKO', 'RedWinsByKO', 'BlueWinsBySubmission',
                          'RedWinsBySubmission', 'BlueHeightCms', 'RedHeightCms', 'BlueReachCms', 'RedReachCms',
                          'BlueAge', 'RedAge'], axis=1)

The columns used to build the following engineered features are dropped as well:
- WinsByDecision
- DecisionRate
- FinishRate


In [18]:
ufc_data=ufc_data.drop(['BlueWinsByDecisionSplit', 'BlueWinsByDecisionUnanimous',
       'BlueWinsByTKODoctorStoppage', 'RedWinsByDecisionMajority',
       'RedWinsByDecisionSplit', 'RedWinsByDecisionUnanimous',
       'RedWinsByTKODoctorStoppage','BlueWinsByDecisionMajority'], axis=1)

### Low-Variance Columns
- BlueDraws
- RedDraws
- BlueWeightLbs
- RedWeightLbs
- TitleBout_True 

In [19]:
ufc_data=ufc_data.drop(['BlueDraws','RedDraws','BlueWeightLbs','RedWeightLbs','TitleBout_True'], axis=1)
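The notebook does not show how the low-variance columns were identified. One possible way, sketched here on toy data, is to flag columns where a single value dominates. The helper `dominant_share` and the 0.85 threshold are assumptions made for this example.

```python
import pandas as pd

toy = pd.DataFrame({
    "BlueDraws": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "WinDif": [3, -5, 0, 2, 1, -1, 4, -2, 0, 6],
})


def dominant_share(series):
    """Fraction of rows taken by the most frequent value in the column."""
    return series.value_counts(normalize=True).iloc[0]


shares = toy.apply(dominant_share)

# Arbitrary illustrative threshold: flag columns dominated by one value.
low_variance = shares[shares >= 0.85].index.tolist()
print(low_variance)
```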

In [20]:
# Preview the first few records
display(ufc_data.head())
display(ufc_data.columns)
# Show the data type of each column:
display(ufc_data.dtypes)

Unnamed: 0,NumberOfRounds,BlueAvgSigStrLanded,BlueAvgSigStrPct,BlueAvgSubAtt,BlueAvgTDLanded,BlueAvgTDPct,RedAvgSigStrLanded,RedAvgSigStrPct,RedAvgSubAtt,RedAvgTDLanded,...,BetterRank_Red,BetterRank_Blue,BetterRank_neither,FinishRate,WinRatio,ExpPerYear,ReachAdvantageRatio,HeightReachRatio,WinsByDecision,DecisionRate
0,5,4.13,0.36,0.0,1.96,0.46,3.88,0.38,0.2,3.79,...,1,0,0,0.283333,-0.035714,-0.477778,0.947368,-0.065058,-4,-0.283333
1,3,7.36,0.56,1.1,1.24,0.23,4.67,0.5,0.4,1.04,...,0,0,1,0.293233,0.042614,-1.222222,1.0,0.028571,-9,-0.293233
2,3,3.32,0.48,0.2,2.26,0.28,4.44,0.53,0.4,0.54,...,1,0,0,0.5,0.095238,-0.077799,1.046154,0.028733,-2,-0.5
3,3,5.5,0.47,0.0,0.36,0.25,2.82,0.45,0.8,3.98,...,0,0,1,-0.1,-0.261905,0.453704,1.013158,0.025803,2,0.1
4,3,5.94,0.52,0.0,0.25,0.5,6.51,0.41,0.0,0.0,...,0,0,1,-0.607143,0.222222,-0.064516,1.014493,0.013872,2,0.607143


Index(['NumberOfRounds', 'BlueAvgSigStrLanded', 'BlueAvgSigStrPct',
       'BlueAvgSubAtt', 'BlueAvgTDLanded', 'BlueAvgTDPct',
       'RedAvgSigStrLanded', 'RedAvgSigStrPct', 'RedAvgSubAtt',
       'RedAvgTDLanded', 'RedAvgTDPct', 'LoseStreakDif', 'WinStreakDif',
       'LongestWinStreakDif', 'WinDif', 'LossDif', 'TotalRoundDif',
       'TotalTitleBoutDif', 'KODif', 'SubDif', 'HeightDif', 'ReachDif',
       'AgeDif', 'SigStrDif', 'AvgSubAttDif', 'AvgTDDif', 'TotalFightTimeSecs',
       'label', 'Gender_MALE', 'BlueStance_Orthodox', 'BlueStance_Southpaw',
       'BlueStance_Switch', 'BlueStance_OpenStance', 'RedStance_Orthodox',
       'RedStance_Southpaw', 'RedStance_Switch', 'RedStance_OpenStance',
       'BetterRank_Red', 'BetterRank_Blue', 'BetterRank_neither', 'FinishRate',
       'WinRatio', 'ExpPerYear', 'ReachAdvantageRatio', 'HeightReachRatio',
       'WinsByDecision', 'DecisionRate'],
      dtype='object')

NumberOfRounds             int64
BlueAvgSigStrLanded      float64
BlueAvgSigStrPct         float64
BlueAvgSubAtt            float64
BlueAvgTDLanded          float64
BlueAvgTDPct             float64
RedAvgSigStrLanded       float64
RedAvgSigStrPct          float64
RedAvgSubAtt             float64
RedAvgTDLanded           float64
RedAvgTDPct              float64
LoseStreakDif              int64
WinStreakDif               int64
LongestWinStreakDif        int64
WinDif                     int64
LossDif                    int64
TotalRoundDif              int64
TotalTitleBoutDif          int64
KODif                      int64
SubDif                     int64
HeightDif                float64
ReachDif                 float64
AgeDif                     int64
SigStrDif                float64
AvgSubAttDif             float64
AvgTDDif                 float64
TotalFightTimeSecs       float64
label                      int64
Gender_MALE                int64
BlueStance_Orthodox        int64
BlueStance


### Split Dataset and Standardize

In [22]:
ufc_train, ufc_test = split_and_standardize(ufc_data, categorical_columns)

╔═══════════════════════════════════════════════════════════════════════╗
║  Numerical Data has been standardized and the dataset has been split  ║
╚═══════════════════════════════════════════════════════════════════════╝
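The implementation of `split_and_standardize` lives in `utils.helpers` and is not shown in this notebook. As a rough idea of what such a helper might do, here is a hypothetical stand-in written with pandas and NumPy only. The function name, the `test_size` and `seed` parameters, and the choice to fit the scaling statistics on the training rows are all assumptions.

```python
import numpy as np
import pandas as pd


def split_and_standardize_sketch(df, categorical_columns, test_size=0.2, seed=42):
    """Hypothetical stand-in: shuffle, split, then z-score the numeric columns."""
    rng = np.random.default_rng(seed)
    shuffled = df.iloc[rng.permutation(len(df))].reset_index(drop=True)
    n_test = int(round(len(shuffled) * test_size))
    test = shuffled.iloc[:n_test].copy()
    train = shuffled.iloc[n_test:].copy()

    # Scale everything except the indicator columns and the target.
    skip = set(categorical_columns) | {"label"}
    numeric = [col for col in df.columns if col not in skip]

    # Fit mean and std on the training rows only, then reuse them for the test rows.
    mean = train[numeric].mean()
    std = train[numeric].std(ddof=0).replace(0, 1)
    train[numeric] = (train[numeric].astype(float) - mean) / std
    test[numeric] = (test[numeric].astype(float) - mean) / std
    return train.reset_index(drop=True), test.reset_index(drop=True)


toy = pd.DataFrame({
    "WinDif": [3, -5, 0, 2, 1, -1, 4, -2, 0, 6],
    "Gender_MALE": [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    "label": [1, 0, 0, 1, 1, 0, 1, 0, 0, 1],
})
train, test = split_and_standardize_sketch(toy, ["Gender_MALE"])
print(train.head())
```

Fitting the statistics on the training split alone keeps information from the test rows out of the scaling step. Whether the project helper does the same cannot be confirmed from this notebook.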


<div style="text-align: center;">
  🔹 <img src="../img/ufc_logo.png" width="50" /> 🔹
</div>

## 4. Check Standardized Data

In [23]:
display(ufc_train.head())

Unnamed: 0,NumberOfRounds,BlueAvgSigStrLanded,BlueAvgSigStrPct,BlueAvgSubAtt,BlueAvgTDLanded,BlueAvgTDPct,RedAvgSigStrLanded,RedAvgSigStrPct,RedAvgSubAtt,RedAvgTDLanded,...,BetterRank_Blue,BetterRank_neither,FinishRate,WinRatio,ExpPerYear,ReachAdvantageRatio,HeightReachRatio,WinsByDecision,DecisionRate,label
0,-0.322638,-0.679854,-1.93234,-0.789091,-1.042703,-1.448499,-0.162779,-0.108546,0.678116,-1.090421,...,0,1,0.161448,0.275168,0.164815,-0.0352,0.369952,0.268586,0.133795,1
1,-0.322638,0.000337,0.001642,-0.000719,0.003926,0.007668,-0.000186,-0.000476,-6.7e-05,0.001945,...,0,1,0.161448,0.275168,0.323718,-0.0352,1.528038,0.268586,0.133795,1
2,3.125866,1.072847,-1.094499,-0.789091,-0.844502,-0.329831,1.72352,0.079391,-0.54091,-0.505151,...,0,0,-0.755016,-0.415658,-0.941574,1.652897,2.497677,-0.940745,1.094677,0
3,-0.322638,1.117105,-0.475639,0.525708,-0.646302,-1.001032,-0.136702,1.039959,-0.053329,-0.700241,...,0,1,-0.5259,0.275168,1.034251,-1.5383,-1.625058,0.671696,0.854457,0
4,-0.322638,0.701062,1.590401,-0.789091,3.714107,1.236305,0.793407,1.113045,1.165697,2.811378,...,0,1,-0.5259,0.275168,0.006187,-0.949586,-0.810404,-0.134524,0.854457,0


In [24]:
display(ufc_test.head())

Unnamed: 0,NumberOfRounds,BlueAvgSigStrLanded,BlueAvgSigStrPct,BlueAvgSubAtt,BlueAvgTDLanded,BlueAvgTDPct,RedAvgSigStrLanded,RedAvgSigStrPct,RedAvgSubAtt,RedAvgTDLanded,...,BetterRank_Blue,BetterRank_neither,FinishRate,WinRatio,ExpPerYear,ReachAdvantageRatio,HeightReachRatio,WinsByDecision,DecisionRate,label
0,-0.322638,-0.883805,-0.313783,-0.789091,-0.646302,-0.419324,-0.883571,0.204682,0.239249,-1.090421,...,0,1,-1.900595,-1.106484,0.285577,-0.89021,-0.351033,0.268586,0.133795,1
1,-0.322638,-0.766958,1.209564,3.628809,2.374272,0.609851,-0.864795,-0.317365,-0.784773,-0.466133,...,0,0,0.906556,0.903192,-2.62476,-0.620407,-0.020673,-3.359407,-0.647427,1
2,-0.322638,0.678298,0.952499,-0.56362,-0.249901,0.623275,0.810795,0.016745,-0.297192,-0.570155,...,0,0,0.676958,0.011996,0.22898,-0.868565,-1.778912,-0.134524,-0.406702,1
3,-0.322638,-0.760585,0.257472,-0.00018,-0.765222,-0.419324,-0.90652,-0.212956,-0.053329,-0.723651,...,0,1,-0.354063,0.121651,2.013733,0.282847,0.369952,0.671696,0.674291,1
4,-0.322638,-0.711722,0.162263,-0.789091,-0.511526,0.028143,-0.751096,-0.004137,-0.784773,-0.37249,...,0,1,0.161448,1.656821,0.413542,2.483108,1.761807,0.671696,2.29578,0
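Beyond eyeballing `head()`, the standardization can be checked numerically. The sketch below uses synthetic data and assumes z-score scaling: standardized numeric columns should have a mean close to 0 and a standard deviation close to 1 on the data they were fitted on.

```python
import numpy as np
import pandas as pd

# Synthetic numeric features; the column names are borrowed from the notebook.
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "FinishRate": rng.normal(0.1, 0.4, 200),
    "AgeDif": rng.normal(-1.0, 5.0, 200),
})

# Z-score scaling with the population standard deviation (ddof=0).
scaled = (raw - raw.mean()) / raw.std(ddof=0)

summary = pd.DataFrame({"mean": scaled.mean(), "std": scaled.std(ddof=0)})
looks_standardized = bool(
    np.allclose(summary["mean"], 0.0, atol=1e-8)
    and np.allclose(summary["std"], 1.0, atol=1e-8)
)
print(summary)
print(looks_standardized)
```

If the scaler was fitted on the training split only, the test split will usually show means and standard deviations near, but not exactly at, 0 and 1.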


<div style="text-align: center;">
  🔹 <img src="../img/ufc_logo.png" width="50" /> 🔹
</div>

## 5. Save

In [26]:
# Save the train and test splits
ufc_train.to_csv(f'{project_root}/data/processed/ufc_train.csv', index=False)
ufc_test.to_csv(f'{project_root}/data/processed/ufc_test.csv', index=False)
print_header("Feature Engineering file saved as 'ufc_train.csv' and 'ufc_test.csv'.", color='bright_green')

╔═════════════════════════════════════════════════════════════════════════╗
║  Feature Engineering file saved as 'ufc_train.csv' and 'ufc_test.csv'.  ║
╚═════════════════════════════════════════════════════════════════════════╝


<div style="text-align: center;">
     <img src="../img/ufc_logo.png" width="800" /> 
</div>