# 🔹UFC Fight Predictor Feature Engineering

<div style="text-align: center;">
  🔹 <img src="../img/ufc_logo.png" width="50" /> 🔹
</div>

## 1. Import Libraries and Setup Environment

In [1]:
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Get the current working directory
current_dir = os.getcwd()

# Navigate to the project root
project_root = os.path.abspath(os.path.join(current_dir, '..'))

# Import from /src
sys.path.append(os.path.join(project_root, 'src'))
from utils.helpers import *
from utils.io_model import *
from utils.ufc_data import UFCData

<div style="text-align: center;">
  🔹 <img src="../img/ufc_logo.png" width="50" /> 🔹
</div>

## 2. Load Data

In [2]:
# Define the path to the CSV file
file_path = os.path.join(project_root, 'data', 'processed', 'ufc_etl.csv')

# Load the CSV into a DataFrame
try:
    ufc_data = pd.read_csv(file_path)
    print_header(f"Data successfully loaded: {ufc_data.shape[0]} rows, {ufc_data.shape[1]} columns.", color='bright_green')
except Exception as e:
    print_header(f"Error loading data: {e}", color='bright_red')
    raise  # Stop here: every later cell needs ufc_data to exist

╔════════════════════════════════════════════════════╗
║  Data successfully loaded: 6541 rows, 69 columns.  ║
╚════════════════════════════════════════════════════╝


<div style="text-align: center;">
  🔹 <img src="../img/ufc_logo.png" width="50" /> 🔹
</div>

## 3. Feature Engineering

- **Feature Vector Construction**: The feature vector for each fight is represented as:
  $$x = \text{features-fighter-blue} - \text{features-fighter-red}$$
- The task is framed as a binary classification problem, where the model predicts either **0** (Fighter Red wins) or **1** (Fighter Blue wins).
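
The difference construction can be sketched on a toy frame. The column names `BlueAge`, `RedAge`, `BlueReachCms`, and `RedReachCms` appear later in this notebook; the helper `difference_features` and the values are illustrative assumptions, not project code.

```python
import pandas as pd

def difference_features(df, names):
    """Return Blue<name> - Red<name> for each base name (hypothetical helper)."""
    return pd.DataFrame({f"{n}Dif": df[f"Blue{n}"] - df[f"Red{n}"] for n in names})

# Toy fights with made-up values
fights = pd.DataFrame({
    "BlueAge": [28, 33],
    "RedAge": [31, 30],
    "BlueReachCms": [180.0, 175.0],
    "RedReachCms": [185.0, 170.0],
})

x = difference_features(fights, ["Age", "ReachCms"])
print(x)
```

A positive entry means Blue has the larger value. Swapping the corners flips the sign of every feature, and the label flips with it.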

#### Create the target value: **0** (Fighter Red wins) or **1** (Fighter Blue wins)

In [3]:
# Encode the target: 1 if Blue wins, 0 otherwise
ufc_data['label'] = (ufc_data['Winner'] == 'Blue').astype(int)
ufc_data = ufc_data.drop('Winner', axis=1)
# ufc_data = pd.get_dummies(ufc_data, columns=['TitleBout', 'Gender'], drop_first=True)

### Categorical Data

### Stance
- **Orthodox Stance:** A fighter in orthodox stance leads with the left foot and left hand, making it the natural stance for right-handed fighters.
- **Southpaw Stance:** A fighter in southpaw stance leads with the right foot and right hand, making it the natural stance for left-handed fighters.
- **Open Stance Matchup:** When one fighter is orthodox and the other is southpaw, the result is an "open stance" matchup. This differs from a "closed stance" matchup, where both fighters use the same stance (e.g., both orthodox or both southpaw).
- **Switch:** A fighter who switches between orthodox and southpaw can disrupt the opponent's rhythm, causing them to miss or react incorrectly to strikes.
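
The open/closed distinction above could be derived from the two stance columns roughly as follows. This is a sketch under assumptions: the helper `stance_matchup` and the `StanceMatchup` column are hypothetical and are not created anywhere in this notebook, and any stance other than Orthodox or Southpaw (including the `Open Stance` value the dataset stores in the stance columns) is grouped into `other`.

```python
import pandas as pd

def stance_matchup(blue, red):
    """Label a pairing as 'open', 'closed', or 'other' (hypothetical helper)."""
    fixed = {"Orthodox", "Southpaw"}
    if blue in fixed and red in fixed:
        return "closed" if blue == red else "open"
    return "other"  # Switch or any other stance on either side

stances = pd.DataFrame({
    "BlueStance": ["Orthodox", "Southpaw", "Switch", "Southpaw"],
    "RedStance": ["Southpaw", "Southpaw", "Orthodox", "Orthodox"],
})
stances["StanceMatchup"] = [
    stance_matchup(b, r) for b, r in zip(stances["BlueStance"], stances["RedStance"])
]
print(stances)
```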

In [4]:
# ufc_data['BlueStance'].unique()

In [5]:
# # For BlueStance
# ufc_data['BlueStance_Orthodox'] = (ufc_data['BlueStance'] == 'Orthodox').astype(int)
# ufc_data['BlueStance_Southpaw'] = (ufc_data['BlueStance'] == 'Southpaw').astype(int)
# ufc_data['BlueStance_Switch'] = (ufc_data['BlueStance'] == 'Switch').astype(int)
# ufc_data['BlueStance_OpenStance'] = (ufc_data['BlueStance'] == 'Open Stance').astype(int)

# # For RedStance (same approach)
# ufc_data['RedStance_Orthodox'] = (ufc_data['RedStance'] == 'Orthodox').astype(int)
# ufc_data['RedStance_Southpaw'] = (ufc_data['RedStance'] == 'Southpaw').astype(int)
# ufc_data['RedStance_Switch'] = (ufc_data['RedStance'] == 'Switch').astype(int)
# ufc_data['RedStance_OpenStance'] = (ufc_data['RedStance'] == 'Open Stance').astype(int)

# # Drop the original columns
# ufc_data = ufc_data.drop(['BlueStance', 'RedStance'], axis=1)

# # Create categorical column list

# # categorical_columns.extend(['BlueStance_Orthodox', 'RedStance_Orthodox',
# #                             'BlueStance_Southpaw', 'RedStance_Southpaw',
# #                             'BlueStance_Switch', 'RedStance_Switch',
# #                             'BlueStance_OpenStance', 'RedStance_OpenStance'])

### Better Rank

In [6]:
# display(ufc_data['BetterRank'].unique())

In [7]:
# # For BetterRank
# ufc_data['BetterRank_Red'] = (ufc_data['BetterRank'] == 'Red').astype(int)
# ufc_data['BetterRank_Blue'] = (ufc_data['BetterRank'] == 'Blue').astype(int)
# ufc_data['BetterRank_neither'] = (ufc_data['BetterRank'] == 'neither').astype(int)
# # Drop the original column
# ufc_data = ufc_data.drop(['BetterRank'], axis=1)
# # categorical_columns.extend(['BetterRank_Red', 'BetterRank_Blue', 'BetterRank_neither'])

In [8]:
# ufc_data['Gender_MALE'] = ufc_data['Gender_MALE'].astype(int)  # True → 1, False → 0
# #categorical_columns.append('Gender_MALE')

In [9]:
# categorical_columns = ['BlueStance_Orthodox', 'RedStance_Orthodox',
#                         'BlueStance_Southpaw', 'RedStance_Southpaw',
#                         'BlueStance_Switch', 'RedStance_Switch',
#                         'BlueStance_OpenStance', 'RedStance_OpenStance',
#                         'BetterRank_Red', 'BetterRank_Blue',
#                         'BetterRank_neither', 'Gender_MALE']

### Feature Creation

In [10]:
# Finish rate (Red and Blue); replace(0, 1) avoids division by zero for fighters with no wins
RedFinishRate = (ufc_data['RedWinsByKO'] + ufc_data['RedWinsBySubmission'] + ufc_data['RedWinsByTKODoctorStoppage']) / ufc_data['RedWins'].replace(0, 1)
BlueFinishRate = (ufc_data['BlueWinsByKO'] + ufc_data['BlueWinsBySubmission'] + ufc_data['BlueWinsByTKODoctorStoppage']) / ufc_data['BlueWins'].replace(0, 1)
ufc_data['FinishRate'] = BlueFinishRate - RedFinishRate
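
To make the `replace(0, 1)` guard concrete, here is a small sketch on made-up values. The column names match the cell above; the toy frame and the `unguarded` comparison are only for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "RedWins": [4, 0],
    "RedWinsByKO": [2, 0],
    "RedWinsBySubmission": [1, 0],
    "RedWinsByTKODoctorStoppage": [0, 0],
})

finishes = (
    toy["RedWinsByKO"]
    + toy["RedWinsBySubmission"]
    + toy["RedWinsByTKODoctorStoppage"]
)

# replace(0, 1) swaps a zero denominator for 1, so 0 finishes / 0 wins gives 0.0
red_finish_rate = finishes / toy["RedWins"].replace(0, 1)

# Without the guard, 0 / 0 produces NaN
unguarded = finishes / toy["RedWins"]

print(red_finish_rate.tolist())
print(unguarded.isna().tolist())
```

One consequence of this choice is that a fighter with no wins gets the same rate, 0, as a fighter whose wins all came by decision.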

In [11]:
# Win ratio (Red and Blue)
RedWinRatio = ufc_data['RedWins'] / (ufc_data['RedWins'] + ufc_data['RedLosses']).replace(0, 1)
BlueWinRatio = ufc_data['BlueWins'] / (ufc_data['BlueWins'] + ufc_data['BlueLosses']).replace(0, 1)
ufc_data['WinRatio'] = BlueWinRatio - RedWinRatio

In [12]:
# Age vs. experience: rounds fought per year of age
RedExpPerYear = ufc_data['RedTotalRoundsFought'] / ufc_data['RedAge']
BlueExpPerYear = ufc_data['BlueTotalRoundsFought'] / ufc_data['BlueAge']
ufc_data['ExpPerYear'] = BlueExpPerYear - RedExpPerYear

In [13]:
# Reach advantage ratio
# Note: this is Red / Blue, unlike the Blue - Red orientation of the other features
ufc_data['ReachAdvantageRatio'] = ufc_data['RedReachCms'] / ufc_data['BlueReachCms']
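
The ratio in the cell above is oriented Red over Blue, while the difference features in this notebook are Blue minus Red. A toy comparison with made-up reach values shows how the two orientations read:

```python
import pandas as pd

reach = pd.DataFrame({"BlueReachCms": [190.0, 180.0], "RedReachCms": [180.0, 180.0]})

# Orientation used for ReachAdvantageRatio: below 1 means Blue has the longer reach
red_over_blue = reach["RedReachCms"] / reach["BlueReachCms"]

# Orientation of the difference features: above 0 means Blue has the longer reach
blue_minus_red = reach["BlueReachCms"] - reach["RedReachCms"]

print(red_over_blue.round(4).tolist())
print(blue_minus_red.tolist())
```

With this orientation, a Blue reach advantage pushes the ratio down and the difference up, which is worth keeping in mind when reading feature importances or coefficients.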

In [14]:
# Height/reach ratio (for each fighter)
RedHeightReachRatio = ufc_data['RedHeightCms'] / ufc_data['RedReachCms']
BlueHeightReachRatio = ufc_data['BlueHeightCms'] / ufc_data['BlueReachCms']
ufc_data['HeightReachRatio'] = BlueHeightReachRatio - RedHeightReachRatio

In [15]:
BlueWinsByDecision = ufc_data[['BlueWinsByDecisionMajority', 'BlueWinsByDecisionSplit', 'BlueWinsByDecisionUnanimous']].sum(axis=1)
RedWinsByDecision = ufc_data[['RedWinsByDecisionMajority', 'RedWinsByDecisionSplit', 'RedWinsByDecisionUnanimous']].sum(axis=1)
ufc_data['WinsByDecision'] = BlueWinsByDecision - RedWinsByDecision

In [16]:
BlueDecisionRate = BlueWinsByDecision / ufc_data['BlueWins'].replace(0, 1)  # Avoid division by zero
RedDecisionRate = RedWinsByDecision / ufc_data['RedWins'].replace(0, 1)  # Avoid division by zero
ufc_data['DecisionRate']= BlueDecisionRate - RedDecisionRate
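
`FinishRate` and `DecisionRate` are closely linked. If every win is counted as either a finish or a decision, then for two fighters who both have wins, `DecisionRate` equals `-FinishRate`. The sketch below illustrates this on made-up numbers; the aggregated columns `BlueFinishes`, `BlueDecisions`, `RedFinishes`, and `RedDecisions` are hypothetical stand-ins for the sums computed above.

```python
import pandas as pd

toy = pd.DataFrame({
    "BlueWins": [10, 0], "BlueFinishes": [7, 0], "BlueDecisions": [3, 0],
    "RedWins": [5, 4], "RedFinishes": [1, 0], "RedDecisions": [4, 4],
})

blue_wins = toy["BlueWins"].replace(0, 1)  # same zero guard as in the notebook
red_wins = toy["RedWins"].replace(0, 1)

toy["FinishRate"] = toy["BlueFinishes"] / blue_wins - toy["RedFinishes"] / red_wins
toy["DecisionRate"] = toy["BlueDecisions"] / blue_wins - toy["RedDecisions"] / red_wins

print(toy[["FinishRate", "DecisionRate"]])
```

In the first row both fighters have wins and the two features are mirror images. In the second row Blue has no wins, so the relation breaks. Checking the correlation between the two columns on the real data would show how much independent signal each one carries.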

### Feature Selection

The dataset already contains the following difference features:
-  LoseStreakDif: BlueCurrentLoseStreak - RedCurrentLoseStreak
-  WinStreakDif: BlueCurrentWinStreak - RedCurrentWinStreak
-  LongestWinStreakDif: BlueLongestWinStreak - RedLongestWinStreak
-  WinDif: BlueWins - RedWins
-  LossDif: BlueLosses - RedLosses
-  TotalRoundDif: BlueTotalRoundsFought - RedTotalRoundsFought
-  TotalTitleBoutDif: BlueTotalTitleBouts - RedTotalTitleBouts
-  KODif: BlueWinsByKO - RedWinsByKO
-  SubDif: BlueWinsBySubmission - RedWinsBySubmission
-  HeightDif: BlueHeightCms - RedHeightCms
-  ReachDif: BlueReachCms - RedReachCms
-  AgeDif: BlueAge - RedAge

The raw columns behind these differences may be redundant, so they are dropped.
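
Before dropping the raw columns, the redundancy can be checked directly. The following is a sketch on a toy frame, not a cell from the project: the `pairs` mapping is an assumption that mirrors two entries of the list above, and the values are made up.

```python
import pandas as pd

toy = pd.DataFrame({
    "BlueWins": [10, 3], "RedWins": [7, 8],
    "BlueAge": [29, 35], "RedAge": [31, 27],
    "WinDif": [3, -5], "AgeDif": [-2, 8],
})

# Map each precomputed difference column to the raw pair it should equal
pairs = {"WinDif": ("BlueWins", "RedWins"), "AgeDif": ("BlueAge", "RedAge")}

redundant = {
    dif: bool((toy[blue] - toy[red]).eq(toy[dif]).all())
    for dif, (blue, red) in pairs.items()
}
print(redundant)
```

For float columns such as heights and reaches, an approximate comparison (for example `numpy.isclose`) is safer than exact equality.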

In [17]:
ufc_data=ufc_data.drop(['BlueCurrentLoseStreak', 'RedCurrentLoseStreak','BlueCurrentWinStreak',
                      'RedCurrentWinStreak','BlueLongestWinStreak', 'RedLongestWinStreak', 'BlueWins',
                      'RedWins','BlueLosses','RedLosses', 'BlueTotalRoundsFought','RedTotalRoundsFought',
                      'BlueTotalTitleBouts', 'RedTotalTitleBouts', 'BlueWinsByKO', 'RedWinsByKO', 'BlueWinsBySubmission',
                      'RedWinsBySubmission','BlueHeightCms','RedHeightCms','BlueReachCms','RedReachCms',
                      'BlueAge', 'RedAge'], axis=1)

The columns used to build the engineered features are dropped as well:
- WinsByDecision
- DecisionRate
- FinishRate


In [18]:
ufc_data=ufc_data.drop(['BlueWinsByDecisionSplit', 'BlueWinsByDecisionUnanimous',
       'BlueWinsByTKODoctorStoppage', 'RedWinsByDecisionMajority',
       'RedWinsByDecisionSplit', 'RedWinsByDecisionUnanimous',
       'RedWinsByTKODoctorStoppage','BlueWinsByDecisionMajority'], axis=1)

### Low-Variance Columns
- BlueDraws
- RedDraws
- BlueWeightLbs
- RedWeightLbs
- TitleBout_True (flagged, but not dropped in the next cell)

In [19]:
ufc_data=ufc_data.drop(['BlueDraws','RedDraws','BlueWeightLbs','RedWeightLbs'], axis=1)
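
The notebook does not show how the low-variance columns were identified. One possible check, sketched here on made-up data, is the share of rows taken by each column's most frequent value. The 0.75 threshold is an arbitrary assumption.

```python
import pandas as pd

toy = pd.DataFrame({
    "BlueDraws": [0, 0, 0, 0, 1],
    "RedDraws": [0, 0, 0, 0, 0],
    "WinDif": [3, -5, 0, 2, -1],
})

# Share of rows occupied by the most frequent value in each column
top_share = toy.apply(lambda col: col.value_counts(normalize=True).iloc[0])

# Columns dominated by a single value carry little information
low_variance = top_share[top_share >= 0.75].index.tolist()
print(low_variance)
```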

In [20]:
# Preview the first few records
display(ufc_data.head())
display(ufc_data.columns)
# Show the data type of each column
display(ufc_data.dtypes)

Unnamed: 0,TitleBout,Gender,NumberOfRounds,BlueAvgSigStrLanded,BlueAvgSigStrPct,BlueAvgSubAtt,BlueAvgTDLanded,BlueAvgTDPct,BlueStance,RedAvgSigStrLanded,...,BetterRank,TotalFightTimeSecs,label,FinishRate,WinRatio,ExpPerYear,ReachAdvantageRatio,HeightReachRatio,WinsByDecision,DecisionRate
0,False,MALE,5,4.13,0.36,0.0,1.96,0.46,Southpaw,3.88,...,Red,882.0,1,0.283333,-0.035714,-0.477778,0.947368,-0.065058,-4,-0.283333
1,False,MALE,3,7.36,0.56,1.1,1.24,0.23,Orthodox,4.67,...,neither,696.0,0,0.293233,0.042614,-1.222222,1.0,0.028571,-9,-0.293233
2,False,MALE,3,3.32,0.48,0.2,2.26,0.28,Orthodox,4.44,...,Red,717.0,0,0.5,0.095238,-0.077799,1.046154,0.028733,-2,-0.5
3,False,MALE,3,5.5,0.47,0.0,0.36,0.25,Orthodox,2.82,...,neither,824.0,1,-0.1,-0.261905,0.453704,1.013158,0.025803,2,0.1
4,False,MALE,3,5.94,0.52,0.0,0.25,0.5,Orthodox,6.51,...,neither,900.0,1,-0.607143,0.222222,-0.064516,1.014493,0.013872,2,0.607143


Index(['TitleBout', 'Gender', 'NumberOfRounds', 'BlueAvgSigStrLanded',
       'BlueAvgSigStrPct', 'BlueAvgSubAtt', 'BlueAvgTDLanded', 'BlueAvgTDPct',
       'BlueStance', 'RedAvgSigStrLanded', 'RedAvgSigStrPct', 'RedAvgSubAtt',
       'RedAvgTDLanded', 'RedAvgTDPct', 'RedStance', 'LoseStreakDif',
       'WinStreakDif', 'LongestWinStreakDif', 'WinDif', 'LossDif',
       'TotalRoundDif', 'TotalTitleBoutDif', 'KODif', 'SubDif', 'HeightDif',
       'ReachDif', 'AgeDif', 'SigStrDif', 'AvgSubAttDif', 'AvgTDDif',
       'BetterRank', 'TotalFightTimeSecs', 'label', 'FinishRate', 'WinRatio',
       'ExpPerYear', 'ReachAdvantageRatio', 'HeightReachRatio',
       'WinsByDecision', 'DecisionRate'],
      dtype='object')

TitleBout                 bool
Gender                  object
NumberOfRounds           int64
BlueAvgSigStrLanded    float64
BlueAvgSigStrPct       float64
BlueAvgSubAtt          float64
BlueAvgTDLanded        float64
BlueAvgTDPct           float64
BlueStance              object
RedAvgSigStrLanded     float64
RedAvgSigStrPct        float64
RedAvgSubAtt           float64
RedAvgTDLanded         float64
RedAvgTDPct            float64
RedStance               object
LoseStreakDif            int64
WinStreakDif             int64
LongestWinStreakDif      int64
WinDif                   int64
LossDif                  int64
TotalRoundDif            int64
TotalTitleBoutDif        int64
KODif                    int64
SubDif                   int64
HeightDif              float64
ReachDif               float64
AgeDif                   int64
SigStrDif              float64
AvgSubAttDif           float64
AvgTDDif               float64
BetterRank              object
TotalFightTimeSecs     float64
label   


## 4. Initialize UFCData object

In [22]:
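# Note: this rebinds the name UFCData from the class to an instance,
# so re-running this cell would fail unless the class is re-imported first
UFCData = UFCData(ufc_data)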

In [23]:
UFCData.categorical_columns

['TitleBout', 'Gender', 'BlueStance', 'RedStance', 'BetterRank']

In [24]:
UFCData.multiclass_columns

['BlueStance', 'RedStance', 'BetterRank']
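
The two lists above are consistent with a simple rule: non-numeric columns are categorical, and categorical columns with more than two distinct values are multiclass. How `UFCData` actually derives them is not shown in this notebook, so the following is only a sketch of that assumed rule on a toy frame.

```python
import pandas as pd

toy = pd.DataFrame({
    "TitleBout": [False, True, False],
    "Gender": ["MALE", "FEMALE", "MALE"],
    "BlueStance": ["Orthodox", "Southpaw", "Switch"],
    "WinDif": [3, -5, 0],
})

# Boolean and string columns are treated as categorical (assumed rule)
categorical = toy.select_dtypes(exclude="number").columns.tolist()

# Categorical columns with more than two distinct values need one-hot encoding
multiclass = [c for c in categorical if toy[c].nunique() > 2]

print(categorical)
print(multiclass)
```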

In [25]:
UFCData._X_train

Unnamed: 0,TitleBout,Gender,NumberOfRounds,BlueAvgSigStrLanded,BlueAvgSigStrPct,BlueAvgSubAtt,BlueAvgTDLanded,BlueAvgTDPct,BlueStance,RedAvgSigStrLanded,...,AvgTDDif,BetterRank,TotalFightTimeSecs,FinishRate,WinRatio,ExpPerYear,ReachAdvantageRatio,HeightReachRatio,WinsByDecision,DecisionRate
5014,False,MALE,3,7.0000,0.25000,0.000000,0.000000,0.000000,Southpaw,18.000000,...,0.0000,neither,900.0,0.000000,0.000000,-0.082963,1.000000,0.014286,0,0.000000
3094,False,MALE,3,19.8067,0.45313,0.499659,1.320165,0.325424,Southpaw,21.117464,...,0.0000,neither,314.0,0.000000,0.000000,0.000000,1.000000,0.055556,0,0.000000
3760,False,MALE,5,40.0000,0.33800,0.000000,0.250000,0.250000,Southpaw,54.166700,...,-0.5000,Red,1500.0,-0.444444,-0.250000,-0.660606,1.076923,0.090110,-3,0.444444
3951,False,MALE,3,40.8333,0.40300,0.833300,0.500000,0.100000,Orthodox,18.500000,...,0.0000,neither,900.0,-0.333333,0.000000,0.370968,0.931507,-0.056809,1,0.333333
6431,False,MALE,3,33.0000,0.62000,0.000000,6.000000,0.600000,Orthodox,36.333300,...,1.0000,neither,340.0,-0.333333,0.000000,-0.165782,0.958333,-0.027778,-1,0.333333
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3513,False,MALE,5,41.0000,0.50300,0.000000,1.000000,0.373000,Orthodox,45.000000,...,0.4286,Red,1500.0,-0.033333,0.035714,-0.431250,0.987179,-0.025475,-2,0.033333
2366,False,MALE,3,3.6100,0.51000,1.100000,1.650000,0.300000,Orthodox,3.080000,...,-1.3600,Red,900.0,-0.272727,0.096970,1.277584,1.038961,0.062013,6,0.272727
2982,False,MALE,3,34.5000,0.47500,0.500000,0.500000,0.165000,Southpaw,37.280000,...,-0.5400,neither,166.0,-0.916667,0.520000,-1.643182,0.985915,0.000201,1,0.916667
4536,False,FEMALE,3,19.8067,0.45313,0.499659,1.320165,0.325424,Orthodox,48.500000,...,-6.0000,Red,900.0,0.000000,-0.500000,-0.230769,1.000000,-0.015873,-1,-1.000000


<div style="text-align: center;">
  🔹 <img src="../img/ufc_logo.png" width="50" /> 🔹
</div>

## 5. Initialize Modeling Data: Standardize Numerical Data & Encode Categorical Data

In [26]:
UFCData.standardize()

In [27]:
UFCData._X_train

Unnamed: 0,TitleBout,Gender,NumberOfRounds,BlueAvgSigStrLanded,BlueAvgSigStrPct,BlueAvgSubAtt,BlueAvgTDLanded,BlueAvgTDPct,BlueStance,RedAvgSigStrLanded,...,AvgTDDif,BetterRank,TotalFightTimeSecs,FinishRate,WinRatio,ExpPerYear,ReachAdvantageRatio,HeightReachRatio,WinsByDecision,DecisionRate
5014,False,MALE,3,7.0000,0.25000,0.000000,0.000000,0.000000,Southpaw,18.000000,...,0.0000,neither,900.0,0.000000,0.000000,-0.082963,1.000000,0.014286,0,0.000000
3094,False,MALE,3,19.8067,0.45313,0.499659,1.320165,0.325424,Southpaw,21.117464,...,0.0000,neither,314.0,0.000000,0.000000,0.000000,1.000000,0.055556,0,0.000000
3760,False,MALE,5,40.0000,0.33800,0.000000,0.250000,0.250000,Southpaw,54.166700,...,-0.5000,Red,1500.0,-0.444444,-0.250000,-0.660606,1.076923,0.090110,-3,0.444444
3951,False,MALE,3,40.8333,0.40300,0.833300,0.500000,0.100000,Orthodox,18.500000,...,0.0000,neither,900.0,-0.333333,0.000000,0.370968,0.931507,-0.056809,1,0.333333
6431,False,MALE,3,33.0000,0.62000,0.000000,6.000000,0.600000,Orthodox,36.333300,...,1.0000,neither,340.0,-0.333333,0.000000,-0.165782,0.958333,-0.027778,-1,0.333333
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3513,False,MALE,5,41.0000,0.50300,0.000000,1.000000,0.373000,Orthodox,45.000000,...,0.4286,Red,1500.0,-0.033333,0.035714,-0.431250,0.987179,-0.025475,-2,0.033333
2366,False,MALE,3,3.6100,0.51000,1.100000,1.650000,0.300000,Orthodox,3.080000,...,-1.3600,Red,900.0,-0.272727,0.096970,1.277584,1.038961,0.062013,6,0.272727
2982,False,MALE,3,34.5000,0.47500,0.500000,0.500000,0.165000,Southpaw,37.280000,...,-0.5400,neither,166.0,-0.916667,0.520000,-1.643182,0.985915,0.000201,1,0.916667
4536,False,FEMALE,3,19.8067,0.45313,0.499659,1.320165,0.325424,Orthodox,48.500000,...,-6.0000,Red,900.0,0.000000,-0.500000,-0.230769,1.000000,-0.015873,-1,-1.000000


In [28]:
UFCData._X_train_processed

Unnamed: 0,TitleBout,Gender,NumberOfRounds,BlueAvgSigStrLanded,BlueAvgSigStrPct,BlueAvgSubAtt,BlueAvgTDLanded,BlueAvgTDPct,BlueStance,RedAvgSigStrLanded,...,AvgTDDif,BetterRank,TotalFightTimeSecs,FinishRate,WinRatio,ExpPerYear,ReachAdvantageRatio,HeightReachRatio,WinsByDecision,DecisionRate
5014,False,MALE,-0.322638,-0.679854,-1.932340,-0.789091,-1.042703,-1.448499,Southpaw,-0.162779,...,0.102213,neither,0.708351,0.161448,0.275168,0.164815,-0.035200,0.369952,0.268586,0.133795
3094,False,MALE,-0.322638,0.000337,0.001642,-0.000719,0.003926,0.007668,Southpaw,-0.000186,...,0.102213,neither,-1.000753,0.161448,0.275168,0.323718,-0.035200,1.528038,0.268586,0.133795
3760,False,MALE,3.125866,1.072847,-1.094499,-0.789091,-0.844502,-0.329831,Southpaw,1.723520,...,-0.184490,Red,2.458287,-0.755016,-0.415658,-0.941574,1.652897,2.497677,-0.940745,1.094677
3951,False,MALE,-0.322638,1.117105,-0.475639,0.525708,-0.646302,-1.001032,Orthodox,-0.136702,...,0.102213,neither,0.708351,-0.525900,0.275168,1.034251,-1.538300,-1.625058,0.671696,0.854457
6431,False,MALE,-0.322638,0.701062,1.590401,-0.789091,3.714107,1.236305,Orthodox,0.793407,...,0.675620,neither,-0.924923,-0.525900,0.275168,0.006187,-0.949586,-0.810404,-0.134524,0.854457
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3513,False,MALE,3.125866,1.125959,0.476453,-0.789091,-0.249901,0.220554,Orthodox,1.245425,...,0.347976,Red,2.458287,0.092713,0.373858,-0.502277,-0.316550,-0.745772,-0.537635,0.205861
2366,False,MALE,-0.322638,-0.859905,0.543100,0.946513,0.265420,-0.106097,Orthodox,-0.940943,...,-0.677620,Red,0.708351,-0.400928,0.543125,2.770739,0.819810,1.709242,2.687248,0.723427
2982,False,MALE,-0.322638,0.780730,0.209867,-0.000180,-0.646302,-0.710178,Southpaw,0.842783,...,-0.207426,neither,-1.432404,-1.728758,1.712087,-2.823551,-0.344288,-0.025277,0.671696,2.115615
4536,False,FEMALE,-0.322638,0.000337,0.001642,-0.000719,0.003926,0.007668,Orthodox,1.427970,...,-3.338228,Red,0.708351,0.161448,-1.106484,-0.118286,-0.035200,-0.476341,-0.134524,-2.028191
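
The output above shows numerical columns rescaled and categorical columns left untouched. The internals of `UFCData.standardize()` are not shown here; a common way to get this result is z-scoring with statistics from the training split only, sketched below on made-up data.

```python
import pandas as pd

train = pd.DataFrame({"WinDif": [1.0, 3.0, 5.0], "Gender": ["MALE", "FEMALE", "MALE"]})
test = pd.DataFrame({"WinDif": [3.0, 7.0], "Gender": ["MALE", "MALE"]})

numeric = train.select_dtypes(include="number").columns

# Statistics come from the training split only, so no test information leaks into scaling
mean = train[numeric].mean()
std = train[numeric].std(ddof=0)  # population std, the convention scikit-learn's StandardScaler uses

train_scaled, test_scaled = train.copy(), test.copy()
train_scaled[numeric] = (train[numeric] - mean) / std
test_scaled[numeric] = (test[numeric] - mean) / std

print(train_scaled)
print(test_scaled)
```

After this step each numeric training column has mean 0 and standard deviation 1, while the test columns are shifted and scaled by the same training statistics.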


In [29]:
UFCData.encode()

In [30]:
UFCData._X_train_processed

Unnamed: 0,TitleBout,Gender_MALE,BlueStance_Orthodox,BlueStance_Southpaw,BlueStance_Switch,RedStance_Open Stance,RedStance_Orthodox,RedStance_Southpaw,RedStance_Switch,BetterRank_Blue,...,AvgSubAttDif,AvgTDDif,TotalFightTimeSecs,FinishRate,WinRatio,ExpPerYear,ReachAdvantageRatio,HeightReachRatio,WinsByDecision,DecisionRate
5014,0,1,0,1,0,0,0,1,0,0,...,-1.025880,0.102213,0.708351,0.161448,0.275168,0.164815,-0.035200,0.369952,0.268586,0.133795
3094,0,1,0,1,0,0,1,0,0,0,...,0.079948,0.102213,-1.000753,0.161448,0.275168,0.323718,-0.035200,1.528038,0.268586,0.133795
3760,0,1,0,1,0,0,1,0,0,0,...,-0.104394,-0.184490,2.458287,-0.755016,-0.415658,-0.941574,1.652897,2.497677,-0.940745,1.094677
3951,0,1,1,0,0,0,1,0,0,0,...,0.448520,0.102213,0.708351,-0.525900,0.275168,1.034251,-1.538300,-1.625058,0.671696,0.854457
6431,0,1,1,0,0,0,0,1,0,0,...,-1.394452,0.675620,-0.924923,-0.525900,0.275168,0.006187,-0.949586,-0.810404,-0.134524,0.854457
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3513,0,1,1,0,0,0,1,0,0,0,...,-0.235987,0.347976,2.458287,0.092713,0.373858,-0.502277,-0.316550,-0.745772,-0.537635,0.205861
2366,0,1,1,0,0,0,1,0,0,0,...,-1.799959,-0.677620,0.708351,-0.400928,0.543125,2.770739,0.819810,1.709242,2.687248,0.723427
2982,0,1,0,1,0,0,1,0,0,0,...,0.102064,-0.207426,-1.432404,-1.728758,1.712087,-2.823551,-0.344288,-0.025277,0.671696,2.115615
4536,0,0,1,0,0,0,1,0,0,0,...,-1.578793,-3.338228,0.708351,0.161448,-1.106484,-0.118286,-0.035200,-0.476341,-0.134524,-2.028191
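
The column names in the output above (`Gender_MALE`, `BlueStance_Orthodox`, and so on) suggest that binary columns become a single 0/1 indicator and multiclass columns are one-hot encoded. The exact logic of `UFCData.encode()` is not shown, so this is a sketch of one way to produce the same layout with pandas on a toy frame.

```python
import pandas as pd

toy = pd.DataFrame({
    "TitleBout": [False, True, False],
    "Gender": ["MALE", "FEMALE", "MALE"],
    "BlueStance": ["Orthodox", "Southpaw", "Switch"],
})

encoded = toy.copy()
encoded["TitleBout"] = encoded["TitleBout"].astype(int)                 # bool -> 0/1
encoded["Gender_MALE"] = (encoded.pop("Gender") == "MALE").astype(int)  # binary -> one indicator
encoded = pd.get_dummies(encoded, columns=["BlueStance"], dtype=int)    # multiclass -> one-hot

print(encoded.columns.tolist())
```

`pd.get_dummies` only creates columns for categories present in the frame it receives, so train and test splits may need their columns aligned afterwards.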


In [31]:
len(UFCData.numerical_columns)

34

In [32]:
len(UFCData.categorical_columns)

5

<div style="text-align: center;">
  🔹 <img src="../img/ufc_logo.png" width="50" /> 🔹
</div>

## 6. Save

In [34]:
save_data(UFCData)

✅ UFCData object saved to: /home/mfourier/ufc-predictor/data/processed/ufc_data.pkl


In [33]:
save_ufc_datasets(UFCData, project_root)

╔═══════════════════════════════════════╗
║  Feature engineering files saved as:  ║
║    - ufc_train.csv                    ║
║    - ufc_test.csv                     ║
║    - ufc_train_processed.csv          ║
║    - ufc_test_processed.csv           ║
╚═══════════════════════════════════════╝


<div style="text-align: center;">
     <img src="../img/ufc_logo.png" width="800" /> 
</div>