This project consists of a number of Jupyter notebooks which provide the following:
- Data Preprocessing and Exploratory Data Analysis (EDA):
  - Used libraries: pandas, NumPy, missingno
  - Used libraries: Plotly for interactive plots and Seaborn/Matplotlib for regular charts
- Construction of DNN with Hyper-parameters tuning: Keras/keras-tuner and TensorFlow
- Ensemble Method (Combining different ML models): TensorFlow/Keras and Scikit-Learn
- App Development and Deployment
The aim is a data-driven decision-making (DDDM) approach that discovers recurring patterns in the data in order to predict the outcome of future sporting events.
🥊 The Ultimate Fighting Championship (UFC) is currently one of the fastest-growing sports in the world (Telegraph, 2017) and organises events weekly.
The original dataset, data.csv, found on Kaggle, contains the list of all UFC fights from 1993 to 2019. Each row represents information on match details, two fighters (blue and red), and the winner.
E.g.: demographics, body attributes, the fighters' current form, match details
- Dimensions: 5144 rows x 145 columns
- 9 categorical, 136 numerical features
- Target (categorical: Blue/Red) specifies the winner
- High dimensions
- Baseline - 67% (Similar Features Considered)
Performed tasks:
- Feature Selection
- Replacing empty string with NA
- Removing 'Draw' matches (Binary classification)
- Distinguishing numeric & symbolic fields
- Removing constant columns (due to no variation in them)
- Formatting data to 3 decimal places
- 1-hot-encoding categorical fields
- Dimensionality Reduction with PCA (Principal Component Analysis)
- Missing Values Treatment
- Replacing missing values with the median
- Predicting missing values via Linear Regression
- Dropping remaining missing values
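
Several of the cleaning steps listed above can be sketched with pandas as follows. This is a minimal illustration on a toy frame, not the notebooks' actual code: the column names (`Winner`, `weight_class`, `B_age`, `R_age`, `no_of_rounds`) and the helper `preprocess` are assumptions made for the example, and PCA and regression-based imputation are left out.

```python
import numpy as np
import pandas as pd

# Toy stand-in for data.csv; the column names are illustrative assumptions.
raw = pd.DataFrame({
    "Winner": ["Red", "Blue", "Draw", "Red", "Blue"],
    "weight_class": ["Lightweight", "", "Heavyweight", "Lightweight", "Heavyweight"],
    "B_age": [29.0, np.nan, 31.0, 35.0, 27.0],
    "R_age": [30.12345, 28.0, 33.0, np.nan, 26.0],
    "no_of_rounds": [3, 3, 3, 3, 3],  # constant column, carries no information
})


def preprocess(df, target="Winner"):
    """Hypothetical helper applying the listed cleaning steps in order."""
    df = df.replace("", np.nan)             # empty strings become NA
    df = df[df[target] != "Draw"].copy()    # keep a binary Blue/Red target
    # Drop columns that hold a single value only.
    constant = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    df = df.drop(columns=constant)
    # Distinguish numeric from symbolic (categorical) fields.
    numeric = df.select_dtypes(include="number").columns
    symbolic = [c for c in df.columns if c not in numeric and c != target]
    # Median imputation for numeric fields, then round to 3 decimal places.
    df[numeric] = df[numeric].fillna(df[numeric].median()).round(3)
    df = df.dropna()                        # drop rows that are still incomplete
    return pd.get_dummies(df, columns=symbolic)  # one-hot encode symbolic fields


clean = preprocess(raw)
print(clean)
```

The median is used here because it is less sensitive to outliers than the mean; rows with a missing categorical value are dropped in this sketch rather than imputed.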
Trained multiple models separately and then combined them into one ensemble model to increase performance:
- Deep Neural Network (DNN)
- Support Vector Machine (SVM)
- Decision Tree (DT)
- AdaBoost
- Random Forest (RF)
- ExtraTrees
- GradientBoosting
- Multi-Layer Perceptron (MLP)
- K-Nearest-Neighbours (KNN)
- Logistic Regression
- Linear Discriminant Analysis (LDA)
- XGB
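
One common way to combine separately defined models is soft voting, which averages the predicted class probabilities. The sketch below uses Scikit-Learn's `VotingClassifier` with three of the listed model types on synthetic data. The combination strategy, the choice of models, and every hyper-parameter here are assumptions for illustration, since the method used in the notebooks is not described in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data standing in for the fight features.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Soft voting averages the class probabilities of the base models.
ensemble = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)

accuracy = ensemble.score(X_test, y_test)
print(f"Ensemble accuracy on synthetic data: {accuracy:.3f}")
```

Scale-sensitive models (logistic regression, KNN) are wrapped in a pipeline with `StandardScaler` so that each base model receives suitably scaled inputs without affecting the tree-based model.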
Generated the latest fighter details and used the trained models to predict matches. The app is deployed on Heroku and available at https://ai-predicts-ufc.herokuapp.com
© TheDeepestLearners