Problem type: Multiclass Classification
This repo includes my solution and jury presentation for BTK Akademi Datathon 2023. I competed solo and was ranked in the top 10 by the jury's selection, out of 359 competitors and 255 teams.
- A very detailed EDA phase followed by multiple pivot tables
- Feature engineering: extracting new numerical features, trying the experimental "cluster feature" method, and computing statistical features per cluster group
- Feature selection with Sequential Feature Selection, RFECV, SHAP (not included in this repo)
- Model selection / model re-evaluation
- Detection and analysis of the samples misclassified by each of the Random Forest, XGBoost, CatBoost, and LightGBM models
- Hyperparameter tuning with Optuna
- Creating the final submission with the chosen feature set and model architecture
- I also included every helper function used throughout the different sections of the solution
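The misclassification analysis listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: it assumes the goal is to find samples that all four models get wrong, and the function name `commonly_misclassified`, the dictionary layout, and the toy labels are all hypothetical.

```python
def commonly_misclassified(y_true, predictions):
    """Return sorted indices of samples that every model predicts incorrectly.

    `predictions` maps a model name to a list of predicted labels
    aligned with `y_true` (hypothetical layout, chosen for illustration).
    """
    wrong_sets = [
        {i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p}
        for y_pred in predictions.values()
    ]
    if not wrong_sets:
        return []
    # Keep only the indices that appear in every model's error set.
    return sorted(set.intersection(*wrong_sets))


# Toy predictions with made-up values (not competition data).
y_true = [0, 1, 2, 1, 0, 2]
predictions = {
    "random_forest": [0, 2, 2, 1, 1, 2],
    "xgboost":       [0, 2, 2, 0, 1, 2],
    "catboost":      [0, 0, 2, 1, 1, 2],
    "lightgbm":      [1, 2, 2, 1, 1, 2],
}
hard_idx = commonly_misclassified(y_true, predictions)
print(hard_idx)  # [1, 4]
```

In practice the predictions would more likely come from out-of-fold validation than from a single train/test split, so that every training sample gets checked once.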
- About the experimental "cluster feature" method: 1, 2
- SHAP implementation for multiclass classification
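As a rough illustration of the "cluster feature" idea mentioned above, the sketch below clusters the rows, stores the cluster label as a new feature, and then attaches per-cluster statistics to each row. It is an assumption about how such a method might look, not the implementation used in this repo or in the linked references: the use of `KMeans`, the helper name `add_cluster_features`, the column names, and the random toy data are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def add_cluster_features(df, cols, n_clusters=3, seed=0):
    """Add a cluster label plus per-cluster mean/std of `cols` to a copy of `df`."""
    out = df.copy()
    # Scale first so no single column dominates the distance computation.
    scaled = StandardScaler().fit_transform(out[cols])
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    out["cluster"] = km.fit_predict(scaled)
    # Statistical features by cluster group, flattened to single-level names.
    stats = out.groupby("cluster")[cols].agg(["mean", "std"])
    stats.columns = [f"{col}_cluster_{stat}" for col, stat in stats.columns]
    return out.join(stats, on="cluster")


# Random toy data (not competition data).
rng = np.random.default_rng(0)
toy = pd.DataFrame({"f1": rng.normal(size=60), "f2": rng.normal(size=60)})
enriched = add_cluster_features(toy, ["f1", "f2"])
print(enriched.columns.tolist())
```

In a real pipeline the scaler and the clustering model would presumably be fitted on the training folds only and then applied to validation and test rows, to avoid leaking information across the split.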