Problem type: Binary Classification
Includes a detailed solution to the Garanti BBVA Data Camp. I entered the competition solo and ranked 14th (top 9%) out of 210 competitors and 174 teams.
- Cleaning the education, skill, language, and work experience datasets, which contain a high rate of erroneous data, and fixing a few problems in the train and test sets.
- Feature engineering, followed by imputation and encoding.
- Training Random Forest models with cross-validation and stacking, following the model selection made during the competition, on the best dataset as determined by various feature selection techniques.
The main solution is available in both English and Turkish, along with bonus/unused work done during the competition.
I also added:
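The last step above can be sketched roughly as follows. This is a minimal illustration, not the competition code: it assumes "CV training and stacking" means fitting one Random Forest per fold, collecting out-of-fold predictions for validation, and averaging the per-fold test predictions. The synthetic data, the `cv_train_and_stack` name, the accuracy metric, and all hyperparameter values are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, train_test_split

# Synthetic stand-in for the competition data (assumption, not the real dataset).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)


def cv_train_and_stack(X_train, y_train, X_test, n_splits=5, seed=42):
    """Fit one Random Forest per fold.

    Returns out-of-fold probabilities for the training rows and the
    test probabilities averaged over all fold models.
    """
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    oof = np.zeros(len(X_train))
    test_fold_probs = np.zeros((n_splits, len(X_test)))

    for fold, (tr_idx, va_idx) in enumerate(skf.split(X_train, y_train)):
        model = RandomForestClassifier(
            n_estimators=100, random_state=seed + fold, n_jobs=-1
        )
        model.fit(X_train[tr_idx], y_train[tr_idx])
        # Each training row is predicted only by the model that never saw it.
        oof[va_idx] = model.predict_proba(X_train[va_idx])[:, 1]
        # Every fold model also scores the full test set.
        test_fold_probs[fold] = model.predict_proba(X_test)[:, 1]

    return oof, test_fold_probs.mean(axis=0)


oof_probs, test_probs = cv_train_and_stack(X_train, y_train, X_test)
oof_acc = accuracy_score(y_train, (oof_probs >= 0.5).astype(int))
test_pred = (test_probs >= 0.5).astype(int)
print(f"Out-of-fold accuracy: {oof_acc:.3f}")
```

The out-of-fold score gives a validation estimate without a separate hold-out set, and averaging the fold models tends to be more stable than relying on a single fit.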
- The fi_forward_feature_selector function mentioned in the solution, which I wrote and used to create the dataset that earned the second-best private score among my three final-day submissions.
- The code for HalvingGridSearch, TuneGridSearch, and Optuna, which I tried while looking for faster hyperparameter tuning than standard GridSearch. GridSearchCV uses no optimization algorithm and tries every combination in the given parameter grid, whereas HalvingGridSearch uses an algorithm called successive halving, which makes it approximately 4x faster. TuneGridSearch is another faster alternative.
- The code for plotting training curves with the Yellowbrick library, to see how hyperparameter values affect the model and thereby narrow the range of hyperparameters given to HalvingGridSearch for even faster results.
- The code I used to scrape external data, which unfortunately turned out to be unimportant after modelling and so remained unused during the competition.
- Anıl Öztürk's well-explained Kaggle notebooks and competition solutions, especially this competition solution where I learned the method of training the model with CV and stacking.
- In-demand skills shared on the LinkedIn blog for 2018, 2019, and 2020
- Popular/useful languages shared on the LinkedIn blog for 2018, 2019, and 2020
- World universities data, Turkey cities data, Turkey districts data
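As a rough illustration of the hyperparameter tuning workflow described in the list above (curves to narrow a hyperparameter range, then successive halving over the narrowed grid), here is a minimal sketch. It makes several assumptions: scikit-learn's `validation_curve` stands in for the Yellowbrick plot so the example needs no plotting, the data is synthetic, and the grid values and the 0.01 tolerance are arbitrary placeholders rather than the settings used in the competition.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
# HalvingGridSearchCV is experimental; this import must come first to enable it.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV, validation_curve

# Synthetic stand-in for the competition data (assumption).
X, y = make_classification(
    n_samples=800, n_features=15, n_informative=6, random_state=0
)

# Step 1: score one hyperparameter over a wide range.
depth_range = [2, 4, 6, 8, 12, 16]
_, valid_scores = validation_curve(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X,
    y,
    param_name="max_depth",
    param_range=depth_range,
    cv=3,
    scoring="accuracy",
)
mean_valid = valid_scores.mean(axis=1)  # one mean CV score per depth

# Keep only depths close to the best score (tolerance is an arbitrary choice).
narrowed_depths = [
    d for d, s in zip(depth_range, mean_valid) if s >= mean_valid.max() - 0.01
]

# Step 2: successive halving over the narrowed grid. Every candidate starts
# on a small sample; only the best 1/factor survive to each larger round.
param_grid = {
    "max_depth": narrowed_depths,
    "min_samples_leaf": [1, 3, 5],
    "max_features": ["sqrt", 0.5],
}
search = HalvingGridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    factor=3,
    resource="n_samples",
    cv=3,
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)

print("Candidates per round:", search.n_candidates_)
print("Samples per round:", search.n_resources_)
print("Best parameters:", search.best_params_)
```

Because weak candidates are dropped after seeing only a fraction of the data, the number of full-size fits is much smaller than in an exhaustive grid search. The actual speed-up depends on the grid size and the `factor` value.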