Skip to content

popraf/domain_classifier

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Domain classifier task

The notebook 1_EDA_FeatureSelection.ipynb contain data cleaning code and a simple exploratory data analysis.

In notebook 2_Classification.ipynb the data has been classified using suggested columns by mutual information ranking, however overfitting of the algorithm is observable.

3rd notebook 3_FeaturesSelectionDropApproach.ipynb contains different approach based on previously carried out experiments, that is selection of numerical features along with dropping of nulls/nans.

By doing so, and disregarding balancing the train dataset with SMOTE/SMOTETomek, XGBoost (w/o hyperparameter tuning) achieved 76% F1 Score, 76% Balanced Accuracy, 87% ROC AUC on a validation set (the set is not highly representative, it contains 200 samples from each group). The results are available in 4_XGB_DropScaleWeights.ipynb.

After hyperparameters tuning, the algorithm performace increased to 80% F1 Score, 80% Balanced Accuracy, and 90% ROC AUC.

About

Recruitment task

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published