Kaggle Flavours of Physics solution code for the discussion in this blog.
It was not a top winning solution. The primary reason is that I tried my best to use physically sound features and to avoid the invariant mass feature; surely, if one overfits on the simulated invariant mass, the score would be much higher. The reason why I didn't use invariant mass is discussed in the blog linked above.
The code is a little bit messy. It has 6 parts:
- feature engineering is at `model_selection/feat.py`, which contains all the important kinematic, pair-wise and quality selection features mentioned in the blog post.
- feature selection is at `feat_selection/`. One can run `append_feats.py` to generate the full long list of features, then let the random forest in `rf_feat_selection.py` select among them and print out the importance list. The final selected features are used in the feature engineering part.
- model selection is a generalized model selection framework at `search_v1.py`, which searches the model directories and optimizes the final xgboost submission by CV score. To run it, the corresponding subdirectories must be created first. It was created by littleboat, who deserves full credit.
- ensemble is at `ensemble`, which is a simple weighted average that adds the NN results (bad KS score) to a good result (UGrad mixed with xgboost after model selection).
- xgboost parameter grid search is at
- Neural Network model building is at `nn/stacking_nn.py`, which trains a neural network on stacked old features. It may take a very long time to run. It is used in the final ensemble, boosting the UGrad+xgboost score by +0.0005 AUC.
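
For readers who want a picture of the ensemble step, here is a minimal sketch of a weighted average of two prediction vectors. It is not the code in `ensemble`: the function name `weighted_average`, the sample probabilities and the weight of 0.1 are all hypothetical, chosen only to show a small NN contribution mixed into a stronger base prediction.

```python
def weighted_average(base_pred, extra_pred, extra_weight):
    """Blend two equal-length lists of per-event probabilities.

    extra_weight is the fraction given to extra_pred; the remainder
    (1 - extra_weight) goes to base_pred.
    """
    if len(base_pred) != len(extra_pred):
        raise ValueError("prediction lists must have the same length")
    if not 0.0 <= extra_weight <= 1.0:
        raise ValueError("extra_weight must lie in [0, 1]")
    return [
        (1.0 - extra_weight) * b + extra_weight * e
        for b, e in zip(base_pred, extra_pred)
    ]


# Hypothetical signal probabilities for three events (not real results).
base_pred = [0.90, 0.20, 0.60]  # stands in for the UGrad + xgboost output
nn_pred = [0.70, 0.40, 0.50]    # stands in for the neural network output

# Assumed weight: keep the NN share small, since its KS score is poor.
blended = weighted_average(base_pred, nn_pred, extra_weight=0.1)
print(blended)
```

Keeping the NN weight small is one way to gain a little AUC while limiting the impact of a component with a bad KS score; the weight actually used for the submission is not stated here.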