Kaggle 'Microsoft Malware Classification Challenge' 3rd place solution

geffy/kaggle-malware

Mikhail Trofimov, Dmitry Ulyanov, Stanislav Semenov.

Scores 0.0040 on the private leaderboard.

How to reproduce submission

Don't forget to check paths in ./src/set_up.py!

./create_dirs.sh
cd ./src
./main.sh
cd ../

Then run all the code in learning-main-model.ipynb, learning-4gr-only.ipynb, semi-supervised-trick.ipynb, and final-submission-builder.ipynb.
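
Since the steps above are long-running, a quick preflight check can catch a missing script or notebook before anything starts. The following is a minimal sketch, not part of the repository: the helper name `missing_files` is hypothetical, and the locations of the files relative to the repository root are an assumption based on the commands above.

```python
import os

# File names taken from the steps above; their locations relative to the
# repository root are an assumption and may need adjusting.
EXPECTED_FILES = [
    "create_dirs.sh",
    os.path.join("src", "set_up.py"),
    os.path.join("src", "main.sh"),
    "learning-main-model.ipynb",
    "learning-4gr-only.ipynb",
    "semi-supervised-trick.ipynb",
    "final-submission-builder.ipynb",
]


def missing_files(root, expected=EXPECTED_FILES):
    """Return the expected paths that do not exist under root."""
    return [p for p in expected if not os.path.isfile(os.path.join(root, p))]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files: " + ", ".join(missing))
    else:
        print("All expected files found.")
```

Run it from the repository root; an empty result means every file named in the instructions was found where this sketch expects it.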

Dependencies

  • python 2.7.9
  • ipython 3.1.0
  • sklearn 0.16.1
  • numpy 1.9.2
  • pandas 0.16.0
  • hickle 1.1.1
  • pypy 2.5.1 (with installed joblib 0.8.4)
  • scipy 0.15.1
  • xgboost 0.3
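
Because the versions above are pinned fairly tightly, it may help to compare them against the current environment before running anything. The sketch below is an illustration only: the helper name `version_mismatches` is hypothetical, and it assumes each package is importable under the name shown and exposes a `__version__` attribute. pypy, joblib, ipython and xgboost are left out because checking them may work differently.

```python
import importlib

# Versions pinned in the list above. Import names and the presence of a
# __version__ attribute on each module are assumptions.
PINNED = {
    "sklearn": "0.16.1",
    "numpy": "1.9.2",
    "pandas": "0.16.0",
    "hickle": "1.1.1",
    "scipy": "0.15.1",
}


def version_mismatches(pinned=PINNED, importer=importlib.import_module):
    """Map each package to (expected, found) where the versions differ.

    found is None when the package cannot be imported.
    """
    mismatches = {}
    for name, expected in sorted(pinned.items()):
        try:
            found = getattr(importer(name), "__version__", "unknown")
        except ImportError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `version_mismatches()` returns an empty dict when every listed package matches its pinned version; otherwise each entry shows the expected and the installed version.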

Hardware

We ran this code on a machine with 16 cores and 120 GB of RAM. The most memory-consuming part is processing 4-grams; all other steps require no more than 32 GB of RAM.
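
To illustrate why the 4-gram step dominates memory use: a 4-gram over bytes can take up to 256^4 (about 4.3 billion) distinct values, so a table of counts grows quickly across many files. The sketch below counts 4-grams with `collections.Counter`. It is not the authors' implementation; the function names are hypothetical, and the input layout (an address followed by two-character hex tokens on each line) is an assumption.

```python
from collections import Counter


def tokens_from_line(line):
    """Drop the leading address field and return the remaining tokens.

    Assumes a hex-dump line such as '00401000 56 8D 44 24'.
    """
    return line.split()[1:]


def count_ngrams(tokens, n=4):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Two made-up lines in the assumed layout, for illustration only.
lines = ["00401000 56 8D 44 24", "00401004 08 50 8B F1"]
tokens = [t for line in lines for t in tokens_from_line(line)]
counts = count_ngrams(tokens)
print(len(counts))  # 5 distinct 4-grams from 8 tokens
```

Keeping one such counter per file and merging them is what makes memory grow; a real pipeline would likely need to prune rare 4-grams or process files in batches.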