MISSION: Ultra Large-Scale Feature Selection using Count-Sketches

An ICML 2018 paper by Amirali Aghazadeh*, Ryan Spring*, Daniel LeJeune, Gautam Dasarathy, Anshumali Shrivastava, Richard G. Baraniuk

* These authors contributed equally and are listed alphabetically.

Code Versions

  1. Mission Logistic Regression
  2. Fine-Grained Mission Softmax Regression
  3. Coarse-Grained Mission Softmax Regression
  4. Feature Hashing Softmax Regression


  • Mission streams the dataset via memory-mapped I/O instead of loading everything into memory, which is
    necessary for tera-scale datasets.
  • AVX SIMD optimization for fast Softmax Regression
  • The code is currently optimized for the Splice-Site and DNA Metagenomics datasets.
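
The memory-mapped streaming idea from the list above can be illustrated with a minimal Python sketch. This is not the repository's implementation: the fixed-size record layout (an int32 feature id followed by a float32 value) and the helper names `write_records`, `stream_records`, and `demo` are assumptions made for illustration only.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: little-endian int32 feature id + float32 value.
RECORD = struct.Struct("<if")


def write_records(path, records):
    """Write (feature_id, value) pairs as fixed-size binary records."""
    with open(path, "wb") as f:
        for feature_id, value in records:
            f.write(RECORD.pack(feature_id, value))


def stream_records(path):
    """Yield records one at a time from a memory-mapped file.

    The operating system pages data in on demand, so the file is never
    read into memory as a whole.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, len(mm), RECORD.size):
                yield RECORD.unpack_from(mm, offset)


def demo():
    """Round-trip a few records through a temporary file."""
    records = [(3, 0.5), (17, -1.25), (3, 2.0)]
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_records(path, records)
        return list(stream_records(path))
    finally:
        os.remove(path)
```

Because the generator only touches one record at a time, the same loop would work unchanged on a file much larger than RAM, as long as the address space can hold the mapping.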

Mission Softmax Regression

  1. Fine-Grained Feature Set - Each class maintains a separate feature set, so there is a top-k heap for each class.
  2. Coarse-Grained Feature Set - All the classes share a common set of features, so there is only one top-k heap.
    Each feature is ranked by its L1 norm across all classes.
  3. Data Parallelism - Each worker maintains a separate heap, while aggregating gradients in the same count-sketch.
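
The difference between the fine-grained and coarse-grained feature sets can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the repository's code: the names `CountSketch`, `TopK`, and `select_features` are hypothetical, the `(feature, class)` pair is encoded as `feature * num_classes + cls` purely for convenience, and a small dict-based structure stands in for a real top-k heap to keep the example short. Data parallelism is not shown.

```python
import random
import statistics

PRIME = (1 << 61) - 1  # large prime for a simple universal hash family


class CountSketch:
    """Count-sketch over integer keys; queries return the median estimate."""

    def __init__(self, rows=5, width=1024, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.table = [[0.0] * width for _ in range(rows)]
        # Per row: (a, b) pick the bucket, (c, d) pick the sign.
        self.params = [
            tuple(rng.randrange(1, PRIME) for _ in range(4)) for _ in range(rows)
        ]

    def _slots(self, key):
        for row, (a, b, c, d) in zip(self.table, self.params):
            bucket = ((a * key + b) % PRIME) % self.width
            sign = 1.0 if ((c * key + d) % PRIME) & 1 else -1.0
            yield row, bucket, sign

    def update(self, key, delta):
        for row, bucket, sign in self._slots(key):
            row[bucket] += sign * delta

    def query(self, key):
        return statistics.median(
            sign * row[bucket] for row, bucket, sign in self._slots(key)
        )


class TopK:
    """Keeps the k highest-scoring keys (dict-based stand-in for a heap)."""

    def __init__(self, k):
        self.k = k
        self.scores = {}

    def offer(self, key, score):
        if key in self.scores or len(self.scores) < self.k:
            self.scores[key] = score
            return
        weakest = min(self.scores, key=self.scores.get)
        if score > self.scores[weakest]:
            del self.scores[weakest]
            self.scores[key] = score


def select_features(updates, num_classes, k, fine_grained):
    """Apply (feature, class, delta) updates and return the selected features.

    Fine-grained: one top-k set per class, scored by |weight|.
    Coarse-grained: one shared top-k set, scored by the L1 norm over classes.
    """
    sketch = CountSketch()
    heaps = [TopK(k) for _ in range(num_classes if fine_grained else 1)]
    for feature, cls, delta in updates:
        sketch.update(feature * num_classes + cls, delta)
        if fine_grained:
            score = abs(sketch.query(feature * num_classes + cls))
            heaps[cls].offer(feature, score)
        else:
            score = sum(
                abs(sketch.query(feature * num_classes + c))
                for c in range(num_classes)
            )
            heaps[0].offer(feature, score)
    return [sorted(h.scores) for h in heaps]


# Toy updates as (feature, class, delta) triples.
UPDATES = [(0, 0, 5.0), (1, 0, 1.0), (2, 0, 3.0),
           (1, 1, 4.0), (2, 1, 0.5), (3, 1, 2.0)]
```

With these toy updates and `k = 2`, the fine-grained variant keeps a different pair of features for each class, while the coarse-grained variant keeps a single pair chosen by the summed magnitude across both classes.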


Datasets

  1. KDD 2012
  2. RCV1
  3. Webspam - Trigram
  4. DNA Metagenomics
  5. Criteo 1TB
  6. Splice-Site 3.2TB