MISSION: Ultra Large-Scale Feature Selection using Count-Sketches

An ICML 2018 paper by Amirali Aghazadeh*, Ryan Spring*, Daniel LeJeune, Gautam Dasarathy, Anshumali Shrivastava, Richard G. Baraniuk

* These authors contributed equally and are listed alphabetically.

Code Versions

  1. Mission Logistic Regression
  2. Fine-Grained Mission Softmax Regression
  3. Coarse-Grained Mission Softmax Regression
  4. Feature Hashing Softmax Regression
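The core idea behind Mission Logistic Regression can be sketched as a loop. SGD updates are added to a count-sketch instead of a dense weight vector. The sketch is then queried for each updated feature, and only the k features with the largest estimated weights are kept; in this sketch, predictions also use only those k weights. This is a minimal Python illustration under stated assumptions, not the repository's implementation. The class and function names, the tuple-`hash` bucket and sign functions, the sketch dimensions, the learning rate, and the toy data are all hypothetical. For brevity, the top-k set is a dict with a linear min-scan rather than a real heap.

```python
import math
import random


class CountSketch:
    """Count-sketch: `depth` rows of `width` buckets; each key maps to one signed bucket per row."""

    def __init__(self, depth, width, seed=0):
        rng = random.Random(seed)
        self.depth, self.width = depth, width
        self.table = [[0.0] * width for _ in range(depth)]
        # Hypothetical per-row salts; hashing a tuple of ints is deterministic across runs.
        self.salts = [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(depth)]

    def _bucket_and_sign(self, row, key):
        bucket_salt, sign_salt = self.salts[row]
        bucket = hash((bucket_salt, key)) % self.width
        sign = 1.0 if hash((sign_salt, key)) & 1 else -1.0
        return bucket, sign

    def add(self, key, value):
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, key)
            self.table[row][bucket] += sign * value

    def query(self, key):
        # Estimate = median of the signed per-row bucket values.
        ests = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, key)
            ests.append(sign * self.table[row][bucket])
        ests.sort()
        mid = len(ests) // 2
        return ests[mid] if len(ests) % 2 else 0.5 * (ests[mid - 1] + ests[mid])


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def update_top_k(top, feature, weight, k):
    """Keep `top` (feature -> weight) holding at most k large-|weight| features."""
    if feature in top or len(top) < k:
        top[feature] = weight
        return
    weakest = min(top, key=lambda f: abs(top[f]))
    if abs(weight) > abs(top[weakest]):
        del top[weakest]
        top[feature] = weight


def train_mission(stream, k, depth=5, width=256, lr=0.1):
    """One pass of MISSION-style SGD for logistic regression.

    `stream` yields (x, y): x is a sparse dict {feature_id: value}, y is 0 or 1.
    """
    sketch = CountSketch(depth, width)
    top = {}
    for x, y in stream:
        # Predict using only the current top-k weights.
        z = sum(top.get(j, 0.0) * v for j, v in x.items())
        g = sigmoid(z) - y  # derivative of the log loss with respect to z
        for j, v in x.items():
            sketch.add(j, -lr * g * v)  # the SGD step goes into the sketch
            update_top_k(top, j, sketch.query(j), k)
    return top


def toy_stream(n, seed=0):
    """Made-up data: feature 0 carries the label; features 1..999 are random noise."""
    rng = random.Random(seed)
    for _ in range(n):
        y = rng.randint(0, 1)
        x = {0: 1.0 if y else -1.0}
        for _ in range(5):
            x[rng.randrange(1, 1000)] = rng.choice((-1.0, 1.0))
        yield x, y


TOP = train_mission(toy_stream(2000, seed=1), k=5)
```

Memory is bounded by the sketch (`depth * width` floats) plus k heap entries, regardless of how many distinct feature ids appear in the stream. That bound is what makes the count-sketch approach attractive at very large feature dimensionality.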

Optimizations

  • MISSION streams the dataset in via memory-mapped I/O instead of loading it entirely into memory, which is
    necessary for terabyte-scale datasets.
  • AVX SIMD optimization for fast Softmax Regression
  • The code is currently optimized for the Splice-Site and DNA Metagenomics datasets.
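Memory-mapped streaming can be sketched with Python's standard `mmap` module. The README does not state the on-disk format, so this sketch assumes a libsvm/SVMlight-style text file (`label idx:val idx:val ...`). `stream_libsvm` is a hypothetical helper, not part of the repository.

```python
import mmap
import os


def stream_libsvm(path):
    """Yield (label, {feature_id: value}) pairs from a libsvm-format text file.

    The file is memory-mapped and scanned line by line, so the OS pages data
    in on demand instead of the whole file being read into memory up front.
    """
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return  # mmap cannot map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for raw in iter(mm.readline, b""):
                parts = raw.decode("ascii").split()
                if not parts:
                    continue  # skip blank lines
                label = float(parts[0])
                features = {}
                for token in parts[1:]:
                    idx, val = token.split(":")
                    features[int(idx)] = float(val)
                yield label, features
```

Because this is a generator, a training loop that consumes it holds only one parsed example at a time.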

Mission Softmax Regression

  1. Fine-Grained Feature Set - Each class maintains a separate feature set, so there is a top-k heap for each class.
  2. Coarse-Grained Feature Set - All classes share a common feature set, so there is only one top-k heap.
     Each feature is scored by the L1 norm of its weights across all classes.
  3. Data Parallelism - Each worker maintains its own heap, while all workers aggregate their gradients into the same
     shared count-sketch.
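The difference between the fine-grained and coarse-grained selection rules can be illustrated with a small Python sketch. In MISSION the per-class weights would come from count-sketch queries. Here a hypothetical `weight(j, c)` function backed by a plain dict stands in for those queries, and `heapq.nlargest` replaces the streaming heap. The sketch therefore shows only the selection rule, not the training loop. The weights are made up.

```python
import heapq


def coarse_grained_top_k(features, num_classes, weight, k):
    """One shared feature set: rank features by the L1 norm of their weights across classes."""
    score = {j: sum(abs(weight(j, c)) for c in range(num_classes)) for j in features}
    return set(heapq.nlargest(k, score, key=score.get))


def fine_grained_top_k(features, num_classes, weight, k):
    """One feature set per class: rank features by |weight| within each class separately."""
    return [
        set(heapq.nlargest(k, features, key=lambda j: abs(weight(j, c))))
        for c in range(num_classes)
    ]


# Made-up weights for 4 features and 2 classes; missing entries are 0.
W = {(0, 0): 3.0, (1, 1): -2.5, (2, 0): 1.0, (2, 1): 1.0, (3, 0): 0.5}


def weight(j, c):
    return W.get((j, c), 0.0)


print(coarse_grained_top_k(range(4), 2, weight, k=2))  # {0, 1}
print(fine_grained_top_k(range(4), 2, weight, k=2))    # [{0, 2}, {1, 2}]
```

Feature 2 has moderate weight in both classes, so it enters each per-class set. Under the shared L1 score (2.0), however, it loses to feature 1 (2.5).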

Datasets

  1. KDD 2012
  2. RCV1
  3. Webspam - Trigram
  4. DNA Metagenomics
  5. Criteo 1TB
  6. Splice-Site 3.2TB