This is the code repository for the experiments in the paper "Understanding Sparse JL for Feature Hashing" (NeurIPS 2019). The paper is available at https://papers.nips.cc/paper/9656-understanding-sparse-jl-for-feature-hashing.
Feature hashing and other random projection schemes are commonly used to reduce the dimensionality of feature vectors. The goal is to efficiently project a high-dimensional feature vector living in R^n down to a much lower-dimensional space R^m while approximately preserving its Euclidean norm. In this paper, we demonstrate the benefits of using sparsity greater than 1 in sparse JL transforms, as compared to feature hashing (the sparsity-1 special case).
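For intuition, here is a minimal sketch of the construction (not the repository's implementation; all names are illustrative): a sparse JL transform maps R^n to R^m by hashing each input coordinate to s rows with random signs and scaling by 1/sqrt(s), and feature hashing is the special case s = 1.

```python
import numpy as np

def sparse_jl_project(x, m, s, rng=None):
    """One draw of a sparse JL projection of x into R^m.

    Each coordinate of x is hashed to s distinct rows (chosen without
    replacement), multiplied by a random sign, and scaled by 1/sqrt(s).
    Feature hashing corresponds to s = 1.
    """
    gen = np.random.default_rng(rng)
    y = np.zeros(m)
    for i in np.flatnonzero(x):  # only nonzero coordinates contribute
        rows = gen.choice(m, size=s, replace=False)  # s hash buckets
        signs = gen.choice([-1.0, 1.0], size=s)      # Rademacher signs
        y[rows] += signs * x[i] / np.sqrt(s)
    return y

# The projection approximately preserves the Euclidean norm:
x = np.random.default_rng(0).standard_normal(10_000)
y = sparse_jl_project(x, m=256, s=4, rng=1)
print(np.linalg.norm(y) / np.linalg.norm(x))  # close to 1
```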
These experiments evaluate the performance of sparse Johnson-Lindenstrauss transforms on synthetic and real-world datasets. The experiments mainly serve to validate and illustrate the theoretical findings in the paper, and graphs can be found in the paper. Moreover, the rationale behind the experimental design is discussed in the paper.
To run these experiments, use Python 3.
The synthetic data experiments illustrate the tradeoff between the projected dimension m, the sparsity s, and the l_inf-to-l_2 norm ratio of the input vectors.
The real-world data experiments illustrate the same tradeoff between dimension and sparsity on two bag-of-words datasets, News20 and Enron.
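As a rough illustration of the kind of measurement involved (a sketch under assumed parameters, not the repository's experiment), one can estimate the norm distortion of binary vectors whose l_inf-to-l_2 ratio is controlled by the number k of nonzero entries, across several sparsity levels:

```python
import numpy as np  # reuses sparse_jl_project from the sketch above

gen = np.random.default_rng(0)
n, m, trials = 10_000, 128, 100
# A vector with k equal nonzero entries has l_inf-to-l_2 ratio 1/sqrt(k)
# and squared norm k, so |‖Ax‖^2 / k - 1| is the relative distortion.
for k in (1, 10, 100):
    x = np.zeros(n)
    x[:k] = 1.0
    for s in (1, 2, 4):
        errs = [abs(np.linalg.norm(sparse_jl_project(x, m, s, rng=gen)) ** 2 / k - 1)
                for _ in range(trials)]
        print(f"k={k:4d}  s={s}:  median distortion = {np.median(errs):.3f}")
```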
See news20.py for the code for the News20 dataset. For the Enron dataset, download docword.enron.txt.gz from https://archive.ics.uci.edu/ml/machine-learning-databases/bag-of-words/, keep the filename docword.enron.txt.gz, and then see enron.py.
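The enron.py script handles the dataset itself; for reference, here is a hedged sketch of how the UCI bag-of-words format used by docword.enron.txt.gz can be read into a sparse document-term matrix (the helper name load_uci_bow is hypothetical):

```python
import gzip
import numpy as np
from scipy.sparse import csr_matrix

def load_uci_bow(path):
    """Load a UCI bag-of-words file (docword.*.txt.gz) as a CSR matrix.

    Format: three header lines (number of docs D, vocabulary size W,
    number of nonzeros NNZ), followed by NNZ lines of
    '<docID> <wordID> <count>' with 1-based indices.
    """
    with gzip.open(path, "rt") as f:
        n_docs = int(f.readline())
        n_words = int(f.readline())
        f.readline()  # NNZ count, not needed here
        rows, cols, vals = [], [], []
        for line in f:
            d, w, c = line.split()
            rows.append(int(d) - 1)  # convert to 0-based indices
            cols.append(int(w) - 1)
            vals.append(float(c))
    return csr_matrix((vals, (rows, cols)), shape=(n_docs, n_words))

docs = load_uci_bow("docword.enron.txt.gz")
print(docs.shape)  # (number of documents, vocabulary size)
```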