Code for the Fly Bloom Filter Classifier (FBFC) and the scripts for running the experiments presented in the paper "Fruit-fly Inspired Neighborhood Encoding for Classification" at the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD 2021).
This section details the setup of the compute environment for executing the provided scripts.
python3.8
pip
virtualenv
$ mkdir fbfc
$ virtualenv -p /usr/bin/python3.8 fbfc
$ source fbfc/bin/activate
(fbfc) $ export PYTHONPATH=`pwd`
(fbfc) $ pip install --upgrade pip
(fbfc) $ pip install -r requirements.txt
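The Flyhash operation $h: \mathbb{R}^d \to \{0,1\}^m$ for any $x \in \mathbb{R}^d$ is defined as:
$$h(x) = \Gamma_\rho\left(M_m^s x\right),$$
where $M_m^s \in \{0,1\}^{m \times d}$ is the sparse binary projection matrix with $s$ nonzero entries in each row of the matrix, and $\Gamma_\rho: \mathbb{R}^m \to \{0,1\}^m$ is the winner-take-all operation that sets the top-$\rho$ entries in a vector to 1 and the rest of the entries in the vector to 0, where $\rho \ll m$.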
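The Flyhash operation can be illustrated with a minimal pure-Python sketch. It assumes the projection matrix is sampled uniformly at random with exactly $s$ ones per row and stores each row as the list of its nonzero column indices; the function names (`sparse_binary_projection`, `winner_take_all`, `flyhash`) and the lower-index tie-breaking rule are our own choices for illustration, not the repository's API.

```python
import random


def sparse_binary_projection(m, d, s, seed=0):
    """Sample an m x d sparse binary matrix with exactly s ones per row.

    Each row is stored as the list of its s nonzero column indices.
    """
    rng = random.Random(seed)
    return [rng.sample(range(d), s) for _ in range(m)]


def winner_take_all(v, rho):
    """Set the top-rho entries of v to 1 and the rest to 0 (ties go to the lower index)."""
    top = sorted(range(len(v)), key=lambda i: (-v[i], i))[:rho]
    out = [0] * len(v)
    for i in top:
        out[i] = 1
    return out


def flyhash(x, rows, rho):
    """Project x with the sparse binary matrix (a sum over s coordinates per row),
    then apply winner-take-all to keep rho active bits."""
    projected = [sum(x[j] for j in idx) for idx in rows]
    return winner_take_all(projected, rho)


# Usage: expand a d = 8 point to m = 64 dimensions and keep rho = 4 active bits.
M = sparse_binary_projection(m=64, d=8, s=3, seed=42)
x = [0.2, 1.0, 0.0, 0.5, 0.9, 0.1, 0.3, 0.7]
code = flyhash(x, M, rho=4)
print(len(code), sum(code))  # 64 4
```

Storing only the nonzero indices keeps the projection cost at $O(ms)$ instead of $O(md)$, which is the point of using a sparse binary matrix.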
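Given a data set $S = \{(x_i, y_i)\}_{i=1}^n$, with $x_i \in \mathbb{R}^d$ and $y_i \in [L] = \{1, \ldots, L\}$, the per-class binary Flyhash Bloom Filters (FBFs) $w_l \in \{0,1\}^m$, $l \in [L]$, can be generated as follows:
$$w_l = \neg \left( \bigvee_{(x, y) \in S \,:\, y = l} h(x) \right),$$
that is, every position activated by the Flyhash of some training example of class $l$ is set to 0, and all remaining positions stay 1.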
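One way to compute these filters is sketched below in plain Python. It assumes the Flyhash codes of the training points are already available as 0/1 lists of length $m$ and that, as in the original Fly Bloom Filter, each class filter starts as all ones and every position activated by a code of that class is set to 0. The function name `binary_fbfs` and the toy codes are hypothetical.

```python
def binary_fbfs(codes, labels, m):
    """Build one binary Fly Bloom Filter per class from precomputed Flyhash codes.

    Each filter starts as all ones; every position that is active (1) in the
    code of some training example of that class is set to 0.
    """
    filters = {}
    for code, label in zip(codes, labels):
        w = filters.setdefault(label, [1] * m)
        for i, bit in enumerate(code):
            if bit:
                w[i] = 0
    return filters


# Toy example with m = 6 and hypothetical codes for three training points.
codes = [[1, 1, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0], [0, 0, 0, 0, 1, 1]]
labels = [0, 0, 1]
fbfs = binary_fbfs(codes, labels, m=6)
print(fbfs)  # {0: [0, 0, 0, 1, 1, 1], 1: [1, 1, 1, 1, 0, 0]}
```

The filters are built in a single pass over the data, and each example only touches the filter of its own class, so the pass can be split by class.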
The above process can be visualized on a simple 2-class toy classification problem in the following figure:
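Since binary FBFs are not robust to label noise, we can instead use a non-binary FBF for each class, defined as:
$$w_l = \bigodot_{(x, y) \in S \,:\, y = l} \left( \mathbf{1}_m - \gamma\, h(x) \right) \in [0, 1]^m,$$
where $\bigodot$ denotes the elementwise product, $\mathbf{1}_m$ is the all-ones vector, and $\gamma \in (0, 1]$ is the FBF decay rate that controls how much impact one example has, with $\gamma = 1$ corresponding to the binary FBF.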
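A minimal sketch of the non-binary filter follows, assuming a multiplicative decay: every time a training example of class $l$ activates position $i$, that entry of the filter is multiplied by $(1 - \gamma)$, so $\gamma = 1$ zeroes the position exactly as in the binary FBF. As before, the codes are assumed to be precomputed 0/1 lists, and the name `decayed_fbfs` is hypothetical.

```python
def decayed_fbfs(codes, labels, m, gamma):
    """Build one non-binary Fly Bloom Filter per class from precomputed Flyhash codes.

    Each filter starts as all ones; every time a training example of the class
    activates position i, w[i] is multiplied by (1 - gamma). With gamma = 1 this
    reproduces the binary filter.
    """
    if not 0 < gamma <= 1:
        raise ValueError("gamma must be in (0, 1]")
    filters = {}
    for code, label in zip(codes, labels):
        w = filters.setdefault(label, [1.0] * m)
        for i, bit in enumerate(code):
            if bit:
                w[i] *= 1.0 - gamma
    return filters


codes = [[1, 1, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0], [0, 0, 0, 0, 1, 1]]
labels = [0, 0, 1]
print(decayed_fbfs(codes, labels, m=6, gamma=0.5)[0])  # [0.5, 0.25, 0.5, 1.0, 1.0, 1.0]
print(decayed_fbfs(codes, labels, m=6, gamma=1.0)[0])  # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
```

With $\gamma < 1$, a position activated by a single (possibly mislabeled) example is only dampened rather than zeroed, while positions activated by many examples of the class still decay toward 0.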
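For a test point $x \in \mathbb{R}^d$, the label is predicted as the class whose FBF gives the lowest response to the Flyhash of $x$:
$$\hat{y} = \arg\min_{l \in [L]} \; w_l^\top h(x).$$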
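The inference step can be sketched as follows, assuming the predicted label is the class with the smallest response $w_l^\top h(x)$ (a low response means the active positions of $h(x)$ were already seen for that class). The filters are assumed to be a dict from label to a list of weights, binary or non-binary; the function name `predict` and the tie-breaking by dict order are our own choices.

```python
def predict(code, filters):
    """Return the label whose filter has the smallest response w_l . h(x).

    Ties are broken by the iteration order of `filters`.
    """
    def response(label):
        # Sparse dot product: only the active positions of the code contribute.
        return sum(w for w, bit in zip(filters[label], code) if bit)

    return min(filters, key=response)


# Hypothetical filters for a 2-class problem with m = 6.
filters = {0: [0, 0, 0, 1, 1, 1], 1: [1, 1, 1, 1, 0, 0]}
print(predict([1, 1, 0, 0, 0, 0], filters))  # 0
print(predict([0, 0, 0, 0, 1, 1], filters))  # 1
print(predict([0, 0, 1, 1, 0, 0], filters))  # 0
```

Because $h(x)$ has only $\rho$ active bits, each response costs $O(\rho)$ with an index-based representation, even though the filters are $m$-dimensional.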
The inference can be visualized in the above toy classification problem as follows:
The experiments for the paper are detailed in expts.md.
Please use the following citation for the paper:
Sinha, Kaushik, and Parikshit Ram. "Fruit-fly Inspired Neighborhood Encoding for Classification." Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021.
or
@inproceedings{sinha2021fruit,
title={Fruit-fly Inspired Neighborhood Encoding for Classification},
author={Sinha, Kaushik and Ram, Parikshit},
booktitle={Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
pages={1470--1480},
year={2021}
}