TensorFlow ProtoNN for multi-label learning (supports both single- and multi-GPU usage)
cfgs
datasets/eurlex
experiments/eurlex
model
preprocess
trainer
.gitignore
LICENSE
README.md
eurlex_multigpu_train.py
eurlex_train.py
run_eurlex_with_preprocessing.ipynb

tf-protoNN


This repository contains a TensorFlow implementation of ProtoNN (a kNN-based algorithm) for large-scale multi-label learning. It also includes a script for training on multiple GPUs.

Note: some modifications have been made to improve run-time and performance on large-scale datasets. For more details about ProtoNN, please refer to the paper "ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices". If you want to reproduce the results in the original paper, please use the official code provided by the authors.
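
As a rough illustration of the idea, the prediction rule described in the ProtoNN paper projects each point into a low-dimensional space, measures its RBF-kernel similarity to a small set of learned prototypes, and sums the prototypes' label-score vectors weighted by that similarity. The NumPy sketch below follows the paper's formulation, not necessarily the modified version in this repository; the function name, argument names, and matrix layout (rows are points) are assumptions made for the example.

```python
import numpy as np

def protonn_scores(X, W, B, Z, gamma):
    """Score labels for a batch of points (illustrative sketch only).

    X: (n, D) inputs          W: (D, d) projection matrix
    B: (m, d) prototypes      Z: (m, L) label scores per prototype
    gamma: RBF kernel width parameter
    """
    P = X @ W  # project inputs to d dimensions, shape (n, d)
    # Squared Euclidean distance between every projected point and prototype.
    sq_dist = (P ** 2).sum(axis=1, keepdims=True) - 2.0 * P @ B.T + (B ** 2).sum(axis=1)
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative values
    K = np.exp(-(gamma ** 2) * sq_dist)  # similarities, shape (n, m)
    return K @ Z  # similarity-weighted sum of label scores, shape (n, L)

# Toy usage with random (hypothetical) parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))
W = rng.normal(size=(6, 3))
B = rng.normal(size=(5, 3))
Z = rng.random(size=(5, 10))
scores = protonn_scores(X, W, B, Z, gamma=0.5)
top3 = np.argsort(-scores, axis=1)[:, :3]  # three highest-scoring labels per point
```

Because the number of prototypes is much smaller than the number of training points, scoring a point costs far less than an exact nearest-neighbour search over the full training set.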

Extreme multi-label (XML) algorithms

Unlike multi-class or binary classification, extreme multi-label (XML) algorithms tag data points with a subset of labels (rather than just a single label) from an extremely large label-set. XML problems usually deal with a large number of labels (10^3 to 10^6 labels) and a large number of dimensions and training points.
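
To make the "subset of labels" idea concrete, the sketch below encodes per-point label sets as a sparse binary matrix with one row per point and one column per label. The toy label sets and variable names are made up for illustration; a sparse format is used because, with 10^3 to 10^6 labels, almost all entries are zero.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy data: 3 points, 8 possible labels.
label_sets = [[0, 5], [2], [1, 5, 7]]
num_labels = 8

# One (row, col) pair per assigned label.
rows = [i for i, labels in enumerate(label_sets) for _ in labels]
cols = [label for labels in label_sets for label in labels]
data = np.ones(len(cols))

# Y[i, j] == 1 means point i is tagged with label j.
Y = csr_matrix((data, (rows, cols)), shape=(len(label_sets), num_labels))
```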

For datasets, check: XML-repository

Required packages

  1. TensorFlow
  2. FAISS
  3. NumPy
  4. SciPy
  5. easydict

Usage

See the Jupyter notebook (run_eurlex_with_preprocessing.ipynb) to run the code on the Eurlex-4k dataset. To change the parameters, modify the config file.

To run on a new dataset:

  1. Create a new folder named after the dataset (as with datasets/eurlex). Place two separate files, train_data.mat and test_data.mat, in that folder. Each file must contain two variables: X with shape (num instances, num features) and Y with shape (num instances, num labels).

  2. Create a config file in the cfgs folder with the required parameters.

  3. For a single GPU, copy eurlex_train.py to train.py and edit it to import the correct config file. For training on multiple GPUs, do the same with eurlex_multigpu_train.py. Then run python train.py.
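
The data layout from step 1 can be sketched with SciPy as follows. This is a minimal example with random placeholder data; the folder name my_dataset and the helper write_split are hypothetical, and the README does not say whether the training code expects sparse or dense matrices, so check the preprocess folder or the notebook before relying on either.

```python
import os
import tempfile

from scipy import sparse
from scipy.io import loadmat, savemat

def write_split(folder, filename, X, Y):
    """Save one split as a .mat file holding the variables X and Y."""
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must have the same number of instances")
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    savemat(path, {"X": X, "Y": Y})
    return path

# Placeholder data: 100 instances, 50 features, 20 labels (sparse is an assumption).
X = sparse.random(100, 50, density=0.1, format="csr", random_state=0)
Y = (sparse.random(100, 20, density=0.1, format="csr", random_state=1) > 0).astype(float)

# Replace this temporary location with your own dataset folder.
out_dir = os.path.join(tempfile.mkdtemp(), "my_dataset")
train_path = write_split(out_dir, "train_data.mat", X, Y)
test_path = write_split(out_dir, "test_data.mat", X, Y)

# Reload to confirm the variable names and shapes match the expected format.
loaded = loadmat(train_path)
```

Reloading the file with loadmat is a quick way to confirm that both variables are present and that their first dimensions agree before starting a training run.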