This repository contains the code for ProtoNN (a kNN-based algorithm) implemented in TensorFlow for large-scale multi-label learning. It also includes a script to run training on multiple GPUs.
Note: some modifications have been made to improve runtime and performance on large-scale datasets. For more details about ProtoNN, please refer to the paper "ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices". To reproduce the results in the original paper, please use the official code provided by the authors.
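For orientation, the prediction rule described in the ProtoNN paper projects an input x with a learned matrix W, compares Wx to m learned prototypes b_j using an RBF kernel exp(-γ²‖b_j − Wx‖²), and sums the prototypes' label-score vectors z_j weighted by those similarities. The NumPy sketch below illustrates only that scoring rule. It is not this repository's TensorFlow code, it omits training, and it does not reflect the modifications mentioned above; the function name `protonn_scores` and the row-wise array layouts are illustrative choices.

```python
import numpy as np

def protonn_scores(X, W, B, Z, gamma):
    """Score all labels for a batch of inputs with the ProtoNN prediction rule.

    X: (n, d)      input features, one row per instance
    W: (d_hat, d)  learned low-dimensional projection
    B: (m, d_hat)  learned prototypes, one per row
    Z: (m, L)      learned label-score vector attached to each prototype
    gamma: float   RBF kernel width parameter
    Returns an (n, L) array of label scores.
    """
    WX = X @ W.T                                                        # (n, d_hat) projected inputs
    sq_dist = np.sum((WX[:, None, :] - B[None, :, :]) ** 2, axis=-1)    # (n, m) squared distances
    sim = np.exp(-(gamma ** 2) * sq_dist)                               # (n, m) RBF similarities
    return sim @ Z                                                      # (n, L) weighted label scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, d_hat, m, L = 4, 20, 5, 8, 100   # toy sizes, not recommended settings
    X = rng.standard_normal((n, d))
    W = rng.standard_normal((d_hat, d)) / np.sqrt(d)
    B = rng.standard_normal((m, d_hat))
    Z = rng.random((m, L))
    print(protonn_scores(X, W, B, Z, gamma=1.0).shape)  # (4, 100)
```

Because each prototype carries a full label-score vector, one pass over the m prototypes yields scores for every label at once, which is what makes the rule a natural fit for multi-label prediction.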
Extreme multi-label (XML) algorithms
Unlike multi-class or binary classification, where each data point receives a single label, extreme multi-label (XML) algorithms tag each data point with a subset of labels drawn from an extremely large label set. XML problems typically involve a very large number of labels (10^3 to 10^6), as well as high-dimensional features and many training points.
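As a minimal illustration of the label-subset setting, each instance's labels can be stored as one row of a sparse binary indicator matrix, and per-label scores can be turned into a predicted subset by keeping the top-k labels. The helpers `to_indicator` and `top_k_labels` are hypothetical and assume NumPy and SciPy; a sparse matrix is used because a dense one with 10^6 columns would be impractical.

```python
import numpy as np
from scipy.sparse import csr_matrix

def to_indicator(label_sets, num_labels):
    """Build an (n, num_labels) sparse 0/1 matrix; row i marks the labels of instance i."""
    indptr, indices = [0], []
    for labels in label_sets:
        indices.extend(sorted(labels))   # column indices of this row's labels
        indptr.append(len(indices))      # row boundary in CSR format
    data = np.ones(len(indices), dtype=np.float32)
    return csr_matrix((data, indices, indptr), shape=(len(label_sets), num_labels))

def top_k_labels(scores, k):
    """Return the indices of the k highest-scoring labels per row, best first."""
    idx = np.argpartition(-scores, k - 1, axis=1)[:, :k]   # top-k, unordered (avoids a full sort)
    rows = np.arange(scores.shape[0])[:, None]
    order = np.argsort(-scores[rows, idx], axis=1)          # sort only those k entries
    return idx[rows, order]

Y = to_indicator([{0, 2}, {1}], num_labels=4)
print(Y.toarray())
# [[1. 0. 1. 0.]
#  [0. 1. 0. 0.]]
print(top_k_labels(np.array([[0.1, 0.9, 0.5, 0.3]]), k=2))  # [[1 2]]
```

Using `argpartition` before sorting keeps the cost close to linear in the number of labels, which matters when each row has hundreds of thousands of scores.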
For benchmark datasets, see the XML-repository.
To run on a new dataset:
Create a new folder named after the dataset and place two files, train_data.mat and test_data.mat, in it. Each file must contain two variables: X with shape (num instances, num features) and Y with shape (num instances, num labels).
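The sketch below shows one way to produce such a folder with SciPy. It assumes the training script reads the files with `scipy.io.loadmat`; the loader actually used, and whether sparse X and Y are accepted, are assumptions here (SciPy's `savemat`/`loadmat` handle both dense arrays and sparse matrices). The helpers `write_split`, `make_dataset_dir`, and `random_split`, and the dataset name, are hypothetical.

```python
import os
import tempfile
import numpy as np
import scipy.sparse as sp
from scipy.io import loadmat, savemat

def write_split(path, X, Y):
    """Save one split as a .mat file holding variables X (n, num features) and Y (n, num labels)."""
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must have the same number of rows (instances)")
    savemat(path, {"X": X, "Y": Y})   # sparse inputs are stored as sparse MATLAB variables

def make_dataset_dir(root, name, train, test):
    """Hypothetical helper: create root/name containing train_data.mat and test_data.mat."""
    out_dir = os.path.join(root, name)
    os.makedirs(out_dir, exist_ok=True)
    write_split(os.path.join(out_dir, "train_data.mat"), *train)
    write_split(os.path.join(out_dir, "test_data.mat"), *test)
    return out_dir

def random_split(rng, n, num_features, num_labels):
    """Toy sparse (X, Y) pair for illustration only; Y is a 0/1 label matrix."""
    X = sp.csr_matrix(rng.random((n, num_features)) * (rng.random((n, num_features)) < 0.1))
    Y = sp.csr_matrix((rng.random((n, num_labels)) < 0.05).astype(np.float64))
    return X, Y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    with tempfile.TemporaryDirectory() as root:   # temporary root, so nothing is left behind
        out = make_dataset_dir(root, "my_dataset",
                               random_split(rng, 100, 50, 20), random_split(rng, 30, 50, 20))
        mat = loadmat(os.path.join(out, "train_data.mat"))
        print(mat["X"].shape, mat["Y"].shape)  # (100, 50) (100, 20)
```

Checking that X and Y agree on the number of rows before saving catches the most common mismatch early, before a long training run fails.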
Create a config file in the cfgs folder with the required parameters.