Package sticker provides a framework for multi-label classification.

hiro4bbh/sticker

Copyright 2017- Tatsuhiro Aoshima (hiro4bbh@gmail.com).

Introduction

Package sticker provides a framework for multi-label classification.

sticker is written in Go, so anyone can easily modify and compile it in almost any environment. See sticker's documentation on GoDoc.

Installation

First, download and install Go. Next, fetch and install sticker as follows:

go get github.com/hiro4bbh/sticker
go install github.com/hiro4bbh/sticker/sticker-util

Once everything is installed, you can try sticker's command-line utility, sticker-util.

Prepare Datasets

First of all, you should prepare datasets. sticker assumes the following directory structure for a dataset:

+ dataset-root
|-- train.txt: training dataset
|-- test.txt: test dataset
|-- feature_map.txt: feature map (optional)
|-- label_map.txt: label map (optional)

Training and test datasets must be in a format that ReadTextDataset can read (see GoDoc for the data format). The feature and label maps should list the name of each feature and label, respectively, one name per line in order of identifier.
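The map file layout described above (one name per line, in order of identifier) can be read with a few lines of Go. This is a minimal sketch and not part of sticker's API: parseNameMap and the sample label names are hypothetical, and treating the zero-based line index as the identifier is an assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseNameMap splits the contents of a feature or label map into names,
// one per line. The index of a name in the returned slice is taken to be
// its identifier (assumption: identifiers follow the zero-based line order).
// Empty lines are kept so that later identifiers stay aligned.
func parseNameMap(text string) []string {
	var names []string
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		names = append(names, scanner.Text())
	}
	return names
}

func main() {
	// Hypothetical contents of label_map.txt.
	labels := parseNameMap("books\nmusic\nelectronics\n")
	for id, name := range labels {
		fmt.Printf("%d\t%s\n", id, name)
	}
}
```

For a real dataset, read the file with os.ReadFile and pass its contents as a string.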

You can view a summary of the dataset at localhost:8080/summary by running the following command (the port number can be changed with the addr option):

sticker-util -verbose -debug <dataset-root> @summarize -table=<table-filename-relative-to-root>

If featureMap or labelMap is an empty string, the feature map or label map, respectively, is ignored.

Implemented Models

LabelNearest: Sparse Weighted Nearest-Neighbor Method

LabelNearest is the Sparse Weighted Nearest-Neighbor Method (Aoshima+ 2018), which achieved state-of-the-art performance on several extreme multi-label classification (XMLC) datasets (Bhatia+ 2016). The current implementation processes each data entry faster than reported in the paper, taking on average 15.1 ms (AmazonCat-13K), 1.14 ms (Wiki10-31K), 4.88 ms (Delicious-200K), 15.1 ms (WikiLSHTC-325K), 4.19 ms (Amazon-670K), and 15.5 ms (Amazon-3M) per entry under the same settings as the paper (compare with the original results).

For example, you can test this method on the Amazon-3M dataset (Bhatia+ 2016) as follows:

sticker-util -verbose -debug ./data/Amazon-3M/ @trainNearest @testNearest -S=75 -alpha=2.0 -beta=1

See the help of @trainNearest and @testNearest for the sub-command options.
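To give a feel for the general idea behind a weighted nearest-neighbor label ranker, here is a simplified sketch. It is not sticker's implementation: it assumes cosine similarity on sparse vectors, assumes that S is the number of neighbors and that alpha is an exponent applied to the similarity before voting, and omits beta entirely because its role is not described here. All type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// sparseVector maps a feature identifier to its value.
type sparseVector map[int]float64

// example is one training entry: sparse features plus its label identifiers.
type example struct {
	features sparseVector
	labels   []int
}

func norm(v sparseVector) float64 {
	var sum float64
	for _, x := range v {
		sum += x * x
	}
	return math.Sqrt(sum)
}

// cosine returns the cosine similarity of two sparse vectors.
func cosine(a, b sparseVector) float64 {
	na, nb := norm(a), norm(b)
	if na == 0 || nb == 0 {
		return 0
	}
	var dot float64
	for k, x := range a {
		dot += x * b[k] // a missing key in b contributes 0
	}
	return dot / (na * nb)
}

// rankLabels scores labels by similarity-weighted votes from the s most
// similar training entries and returns label identifiers, best first.
// Assumption: each neighbor votes with weight sim^alpha.
func rankLabels(train []example, x sparseVector, s int, alpha float64) []int {
	type neighbor struct {
		index int
		sim   float64
	}
	var neighbors []neighbor
	for i, ex := range train {
		if sim := cosine(ex.features, x); sim > 0 {
			neighbors = append(neighbors, neighbor{i, sim})
		}
	}
	sort.Slice(neighbors, func(i, j int) bool {
		if neighbors[i].sim != neighbors[j].sim {
			return neighbors[i].sim > neighbors[j].sim
		}
		return neighbors[i].index < neighbors[j].index
	})
	if len(neighbors) > s {
		neighbors = neighbors[:s]
	}
	scores := make(map[int]float64)
	for _, nb := range neighbors {
		w := math.Pow(nb.sim, alpha)
		for _, l := range train[nb.index].labels {
			scores[l] += w
		}
	}
	labels := make([]int, 0, len(scores))
	for l := range scores {
		labels = append(labels, l)
	}
	sort.Slice(labels, func(i, j int) bool {
		if scores[labels[i]] != scores[labels[j]] {
			return scores[labels[i]] > scores[labels[j]]
		}
		return labels[i] < labels[j]
	})
	return labels
}

func main() {
	// Toy data for illustration only.
	train := []example{
		{sparseVector{0: 1, 1: 1}, []int{10}},
		{sparseVector{1: 1, 2: 1}, []int{20}},
		{sparseVector{3: 1}, []int{30}},
	}
	query := sparseVector{0: 1, 1: 2}
	fmt.Println(rankLabels(train, query, 2, 2.0)) // prints [10 20]
}
```

This brute-force scan over all training entries is only practical for toy data.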

LabelNear: A faster implementation of LabelNearest

LabelNear is a faster implementation of LabelNearest that uses optimal Densified One Permutation Hashing (DOPH) and reservoir sampling. This method can process each data entry in about 1 ms with little performance degradation. You can see the results on several XMLC datasets (Bhatia+ 2016) at Dropbox.

Almost all parameters and options are the same as those of LabelNearest. See the help of @trainNear and @testNear for details.
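For readers unfamiliar with reservoir sampling, the following is a minimal sketch of the textbook algorithm (Algorithm R), which keeps a uniform random sample of k items from a stream in a single pass. How LabelNear combines it with DOPH is not described here, so this shows only the generic technique. The function name and the intn parameter are hypothetical; intn is injected so that the random source can be swapped out.

```go
package main

import (
	"fmt"
	"math/rand"
)

// reservoirSample returns up to k items sampled uniformly at random from
// stream, visiting each item exactly once. intn(n) must return an integer
// drawn uniformly from [0, n).
func reservoirSample(stream []int, k int, intn func(n int) int) []int {
	reservoir := make([]int, 0, k)
	for i, item := range stream {
		if i < k {
			// Fill the reservoir with the first k items.
			reservoir = append(reservoir, item)
			continue
		}
		// Item i replaces a random slot with probability k/(i+1).
		if j := intn(i + 1); j < k {
			reservoir[j] = item
		}
	}
	return reservoir
}

func main() {
	rng := rand.New(rand.NewSource(42))
	stream := make([]int, 100)
	for i := range stream {
		stream[i] = i
	}
	fmt.Println(reservoirSample(stream, 5, rng.Intn))
}
```

The memory used is proportional to k regardless of the stream length, which is what makes the technique attractive when candidate lists are long.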

Other Models

Implemented in core

  • LabelConst: Multi-label constant model (see GoDoc)
  • LabelOne: One-versus-rest classifier for multi-label ranking (see GoDoc)

Implemented in plugin

  • LabelBoost: Multi-label Boosting model (see GoDoc)
  • LabelForest: Variously-modified FastXML model (see GoDoc)
  • LabelNext: Your next-generation model (you can add your own train and test commands, see plugin/next/init.go)

Implemented Binary Classifiers

In core (recommended)

  • L1Logistic_PrimalSGD: L1-logistic regression with stochastic gradient descent (SGD) solving the primal problem (see GoDoc)
  • L1SVC_PrimalSGD: L1-Support Vector Classifier with SGD solving the primal problem (see GoDoc)

In plugin (not recommended; for comparison only)

  • L1SVC_DualCD: L1-Support Vector Classifier with coordinate descent (CD) solving the dual problem (see GoDoc)
  • L2SVC_PrimalCD: L2-Support Vector Classifier with CD solving the primal problem (see GoDoc)

References

  • (Aoshima+ 2018) T. Aoshima, K. Kobayashi, and M. Minami. "Revisiting the Vector Space Model: Sparse Weighted Nearest-Neighbor Method for Extreme Multi-Label Classification." arXiv:1802.03938, 2018.
  • (Bhatia+ 2016) K. Bhatia, H. Jain, Y. Prabhu, and M. Varma. The Extreme Classification Repository. 2016. Retrieved January 4, 2018 from http://manikvarma.org/downloads/XC/XMLRepository.html
