Extweme Wabbit implements the Probabilistic Label Tree (PLT) algorithm for extreme multi-label classification in Vowpal Wabbit.

Extweme Wabbit

This fork implements Probabilistic Label Trees (PLTs) in Vowpal Wabbit for extreme multi-label classification.

PLT options

--plt arg               Use PLT for multi-label learning with arg labels
--kary_tree arg (=2)    Use an arg-ary tree. By default the tree is binary
--top_k arg (=1)        Predict arg top labels
--threshold arg         Predict labels with marginal probabilities greater than arg

We recommend using --sgd with --plt for the fastest learning and the best memory efficiency.
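As a rough illustration of what --top_k and --threshold select, the sketch below assumes the usual PLT formulation, in which a label's marginal probability is the product of the conditional probabilities estimated at the nodes on the path from the root to that label's leaf. The array layout of the tree, the `node_prob` callable and the function name `plt_predict` are assumptions made for this example; they are not taken from this fork's code.

```python
import heapq


def plt_predict(node_prob, num_labels, k=2, top_k=None, threshold=None):
    """Return (label, marginal probability) pairs, most probable first.

    node_prob(n) is a hypothetical callable returning the estimated
    probability that node n is positive given that its parent is positive.
    Assumed layout: a complete k-ary tree stored as an array, where the
    children of node i are k*i+1 .. k*i+k and the last num_labels nodes
    are the leaves (one per label).
    """
    num_internal = -(-(num_labels - 1) // (k - 1))  # ceiling division
    num_nodes = num_internal + num_labels
    results = []
    # Max-heap (negated probabilities), seeded with the root node.
    queue = [(-node_prob(0), 0)]
    while queue:
        neg_p, node = heapq.heappop(queue)
        p = -neg_p
        if threshold is not None and p <= threshold:
            break  # all remaining nodes are at most this probable
        if node >= num_internal:  # leaf: emit its label
            results.append((node - num_internal, p))
            if top_k is not None and len(results) == top_k:
                break
        else:
            last_child = min(k * node + k, num_nodes - 1)
            for child in range(k * node + 1, last_child + 1):
                # Marginal of the child = marginal of the parent * conditional.
                heapq.heappush(queue, (-p * node_prob(child), child))
    return results


# Made-up node probabilities for a binary tree with 4 labels (nodes 0..6).
probs = {0: 1.0, 1: 0.5, 2: 0.5, 3: 0.5, 4: 0.25, 5: 1.0, 6: 0.0}
print(plt_predict(probs.get, 4, k=2, top_k=2))  # [(2, 0.5), (0, 0.25)]
```

Because every conditional probability is at most 1, marginals can only shrink along a path, so expanding nodes in order of decreasing marginal probability yields the labels best-first and lets both stopping rules prune the rest of the tree.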

Example of usage

# To train:
vw --plt <num labels> <train dataset> -f <output model> --sgd -l <learning rate> --kary_tree <tree arity> --passes <num epochs> -b <number of bits in the feature table> -c

# To test:
vw -t -i <model file> <test dataset> --top_k <k top label> -p <prediction file>
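When these commands are launched from a script, it can help to assemble the argument lists programmatically. The following is a minimal sketch that uses only the flags shown above; the function names and all sample values (file names, learning rate, number of passes, bits) are hypothetical placeholders, not recommended settings.

```python
import shlex


def train_command(num_labels, train_data, model, learning_rate, arity, passes, bits):
    """Build the argument list for the training command shown above."""
    return [
        "vw", "--plt", str(num_labels), train_data,
        "-f", model, "--sgd",
        "-l", str(learning_rate),
        "--kary_tree", str(arity),
        "--passes", str(passes),
        "-b", str(bits),
        "-c",
    ]


def predict_command(model, test_data, top_k, predictions):
    """Build the argument list for the test command shown above."""
    return [
        "vw", "-t", "-i", model, test_data,
        "--top_k", str(top_k),
        "-p", predictions,
    ]


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(shlex.join(train_command(1000, "train.vw", "model.bin", 0.5, 2, 3, 24)))
    print(shlex.join(predict_command("model.bin", "test.vw", 5, "predictions.txt")))
```

The lists can be passed directly to `subprocess.run`, which avoids shell quoting problems with unusual file names.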

More examples and scripts to replicate results on datasets from The Extreme Classification Repository can be found in the xml_experiments directory.

References

PLTs were introduced in the following article:

Kalina Jasinska, Krzysztof Dembczynski, Robert Busa-Fekete, Karlson Pfannschmidt, Timo Klerx, Eyke Hullermeier: Extreme F-measure Maximization using Sparse Probability Estimates. Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1435-1444, 2016.

@inproceedings{Jasinska_et_al_2016,
  title = 	 {Extreme F-measure Maximization using Sparse Probability Estimates},
  author = 	 {Kalina Jasinska and Krzysztof Dembczynski and Robert Busa-Fekete and Karlson Pfannschmidt and Timo Klerx and Eyke Hullermeier},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {1435--1444},
  year = 	 {2016},
  editor = 	 {Maria Florina Balcan and Kilian Q. Weinberger},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  publisher = 	 {PMLR},
}

Please cite this article when you use PLTs in your research. Note that this implementation of PLTs does not contain procedures for optimizing the macro F-measure.

Vowpal Wabbit

/*
Copyright (c) by respective owners including Yahoo!, Microsoft, and
individual contributors. All rights reserved.  Released under a BSD (revised)
license as described in the file LICENSE.
 */

This is the Vowpal Wabbit fast online learning code. For Windows, see README.windows.txt.

Prerequisite software

These prerequisites are usually pre-installed on many platforms. However, you may need to consult your favorite package manager (yum, apt, MacPorts, brew, ...) to install missing software.

  • Boost library, with the Boost::Program_Options library option enabled.
  • The zlib compression library + headers. On Linux distros: package zlib-devel (Red Hat/CentOS), or zlib1g-dev (Ubuntu/Debian)
  • lsb-release (RedHat/CentOS: redhat-lsb-core, Debian: lsb-release, Ubuntu: you're all set, OSX: not required)
  • GNU autotools: autoconf, automake, libtool, autoheader, et al. This is not a strict prerequisite: on many systems (notably Ubuntu with libboost-program-options-dev installed), the provided Makefile works fine.
  • (optional) git if you want to check out the latest version of vowpal wabbit, work on the code, or even contribute code to the main project.

Getting the code

You can download the latest version from here. The very latest version is always available on GitHub by invoking one of the following:

## For Git-protocol (git://) interaction; note that git:// is not SSH, and GitHub no longer serves it:
$ git clone git://github.com/JohnLangford/vowpal_wabbit.git

## For HTTPS-based Git interaction:
$ git clone https://github.com/JohnLangford/vowpal_wabbit.git

Compiling

You should be able to build Vowpal Wabbit on most systems with:

$ make
$ make test    # (optional)

If that fails, try:

$ ./autogen.sh
$ make
$ make test    # (optional)
$ make install

Note that ./autogen.sh requires automake (see the prerequisites above).

./autogen.sh's command line arguments are passed directly to configure as if they were configure arguments and flags.

Note that ./autogen.sh will overwrite the supplied Makefile, including the Makefiles in sub-directories, so it may be a good idea to keep a copy of the Makefiles before running it. If autogen.sh (which calls automake) has overwritten your original Makefiles, you can always restore them from git using:

git checkout Makefile */Makefile

Be sure to read the wiki: https://github.com/JohnLangford/vowpal_wabbit/wiki for the tutorial, command line options, etc.

The 'cluster' directory has its own documentation for cluster parallel use, and the examples at the end of test/Runtests give some example flags.

C++ Optimization

The default C++ compiler optimization flags are very aggressive. If you should run into a problem, consider creating and running configure with the --enable-debug option, e.g.:

$ ./configure --enable-debug

or passing your own compiler flags via the OPTIM_FLAGS make variable:

$ make OPTIM_FLAGS="-O0 -g"

Ubuntu/Debian specific info

On Ubuntu/Debian/Mint and similar, the following sequence should work for building the latest version from GitHub:

# -- Get libboost program-options and zlib:
apt-get install libboost-program-options-dev zlib1g-dev

# -- Get the python libboost bindings (python subdir) - optional:
apt-get install libboost-python-dev

# -- Get the vw source:
git clone https://github.com/JohnLangford/vowpal_wabbit.git

# -- Build:
cd vowpal_wabbit
make
make test       # (optional)
make install

Ubuntu advanced build options (clang and static)

If you prefer building with clang instead of gcc (much faster build and slightly faster executable), install clang and change the make step slightly:

apt-get install clang

make CXX=clang++

A statically linked vw executable that is not sensitive to Boost version upgrades and can be safely copied between different Linux distributions (e.g. even from Ubuntu to Red Hat) can be built and tested with:

make CXX='clang++ -static' clean vw test     # ignore warnings

Mac OS X-specific info

OSX requires glibtool, which is available via the brew or MacPorts package managers.

Complete brew install of 8.0

brew install vowpal-wabbit

The homebrew formula for VW is located on github.

Manual install of Vowpal Wabbit

OSX Dependencies (if using Brew):

brew install libtool
brew install autoconf
brew install automake
brew install boost
brew install boost-python

OSX Dependencies (if using MacPorts):

## Install glibtool and other GNU autotool friends:
$ port install libtool autoconf automake

## Build Boost for Mac OS X 10.8 and below
$ port install boost +no_single +no_static +openmpi +python27 configure.cxx_stdlib=libc++ configure.cxx=clang++

## Build Boost for Mac OS X 10.9 and above
$ port install boost +no_single +no_static +openmpi +python27

OSX Manual compile:

Mac OS X 10.8 and below: configure.cxx_stdlib=libc++ and configure.cxx=clang++ ensure that clang++ uses the correct C++11 functionality while building Boost. Ordinarily, clang++ relies on the older GNU g++ 4.2 series header files and stdc++ library; libc++ is the clang replacement that provides newer C++11 functionality. If these flags aren't present, you will likely encounter compilation errors when compiling vowpalwabbit/cbify.cc. These error messages generally contain complaints about std::to_string and std::unique_ptr types missing.

To compile:

$ sh autogen.sh --enable-libc++
$ make
$ make test    # (optional)

OSX Python Binding installation with Anaconda

When using Anaconda as the source for Python, the default Boost paths used in the Makefile need to be adjusted. Below are the steps needed to install the Python bindings for VW. This should work for Python 2 and 3. Adjust the directories to match where Anaconda is installed.

# create anaconda environment with boost
conda create --name vw boost
source activate vw
git clone https://github.com/JohnLangford/vowpal_wabbit.git
cd vowpal_wabbit
# edit Makefile
# change BOOST_INCLUDE to use anaconda env dir: /anaconda/envs/vw/include
# change BOOST_LIBRARY to use anaconda lib dir: /anaconda/envs/vw/lib
cd python
python setup.py install
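The "edit Makefile" step can also be scripted. The sketch below is one possible way to do it, assuming the Makefile assigns BOOST_INCLUDE and BOOST_LIBRARY with ordinary `NAME = value` lines. The helper name `set_make_variable` and the sample Makefile text are hypothetical, and whether the values need `-I`/`-L` compiler-flag prefixes depends on the actual Makefile, so check it before applying anything like this.

```python
import re


def set_make_variable(makefile_text, name, value):
    """Replace the right-hand side of every assignment to `name`.

    Handles `=`, `:=`, `?=` and `+=` assignments that sit on a single line.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(name)}[ \t]*[:?+]?=).*$", re.MULTILINE
    )
    # A function replacement avoids backslash-escape surprises in `value`.
    return pattern.sub(lambda m: f"{m.group(1)} {value}", makefile_text)


# Made-up Makefile fragment, for illustration only.
sample = "BOOST_INCLUDE = -I /usr/include\nBOOST_LIBRARY = -L /usr/lib\nCXX = g++\n"
sample = set_make_variable(sample, "BOOST_INCLUDE", "-I /anaconda/envs/vw/include")
sample = set_make_variable(sample, "BOOST_LIBRARY", "-L /anaconda/envs/vw/lib")
print(sample)
```

Working on the file contents as a string keeps the helper easy to try on a copy of the Makefile first; the original can always be restored with git if the result is not what was intended.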

Code Documentation

To browse the code more easily, do

make doc

and then point your browser to doc/html/index.html.