Supervised Latent Dirichlet Allocation (LDA) using Diagonal Orthant (DO) DO-Probit classification
Clone or download
lejon Don't ensure labels in testset instances
Since test (text) instances is read separately the order of the classes may be figgerent than in the training set.
But this does not matter since, the classes used in the evaluation is from the combined loading of train and test
set and the mapping used comes from the training set.
Latest commit ddee9f8 Sep 17, 2018
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
extralibs/jdistlib/jdistlib/jdistlib
src Don't ensure labels in testset instances Sep 17, 2018
.travis.yml
README.md
pom.xml

README.md

Build Status

YourKit

DOLDA

Supervised LDA using DO-Probit

This is the repo for our Diagonal Orthant Latent Dirichlet Allocation (DOLDA) implementation described in the article arXiv:1602.00260:

   @unpublished{MagnussonJonsson2016,
      author = "Måns Magnusson, Leif Jonsson, Mattias Villani",
      title = "DOLDA - a regularized supervised topic model for high-dimensional multi-class regression",
      note = "http://github.com/lejon/DiagonalOrthantLDA",
      year = 2016}

The toolkit is Open Source Software, and is released under the Common Public License. You are welcome to use the code under the terms of the license for research or commercial purposes, however please acknowledge its use with a citation: Magnusson, Jonsson, Villani. "DOLDA - a regularized supervised topic model for high-dimensional multi-class regression"

For very large datasets you might need to add the -Xmx60g flag to Java

Please remember that this is a research prototype and the standard disclaimers apply. You will see printouts during unit tests, commented out code, old stuff not cleaned out yet etc.

Repo extras

  • extralibs Contains external dependencies which are not available from Maven Central or any other public repo yet.

Installation

  1. Install Apache Maven and run:

mvn package

in a shell of your choice.

Example Run

java -Xmx10g -cp target/DOLDA-1.6.0.jar xyz.lejon.runnables.DOLDAClassificationDistribution -normalize --run_cfg=src/main/resources/configuration/films.cfg

Acknowledgements

I'm a very satisfied user of the YourKit profiler. A Great product with great support. It has been sucessfully used for profiling in this project.

YourKit

YourKit supports open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of YourKit Java Profiler and YourKit .NET Profiler, innovative and intelligent tools for profiling Java and .NET applications.