Yahoo!'s topic modelling framework using Latent Dirichlet Allocation
Latest commit 28011b8 Sep 21, 2011 Shravan Narayanamurthy Failing LDA job on tracker and hadoop failures
Currently, I have seen that during some of these errors
the LDA job doesn't fail and continues to run even though
one or a few of the trackers have failed; the failure is
only counted as an unsuccessful attempt. The restart code
will handle it, but failing the job early is better since
all of them will then be in sync while the sampling still
happens. This needs the mapred.max.tracker.failures flag
to be set for proper handling by Hadoop;
mapred.map.max.attempts alone doesn't suffice. Fixing that.
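
The two Hadoop properties named above can be passed as generic -D options when the job is launched. A minimal sketch, assuming a launcher script collects its options in a shell variable; the variable names and the values of 1 are illustrative assumptions, not the settings this commit actually uses:

```shell
#!/bin/sh
# Hypothetical values, chosen for illustration only.
max_tracker_failures=1   # value for mapred.max.tracker.failures
max_map_attempts=1       # value for mapred.map.max.attempts

# Set both properties: per the note above, mapred.map.max.attempts
# on its own is not sufficient.
lda_fail_fast_opts="-D mapred.max.tracker.failures=${max_tracker_failures} -D mapred.map.max.attempts=${max_map_attempts}"

echo "${lda_fail_fast_opts}"
```

Hadoop expects generic options such as these before any job-specific arguments on the command line.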
docs License changed to BSD from MPL as per Yahoo! requirements Jul 19, 2011
images All of the project contents May 25, 2011
license Changing to Apache License Jul 20, 2011
src Changing to Apache License Jul 20, 2011
test All of the project contents May 25, 2011
Contribution_License_Agreement_Yahoo.pdf Adding the Contribution License Agreement PDF. Sep 13, 2011
Doxyfile All of the project contents May 25, 2011
Formatter.sh Fixing some bugs in the multi-machine script for test mode Jul 14, 2011
LDA.sh Fixing some bugs in the multi-machine script for test mode Jul 14, 2011
LICENSE Changing to Apache License Jul 20, 2011
Makefile Changed hadoop to use ${HADOOP_CMD} and changed to Jul 20, 2011
README License changed to BSD from MPL as per Yahoo! requirements Jul 19, 2011
Tokenizer.java Fixing some bugs in the multi-machine script for test mode Jul 14, 2011
commons.mk All of the project contents May 25, 2011
copyright.sh All of the project contents May 25, 2011
create_dep_file_targets.sh All of the project contents May 25, 2011
create_dir_file_targets.sh All of the project contents May 25, 2011
create_obj_file_targets.sh All of the project contents May 25, 2011
functions.sh Changed hadoop to use ${HADOOP_CMD} and changed to Jul 20, 2011
install.sh License changed to BSD from MPL as per Yahoo! requirements Jul 19, 2011
runLDA.sh Failing LDA job on tracker and hadoop failures Sep 21, 2011
setLibVars.sh All of the project contents May 25, 2011

README

The Yahoo_LDA project uses several 3rd party open source libraries and tools.

This file summarizes the tools used, their purpose, and the licenses under which they're released. 

Except as specifically stated below, the 3rd party software packages are not distributed as part of this project, but instead are separately downloaded and built on the developer's machine as a pre-build step.

* Ice-3.4.1 (GNU GENERAL PUBLIC LICENSE)
  * An efficient inter-process communication framework, used for the distributed storage of (topic, word) tables.
  * http://www.zeroc.com/

* cppunit-1.12.1 (GNU LESSER GENERAL PUBLIC LICENSE)
  * C++ unit testing framework, used for this project's unit tests.
  * http://cppunit.sourceforge.net

* glog-0.3.0 (BSD)
  * Google's logging library, used for log file generation.
  * http://code.google.com/p/google-glog/

* mcpp-2.7.2 (BSD)
  * C++ preprocessor
  * http://mcpp.sourceforge.net/

* tbb22_20090809oss (GNU GENERAL PUBLIC LICENSE)
  * Intel Threading Building Blocks, a multithreaded processing library that is much easier to use than pthreads. We use the pipeline class.
  * http://threadingbuildingblocks.org

* bzip2-1.0.5 (BSD)
  * Data compression
  * http://www.bzip.org/

* gflags-1.2 (BSD)
  * Google's flag processing library, used for command-line options.
  * http://code.google.com/p/google-gflags/

* protobuf-2.2.0a (BSD)
  * Protocol Buffers, Google's serialization library, used for serializing data to disk and as a key internal data structure.
  * http://code.google.com/p/protobuf/

* boost-1.46.0 (Boost Software License - Version 1.0 - August 17th, 2003)
  * Boost Libraries (various datatypes)
  * http://www.boost.org/

For more information, please refer to the HTML documentation at docs/html/index.html or the PDF documentation at docs/latex/refman.pdf.