Yahoo!'s topic modelling framework using Latent Dirichlet Allocation

Failing LDA job on tracker and hadoop failures

Currently, during some of these errors the LDA job doesn't
fail and continues to run even though one or a few of the
trackers have failed, which is counted as an unsuccessful
attempt. The restart code will handle it, but failing the
job early is better since all of them will be in sync while
the sampling still happens. This needs the
mapred.max.tracker.failures flag to be set for proper
handling by hadoop; mapred.map.max.attempts alone doesn't
suffice. Fixing that.
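
As a rough illustration of the two properties named in the commit message, the sketch below assembles generic `-D` options for a Hadoop job submission. The helper name `build_failure_flags` and the value `1` are assumptions made for this example only; the values and the exact invocation used in runLDA.sh may differ.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): prints Hadoop generic
# options that limit both per-task attempts and per-tracker failures.
# The default value of 1 is an assumption chosen for illustration.
build_failure_flags() {
    max_failures="${1:-1}"
    printf '%s %s' \
        "-D mapred.map.max.attempts=${max_failures}" \
        "-D mapred.max.tracker.failures=${max_failures}"
}

# Show the options that would be appended to a job submission command.
echo "$(build_failure_flags 1)"
```

Setting both properties together reflects the point made above: limiting map attempts on its own does not cover the tracker-failure case.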
latest commit 28011b8124
Shravan Narayanamurthy authored
docs License changed to BSD from MPL as per Yahoo! requirements
images All of the project contents
license Changing to Apache License
src Changing to Apache License
test All of the project contents
Contribution_License_Agreement_Yahoo.pdf Adding the Contribution License Agreement PDF.
Doxyfile All of the project contents
Formatter.sh Fixing some bugs in the multi-machine script for test mode
LDA.sh Fixing some bugs in the multi-machine script for test mode
LICENSE Changing to Apache License
Makefile Changed hadoop to use ${HADOOP_CMD} and changed to
README License changed to BSD from MPL as per Yahoo! requirements
Tokenizer.java Fixing some bugs in the multi-machine script for test mode
commons.mk All of the project contents
copyright.sh All of the project contents
create_dep_file_targets.sh All of the project contents
create_dir_file_targets.sh All of the project contents
create_obj_file_targets.sh All of the project contents
functions.sh Changed hadoop to use ${HADOOP_CMD} and changed to
install.sh License changed to BSD from MPL as per Yahoo! requirements
runLDA.sh Failing LDA job on tracker and hadoop failures
setLibVars.sh All of the project contents

README

The Yahoo_LDA project uses several 3rd party open source libraries and tools.

This file summarizes the tools used, their purpose, and the licenses under which they're released.

Except as specifically stated below, the 3rd party software packages are not distributed as part of this project, but instead are separately downloaded and built on the developer’s machine as a pre-build step.

* Ice-3.4.1 (GNU GENERAL PUBLIC LICENSE)
    * An efficient inter-process communication framework, used for the distributed storage of (topic, word) tables.
    * http://www.zeroc.com/

* cppunit-1.12.1 (GNU LESSER GENERAL PUBLIC LICENSE)
    * C++ unit testing framework. We use this for unit tests.
    * http://cppunit.sourceforge.net

* glog-0.3.0 (BSD)
    * Logfile generation (Google's log library).
    * http://code.google.com/p/google-glog/

* mcpp-2.7.2 (BSD)
    * C++ preprocessor
    * http://mcpp.sourceforge.net/

* tbb22_20090809oss (GNU GENERAL PUBLIC LICENSE)
    * Intel Threading Building Blocks. Multithreaded processing library. Much easier to use than pthreads. We use the pipeline class.
    * http://threadingbuildingblocks.org

* bzip2-1.0.5 (BSD)
    * Data compression
    * http://www.bzip.org/

* gflags-1.2 (BSD)
    * Google's flag processing library (used for command-line options)
    * http://code.google.com/p/google-gflags/

* protobuf-2.2.0a (BSD)
    * Protocol buffers, Google's serialization library (used for serializing data to disk and as the internal key data structure)
    * http://code.google.com/p/protobuf/

* boost-1.46.0 (Boost Software License - Version 1.0 - August 17th, 2003)
    * Boost Libraries (various datatypes)
    * http://www.boost.org/

Please refer to the HTML or PDF documentation at docs/html/index.html and docs/latex/refman.pdf, respectively, for more information.