
Yahoo!'s topic modelling framework using Latent Dirichlet Allocation

Failing LDA job on tracker and Hadoop failures

Currently, I have seen that during some of these errors
the LDA job doesn't fail and continues to run even though
one or a few of the trackers have failed and will be counted
as unsuccessful attempts. The restart code will handle it,
but failing the job early is better, since all of them have
to stay in sync while the sampling is still happening.
This needs the mapred.max.tracker.failures flag to be set
for Hadoop to handle it properly; mapred.map.max.attempts
alone doesn't suffice. Fixing that.
Latest commit 28011b8124, authored by Shravan Narayanamurthy
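The commit message above names two Hadoop job properties. As a minimal sketch, assuming the job is launched from a shell script such as runLDA.sh and accepts Hadoop's generic `-D property=value` options, the helper below assembles them. The function name `fail_fast_opts` and the value 1 for both properties are hypothetical and are not taken from the repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of Yahoo_LDA): emits Hadoop generic options
# for the two properties named in the commit message.
#   $1 -> mapred.max.tracker.failures: task failures of this job allowed on
#         one tracker before Hadoop stops assigning the job's tasks to it
#   $2 -> mapred.map.max.attempts: attempts allowed per map task before
#         Hadoop gives up on that task
fail_fast_opts() {
  printf '%s' "-D mapred.max.tracker.failures=$1 -D mapred.map.max.attempts=$2"
}

# Illustrative values only; runLDA.sh may use different ones.
fail_fast_opts 1 1
echo
```

Lowering these values trades Hadoop's automatic retries for an earlier job failure, which is the behaviour the commit message argues for; suitable values depend on the cluster.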
docs: License changed to BSD from MPL as per Yahoo! requirements (July 19, 2011)
images: All of the project contents (May 25, 2011)
license: Changing to Apache License (July 20, 2011)
src: Changing to Apache License (July 20, 2011)
test: All of the project contents (May 25, 2011)
Contribution_License_Agreement_Yahoo.pdf: Adding the Contribution License Agreement PDF (September 13, 2011)
Doxyfile: All of the project contents (May 25, 2011)
Formatter.sh: Fixing some bugs in the multi-machine script for test mode (July 14, 2011)
LDA.sh: Fixing some bugs in the multi-machine script for test mode (July 14, 2011)
LICENSE: Changing to Apache License (July 20, 2011)
Makefile: Changed hadoop to use ${HADOOP_CMD} and changed to (July 20, 2011)
README: License changed to BSD from MPL as per Yahoo! requirements (July 19, 2011)
Tokenizer.java: Fixing some bugs in the multi-machine script for test mode (July 14, 2011)
commons.mk: All of the project contents (May 25, 2011)
copyright.sh: All of the project contents (May 25, 2011)
create_dep_file_targets.sh: All of the project contents (May 25, 2011)
create_dir_file_targets.sh: All of the project contents (May 25, 2011)
create_obj_file_targets.sh: All of the project contents (May 25, 2011)
functions.sh: Changed hadoop to use ${HADOOP_CMD} and changed to (July 20, 2011)
install.sh: License changed to BSD from MPL as per Yahoo! requirements (July 19, 2011)
runLDA.sh: Failing LDA job on tracker and hadoop failures (September 21, 2011)
setLibVars.sh: All of the project contents (May 25, 2011)
README
The Yahoo_LDA project uses several 3rd party open source libraries and tools.

This file summarizes the tools used, their purpose, and the licenses under which they're released.

Except as specifically stated below, the 3rd party software packages are not distributed as part of this project, but instead are separately downloaded and built on the developer’s machine as a pre-build step.

* Ice-3.4.1 (GNU GENERAL PUBLIC LICENSE)
* An efficient inter-process communication framework, used for the distributed storage of (topic, word) tables.
* http://www.zeroc.com/

* cppunit-1.12.1 (GNU LESSER GENERAL PUBLIC LICENSE)
* C++ unit testing framework. We use this for unit tests.
* http://cppunit.sourceforge.net

* glog-0.3.0 (BSD)
* Logfile generation (Google's log library).
* http://code.google.com/p/google-glog/

* mcpp-2.7.2 (BSD)
* C++ preprocessor
* http://mcpp.sourceforge.net/

* tbb22_20090809oss (GNU GENERAL PUBLIC LICENSE)
* Intel Threading Building Blocks, a multithreaded processing library that is much easier to use than raw pthreads. We use its pipeline class.
* http://threadingbuildingblocks.org

* bzip2-1.0.5 (BSD)
* Data compression
* http://www.bzip.org/

* gflags-1.2 (BSD)
* Google's flag processing library (used for command-line options).
* http://code.google.com/p/google-gflags/

* protobuf-2.2.0a (BSD)
* Protocol buffers, Google's serialization library (used for serializing data to disk and as the internal key data structure).
* http://code.google.com/p/protobuf/

* boost-1.46.0 (Boost Software License - Version 1.0 - August 17th, 2003)
* Boost Libraries (various datatypes)
* http://www.boost.org/

For more information, refer to the HTML documentation at docs/html/index.html or the PDF documentation at docs/latex/refman.pdf.