This repository has been archived by the owner on Nov 21, 2018. It is now read-only.

Fixed Issue #11
jimmy0017 committed Oct 20, 2015
Parent: 2a010a2, commit: 70f7d74
Showing 1 changed file (README.md) with 1 addition and 2 deletions.
@@ -31,8 +31,7 @@ $ mvn eclipse:eclipse
Corpus Preparation
------------------

Some sample data from the Associated Press can be found in this [separate repo](https://github.com/lintool/Mr.LDA-data). This is the same sample data that is used in [Blei's LDA implementation in C](http://www.cs.princeton.edu/~blei/lda-c/).

The repo includes a Python script that parses the corpus into the format Mr.LDA expects. The script's output is stored in `ap-sample.txt.gz`, which is the data file you'll want to load into HDFS.
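Before copying the file into HDFS (for example with `hadoop fs -put ap-sample.txt.gz <destination>`), it can help to confirm that the gzipped output is intact and looks as expected. Below is a minimal sketch under two assumptions. First, `ap-sample.txt.gz` is assumed to be gzip-compressed text with one record per line; the actual record layout is whatever the repo's script produces. Second, `preview_corpus` is a hypothetical helper and is not part of Mr.LDA.

```python
import gzip


def preview_corpus(path, n=3):
    """Return (total_line_count, first_n_lines) for a gzip-compressed text file.

    Hypothetical helper: assumes one record per line, which is an assumption
    about the output of the repo's parsing script.
    """
    head = []
    total = 0
    # Text mode ("rt") decompresses and decodes lazily, so large files stream.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            if total < n:
                head.append(line.rstrip("\n"))
            total += 1
    return total, head
```

For example, `count, first = preview_corpus("ap-sample.txt.gz")` reports how many lines the file holds and returns the first few. A truncated or corrupt download raises an error here, before anything is uploaded.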

