stanvp/parallelnlp
Parallelized implementations of Maximum Entropy and Naive Bayes classifiers.

To run the tests:

1. Make sure that you have Java 1.6, Scala 2.9 and sbt 0.11 installed.

2. Install Menthor

    $ git clone git@github.com:stanvp/menthor.git
    $ cd menthor
    $ sbt
    > set publishArtifact in packageDoc := false // avoids scaladoc generation error
    > publish-local

3. Get the project

    $ git clone git@github.com:stanvp/parallelnlp.git
    $ cd parallelnlp
    $ sbt assembly

4. Running experiments with Movie Review corpus

    $ wget -c http://nltk.googlecode.com/svn/trunk/nltk_data/packages/corpora/movie_reviews.zip
    $ unzip movie_reviews.zip
    $ java -Xmx2048m -XX:MaxPermSize=256m -cp target/parallelnlp-assembly-1.0-SNAPSHOT.jar menthor.apps.BuildCorpus moviereviews 100 movie_reviews/ movie_reviews.train movie_reviews.features
    $ java -Xmx2048m -XX:MaxPermSize=256m -cp target/parallelnlp-assembly-1.0-SNAPSHOT.jar menthor.apps.MovieReviewClassifier naivebayes parallelbatch movie_reviews/ movie_reviews.features
    $ java -Xmx2048m -XX:MaxPermSize=256m -cp target/parallelnlp-assembly-1.0-SNAPSHOT.jar menthor.apps.MovieReviewClassifier maxent parallelbatch movie_reviews/ movie_reviews.features

5. Running experiments with 20 Newsgroups corpus

    $ wget -c http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz
    $ tar xvvf 20news-bydate.tar.gz
    $ java -Xmx2048m -XX:MaxPermSize=256m -cp target/parallelnlp-assembly-1.0-SNAPSHOT.jar menthor.apps.BuildCorpus newsgroups 5000 20news-bydate/20news-bydate-train 20news-bydate/20news-bydate.train 20news-bydate/20news-bydate.features
    $ java -Xmx2048m -XX:MaxPermSize=256m -cp target/parallelnlp-assembly-1.0-SNAPSHOT.jar menthor.apps.NewsgroupsClassifier naivebayes parallelbatch 20news-bydate/20news-bydate.train 20news-bydate/20news-bydate-test 20news-bydate/20news-bydate.features true newsgroups.bench 1
    $ java -Xmx2048m -XX:MaxPermSize=256m -cp target/parallelnlp-assembly-1.0-SNAPSHOT.jar menthor.apps.NewsgroupsClassifier maxent parallelbatch 20news-bydate/20news-bydate.train 20news-bydate/20news-bydate-test 20news-bydate/20news-bydate.features true newsgroups.bench 1

For more information about the project, see the technical report in report/parallelnlp.pdf.
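
Every java invocation above repeats the same JVM flags and classpath. As a convenience, the repetition could be wrapped in a small shell helper. The sketch below is not part of the repository: the function names pnlp and pnlp_cmd and the variable names PNLP_JAR and PNLP_JAVA_OPTS are hypothetical, and the jar path and flags are simply copied from the commands above.

```shell
# Hypothetical helper, not shipped with parallelnlp.
# Jar path and JVM flags are copied from the README commands.
PNLP_JAR="target/parallelnlp-assembly-1.0-SNAPSHOT.jar"
PNLP_JAVA_OPTS="-Xmx2048m -XX:MaxPermSize=256m"

# Print the full java command line for a main class and its arguments
# without running it (useful as a dry run).
pnlp_cmd() {
    main_class="$1"
    shift
    printf '%s\n' "java $PNLP_JAVA_OPTS -cp $PNLP_JAR $main_class $*"
}

# Run a main class from the assembly jar with the shared flags.
pnlp() {
    main_class="$1"
    shift
    # PNLP_JAVA_OPTS is left unquoted on purpose so it splits into separate flags.
    java $PNLP_JAVA_OPTS -cp "$PNLP_JAR" "$main_class" "$@"
}
```

After sourcing this snippet from the parallelnlp directory, a call such as `pnlp menthor.apps.BuildCorpus moviereviews 100 movie_reviews/ movie_reviews.train movie_reviews.features` would be equivalent to the corresponding full command, and `pnlp_cmd` with the same arguments prints that command instead of running it.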