Train NLTK objects with zero code


NLTK Trainer
============

NLTK Trainer exists to make training and evaluating NLTK objects as easy as possible.


You must have Python 2.6 with argparse and NLTK 2.0 installed. NumPy, SciPy, and megam are recommended for training Maxent classifiers.

Training Classifiers
--------------------

For a complete list of usage options::

    python --help

Train a binary NaiveBayes classifier on the movie_reviews corpus, using paragraphs as the training instances::

    python --instances paras --algorithm NaiveBayes movie_reviews

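
As a rough illustration of what "paragraphs as the training instances" means, the sketch below turns each paragraph of a small labeled corpus into one ``(features, label)`` pair with bag-of-words features. The helper names ``bag_of_words`` and ``paragraph_instances`` and the toy data are assumptions made for this example, not the script's actual internals.

```python
def bag_of_words(words):
    """Map each distinct lowercased word to True (presence features)."""
    return dict((word.lower(), True) for word in words)


def paragraph_instances(labeled_paras):
    """Yield one (features, label) pair per paragraph.

    labeled_paras: iterable of (paragraph, label) pairs, where a paragraph
    is a list of sentences and a sentence is a list of word tokens.
    """
    for para, label in labeled_paras:
        # Flatten the paragraph's sentences into a single word list.
        words = [word for sent in para for word in sent]
        yield bag_of_words(words), label


# Toy stand-in for a two-category corpus (hypothetical data).
toy_corpus = [
    ([["A", "great", "film"], ["Loved", "it"]], "pos"),
    ([["A", "dull", "film"]], "neg"),
]
instances = list(paragraph_instances(toy_corpus))
```

A list in this shape (feature dictionary plus label) is what NLTK classifiers such as ``nltk.NaiveBayesClassifier.train`` accept as training data.
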
Include bigrams as features::

    python --instances paras --algorithm NaiveBayes --bigrams movie_reviews

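
To make the idea of bigram features concrete, here is a minimal sketch that adds every adjacent word pair to the single-word features. The function name ``bag_of_words_with_bigrams`` is hypothetical, and whether the script keeps all bigrams or only a scored subset is not stated here, so this example simply keeps them all.

```python
def bag_of_words_with_bigrams(words):
    """Presence features for every word and every adjacent word pair."""
    feats = dict((word, True) for word in words)
    # zip(words, words[1:]) pairs each word with its successor.
    for pair in zip(words, words[1:]):
        feats[pair] = True
    return feats


feats = bag_of_words_with_bigrams(["not", "a", "good", "film"])
```

Bigrams let a classifier see pairs such as ``("not", "a")`` that single-word features cannot represent.
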
Minimum score threshold::

    python --instances paras --algorithm NaiveBayes --bigrams --min_score 3 movie_reviews

Maximum number of features::

    python --instances paras --algorithm NaiveBayes --bigrams --max_feats 1000 movie_reviews

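
The two options above can be pictured as filters over scored features. The sketch below assumes each feature already has a numeric score (how the script computes scores is not covered here), treats the threshold as inclusive, and uses a hypothetical helper ``select_features`` with made-up scores.

```python
def select_features(scores, min_score=0, max_feats=None):
    """Return features scoring at least min_score, best first,
    keeping at most max_feats of them when a cap is given."""
    kept = [(feat, score) for feat, score in scores.items() if score >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)
    if max_feats is not None:
        kept = kept[:max_feats]
    return [feat for feat, _ in kept]


# Hypothetical per-feature scores, for illustration only.
toy_scores = {"great": 9.1, "dull": 7.4, "film": 0.2, "the": 0.1, "plot": 3.0}
by_threshold = select_features(toy_scores, min_score=3)
by_cap = select_features(toy_scores, max_feats=2)
```
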
Use the default Maxent algorithm::

    python --instances paras --algorithm Maxent movie_reviews

Use the MEGAM Maxent algorithm::

    python --instances paras --algorithm MEGAM movie_reviews

Train on files instead of paragraphs::

    python --instances files --algorithm MEGAM movie_reviews

Train on sentences::

    python --instances sents --algorithm MEGAM movie_reviews

Evaluate the classifier by training on 3/4 of the paragraphs and testing against the remaining 1/4, without pickling::

    python --instances paras --algorithm NaiveBayes --fraction 0.75 --no-pickle movie_reviews

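
The train/test split described above can be sketched in a few lines. This is an assumption-laden illustration: ``split_by_fraction`` and ``accuracy`` are hypothetical helpers, the split simply takes the first part of the list, and whether the script shuffles or splits per category is not stated here.

```python
def split_by_fraction(instances, fraction):
    """Return (train, test): the first `fraction` of instances and the rest."""
    cutoff = int(len(instances) * fraction)
    return instances[:cutoff], instances[cutoff:]


def accuracy(classify, test_set):
    """Fraction of (features, label) pairs that classify() labels correctly."""
    if not test_set:
        return 0.0
    correct = sum(1 for feats, label in test_set if classify(feats) == label)
    return float(correct) / len(test_set)


# Eight toy instances with alternating labels (hypothetical data).
toy_instances = [({"id": i}, "pos" if i % 2 == 0 else "neg") for i in range(8)]
train_set, test_set = split_by_fraction(toy_instances, 0.75)

# A trivial stand-in classifier that always answers "pos".
always_pos = lambda feats: "pos"
score = accuracy(always_pos, test_set)
```

With a real classifier, ``classify`` would be the trained model's classification method, built from ``train_set`` only.
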