Additional categories for different NLTK usages #69

alvations · 2017-04-07T02:11:49Z

We have all-corpora and all, but it would be nice if we could add several new categories, including:

popular
- punkt
- stopwords
- wordnet
- averaged_perceptron_tagger
- brown
- movie_reviews
- words
tokenizers
- punkt
- snowball
- perluniprops
- nonbreaking_prefixes

That way, I think it's easier to advise users to install NLTK and its most common data with:

pip install -U nltk
python -m nltk.downloader popular

More importantly, I think we should add all-no-third-party and all-third-party categories, so that we can separate out the issues that arise when third-party datasets/models refresh their data/models without updating their checksums in NLTK.
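To illustrate why the split helps, here is a minimal sketch of how a stale checksum shows up and how a third-party/NLTK split lets the two kinds of failure be triaged separately. It assumes the data index records an MD5 checksum per package; the `stale_packages` helper and the `third_party` set are hypothetical names for illustration, not part of NLTK's downloader API.

```python
import hashlib
from pathlib import Path


def md5_hexdigest(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def stale_packages(
    downloads: dict[str, Path],
    index_checksums: dict[str, str],
    third_party: set[str],  # hypothetical: ids of packages hosted by third parties
) -> dict[str, list[str]]:
    """Group packages whose file no longer matches the checksum in the index."""
    report: dict[str, list[str]] = {"third_party": [], "nltk": []}
    for pkg, path in downloads.items():
        if md5_hexdigest(path) != index_checksums.get(pkg):
            # A mismatch means the data changed without the index being updated.
            bucket = "third_party" if pkg in third_party else "nltk"
            report[bucket].append(pkg)
    return report
```

With this grouping, a refresh on the third-party side only ever lands in the `third_party` bucket, so it can't be confused with a problem in the NLTK-maintained packages.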

@stevenbird Are the suggestions okay? How should we go about adding these categories?


stevenbird · 2017-04-13T10:26:35Z

@alvations: great idea; simply create new collections over in nltk_data/collections/
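As a rough sketch of what "creating new collections" could look like, the snippet below writes one XML file per proposed category. It assumes the collection format is a `<collection id=... name=...>` root with one `<item ref=.../>` child per package. The display names are made up, and the package ids are copied verbatim from the list above, so they may need adjusting to match the actual ids in the index.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Proposed categories from this issue: id -> (hypothetical display name, package ids).
PROPOSED = {
    "popular": ("Popular packages", [
        "punkt", "stopwords", "wordnet", "averaged_perceptron_tagger",
        "brown", "movie_reviews", "words",
    ]),
    "tokenizers": ("Tokenizer resources", [
        "punkt", "snowball", "perluniprops", "nonbreaking_prefixes",
    ]),
}


def build_collection(coll_id: str, name: str, refs: list[str]) -> ET.Element:
    """Build a <collection> element with one <item ref="..."/> per package id."""
    root = ET.Element("collection", id=coll_id, name=name)
    for ref in refs:
        ET.SubElement(root, "item", ref=ref)
    return root


def write_collections(out_dir: Path) -> list[Path]:
    """Write each proposed collection to <out_dir>/<id>.xml and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for coll_id, (name, refs) in PROPOSED.items():
        path = out_dir / f"{coll_id}.xml"
        ET.ElementTree(build_collection(coll_id, name, refs)).write(
            str(path), encoding="utf-8", xml_declaration=True
        )
        written.append(path)
    return written
```

Because a collection only references packages by id, punkt can belong to both popular and tokenizers without being duplicated in the data itself.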

alvations mentioned this issue May 8, 2017

nltk/nltk#1253: how to download corpus panlex_lite package in nltk in python (Closed)

alvations self-assigned this May 8, 2017

alvations mentioned this issue May 9, 2017

#75: Create different categories for nltk_data (Merged)

alvations closed this as completed May 10, 2017
