Additional categories for different NLTK usages #69

Closed
alvations opened this issue Apr 7, 2017 · 1 comment

alvations commented Apr 7, 2017

We have all-corpora and all, but it would be nice if we could add several new categories, including:

  • popular

    • punkt
    • stopwords
    • wordnet
    • averaged_perceptron_tagger
    • brown
    • movie_reviews
    • words
  • tokenizers

    • punkt
    • snowball
    • perluniprops
    • nonbreaking_prefixes

That way I think it's easier to advise users to do the following to install nltk:

pip install -U nltk
python -m nltk.downloader popular
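
The same download can also be triggered from Python. The sketch below assumes the proposed popular collection exists on the data server; `download_collections` is a hypothetical helper, not part of NLTK, and it relies on `nltk.download` returning a truthy value on success.

```python
def download_collections(collection_ids, download=None):
    """Download each named collection and return the ids that failed.

    `download` defaults to nltk.download; it is a parameter only so the
    helper can be exercised without network access (an assumption of
    this sketch, not an NLTK convention).
    """
    if download is None:
        import nltk  # imported lazily so the helper loads without NLTK installed
        download = nltk.download

    failed = []
    for collection_id in collection_ids:
        # nltk.download accepts a package or collection id.
        if not download(collection_id):
            failed.append(collection_id)
    return failed
```

Calling `download_collections(["popular"])` would then stand in for the second command above, and an empty return value would mean that every collection was fetched.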

More importantly, I think we should add all-no-third-party and all-third-party, so that we can separate out the issues that arise when third-party datasets/models refresh their data/models without updating their checksums in nltk.

@stevenbird Are the suggestions okay? How should we go about adding these categories?

@stevenbird

@alvations: great idea; simply create new collections over in nltk_data/collections/
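
A rough sketch of generating such a collection file. It assumes the files in nltk_data/collections/ are XML with a `collection` element (`id` and `name` attributes) containing one `item` element per package (`ref` attribute); that layout is an assumption here and should be checked against an existing collection file. The package names are copied from the lists above and may not match the real package ids, and the display name "Popular packages" is made up.

```python
import xml.etree.ElementTree as ET

# Package lists as proposed in this issue (names are not verified ids).
PROPOSED = {
    "popular": [
        "punkt", "stopwords", "wordnet", "averaged_perceptron_tagger",
        "brown", "movie_reviews", "words",
    ],
    "tokenizers": [
        "punkt", "snowball", "perluniprops", "nonbreaking_prefixes",
    ],
}


def build_collection_xml(collection_id, name, package_ids):
    """Return the XML text for one collection (assumed schema, see above)."""
    root = ET.Element("collection", {"id": collection_id, "name": name})
    for package_id in package_ids:
        # Each item refers to an existing package by its id.
        ET.SubElement(root, "item", {"ref": package_id})
    return ET.tostring(root, encoding="unicode")
```

For example, `build_collection_xml("popular", "Popular packages", PROPOSED["popular"])` returns the text that could be saved as a new file for the popular collection.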
