License #102

Open
djsutherland opened this issue Nov 27, 2017 · 7 comments

Comments

@djsutherland

Can you clarify what license the nltk_data files are under? Is it the same license as nltk? Do the various data files have different licenses? conda-forge would like to begin packaging nltk_data, because a few users have requested it (to make installation more uniform, track versioning, etc.; conda-forge/staged-recipes#4463), but we'd need to know the license first.

@alvations
Contributor

The different resources in nltk_data come under different licenses. The licenses of the individual resources should permit redistribution.

It'd be great to package nltk_data. Would it be a pip-installable data library?

@djsutherland
Author

djsutherland commented Nov 27, 2017

It wouldn't be in pip, but you could get it with conda install nltk_data (assuming you've set up conda-forge: https://conda-forge.org).

I see now that the XML files specify the licenses of the data files. I guess the question is what license the XML files themselves have... they're so small that I doubt it really matters, but it's still not technically specified. Anyway, I guess we'll just say "License: Various" or whatever; we still need to figure that out amongst ourselves, though.

@saswata64900

One of our NLP projects depends entirely on the NLTK tokenizer and POS tagger. We recently found that the tokenizer and POS tagger models do not have a license, so we are not able to use them in our project. Is it possible to add a license for those two models?
Are there any other open-source tokenizer and POS tagger models available online?

@thesamesam

This remains a problem for distributions packaging nltk. Looking at https://www.nltk.org/nltk_data/, many of the entries have a blank licence/copyright field.

Would it be possible for nltk to construct a free/libre dataset which can be safely redistributed? Thanks.
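
For anyone who wants to repeat that check, here is a minimal sketch of listing index entries whose licence field is blank or absent. The sample XML, the `package` element name, and the `id`, `license` and `copyright` attribute names are assumptions made for illustration; the actual index layout should be checked before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real index layout and attribute names may differ.
SAMPLE_INDEX = """\
<nltk_data>
  <packages>
    <package id="corpus_a" license="CC BY 4.0" copyright="Example Org"/>
    <package id="corpus_b" license="" copyright=""/>
    <package id="model_c"/>
  </packages>
</nltk_data>
"""


def packages_missing_license(xml_text):
    """Return ids of <package> elements with an absent or blank license attribute."""
    root = ET.fromstring(xml_text)
    missing = []
    for pkg in root.iter("package"):
        # Treat a missing attribute and a whitespace-only value the same way.
        license_text = (pkg.get("license") or "").strip()
        if not license_text:
            missing.append(pkg.get("id"))
    return missing


if __name__ == "__main__":
    for pkg_id in packages_missing_license(SAMPLE_INDEX):
        print(pkg_id)
```

A blank field only shows that the index records no licence; the resource itself may still ship its own licence or README file, so this is a first filter rather than a verdict.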

@tomaarsen
Member

Many of the NLTK data resources themselves include licensing, copyright or README files with additional information on the extent to which the data may be distributed. Perhaps that will help somewhat.

@thesamesam

I did end up untarring the whole lot and taking a look, but many of them either had no README (etc.) or, if they did have one, it indicated they were proprietary.

@mgorny

mgorny commented Dec 16, 2022

For the record, I'm removing NLTK from Gentoo because of this. IANAL but it looks like many of the corpora shouldn't be redistributed as part of nltk_data in the first place, and letting NLTK download them puts users at risk of copyright violation.
