Update corpus metadata files #31

Closed
alexrudnick opened this issue Jan 17, 2012 · 2 comments

@alexrudnick
Member

Check that the license and other information in the corpus index files is correct.

Migrated from http://code.google.com/p/nltk/issues/detail?id=96

@fcbond
Contributor

fcbond commented Sep 18, 2015

It would be nice to have a standard interface for accessing the license and citation information, as we do for the README.

I suggest adding to the API something like:

    # Requires "import os" at module level.
    def license(self):
        """
        Return the contents of the corpus LICENSE file, if it exists.
        """
        # Check for a LICENSE file under the corpus root before opening it.
        if os.path.exists(self._root.join("LICENSE")):
            return self.open("LICENSE").read()
        else:
            return "No LICENSE found for this corpus (maybe check the README)"

And the same for `citation`, which would look for "citation.bib".
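
For illustration, here is a minimal self-contained sketch of how the two accessors could sit side by side. It is not NLTK code: `MiniCorpusReader`, `_read_or_default`, `demo`, and the fallback text for the citation are all hypothetical names chosen for this example, and the corpus root is assumed to be a plain directory path.

```python
import os
import tempfile


class MiniCorpusReader:
    """Hypothetical stand-in for a corpus reader rooted at a directory."""

    def __init__(self, root):
        self._root = root

    def open(self, filename):
        # Open a file relative to the corpus root.
        return open(os.path.join(self._root, filename), encoding="utf-8")

    def _read_or_default(self, filename, default):
        # Return the file contents if the file exists, else the fallback text.
        if os.path.exists(os.path.join(self._root, filename)):
            with self.open(filename) as f:
                return f.read()
        return default

    def license(self):
        return self._read_or_default(
            "LICENSE", "No LICENSE found for this corpus (maybe check the README)"
        )

    def citation(self):
        # Assumed fallback message; the original suggestion does not specify one.
        return self._read_or_default(
            "citation.bib", "No citation.bib found for this corpus"
        )


def demo():
    # Build a throwaway corpus directory that has a LICENSE but no citation.bib.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "LICENSE"), "w", encoding="utf-8") as f:
            f.write("MIT")
        reader = MiniCorpusReader(root)
        return reader.license(), reader.citation()
```

Calling `demo()` returns the LICENSE text together with the fallback message for the missing citation file.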

Then we need to add the LICENSE and citation info for each corpus. I will do it for the open multilingual wordnet and wordnet.

@stevenbird
Member

Good idea. I've added this, but without the conditional, since I think it's fine to let Python raise an error when the file is missing. I see that many corpora contain README.txt instead of README, so that also needs to be standardized.
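
As a rough sketch of what dropping the conditional means in practice (again with hypothetical names, `BareCorpusReader` and `license_or_error`, and a plain directory as the corpus root rather than NLTK's own reader classes), a missing LICENSE file simply surfaces as Python's built-in exception:

```python
import os
import tempfile


class BareCorpusReader:
    """Hypothetical reader whose license() does no existence check."""

    def __init__(self, root):
        self._root = root

    def open(self, filename):
        # Open a file relative to the corpus root.
        return open(os.path.join(self._root, filename), encoding="utf-8")

    def license(self):
        # No conditional: a missing file raises FileNotFoundError (Python 3).
        with self.open("LICENSE") as f:
            return f.read()


def license_or_error(root):
    # Report either the license text or the name of the exception raised.
    try:
        return BareCorpusReader(root).license()
    except FileNotFoundError as err:
        return type(err).__name__


def demo_missing():
    # An empty temporary directory has no LICENSE file.
    with tempfile.TemporaryDirectory() as root:
        return license_or_error(root)
```

The trade-off is that callers get a standard exception carrying the missing path instead of a placeholder string that could be mistaken for real license text.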


4 participants