Answers to Frequently Asked Questions about NLTK
Do you have a question that is not answered here? Please post it to the nltk-users mailing list.
- What license does NLTK use?
- What are the plans for further development of NLTK?
- I think I found a bug; where do I report it?
- How can I contribute to NLTK development?
- What data sources does NLTK use and how can more be added?
- I'm planning some long-term research using NLTK; how long is the toolkit going to be supported?
- Why is Python giving me a syntax error when I use NLTK?
- What is the difference between NLTK and NLTK-Lite?
- How can I install NLTK from the source code repository?
- How can I find out where NLTK is installed on my system?
- What papers have been published about NLTK?
- How is NLTK development supported?
- How did NLTK start?
- If I just "use" NLTK using import statements in Python, am I obliged to publish my source code as well?
- What is Natural Language Processing?
NLTK is open source software. The source code is distributed under the terms of the Apache License Version 2.0. The documentation is distributed under the terms of the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States license. The corpora are distributed under various licenses, as documented in their respective README files.
NLTK is undergoing continual development as new modules are added and existing ones are improved. NLTK 3.0 has been finalized, and we have updated the code samples in the NLTK book.
Please check if an issue report has already been filed by searching the Issue Tracker. If not, please report the problem, giving as much detail as possible. Please include a code sample that permits us to replicate the problem.
New contributions are always welcome. Please consult the list of development priorities and the issue tracker, then submit a pull request. If you have particular expertise to offer, or new functionality to propose, please describe it on the nltk-dev mailing list.
Dozens of corpora are available for use with NLTK (see the list of available datasets, and the Corpus HOWTO). NLTK can be interfaced to other corpora; for instructions see section 2.1 of the NLTK book, and consult the code in the corpus module. Requests for advice in developing corpus readers for new formats should be posted to the nltk-dev mailing list. Completed corpus readers should be submitted via the Issue Tracker. Please specify the location of the corpus and whether it can be redistributed with NLTK.
We plan to continue supporting the toolkit for as long as possible. We published the NLTK book in 2009 and a second edition is due out in 2016. We plan to support the toolkit while the book is in active use, and while the developers are employed in NLP research and teaching. Bug reports will be dealt with as quickly as possible.
NLTK requires Python 2.7, or Python 3.2 onwards. If you run it under an earlier version of Python, you will see many syntax errors.
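A minimal sketch of checking the running interpreter against the requirement above before importing NLTK. The helper name `meets_nltk_requirement` is hypothetical (it is not part of NLTK), and the version bounds are simply the ones stated in this answer.

```python
import sys


def meets_nltk_requirement(version_info=None):
    """Return True for Python 2.7 or Python 3.2 onwards (bounds from this FAQ).

    Hypothetical helper for illustration; not part of NLTK.
    """
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    if major == 2:
        # Within the 2.x series, only 2.7 is listed as supported.
        return minor == 7
    # Tuple comparison covers 3.2 and every later release.
    return (major, minor) >= (3, 2)


if not meets_nltk_requirement():
    sys.stderr.write("This Python version is older than the FAQ's stated minimum.\n")
```

Running this check first gives one clear message in place of the syntax errors an unsupported interpreter would produce.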
In mid-2005, the NLTK developers created a lightweight version of NLTK called NLTK-Lite. NLTK-Lite was simpler and faster than NLTK at that time. As of version 0.9, NLTK-Lite provided the same functionality as NLTK. Unlike the old NLTK, NLTK-Lite did not impose such a heavy burden on the programmer. Wherever possible, standard Python objects were adopted instead of custom NLP versions, so that students learning to program for the first time would be learning to program in Python with some useful libraries, rather than learning to program in NLTK. Once it reached version 1.0 (in mid-2009), NLTK-Lite took over the original NLTK name and became NLTK 2.0.
Most users should install NLTK from a distribution. Please see the installation instructions. However, if you need an up-to-the-minute version, you will have to install NLTK from the source repository. Once you have downloaded the source, run the top-level setup.py program to install that version of NLTK on your machine.
Do the following in a Python interpreter session:
>>> import nltk
>>> nltk.__file__
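The lookup can also be scripted. The sketch below uses the standard library's `importlib.util.find_spec` to report where a package would be loaded from without importing it. The helper name `package_location` is hypothetical (it is not part of NLTK), and the sketch assumes Python 3.4 or later.

```python
import importlib.util


def package_location(name):
    """Return the file path a top-level package would be loaded from, or None.

    Hypothetical helper for illustration; not part of NLTK. Returns None when
    the package is not installed or has no file on disk (e.g. built-in modules).
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.has_location:
        return None
    return spec.origin


# Prints the path to nltk/__init__.py, or None if NLTK is not installed.
print(package_location("nltk"))
```

Because the package is not imported, this also works as a quick test of whether NLTK is installed at all.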
NLTK has been used in a wide variety of published research. Please search Google Scholar for details.
NLTK is an open source project that depends mainly on the efforts of volunteers. Occasionally we have funds for a summer intern or TA to work on specified projects. Students and teachers also donate code. In 2008, we received support from Google Summer of Code. We encourage volunteers to get involved (please consult the wiki). If you find the toolkit useful, please make a donation to support further development.
The NLTK project began when Steven Bird was teaching CIS-530 at the University of Pennsylvania in 2001, and hired his star student from the previous offering of the course, Edward Loper, to be the teaching assistant (TA). They agreed on a plan for developing software infrastructure for NLP teaching that could be easily maintained over time. Edward wrote up the plan, and both began work on it right away. Here is the Version 0.2 release announcement that appeared in September 2001.
No, there is no such obligation. You can use and modify NLTK without making any code available (see question 1).
Please see our book, or http://en.wikipedia.org/wiki/Natural_language_processing