
Cannot Import PunktWordTokenizer in nltk 3.3 #2122

Closed
ghost opened this issue Sep 13, 2018 · 5 comments

ghost commented Sep 13, 2018

How do I use PunktWordTokenizer in nltk 3.3? Has it been deprecated or renamed?
>>> nltk.__version__
'3.3'
>>> from nltk.tokenize import PunktWordTokenizer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'PunktWordTokenizer'
Any help/suggestion is highly appreciated.

alvations (Contributor) commented Sep 14, 2018

Punkt is a sentence tokenization algorithm, not a word tokenizer. For word tokenization, you can use the functions in nltk.tokenize. Most commonly, people use the NLTK version of the Treebank word tokenizer:

>>> from nltk import word_tokenize
>>> word_tokenize("This is a sentence, where foo bar is present.")
['This', 'is', 'a', 'sentence', ',', 'where', 'foo', 'bar', 'is', 'present', '.']
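For the sentence splitting that Punkt itself performs, here is a minimal sketch using sent_tokenize, which wraps the pretrained Punkt model (this assumes the model was fetched once with nltk.download('punkt'); the exact splits depend on that model):

>>> import nltk
>>> nltk.download('punkt')  # fetch the pretrained Punkt sentence model, once
>>> from nltk import sent_tokenize
>>> sent_tokenize("Dr. Smith went home. He was tired.")
['Dr. Smith went home.', 'He was tired.']

Note that the pretrained model treats "Dr." as an abbreviation, so it does not split there.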

Also, please do take a look at http://www.nltk.org/book/ch03.html

ghost (Author) commented Sep 14, 2018

Yeah, I usually use word_tokenize(). I might be wrong, but wasn't PunktWordTokenizer present in previous versions? Like the nltk versions from around 2014-15?

alvations (Contributor) commented

Yes, PunktWordTokenizer was exposed previously, but it wasn't a real word tokenizer; it was more of a pre-processing step that ran before Punkt decided where to split sentences. It's no longer exposed to users, to avoid confusion.

If you're interested in improving Punkt, do take a look at #2008
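For reference, Punkt learns its parameters unsupervised from raw text, so a minimal training sketch could look like the following (raw_text is an illustrative name for a plain string holding your corpus):

>>> from nltk.tokenize.punkt import PunktSentenceTokenizer, PunktTrainer
>>> trainer = PunktTrainer()
>>> trainer.train(raw_text)  # learns abbreviations, sentence starters, etc.
>>> tokenizer = PunktSentenceTokenizer(trainer.get_params())
>>> sents = tokenizer.tokenize("Mr. Brown arrived. He sat down.")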

iamRVel commented Jan 25, 2019

Nope, it's available in version 3.3, but use the following import: from nltk.tokenize.punkt import PunktSentenceTokenizer
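To illustrate, a minimal sketch of that class in use; instantiated without training text, it falls back on Punkt's default parameters:

>>> from nltk.tokenize.punkt import PunktSentenceTokenizer
>>> tokenizer = PunktSentenceTokenizer()  # untrained; default parameters
>>> tokenizer.tokenize("This is one sentence. Here is another.")
['This is one sentence.', 'Here is another.']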

alvations (Contributor) commented

Yes, it does seem like the PunktSentenceTokenizer has been re-exposed: https://github.com/nltk/nltk/blob/develop/nltk/tokenize/punkt.py#L1236

Closing this issue as resolved then =)

Please do reopen the issue if it's still relevant/unresolved.
