Add common corpus reader methods to nltk.Text, including words(), sents(), tagged_words(), tagged_sents(). Use Punkt for sentence tokenisation, and the Stanford tagger for tagging. Possibly add parsed_sents() and an interface to the Stanford parser.
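A minimal sketch of what the proposed API shape could look like. This is illustrative only: a naive regex splitter and a dummy tagger stand in for Punkt and the Stanford tagger so the example stays self-contained; the class and method bodies here are hypothetical, not existing NLTK code.

```python
import re

class Text:
    """Hypothetical Text with corpus-reader-style methods."""

    def __init__(self, text):
        self._text = text

    def sents(self):
        # Stand-in for Punkt: split on sentence-final punctuation.
        return [s.split() for s in re.split(r"(?<=[.!?])\s+", self._text) if s]

    def words(self):
        # Flatten the tokenized sentences into a word list.
        return [w for sent in self.sents() for w in sent]

    def tagged_words(self):
        # Stand-in for the Stanford tagger: tag everything as 'UNK'.
        return [(w, "UNK") for w in self.words()]

    def tagged_sents(self):
        return [[(w, "UNK") for w in sent] for sent in self.sents()]
```

A real implementation would delegate sents() to nltk.sent_tokenize (Punkt) and the tagging methods to the Stanford tagger wrapper, and parsed_sents() would hand sentences to the Stanford parser interface the same way.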
Support loading of local files and URLs, e.g., load the contents of a URL, strip HTML markup using BeautifulSoup.get_text(), and tokenize using nltk.word_tokenize().
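Roughly, the pipeline could look like the sketch below. BeautifulSoup.get_text() is what the comment proposes; this stdlib html.parser stand-in extracts text in a similar way so the example runs without third-party dependencies, and str.split() approximates nltk.word_tokenize.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document (stand-in for get_text())."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def text_from_html(html):
    # Feed the markup through the parser and join the text nodes.
    parser = _TextExtractor()
    parser.feed(html)
    return " ".join(c.strip() for c in parser.chunks if c.strip())

tokens = text_from_html("<p>Hello <b>world</b>!</p>").split()
```

With BeautifulSoup installed, text_from_html would collapse to BeautifulSoup(html, "html.parser").get_text(), and the URL case just wraps this around the fetched response body.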
I personally think it makes more sense to initialize Text with text, and provide classmethods for loading from a URL or a file.
Some API ideas could be borrowed from TextBlob (hey, they even bundle NLTK inside their package :). I took a brief look and some of the APIs look controversial, but it is still worth studying.
Do you intend to use a pre-trained model (should Text take a parameter specifying the language, or a set of models to use?), or to train and predict on the text itself?