# Libraries for Text Mining and NLP
You can try these and other libraries online at http://textanalysisonline.com

## NLTK
NLTK (Natural Language Toolkit) is the best-known Python library for natural language processing.<br>
See also: http://textminingonline.com/dive-into-nltk-part-i-getting-started-with-nltk

In [None]:
from pprint import pprint
import nltk

In [None]:
# Requires the stopwords corpus: run nltk.download('stopwords') once if it is missing.
stopwords = nltk.corpus.stopwords.words('portuguese')
stopwords[:10]

In [None]:
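# pos_tag needs the tagger model, e.g. nltk.download('averaged_perceptron_tagger');
# the exact resource name may vary across NLTK versions.
tagged = nltk.pos_tag("And now for something completely different".split())
print(tagged)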

In [None]:
sentence = "eu que disse eu de vez em quando"
# word_tokenize needs the Punkt tokenizer data, e.g. nltk.download('punkt').
tokens = nltk.word_tokenize(sentence)
fd = nltk.FreqDist(tokens)
print(fd)

In [None]:
for w in fd:
    print(w, fd[w])
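
Because `FreqDist` subclasses `collections.Counter`, it also offers ranked and relative counts. A minimal sketch, assuming plain whitespace tokenization via `str.split()` so that no NLTK data download is needed (the variable names are illustrative):

```python
import nltk

sentence = "eu que disse eu de vez em quando"
tokens = sentence.split()  # assumption: whitespace split instead of word_tokenize
fd = nltk.FreqDist(tokens)

print(fd.N())             # total number of tokens counted
print(fd.B())             # number of distinct tokens
print(fd.most_common(3))  # tokens ranked by descending count
print(fd.freq("eu"))      # relative frequency of "eu"
```

For short inputs like this one, `most_common` is usually more readable than iterating over the whole distribution.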

In [None]:
import nltk.stem as stem
sentence = "the flies died and denied their dead stating sensational lies"

stemmer = stem.porter.PorterStemmer()
res = [stemmer.stem(w) for w in sentence.split()]
print(" ".join(res))

More sophisticated stemmers are also available. For example, the SnowballStemmer also supports Portuguese (although its results are rather poor)...

In [None]:
print("Languages:", stem.snowball.SnowballStemmer.languages)

In [None]:
frase = "lindos são os prados verdes e com muita luz deste Portugal"
stemmer = stem.snowball.SnowballStemmer("portuguese")
res = [stemmer.stem(w) for w in frase.split()]
print(" ".join(res))
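
To judge the quality of the Portuguese stemmer, it can help to print each word next to its stem. A small sketch, assuming the same sentence and whitespace tokenization as above (the `pairs` name is only illustrative):

```python
from nltk.stem.snowball import SnowballStemmer

frase = "lindos são os prados verdes e com muita luz deste Portugal"
stemmer = SnowballStemmer("portuguese")

# Pair every original word with the stem the Snowball algorithm produces for it.
pairs = [(w, stemmer.stem(w)) for w in frase.split()]
for word, stemmed in pairs:
    print(f"{word:10} -> {stemmed}")
```

Note that the stemmer lowercases its input, so proper nouns such as "Portugal" lose their capitalization.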

In [None]:
# Requires the Floresta treebank: run nltk.download('floresta') once if it is missing.
from nltk.corpus import floresta
print("Contains %s words" % len(floresta.words()))
floresta.words()

## NLP using spaCy

In [None]:
import spacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')
doc = nlp('Apple is looking at buying U.K. startup for $1 billion')

Tokens

In [None]:
for token in doc:
    print(token.text)

Lemma, coarse POS, fine-grained tag, dependency label, and the `is_alpha` / `is_stop` flags

In [None]:
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.is_alpha, token.is_stop)

Noun chunks

In [None]:
for chunk in doc.noun_chunks:
    print(chunk.text, chunk.root.text, chunk.root.dep_,
          chunk.root.head.text)

Named entities

In [None]:
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
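
The tag and label strings printed in the cells above are abbreviations. `spacy.explain` returns a short description for many of them and `None` for labels it does not know. A small sketch (the list of labels is just an illustrative selection, and no model needs to be loaded for this call):

```python
import spacy

# Illustrative selection of entity, POS, and dependency labels.
labels = ["ORG", "GPE", "MONEY", "PROPN", "nsubj"]

# Look up the glossary description of each label.
descriptions = {label: spacy.explain(label) for label in labels}
for label, description in descriptions.items():
    print(f"{label:6} {description}")
```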

## TextBlob
TextBlob is a Python natural language processing library built on top of NLTK and Pattern. It provides modules for text mining, text analysis, and text processing.<br>
Material based on: http://textminingonline.com/getting-started-with-textblob

In [None]:
from textblob import TextBlob

In [None]:
text = """Natural language processing (NLP) deals with the application of computational models to text or speech data.
Application areas within NLP include automatic (machine) translation between languages; dialogue systems, which allow a human to interact with a machine using natural language; and information extraction, where the goal is to transform unstructured text into structured (database) representations that can be searched and browsed in flexible ways."""

In [None]:
blob = TextBlob(text)

In [None]:
pprint(blob.tags)

In [None]:
pprint(blob.noun_phrases)

In [None]:
pprint(blob.sentences)

In [None]:
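# translate() calls the Google Translate web service, so it needs network access.
# It was deprecated in TextBlob 0.16 and is absent from recent releases.
pprint(blob.translate(to="fr"))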

### Training a classifier (optional)
Additional references: [Tutorial: Building a Text Classification System](https://textblob.readthedocs.io/en/dev/classifiers.html#loading-data-and-creating-a-classifier)

In [None]:
train = [
    ('I love this sandwich.', 'pos'),
    ('this is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('this is my best work.', 'pos'),
    ('what an awesome view', 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('he is my sworn enemy!', 'neg'),
    ('my boss is horrible.', 'neg'),
]

In [None]:
from textblob.classifiers import NaiveBayesClassifier

In [None]:
cl = NaiveBayesClassifier(train)

In [None]:
cl.classify("This is an amazing library!")
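
TextBlob's `NaiveBayesClassifier` wraps NLTK's classifier of the same name and feeds it word-presence features. The sketch below reproduces that idea with NLTK directly. The `bag_of_words` helper and the whitespace tokenization are simplifying assumptions for illustration, not TextBlob's actual feature extractor:

```python
import nltk

train = [
    ("I love this sandwich.", "pos"),
    ("this is an amazing place!", "pos"),
    ("I feel very good about these beers.", "pos"),
    ("this is my best work.", "pos"),
    ("what an awesome view", "pos"),
    ("I do not like this restaurant", "neg"),
    ("I am tired of this stuff.", "neg"),
    ("I can't deal with this", "neg"),
    ("he is my sworn enemy!", "neg"),
    ("my boss is horrible.", "neg"),
]

# Vocabulary: every lowercased, whitespace-separated token in the training texts.
vocabulary = sorted({w for text, _ in train for w in text.lower().split()})


def bag_of_words(text):
    """Hypothetical helper: map a text to word-presence features."""
    tokens = set(text.lower().split())
    return {f"contains({w})": w in tokens for w in vocabulary}


featuresets = [(bag_of_words(text), label) for text, label in train]
classifier = nltk.NaiveBayesClassifier.train(featuresets)

prediction = classifier.classify(bag_of_words("This is an amazing library!"))
print(prediction)

# Accuracy on the training data itself: an optimistic sanity check, not an evaluation.
print(nltk.classify.accuracy(classifier, featuresets))
```

Running the same `nltk.classify.accuracy` call on a held-out set of labelled sentences would give a more honest estimate of how well the classifier generalizes.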