Basic usage
===

Creating a TextBlob object
---

In [1]:
raw_text="""
Analytics is the discovery, interpretation, and communication of meaningful patterns
in data. Especially valuable in areas rich with recorded information, analytics relies
on the simultaneous application of statistics, computer programming and operations research
to quantify performance.

Organizations may apply analytics to business data to describe, predict, and improve business
performance. Specifically, areas within analytics include predictive analytics, prescriptive
analytics, enterprise decision management, descriptive analytics, cognitive analytics, Big
Data Analytics, retail analytics, store assortment and stock-keeping unit optimization,
marketing optimization and marketing mix modeling, web analytics, call analytics, speech
analytics, sales force sizing and optimization, price and promotion modeling, predictive
science, credit risk analysis, and fraud analytics. Since analytics can require extensive
computation (see big data), the algorithms and software used for analytics harness the most
current methods in computer science, statistics, and mathematics.
"""

raw_text = raw_text.replace('\n', ' ')
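
Note that `replace('\n', ' ')` turns every newline into exactly one space, so the resulting string keeps a leading space and a double space where the blank line between paragraphs was. A minimal sketch of an alternative, assuming collapsed whitespace is what you want; `sample`, `replaced` and `normalized` are illustrative names that are not part of the original notebook:

```python
# Illustrative sample with the same shape as raw_text: a leading newline,
# line breaks inside paragraphs and a blank line between paragraphs.
sample = """
Analytics is the discovery
of meaningful patterns.

Organizations may apply analytics.
"""

# str.replace keeps one space per newline, so runs of spaces survive.
replaced = sample.replace('\n', ' ')

# str.split() with no arguments splits on any run of whitespace and drops
# leading/trailing whitespace; joining with ' ' collapses everything.
normalized = ' '.join(sample.split())

print(repr(replaced))
print(repr(normalized))
```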

Basic text processing
---

In [2]:
##
## Create the TextBlob object on which all subsequent
## processing is performed
##
from textblob import TextBlob

text = TextBlob(raw_text)
text

TextBlob(" Analytics is the discovery, interpretation, and communication of meaningful patterns in data. Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, computer programming and operations research to quantify performance.  Organizations may apply analytics to business data to describe, predict, and improve business performance. Specifically, areas within analytics include predictive analytics, prescriptive analytics, enterprise decision management, descriptive analytics, cognitive analytics, Big Data Analytics, retail analytics, store assortment and stock-keeping unit optimization, marketing optimization and marketing mix modeling, web analytics, call analytics, speech analytics, sales force sizing and optimization, price and promotion modeling, predictive science, credit risk analysis, and fraud analytics. Since analytics can require extensive computation (see big data), the algorithms and software used for 

In [3]:
##
## Basic transformations using Python's built-in
## string methods
## 
text.upper()

TextBlob(" ANALYTICS IS THE DISCOVERY, INTERPRETATION, AND COMMUNICATION OF MEANINGFUL PATTERNS IN DATA. ESPECIALLY VALUABLE IN AREAS RICH WITH RECORDED INFORMATION, ANALYTICS RELIES ON THE SIMULTANEOUS APPLICATION OF STATISTICS, COMPUTER PROGRAMMING AND OPERATIONS RESEARCH TO QUANTIFY PERFORMANCE.  ORGANIZATIONS MAY APPLY ANALYTICS TO BUSINESS DATA TO DESCRIBE, PREDICT, AND IMPROVE BUSINESS PERFORMANCE. SPECIFICALLY, AREAS WITHIN ANALYTICS INCLUDE PREDICTIVE ANALYTICS, PRESCRIPTIVE ANALYTICS, ENTERPRISE DECISION MANAGEMENT, DESCRIPTIVE ANALYTICS, COGNITIVE ANALYTICS, BIG DATA ANALYTICS, RETAIL ANALYTICS, STORE ASSORTMENT AND STOCK-KEEPING UNIT OPTIMIZATION, MARKETING OPTIMIZATION AND MARKETING MIX MODELING, WEB ANALYTICS, CALL ANALYTICS, SPEECH ANALYTICS, SALES FORCE SIZING AND OPTIMIZATION, PRICE AND PROMOTION MODELING, PREDICTIVE SCIENCE, CREDIT RISK ANALYSIS, AND FRAUD ANALYTICS. SINCE ANALYTICS CAN REQUIRE EXTENSIVE COMPUTATION (SEE BIG DATA), THE ALGORITHMS AND SOFTWARE USED FOR 

In [4]:
##
## Slicing works as on Python strings and returns a new TextBlob
##
text[10:25]

TextBlob(" is the discove")

Part-of-speech Tagging
---

In [5]:
##
## Part-of-speech Tagging (POS-tag)
##
##    TAG    Description                            Example
##    -------------------------------------------------------------------------
##    CC     Coordinating conjunction               and, or
##    CD     Cardinal number                        one, two, 3
##    DT     Determiner                             a, the
##    EX     Existential there                      there were two cars
##    FW     Foreign word                           hola mundo cruel
##    IN     Preposition/subordinating conjunction  of, in, on, that
##    JJ     Adjective                              quick, lazy
##    JJR    Adjective, comparative                 quicker, lazier
##    JJS    Adjective, superlative                 quickest, laziest
##    NN     Noun, singular or mass                 fox, dog
##    NNS    Noun, plural                           foxes, dogs
##    NNP    Noun, proper singular                  John, Alice
##    NNPS   Noun, proper plural                    Vikings, Indians, Germans
##    ...
## 
text.tags[:20]

[('Analytics', 'NNS'),
 ('is', 'VBZ'),
 ('the', 'DT'),
 ('discovery', 'NN'),
 ('interpretation', 'NN'),
 ('and', 'CC'),
 ('communication', 'NN'),
 ('of', 'IN'),
 ('meaningful', 'JJ'),
 ('patterns', 'NNS'),
 ('in', 'IN'),
 ('data', 'NNS'),
 ('Especially', 'RB'),
 ('valuable', 'JJ'),
 ('in', 'IN'),
 ('areas', 'NNS'),
 ('rich', 'VBP'),
 ('with', 'IN'),
 ('recorded', 'JJ'),
 ('information', 'NN')]
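
Because `tags` yields plain `(word, tag)` pairs, selecting words by part of speech is an ordinary list comprehension. A minimal sketch, using the first twelve pairs copied from the output above; `tagged` and `nouns` are illustrative names:

```python
# First tagged tokens, copied from the output above.
tagged = [
    ('Analytics', 'NNS'), ('is', 'VBZ'), ('the', 'DT'),
    ('discovery', 'NN'), ('interpretation', 'NN'), ('and', 'CC'),
    ('communication', 'NN'), ('of', 'IN'), ('meaningful', 'JJ'),
    ('patterns', 'NNS'), ('in', 'IN'), ('data', 'NNS'),
]

# Keep only common nouns: 'NN' (singular or mass) and 'NNS' (plural).
nouns = [word for word, tag in tagged if tag in ('NN', 'NNS')]
print(nouns)
```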

Noun phrases
---

In [6]:
##
## Noun phrase extraction
##
text.noun_phrases

WordList(['analytics', 'meaningful patterns', 'especially', 'analytics relies', 'simultaneous application', 'operations research', 'quantify performance', 'organizations', 'business data', 'business performance', 'specifically', 'predictive analytics', 'prescriptive analytics', 'enterprise decision management', 'descriptive analytics', 'cognitive analytics', 'data analytics', 'retail analytics', 'store assortment', 'unit optimization', 'web analytics', 'speech analytics', 'sales force', 'predictive science', 'credit risk analysis', 'fraud analytics', 'extensive computation', 'big data', 'analytics harness', 'current methods', 'computer science'])

Sentiment analysis
---

In [7]:
##
## Sentiment Analysis
##
text.sentiment

Sentiment(polarity=0.171875, subjectivity=0.4604166666666667)

In [8]:
TextBlob("I am happy").sentiment

Sentiment(polarity=0.8, subjectivity=1.0)

In [9]:
TextBlob("I am very happy").sentiment

Sentiment(polarity=1.0, subjectivity=1.0)

In [10]:
TextBlob("I am very sad").sentiment

Sentiment(polarity=-0.65, subjectivity=1.0)
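
The three short examples are consistent with an intensifier ("very") that multiplies the polarity of the word it modifies, with the result clipped to the range [-1, 1]. The sketch below only reproduces that arithmetic under stated assumptions: the factor 1.3 for "very" and the base polarity -0.5 for "sad" are assumptions chosen to match the outputs above, not values read from the library; only 0.8 for "happy" comes from the output shown. `intensify` is a hypothetical helper.

```python
def intensify(polarity, factor):
    """Scale a polarity by an intensifier and clip it to [-1.0, 1.0]."""
    return max(-1.0, min(1.0, polarity * factor))

VERY = 1.3    # assumed intensifier weight for "very"
HAPPY = 0.8   # polarity reported above for "I am happy"
SAD = -0.5    # assumed base polarity for "sad"

# 0.8 * 1.3 exceeds 1.0, so the result is clipped to the upper bound.
print(intensify(HAPPY, VERY))
# -0.5 * 1.3 stays inside the range and is returned unchanged.
print(round(intensify(SAD, VERY), 2))
```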

Words and sentences
---

In [11]:
##
## Tokenization into words
##   Note that punctuation marks are removed
##
text.words

WordList(['Analytics', 'is', 'the', 'discovery', 'interpretation', 'and', 'communication', 'of', 'meaningful', 'patterns', 'in', 'data', 'Especially', 'valuable', 'in', 'areas', 'rich', 'with', 'recorded', 'information', 'analytics', 'relies', 'on', 'the', 'simultaneous', 'application', 'of', 'statistics', 'computer', 'programming', 'and', 'operations', 'research', 'to', 'quantify', 'performance', 'Organizations', 'may', 'apply', 'analytics', 'to', 'business', 'data', 'to', 'describe', 'predict', 'and', 'improve', 'business', 'performance', 'Specifically', 'areas', 'within', 'analytics', 'include', 'predictive', 'analytics', 'prescriptive', 'analytics', 'enterprise', 'decision', 'management', 'descriptive', 'analytics', 'cognitive', 'analytics', 'Big', 'Data', 'Analytics', 'retail', 'analytics', 'store', 'assortment', 'and', 'stock-keeping', 'unit', 'optimization', 'marketing', 'optimization', 'and', 'marketing', 'mix', 'modeling', 'web', 'analytics', 'call', 'analytics', 'speech', 'an
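
As a rough illustration of what this kind of word tokenization does, the sketch below uses a regular expression that keeps letters, digits and internal hyphens and drops punctuation. It is a simplified stand-in, not TextBlob's tokenizer, and `simple_words` is a hypothetical helper:

```python
import re

# One or more alphanumerics, optionally followed by hyphenated parts,
# so that 'stock-keeping' stays a single token.
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")

def simple_words(text):
    """Return the word-like tokens of text, without punctuation."""
    return WORD_RE.findall(text)

print(simple_words("stock-keeping unit optimization (see big data)."))
```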

In [12]:
##
## Tokenization into sentences
##
text.sentences

[Sentence(" Analytics is the discovery, interpretation, and communication of meaningful patterns in data."),
 Sentence("Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, computer programming and operations research to quantify performance."),
 Sentence("Organizations may apply analytics to business data to describe, predict, and improve business performance."),
 Sentence("Specifically, areas within analytics include predictive analytics, prescriptive analytics, enterprise decision management, descriptive analytics, cognitive analytics, Big Data Analytics, retail analytics, store assortment and stock-keeping unit optimization, marketing optimization and marketing mix modeling, web analytics, call analytics, speech analytics, sales force sizing and optimization, price and promotion modeling, predictive science, credit risk analysis, and fraud analytics."),
 Sentence("Since analytics can require extensive computati

In [13]:
##
## Singular forms
##
text.words[9], text.words[9].singularize()

('patterns', 'pattern')

In [14]:
##
## Plural forms
##
text.words[3], text.words[3].pluralize()

('discovery', 'discoveries')

In [15]:
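##
## Pluralizing the whole WordList applies the rule to every word,
## regardless of its part of speech, so most results are not real words
##
text.words.pluralize()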

WordList(['Analyticss', 'iss', 'thes', 'discoveries', 'interpretations', 'ands', 'communications', 'ofs', 'meaningfuls', 'patternss', 'ins', 'datas', 'Especiallys', 'valuables', 'ins', 'areass', 'riches', 'withs', 'recordeds', 'information', 'analyticss', 'reliess', 'ons', 'thes', 'simultaneouss', 'applications', 'ofs', 'statisticss', 'computers', 'programmings', 'ands', 'operationss', 'research', 'toes', 'quantifies', 'performances', 'Organizationss', 'mays', 'applies', 'analyticss', 'toes', 'businesses', 'datas', 'toes', 'describes', 'predicts', 'ands', 'improves', 'businesses', 'performances', 'Specificallys', 'areass', 'withins', 'analyticss', 'includes', 'predictives', 'analyticss', 'prescriptives', 'analyticss', 'enterprises', 'decisions', 'managements', 'descriptives', 'analyticss', 'cognitives', 'analyticss', 'Bigs', 'Datas', 'Analyticss', 'retails', 'analyticss', 'stores', 'assortments', 'ands', 'stock-keepings', 'units', 'optimizations', 'marketings', 'optimizations', 'ands',

In [16]:
##
## Lemmatization
##
text.words[9], text.words[9].lemmatize()

('patterns', 'pattern')

WordNet
---

In [17]:
##
## WordNet integration.
##   WordNet is a lexical database in which nouns, verbs,
##   adverbs and adjectives are grouped into sets of cognitive
##   synonyms (synsets)
##
from textblob import Word

Word('wind').synsets

[Synset('wind.n.01'),
 Synset('wind.n.02'),
 Synset('wind.n.03'),
 Synset('wind.n.04'),
 Synset('tip.n.03'),
 Synset('wind_instrument.n.01'),
 Synset('fart.n.01'),
 Synset('wind.n.08'),
 Synset('weave.v.04'),
 Synset('wind.v.02'),
 Synset('wind.v.03'),
 Synset('scent.v.02'),
 Synset('wind.v.05'),
 Synset('wreathe.v.03'),
 Synset('hoist.v.01')]

In [18]:
##
## Synsets
##
from textblob.wordnet import Synset

Synset('wind.n.01').definition()

'air moving (sometimes with considerable force) from an area of high pressure to an area of low pressure'

In [19]:
##
## Iterating over the synsets using definition()
##
for synset in Word('wind').synsets:
    print(synset.definition())

air moving (sometimes with considerable force) from an area of high pressure to an area of low pressure
a tendency or force that influences events
breath
empty rhetoric or insincere or exaggerated talk
an indication of potential opportunity
a musical instrument in which the sound is produced by an enclosed column of air that is moved by the breath
a reflex that expels intestinal gas through the anus
the act of winding or twisting
to move or cause to move in a sinuous, spiral, or circular course
extend in curves and turns
arrange or or coil around
catch the scent of; get wind of
coil the spring of (some mechanical device) by turning a stem
form into a wreath
raise or haul up with or as if with mechanical help


In [20]:
##
## Direct access to the definitions
##
Word('wind').definitions

['air moving (sometimes with considerable force) from an area of high pressure to an area of low pressure',
 'a tendency or force that influences events',
 'breath',
 'empty rhetoric or insincere or exaggerated talk',
 'an indication of potential opportunity',
 'a musical instrument in which the sound is produced by an enclosed column of air that is moved by the breath',
 'a reflex that expels intestinal gas through the anus',
 'the act of winding or twisting',
 'to move or cause to move in a sinuous, spiral, or circular course',
 'extend in curves and turns',
 'arrange or or coil around',
 'catch the scent of; get wind of',
 'coil the spring of (some mechanical device) by turning a stem',
 'form into a wreath',
 'raise or haul up with or as if with mechanical help']

Spelling correction
---

In [21]:
##
## Spelling correction.
##   correcting a whole sentence
##
TextBlob("I havv goood speling!").correct()

TextBlob("I have good spelling!")

In [22]:
##
## Spelling correction.
##   correcting a single word
##
Word("falibility").spellcheck()

[('fallibility', 1.0)]
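
TextBlob's documentation states that its spelling correction is based on Peter Norvig's "How to Write a Spelling Corrector". A minimal sketch of the candidate-generation step from that approach, assuming lowercase ASCII words; `edits1` here is a simplified illustration, and a complete corrector would also rank the candidates by how frequent each word is:

```python
import string

def edits1(word):
    """Return every string that is one edit away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

# The correct spelling is a single insertion ('l') away from the typo.
candidates = edits1("falibility")
print("fallibility" in candidates)
```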

Word frequency
---

In [23]:
##
## Word frequency with word_counts
##
text.word_counts['analytics']

16

In [24]:
##
## Frequency using count
##
text.words.count('analytics')

16

In [25]:
##
## Case-sensitive count
##
text.words.count('analytics', case_sensitive=True)

14
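
The difference between 16 and 14 comes from the two capitalized occurrences ("Analytics" at the start of the text and in "Big Data Analytics"). A minimal sketch of the same distinction using `collections.Counter` on a short illustrative word list; `words` is sample data, not the full text:

```python
from collections import Counter

# Illustrative sample: two capitalized and three lowercase occurrences.
words = ['Analytics', 'is', 'analytics', 'Big', 'Data', 'Analytics',
         'retail', 'analytics', 'web', 'analytics']

# Case-insensitive counts: normalize to lowercase before counting.
insensitive = Counter(word.lower() for word in words)

# Case-sensitive counts: count the tokens exactly as written.
sensitive = Counter(words)

print(insensitive['analytics'], sensitive['analytics'])
```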

In [26]:
text.noun_phrases.count('analytics')

1

Parsing
---

In [27]:
##
## Parsing
##
for t in text.parse().split(' ')[0:15]:
    print(t)

Analytics/NNP/B-NP/O
is/VBZ/B-VP/O
the/DT/B-NP/O
discovery/NN/I-NP/O
,/,/O/O
interpretation/NN/B-NP/O
,/,/O/O
and/CC/O/O
communication/NN/B-NP/O
of/IN/B-PP/B-PNP
meaningful/JJ/B-NP/I-PNP
patterns/NNS/I-NP/I-PNP
in/IN/B-PP/B-PNP
data/NNS/B-NP/I-PNP
././O/O
Especially/RB/B-ADJP/O
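
Each token in the parsed string packs four slash-separated fields. Judging from the output above, they appear to be the word, its POS tag, a chunk tag (where the `B-` and `I-` prefixes mark the beginning and the inside of a chunk) and a prepositional-noun-phrase tag. The field names in the sketch below are assumptions made for illustration, and `split_token` is a hypothetical helper:

```python
from collections import namedtuple

# Field names are illustrative labels, not identifiers from the library.
ParsedToken = namedtuple('ParsedToken', ['word', 'pos', 'chunk', 'pnp'])

def split_token(token):
    """Split 'word/POS/chunk/PNP' from the right into four fields."""
    return ParsedToken(*token.rsplit('/', 3))

for raw in ['meaningful/JJ/B-NP/I-PNP', ',/,/O/O']:
    print(split_token(raw))
```

Splitting from the right with a limit of three keeps the last three fields intact even if the word itself were to contain a slash.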


N-grams
---

In [28]:
##
## N-grams
##
TextBlob("Now is better than never.").ngrams(n=3)

[WordList(['Now', 'is', 'better']),
 WordList(['is', 'better', 'than']),
 WordList(['better', 'than', 'never'])]
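
An n-gram list is a sliding window of `n` consecutive words. A minimal sketch over an already tokenized sentence; this `ngrams` function is a plain-Python illustration that returns lists rather than `WordList` objects, not the library's implementation:

```python
def ngrams(words, n=3):
    """Return all runs of n consecutive words as lists."""
    return [words[i:i + n] for i in range(len(words) - n + 1)]

tokens = ['Now', 'is', 'better', 'than', 'never']
for gram in ngrams(tokens, n=3):
    print(gram)
```

A sentence with fewer than `n` words yields an empty list, because the range of start positions is empty.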