
## textblob: another module for NLP tasks (NLTK + pattern)

textblob is a text-processing library for Python that supports Natural Language Processing tasks such as morphological analysis, entity extraction, sentiment analysis, and machine translation.


In [8]:
from textblob import TextBlob

In [9]:
texto = '''In new lawsuits brought against the ride-sharing companies Uber and Lyft, the top prosecutors in Los Angeles 
and San Francisco counties make an important point about the lightly regulated sharing economy. The consumers who 
participate deserve a  very clear picture of the risks they're taking'''
t = TextBlob(texto)

In [12]:
print(t.word_counts)

defaultdict(<type 'int'>, {u'uber': 1, u'and': 2, u'lyft': 1, u'san': 1, u'lightly': 1, u'point': 1, u'participate': 1, u'an': 1, u'brought': 1, u'sharing': 1, u'regulated': 1, u'in': 2, u'make': 1, u'consumers': 1, u're': 1, u'deserve': 1, u'lawsuits': 1, u'los': 1, u'new': 1, u'economy': 1, u'picture': 1, u'francisco': 1, u'very': 1, u'who': 1, u'important': 1, u'they': 1, u'taking': 1, u'counties': 1, u'a': 1, u'about': 1, u'prosecutors': 1, u'of': 1, u'clear': 1, u'companies': 1, u'against': 1, u'ride-sharing': 1, u'angeles': 1, u'risks': 1, u'the': 5, u'top': 1})
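
The mapping returned by `word_counts` behaves like an ordinary dictionary, so it can be ranked with the standard library. The sketch below is only an illustration: `sample_counts` is a hand-copied subset of the counts printed above (the name is not part of textblob), and `collections.Counter` does the ranking.

```python
from collections import Counter

# Hand-copied subset of the counts printed above (illustrative data only).
sample_counts = {'the': 5, 'and': 2, 'in': 2, 'uber': 1, 'lyft': 1}

# Counter accepts any mapping of item -> count.
ranking = Counter(sample_counts)

# most_common(n) returns the n highest counts as (word, count) pairs.
top_three = ranking.most_common(3)
print(top_three)
```

With a real blob, passing `t.word_counts` to `Counter` instead of `sample_counts` should work the same way.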


In [13]:
print('We have', len(t.sentences), 'sentences')

('We have', 2, 'sentences')


In [18]:
for sentence in t.sentences:
    print(sentence)

In new lawsuits brought against the ride-sharing companies Uber and Lyft, the top prosecutors in Los Angeles 
and San Francisco counties make an important point about the lightly regulated sharing economy.
The consumers who 
participate deserve a  very clear picture of the risks they're taking


In [20]:
# Words
print('There are', len(t.words), 'words in t')

('There are', 46, 'words in t')




The `.noun_phrases` property gives access to the list of entities (strictly speaking, noun phrases) contained in our textblob.

In [23]:
print('t contains', len(t.noun_phrases), 'entities')
for i in t.noun_phrases:
    print(i)

('t contains', 8, 'entities')
new lawsuits
uber
lyft
top prosecutors
los angeles
san francisco
important point
clear picture


In [24]:
# Singulars and plurals
for word in t.words:
    if word.endswith('s'):
        print(word.lemmatize(), word, word.singularize())
    else:
        print(word.lemmatize(), word, word.pluralize())

('In', 'In', 'Ins')
('new', 'new', 'news')
(u'lawsuit', 'lawsuits', 'lawsuit')
('brought', 'brought', 'broughts')
('against', 'against', 'againsts')
('the', 'the', 'thes')
('ride-sharing', 'ride-sharing', 'ride-sharings')
(u'company', 'companies', 'company')
('Uber', 'Uber', 'Ubers')
('and', 'and', 'ands')
('Lyft', 'Lyft', 'Lyfts')
('the', 'the', 'thes')
('top', 'top', 'tops')
(u'prosecutor', 'prosecutors', 'prosecutor')
('in', 'in', 'ins')
('Los', 'Los', 'Lo')
('Angeles', 'Angeles', 'Angele')
('and', 'and', 'ands')
('San', 'San', 'Sans')
('Francisco', 'Francisco', 'Franciscoes')
(u'county', 'counties', 'county')
('make', 'make', 'makes')
('an', 'an', 'some')
('important', 'important', 'importants')
('point', 'point', 'points')
('about', 'about', 'abouts')
('the', 'the', 'thes')
('lightly', 'lightly', 'lightlies')
('regulated', 'regulated', 'regulateds')
('sharing', 'sharing', 'sharings')
('economy', 'economy', 'economies')
('The', 'The', 'Thes')
(u'consumer', 'consumers', 'consumer')


In [25]:
# Syntactic analysis
t.parse()

u"In/IN/B-PP/B-PNP new/JJ/B-NP/I-PNP lawsuits/NNS/I-NP/I-PNP brought/VBN/B-VP/I-PNP against/IN/B-PP/B-PNP the/DT/B-NP/I-PNP ride-sharing/JJ/I-NP/I-PNP companies/NNS/I-NP/I-PNP Uber/NNP/I-NP/I-PNP and/CC/O/O Lyft/NNP/B-NP/O ,/,/O/O the/DT/B-NP/O top/JJ/I-NP/O prosecutors/NNS/I-NP/O in/IN/B-PP/B-PNP Los/NNP/B-NP/I-PNP Angeles/NNP/I-NP/I-PNP and/CC/I-NP/I-PNP San/NNP/I-NP/I-PNP Francisco/NNP/I-NP/I-PNP counties/NNS/I-NP/I-PNP make/VB/B-VP/O an/DT/B-NP/O important/JJ/I-NP/O point/NN/I-NP/O about/IN/B-PP/O the/DT/O/O lightly/RB/B-VP/O regulated/VBN/I-VP/O sharing/VBG/I-VP/O economy/NN/B-NP/O ././O/O\nThe/DT/B-NP/O consumers/NNS/I-NP/O who/WP/O/O participate/VB/B-VP/O deserve/VBP/I-VP/O a/DT/B-NP/O very/RB/I-NP/O clear/JJ/I-NP/O picture/NN/I-NP/O of/IN/B-PP/B-PNP the/DT/B-NP/I-PNP risks/NNS/I-NP/I-PNP they/PRP/I-NP/I-PNP '/POS/O/O re/NN/B-NP/O taking/VBG/B-VP/O"
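
Each token in the string returned by `parse()` carries four slash-separated fields: the word, its part-of-speech tag, its chunk tag, and its prepositional-noun-phrase (PNP) tag. The sketch below splits such a string into tuples using plain Python. It assumes every token has exactly four fields and that sentences are separated by newlines, as in the output above; `split_tags` is an illustrative helper, not a textblob function.

```python
# First three tokens of the parse output shown above.
parsed = "In/IN/B-PP/B-PNP new/JJ/B-NP/I-PNP lawsuits/NNS/I-NP/I-PNP"

def split_tags(parsed_text):
    """Turn 'word/POS/chunk/PNP' tokens into one list of 4-tuples per sentence."""
    sentences = []
    for line in parsed_text.split("\n"):
        tokens = [tuple(token.split("/")) for token in line.split()]
        sentences.append(tokens)
    return sentences

rows = split_tags(parsed)

# Keep only the plural nouns (tag NNS) of the first sentence.
plural_nouns = [word for word, pos, chunk, pnp in rows[0] if pos == "NNS"]
print(plural_nouns)
```

Filtering on the part-of-speech field, as in the last step, is one way to restrict singular and plural forms to nouns instead of relying on a final `s`.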

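## Machine translation

From any text processed with TextBlob, the `.translate` method gives access to a machine translator of fairly good quality. The method relies on the Google Translate web service, and recent TextBlob releases have deprecated and then removed it, so the cells below assume an older version of the library.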

In [26]:
# From Chinese to English
oracion_zh = u"中国探月工程 亦稱嫦娥工程，是中国启动的第一个探月工程，于2003年3月1日正式启动"
trans = TextBlob(oracion_zh)
print(trans.translate(from_lang="zh-CN", to="en"))

China lunar exploration project, also known as Chang'e project, is the first lunar exploration project launched in China, officially launched on March 1, 2003


In [27]:
# To Spanish
print(trans.translate(from_lang="zh-CN", to="es"))

El proyecto de exploración lunar de China, también conocido como proyecto Chang'e, es el primer proyecto de exploración lunar lanzado en China, lanzado oficialmente el 1 de marzo de 2003


In [29]:
# Slang does not work as well: "un bordello" (colloquially, "a lot") is rendered literally as "burdel" (brothel)
print("--------------")
t_ita = TextBlob(u"Sono andato a Milano e mi sono divertito un bordello.")
print(t_ita.translate(to="es"))

--------------
Fui a Milán y disfruté de un burdel.


## WordNet

textblob, or more precisely any object of the `Word` class, gives access to the information in WordNet.


In [30]:
from textblob import Word
from textblob.wordnet import VERB

In [35]:
# The synsets of the word "hack" as a verb
hk = Word('hack')
hk.get_synsets(pos=VERB)

[Synset('chop.v.05'),
 Synset('hack.v.02'),
 Synset('hack.v.03'),
 Synset('hack.v.04'),
 Synset('hack.v.05'),
 Synset('hack.v.06'),
 Synset('hack.v.07'),
 Synset('hack.v.08')]

In [36]:
# List the definitions of "car"
Word('car').definitions

[u'a motor vehicle with four wheels; usually propelled by an internal combustion engine',
 u'a wheeled vehicle adapted to the rails of railroad',
 u'the compartment that is suspended from an airship and that carries personnel and the cargo and the power plant',
 u'where passengers ride up and down',
 u'a conveyance for passengers or freight on a cable railway']

In [39]:
# Walk the hypernym hierarchy of every synset of "hack"
for s in hk.synsets:
    print(s.hypernym_paths())

[[Synset('entity.n.01'), Synset('physical_entity.n.01'), Synset('causal_agent.n.01'), Synset('person.n.01'), Synset('unskilled_person.n.01'), Synset('hack.n.01')], [Synset('entity.n.01'), Synset('physical_entity.n.01'), Synset('object.n.01'), Synset('whole.n.02'), Synset('living_thing.n.01'), Synset('organism.n.01'), Synset('person.n.01'), Synset('unskilled_person.n.01'), Synset('hack.n.01')]]
[[Synset('entity.n.01'), Synset('physical_entity.n.01'), Synset('causal_agent.n.01'), Synset('person.n.01'), Synset('leader.n.01'), Synset('politician.n.02'), Synset('machine_politician.n.01')], [Synset('entity.n.01'), Synset('physical_entity.n.01'), Synset('object.n.01'), Synset('whole.n.02'), Synset('living_thing.n.01'), Synset('organism.n.01'), Synset('person.n.01'), Synset('leader.n.01'), Synset('politician.n.02'), Synset('machine_politician.n.01')]]
[[Synset('entity.n.01'), Synset('physical_entity.n.01'), Synset('causal_agent.n.01'), Synset('person.n.01'), Synset('communicator.n.01'), Synset
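
Each list returned by `hypernym_paths()` runs from the root of the hierarchy down to the synset itself, so the last element is the synset and the one before it is its direct hypernym. The sketch below shows this with the two paths printed above for `hack.n.01`. It uses plain strings as stand-ins for the `Synset` objects, and `summarize` is an illustrative helper, not part of textblob or WordNet.

```python
# Two hypernym paths for hack.n.01, copied by name from the output above.
paths = [
    ["entity.n.01", "physical_entity.n.01", "causal_agent.n.01",
     "person.n.01", "unskilled_person.n.01", "hack.n.01"],
    ["entity.n.01", "physical_entity.n.01", "object.n.01", "whole.n.02",
     "living_thing.n.01", "organism.n.01", "person.n.01",
     "unskilled_person.n.01", "hack.n.01"],
]

def summarize(path):
    """Return (root, direct hypernym, synset, number of nodes) for one path."""
    return path[0], path[-2], path[-1], len(path)

summaries = [summarize(p) for p in paths]
for summary in summaries:
    print(summary)
```

Both paths share the same root and the same direct hypernym; they differ only in the route taken through `person.n.01`.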

## Sentiment analysis

In [41]:
opinion1 = TextBlob('This restaurant is great. We had too much fun!!')
print(opinion1.sentiment)

Sentiment(polarity=0.634375, subjectivity=0.475)


In [42]:
opinion2 = TextBlob("Google News to close in Spain.")
print(opinion2.sentiment)

Sentiment(polarity=0.0, subjectivity=0.0)


In [43]:
m = TextBlob("This is fucking awesome!! :-(")
print(m.sentiment)


Sentiment(polarity=0.125, subjectivity=1.0)
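
The `sentiment` property returns a pair of numbers: polarity ranges from -1.0 (negative) to 1.0 (positive), and subjectivity from 0.0 (objective) to 1.0 (subjective). A common next step is to turn the polarity into a coarse label. The sketch below does this for the three results printed above. The `Sentiment` namedtuple is a local stand-in that mirrors the printed output, and both the `label` helper and its `threshold` of 0.1 are arbitrary choices made for this example.

```python
from collections import namedtuple

# Local stand-in mirroring the Sentiment(polarity, subjectivity) tuples above.
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

def label(sentiment, threshold=0.1):
    """Map a polarity score to a coarse label; the threshold is arbitrary."""
    if sentiment.polarity > threshold:
        return "positive"
    if sentiment.polarity < -threshold:
        return "negative"
    return "neutral"

# The three results printed in the cells above.
scores = [
    Sentiment(polarity=0.634375, subjectivity=0.475),
    Sentiment(polarity=0.0, subjectivity=0.0),
    Sentiment(polarity=0.125, subjectivity=1.0),
]
labels = [label(s) for s in scores]
print(labels)
```

Note that the third sentence ends with a sad emoticon, yet its polarity stays slightly above the threshold, so a fixed cut-off like this one labels it as positive.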


In [44]:
# Spelling correction
b1 = TextBlob("I havv goood speling!")
print(b1.correct())

b2 = TextBlob("Miy naem iz Jonh!")
print(b2.correct())

I have good spelling!
In name in On!
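
The textblob documentation describes `correct()` as based on Peter Norvig's spelling corrector: for an unknown word it generates nearby strings and picks the most frequent known one. The sketch below is a reduced version of that idea, not textblob's own code. It only looks one edit away, and `WORD_FREQ` is a toy vocabulary with made-up frequencies.

```python
# Toy vocabulary with made-up frequencies (hypothetical data).
WORD_FREQ = {"have": 10, "good": 5, "spelling": 2, "name": 8}
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Return the most frequent known word within one edit, else the word."""
    word = word.lower()
    if word in WORD_FREQ:
        return word
    candidates = [w for w in edits1(word) if w in WORD_FREQ]
    if not candidates:
        return word
    return max(candidates, key=WORD_FREQ.get)

fixed = [correct(w) for w in ["havv", "goood", "speling", "naem", "Jonh"]]
print(fixed)
```

A corrector of this kind can only return words it already knows, which helps explain why ordinary misspellings are repaired while a proper name such as "Jonh" is either left alone or replaced by an unrelated common word.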
