## TextBlob Example
https://textblob.readthedocs.io/en/dev/

### Installing it

In [1]:
!pip install -U textblob

Collecting textblob
  Downloading https://files.pythonhosted.org/packages/60/f0/1d9bfcc8ee6b83472ec571406bd0dd51c0e6330ff1a51b2d29861d389e85/textblob-0.15.3-py2.py3-none-any.whl (636kB)
     |████████████████████████████████| 645kB 299kB/s eta 0:00:01
Installing collected packages: textblob
  Found existing installation: textblob 0.15.1
    Uninstalling textblob-0.15.1:
      Successfully uninstalled textblob-0.15.1
Successfully installed textblob-0.15.3


### Downloading the corpora

In [3]:
!python -m textblob.download_corpora

[nltk_data] Downloading package brown to /home/rodrigo/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
[nltk_data] Downloading package punkt to /home/rodrigo/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /home/rodrigo/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/rodrigo/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package conll2000 to
[nltk_data]     /home/rodrigo/nltk_data...
[nltk_data]   Unzipping corpora/conll2000.zip.
[nltk_data] Downloading package movie_reviews to
[nltk_data]     /home/rodrigo/nltk_data...
[nltk_data]   Unzipping corpora/movie_reviews.zip.
Finished.


In [19]:
# For Linux users: list the downloaded NLTK data
!ls ~/nltk_data/

corpora  taggers  tokenizers


## Simple Example

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

In [8]:
from textblob import TextBlob

In [10]:
text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''

In [23]:
blob = TextBlob(text)
blob.tags           # [('The', 'DT'), ('titular', 'JJ'),
                    #  ('threat', 'NN'), ('of', 'IN'), ...]
    
blob.noun_phrases   # WordList(['titular threat', 'blob',
                    #            'ultimate movie monster',
                    #            'amoeba-like mass', ...])

for sentence in blob.sentences:
    print(sentence.sentiment.polarity)
    print(sentence.sentiment.subjectivity)
    

#  0.06000000000000001
# -0.34166666666666673

0.06000000000000001
0.605
-0.34166666666666673
0.7666666666666666


### Polarity and subjectivity
https://planspace.org/20150607-textblob_sentiment/

In [28]:
TextBlob("not a very great calculation").sentiment
# Sentiment(polarity=-0.3076923076923077, subjectivity=0.5769230769230769)

Sentiment(polarity=-0.3076923076923077, subjectivity=0.5769230769230769)

This tells us that the English phrase “not a very great calculation” has a polarity of about -0.3, meaning it is slightly negative, and a subjectivity of about 0.6, meaning it is fairly subjective.
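As a rough illustration of how such scores can be read, the sketch below maps a polarity in [-1.0, 1.0] and a subjectivity in [0.0, 1.0] to short labels. The `describe` helper and its thresholds are arbitrary assumptions chosen for this example, not something TextBlob provides.

```python
def describe(polarity, subjectivity):
    """Map TextBlob-style scores to rough labels (thresholds are arbitrary assumptions)."""
    if abs(polarity) < 0.1:
        tone = "neutral"
    else:
        strength = "slightly" if abs(polarity) < 0.5 else "strongly"
        direction = "positive" if polarity > 0 else "negative"
        tone = f"{strength} {direction}"

    if subjectivity < 0.3:
        stance = "mostly objective"
    elif subjectivity < 0.7:
        stance = "fairly subjective"
    else:
        stance = "very subjective"
    return tone, stance


# Scores reported above for "not a very great calculation"
print(describe(-0.3076923076923077, 0.5769230769230769))
# ('slightly negative', 'fairly subjective')
```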

In [47]:
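# import requests

# print('Beginning file download with requests')

# The github.com/.../blob/... address returns an HTML page; the raw file is served from raw.githubusercontent.com
# url = 'https://raw.githubusercontent.com/sloria/TextBlob/eb08c120d364e908646731d60b4e4c6c1712ff63/textblob/en/en-sentiment.xml'
# r = requests.get(url)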

In [32]:
TextBlob("great").sentiment
## Sentiment(polarity=0.8, subjectivity=0.75)

Sentiment(polarity=0.8, subjectivity=0.75)

In [43]:
TextBlob("not great").sentiment
## Sentiment(polarity=-0.4, subjectivity=0.75)

Sentiment(polarity=-0.4, subjectivity=0.75)

In [44]:
TextBlob("not very great").sentiment
## Sentiment(polarity=-0.3076923076923077, subjectivity=0.5769230769230769)

Sentiment(polarity=-0.3076923076923077, subjectivity=0.5769230769230769)
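The three results above can be reproduced by hand with the rules described in the article linked at the top of this section: a negation multiplies polarity by -0.5, and an intensifier such as "very" (intensity 1.3) scales both scores, with the inverse intensity used when the phrase is negated. The `score` function below is a simplified sketch of those rules under these assumptions, not TextBlob's actual implementation.

```python
def score(polarity, subjectivity, negated=False, intensity=1.0):
    """Simplified model of negation and intensifiers (not TextBlob's real code)."""
    # Under negation the intensifier is inverted: "not very great" is
    # less negative than "not great".
    factor = 1.0 / intensity if negated else intensity
    p = polarity * factor
    s = subjectivity * factor
    if negated:
        p *= -0.5
    # Keep the scores inside their valid ranges
    p = max(-1.0, min(1.0, p))
    s = max(0.0, min(1.0, s))
    return p, s


GREAT = (0.8, 0.75)   # base scores reported above for "great"
VERY = 1.3            # intensity of "very", as given in the linked article

print("great:         ", score(*GREAT))
print("not great:     ", score(*GREAT, negated=True))
print("not very great:", score(*GREAT, negated=True, intensity=VERY))
```

Up to floating-point rounding, the last two lines give about -0.4 / 0.75 and -0.308 / 0.577, in line with the TextBlob outputs above.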

### XML File
https://medium.com/@robertopreste/from-xml-to-pandas-dataframes-9292980b1c1c

In [152]:
import numpy as np
import pandas as pd
import xml.etree.ElementTree as et

xtree = et.parse("./data/en-sentiment.xml")
xroot = xtree.getroot()

print(xroot)

<Element 'sentiment' at 0x7f6e6c03ff48>
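To get a feel for the children of the root element, the sketch below parses a small inline sample instead of the real file. The `word` tag name, the `form` attribute, and the four sample entries are assumptions made for illustration; only the `sentiment` root and the `sense`, `polarity`, `subjectivity`, `intensity`, and `confidence` attributes appear in this notebook. It shows that attribute values arrive as strings and, for these sample values, how averaging the senses of one word would give the 0.8 / 0.75 reported earlier for "great".

```python
import xml.etree.ElementTree as et
from collections import defaultdict

# Hypothetical sample: tag name, `form`, and all values are illustrative assumptions.
SAMPLE_XML = """
<sentiment>
  <word form="great" sense="very good" polarity="1.0" subjectivity="1.0" intensity="1.0" confidence="0.9" />
  <word form="great" sense="of major significance or importance" polarity="1.0" subjectivity="1.0" intensity="1.0" confidence="0.9" />
  <word form="great" sense="relatively large in size or number or extent" polarity="0.4" subjectivity="0.2" intensity="1.0" confidence="0.9" />
  <word form="great" sense="remarkable or out of the ordinary in degree" polarity="0.8" subjectivity="0.8" intensity="1.0" confidence="0.9" />
</sentiment>
""".strip()

root = et.fromstring(SAMPLE_XML)

# Attribute values are plain strings, so they need converting before any arithmetic
first = root[0]
print(first.tag, type(first.attrib["polarity"]))

# Collect (polarity, subjectivity) pairs per word form
scores = defaultdict(list)
for node in root:
    scores[node.attrib["form"]].append(
        (float(node.attrib["polarity"]), float(node.attrib["subjectivity"]))
    )

# Average the scores over all senses of each word
averages = {
    form: (
        sum(p for p, _ in pairs) / len(pairs),
        sum(s for _, s in pairs) / len(pairs),
    )
    for form, pairs in scores.items()
}

for form, (p, s) in averages.items():
    print(form, round(p, 3), round(s, 3))
```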


#### Creating a dataframe of sentiments

In [153]:
df_columns = ['sense', 'polarity', 'subjectivity', 'intensity', 'confidence']
rows = []

for node in xroot: 
    s_sense = node.attrib.get("sense")
    s_polarity = node.attrib.get("polarity")
    s_subjectivity = node.attrib.get("subjectivity")
    s_intensity = node.attrib.get("intensity")
    s_confidence = node.attrib.get("confidence")
    
    rows.append({"sense": s_sense, "polarity": s_polarity, "subjectivity": s_subjectivity,
                "intensity": s_intensity, "confidence": s_confidence})

sentimentDataframe = pd.DataFrame(rows, columns = df_columns)
print("Dataframe has", sentimentDataframe.shape[0], "rows and", sentimentDataframe.shape[1], "columns")

Dataframe has 2918 rows and 5 columns


In [154]:
sentimentDataframe.head()

| | sense | polarity | subjectivity | intensity | confidence |
|---|---|---|---|---|---|
| 0 | coming next after the twelfth in position | 0.0 | 0.0 | 1.0 | 0.9 |
| 1 | coming next after the nineteenth in position | 0.0 | 0.0 | 1.0 | 0.9 |
| 2 | coming next after the twentieth in position | 0.0 | 0.0 | 1.0 | 0.9 |
| 3 | coming next after the first in position in spa... | 0.0 | 0.0 | 1.0 | 0.9 |
| 4 | coming next after the second and just before t... | 0.0 | 0.0 | 1.0 | 0.9 |


#### Inserting TextBlob results in the dataframe

In [155]:
blob_polarity_list = []
blob_subjectivity_list = []

# Score each sense description with TextBlob and collect the results
for sense in sentimentDataframe['sense']:
    blob_polarity, blob_subjectivity = TextBlob(str(sense)).sentiment
    blob_polarity_list.append(blob_polarity)
    blob_subjectivity_list.append(blob_subjectivity)

sentimentDataframe['blob_polarity'] = blob_polarity_list
sentimentDataframe['blob_subjectivity'] = blob_subjectivity_list

In [156]:
sentimentDataframe.head()  

| | sense | polarity | subjectivity | intensity | confidence | blob_polarity | blob_subjectivity |
|---|---|---|---|---|---|---|---|
| 0 | coming next after the twelfth in position | 0.0 | 0.0 | 1.0 | 0.9 | 0.0 | 0.0 |
| 1 | coming next after the nineteenth in position | 0.0 | 0.0 | 1.0 | 0.9 | 0.0 | 0.0 |
| 2 | coming next after the twentieth in position | 0.0 | 0.0 | 1.0 | 0.9 | 0.0 | 0.0 |
| 3 | coming next after the first in position in spa... | 0.0 | 0.0 | 1.0 | 0.9 | 0.125 | 0.166667 |
| 4 | coming next after the second and just before t... | 0.0 | 0.0 | 1.0 | 0.9 | 0.0 | 0.0 |


In [157]:
print("Dataframe has", sentimentDataframe.shape[0], "rows and", sentimentDataframe.shape[1], "columns")

Dataframe has 2918 rows and 7 columns


#### Converting the 'polarity' and 'subjectivity' columns from string to float

In [158]:
sentimentDataframe['polarity'] = sentimentDataframe['polarity'].astype(float)
sentimentDataframe['subjectivity'] = sentimentDataframe['subjectivity'].astype(float)

#### Comparing the XML sentiment data to the TextBlob outcome

In [161]:
compareSentiment = sentimentDataframe[
    (sentimentDataframe['polarity'] == sentimentDataframe['blob_polarity']) &
    (sentimentDataframe['subjectivity'] == sentimentDataframe['blob_subjectivity'])
]

print("Dataframe has", compareSentiment.shape[0], "rows and", compareSentiment.shape[1], "columns")

# a = sentimentDataframe['subjectivity'][0]
# b = sentimentDataframe['blob_subjectivity'][0]
# a == b

Dataframe has 276 rows and 7 columns


In [167]:
print("From", sentimentDataframe.shape[0], "rows, the TextBlob matches", compareSentiment.shape[0], "rows.")

From 2918 rows, the TextBlob matches 276 rows.
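The match count above relies on exact `==` comparison of floats, which can miss values that differ only by rounding error. The sketch below shows the difference on a few made-up score pairs using `math.isclose`; the sample values and the `1e-9` tolerance are assumptions for illustration. This is a possible variation, not what the cell above does, and `numpy.isclose` offers the same idea for whole columns.

```python
import math

# Hypothetical score pairs; the last one differs only by floating-point rounding
xml_scores = [0.0, 0.125, 0.1 + 0.2]
blob_scores = [0.0, 0.125, 0.3]

# Exact comparison, as used in the cell above
exact = [a == b for a, b in zip(xml_scores, blob_scores)]

# Tolerance-based comparison
close = [math.isclose(a, b, abs_tol=1e-9) for a, b in zip(xml_scores, blob_scores)]

print("exact matches:", sum(exact))   # 2
print("close matches:", sum(close))   # 3
```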


### Translating with TextBlob

#### Translating to Spanish

In [12]:
blob.translate(to="es")  # 'La amenaza titular de The Blob...'

TextBlob("La amenaza titular de The Blob siempre me ha parecido la mejor película.
monstruo: una masa insaciablemente hambrienta, similar a una ameba, capaz de penetrar
prácticamente cualquier salvaguarda, capaz de, como un médico condenado, escalofriante
lo describe - "asimilando carne al contacto.
Malditas comparaciones con la gelatina, maldita sea, es un concepto con la mayoría
devastador de posibles consecuencias, no muy diferente del escenario de la sustancia gris
propuesto por teóricos tecnológicos temerosos de
la inteligencia artificial corre desenfrenada.")

#### Translating to Brazilian Portuguese

In [16]:
blob.translate(to="pt-br")

TextBlob("A ameaça titular de The Blob sempre me pareceu o melhor filme
monstro: uma massa insaciável de fome, semelhante a ameba, capaz de penetrar
praticamente qualquer salvaguarda, capaz de - como um médico condenado, arrepiante
descreve - "assimilando carne em contato.
Comparações sarcásticas com gelatina, é um conceito com as mais
devastador de possíveis consequências, não muito diferente do cenário de gosma cinzenta
proposto por teóricos tecnológicos temerosos de
inteligência artificial corre solta.")

#### Translating to German

In [17]:
blob.translate(to="de")

TextBlob("Die Titelbedrohung von The Blob hat mich immer als ultimativen Film empfunden
Monster: eine unersättlich hungrige, amöbenähnliche Masse, die eindringen kann
praktisch jede Schutzmaßnahme, die dazu in der Lage ist - als zum Scheitern verurteilter Arzt kühlend
beschreibt es - "Fleisch bei Kontakt assimilieren".
Snide Vergleiche mit Gelatine werden verdammt, es ist ein Konzept mit den meisten
Verheerende potenzielle Konsequenzen, ähnlich wie im Szenario der grauen Gänsehaut
vorgeschlagen von Technologietheoretikern ängstlich
künstliche Intelligenz grassiert.")

## ToDos
https://github.com/shangeth/NLTK-Twitter-Sentiment-Analysis/blob/master/server.py

https://stackabuse.com/python-for-nlp-movie-sentiment-analysis-using-deep-learning-in-keras/
    
https://www.kaggle.com/ngyptr/lstm-sentiment-analysis-keras
    
https://towardsdatascience.com/machine-learning-word-embedding-sentiment-classification-using-keras-b83c28087456