In [5]:
!pip install spacy

Defaulting to user installation because normal site-packages is not writeable


In [2]:
!python -m spacy download en_core_web_sm

Defaulting to user installation because normal site-packages is not writeable
Collecting en-core-web-sm==3.5.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl (12.8 MB)
     ---------------------------------------- 12.8/12.8 MB 5.2 MB/s eta 0:00:00
[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [70]:
import spacy
nlp = spacy.load("en_core_web_sm")

In [71]:
sentences = [
  'The food we had yesterday was delicious',
  'My time in Italy was very enjoyable',
  'I found the meal to be tasty',
  'The internet was slow.',
  'Our experience was suboptimal'
]

### We split each sentence to obtain the aspect (e.g. food) and its expression (e.g. delicious)

For each token in our sentences, spaCy's dependency parse and POS (part-of-speech) tags show how the token relates to the rest of the sentence:
https://spacy.io/usage/linguistic-features

In [72]:
for sentence in sentences:
    doc = nlp(sentence)
    for token in doc:
        # Columns: token, dependency label, head, head POS, token POS, children
        print(token.text, token.dep_, token.head.text, token.head.pos_,
              token.pos_, list(token.children))

The det food NOUN DET []
food nsubj was AUX NOUN [The, had]
we nsubj had VERB PRON []
had relcl food NOUN VERB [we, yesterday]
yesterday npadvmod had VERB NOUN []
was ROOT was AUX AUX [food, delicious]
delicious acomp was AUX ADJ []
My poss time NOUN PRON []
time nsubj was AUX NOUN [My, in]
in prep time NOUN ADP [Italy]
Italy pobj in ADP PROPN []
was ROOT was AUX AUX [time, enjoyable]
very advmod enjoyable ADJ ADV []
enjoyable acomp was AUX ADJ [very]
I nsubj found VERB PRON []
found ROOT found VERB VERB [I, be]
the det meal NOUN DET []
meal nsubj be AUX NOUN [the]
to aux be AUX PART []
be ccomp found VERB AUX [meal, to, tasty]
tasty acomp be AUX ADJ []
The det internet NOUN DET []
internet nsubj was AUX NOUN [The]
was ROOT was AUX AUX [internet, slow, .]
slow acomp was AUX ADJ []
. punct was AUX PUNCT []
Our poss experience NOUN PRON []
experience nsubj was AUX NOUN [Our]
was ROOT was AUX AUX [experience, suboptimal]
suboptimal acomp was AUX ADJ []
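
To make the columns easier to read, here is a minimal sketch that copies the rows printed for "My time in Italy was very enjoyable" into named tuples. The `Row` type and its field names are only illustrative; they are not part of spaCy. It shows that the subject and the adjective attach to the same head ("was"), and that the intensifier attaches to the adjective.

```python
from collections import namedtuple

# Field order matches the print() call above (illustrative names).
Row = namedtuple("Row", "text dep head head_pos pos children")

# Rows copied from the printed output for the second sentence.
rows = [
    Row("My", "poss", "time", "NOUN", "PRON", []),
    Row("time", "nsubj", "was", "AUX", "NOUN", ["My", "in"]),
    Row("in", "prep", "time", "NOUN", "ADP", ["Italy"]),
    Row("Italy", "pobj", "in", "ADP", "PROPN", []),
    Row("was", "ROOT", "was", "AUX", "AUX", ["time", "enjoyable"]),
    Row("very", "advmod", "enjoyable", "ADJ", "ADV", []),
    Row("enjoyable", "acomp", "was", "AUX", "ADJ", ["very"]),
]

# The nominal subject and the adjectival complement share one head.
subject = next(r for r in rows if r.dep == "nsubj")
complement = next(r for r in rows
                  if r.dep == "acomp" and r.head == subject.head)

# Adverbial modifiers are the rows whose head is the adjective itself.
modifiers = [r.text for r in rows
             if r.head == complement.text and r.dep == "advmod"]

print(subject.text, complement.text, modifiers)
```

The same three lookups (nsubj, acomp, advmod) are what the following cells perform directly on spaCy tokens.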


Below is an example of dependency visualization in a sentence:

https://spacy.io/usage/visualizers

In [9]:
from spacy import displacy
doc = nlp("The food we had yesterday was delicious")
# style="dep" draws the dependency tree; style="ent" would highlight named entities instead
displacy.serve(doc, style="dep")


Using the 'dep' visualizer
Serving on http://0.0.0.0:5000 ...

Shutting down server on port 5000.


Using these linguistic features, in particular the POS tags, we extract the adjectives as the expression of sentiment.

In [73]:
for sentence in sentences:
    doc = nlp(sentence)
    descriptive_term = ''
    for token in doc:
        # Keep the adjective; if a sentence has several, the last one wins.
        if token.pos_ == 'ADJ':
            descriptive_term = token
    print(descriptive_term)

delicious
enjoyable
tasty
slow
suboptimal


As you can see, intensifiers like "very" are missing. They are adverbs attached to the adjective as children in the dependency tree, so we extract them using the children property and skip every child that is not an adverb.

In [74]:
for sentence in sentences:
    doc = nlp(sentence)
    descriptive_term = ''
    for token in doc:
        if token.pos_ == 'ADJ':
            prepend = ''
            for child in token.children:
                if child.pos_ != 'ADV':
                    continue
                prepend += child.text + ' '
            descriptive_term = prepend + token.text
    print(sentence)
    print(descriptive_term)

The food we had yesterday was delicious
delicious
My time in Italy was very enjoyable
very enjoyable
I found the meal to be tasty
tasty
The internet was slow.
slow
Our experience was suboptimal
suboptimal


We now collect the aspect and its description for each sentence in a list of dictionaries.

In [75]:
aspects = []
for sentence in sentences:
    doc = nlp(sentence)
    descriptive_term = ''
    target = ''
    for token in doc:
        if token.dep_ == 'nsubj' and token.pos_ == 'NOUN':
            target = token.text
        if token.pos_ == 'ADJ':
            prepend = ''
            for child in token.children:
                if child.pos_ != 'ADV':
                    continue
                prepend += child.text + ' '
            descriptive_term = prepend + token.text

    aspects.append({'aspect': target, 'description': descriptive_term})

print(aspects)

[{'aspect': 'food', 'description': 'delicious'}, {'aspect': 'time', 'description': 'very enjoyable'}, {'aspect': 'meal', 'description': 'tasty'}, {'aspect': 'internet', 'description': 'slow'}, {'aspect': 'experience', 'description': 'suboptimal'}]
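
The loop above can also be wrapped in a function. The sketch below is one possible refactoring, not the notebook's original code: `extract_aspect` and the `tok` helper are hypothetical names, and the stand-in tokens built with `SimpleNamespace` only imitate the four spaCy token attributes used here, so the example runs without a model.

```python
from types import SimpleNamespace


def extract_aspect(tokens):
    """Return the aspect and its description for one parsed sentence.

    `tokens` is any iterable of objects exposing .text, .pos_, .dep_ and
    .children, i.e. the subset of the spaCy token interface used above.
    """
    target, description = "", ""
    for token in tokens:
        if token.dep_ == "nsubj" and token.pos_ == "NOUN":
            target = token.text
        if token.pos_ == "ADJ":
            # Prepend adverb children (intensifiers such as "very").
            adverbs = [c.text for c in token.children if c.pos_ == "ADV"]
            description = " ".join(adverbs + [token.text])
    return {"aspect": target, "description": description}


def tok(text, pos, dep, children=()):
    """Build a stand-in token (hypothetical helper, for illustration only)."""
    return SimpleNamespace(text=text, pos_=pos, dep_=dep,
                           children=list(children))


very = tok("very", "ADV", "advmod")
sentence = [
    tok("time", "NOUN", "nsubj"),
    tok("was", "AUX", "ROOT"),
    very,
    tok("enjoyable", "ADJ", "acomp", [very]),
]
result = extract_aspect(sentence)
print(result)
```

With spaCy loaded, the same function should accept a parsed document directly, for example `[extract_aspect(nlp(s)) for s in sentences]`, since iterating over a document yields its tokens.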


### Using TextBlob for sentiment extraction

In [19]:
!pip install TextBlob

Defaulting to user installation because normal site-packages is not writeable
Collecting TextBlob
  Downloading textblob-0.17.1-py2.py3-none-any.whl (636 kB)
     -------------------------------------- 636.8/636.8 kB 2.5 MB/s eta 0:00:00
Installing collected packages: TextBlob
Successfully installed TextBlob-0.17.1


TextBlob is a library that offers out-of-the-box sentiment analysis. It takes a bag-of-words approach: it ships with a list of words such as “good”, “bad” and “excellent”, each with a sentiment score attached. It also detects modifiers (such as “not”) and intensifiers (such as “very”) that change the sentiment score.

In [76]:
from textblob import TextBlob
for aspect in aspects:
    aspect['sentiment'] = TextBlob(aspect['description']).sentiment
print(aspects)

[{'aspect': 'food', 'description': 'delicious', 'sentiment': Sentiment(polarity=1.0, subjectivity=1.0)}, {'aspect': 'time', 'description': 'very enjoyable', 'sentiment': Sentiment(polarity=0.65, subjectivity=0.78)}, {'aspect': 'meal', 'description': 'tasty', 'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)}, {'aspect': 'internet', 'description': 'slow', 'sentiment': Sentiment(polarity=-0.30000000000000004, subjectivity=0.39999999999999997)}, {'aspect': 'experience', 'description': 'suboptimal', 'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)}]


Looking at the results, we notice that the adjectives "tasty" and "suboptimal" are scored as neutral (polarity 0.0). They do not seem to be part of TextBlob's lexicon and are therefore not picked up.
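
To illustrate why an unknown word ends up with a polarity of 0.0, here is a toy lexicon scorer. It is a sketch of the general idea only, not TextBlob's implementation: the lexicon values, the intensifier weight and the sign flip for "not" are assumptions chosen so that the toy reproduces the polarities printed above.

```python
# Toy values (assumptions), chosen to mirror the polarities printed above.
LEXICON = {"delicious": 1.0, "enjoyable": 0.5, "slow": -0.3}
INTENSIFIERS = {"very": 1.3}
NEGATIONS = {"not"}


def toy_polarity(text):
    """Average the scores of known words; unknown words contribute nothing."""
    scores = []
    scale, negate = 1.0, False
    for word in text.lower().split():
        if word in INTENSIFIERS:
            scale *= INTENSIFIERS[word]      # strengthens the next scored word
        elif word in NEGATIONS:
            negate = True                    # flips the next scored word
        elif word in LEXICON:
            score = max(-1.0, min(1.0, LEXICON[word] * scale))
            scores.append(-score if negate else score)
            scale, negate = 1.0, False
    # No known word at all: the text is treated as neutral.
    return sum(scores) / len(scores) if scores else 0.0


for phrase in ["delicious", "very enjoyable", "tasty", "slow", "suboptimal"]:
    print(phrase, toy_polarity(phrase))
```

In this toy, "tasty" and "suboptimal" score 0.0 simply because they are missing from the lexicon, which is the behaviour the results above suggest.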

TextBlob also lets us train a NaiveBayesClassifier with a simple syntax, which we will use to improve our sentiment analysis.

In other words, we will perform a corpus-based sentiment lexicon acquisition using TextBlob.

In [22]:
!python -m textblob.download_corpora

Finished.


[nltk_data] Downloading package brown to
[nltk_data]     C:\Users\PC\AppData\Roaming\nltk_data...
[nltk_data]   Package brown is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\PC\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\PC\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\PC\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package conll2000 to
[nltk_data]     C:\Users\PC\AppData\Roaming\nltk_data...
[nltk_data]   Package conll2000 is already up-to-date!
[nltk_data] Downloading package movie_reviews to
[nltk_data]     C:\Users\PC\AppData\Roaming\nltk_data...
[nltk_data]   Package movie_reviews is already up-to-date!


In [77]:
from textblob.classifiers import NaiveBayesClassifier

# Train the NaiveBayesClassifier on a few labeled phrases.
train = [
  ('Slow internet', 'negative'),
  ('Delicious food', 'positive'),
  ('Suboptimal experience', 'negative'),
  ('enjoyable time', 'positive'),
  # NOTE: this label contradicts 'Delicious food' above and is probably a
  # mistake; it is kept because the recorded outputs were produced with it.
  ('delicious food.', 'negative')
]
cl = NaiveBayesClassifier(train)

# Then classify some sample sentences.
blob = TextBlob("Delicious food. Very Slow internet. Suboptimal experience", classifier=cl)
for s in blob.sentences:
    print(s)
    print(s.classify())

Delicious food.
positive
Very Slow internet.
negative
Suboptimal experience
negative


We will now redo our classification using the trained model.

In [78]:
from textblob import TextBlob
for aspect in aspects:
    blob = TextBlob(aspect['description'], classifier=cl)  
    aspect['sentiment'] = blob.classify()
print(aspects)

[{'aspect': 'food', 'description': 'delicious', 'sentiment': 'negative'}, {'aspect': 'time', 'description': 'very enjoyable', 'sentiment': 'positive'}, {'aspect': 'meal', 'description': 'tasty', 'sentiment': 'negative'}, {'aspect': 'internet', 'description': 'slow', 'sentiment': 'negative'}, {'aspect': 'experience', 'description': 'suboptimal', 'sentiment': 'negative'}]
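
Note that "delicious" is now labeled negative. A likely explanation is the training data: assuming the classifier's default features are case-sensitive word-presence flags (my understanding of TextBlob's default extractor, not something this notebook confirms), lowercase "delicious" was only ever seen with a negative label. The sketch below checks this with plain Python on the same training pairs; it does not use TextBlob.

```python
import string
from collections import defaultdict

# Same training pairs as in the cell above.
train = [
    ("Slow internet", "negative"),
    ("Delicious food", "positive"),
    ("Suboptimal experience", "negative"),
    ("enjoyable time", "positive"),
    ("delicious food.", "negative"),
]

# Record which labels each case-sensitive word was seen with.
labels_by_word = defaultdict(set)
for text, label in train:
    for word in text.split():
        labels_by_word[word.strip(string.punctuation)].add(label)

for word in ("Delicious", "delicious", "food", "tasty"):
    print(word, sorted(labels_by_word.get(word, [])))
```

"Delicious" and "delicious" carry opposite labels, and "tasty" was never seen at all, so its prediction probably falls back on the overall label balance (three negative examples against two positive). Relabeling the last training pair, or lowercasing the text before training, would be the natural things to try.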


# To do:

1. Try the classifier on other sentences

In [149]:
!pip install vaderSentiment

Defaulting to user installation because normal site-packages is not writeable


##### NaiveBayesClassifier

In [85]:
from textblob import TextBlob
from textblob.classifiers import NaiveBayesClassifier

train = [
    ('I love the weather.', 'positive'),
    ('it was an amazing show', 'positive'),
    ('I feel very good today.', 'positive'),
    ('I do not agree', 'negative'),
    ('I am tired of this stuff.', 'negative'),
    ("I can't deal with this", 'negative'),
    ("My boss is horrible.", "negative")
]

cl = NaiveBayesClassifier(train)
sentiment = cl.classify("I feel amazing!")
print(f"'I feel amazing!': {sentiment}")

blob = TextBlob("I hate Mondays. But I love weekends.", classifier=cl)
for sentence in blob.sentences:
    print(f" {sentence}")
    print(f"Sentiment: {sentence.classify()}")
    print()

'I feel amazing!': positive
 I hate Mondays.
Sentiment: negative

 But I love weekends.
Sentiment: positive



###### DecisionTreeClassifier

In [134]:
from textblob import TextBlob
from textblob.classifiers import DecisionTreeClassifier

# Training set
train = [
    ('I love this movie.', 'positive'),
    ('The food was delicious.', 'positive'),
    ('The weather is beautiful today.', 'positive'),
    ('I hate Mondays.', 'negative'),
    ('This book is boring.', 'negative'),
    ('I feel sick.', 'negative')
]

# Train the classifier
cl = DecisionTreeClassifier(train)
blob = TextBlob("The weekend was good . But the monday is horrible.", classifier=cl)
for s in blob.sentences:
    print(s)
    print(s.classify())


The weekend was good .
positive
But the monday is horrible.
negative


In [137]:
mytest = [
    ('The movie was amazing!', 'positive'),
    ('I cannot stand the rain.', 'negative'),
    ('The meal was disappointing.', 'negative'),
    ('I enjoy outdoor activities.', 'positive')
]
# Evaluate on the held-out examples defined above (the list is named mytest, not test)
cl.accuracy(mytest)

0.5
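
To put this score in context, the sketch below computes accuracy by hand as the fraction of correctly predicted labels. The `accuracy` function and the `always_negative` stand-in are hypothetical helpers written for this illustration; they are not TextBlob calls.

```python
def accuracy(predict, labeled):
    """Fraction of (text, label) pairs for which predict(text) == label."""
    correct = sum(1 for text, label in labeled if predict(text) == label)
    return correct / len(labeled)


mytest = [
    ('The movie was amazing!', 'positive'),
    ('I cannot stand the rain.', 'negative'),
    ('The meal was disappointing.', 'negative'),
    ('I enjoy outdoor activities.', 'positive'),
]


def always_negative(text):
    """Trivial baseline (assumption for illustration): ignore the input."""
    return 'negative'


baseline = accuracy(always_negative, mytest)
print(baseline)
```

Because this test set has two positive and two negative examples, a classifier that always gives the same answer already reaches 0.5. With only four examples the score also moves in steps of 0.25, so a larger test set is needed before drawing conclusions.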

In [140]:
pip install scikit-learn numpy


Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.



#### MaxEntClassifier

Training prints a log that shows the classifier's progress.

At each iteration, the "Log Likelihood" increases toward 0 (its magnitude shrinks), which indicates that the model is getting better at predicting the labels. The "Accuracy" is 1.000 from the second iteration onward, which means that the model correctly predicts every label in the training set.
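
As a rough check on how to read the "Log Likelihood" column, the sketch below assumes it is the average log probability that the model assigns to the correct label of each training example. That reading is an assumption, and `mean_log_likelihood` is a hypothetical helper, not a TextBlob or NLTK function.

```python
import math


def mean_log_likelihood(probs_of_true_label):
    """Average log probability assigned to the correct labels."""
    return (sum(math.log(p) for p in probs_of_true_label)
            / len(probs_of_true_label))


# Before any update, a two-label model gives each label probability 0.5.
start = mean_log_likelihood([0.5] * 6)
print(round(start, 5))

# As the model grows more confident in the correct labels, the value
# rises toward 0. The 0.85 here is an arbitrary illustrative probability.
later = mean_log_likelihood([0.85] * 6)
print(later > start)
```

Under this assumption, the starting value equals log(0.5), which matches the first row of the training log, and every later row is closer to 0.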

In [148]:
from textblob import TextBlob
from textblob.classifiers import MaxEntClassifier

train = [
    ('I love this movie.', 'positive'),
    ('The food was delicious.', 'positive'),
    ('The weather is beautiful today.', 'positive'),
    ('I hate Mondays.', 'negative'),
    ('This book is boring.', 'negative'),
    ('I feel sick.', 'negative')
]

cl = MaxEntClassifier(train)

blob = TextBlob("The weekend was good. But the Monday is horrible.", classifier=cl)
for s in blob.sentences:
    print(s)
    print(s.classify())


The weekend was good.
  ==> Training (100 iterations)

      Iteration    Log Likelihood    Accuracy
      ---------------------------------------
             1          -0.69315        0.500
             2          -0.59581        1.000
             3          -0.52044        1.000
             4          -0.46061        1.000
             5          -0.41223        1.000
             6          -0.37244        1.000
             7          -0.33923        1.000
             8          -0.31113        1.000
             9          -0.28710        1.000
            10          -0.26632        1.000
            11          -0.24820        1.000
            12          -0.23228        1.000
            13          -0.21818        1.000
            14          -0.20561        1.000
            15          -0.19436        1.000
            16          -0.18422        1.000
            17          -0.17504        1.000
            18          -0.16670        1.000
            19          -