# SentiWordNet

SentiWordNet is a lexical resource for opinion mining. It is built on top of WordNet and assigns three sentiment scores to each synset: positivity, negativity, and objectivity. Read [more about SentiWordNet here](http://sentiwordnet.isti.cnr.it/).

Every WordNet synset receives all three scores. Each score is a value between 0.0 and 1.0, and the three scores for a synset always sum to 1.0.
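Because the three scores sum to 1.0, the objective score is fully determined by the other two. The following is a minimal sketch of that constraint in plain Python; the `SentiScores` class is a hypothetical container introduced only for illustration and is not part of NLTK or SentiWordNet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentiScores:
    """Hypothetical container for one synset's positive and negative scores."""

    pos: float
    neg: float

    def __post_init__(self):
        # Each score lies in [0.0, 1.0], and pos + neg cannot exceed 1.0,
        # otherwise the objective score would be negative.
        if not (0.0 <= self.pos <= 1.0 and 0.0 <= self.neg <= 1.0):
            raise ValueError("scores must lie between 0.0 and 1.0")
        if self.pos + self.neg > 1.0:
            raise ValueError("pos + neg must not exceed 1.0")

    @property
    def obj(self) -> float:
        # The three scores sum to 1.0, so objectivity is the remainder.
        return 1.0 - (self.pos + self.neg)


scores = SentiScores(pos=0.0, neg=0.25)
print(scores.obj)  # 0.75
```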

SentiWordNet is integrated into NLTK and can be imported as follows, assuming that the corpus is installed on your system.

In [1]:
from nltk.corpus import sentiwordnet as swn
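If the corpus is not installed yet, it can be fetched with the NLTK downloader. The sketch below assumes network access and assumes that the package identifiers are `wordnet` and `sentiwordnet`; the helper name `ensure_corpora` is made up for this example.

```python
def ensure_corpora(names=("wordnet", "sentiwordnet")):
    """Ask the NLTK downloader for each named corpus; return True if all succeed."""
    import nltk  # imported lazily so that defining this helper does not need NLTK

    # nltk.download() skips packages that are already installed and up to date.
    results = [nltk.download(name, quiet=True) for name in names]
    return all(results)
```

Calling `ensure_corpora()` once before the first lookup should be enough; WordNet is included because SentiWordNet is built on top of it.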

The following examples come from the [NLTK howto](http://www.nltk.org/howto/).

### getting scores for a synset

The first example shows how to get positive, negative and objectivity scores for a given synset.

In [3]:
breakdown = swn.senti_synset('breakdown.n.03')
print(breakdown)
print("Positive score = ", breakdown.pos_score())
print("Negative score = ", breakdown.neg_score())
print("Objective score = ", breakdown.obj_score())

<breakdown.n.03: PosScore=0.0 NegScore=0.25>
Positive score =  0.0
Negative score =  0.25
Objective score =  0.75


### lookup sentisynsets

The next example shows how to look up sentisynsets for a word. Note that this returns senses for all parts of speech (POS).

In [4]:
senti_list = list(swn.senti_synsets('slow')) 
for item in senti_list:
    print(item)

<decelerate.v.01: PosScore=0.0 NegScore=0.0>
<slow.v.02: PosScore=0.0 NegScore=0.0>
<slow.v.03: PosScore=0.0 NegScore=0.125>
<slow.a.01: PosScore=0.0 NegScore=0.0>
<slow.a.02: PosScore=0.0 NegScore=0.0>
<dense.s.04: PosScore=0.0 NegScore=0.25>
<slow.a.04: PosScore=0.0 NegScore=0.0>
<boring.s.01: PosScore=0.0 NegScore=0.25>
<dull.s.08: PosScore=0.0 NegScore=0.5>
<slowly.r.01: PosScore=0.0 NegScore=0.0>
<behind.r.03: PosScore=0.0 NegScore=0.0>


Next we restrict the returned synsets to adjectives by passing a POS code as the second argument. The POS codes are:

n - NOUN 
v - VERB 
a - ADJECTIVE 
s - ADJECTIVE SATELLITE 
r - ADVERB 

In [5]:
slow_a = swn.senti_synsets('slow','a')
for item in slow_a:
    print(item)

<slow.a.01: PosScore=0.0 NegScore=0.0>
<slow.a.02: PosScore=0.0 NegScore=0.0>
<dense.s.04: PosScore=0.0 NegScore=0.25>
<slow.a.04: PosScore=0.0 NegScore=0.0>
<boring.s.01: PosScore=0.0 NegScore=0.25>
<dull.s.08: PosScore=0.0 NegScore=0.5>
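In practice the POS code often has to be derived from a tagger's output. The sketch below assumes the tokens carry Penn Treebank tags (for example from a POS tagger, which is not part of this notebook) and maps them to the WordNet POS codes listed above. The function name `penn_to_wordnet_pos` is hypothetical.

```python
def penn_to_wordnet_pos(tag):
    """Map a Penn Treebank tag to a WordNet POS code, or None if unmapped."""
    if tag.startswith("JJ"):
        return "a"  # adjectives: JJ, JJR, JJS
    if tag.startswith("VB"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("NN"):
        return "n"  # nouns: NN, NNS, NNP, NNPS
    if tag.startswith("RB"):
        return "r"  # adverbs: RB, RBR, RBS
    return None  # determiners, prepositions, etc. have no WordNet POS


print(penn_to_wordnet_pos("JJ"))   # a
print(penn_to_wordnet_pos("VBD"))  # v
```

No tag is mapped to `s`: as the output above shows, a lookup with `'a'` already returns adjective satellite synsets as well.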


In [6]:
wrinkle = swn.senti_synsets('wrinkle', 'v')
for item in wrinkle:
    print(item)


<purse.v.02: PosScore=0.0 NegScore=0.0>
<wrinkle.v.02: PosScore=0.0 NegScore=0.0>
<furrow.v.02: PosScore=0.0 NegScore=0.0>
<rumple.v.03: PosScore=0.0 NegScore=0.125>


In [7]:
late = swn.senti_synsets('late')
for item in late:
    print(item)

<late.a.01: PosScore=0.0 NegScore=0.0>
<belated.s.01: PosScore=0.125 NegScore=0.25>
<late.s.03: PosScore=0.0 NegScore=0.0>
<late.s.04: PosScore=0.125 NegScore=0.0>
<late.a.05: PosScore=0.0 NegScore=0.0>
<late.a.06: PosScore=0.0 NegScore=0.0>
<former.s.03: PosScore=0.0 NegScore=0.25>
<late.r.01: PosScore=0.125 NegScore=0.25>
<deep.r.02: PosScore=0.0 NegScore=0.0>
<late.r.03: PosScore=0.0 NegScore=0.0>
<recently.r.01: PosScore=0.0 NegScore=0.0>


In [8]:
from nltk.corpus import wordnet as wn
for ss in wn.synsets('late'):
    print(ss, ss.definition())

Synset('late.a.01') being or occurring at an advanced period of time or after a usual or expected time
Synset('belated.s.01') after the expected or usual time; delayed
Synset('late.s.03') of the immediate past or just previous to the present time
Synset('late.s.04') having died recently
Synset('late.a.05') of a later stage in the development of a language or literature; used especially of dead languages
Synset('late.a.06') at or toward an end or late period or stage of development
Synset('former.s.03') (used especially of persons) of the immediate past
Synset('late.r.01') later than usual or than expected
Synset('deep.r.02') to an advanced time
Synset('late.r.03') at an advanced age or stage
Synset('recently.r.01') in the recent past


In many sentiment analysis systems, SentiWordNet scores are used as features for a machine learning model, but the scores can also be used directly, without machine learning.
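As a rough illustration of the feature-based use, the sketch below turns a list of `(pos, neg)` score pairs, one per token, into a fixed-length feature list that a classifier could consume. The choice of features (sums, maxima, and count) is an assumption made for this example, not a recommendation from SentiWordNet or NLTK.

```python
def sentiment_features(token_scores):
    """Turn per-token (pos, neg) pairs into a fixed-length feature list."""
    pos = [p for p, _ in token_scores]
    neg = [n for _, n in token_scores]
    return [
        sum(pos, 0.0),            # total positivity
        sum(neg, 0.0),            # total negativity
        max(pos, default=0.0),    # strongest positive token
        max(neg, default=0.0),    # strongest negative token
        float(len(token_scores)), # number of scored tokens
    ]


# (pos, neg) pairs copied from outputs above:
# breakdown.n.03, belated.s.01 and dull.s.08.
features = sentiment_features([(0.0, 0.25), (0.125, 0.25), (0.0, 0.5)])
print(features)  # [0.125, 1.0, 0.125, 0.5, 3.0]
```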

### extracting polarity from tokens

The following example is a naive approach that uses only the first synset returned for a word and ignores its other senses and its part of speech.

In [9]:
p = list(swn.senti_synsets('pain'))[0]
print("negative: ", p.neg_score())
print("positive: ", p.pos_score())
print("objective: ", p.obj_score())

negative:  0.75
positive:  0.0
objective:  0.25


In [10]:
sent = 'that was the worst movie ever'
neg = 0
pos = 0
tokens = sent.split()
for token in tokens:
    syn_list = list(swn.senti_synsets(token))
    if syn_list:
        syn = syn_list[0]
        neg += syn.neg_score()
        pos += syn.pos_score()
    
print("neg\tpos counts")
print(neg, '\t', pos)

neg	pos counts
1.0 	 0.0
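One possible refinement of the first-synset approach is to average the scores over all senses of a token. The sketch below is written against a `lookup` callable that returns `(pos, neg)` pairs for a token, so it runs without NLTK. The function names and the toy lexicon are assumptions for this example; the lexicon values are copied from the adjective output for 'slow' above.

```python
def average_scores(senses):
    """Average (pos, neg) pairs over all senses; (0.0, 0.0) if there are none."""
    if not senses:
        return 0.0, 0.0
    count = len(senses)
    pos = sum(p for p, _ in senses) / count
    neg = sum(n for _, n in senses) / count
    return pos, neg


def score_sentence(sentence, lookup):
    """Sum the sense-averaged (pos, neg) scores of each whitespace token."""
    pos_total = neg_total = 0.0
    for token in sentence.split():
        pos, neg = average_scores(lookup(token))
        pos_total += pos
        neg_total += neg
    return pos_total, neg_total


# Toy lexicon with the six adjective senses of 'slow' shown earlier.
TOY_LEXICON = {
    "slow": [(0.0, 0.0), (0.0, 0.0), (0.0, 0.25),
             (0.0, 0.0), (0.0, 0.25), (0.0, 0.5)],
}

pos, neg = score_sentence("a slow movie", lambda t: TOY_LEXICON.get(t, []))
print("neg:", neg, "pos:", pos)
```

With NLTK available, the lookup could instead be `lambda t: [(s.pos_score(), s.neg_score()) for s in swn.senti_synsets(t)]`. Averaging dilutes strongly polar senses, so whether it beats the first-synset heuristic depends on the data.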
