# Optional Exercise - Session 5 B

### Students: Nafis Banirazi & Jan Carbonell

### Lab Objective: 
Statement (unsupervised polarity system):
- Get the first (most frequent) synset of each word, using one of the following alternatives:
    - nouns, verbs, adjectives and adverbs
    - nouns, adjectives and adverbs
    - only adjectives
- Sum all the positive and negative scores to get the polarity
- Apply the system to the movie reviews corpus and give the accuracy
- Give some conclusions about the work
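
The three part-of-speech alternatives in the statement can be expressed as a filter over the tagger output. The following is a minimal sketch, assuming Penn Treebank tags (as produced by `pos_tag`) and WordNet's single-letter POS codes (`'n'`, `'v'`, `'a'`, `'r'`, the values of `wn.NOUN`, `wn.VERB`, `wn.ADJ` and `wn.ADV`). The names `PENN_TO_WN`, `ALTERNATIVES` and `filter_pairs` are hypothetical and do not appear in the notebook.

```python
# WordNet POS codes: the values of wn.NOUN, wn.VERB, wn.ADJ and wn.ADV.
NOUN, VERB, ADJ, ADV = 'n', 'v', 'a', 'r'

# First letter of a Penn Treebank tag -> WordNet POS code.
PENN_TO_WN = {'N': NOUN, 'V': VERB, 'J': ADJ, 'R': ADV}

# Hypothetical labels for the three alternatives listed in the statement.
ALTERNATIVES = {
    'all': {NOUN, VERB, ADJ, ADV},
    'no_verbs': {NOUN, ADJ, ADV},
    'adjectives': {ADJ},
}

def filter_pairs(tagged_words, allowed):
    """Keep (word, wordnet_pos) pairs whose category is in `allowed`."""
    pairs = []
    for word, tag in tagged_words:
        wn_pos = PENN_TO_WN.get(tag[:1])  # None for tags outside the four categories
        if wn_pos in allowed:
            pairs.append((word, wn_pos))
    return pairs

# Hand-written tagged sample, for illustration only.
tagged = [('a', 'DT'), ('truly', 'RB'), ('great', 'JJ'), ('film', 'NN'), ('bores', 'VBZ')]
print(filter_pairs(tagged, ALTERNATIVES['adjectives']))  # [('great', 'a')]
```

Passing a different entry of `ALTERNATIVES` selects another of the three configurations without changing the scoring code.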

In [1]:
from nltk.corpus import movie_reviews as mr
from nltk.corpus import wordnet as wn
from nltk import pos_tag
from nltk.corpus import sentiwordnet as swn

In [2]:
def get_valid_pairs(original_pairs):
    """Map Penn Treebank tags to WordNet POS constants, dropping all other tags."""
    valid_pairs = []
    for word, tag in original_pairs:
        if tag.startswith('N'):      # nouns
            valid_pairs.append((word, wn.NOUN))
        elif tag.startswith('V'):    # verbs
            valid_pairs.append((word, wn.VERB))
        elif tag.startswith('J'):    # adjectives
            valid_pairs.append((word, wn.ADJ))
        elif tag.startswith('R'):    # adverbs
            valid_pairs.append((word, wn.ADV))
    return valid_pairs

def unsupervised_polarity_system(words):
    """Return the summed (positive - negative) SentiWordNet score of a token list."""
    pos = pos_tag(words)
    # set() keeps each distinct (word, POS) pair once, so repeated words count only once
    valid_pairs = set(get_valid_pairs(pos))

    polarity = 0
    for word, tag in valid_pairs:
        synsets = wn.synsets(word, tag)
        if synsets:
            # The first synset is the most frequent sense of the word
            senti_synset = swn.senti_synset(synsets[0].name())
            polarity += senti_synset.pos_score() - senti_synset.neg_score()
    return polarity
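
Because the tagged pairs are passed through `set()`, each distinct (word, POS) pair contributes to the polarity once, however often it occurs in the review. The sketch below contrasts this with counting every occurrence. It uses a toy lexicon whose scores are invented for illustration and are not real SentiWordNet values; `TOY_SCORES` and `toy_polarity` are hypothetical names.

```python
# Toy (pos_score - neg_score) values, invented for illustration only;
# these are NOT real SentiWordNet scores.
TOY_SCORES = {('good', 'a'): 0.5, ('bad', 'a'): -0.625, ('film', 'n'): 0.0}

def toy_polarity(pairs, deduplicate):
    """Sum toy scores over pairs, optionally counting each distinct pair once."""
    items = set(pairs) if deduplicate else pairs
    return sum(TOY_SCORES.get(pair, 0.0) for pair in items)

pairs = [('bad', 'a'), ('bad', 'a'), ('bad', 'a'),
         ('good', 'a'), ('good', 'a'), ('film', 'n')]

print(toy_polarity(pairs, deduplicate=True))   # -0.125: each distinct pair once
print(toy_polarity(pairs, deduplicate=False))  # -0.875: every occurrence counted
```

With deduplication, a review that repeats a strongly negative word many times is scored the same as one that uses it once, which may matter for long reviews.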

In [3]:
# Negative reviews: correct only when the polarity is strictly below 0
neg_count = 0
for fid in mr.fileids('neg'):
    words = mr.words(fid)
    polarity = unsupervised_polarity_system(words)
    if polarity < 0:
        neg_count += 1
print('Total negatives: ', len(mr.fileids('neg')))
print('Correctly identified as negatives: ', neg_count)
print('Accuracy: ', neg_count/len(mr.fileids('neg')))
print()

# Positive reviews: correct only when the polarity is strictly above 0
# (a polarity of exactly 0 counts as an error in both classes)
pos_count = 0
for fid in mr.fileids('pos'):
    words = mr.words(fid)
    polarity = unsupervised_polarity_system(words)
    if polarity > 0:
        pos_count += 1
print('Total positives: ', len(mr.fileids('pos')))
print('Correctly identified as positives: ', pos_count)
print('Accuracy: ', pos_count/len(mr.fileids('pos')))

Total negatives:  1000
Correctly identified as negatives:  284
Accuracy:  0.284

Total positives:  1000
Correctly identified as positives:  881
Accuracy:  0.881


In [4]:
total_accuracy = (neg_count+pos_count)/(len(mr.fileids('neg'))+len(mr.fileids('pos')))
print(total_accuracy)

0.5825
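
The figures above can be re-derived from the reported counts without re-running the corpus. This short sketch recomputes the per-class and overall accuracy and compares them with the score a constant prediction would obtain on this balanced corpus. The variable names are illustrative.

```python
# Counts reported in the output above.
neg_total, neg_correct = 1000, 284
pos_total, pos_correct = 1000, 881

neg_accuracy = neg_correct / neg_total
pos_accuracy = pos_correct / pos_total
overall = (neg_correct + pos_correct) / (neg_total + pos_total)

# Always predicting the same class on a balanced corpus scores 0.5.
constant_baseline = max(neg_total, pos_total) / (neg_total + pos_total)

# Reviews not assigned their true class (wrong sign, or polarity exactly 0).
neg_missed = neg_total - neg_correct
pos_missed = pos_total - pos_correct

print(neg_accuracy, pos_accuracy, overall, constant_baseline)
print(neg_missed, pos_missed)  # 716 119
```

The overall figure hides a large gap between the two classes: 716 negative reviews are missed, against 119 positive ones.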


## Conclusions
The system reaches an overall accuracy of 58.25%, only modestly above the 50% expected by chance on this balanced corpus. The result is strongly asymmetric: 88.1% of the positive reviews are identified correctly, but only 28.4% of the negative ones. It is relevant that not all words used to express a negative opinion carry a negative polarity score, and that analyzing the text as a whole is perhaps not the best method. Negative reviews seem much harder to identify, possibly because of the way opinions are expressed in English: positive thoughts are shared openly, while negative ones are often more subtle.